* [PATCH bpf-next 00/29] BPF token
@ 2024-01-03 22:20 Andrii Nakryiko
From: Andrii Nakryiko @ 2024-01-03 22:20 UTC
  To: bpf, netdev, paul, brauner, torvalds
  Cc: linux-fsdevel, linux-security-module, kernel-team

This patch set is a combination of three BPF token-related patch sets
([0], [1], [2]) with fixes ([3]) to the kernel-side token_fd passing APIs
incorporated into the relevant patches, plus the necessary libbpf and BPF
selftest adjustments.

This patch set introduces the ability to delegate a subset of BPF
subsystem functionality from a privileged system-wide daemon (e.g.,
systemd or any other container manager) to a *trusted* unprivileged
application, through special mount options for a userns-bound BPF FS.
Trust is the key here. This functionality is not about allowing
unconditional unprivileged BPF usage. Establishing trust is completely up
to the discretion of the respective privileged application that creates
and mounts a BPF FS instance with delegation enabled, as different
production setups can and do achieve it through a combination of
different means (signing, LSM, code reviews, etc.), and it's undesirable
and infeasible for the kernel to enforce any particular way of validating
the trustworthiness of a particular process.

The main motivation for this work is the desire to enable containerized
BPF applications to be used together with user namespaces. This is
currently impossible, as CAP_BPF, required for BPF subsystem usage,
cannot, as a general rule, be namespaced or sandboxed. E.g., tracing BPF
programs, thanks to BPF helpers like bpf_probe_read_kernel() and
bpf_probe_read_user(), can safely read arbitrary memory, and it's
impossible to ensure that they only read memory of processes belonging to
any given namespace. This means that it's impossible to have
a mechanically verifiable namespace-aware CAP_BPF capability, and as such
another mechanism for allowing safe usage of BPF functionality is
necessary.

BPF FS delegation mount options and the BPF tokens derived from such BPF
FS instances are that mechanism. The kernel makes no assumptions about
what "trusted" constitutes in any particular case; it's up to specific
privileged applications and their surrounding infrastructure to decide
that. What the kernel provides is a set of APIs to set up and mount
a special BPF FS instance and derive BPF tokens from it. BPF FS and BPF
tokens are both bound to their owning userns and in this way are
constrained to the intended container. Users can then pass a BPF token FD
to privileged bpf() syscall commands, like BPF map creation and BPF
program loading, to perform such operations without having init userns
privileges.
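
To make that flow concrete, here is a minimal usage sketch (an
illustration, not part of the series): it assumes UAPI headers from this
series, a delegation-enabled BPF FS already mounted at the given path,
and omits error handling. A token is derived from the BPF FS FD (patch
#3) and then passed to BPF_MAP_CREATE via the BPF_F_TOKEN_FD flag
(patch #4):

  #include <fcntl.h>
  #include <string.h>
  #include <unistd.h>
  #include <sys/syscall.h>
  #include <linux/bpf.h>

  static int sys_bpf(int cmd, union bpf_attr *attr, unsigned int size)
  {
      return syscall(__NR_bpf, cmd, attr, size);
  }

  int create_map_with_token(const char *bpffs_path)
  {
      union bpf_attr attr;
      int bpffs_fd, token_fd, map_fd;

      /* open the delegation-enabled BPF FS mount point */
      bpffs_fd = open(bpffs_path, O_RDONLY);

      /* derive a BPF token from the BPF FS instance */
      memset(&attr, 0, sizeof(attr));
      attr.token_create.bpffs_fd = bpffs_fd;
      token_fd = sys_bpf(BPF_TOKEN_CREATE, &attr, sizeof(attr));

      /* create a map; capability checks go through the token's userns */
      memset(&attr, 0, sizeof(attr));
      attr.map_type = BPF_MAP_TYPE_ARRAY; /* must be in delegate_maps */
      attr.key_size = 4;
      attr.value_size = 8;
      attr.max_entries = 1;
      attr.map_flags = BPF_F_TOKEN_FD;
      attr.map_token_fd = token_fd;
      map_fd = sys_bpf(BPF_MAP_CREATE, &attr, sizeof(attr));

      close(token_fd);
      close(bpffs_fd);
      return map_fd;
  }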

This version incorporates feedback and suggestions ([4]) received on
earlier iterations of the BPF token approach: instead of allowing BPF
tokens to be created directly by any process that is
capable(CAP_SYS_ADMIN), we enhance BPF FS to accept a few new delegation
mount options. If these options are used and the BPF FS itself is
properly created, set up, and mounted inside the user namespaced
container, a user application is able to derive a BPF token object from
the BPF FS instance and pass that token to the bpf() syscall. As
explained in patch #3, a BPF token itself doesn't grant access to BPF
functionality; it instead allows the kernel to do namespaced capability
checks (ns_capable() vs capable()) for CAP_BPF, CAP_PERFMON,
CAP_NET_ADMIN, and CAP_SYS_ADMIN, as applicable. So it forms one half of
a puzzle and allows container managers and sysadmins to have safe and
flexible configuration options: determining which containers get
delegation of BPF functionality through BPF FS, and then which
applications within such containers are allowed to perform bpf()
commands, based on namespaced capabilities.

A previous attempt at addressing this very same problem ([5]) utilized an
authoritative LSM approach, but was conclusively rejected by upstream LSM
maintainers. The BPF token concept doesn't change anything about the LSM
approach, but can be combined with LSM hooks for very fine-grained
security policy. Some ideas about making BPF token more convenient to use
with LSM (in particular custom BPF LSM programs) were briefly described
in a recent LSF/MM/BPF 2023 presentation ([6]): e.g., an ability to
specify user-provided data (context), which in combination with BPF LSM
would allow implementing very dynamic and fine-grained custom security
policies on top of BPF token. In the interest of minimizing the API
surface area and the discussion, this was relegated to follow-up patches,
as it's not essential to the fundamental concept of a delegatable BPF
token.

It should be noted that the BPF token is conceptually quite similar to
the idea of a /dev/bpf device file proposed by Song a while ago ([7]).
The biggest difference is the idea of using a virtual anon_inode file to
hold a BPF token and allowing multiple independent instances of them,
each (potentially) with its own set of restrictions. Also, crucially, the
BPF token approach doesn't use any special stateful task-scoped flags.
Instead, the bpf() syscall accepts a token_fd parameter explicitly for
each relevant BPF command. This addresses the main concerns brought up
during the /dev/bpf discussion and fits better with the overall BPF
subsystem design.

The second part of this patch set adds full support for BPF token in
libbpf's high-level BPF object API. A good chunk of the changes rework
libbpf's feature detection internals, which are the most affected by BPF
token presence.

Besides internal refactorings, libbpf allows passing the location of the
BPF FS from which a BPF token should be created. This can be done
explicitly through a new bpf_object_open_opts.bpf_token_path field. But
we also add implicit BPF token creation logic to the BPF object load
step, without any explicit involvement of the user. If the environment is
set up properly, a BPF token will be created transparently and used
implicitly. This allows all existing applications to gain BPF token
support by just linking with the latest version of the libbpf library; no
source code modifications are required. All that is under the assumption
that the privileged container management agent properly set up the
default BPF FS instance at /sys/fs/bpf to allow BPF token creation.

libbpf adds support for overriding the default BPF FS location for BPF
token creation through the LIBBPF_BPF_TOKEN_PATH envvar. This allows
admins or container managers to mount a BPF token-enabled BPF FS at
a non-standard location without the need to coordinate with applications.
LIBBPF_BPF_TOKEN_PATH can also be used to disable implicit BPF token
creation altogether by setting it to an empty value.
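
As a sketch of the explicit opt-in path (an illustration, not from the
series; the object file name is made up, and /sys/fs/bpf is just the
conventional mount point), with the implicit path requiring no code at
all:

  #include <errno.h>
  #include <bpf/libbpf.h>

  int load_with_token(void)
  {
      /* bpf_token_path is the open_opts field added by this series */
      LIBBPF_OPTS(bpf_object_open_opts, opts,
          .bpf_token_path = "/sys/fs/bpf",
      );
      struct bpf_object *obj;

      obj = bpf_object__open_file("prog.bpf.o", &opts);
      if (!obj)
          return -errno;

      /* libbpf creates a token from bpf_token_path and uses it for
       * feature probing, map creation, and program loading
       */
      return bpf_object__load(obj);
  }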

  [0] https://patchwork.kernel.org/project/netdevbpf/list/?series=805707&state=*
  [1] https://patchwork.kernel.org/project/netdevbpf/list/?series=810260&state=*
  [2] https://patchwork.kernel.org/project/netdevbpf/list/?series=809800&state=*
  [3] https://patchwork.kernel.org/project/netdevbpf/patch/20231219053150.336991-1-andrii@kernel.org/
  [4] https://lore.kernel.org/bpf/20230704-hochverdient-lehne-eeb9eeef785e@brauner/
  [5] https://lore.kernel.org/bpf/20230412043300.360803-1-andrii@kernel.org/
  [6] http://vger.kernel.org/bpfconf2023_material/Trusted_unprivileged_BPF_LSFMM2023.pdf
  [7] https://lore.kernel.org/bpf/20190627201923.2589391-2-songliubraving@fb.com/

Andrii Nakryiko (29):
  bpf: align CAP_NET_ADMIN checks with bpf_capable() approach
  bpf: add BPF token delegation mount options to BPF FS
  bpf: introduce BPF token object
  bpf: add BPF token support to BPF_MAP_CREATE command
  bpf: add BPF token support to BPF_BTF_LOAD command
  bpf: add BPF token support to BPF_PROG_LOAD command
  bpf: take into account BPF token when fetching helper protos
  bpf: consistently use BPF token throughout BPF verifier logic
  bpf,lsm: refactor bpf_prog_alloc/bpf_prog_free LSM hooks
  bpf,lsm: refactor bpf_map_alloc/bpf_map_free LSM hooks
  bpf,lsm: add BPF token LSM hooks
  libbpf: add bpf_token_create() API
  libbpf: add BPF token support to bpf_map_create() API
  libbpf: add BPF token support to bpf_btf_load() API
  libbpf: add BPF token support to bpf_prog_load() API
  selftests/bpf: add BPF token-enabled tests
  bpf,selinux: allocate bpf_security_struct per BPF token
  bpf: fail BPF_TOKEN_CREATE if no delegation option was set on BPF FS
  bpf: support symbolic BPF FS delegation mount options
  selftests/bpf: utilize string values for delegate_xxx mount options
  libbpf: split feature detectors definitions from cached results
  libbpf: further decouple feature checking logic from bpf_object
  libbpf: move feature detection code into its own file
  libbpf: wire up token_fd into feature probing logic
  libbpf: wire up BPF token support at BPF object level
  selftests/bpf: add BPF object loading tests with explicit token
    passing
  selftests/bpf: add tests for BPF object load with implicit token
  libbpf: support BPF token path setting through LIBBPF_BPF_TOKEN_PATH
    envvar
  selftests/bpf: add tests for LIBBPF_BPF_TOKEN_PATH envvar

 drivers/media/rc/bpf-lirc.c                   |   2 +-
 include/linux/bpf.h                           |  85 +-
 include/linux/filter.h                        |   2 +-
 include/linux/lsm_hook_defs.h                 |  15 +-
 include/linux/security.h                      |  43 +-
 include/uapi/linux/bpf.h                      |  54 +
 kernel/bpf/Makefile                           |   2 +-
 kernel/bpf/arraymap.c                         |   2 +-
 kernel/bpf/bpf_lsm.c                          |  15 +-
 kernel/bpf/cgroup.c                           |   6 +-
 kernel/bpf/core.c                             |   3 +-
 kernel/bpf/helpers.c                          |   6 +-
 kernel/bpf/inode.c                            | 276 ++++-
 kernel/bpf/syscall.c                          | 228 +++-
 kernel/bpf/token.c                            | 271 +++++
 kernel/bpf/verifier.c                         |  13 +-
 kernel/trace/bpf_trace.c                      |   2 +-
 net/core/filter.c                             |  36 +-
 net/ipv4/bpf_tcp_ca.c                         |   2 +-
 net/netfilter/nf_bpf_link.c                   |   2 +-
 security/security.c                           | 101 +-
 security/selinux/hooks.c                      |  47 +-
 tools/include/uapi/linux/bpf.h                |  55 +
 tools/lib/bpf/Build                           |   2 +-
 tools/lib/bpf/bpf.c                           |  41 +-
 tools/lib/bpf/bpf.h                           |  37 +-
 tools/lib/bpf/btf.c                           |  10 +-
 tools/lib/bpf/elf.c                           |   2 -
 tools/lib/bpf/features.c                      | 503 +++++++++
 tools/lib/bpf/libbpf.c                        | 557 ++--------
 tools/lib/bpf/libbpf.h                        |  21 +-
 tools/lib/bpf/libbpf.map                      |   1 +
 tools/lib/bpf/libbpf_internal.h               |  36 +-
 tools/lib/bpf/libbpf_probes.c                 |  11 +-
 tools/lib/bpf/str_error.h                     |   3 +
 .../selftests/bpf/prog_tests/libbpf_probes.c  |   4 +
 .../selftests/bpf/prog_tests/libbpf_str.c     |   6 +
 .../testing/selftests/bpf/prog_tests/token.c  | 997 ++++++++++++++++++
 tools/testing/selftests/bpf/progs/priv_map.c  |  13 +
 tools/testing/selftests/bpf/progs/priv_prog.c |  13 +
 40 files changed, 2884 insertions(+), 641 deletions(-)
 create mode 100644 kernel/bpf/token.c
 create mode 100644 tools/lib/bpf/features.c
 create mode 100644 tools/testing/selftests/bpf/prog_tests/token.c
 create mode 100644 tools/testing/selftests/bpf/progs/priv_map.c
 create mode 100644 tools/testing/selftests/bpf/progs/priv_prog.c

-- 
2.34.1



* [PATCH bpf-next 01/29] bpf: align CAP_NET_ADMIN checks with bpf_capable() approach
From: Andrii Nakryiko @ 2024-01-03 22:20 UTC
  To: bpf, netdev, paul, brauner, torvalds
  Cc: linux-fsdevel, linux-security-module, kernel-team

Within the BPF syscall handling code, CAP_NET_ADMIN checks stand out
a bit compared to CAP_BPF and CAP_PERFMON checks. For the latter, CAP_BPF
or CAP_PERFMON is checked first, but if it is not set, CAP_SYS_ADMIN
takes over and grants whatever part of the BPF syscall is required.

Checks that involve CAP_NET_ADMIN are not so consistent. One out of four
uses does follow the CAP_BPF/CAP_PERFMON model: during BPF_PROG_LOAD, if
the type of BPF program is "network-related", either CAP_NET_ADMIN or
CAP_SYS_ADMIN is required to proceed.

But in three other cases CAP_NET_ADMIN is required even if CAP_SYS_ADMIN
is set:
  - when creating network-related maps (SOCKMAP/SOCKHASH,
    DEVMAP/DEVMAP_HASH, XSKMAP);
  - when attaching CGROUP_SKB programs;
  - when handling the BPF_PROG_QUERY command.

This patch changes the latter three cases to follow the BPF_PROG_LOAD
model, that is, allowing the operation to proceed under either
CAP_NET_ADMIN or CAP_SYS_ADMIN.

This also makes it cleaner in subsequent BPF token patches to switch
wholesale to a generic bpf_token_capable(int cap) check that always falls
back to CAP_SYS_ADMIN if the requested capability is missing.

Cc: Jakub Kicinski <kuba@kernel.org>
Acked-by: Yafang Shao <laoar.shao@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 kernel/bpf/syscall.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 1bf9805ee185..2392e1802364 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -1123,6 +1123,11 @@ static int map_check_btf(struct bpf_map *map, const struct btf *btf,
 	return ret;
 }
 
+static bool bpf_net_capable(void)
+{
+	return capable(CAP_NET_ADMIN) || capable(CAP_SYS_ADMIN);
+}
+
 #define BPF_MAP_CREATE_LAST_FIELD map_extra
 /* called via syscall */
 static int map_create(union bpf_attr *attr)
@@ -1226,7 +1231,7 @@ static int map_create(union bpf_attr *attr)
 	case BPF_MAP_TYPE_DEVMAP:
 	case BPF_MAP_TYPE_DEVMAP_HASH:
 	case BPF_MAP_TYPE_XSKMAP:
-		if (!capable(CAP_NET_ADMIN))
+		if (!bpf_net_capable())
 			return -EPERM;
 		break;
 	default:
@@ -2636,7 +2641,7 @@ static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr, u32 uattr_size)
 	    !bpf_capable())
 		return -EPERM;
 
-	if (is_net_admin_prog_type(type) && !capable(CAP_NET_ADMIN) && !capable(CAP_SYS_ADMIN))
+	if (is_net_admin_prog_type(type) && !bpf_net_capable())
 		return -EPERM;
 	if (is_perfmon_prog_type(type) && !perfmon_capable())
 		return -EPERM;
@@ -3788,7 +3793,7 @@ static int bpf_prog_attach_check_attach_type(const struct bpf_prog *prog,
 	case BPF_PROG_TYPE_SK_LOOKUP:
 		return attach_type == prog->expected_attach_type ? 0 : -EINVAL;
 	case BPF_PROG_TYPE_CGROUP_SKB:
-		if (!capable(CAP_NET_ADMIN))
+		if (!bpf_net_capable())
 			/* cg-skb progs can be loaded by unpriv user.
 			 * check permissions at attach time.
 			 */
@@ -3991,7 +3996,7 @@ static int bpf_prog_detach(const union bpf_attr *attr)
 static int bpf_prog_query(const union bpf_attr *attr,
 			  union bpf_attr __user *uattr)
 {
-	if (!capable(CAP_NET_ADMIN))
+	if (!bpf_net_capable())
 		return -EPERM;
 	if (CHECK_ATTR(BPF_PROG_QUERY))
 		return -EINVAL;
-- 
2.34.1



* [PATCH bpf-next 02/29] bpf: add BPF token delegation mount options to BPF FS
From: Andrii Nakryiko @ 2024-01-03 22:20 UTC
  To: bpf, netdev, paul, brauner, torvalds
  Cc: linux-fsdevel, linux-security-module, kernel-team

Add a few new mount options to BPF FS that allow specifying that a given
BPF FS instance allows creation of BPF tokens (added in the next patch),
and what sort of operations are allowed under BPF token. As such, we get
4 new mount options, each a bit mask:
  - `delegate_cmds` allows specifying which bpf() syscall commands are
    allowed with a BPF token derived from this BPF FS instance;
  - if the BPF_MAP_CREATE command is allowed, `delegate_maps` specifies
    a set of allowable BPF map types that can be created with a BPF
    token;
  - if the BPF_PROG_LOAD command is allowed, `delegate_progs` specifies
    a set of allowable BPF program types that can be loaded with a BPF
    token;
  - if the BPF_PROG_LOAD command is allowed, `delegate_attachs` specifies
    a set of allowable BPF program attach types that can be loaded with
    a BPF token; delegate_progs and delegate_attachs are meant to be used
    together, as the full BPF program type is, in general, determined
    through both program type and program attach type.

Currently, these mount options accept the following forms of values:
  - a special value "any", which enables all possible values of a given
  bit set;
  - a numeric value (decimal or hexadecimal, detected by the kernel
  automatically) that specifies a bit mask value directly.
If a given mount option is specified multiple times, all its values are
combined. E.g., `mount -t bpf nodev /path/to/mount -o delegate_maps=0x1
-o delegate_maps=0x2` will result in a combined 0x3 mask.

Ideally, a more convenient (for humans) symbolic form derived from the
corresponding UAPI enums would be accepted (e.g., `-o
delegate_progs=kprobe|tracepoint`), and I intend to implement this, but
it requires a bunch of UAPI header churn, so I postponed it until this
feature lands upstream, or at least until there is a definite consensus
that this feature is acceptable and is going to make it, just to minimize
the amount of wasted effort and not increase the amount of non-essential
code to be reviewed.

An attentive reader will notice that BPF FS is now marked as
FS_USERNS_MOUNT, which theoretically makes it mountable inside a non-init
user namespace as long as the process has sufficient *namespaced*
capabilities within that user namespace. But in reality we still restrict
BPF FS to be mountable only by processes with CAP_SYS_ADMIN *in the init
userns* (extra check in bpf_fill_super()). FS_USERNS_MOUNT is added to
allow creating a BPF FS context object (i.e., fsopen("bpf")) from inside
an unprivileged process inside a non-init userns, to capture that userns
as the owning userns. It is still required to pass this context object
back to a privileged process to instantiate and mount it.

This manipulation is important, because capturing a non-init userns as
the owning userns of a BPF FS instance (super block) allows that userns
to be used to constrain BPF tokens to that userns later on (see the next
patch). So creating BPF FS with delegation inside an unprivileged userns
will restrict derived BPF token objects to only "work" inside that
intended userns, making them scoped to the intended "container". Also,
setting these delegation options requires capable(CAP_SYS_ADMIN), so an
unprivileged process cannot set this up without the involvement of
a privileged process.
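
For concreteness, here is a rough sketch of that two-process sequence (an
illustration, not part of the patch): raw syscalls are used since libc
wrappers for the new mount API may be unavailable, error handling and the
SCM_RIGHTS passing of fs_fd between the two processes are elided, and the
mount path is made up:

  #include <fcntl.h>
  #include <unistd.h>
  #include <sys/syscall.h>
  #include <linux/mount.h>

  int setup_delegated_bpffs(const char *path)
  {
      /* [unprivileged, inside container userns] create the FS context;
       * this captures the container's userns as the owning userns of
       * the future super block
       */
      int fs_fd = syscall(SYS_fsopen, "bpf", 0);

      /* ... fs_fd is passed to the privileged daemon here ... */

      /* [privileged, CAP_SYS_ADMIN in init userns] set delegation
       * options and instantiate the super block
       */
      syscall(SYS_fsconfig, fs_fd, FSCONFIG_SET_STRING,
              "delegate_cmds", "any", 0);
      syscall(SYS_fsconfig, fs_fd, FSCONFIG_CMD_CREATE, NULL, NULL, 0);

      /* [privileged] attach the resulting mount inside the container */
      int mnt_fd = syscall(SYS_fsmount, fs_fd, 0, 0);
      return syscall(SYS_move_mount, mnt_fd, "", AT_FDCWD, path,
                     MOVE_MOUNT_F_EMPTY_PATH);
  }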

There is a set of selftests at the end of the patch set that simulates
this sequence of steps and validates that everything works as intended.
But careful review is requested to make sure there are no missed gaps in
the implementation and testing.

This somewhat subtle set of aspects is the result of previous discussions
([0]) about various user namespace implications of and interactions with
BPF token functionality, and is necessary to contain the BPF token inside
the intended user namespace.

  [0] https://lore.kernel.org/bpf/20230704-hochverdient-lehne-eeb9eeef785e@brauner/

Acked-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 include/linux/bpf.h | 12 ++++++
 kernel/bpf/inode.c  | 90 +++++++++++++++++++++++++++++++++++++++------
 2 files changed, 90 insertions(+), 12 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 7671530d6e4e..d3658bd959f2 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1608,6 +1608,18 @@ struct bpf_link_primer {
 	u32 id;
 };
 
+struct bpf_mount_opts {
+	kuid_t uid;
+	kgid_t gid;
+	umode_t mode;
+
+	/* BPF token-related delegation options */
+	u64 delegate_cmds;
+	u64 delegate_maps;
+	u64 delegate_progs;
+	u64 delegate_attachs;
+};
+
 struct bpf_struct_ops_value;
 struct btf_member;
 
diff --git a/kernel/bpf/inode.c b/kernel/bpf/inode.c
index 41e0a55c35f5..70b748f6228c 100644
--- a/kernel/bpf/inode.c
+++ b/kernel/bpf/inode.c
@@ -20,6 +20,7 @@
 #include <linux/filter.h>
 #include <linux/bpf.h>
 #include <linux/bpf_trace.h>
+#include <linux/kstrtox.h>
 #include "preload/bpf_preload.h"
 
 enum bpf_type {
@@ -601,6 +602,7 @@ static int bpf_show_options(struct seq_file *m, struct dentry *root)
 {
 	struct inode *inode = d_inode(root);
 	umode_t mode = inode->i_mode & S_IALLUGO & ~S_ISVTX;
+	struct bpf_mount_opts *opts = root->d_sb->s_fs_info;
 
 	if (!uid_eq(inode->i_uid, GLOBAL_ROOT_UID))
 		seq_printf(m, ",uid=%u",
@@ -610,6 +612,26 @@ static int bpf_show_options(struct seq_file *m, struct dentry *root)
 			   from_kgid_munged(&init_user_ns, inode->i_gid));
 	if (mode != S_IRWXUGO)
 		seq_printf(m, ",mode=%o", mode);
+
+	if (opts->delegate_cmds == ~0ULL)
+		seq_printf(m, ",delegate_cmds=any");
+	else if (opts->delegate_cmds)
+		seq_printf(m, ",delegate_cmds=0x%llx", opts->delegate_cmds);
+
+	if (opts->delegate_maps == ~0ULL)
+		seq_printf(m, ",delegate_maps=any");
+	else if (opts->delegate_maps)
+		seq_printf(m, ",delegate_maps=0x%llx", opts->delegate_maps);
+
+	if (opts->delegate_progs == ~0ULL)
+		seq_printf(m, ",delegate_progs=any");
+	else if (opts->delegate_progs)
+		seq_printf(m, ",delegate_progs=0x%llx", opts->delegate_progs);
+
+	if (opts->delegate_attachs == ~0ULL)
+		seq_printf(m, ",delegate_attachs=any");
+	else if (opts->delegate_attachs)
+		seq_printf(m, ",delegate_attachs=0x%llx", opts->delegate_attachs);
 	return 0;
 }
 
@@ -635,28 +657,31 @@ enum {
 	OPT_UID,
 	OPT_GID,
 	OPT_MODE,
+	OPT_DELEGATE_CMDS,
+	OPT_DELEGATE_MAPS,
+	OPT_DELEGATE_PROGS,
+	OPT_DELEGATE_ATTACHS,
 };
 
 static const struct fs_parameter_spec bpf_fs_parameters[] = {
 	fsparam_u32	("uid",				OPT_UID),
 	fsparam_u32	("gid",				OPT_GID),
 	fsparam_u32oct	("mode",			OPT_MODE),
+	fsparam_string	("delegate_cmds",		OPT_DELEGATE_CMDS),
+	fsparam_string	("delegate_maps",		OPT_DELEGATE_MAPS),
+	fsparam_string	("delegate_progs",		OPT_DELEGATE_PROGS),
+	fsparam_string	("delegate_attachs",		OPT_DELEGATE_ATTACHS),
 	{}
 };
 
-struct bpf_mount_opts {
-	kuid_t uid;
-	kgid_t gid;
-	umode_t mode;
-};
-
 static int bpf_parse_param(struct fs_context *fc, struct fs_parameter *param)
 {
-	struct bpf_mount_opts *opts = fc->fs_private;
+	struct bpf_mount_opts *opts = fc->s_fs_info;
 	struct fs_parse_result result;
 	kuid_t uid;
 	kgid_t gid;
-	int opt;
+	int opt, err;
+	u64 msk;
 
 	opt = fs_parse(fc, bpf_fs_parameters, param, &result);
 	if (opt < 0) {
@@ -708,6 +733,28 @@ static int bpf_parse_param(struct fs_context *fc, struct fs_parameter *param)
 	case OPT_MODE:
 		opts->mode = result.uint_32 & S_IALLUGO;
 		break;
+	case OPT_DELEGATE_CMDS:
+	case OPT_DELEGATE_MAPS:
+	case OPT_DELEGATE_PROGS:
+	case OPT_DELEGATE_ATTACHS:
+		if (strcmp(param->string, "any") == 0) {
+			msk = ~0ULL;
+		} else {
+			err = kstrtou64(param->string, 0, &msk);
+			if (err)
+				return err;
+		}
+		/* Setting delegation mount options requires privileges */
+		if (msk && !capable(CAP_SYS_ADMIN))
+			return -EPERM;
+		switch (opt) {
+		case OPT_DELEGATE_CMDS: opts->delegate_cmds |= msk; break;
+		case OPT_DELEGATE_MAPS: opts->delegate_maps |= msk; break;
+		case OPT_DELEGATE_PROGS: opts->delegate_progs |= msk; break;
+		case OPT_DELEGATE_ATTACHS: opts->delegate_attachs |= msk; break;
+		default: return -EINVAL;
+		}
+		break;
 	}
 
 	return 0;
@@ -784,10 +831,14 @@ static int populate_bpffs(struct dentry *parent)
 static int bpf_fill_super(struct super_block *sb, struct fs_context *fc)
 {
 	static const struct tree_descr bpf_rfiles[] = { { "" } };
-	struct bpf_mount_opts *opts = fc->fs_private;
+	struct bpf_mount_opts *opts = sb->s_fs_info;
 	struct inode *inode;
 	int ret;
 
+	/* Mounting an instance of BPF FS requires privileges */
+	if (fc->user_ns != &init_user_ns && !capable(CAP_SYS_ADMIN))
+		return -EPERM;
+
 	ret = simple_fill_super(sb, BPF_FS_MAGIC, bpf_rfiles);
 	if (ret)
 		return ret;
@@ -811,7 +862,7 @@ static int bpf_get_tree(struct fs_context *fc)
 
 static void bpf_free_fc(struct fs_context *fc)
 {
-	kfree(fc->fs_private);
+	kfree(fc->s_fs_info);
 }
 
 static const struct fs_context_operations bpf_context_ops = {
@@ -835,17 +886,32 @@ static int bpf_init_fs_context(struct fs_context *fc)
 	opts->uid = current_fsuid();
 	opts->gid = current_fsgid();
 
-	fc->fs_private = opts;
+	/* start out with no BPF token delegation enabled */
+	opts->delegate_cmds = 0;
+	opts->delegate_maps = 0;
+	opts->delegate_progs = 0;
+	opts->delegate_attachs = 0;
+
+	fc->s_fs_info = opts;
 	fc->ops = &bpf_context_ops;
 	return 0;
 }
 
+static void bpf_kill_super(struct super_block *sb)
+{
+	struct bpf_mount_opts *opts = sb->s_fs_info;
+
+	kill_litter_super(sb);
+	kfree(opts);
+}
+
 static struct file_system_type bpf_fs_type = {
 	.owner		= THIS_MODULE,
 	.name		= "bpf",
 	.init_fs_context = bpf_init_fs_context,
 	.parameters	= bpf_fs_parameters,
-	.kill_sb	= kill_litter_super,
+	.kill_sb	= bpf_kill_super,
+	.fs_flags	= FS_USERNS_MOUNT,
 };
 
 static int __init bpf_init(void)
-- 
2.34.1



* [PATCH bpf-next 03/29] bpf: introduce BPF token object
From: Andrii Nakryiko @ 2024-01-03 22:20 UTC
  To: bpf, netdev, paul, brauner, torvalds
  Cc: linux-fsdevel, linux-security-module, kernel-team

Add a new kind of BPF kernel object, the BPF token. The BPF token is
meant to allow delegating privileged BPF functionality, like loading
a BPF program or creating a BPF map, from a privileged process to
a *trusted* unprivileged process, all while having a good amount of
control over which privileged operations can be performed using the
provided BPF token.

This is achieved by mounting a BPF FS instance with extra delegation
mount options, which determine what operations are delegatable, and by
constraining the token to the owning user namespace (as mentioned in the
previous patch).

The BPF token itself is just a derivative of a BPF FS instance and can be
created through a new bpf() syscall command, BPF_TOKEN_CREATE, which
accepts a BPF FS FD, obtainable through the open() API by opening the BPF
FS mount point. Currently, the BPF token "inherits" the delegated
command, map type, prog type, and attach type bit sets from BPF FS as is.
In the future, having the BPF token as a separate object with its own FD,
we can allow further restricting a BPF token's allowable set of things,
either at creation time or after the fact, allowing a process to guard
itself further from unintentionally trying to load undesired kinds of BPF
programs. But for now we keep things simple and just copy the bit sets as
is.

When a BPF token is created from a BPF FS mount, we take a reference to
the BPF super block's owning user namespace, and then use that namespace
for checking all the {CAP_BPF, CAP_PERFMON, CAP_NET_ADMIN, CAP_SYS_ADMIN}
capabilities that are normally only checked against the init userns
(using capable()); with a BPF token provided, we check them using
ns_capable() instead. See bpf_token_capable() for details.

Such a setup means that a BPF token by itself is not sufficient to grant
BPF functionality. A user namespaced process has to *also* have the
necessary combination of capabilities inside that user namespace. So
while previously CAP_BPF was useless when granted within a user
namespace, now it gains a meaning and allows container managers and
sysadmins to have flexible control over which processes can and need to
use BPF functionality within the user namespace (i.e., a container in
practice). BPF FS delegation mount options and derived BPF tokens serve
as a per-container "flag" to grant the overall ability to use bpf() (and
further restrict which parts of the bpf() syscall are treated as
namespaced).

Note also that the BPF_TOKEN_CREATE command itself requires
ns_capable(CAP_BPF) within the BPF FS's owning user namespace, rounding
out the ns_capable() story of the BPF token.
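
For illustration, a small sketch (hypothetical helper; assumes UAPI
headers from this series, error handling elided) that derives a token
from an already-opened BPF FS FD and dumps its fdinfo, where the
allowed_cmds line emitted by bpf_token_show_fdinfo() can be observed:

  #include <fcntl.h>
  #include <stdio.h>
  #include <string.h>
  #include <unistd.h>
  #include <sys/syscall.h>
  #include <linux/bpf.h>

  void dump_token_fdinfo(int bpffs_fd)
  {
      union bpf_attr attr;
      char path[64], buf[256];
      int token_fd, fd, n;

      memset(&attr, 0, sizeof(attr));
      attr.token_create.bpffs_fd = bpffs_fd;
      token_fd = syscall(__NR_bpf, BPF_TOKEN_CREATE, &attr, sizeof(attr));

      /* prints, e.g., "allowed_cmds:\tany" for a delegate_cmds=any FS */
      snprintf(path, sizeof(path), "/proc/self/fdinfo/%d", token_fd);
      fd = open(path, O_RDONLY);
      n = read(fd, buf, sizeof(buf) - 1);
      if (n > 0) {
          buf[n] = '\0';
          fputs(buf, stdout);
      }
      close(fd);
      close(token_fd);
  }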

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 include/linux/bpf.h            |  41 +++++++
 include/uapi/linux/bpf.h       |  37 ++++++
 kernel/bpf/Makefile            |   2 +-
 kernel/bpf/inode.c             |  12 +-
 kernel/bpf/syscall.c           |  17 +++
 kernel/bpf/token.c             | 214 +++++++++++++++++++++++++++++++++
 tools/include/uapi/linux/bpf.h |  37 ++++++
 7 files changed, 354 insertions(+), 6 deletions(-)
 create mode 100644 kernel/bpf/token.c

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index d3658bd959f2..e87fe928645f 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -52,6 +52,10 @@ struct module;
 struct bpf_func_state;
 struct ftrace_ops;
 struct cgroup;
+struct bpf_token;
+struct user_namespace;
+struct super_block;
+struct inode;
 
 extern struct idr btf_idr;
 extern spinlock_t btf_idr_lock;
@@ -1620,6 +1624,13 @@ struct bpf_mount_opts {
 	u64 delegate_attachs;
 };
 
+struct bpf_token {
+	struct work_struct work;
+	atomic64_t refcnt;
+	struct user_namespace *userns;
+	u64 allowed_cmds;
+};
+
 struct bpf_struct_ops_value;
 struct btf_member;
 
@@ -2079,6 +2090,7 @@ static inline void bpf_enable_instrumentation(void)
 	migrate_enable();
 }
 
+extern const struct super_operations bpf_super_ops;
 extern const struct file_operations bpf_map_fops;
 extern const struct file_operations bpf_prog_fops;
 extern const struct file_operations bpf_iter_fops;
@@ -2213,6 +2225,8 @@ static inline void bpf_map_dec_elem_count(struct bpf_map *map)
 
 extern int sysctl_unprivileged_bpf_disabled;
 
+bool bpf_token_capable(const struct bpf_token *token, int cap);
+
 static inline bool bpf_allow_ptr_leaks(void)
 {
 	return perfmon_capable();
@@ -2247,8 +2261,17 @@ int bpf_link_new_fd(struct bpf_link *link);
 struct bpf_link *bpf_link_get_from_fd(u32 ufd);
 struct bpf_link *bpf_link_get_curr_or_next(u32 *id);
 
+void bpf_token_inc(struct bpf_token *token);
+void bpf_token_put(struct bpf_token *token);
+int bpf_token_create(union bpf_attr *attr);
+struct bpf_token *bpf_token_get_from_fd(u32 ufd);
+
+bool bpf_token_allow_cmd(const struct bpf_token *token, enum bpf_cmd cmd);
+
 int bpf_obj_pin_user(u32 ufd, int path_fd, const char __user *pathname);
 int bpf_obj_get_user(int path_fd, const char __user *pathname, int flags);
+struct inode *bpf_get_inode(struct super_block *sb, const struct inode *dir,
+			    umode_t mode);
 
 #define BPF_ITER_FUNC_PREFIX "bpf_iter_"
 #define DEFINE_BPF_ITER_FUNC(target, args...)			\
@@ -2606,6 +2629,24 @@ static inline int bpf_obj_get_user(const char __user *pathname, int flags)
 	return -EOPNOTSUPP;
 }
 
+static inline bool bpf_token_capable(const struct bpf_token *token, int cap)
+{
+	return capable(cap) || (cap != CAP_SYS_ADMIN && capable(CAP_SYS_ADMIN));
+}
+
+static inline void bpf_token_inc(struct bpf_token *token)
+{
+}
+
+static inline void bpf_token_put(struct bpf_token *token)
+{
+}
+
+static inline struct bpf_token *bpf_token_get_from_fd(u32 ufd)
+{
+	return ERR_PTR(-EOPNOTSUPP);
+}
+
 static inline void __dev_flush(void)
 {
 }
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 754e68ca8744..ab48e037d543 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -847,6 +847,36 @@ union bpf_iter_link_info {
  *		Returns zero on success. On error, -1 is returned and *errno*
  *		is set appropriately.
  *
+ * BPF_TOKEN_CREATE
+ *	Description
+ *		Create BPF token with embedded information about what
+ *		BPF-related functionality it allows:
+ *		- a set of allowed bpf() syscall commands;
+ *		- a set of allowed BPF map types to be created with
+ *		BPF_MAP_CREATE command, if BPF_MAP_CREATE itself is allowed;
+ *		- a set of allowed BPF program types and BPF program attach
+ *		types to be loaded with BPF_PROG_LOAD command, if
+ *		BPF_PROG_LOAD itself is allowed.
+ *
+ *		BPF token is created (derived) from an instance of BPF FS,
+ *		assuming it has necessary delegation mount options specified.
+ *		This BPF token can be passed as an extra parameter to various
+ *		bpf() syscall commands to grant BPF subsystem functionality to
+ *		unprivileged processes.
+ *
+ *		When created, BPF token is "associated" with the owning
+ *		user namespace of BPF FS instance (super block) that it was
+ *		derived from, and subsequent BPF operations performed with
+ *		BPF token would be performing capabilities checks (i.e.,
+ *		CAP_BPF, CAP_PERFMON, CAP_NET_ADMIN, CAP_SYS_ADMIN) within
+ *		that user namespace. Without BPF token, such capabilities
+ *		have to be granted in init user namespace, making bpf()
+ *		syscall incompatible with user namespace, for the most part.
+ *
+ *	Return
+ *		A new file descriptor (a nonnegative integer), or -1 if an
+ *		error occurred (in which case, *errno* is set appropriately).
+ *
  * NOTES
  *	eBPF objects (maps and programs) can be shared between processes.
  *
@@ -901,6 +931,8 @@ enum bpf_cmd {
 	BPF_ITER_CREATE,
 	BPF_LINK_DETACH,
 	BPF_PROG_BIND_MAP,
+	BPF_TOKEN_CREATE,
+	__MAX_BPF_CMD,
 };
 
 enum bpf_map_type {
@@ -1714,6 +1746,11 @@ union bpf_attr {
 		__u32		flags;		/* extra flags */
 	} prog_bind_map;
 
+	struct { /* struct used by BPF_TOKEN_CREATE command */
+		__u32		flags;
+		__u32		bpffs_fd;
+	} token_create;
+
 } __attribute__((aligned(8)));
 
 /* The description below is an attempt at providing documentation to eBPF
diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile
index f526b7573e97..4ce95acfcaa7 100644
--- a/kernel/bpf/Makefile
+++ b/kernel/bpf/Makefile
@@ -6,7 +6,7 @@ cflags-nogcse-$(CONFIG_X86)$(CONFIG_CC_IS_GCC) := -fno-gcse
 endif
 CFLAGS_core.o += $(call cc-disable-warning, override-init) $(cflags-nogcse-yy)
 
-obj-$(CONFIG_BPF_SYSCALL) += syscall.o verifier.o inode.o helpers.o tnum.o log.o
+obj-$(CONFIG_BPF_SYSCALL) += syscall.o verifier.o inode.o helpers.o tnum.o log.o token.o
 obj-$(CONFIG_BPF_SYSCALL) += bpf_iter.o map_iter.o task_iter.o prog_iter.o link_iter.o
 obj-$(CONFIG_BPF_SYSCALL) += hashtab.o arraymap.o percpu_freelist.o bpf_lru_list.o lpm_trie.o map_in_map.o bloom_filter.o
 obj-$(CONFIG_BPF_SYSCALL) += local_storage.o queue_stack_maps.o ringbuf.o
diff --git a/kernel/bpf/inode.c b/kernel/bpf/inode.c
index 70b748f6228c..565be1f3f1ea 100644
--- a/kernel/bpf/inode.c
+++ b/kernel/bpf/inode.c
@@ -99,9 +99,9 @@ static const struct inode_operations bpf_prog_iops = { };
 static const struct inode_operations bpf_map_iops  = { };
 static const struct inode_operations bpf_link_iops  = { };
 
-static struct inode *bpf_get_inode(struct super_block *sb,
-				   const struct inode *dir,
-				   umode_t mode)
+struct inode *bpf_get_inode(struct super_block *sb,
+			    const struct inode *dir,
+			    umode_t mode)
 {
 	struct inode *inode;
 
@@ -603,6 +603,7 @@ static int bpf_show_options(struct seq_file *m, struct dentry *root)
 	struct inode *inode = d_inode(root);
 	umode_t mode = inode->i_mode & S_IALLUGO & ~S_ISVTX;
 	struct bpf_mount_opts *opts = root->d_sb->s_fs_info;
+	u64 mask;
 
 	if (!uid_eq(inode->i_uid, GLOBAL_ROOT_UID))
 		seq_printf(m, ",uid=%u",
@@ -613,7 +614,8 @@ static int bpf_show_options(struct seq_file *m, struct dentry *root)
 	if (mode != S_IRWXUGO)
 		seq_printf(m, ",mode=%o", mode);
 
-	if (opts->delegate_cmds == ~0ULL)
+	mask = (1ULL << __MAX_BPF_CMD) - 1;
+	if ((opts->delegate_cmds & mask) == mask)
 		seq_printf(m, ",delegate_cmds=any");
 	else if (opts->delegate_cmds)
 		seq_printf(m, ",delegate_cmds=0x%llx", opts->delegate_cmds);
@@ -646,7 +648,7 @@ static void bpf_free_inode(struct inode *inode)
 	free_inode_nonrcu(inode);
 }
 
-static const struct super_operations bpf_super_ops = {
+const struct super_operations bpf_super_ops = {
 	.statfs		= simple_statfs,
 	.drop_inode	= generic_delete_inode,
 	.show_options	= bpf_show_options,
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 2392e1802364..423702f33ba6 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -5390,6 +5390,20 @@ static int bpf_prog_bind_map(union bpf_attr *attr)
 	return ret;
 }
 
+#define BPF_TOKEN_CREATE_LAST_FIELD token_create.bpffs_fd
+
+static int token_create(union bpf_attr *attr)
+{
+	if (CHECK_ATTR(BPF_TOKEN_CREATE))
+		return -EINVAL;
+
+	/* no flags are supported yet */
+	if (attr->token_create.flags)
+		return -EINVAL;
+
+	return bpf_token_create(attr);
+}
+
 static int __sys_bpf(int cmd, bpfptr_t uattr, unsigned int size)
 {
 	union bpf_attr attr;
@@ -5523,6 +5537,9 @@ static int __sys_bpf(int cmd, bpfptr_t uattr, unsigned int size)
 	case BPF_PROG_BIND_MAP:
 		err = bpf_prog_bind_map(&attr);
 		break;
+	case BPF_TOKEN_CREATE:
+		err = token_create(&attr);
+		break;
 	default:
 		err = -EINVAL;
 		break;
diff --git a/kernel/bpf/token.c b/kernel/bpf/token.c
new file mode 100644
index 000000000000..e18aaecc67e9
--- /dev/null
+++ b/kernel/bpf/token.c
@@ -0,0 +1,214 @@
+#include <linux/bpf.h>
+#include <linux/vmalloc.h>
+#include <linux/fdtable.h>
+#include <linux/file.h>
+#include <linux/fs.h>
+#include <linux/kernel.h>
+#include <linux/idr.h>
+#include <linux/namei.h>
+#include <linux/user_namespace.h>
+
+bool bpf_token_capable(const struct bpf_token *token, int cap)
+{
+	/* BPF token allows ns_capable() level of capabilities, but only if
+	 * token's userns is *exactly* the same as current user's userns
+	 */
+	if (token && current_user_ns() == token->userns) {
+		if (ns_capable(token->userns, cap))
+			return true;
+		if (cap != CAP_SYS_ADMIN && ns_capable(token->userns, CAP_SYS_ADMIN))
+			return true;
+	}
+	/* otherwise fallback to capable() checks */
+	return capable(cap) || (cap != CAP_SYS_ADMIN && capable(CAP_SYS_ADMIN));
+}
+
+void bpf_token_inc(struct bpf_token *token)
+{
+	atomic64_inc(&token->refcnt);
+}
+
+static void bpf_token_free(struct bpf_token *token)
+{
+	put_user_ns(token->userns);
+	kvfree(token);
+}
+
+static void bpf_token_put_deferred(struct work_struct *work)
+{
+	struct bpf_token *token = container_of(work, struct bpf_token, work);
+
+	bpf_token_free(token);
+}
+
+void bpf_token_put(struct bpf_token *token)
+{
+	if (!token)
+		return;
+
+	if (!atomic64_dec_and_test(&token->refcnt))
+		return;
+
+	INIT_WORK(&token->work, bpf_token_put_deferred);
+	schedule_work(&token->work);
+}
+
+static int bpf_token_release(struct inode *inode, struct file *filp)
+{
+	struct bpf_token *token = filp->private_data;
+
+	bpf_token_put(token);
+	return 0;
+}
+
+static void bpf_token_show_fdinfo(struct seq_file *m, struct file *filp)
+{
+	struct bpf_token *token = filp->private_data;
+	u64 mask;
+
+	BUILD_BUG_ON(__MAX_BPF_CMD >= 64);
+	mask = (1ULL << __MAX_BPF_CMD) - 1;
+	if ((token->allowed_cmds & mask) == mask)
+		seq_printf(m, "allowed_cmds:\tany\n");
+	else
+		seq_printf(m, "allowed_cmds:\t0x%llx\n", token->allowed_cmds);
+}
+
+#define BPF_TOKEN_INODE_NAME "bpf-token"
+
+static const struct inode_operations bpf_token_iops = { };
+
+static const struct file_operations bpf_token_fops = {
+	.release	= bpf_token_release,
+	.show_fdinfo	= bpf_token_show_fdinfo,
+};
+
+int bpf_token_create(union bpf_attr *attr)
+{
+	struct bpf_mount_opts *mnt_opts;
+	struct bpf_token *token = NULL;
+	struct user_namespace *userns;
+	struct inode *inode;
+	struct file *file;
+	struct path path;
+	struct fd f;
+	umode_t mode;
+	int err, fd;
+
+	f = fdget(attr->token_create.bpffs_fd);
+	if (!f.file)
+		return -EBADF;
+
+	path = f.file->f_path;
+	path_get(&path);
+	fdput(f);
+
+	if (path.dentry != path.mnt->mnt_sb->s_root) {
+		err = -EINVAL;
+		goto out_path;
+	}
+	if (path.mnt->mnt_sb->s_op != &bpf_super_ops) {
+		err = -EINVAL;
+		goto out_path;
+	}
+	err = path_permission(&path, MAY_ACCESS);
+	if (err)
+		goto out_path;
+
+	userns = path.dentry->d_sb->s_user_ns;
+	/*
+	 * Enforce that creators of BPF tokens are in the same user
+	 * namespace as the BPF FS instance. This makes reasoning about
+	 * permissions a lot easier and we can always relax this later.
+	 */
+	if (current_user_ns() != userns) {
+		err = -EPERM;
+		goto out_path;
+	}
+	if (!ns_capable(userns, CAP_BPF)) {
+		err = -EPERM;
+		goto out_path;
+	}
+
+	mode = S_IFREG | ((S_IRUSR | S_IWUSR) & ~current_umask());
+	inode = bpf_get_inode(path.mnt->mnt_sb, NULL, mode);
+	if (IS_ERR(inode)) {
+		err = PTR_ERR(inode);
+		goto out_path;
+	}
+
+	inode->i_op = &bpf_token_iops;
+	inode->i_fop = &bpf_token_fops;
+	clear_nlink(inode); /* make sure it is unlinked */
+
+	file = alloc_file_pseudo(inode, path.mnt, BPF_TOKEN_INODE_NAME, O_RDWR, &bpf_token_fops);
+	if (IS_ERR(file)) {
+		iput(inode);
+		err = PTR_ERR(file);
+		goto out_path;
+	}
+
+	token = kvzalloc(sizeof(*token), GFP_USER);
+	if (!token) {
+		err = -ENOMEM;
+		goto out_file;
+	}
+
+	atomic64_set(&token->refcnt, 1);
+
+	/* remember bpffs owning userns for future ns_capable() checks */
+	token->userns = get_user_ns(userns);
+
+	mnt_opts = path.dentry->d_sb->s_fs_info;
+	token->allowed_cmds = mnt_opts->delegate_cmds;
+
+	fd = get_unused_fd_flags(O_CLOEXEC);
+	if (fd < 0) {
+		err = fd;
+		goto out_token;
+	}
+
+	file->private_data = token;
+	fd_install(fd, file);
+
+	path_put(&path);
+	return fd;
+
+out_token:
+	bpf_token_free(token);
+out_file:
+	fput(file);
+out_path:
+	path_put(&path);
+	return err;
+}
+
+struct bpf_token *bpf_token_get_from_fd(u32 ufd)
+{
+	struct fd f = fdget(ufd);
+	struct bpf_token *token;
+
+	if (!f.file)
+		return ERR_PTR(-EBADF);
+	if (f.file->f_op != &bpf_token_fops) {
+		fdput(f);
+		return ERR_PTR(-EINVAL);
+	}
+
+	token = f.file->private_data;
+	bpf_token_inc(token);
+	fdput(f);
+
+	return token;
+}
+
+bool bpf_token_allow_cmd(const struct bpf_token *token, enum bpf_cmd cmd)
+{
+	/* BPF token can be used only within exactly the same userns in which
+	 * it was created
+	 */
+	if (!token || current_user_ns() != token->userns)
+		return false;
+
+	return token->allowed_cmds & (1ULL << cmd);
+}
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 7f24d898efbb..57487d0c0b73 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -847,6 +847,36 @@ union bpf_iter_link_info {
  *		Returns zero on success. On error, -1 is returned and *errno*
  *		is set appropriately.
  *
+ * BPF_TOKEN_CREATE
+ *	Description
+ *		Create BPF token with embedded information about what
+ *		BPF-related functionality it allows:
+ *		- a set of allowed bpf() syscall commands;
+ *		- a set of allowed BPF map types to be created with
+ *		BPF_MAP_CREATE command, if BPF_MAP_CREATE itself is allowed;
+ *		- a set of allowed BPF program types and BPF program attach
+ *		types to be loaded with BPF_PROG_LOAD command, if
+ *		BPF_PROG_LOAD itself is allowed.
+ *
+ *		BPF token is created (derived) from an instance of BPF FS,
+ *		assuming it has necessary delegation mount options specified.
+ *		This BPF token can be passed as an extra parameter to various
+ *		bpf() syscall commands to grant BPF subsystem functionality to
+ *		unprivileged processes.
+ *
+ *		When created, BPF token is "associated" with the owning
+ *		user namespace of BPF FS instance (super block) that it was
+ *		derived from, and subsequent BPF operations performed with
+ *		BPF token would be performing capabilities checks (i.e.,
+ *		CAP_BPF, CAP_PERFMON, CAP_NET_ADMIN, CAP_SYS_ADMIN) within
+ *		that user namespace. Without BPF token, such capabilities
+ *		have to be granted in init user namespace, making bpf()
+ *		syscall incompatible with user namespace, for the most part.
+ *
+ *	Return
+ *		A new file descriptor (a nonnegative integer), or -1 if an
+ *		error occurred (in which case, *errno* is set appropriately).
+ *
  * NOTES
  *	eBPF objects (maps and programs) can be shared between processes.
  *
@@ -901,6 +931,8 @@ enum bpf_cmd {
 	BPF_ITER_CREATE,
 	BPF_LINK_DETACH,
 	BPF_PROG_BIND_MAP,
+	BPF_TOKEN_CREATE,
+	__MAX_BPF_CMD,
 };
 
 enum bpf_map_type {
@@ -1714,6 +1746,11 @@ union bpf_attr {
 		__u32		flags;		/* extra flags */
 	} prog_bind_map;
 
+	struct { /* struct used by BPF_TOKEN_CREATE command */
+		__u32		flags;
+		__u32		bpffs_fd;
+	} token_create;
+
 } __attribute__((aligned(8)));
 
 /* The description below is an attempt at providing documentation to eBPF
-- 
2.34.1



* [PATCH bpf-next 04/29] bpf: add BPF token support to BPF_MAP_CREATE command
From: Andrii Nakryiko @ 2024-01-03 22:20 UTC
  To: bpf, netdev, paul, brauner, torvalds
  Cc: linux-fsdevel, linux-security-module, kernel-team

Allow providing a token_fd for the BPF_MAP_CREATE command to allow
controlled BPF map creation from an unprivileged process through
a delegated BPF token. A new BPF_F_TOKEN_FD flag is added and has to be
specified together with the BPF token FD for the BPF_MAP_CREATE command.

Wire through a set of allowed BPF map types to the BPF token, derived
from BPF FS at BPF token creation time. This, in combination with
allowed_cmds, allows creating a narrowly-focused BPF token (controlled by
a privileged agent) with a restrictive set of BPF maps that an
application can attempt to create.
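
For reference, a sketch of the same operation through libbpf's low-level
API (an illustration; it relies on the bpf_map_create() opts support
added later in this series, and assumes that opts field is named
token_fd):

  #include <bpf/bpf.h>

  int create_array_with_token(int token_fd)
  {
      LIBBPF_OPTS(bpf_map_create_opts, opts,
          .map_flags = BPF_F_TOKEN_FD,
          .token_fd = token_fd,
      );

      /* if the token doesn't delegate BPF_MAP_CREATE or the ARRAY map
       * type, the kernel ignores it and falls back to system-wide
       * capability checks
       */
      return bpf_map_create(BPF_MAP_TYPE_ARRAY, "token_array",
                            sizeof(int), sizeof(long), 16, &opts);
  }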

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 include/linux/bpf.h                           |  2 +
 include/uapi/linux/bpf.h                      |  7 +++
 kernel/bpf/inode.c                            |  3 +-
 kernel/bpf/syscall.c                          | 59 ++++++++++++++-----
 kernel/bpf/token.c                            | 16 +++++
 tools/include/uapi/linux/bpf.h                |  8 +++
 .../selftests/bpf/prog_tests/libbpf_probes.c  |  2 +
 .../selftests/bpf/prog_tests/libbpf_str.c     |  3 +
 8 files changed, 85 insertions(+), 15 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index e87fe928645f..ad51f3d9f3f7 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1629,6 +1629,7 @@ struct bpf_token {
 	atomic64_t refcnt;
 	struct user_namespace *userns;
 	u64 allowed_cmds;
+	u64 allowed_maps;
 };
 
 struct bpf_struct_ops_value;
@@ -2267,6 +2268,7 @@ int bpf_token_create(union bpf_attr *attr);
 struct bpf_token *bpf_token_get_from_fd(u32 ufd);
 
 bool bpf_token_allow_cmd(const struct bpf_token *token, enum bpf_cmd cmd);
+bool bpf_token_allow_map_type(const struct bpf_token *token, enum bpf_map_type type);
 
 int bpf_obj_pin_user(u32 ufd, int path_fd, const char __user *pathname);
 int bpf_obj_get_user(int path_fd, const char __user *pathname, int flags);
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index ab48e037d543..f493f14ce6ef 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -983,6 +983,7 @@ enum bpf_map_type {
 	BPF_MAP_TYPE_BLOOM_FILTER,
 	BPF_MAP_TYPE_USER_RINGBUF,
 	BPF_MAP_TYPE_CGRP_STORAGE,
+	__MAX_BPF_MAP_TYPE
 };
 
 /* Note that tracing related programs such as
@@ -1362,6 +1363,8 @@ enum {
 
 /* Get path from provided FD in BPF_OBJ_PIN/BPF_OBJ_GET commands */
 	BPF_F_PATH_FD		= (1U << 14),
+/* BPF token FD is passed in a corresponding command's token_fd field */
+	BPF_F_TOKEN_FD          = (1U << 15),
 };
 
 /* Flags for BPF_PROG_QUERY. */
@@ -1435,6 +1438,10 @@ union bpf_attr {
 		 * to using 5 hash functions).
 		 */
 		__u64	map_extra;
+		/* BPF token FD to use with BPF_MAP_CREATE operation.
+		 * If provided, map_flags should have BPF_F_TOKEN_FD flag set.
+		 */
+		__s32	map_token_fd;
 	};
 
 	struct { /* anonymous struct used by BPF_MAP_*_ELEM commands */
diff --git a/kernel/bpf/inode.c b/kernel/bpf/inode.c
index 565be1f3f1ea..034b7e4d8f19 100644
--- a/kernel/bpf/inode.c
+++ b/kernel/bpf/inode.c
@@ -620,7 +620,8 @@ static int bpf_show_options(struct seq_file *m, struct dentry *root)
 	else if (opts->delegate_cmds)
 		seq_printf(m, ",delegate_cmds=0x%llx", opts->delegate_cmds);
 
-	if (opts->delegate_maps == ~0ULL)
+	mask = (1ULL << __MAX_BPF_MAP_TYPE) - 1;
+	if ((opts->delegate_maps & mask) == mask)
 		seq_printf(m, ",delegate_maps=any");
 	else if (opts->delegate_maps)
 		seq_printf(m, ",delegate_maps=0x%llx", opts->delegate_maps);
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 423702f33ba6..65c9feafb02f 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -1011,8 +1011,8 @@ int map_check_no_btf(const struct bpf_map *map,
 	return -ENOTSUPP;
 }
 
-static int map_check_btf(struct bpf_map *map, const struct btf *btf,
-			 u32 btf_key_id, u32 btf_value_id)
+static int map_check_btf(struct bpf_map *map, struct bpf_token *token,
+			 const struct btf *btf, u32 btf_key_id, u32 btf_value_id)
 {
 	const struct btf_type *key_type, *value_type;
 	u32 key_size, value_size;
@@ -1040,7 +1040,7 @@ static int map_check_btf(struct bpf_map *map, const struct btf *btf,
 	if (!IS_ERR_OR_NULL(map->record)) {
 		int i;
 
-		if (!bpf_capable()) {
+		if (!bpf_token_capable(token, CAP_BPF)) {
 			ret = -EPERM;
 			goto free_map_tab;
 		}
@@ -1128,14 +1128,16 @@ static bool bpf_net_capable(void)
 	return capable(CAP_NET_ADMIN) || capable(CAP_SYS_ADMIN);
 }
 
-#define BPF_MAP_CREATE_LAST_FIELD map_extra
+#define BPF_MAP_CREATE_LAST_FIELD map_token_fd
 /* called via syscall */
 static int map_create(union bpf_attr *attr)
 {
 	const struct bpf_map_ops *ops;
+	struct bpf_token *token = NULL;
 	int numa_node = bpf_map_attr_numa_node(attr);
 	u32 map_type = attr->map_type;
 	struct bpf_map *map;
+	bool token_flag;
 	int f_flags;
 	int err;
 
@@ -1143,6 +1145,12 @@ static int map_create(union bpf_attr *attr)
 	if (err)
 		return -EINVAL;
 
+	/* check BPF_F_TOKEN_FD flag, remember if it's set, and then clear it
+	 * to avoid per-map type checks tripping on unknown flag
+	 */
+	token_flag = attr->map_flags & BPF_F_TOKEN_FD;
+	attr->map_flags &= ~BPF_F_TOKEN_FD;
+
 	if (attr->btf_vmlinux_value_type_id) {
 		if (attr->map_type != BPF_MAP_TYPE_STRUCT_OPS ||
 		    attr->btf_key_type_id || attr->btf_value_type_id)
@@ -1183,14 +1191,32 @@ static int map_create(union bpf_attr *attr)
 	if (!ops->map_mem_usage)
 		return -EINVAL;
 
+	if (token_flag) {
+		token = bpf_token_get_from_fd(attr->map_token_fd);
+		if (IS_ERR(token))
+			return PTR_ERR(token);
+
+		/* if current token doesn't grant map creation permissions,
+		 * then we can't use this token, so ignore it and rely on
+		 * system-wide capabilities checks
+		 */
+		if (!bpf_token_allow_cmd(token, BPF_MAP_CREATE) ||
+		    !bpf_token_allow_map_type(token, attr->map_type)) {
+			bpf_token_put(token);
+			token = NULL;
+		}
+	}
+
+	err = -EPERM;
+
 	/* Intent here is for unprivileged_bpf_disabled to block BPF map
 	 * creation for unprivileged users; other actions depend
 	 * on fd availability and access to bpffs, so are dependent on
 	 * object creation success. Even with unprivileged BPF disabled,
 	 * capability checks are still carried out.
 	 */
-	if (sysctl_unprivileged_bpf_disabled && !bpf_capable())
-		return -EPERM;
+	if (sysctl_unprivileged_bpf_disabled && !bpf_token_capable(token, CAP_BPF))
+		goto put_token;
 
 	/* check privileged map type permissions */
 	switch (map_type) {
@@ -1223,25 +1249,27 @@ static int map_create(union bpf_attr *attr)
 	case BPF_MAP_TYPE_LRU_PERCPU_HASH:
 	case BPF_MAP_TYPE_STRUCT_OPS:
 	case BPF_MAP_TYPE_CPUMAP:
-		if (!bpf_capable())
-			return -EPERM;
+		if (!bpf_token_capable(token, CAP_BPF))
+			goto put_token;
 		break;
 	case BPF_MAP_TYPE_SOCKMAP:
 	case BPF_MAP_TYPE_SOCKHASH:
 	case BPF_MAP_TYPE_DEVMAP:
 	case BPF_MAP_TYPE_DEVMAP_HASH:
 	case BPF_MAP_TYPE_XSKMAP:
-		if (!bpf_net_capable())
-			return -EPERM;
+		if (!bpf_token_capable(token, CAP_NET_ADMIN))
+			goto put_token;
 		break;
 	default:
 		WARN(1, "unsupported map type %d", map_type);
-		return -EPERM;
+		goto put_token;
 	}
 
 	map = ops->map_alloc(attr);
-	if (IS_ERR(map))
-		return PTR_ERR(map);
+	if (IS_ERR(map)) {
+		err = PTR_ERR(map);
+		goto put_token;
+	}
 	map->ops = ops;
 	map->map_type = map_type;
 
@@ -1278,7 +1306,7 @@ static int map_create(union bpf_attr *attr)
 		map->btf = btf;
 
 		if (attr->btf_value_type_id) {
-			err = map_check_btf(map, btf, attr->btf_key_type_id,
+			err = map_check_btf(map, token, btf, attr->btf_key_type_id,
 					    attr->btf_value_type_id);
 			if (err)
 				goto free_map;
@@ -1299,6 +1327,7 @@ static int map_create(union bpf_attr *attr)
 		goto free_map_sec;
 
 	bpf_map_save_memcg(map);
+	bpf_token_put(token);
 
 	err = bpf_map_new_fd(map, f_flags);
 	if (err < 0) {
@@ -1319,6 +1348,8 @@ static int map_create(union bpf_attr *attr)
 free_map:
 	btf_put(map->btf);
 	map->ops->map_free(map);
+put_token:
+	bpf_token_put(token);
 	return err;
 }
 
diff --git a/kernel/bpf/token.c b/kernel/bpf/token.c
index e18aaecc67e9..06c34dae658e 100644
--- a/kernel/bpf/token.c
+++ b/kernel/bpf/token.c
@@ -72,6 +72,13 @@ static void bpf_token_show_fdinfo(struct seq_file *m, struct file *filp)
 		seq_printf(m, "allowed_cmds:\tany\n");
 	else
 		seq_printf(m, "allowed_cmds:\t0x%llx\n", token->allowed_cmds);
+
+	BUILD_BUG_ON(__MAX_BPF_MAP_TYPE >= 64);
+	mask = (1ULL << __MAX_BPF_MAP_TYPE) - 1;
+	if ((token->allowed_maps & mask) == mask)
+		seq_printf(m, "allowed_maps:\tany\n");
+	else
+		seq_printf(m, "allowed_maps:\t0x%llx\n", token->allowed_maps);
 }
 
 #define BPF_TOKEN_INODE_NAME "bpf-token"
@@ -161,6 +168,7 @@ int bpf_token_create(union bpf_attr *attr)
 
 	mnt_opts = path.dentry->d_sb->s_fs_info;
 	token->allowed_cmds = mnt_opts->delegate_cmds;
+	token->allowed_maps = mnt_opts->delegate_maps;
 
 	fd = get_unused_fd_flags(O_CLOEXEC);
 	if (fd < 0) {
@@ -212,3 +220,11 @@ bool bpf_token_allow_cmd(const struct bpf_token *token, enum bpf_cmd cmd)
 
 	return token->allowed_cmds & (1ULL << cmd);
 }
+
+bool bpf_token_allow_map_type(const struct bpf_token *token, enum bpf_map_type type)
+{
+	if (!token || type >= __MAX_BPF_MAP_TYPE)
+		return false;
+
+	return token->allowed_maps & (1ULL << type);
+}
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 57487d0c0b73..f493f14ce6ef 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -983,6 +983,7 @@ enum bpf_map_type {
 	BPF_MAP_TYPE_BLOOM_FILTER,
 	BPF_MAP_TYPE_USER_RINGBUF,
 	BPF_MAP_TYPE_CGRP_STORAGE,
+	__MAX_BPF_MAP_TYPE
 };
 
 /* Note that tracing related programs such as
@@ -1362,6 +1363,8 @@ enum {
 
 /* Get path from provided FD in BPF_OBJ_PIN/BPF_OBJ_GET commands */
 	BPF_F_PATH_FD		= (1U << 14),
+/* BPF token FD is passed in a corresponding command's token_fd field */
+	BPF_F_TOKEN_FD          = (1U << 15),
 };
 
 /* Flags for BPF_PROG_QUERY. */
@@ -1435,6 +1438,10 @@ union bpf_attr {
 		 * to using 5 hash functions).
 		 */
 		__u64	map_extra;
+		/* BPF token FD to use with BPF_MAP_CREATE operation.
+		 * If provided, map_flags should have BPF_F_TOKEN_FD flag set.
+		 */
+		__s32	map_token_fd;
 	};
 
 	struct { /* anonymous struct used by BPF_MAP_*_ELEM commands */
@@ -6941,6 +6948,7 @@ enum {
 	BPF_TCP_LISTEN,
 	BPF_TCP_CLOSING,	/* Now a valid state */
 	BPF_TCP_NEW_SYN_RECV,
+	BPF_TCP_BOUND_INACTIVE,
 
 	BPF_TCP_MAX_STATES	/* Leave at the end! */
 };
diff --git a/tools/testing/selftests/bpf/prog_tests/libbpf_probes.c b/tools/testing/selftests/bpf/prog_tests/libbpf_probes.c
index 9f766ddd946a..573249a2814d 100644
--- a/tools/testing/selftests/bpf/prog_tests/libbpf_probes.c
+++ b/tools/testing/selftests/bpf/prog_tests/libbpf_probes.c
@@ -68,6 +68,8 @@ void test_libbpf_probe_map_types(void)
 
 		if (map_type == BPF_MAP_TYPE_UNSPEC)
 			continue;
+		if (strcmp(map_type_name, "__MAX_BPF_MAP_TYPE") == 0)
+			continue;
 
 		if (!test__start_subtest(map_type_name))
 			continue;
diff --git a/tools/testing/selftests/bpf/prog_tests/libbpf_str.c b/tools/testing/selftests/bpf/prog_tests/libbpf_str.c
index eb34d612d6f8..1f328c0d8aff 100644
--- a/tools/testing/selftests/bpf/prog_tests/libbpf_str.c
+++ b/tools/testing/selftests/bpf/prog_tests/libbpf_str.c
@@ -132,6 +132,9 @@ static void test_libbpf_bpf_map_type_str(void)
 		const char *map_type_str;
 		char buf[256];
 
+		if (map_type == __MAX_BPF_MAP_TYPE)
+			continue;
+
 		map_type_name = btf__str_by_offset(btf, e->name_off);
 		map_type_str = libbpf_bpf_map_type_str(map_type);
 		ASSERT_OK_PTR(map_type_str, map_type_name);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH bpf-next 05/29] bpf: add BPF token support to BPF_BTF_LOAD command
  2024-01-03 22:20 [PATCH bpf-next 00/29] BPF token Andrii Nakryiko
                   ` (3 preceding siblings ...)
  2024-01-03 22:20 ` [PATCH bpf-next 04/29] bpf: add BPF token support to BPF_MAP_CREATE command Andrii Nakryiko
@ 2024-01-03 22:20 ` Andrii Nakryiko
  2024-01-03 22:20 ` [PATCH bpf-next 06/29] bpf: add BPF token support to BPF_PROG_LOAD command Andrii Nakryiko
                   ` (24 subsequent siblings)
  29 siblings, 0 replies; 59+ messages in thread
From: Andrii Nakryiko @ 2024-01-03 22:20 UTC (permalink / raw)
  To: bpf, netdev, paul, brauner, torvalds
  Cc: linux-fsdevel, linux-security-module, kernel-team

Accept a BPF token FD in the BPF_BTF_LOAD command to allow BTF data
loading through a delegated BPF token. The BPF_F_TOKEN_FD flag has to be
specified when passing a BPF token FD. Given the BPF_BTF_LOAD command
didn't have a flags field before, we also add a btf_flags field.

BTF loading is a pretty straightforward operation, so as long as the
BPF token was created with allowed_cmds granting the BPF_BTF_LOAD
command, the kernel proceeds to parse the BTF data and create the BTF
object.
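
For illustration, a minimal userspace sketch of the resulting UAPI
(error handling elided; the wrapper name is made up, and the bpffs
mount and BPF_TOKEN_CREATE steps from earlier patches in the series are
assumed to have produced token_fd):

  #include <string.h>
  #include <unistd.h>
  #include <sys/syscall.h>
  #include <linux/bpf.h>

  static int btf_load_with_token(const void *btf_data, __u32 btf_size,
                                 int token_fd)
  {
          union bpf_attr attr;

          memset(&attr, 0, sizeof(attr));
          attr.btf = (__u64)(unsigned long)btf_data;
          attr.btf_size = btf_size;
          attr.btf_flags = BPF_F_TOKEN_FD; /* new flag */
          attr.btf_token_fd = token_fd;    /* new field */

          return syscall(__NR_bpf, BPF_BTF_LOAD, &attr, sizeof(attr));
  }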

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 include/uapi/linux/bpf.h       |  5 +++++
 kernel/bpf/syscall.c           | 23 +++++++++++++++++++++--
 tools/include/uapi/linux/bpf.h |  5 +++++
 3 files changed, 31 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index f493f14ce6ef..71fb04a3fe00 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -1623,6 +1623,11 @@ union bpf_attr {
 		 * truncated), or smaller (if log buffer wasn't filled completely).
 		 */
 		__u32		btf_log_true_size;
+		__u32		btf_flags;
+		/* BPF token FD to use with BPF_BTF_LOAD operation.
+		 * If provided, btf_flags should have BPF_F_TOKEN_FD flag set.
+		 */
+		__s32		btf_token_fd;
 	};
 
 	struct {
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 65c9feafb02f..59b8e754c42d 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -4795,15 +4795,34 @@ static int bpf_obj_get_info_by_fd(const union bpf_attr *attr,
 	return err;
 }
 
-#define BPF_BTF_LOAD_LAST_FIELD btf_log_true_size
+#define BPF_BTF_LOAD_LAST_FIELD btf_token_fd
 
 static int bpf_btf_load(const union bpf_attr *attr, bpfptr_t uattr, __u32 uattr_size)
 {
+	struct bpf_token *token = NULL;
+
 	if (CHECK_ATTR(BPF_BTF_LOAD))
 		return -EINVAL;
 
-	if (!bpf_capable())
+	if (attr->btf_flags & ~BPF_F_TOKEN_FD)
+		return -EINVAL;
+
+	if (attr->btf_flags & BPF_F_TOKEN_FD) {
+		token = bpf_token_get_from_fd(attr->btf_token_fd);
+		if (IS_ERR(token))
+			return PTR_ERR(token);
+		if (!bpf_token_allow_cmd(token, BPF_BTF_LOAD)) {
+			bpf_token_put(token);
+			token = NULL;
+		}
+	}
+
+	if (!bpf_token_capable(token, CAP_BPF)) {
+		bpf_token_put(token);
 		return -EPERM;
+	}
+
+	bpf_token_put(token);
 
 	return btf_new_fd(attr, uattr, uattr_size);
 }
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index f493f14ce6ef..71fb04a3fe00 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -1623,6 +1623,11 @@ union bpf_attr {
 		 * truncated), or smaller (if log buffer wasn't filled completely).
 		 */
 		__u32		btf_log_true_size;
+		__u32		btf_flags;
+		/* BPF token FD to use with BPF_BTF_LOAD operation.
+		 * If provided, btf_flags should have BPF_F_TOKEN_FD flag set.
+		 */
+		__s32		btf_token_fd;
 	};
 
 	struct {
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH bpf-next 06/29] bpf: add BPF token support to BPF_PROG_LOAD command
  2024-01-03 22:20 [PATCH bpf-next 00/29] BPF token Andrii Nakryiko
                   ` (4 preceding siblings ...)
  2024-01-03 22:20 ` [PATCH bpf-next 05/29] bpf: add BPF token support to BPF_BTF_LOAD command Andrii Nakryiko
@ 2024-01-03 22:20 ` Andrii Nakryiko
  2024-01-03 22:20 ` [PATCH bpf-next 07/29] bpf: take into account BPF token when fetching helper protos Andrii Nakryiko
                   ` (23 subsequent siblings)
  29 siblings, 0 replies; 59+ messages in thread
From: Andrii Nakryiko @ 2024-01-03 22:20 UTC (permalink / raw)
  To: bpf, netdev, paul, brauner, torvalds
  Cc: linux-fsdevel, linux-security-module, kernel-team

Add basic support for BPF token to BPF_PROG_LOAD. The BPF_F_TOKEN_FD
flag should be set in the prog_flags field when providing prog_token_fd.

Wire through a set of allowed BPF program types and attach types,
derived from BPF FS at BPF token creation time. Then make sure we
perform bpf_token_capable() checks everywhere it's relevant.
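
Correspondingly, a hypothetical raw-syscall sketch of loading a program
with a token (the wrapper name and the XDP program type are just for
illustration; error handling elided):

  #include <string.h>
  #include <unistd.h>
  #include <sys/syscall.h>
  #include <linux/bpf.h>

  static int prog_load_with_token(const struct bpf_insn *insns,
                                  __u32 insn_cnt, int token_fd)
  {
          union bpf_attr attr;

          memset(&attr, 0, sizeof(attr));
          attr.prog_type = BPF_PROG_TYPE_XDP;
          attr.insns = (__u64)(unsigned long)insns;
          attr.insn_cnt = insn_cnt;
          attr.license = (__u64)(unsigned long)"GPL";
          attr.prog_flags = BPF_F_TOKEN_FD; /* new flag */
          attr.prog_token_fd = token_fd;    /* new field */

          return syscall(__NR_bpf, BPF_PROG_LOAD, &attr, sizeof(attr));
  }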

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 include/linux/bpf.h                           |  6 ++
 include/uapi/linux/bpf.h                      |  5 ++
 kernel/bpf/core.c                             |  1 +
 kernel/bpf/inode.c                            |  6 +-
 kernel/bpf/syscall.c                          | 90 +++++++++++++------
 kernel/bpf/token.c                            | 27 ++++++
 tools/include/uapi/linux/bpf.h                |  5 ++
 .../selftests/bpf/prog_tests/libbpf_probes.c  |  2 +
 .../selftests/bpf/prog_tests/libbpf_str.c     |  3 +
 9 files changed, 118 insertions(+), 27 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index ad51f3d9f3f7..4bcdb01c6619 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1488,6 +1488,7 @@ struct bpf_prog_aux {
 #ifdef CONFIG_SECURITY
 	void *security;
 #endif
+	struct bpf_token *token;
 	struct bpf_prog_offload *offload;
 	struct btf *btf;
 	struct bpf_func_info *func_info;
@@ -1630,6 +1631,8 @@ struct bpf_token {
 	struct user_namespace *userns;
 	u64 allowed_cmds;
 	u64 allowed_maps;
+	u64 allowed_progs;
+	u64 allowed_attachs;
 };
 
 struct bpf_struct_ops_value;
@@ -2269,6 +2272,9 @@ struct bpf_token *bpf_token_get_from_fd(u32 ufd);
 
 bool bpf_token_allow_cmd(const struct bpf_token *token, enum bpf_cmd cmd);
 bool bpf_token_allow_map_type(const struct bpf_token *token, enum bpf_map_type type);
+bool bpf_token_allow_prog_type(const struct bpf_token *token,
+			       enum bpf_prog_type prog_type,
+			       enum bpf_attach_type attach_type);
 
 int bpf_obj_pin_user(u32 ufd, int path_fd, const char __user *pathname);
 int bpf_obj_get_user(int path_fd, const char __user *pathname, int flags);
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 71fb04a3fe00..3eaf6c00f624 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -1028,6 +1028,7 @@ enum bpf_prog_type {
 	BPF_PROG_TYPE_SK_LOOKUP,
 	BPF_PROG_TYPE_SYSCALL, /* a program that can execute syscalls */
 	BPF_PROG_TYPE_NETFILTER,
+	__MAX_BPF_PROG_TYPE
 };
 
 enum bpf_attach_type {
@@ -1511,6 +1512,10 @@ union bpf_attr {
 		 * truncated), or smaller (if log buffer wasn't filled completely).
 		 */
 		__u32		log_true_size;
+		/* BPF token FD to use with BPF_PROG_LOAD operation.
+		 * If provided, prog_flags should have BPF_F_TOKEN_FD flag set.
+		 */
+		__s32		prog_token_fd;
 	};
 
 	struct { /* anonymous struct used by BPF_OBJ_* commands */
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index ea6843be2616..62e21ba90230 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -2779,6 +2779,7 @@ void bpf_prog_free(struct bpf_prog *fp)
 
 	if (aux->dst_prog)
 		bpf_prog_put(aux->dst_prog);
+	bpf_token_put(aux->token);
 	INIT_WORK(&aux->work, bpf_prog_free_deferred);
 	schedule_work(&aux->work);
 }
diff --git a/kernel/bpf/inode.c b/kernel/bpf/inode.c
index 034b7e4d8f19..5fb10da5717f 100644
--- a/kernel/bpf/inode.c
+++ b/kernel/bpf/inode.c
@@ -626,12 +626,14 @@ static int bpf_show_options(struct seq_file *m, struct dentry *root)
 	else if (opts->delegate_maps)
 		seq_printf(m, ",delegate_maps=0x%llx", opts->delegate_maps);
 
-	if (opts->delegate_progs == ~0ULL)
+	mask = (1ULL << __MAX_BPF_PROG_TYPE) - 1;
+	if ((opts->delegate_progs & mask) == mask)
 		seq_printf(m, ",delegate_progs=any");
 	else if (opts->delegate_progs)
 		seq_printf(m, ",delegate_progs=0x%llx", opts->delegate_progs);
 
-	if (opts->delegate_attachs == ~0ULL)
+	mask = (1ULL << __MAX_BPF_ATTACH_TYPE) - 1;
+	if ((opts->delegate_attachs & mask) == mask)
 		seq_printf(m, ",delegate_attachs=any");
 	else if (opts->delegate_attachs)
 		seq_printf(m, ",delegate_attachs=0x%llx", opts->delegate_attachs);
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 59b8e754c42d..91b2a2dc4fb0 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -2626,13 +2626,15 @@ static bool is_perfmon_prog_type(enum bpf_prog_type prog_type)
 }
 
 /* last field in 'union bpf_attr' used by this command */
-#define	BPF_PROG_LOAD_LAST_FIELD log_true_size
+#define BPF_PROG_LOAD_LAST_FIELD prog_token_fd
 
 static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr, u32 uattr_size)
 {
 	enum bpf_prog_type type = attr->prog_type;
 	struct bpf_prog *prog, *dst_prog = NULL;
 	struct btf *attach_btf = NULL;
+	struct bpf_token *token = NULL;
+	bool bpf_cap;
 	int err;
 	char license[128];
 
@@ -2646,13 +2648,35 @@ static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr, u32 uattr_size)
 				 BPF_F_TEST_RND_HI32 |
 				 BPF_F_XDP_HAS_FRAGS |
 				 BPF_F_XDP_DEV_BOUND_ONLY |
-				 BPF_F_TEST_REG_INVARIANTS))
+				 BPF_F_TEST_REG_INVARIANTS |
+				 BPF_F_TOKEN_FD))
 		return -EINVAL;
 
+	bpf_prog_load_fixup_attach_type(attr);
+
+	if (attr->prog_flags & BPF_F_TOKEN_FD) {
+		token = bpf_token_get_from_fd(attr->prog_token_fd);
+		if (IS_ERR(token))
+			return PTR_ERR(token);
+		/* if current token doesn't grant prog loading permissions,
+		 * then we can't use this token, so ignore it and rely on
+		 * system-wide capabilities checks
+		 */
+		if (!bpf_token_allow_cmd(token, BPF_PROG_LOAD) ||
+		    !bpf_token_allow_prog_type(token, attr->prog_type,
+					       attr->expected_attach_type)) {
+			bpf_token_put(token);
+			token = NULL;
+		}
+	}
+
+	bpf_cap = bpf_token_capable(token, CAP_BPF);
+	err = -EPERM;
+
 	if (!IS_ENABLED(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) &&
 	    (attr->prog_flags & BPF_F_ANY_ALIGNMENT) &&
-	    !bpf_capable())
-		return -EPERM;
+	    !bpf_cap)
+		goto put_token;
 
 	/* Intent here is for unprivileged_bpf_disabled to block BPF program
 	 * creation for unprivileged users; other actions depend
@@ -2661,21 +2685,23 @@ static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr, u32 uattr_size)
 	 * capability checks are still carried out for these
 	 * and other operations.
 	 */
-	if (sysctl_unprivileged_bpf_disabled && !bpf_capable())
-		return -EPERM;
+	if (sysctl_unprivileged_bpf_disabled && !bpf_cap)
+		goto put_token;
 
 	if (attr->insn_cnt == 0 ||
-	    attr->insn_cnt > (bpf_capable() ? BPF_COMPLEXITY_LIMIT_INSNS : BPF_MAXINSNS))
-		return -E2BIG;
+	    attr->insn_cnt > (bpf_cap ? BPF_COMPLEXITY_LIMIT_INSNS : BPF_MAXINSNS)) {
+		err = -E2BIG;
+		goto put_token;
+	}
 	if (type != BPF_PROG_TYPE_SOCKET_FILTER &&
 	    type != BPF_PROG_TYPE_CGROUP_SKB &&
-	    !bpf_capable())
-		return -EPERM;
+	    !bpf_cap)
+		goto put_token;
 
-	if (is_net_admin_prog_type(type) && !bpf_net_capable())
-		return -EPERM;
-	if (is_perfmon_prog_type(type) && !perfmon_capable())
-		return -EPERM;
+	if (is_net_admin_prog_type(type) && !bpf_token_capable(token, CAP_NET_ADMIN))
+		goto put_token;
+	if (is_perfmon_prog_type(type) && !bpf_token_capable(token, CAP_PERFMON))
+		goto put_token;
 
 	/* attach_prog_fd/attach_btf_obj_fd can specify fd of either bpf_prog
 	 * or btf, we need to check which one it is
@@ -2685,27 +2711,33 @@ static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr, u32 uattr_size)
 		if (IS_ERR(dst_prog)) {
 			dst_prog = NULL;
 			attach_btf = btf_get_by_fd(attr->attach_btf_obj_fd);
-			if (IS_ERR(attach_btf))
-				return -EINVAL;
+			if (IS_ERR(attach_btf)) {
+				err = -EINVAL;
+				goto put_token;
+			}
 			if (!btf_is_kernel(attach_btf)) {
 				/* attaching through specifying bpf_prog's BTF
 				 * objects directly might be supported eventually
 				 */
 				btf_put(attach_btf);
-				return -ENOTSUPP;
+				err = -ENOTSUPP;
+				goto put_token;
 			}
 		}
 	} else if (attr->attach_btf_id) {
 		/* fall back to vmlinux BTF, if BTF type ID is specified */
 		attach_btf = bpf_get_btf_vmlinux();
-		if (IS_ERR(attach_btf))
-			return PTR_ERR(attach_btf);
-		if (!attach_btf)
-			return -EINVAL;
+		if (IS_ERR(attach_btf)) {
+			err = PTR_ERR(attach_btf);
+			goto put_token;
+		}
+		if (!attach_btf) {
+			err = -EINVAL;
+			goto put_token;
+		}
 		btf_get(attach_btf);
 	}
 
-	bpf_prog_load_fixup_attach_type(attr);
 	if (bpf_prog_load_check_attach(type, attr->expected_attach_type,
 				       attach_btf, attr->attach_btf_id,
 				       dst_prog)) {
@@ -2713,7 +2745,8 @@ static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr, u32 uattr_size)
 			bpf_prog_put(dst_prog);
 		if (attach_btf)
 			btf_put(attach_btf);
-		return -EINVAL;
+		err = -EINVAL;
+		goto put_token;
 	}
 
 	/* plain bpf_prog allocation */
@@ -2723,7 +2756,8 @@ static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr, u32 uattr_size)
 			bpf_prog_put(dst_prog);
 		if (attach_btf)
 			btf_put(attach_btf);
-		return -ENOMEM;
+		err = -ENOMEM;
+		goto put_token;
 	}
 
 	prog->expected_attach_type = attr->expected_attach_type;
@@ -2734,6 +2768,10 @@ static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr, u32 uattr_size)
 	prog->aux->sleepable = attr->prog_flags & BPF_F_SLEEPABLE;
 	prog->aux->xdp_has_frags = attr->prog_flags & BPF_F_XDP_HAS_FRAGS;
 
+	/* move token into prog->aux, reuse taken refcnt */
+	prog->aux->token = token;
+	token = NULL;
+
 	err = security_bpf_prog_alloc(prog->aux);
 	if (err)
 		goto free_prog;
@@ -2835,6 +2873,8 @@ static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr, u32 uattr_size)
 	if (prog->aux->attach_btf)
 		btf_put(prog->aux->attach_btf);
 	bpf_prog_free(prog);
+put_token:
+	bpf_token_put(token);
 	return err;
 }
 
@@ -3824,7 +3864,7 @@ static int bpf_prog_attach_check_attach_type(const struct bpf_prog *prog,
 	case BPF_PROG_TYPE_SK_LOOKUP:
 		return attach_type == prog->expected_attach_type ? 0 : -EINVAL;
 	case BPF_PROG_TYPE_CGROUP_SKB:
-		if (!bpf_net_capable())
+		if (!bpf_token_capable(prog->aux->token, CAP_NET_ADMIN))
 			/* cg-skb progs can be loaded by unpriv user.
 			 * check permissions at attach time.
 			 */
diff --git a/kernel/bpf/token.c b/kernel/bpf/token.c
index 06c34dae658e..5a51e6b8f6bf 100644
--- a/kernel/bpf/token.c
+++ b/kernel/bpf/token.c
@@ -79,6 +79,20 @@ static void bpf_token_show_fdinfo(struct seq_file *m, struct file *filp)
 		seq_printf(m, "allowed_maps:\tany\n");
 	else
 		seq_printf(m, "allowed_maps:\t0x%llx\n", token->allowed_maps);
+
+	BUILD_BUG_ON(__MAX_BPF_PROG_TYPE >= 64);
+	mask = (1ULL << __MAX_BPF_PROG_TYPE) - 1;
+	if ((token->allowed_progs & mask) == mask)
+		seq_printf(m, "allowed_progs:\tany\n");
+	else
+		seq_printf(m, "allowed_progs:\t0x%llx\n", token->allowed_progs);
+
+	BUILD_BUG_ON(__MAX_BPF_ATTACH_TYPE >= 64);
+	mask = (1ULL << __MAX_BPF_ATTACH_TYPE) - 1;
+	if ((token->allowed_attachs & mask) == mask)
+		seq_printf(m, "allowed_attachs:\tany\n");
+	else
+		seq_printf(m, "allowed_attachs:\t0x%llx\n", token->allowed_attachs);
 }
 
 #define BPF_TOKEN_INODE_NAME "bpf-token"
@@ -169,6 +183,8 @@ int bpf_token_create(union bpf_attr *attr)
 	mnt_opts = path.dentry->d_sb->s_fs_info;
 	token->allowed_cmds = mnt_opts->delegate_cmds;
 	token->allowed_maps = mnt_opts->delegate_maps;
+	token->allowed_progs = mnt_opts->delegate_progs;
+	token->allowed_attachs = mnt_opts->delegate_attachs;
 
 	fd = get_unused_fd_flags(O_CLOEXEC);
 	if (fd < 0) {
@@ -228,3 +244,14 @@ bool bpf_token_allow_map_type(const struct bpf_token *token, enum bpf_map_type t
 
 	return token->allowed_maps & (1ULL << type);
 }
+
+bool bpf_token_allow_prog_type(const struct bpf_token *token,
+			       enum bpf_prog_type prog_type,
+			       enum bpf_attach_type attach_type)
+{
+	if (!token || prog_type >= __MAX_BPF_PROG_TYPE || attach_type >= __MAX_BPF_ATTACH_TYPE)
+		return false;
+
+	return (token->allowed_progs & (1ULL << prog_type)) &&
+	       (token->allowed_attachs & (1ULL << attach_type));
+}
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 71fb04a3fe00..3eaf6c00f624 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -1028,6 +1028,7 @@ enum bpf_prog_type {
 	BPF_PROG_TYPE_SK_LOOKUP,
 	BPF_PROG_TYPE_SYSCALL, /* a program that can execute syscalls */
 	BPF_PROG_TYPE_NETFILTER,
+	__MAX_BPF_PROG_TYPE
 };
 
 enum bpf_attach_type {
@@ -1511,6 +1512,10 @@ union bpf_attr {
 		 * truncated), or smaller (if log buffer wasn't filled completely).
 		 */
 		__u32		log_true_size;
+		/* BPF token FD to use with BPF_PROG_LOAD operation.
+		 * If provided, prog_flags should have BPF_F_TOKEN_FD flag set.
+		 */
+		__s32		prog_token_fd;
 	};
 
 	struct { /* anonymous struct used by BPF_OBJ_* commands */
diff --git a/tools/testing/selftests/bpf/prog_tests/libbpf_probes.c b/tools/testing/selftests/bpf/prog_tests/libbpf_probes.c
index 573249a2814d..4ed46ed58a7b 100644
--- a/tools/testing/selftests/bpf/prog_tests/libbpf_probes.c
+++ b/tools/testing/selftests/bpf/prog_tests/libbpf_probes.c
@@ -30,6 +30,8 @@ void test_libbpf_probe_prog_types(void)
 
 		if (prog_type == BPF_PROG_TYPE_UNSPEC)
 			continue;
+		if (strcmp(prog_type_name, "__MAX_BPF_PROG_TYPE") == 0)
+			continue;
 
 		if (!test__start_subtest(prog_type_name))
 			continue;
diff --git a/tools/testing/selftests/bpf/prog_tests/libbpf_str.c b/tools/testing/selftests/bpf/prog_tests/libbpf_str.c
index 1f328c0d8aff..62ea855ec4d0 100644
--- a/tools/testing/selftests/bpf/prog_tests/libbpf_str.c
+++ b/tools/testing/selftests/bpf/prog_tests/libbpf_str.c
@@ -189,6 +189,9 @@ static void test_libbpf_bpf_prog_type_str(void)
 		const char *prog_type_str;
 		char buf[256];
 
+		if (prog_type == __MAX_BPF_PROG_TYPE)
+			continue;
+
 		prog_type_name = btf__str_by_offset(btf, e->name_off);
 		prog_type_str = libbpf_bpf_prog_type_str(prog_type);
 		ASSERT_OK_PTR(prog_type_str, prog_type_name);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH bpf-next 07/29] bpf: take into account BPF token when fetching helper protos
  2024-01-03 22:20 [PATCH bpf-next 00/29] BPF token Andrii Nakryiko
                   ` (5 preceding siblings ...)
  2024-01-03 22:20 ` [PATCH bpf-next 06/29] bpf: add BPF token support to BPF_PROG_LOAD command Andrii Nakryiko
@ 2024-01-03 22:20 ` Andrii Nakryiko
  2024-01-03 22:20 ` [PATCH bpf-next 08/29] bpf: consistently use BPF token throughout BPF verifier logic Andrii Nakryiko
                   ` (22 subsequent siblings)
  29 siblings, 0 replies; 59+ messages in thread
From: Andrii Nakryiko @ 2024-01-03 22:20 UTC (permalink / raw)
  To: bpf, netdev, paul, brauner, torvalds
  Cc: linux-fsdevel, linux-security-module, kernel-team

Instead of performing unconditional system-wide bpf_capable() and
perfmon_capable() calls inside the bpf_base_func_proto() function (and
other similar ones) to determine eligibility of a given BPF helper for
a given program, use the BPF token recorded during BPF_PROG_LOAD command
handling to inform the decision.
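
The resulting pattern for program-type-specific func_proto callbacks is
schematically as follows (an illustrative example modeled on the
bpf-lirc.c hunk below, not an actual hunk from this patch):

  static const struct bpf_func_proto *
  example_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
  {
          switch (func_id) {
          case BPF_FUNC_map_lookup_elem:
                  return &bpf_map_lookup_elem_proto;
          case BPF_FUNC_trace_printk:
                  /* gate privileged helpers on the prog's token, if any */
                  if (bpf_token_capable(prog->aux->token, CAP_PERFMON))
                          return bpf_get_trace_printk_proto();
                  return NULL;
          default:
                  /* common helpers, now also token-aware */
                  return bpf_base_func_proto(func_id, prog);
          }
  }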

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 drivers/media/rc/bpf-lirc.c |  2 +-
 include/linux/bpf.h         |  5 +++--
 kernel/bpf/cgroup.c         |  6 +++---
 kernel/bpf/helpers.c        |  6 +++---
 kernel/bpf/syscall.c        |  5 +++--
 kernel/trace/bpf_trace.c    |  2 +-
 net/core/filter.c           | 32 ++++++++++++++++----------------
 net/ipv4/bpf_tcp_ca.c       |  2 +-
 net/netfilter/nf_bpf_link.c |  2 +-
 9 files changed, 32 insertions(+), 30 deletions(-)

diff --git a/drivers/media/rc/bpf-lirc.c b/drivers/media/rc/bpf-lirc.c
index fe17c7f98e81..6d07693c6b9f 100644
--- a/drivers/media/rc/bpf-lirc.c
+++ b/drivers/media/rc/bpf-lirc.c
@@ -110,7 +110,7 @@ lirc_mode2_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 	case BPF_FUNC_get_prandom_u32:
 		return &bpf_get_prandom_u32_proto;
 	case BPF_FUNC_trace_printk:
-		if (perfmon_capable())
+		if (bpf_token_capable(prog->aux->token, CAP_PERFMON))
 			return bpf_get_trace_printk_proto();
 		fallthrough;
 	default:
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 4bcdb01c6619..f7c5aa01bb7b 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -2518,7 +2518,8 @@ const char *btf_find_decl_tag_value(const struct btf *btf, const struct btf_type
 struct bpf_prog *bpf_prog_by_id(u32 id);
 struct bpf_link *bpf_link_by_id(u32 id);
 
-const struct bpf_func_proto *bpf_base_func_proto(enum bpf_func_id func_id);
+const struct bpf_func_proto *bpf_base_func_proto(enum bpf_func_id func_id,
+						 const struct bpf_prog *prog);
 void bpf_task_storage_free(struct task_struct *task);
 void bpf_cgrp_storage_free(struct cgroup *cgroup);
 bool bpf_prog_has_kfunc_call(const struct bpf_prog *prog);
@@ -2778,7 +2779,7 @@ static inline int btf_struct_access(struct bpf_verifier_log *log,
 }
 
 static inline const struct bpf_func_proto *
-bpf_base_func_proto(enum bpf_func_id func_id)
+bpf_base_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 {
 	return NULL;
 }
diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index 491d20038cbe..98e0e3835b28 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -1630,7 +1630,7 @@ cgroup_dev_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 	case BPF_FUNC_perf_event_output:
 		return &bpf_event_output_data_proto;
 	default:
-		return bpf_base_func_proto(func_id);
+		return bpf_base_func_proto(func_id, prog);
 	}
 }
 
@@ -2191,7 +2191,7 @@ sysctl_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 	case BPF_FUNC_perf_event_output:
 		return &bpf_event_output_data_proto;
 	default:
-		return bpf_base_func_proto(func_id);
+		return bpf_base_func_proto(func_id, prog);
 	}
 }
 
@@ -2348,7 +2348,7 @@ cg_sockopt_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 	case BPF_FUNC_perf_event_output:
 		return &bpf_event_output_data_proto;
 	default:
-		return bpf_base_func_proto(func_id);
+		return bpf_base_func_proto(func_id, prog);
 	}
 }
 
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index be72824f32b2..07fd4b5704f3 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -1679,7 +1679,7 @@ const struct bpf_func_proto bpf_probe_read_kernel_str_proto __weak;
 const struct bpf_func_proto bpf_task_pt_regs_proto __weak;
 
 const struct bpf_func_proto *
-bpf_base_func_proto(enum bpf_func_id func_id)
+bpf_base_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 {
 	switch (func_id) {
 	case BPF_FUNC_map_lookup_elem:
@@ -1730,7 +1730,7 @@ bpf_base_func_proto(enum bpf_func_id func_id)
 		break;
 	}
 
-	if (!bpf_capable())
+	if (!bpf_token_capable(prog->aux->token, CAP_BPF))
 		return NULL;
 
 	switch (func_id) {
@@ -1788,7 +1788,7 @@ bpf_base_func_proto(enum bpf_func_id func_id)
 		break;
 	}
 
-	if (!perfmon_capable())
+	if (!bpf_token_capable(prog->aux->token, CAP_PERFMON))
 		return NULL;
 
 	switch (func_id) {
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 91b2a2dc4fb0..a236a2cb7ac1 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -5736,7 +5736,7 @@ static const struct bpf_func_proto bpf_sys_bpf_proto = {
 const struct bpf_func_proto * __weak
 tracing_prog_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 {
-	return bpf_base_func_proto(func_id);
+	return bpf_base_func_proto(func_id, prog);
 }
 
 BPF_CALL_1(bpf_sys_close, u32, fd)
@@ -5786,7 +5786,8 @@ syscall_prog_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 {
 	switch (func_id) {
 	case BPF_FUNC_sys_bpf:
-		return !perfmon_capable() ? NULL : &bpf_sys_bpf_proto;
+		return !bpf_token_capable(prog->aux->token, CAP_PERFMON)
+		       ? NULL : &bpf_sys_bpf_proto;
 	case BPF_FUNC_btf_find_by_name_kind:
 		return &bpf_btf_find_by_name_kind_proto;
 	case BPF_FUNC_sys_close:
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 7ac6c52b25eb..492d60e9c480 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -1629,7 +1629,7 @@ bpf_tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 	case BPF_FUNC_trace_vprintk:
 		return bpf_get_trace_vprintk_proto();
 	default:
-		return bpf_base_func_proto(func_id);
+		return bpf_base_func_proto(func_id, prog);
 	}
 }
 
diff --git a/net/core/filter.c b/net/core/filter.c
index 24061f29c9dd..46ab1d9378dd 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -87,7 +87,7 @@
 #include "dev.h"
 
 static const struct bpf_func_proto *
-bpf_sk_base_func_proto(enum bpf_func_id func_id);
+bpf_sk_base_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog);
 
 int copy_bpf_fprog_from_user(struct sock_fprog *dst, sockptr_t src, int len)
 {
@@ -7862,7 +7862,7 @@ sock_filter_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 	case BPF_FUNC_ktime_get_coarse_ns:
 		return &bpf_ktime_get_coarse_ns_proto;
 	default:
-		return bpf_base_func_proto(func_id);
+		return bpf_base_func_proto(func_id, prog);
 	}
 }
 
@@ -7955,7 +7955,7 @@ sock_addr_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 			return NULL;
 		}
 	default:
-		return bpf_sk_base_func_proto(func_id);
+		return bpf_sk_base_func_proto(func_id, prog);
 	}
 }
 
@@ -7974,7 +7974,7 @@ sk_filter_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 	case BPF_FUNC_perf_event_output:
 		return &bpf_skb_event_output_proto;
 	default:
-		return bpf_sk_base_func_proto(func_id);
+		return bpf_sk_base_func_proto(func_id, prog);
 	}
 }
 
@@ -8161,7 +8161,7 @@ tc_cls_act_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 #endif
 #endif
 	default:
-		return bpf_sk_base_func_proto(func_id);
+		return bpf_sk_base_func_proto(func_id, prog);
 	}
 }
 
@@ -8220,7 +8220,7 @@ xdp_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 #endif
 #endif
 	default:
-		return bpf_sk_base_func_proto(func_id);
+		return bpf_sk_base_func_proto(func_id, prog);
 	}
 
 #if IS_MODULE(CONFIG_NF_CONNTRACK) && IS_ENABLED(CONFIG_DEBUG_INFO_BTF_MODULES)
@@ -8281,7 +8281,7 @@ sock_ops_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 		return &bpf_tcp_sock_proto;
 #endif /* CONFIG_INET */
 	default:
-		return bpf_sk_base_func_proto(func_id);
+		return bpf_sk_base_func_proto(func_id, prog);
 	}
 }
 
@@ -8323,7 +8323,7 @@ sk_msg_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 		return &bpf_get_cgroup_classid_curr_proto;
 #endif
 	default:
-		return bpf_sk_base_func_proto(func_id);
+		return bpf_sk_base_func_proto(func_id, prog);
 	}
 }
 
@@ -8367,7 +8367,7 @@ sk_skb_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 		return &bpf_skc_lookup_tcp_proto;
 #endif
 	default:
-		return bpf_sk_base_func_proto(func_id);
+		return bpf_sk_base_func_proto(func_id, prog);
 	}
 }
 
@@ -8378,7 +8378,7 @@ flow_dissector_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 	case BPF_FUNC_skb_load_bytes:
 		return &bpf_flow_dissector_load_bytes_proto;
 	default:
-		return bpf_sk_base_func_proto(func_id);
+		return bpf_sk_base_func_proto(func_id, prog);
 	}
 }
 
@@ -8405,7 +8405,7 @@ lwt_out_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 	case BPF_FUNC_skb_under_cgroup:
 		return &bpf_skb_under_cgroup_proto;
 	default:
-		return bpf_sk_base_func_proto(func_id);
+		return bpf_sk_base_func_proto(func_id, prog);
 	}
 }
 
@@ -11236,7 +11236,7 @@ sk_reuseport_func_proto(enum bpf_func_id func_id,
 	case BPF_FUNC_ktime_get_coarse_ns:
 		return &bpf_ktime_get_coarse_ns_proto;
 	default:
-		return bpf_base_func_proto(func_id);
+		return bpf_base_func_proto(func_id, prog);
 	}
 }
 
@@ -11418,7 +11418,7 @@ sk_lookup_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 	case BPF_FUNC_sk_release:
 		return &bpf_sk_release_proto;
 	default:
-		return bpf_sk_base_func_proto(func_id);
+		return bpf_sk_base_func_proto(func_id, prog);
 	}
 }
 
@@ -11752,7 +11752,7 @@ const struct bpf_func_proto bpf_sock_from_file_proto = {
 };
 
 static const struct bpf_func_proto *
-bpf_sk_base_func_proto(enum bpf_func_id func_id)
+bpf_sk_base_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 {
 	const struct bpf_func_proto *func;
 
@@ -11781,10 +11781,10 @@ bpf_sk_base_func_proto(enum bpf_func_id func_id)
 	case BPF_FUNC_ktime_get_coarse_ns:
 		return &bpf_ktime_get_coarse_ns_proto;
 	default:
-		return bpf_base_func_proto(func_id);
+		return bpf_base_func_proto(func_id, prog);
 	}
 
-	if (!perfmon_capable())
+	if (!bpf_token_capable(prog->aux->token, CAP_PERFMON))
 		return NULL;
 
 	return func;
diff --git a/net/ipv4/bpf_tcp_ca.c b/net/ipv4/bpf_tcp_ca.c
index ae8b15e6896f..634cfafa583d 100644
--- a/net/ipv4/bpf_tcp_ca.c
+++ b/net/ipv4/bpf_tcp_ca.c
@@ -191,7 +191,7 @@ bpf_tcp_ca_get_func_proto(enum bpf_func_id func_id,
 	case BPF_FUNC_ktime_get_coarse_ns:
 		return &bpf_ktime_get_coarse_ns_proto;
 	default:
-		return bpf_base_func_proto(func_id);
+		return bpf_base_func_proto(func_id, prog);
 	}
 }
 
diff --git a/net/netfilter/nf_bpf_link.c b/net/netfilter/nf_bpf_link.c
index 0e4beae421f8..5257d5e7eb09 100644
--- a/net/netfilter/nf_bpf_link.c
+++ b/net/netfilter/nf_bpf_link.c
@@ -314,7 +314,7 @@ static bool nf_is_valid_access(int off, int size, enum bpf_access_type type,
 static const struct bpf_func_proto *
 bpf_nf_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 {
-	return bpf_base_func_proto(func_id);
+	return bpf_base_func_proto(func_id, prog);
 }
 
 const struct bpf_verifier_ops netfilter_verifier_ops = {
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH bpf-next 08/29] bpf: consistently use BPF token throughout BPF verifier logic
  2024-01-03 22:20 [PATCH bpf-next 00/29] BPF token Andrii Nakryiko
                   ` (6 preceding siblings ...)
  2024-01-03 22:20 ` [PATCH bpf-next 07/29] bpf: take into account BPF token when fetching helper protos Andrii Nakryiko
@ 2024-01-03 22:20 ` Andrii Nakryiko
  2024-01-03 22:20 ` [PATCH bpf-next 09/29] bpf,lsm: refactor bpf_prog_alloc/bpf_prog_free LSM hooks Andrii Nakryiko
                   ` (21 subsequent siblings)
  29 siblings, 0 replies; 59+ messages in thread
From: Andrii Nakryiko @ 2024-01-03 22:20 UTC (permalink / raw)
  To: bpf, netdev, paul, brauner, torvalds
  Cc: linux-fsdevel, linux-security-module, kernel-team

Remove the remaining direct queries to perfmon_capable() and
bpf_capable() in BPF verifier logic and instead use the BPF token (if
available) to make decisions about privileges.
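
For context, the semantics of bpf_token_capable() are roughly as
sketched below. This is a simplification for illustration only; the
actual implementation was introduced earlier in the series and also
involves the LSM hook added in patch #11:

  /* rough sketch, not the real implementation */
  static bool bpf_token_capable_sketch(const struct bpf_token *token, int cap)
  {
          /* a token upgrades the check from init-userns capable() to
           * ns_capable() in the token's owning user namespace
           */
          if (token && ns_capable(token->userns, cap))
                  return true;
          return capable(cap);
  }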

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 include/linux/bpf.h    | 16 ++++++++--------
 include/linux/filter.h |  2 +-
 kernel/bpf/arraymap.c  |  2 +-
 kernel/bpf/core.c      |  2 +-
 kernel/bpf/verifier.c  | 13 ++++++-------
 net/core/filter.c      |  4 ++--
 6 files changed, 19 insertions(+), 20 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index f7c5aa01bb7b..d1023cd67f65 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -2231,24 +2231,24 @@ extern int sysctl_unprivileged_bpf_disabled;
 
 bool bpf_token_capable(const struct bpf_token *token, int cap);
 
-static inline bool bpf_allow_ptr_leaks(void)
+static inline bool bpf_allow_ptr_leaks(const struct bpf_token *token)
 {
-	return perfmon_capable();
+	return bpf_token_capable(token, CAP_PERFMON);
 }
 
-static inline bool bpf_allow_uninit_stack(void)
+static inline bool bpf_allow_uninit_stack(const struct bpf_token *token)
 {
-	return perfmon_capable();
+	return bpf_token_capable(token, CAP_PERFMON);
 }
 
-static inline bool bpf_bypass_spec_v1(void)
+static inline bool bpf_bypass_spec_v1(const struct bpf_token *token)
 {
-	return cpu_mitigations_off() || perfmon_capable();
+	return cpu_mitigations_off() || bpf_token_capable(token, CAP_PERFMON);
 }
 
-static inline bool bpf_bypass_spec_v4(void)
+static inline bool bpf_bypass_spec_v4(const struct bpf_token *token)
 {
-	return cpu_mitigations_off() || perfmon_capable();
+	return cpu_mitigations_off() || bpf_token_capable(token, CAP_PERFMON);
 }
 
 int bpf_map_new_fd(struct bpf_map *map, int flags);
diff --git a/include/linux/filter.h b/include/linux/filter.h
index 68fb6c8142fe..12d907f17d36 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -1139,7 +1139,7 @@ static inline bool bpf_jit_blinding_enabled(struct bpf_prog *prog)
 		return false;
 	if (!bpf_jit_harden)
 		return false;
-	if (bpf_jit_harden == 1 && bpf_capable())
+	if (bpf_jit_harden == 1 && bpf_token_capable(prog->aux->token, CAP_BPF))
 		return false;
 
 	return true;
diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index 0bdbbbeab155..13358675ff2e 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -82,7 +82,7 @@ static struct bpf_map *array_map_alloc(union bpf_attr *attr)
 	bool percpu = attr->map_type == BPF_MAP_TYPE_PERCPU_ARRAY;
 	int numa_node = bpf_map_attr_numa_node(attr);
 	u32 elem_size, index_mask, max_entries;
-	bool bypass_spec_v1 = bpf_bypass_spec_v1();
+	bool bypass_spec_v1 = bpf_bypass_spec_v1(NULL);
 	u64 array_size, mask64;
 	struct bpf_array *array;
 
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 62e21ba90230..14ace23d517b 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -682,7 +682,7 @@ static bool bpf_prog_kallsyms_candidate(const struct bpf_prog *fp)
 void bpf_prog_kallsyms_add(struct bpf_prog *fp)
 {
 	if (!bpf_prog_kallsyms_candidate(fp) ||
-	    !bpf_capable())
+	    !bpf_token_capable(fp->aux->token, CAP_BPF))
 		return;
 
 	bpf_prog_ksym_set_addr(fp);
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index d4e31f61de0e..edf93a1c2cee 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -20732,7 +20732,12 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr *attr, bpfptr_t uattr, __u3
 	env->prog = *prog;
 	env->ops = bpf_verifier_ops[env->prog->type];
 	env->fd_array = make_bpfptr(attr->fd_array, uattr.is_kernel);
-	is_priv = bpf_capable();
+
+	env->allow_ptr_leaks = bpf_allow_ptr_leaks(env->prog->aux->token);
+	env->allow_uninit_stack = bpf_allow_uninit_stack(env->prog->aux->token);
+	env->bypass_spec_v1 = bpf_bypass_spec_v1(env->prog->aux->token);
+	env->bypass_spec_v4 = bpf_bypass_spec_v4(env->prog->aux->token);
+	env->bpf_capable = is_priv = bpf_token_capable(env->prog->aux->token, CAP_BPF);
 
 	bpf_get_btf_vmlinux();
 
@@ -20764,12 +20769,6 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr *attr, bpfptr_t uattr, __u3
 	if (attr->prog_flags & BPF_F_ANY_ALIGNMENT)
 		env->strict_alignment = false;
 
-	env->allow_ptr_leaks = bpf_allow_ptr_leaks();
-	env->allow_uninit_stack = bpf_allow_uninit_stack();
-	env->bypass_spec_v1 = bpf_bypass_spec_v1();
-	env->bypass_spec_v4 = bpf_bypass_spec_v4();
-	env->bpf_capable = bpf_capable();
-
 	if (is_priv)
 		env->test_state_freq = attr->prog_flags & BPF_F_TEST_STATE_FREQ;
 	env->test_reg_invariants = attr->prog_flags & BPF_F_TEST_REG_INVARIANTS;
diff --git a/net/core/filter.c b/net/core/filter.c
index 46ab1d9378dd..3cc52b82bab8 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -8580,7 +8580,7 @@ static bool cg_skb_is_valid_access(int off, int size,
 		return false;
 	case bpf_ctx_range(struct __sk_buff, data):
 	case bpf_ctx_range(struct __sk_buff, data_end):
-		if (!bpf_capable())
+		if (!bpf_token_capable(prog->aux->token, CAP_BPF))
 			return false;
 		break;
 	}
@@ -8592,7 +8592,7 @@ static bool cg_skb_is_valid_access(int off, int size,
 		case bpf_ctx_range_till(struct __sk_buff, cb[0], cb[4]):
 			break;
 		case bpf_ctx_range(struct __sk_buff, tstamp):
-			if (!bpf_capable())
+			if (!bpf_token_capable(prog->aux->token, CAP_BPF))
 				return false;
 			break;
 		default:
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH bpf-next 09/29] bpf,lsm: refactor bpf_prog_alloc/bpf_prog_free LSM hooks
  2024-01-03 22:20 [PATCH bpf-next 00/29] BPF token Andrii Nakryiko
                   ` (7 preceding siblings ...)
  2024-01-03 22:20 ` [PATCH bpf-next 08/29] bpf: consistently use BPF token throughout BPF verifier logic Andrii Nakryiko
@ 2024-01-03 22:20 ` Andrii Nakryiko
  2024-01-03 22:20 ` [PATCH bpf-next 10/29] bpf,lsm: refactor bpf_map_alloc/bpf_map_free " Andrii Nakryiko
                   ` (20 subsequent siblings)
  29 siblings, 0 replies; 59+ messages in thread
From: Andrii Nakryiko @ 2024-01-03 22:20 UTC (permalink / raw)
  To: bpf, netdev, paul, brauner, torvalds
  Cc: linux-fsdevel, linux-security-module, kernel-team

Based on upstream discussion ([0]), rework the existing
bpf_prog_alloc_security LSM hook. Rename it to bpf_prog_load and,
instead of passing bpf_prog_aux, pass a proper bpf_prog pointer for the
full BPF program struct. Also pass the bpf_attr union with all the
user-provided arguments for the BPF_PROG_LOAD command. This gives LSMs
as much information as we can reasonably provide.

The hook is also BPF token-aware now, and an optional bpf_token struct
is passed as the third argument. The bpf_prog_load LSM hook is called
after a bunch of sanity checks have been performed and bpf_prog and
bpf_prog_aux have been allocated and filled out, but right before the
full-fledged BPF verification step.

The bpf_prog_free LSM hook now accepts a struct bpf_prog argument, for
consistency. SELinux code is adjusted to all the new names, types, and
signatures.

Note that, given the bpf_prog_load (previously bpf_prog_alloc) hook can
be used by some LSMs to allocate an extra security blob, but also by
other LSMs to reject BPF program loading, we need to make sure that the
bpf_prog_free LSM hook is called *even* if the bpf_prog_load hook itself
returned an error. If we don't do that, we run the risk of leaking
memory. This seems to be possible today when combining SELinux and BPF
LSM, as one example, depending on their relative ordering.

Also, for the BPF LSM setup, add bpf_prog_load and bpf_prog_free to the
sleepable LSM hooks list, as they are both executed in sleepable
context. Also drop the bpf_prog_load hook from the untrusted list, as
the refcount issue that originally forced us to add it there in
c0c852dd1876 ("bpf: Do not mark certain LSM hook arguments as trusted")
no longer exists. We now trigger this hook much later, so it should not
be an issue anymore.

  [0] https://lore.kernel.org/bpf/9fe88aef7deabbe87d3fc38c4aea3c69.paul@paul-moore.com/
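
As an illustration of what the reworked hook enables, a hypothetical
BPF LSM program could now veto token-based loads of specific program
types (a sketch only; the policy and program name are made up):

  #include "vmlinux.h"
  #include <bpf/bpf_helpers.h>
  #include <bpf/bpf_tracing.h>

  char LICENSE[] SEC("license") = "GPL";

  /* sleepable LSM hook, per the list updated in this patch */
  SEC("lsm.s/bpf_prog_load")
  int BPF_PROG(deny_token_kprobes, struct bpf_prog *prog,
               union bpf_attr *attr, struct bpf_token *token)
  {
          /* example policy: token-delegated loads may not be kprobes */
          if (token && prog->type == BPF_PROG_TYPE_KPROBE)
                  return -EPERM;
          return 0;
  }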

Acked-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 include/linux/lsm_hook_defs.h |  5 +++--
 include/linux/security.h      | 12 +++++++-----
 kernel/bpf/bpf_lsm.c          |  5 +++--
 kernel/bpf/syscall.c          | 25 +++++++++++++------------
 security/security.c           | 25 +++++++++++++++----------
 security/selinux/hooks.c      | 15 ++++++++-------
 6 files changed, 49 insertions(+), 38 deletions(-)

diff --git a/include/linux/lsm_hook_defs.h b/include/linux/lsm_hook_defs.h
index ff217a5ce552..41ec4a7c070e 100644
--- a/include/linux/lsm_hook_defs.h
+++ b/include/linux/lsm_hook_defs.h
@@ -400,8 +400,9 @@ LSM_HOOK(int, 0, bpf_map, struct bpf_map *map, fmode_t fmode)
 LSM_HOOK(int, 0, bpf_prog, struct bpf_prog *prog)
 LSM_HOOK(int, 0, bpf_map_alloc_security, struct bpf_map *map)
 LSM_HOOK(void, LSM_RET_VOID, bpf_map_free_security, struct bpf_map *map)
-LSM_HOOK(int, 0, bpf_prog_alloc_security, struct bpf_prog_aux *aux)
-LSM_HOOK(void, LSM_RET_VOID, bpf_prog_free_security, struct bpf_prog_aux *aux)
+LSM_HOOK(int, 0, bpf_prog_load, struct bpf_prog *prog, union bpf_attr *attr,
+	 struct bpf_token *token)
+LSM_HOOK(void, LSM_RET_VOID, bpf_prog_free, struct bpf_prog *prog)
 #endif /* CONFIG_BPF_SYSCALL */
 
 LSM_HOOK(int, 0, locked_down, enum lockdown_reason what)
diff --git a/include/linux/security.h b/include/linux/security.h
index 1d1df326c881..65467eef6678 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -2020,15 +2020,16 @@ static inline void securityfs_remove(struct dentry *dentry)
 union bpf_attr;
 struct bpf_map;
 struct bpf_prog;
-struct bpf_prog_aux;
+struct bpf_token;
 #ifdef CONFIG_SECURITY
 extern int security_bpf(int cmd, union bpf_attr *attr, unsigned int size);
 extern int security_bpf_map(struct bpf_map *map, fmode_t fmode);
 extern int security_bpf_prog(struct bpf_prog *prog);
 extern int security_bpf_map_alloc(struct bpf_map *map);
 extern void security_bpf_map_free(struct bpf_map *map);
-extern int security_bpf_prog_alloc(struct bpf_prog_aux *aux);
-extern void security_bpf_prog_free(struct bpf_prog_aux *aux);
+extern int security_bpf_prog_load(struct bpf_prog *prog, union bpf_attr *attr,
+				  struct bpf_token *token);
+extern void security_bpf_prog_free(struct bpf_prog *prog);
 #else
 static inline int security_bpf(int cmd, union bpf_attr *attr,
 					     unsigned int size)
@@ -2054,12 +2055,13 @@ static inline int security_bpf_map_alloc(struct bpf_map *map)
 static inline void security_bpf_map_free(struct bpf_map *map)
 { }
 
-static inline int security_bpf_prog_alloc(struct bpf_prog_aux *aux)
+static inline int security_bpf_prog_load(struct bpf_prog *prog, union bpf_attr *attr,
+					 struct bpf_token *token)
 {
 	return 0;
 }
 
-static inline void security_bpf_prog_free(struct bpf_prog_aux *aux)
+static inline void security_bpf_prog_free(struct bpf_prog *prog)
 { }
 #endif /* CONFIG_SECURITY */
 #endif /* CONFIG_BPF_SYSCALL */
diff --git a/kernel/bpf/bpf_lsm.c b/kernel/bpf/bpf_lsm.c
index e8e910395bf6..7ee0dd011de4 100644
--- a/kernel/bpf/bpf_lsm.c
+++ b/kernel/bpf/bpf_lsm.c
@@ -263,6 +263,8 @@ BTF_ID(func, bpf_lsm_bpf_map)
 BTF_ID(func, bpf_lsm_bpf_map_alloc_security)
 BTF_ID(func, bpf_lsm_bpf_map_free_security)
 BTF_ID(func, bpf_lsm_bpf_prog)
+BTF_ID(func, bpf_lsm_bpf_prog_load)
+BTF_ID(func, bpf_lsm_bpf_prog_free)
 BTF_ID(func, bpf_lsm_bprm_check_security)
 BTF_ID(func, bpf_lsm_bprm_committed_creds)
 BTF_ID(func, bpf_lsm_bprm_committing_creds)
@@ -358,8 +360,7 @@ BTF_SET_END(sleepable_lsm_hooks)
 
 BTF_SET_START(untrusted_lsm_hooks)
 BTF_ID(func, bpf_lsm_bpf_map_free_security)
-BTF_ID(func, bpf_lsm_bpf_prog_alloc_security)
-BTF_ID(func, bpf_lsm_bpf_prog_free_security)
+BTF_ID(func, bpf_lsm_bpf_prog_free)
 BTF_ID(func, bpf_lsm_file_alloc_security)
 BTF_ID(func, bpf_lsm_file_free_security)
 #ifdef CONFIG_SECURITY_NETWORK
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index a236a2cb7ac1..19a0d5dd4d7e 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -2180,7 +2180,7 @@ static void __bpf_prog_put_rcu(struct rcu_head *rcu)
 	kvfree(aux->func_info);
 	kfree(aux->func_info_aux);
 	free_uid(aux->user);
-	security_bpf_prog_free(aux);
+	security_bpf_prog_free(aux->prog);
 	bpf_prog_free(aux->prog);
 }
 
@@ -2772,10 +2772,6 @@ static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr, u32 uattr_size)
 	prog->aux->token = token;
 	token = NULL;
 
-	err = security_bpf_prog_alloc(prog->aux);
-	if (err)
-		goto free_prog;
-
 	prog->aux->user = get_current_user();
 	prog->len = attr->insn_cnt;
 
@@ -2783,12 +2779,12 @@ static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr, u32 uattr_size)
 	if (copy_from_bpfptr(prog->insns,
 			     make_bpfptr(attr->insns, uattr.is_kernel),
 			     bpf_prog_insn_size(prog)) != 0)
-		goto free_prog_sec;
+		goto free_prog;
 	/* copy eBPF program license from user space */
 	if (strncpy_from_bpfptr(license,
 				make_bpfptr(attr->license, uattr.is_kernel),
 				sizeof(license) - 1) < 0)
-		goto free_prog_sec;
+		goto free_prog;
 	license[sizeof(license) - 1] = 0;
 
 	/* eBPF programs must be GPL compatible to use GPL-ed functions */
@@ -2802,25 +2798,29 @@ static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr, u32 uattr_size)
 	if (bpf_prog_is_dev_bound(prog->aux)) {
 		err = bpf_prog_dev_bound_init(prog, attr);
 		if (err)
-			goto free_prog_sec;
+			goto free_prog;
 	}
 
 	if (type == BPF_PROG_TYPE_EXT && dst_prog &&
 	    bpf_prog_is_dev_bound(dst_prog->aux)) {
 		err = bpf_prog_dev_bound_inherit(prog, dst_prog);
 		if (err)
-			goto free_prog_sec;
+			goto free_prog;
 	}
 
 	/* find program type: socket_filter vs tracing_filter */
 	err = find_prog_type(type, prog);
 	if (err < 0)
-		goto free_prog_sec;
+		goto free_prog;
 
 	prog->aux->load_time = ktime_get_boottime_ns();
 	err = bpf_obj_name_cpy(prog->aux->name, attr->prog_name,
 			       sizeof(attr->prog_name));
 	if (err < 0)
+		goto free_prog;
+
+	err = security_bpf_prog_load(prog, attr, token);
+	if (err)
 		goto free_prog_sec;
 
 	/* run eBPF verifier */
@@ -2866,10 +2866,11 @@ static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr, u32 uattr_size)
 	 */
 	__bpf_prog_put_noref(prog, prog->aux->real_func_cnt);
 	return err;
+
 free_prog_sec:
-	free_uid(prog->aux->user);
-	security_bpf_prog_free(prog->aux);
+	security_bpf_prog_free(prog);
 free_prog:
+	free_uid(prog->aux->user);
 	if (prog->aux->attach_btf)
 		btf_put(prog->aux->attach_btf);
 	bpf_prog_free(prog);
diff --git a/security/security.c b/security/security.c
index dcb3e7014f9b..c8a1c66cfaad 100644
--- a/security/security.c
+++ b/security/security.c
@@ -5180,16 +5180,21 @@ int security_bpf_map_alloc(struct bpf_map *map)
 }
 
 /**
- * security_bpf_prog_alloc() - Allocate a bpf program LSM blob
- * @aux: bpf program aux info struct
+ * security_bpf_prog_load() - Check if loading of BPF program is allowed
+ * @prog: BPF program object
+ * @attr: BPF syscall attributes used to create BPF program
+ * @token: BPF token used to grant user access to BPF subsystem
  *
- * Initialize the security field inside bpf program.
+ * Perform an access control check when the kernel loads a BPF program and
+ * allocates associated BPF program object. This hook is also responsible for
+ * allocating any required LSM state for the BPF program.
  *
  * Return: Returns 0 on success, error on failure.
  */
-int security_bpf_prog_alloc(struct bpf_prog_aux *aux)
+int security_bpf_prog_load(struct bpf_prog *prog, union bpf_attr *attr,
+			   struct bpf_token *token)
 {
-	return call_int_hook(bpf_prog_alloc_security, 0, aux);
+	return call_int_hook(bpf_prog_load, 0, prog, attr, token);
 }
 
 /**
@@ -5204,14 +5209,14 @@ void security_bpf_map_free(struct bpf_map *map)
 }
 
 /**
- * security_bpf_prog_free() - Free a bpf program's LSM blob
- * @aux: bpf program aux info struct
+ * security_bpf_prog_free() - Free a BPF program's LSM blob
+ * @prog: BPF program struct
  *
- * Clean up the security information stored inside bpf prog.
+ * Clean up the security information stored inside BPF program.
  */
-void security_bpf_prog_free(struct bpf_prog_aux *aux)
+void security_bpf_prog_free(struct bpf_prog *prog)
 {
-	call_void_hook(bpf_prog_free_security, aux);
+	call_void_hook(bpf_prog_free, prog);
 }
 #endif /* CONFIG_BPF_SYSCALL */
 
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 340b2bbbb2dd..c2de56ca5ea5 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -6799,7 +6799,8 @@ static void selinux_bpf_map_free(struct bpf_map *map)
 	kfree(bpfsec);
 }
 
-static int selinux_bpf_prog_alloc(struct bpf_prog_aux *aux)
+static int selinux_bpf_prog_load(struct bpf_prog *prog, union bpf_attr *attr,
+				 struct bpf_token *token)
 {
 	struct bpf_security_struct *bpfsec;
 
@@ -6808,16 +6809,16 @@ static int selinux_bpf_prog_alloc(struct bpf_prog_aux *aux)
 		return -ENOMEM;
 
 	bpfsec->sid = current_sid();
-	aux->security = bpfsec;
+	prog->aux->security = bpfsec;
 
 	return 0;
 }
 
-static void selinux_bpf_prog_free(struct bpf_prog_aux *aux)
+static void selinux_bpf_prog_free(struct bpf_prog *prog)
 {
-	struct bpf_security_struct *bpfsec = aux->security;
+	struct bpf_security_struct *bpfsec = prog->aux->security;
 
-	aux->security = NULL;
+	prog->aux->security = NULL;
 	kfree(bpfsec);
 }
 #endif
@@ -7174,7 +7175,7 @@ static struct security_hook_list selinux_hooks[] __ro_after_init = {
 	LSM_HOOK_INIT(bpf_map, selinux_bpf_map),
 	LSM_HOOK_INIT(bpf_prog, selinux_bpf_prog),
 	LSM_HOOK_INIT(bpf_map_free_security, selinux_bpf_map_free),
-	LSM_HOOK_INIT(bpf_prog_free_security, selinux_bpf_prog_free),
+	LSM_HOOK_INIT(bpf_prog_free, selinux_bpf_prog_free),
 #endif
 
 #ifdef CONFIG_PERF_EVENTS
@@ -7232,7 +7233,7 @@ static struct security_hook_list selinux_hooks[] __ro_after_init = {
 #endif
 #ifdef CONFIG_BPF_SYSCALL
 	LSM_HOOK_INIT(bpf_map_alloc_security, selinux_bpf_map_alloc),
-	LSM_HOOK_INIT(bpf_prog_alloc_security, selinux_bpf_prog_alloc),
+	LSM_HOOK_INIT(bpf_prog_load, selinux_bpf_prog_load),
 #endif
 #ifdef CONFIG_PERF_EVENTS
 	LSM_HOOK_INIT(perf_event_alloc, selinux_perf_event_alloc),
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH bpf-next 10/29] bpf,lsm: refactor bpf_map_alloc/bpf_map_free LSM hooks
  2024-01-03 22:20 [PATCH bpf-next 00/29] BPF token Andrii Nakryiko
                   ` (8 preceding siblings ...)
  2024-01-03 22:20 ` [PATCH bpf-next 09/29] bpf,lsm: refactor bpf_prog_alloc/bpf_prog_free LSM hooks Andrii Nakryiko
@ 2024-01-03 22:20 ` Andrii Nakryiko
  2024-01-03 22:20 ` [PATCH bpf-next 11/29] bpf,lsm: add BPF token " Andrii Nakryiko
                   ` (19 subsequent siblings)
  29 siblings, 0 replies; 59+ messages in thread
From: Andrii Nakryiko @ 2024-01-03 22:20 UTC (permalink / raw)
  To: bpf, netdev, paul, brauner, torvalds
  Cc: linux-fsdevel, linux-security-module, kernel-team

Similarly to the bpf_prog_alloc LSM hook, rename and extend the
bpf_map_alloc hook into bpf_map_create, taking not just struct bpf_map,
but also bpf_attr and bpf_token, to give LSMs a fuller context.

Unlike bpf_prog_alloc, there is no need to move the hook around, as it
currently fires right before allocating the BPF map ID and FD, which
seems to be a sweet spot.

But, like the bpf_prog_alloc/bpf_prog_free combo, make sure that the
bpf_map_free LSM hook is called even if the bpf_map_create hook returned
an error: when a few LSMs are combined together, one LSM may
successfully allocate a security blob for its needs while a subsequent
LSM rejects the BPF map creation. The former LSM would still need to
free up its LSM blob, so we need to ensure security_bpf_map_free() is
called regardless of the outcome.
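
A hypothetical BPF LSM policy using the new hook could look like this
(a sketch only; the size cap and program name are arbitrary):

  #include "vmlinux.h"
  #include <bpf/bpf_helpers.h>
  #include <bpf/bpf_tracing.h>

  char LICENSE[] SEC("license") = "GPL";

  SEC("lsm.s/bpf_map_create")
  int BPF_PROG(cap_token_map_size, struct bpf_map *map,
               union bpf_attr *attr, struct bpf_token *token)
  {
          /* example policy: cap max_entries for token-based creation */
          if (token && attr->max_entries > (1 << 20))
                  return -EPERM;
          return 0;
  }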

Acked-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 include/linux/lsm_hook_defs.h |  5 +++--
 include/linux/security.h      |  6 ++++--
 kernel/bpf/bpf_lsm.c          |  6 +++---
 kernel/bpf/syscall.c          |  4 ++--
 security/security.c           | 16 ++++++++++------
 security/selinux/hooks.c      |  7 ++++---
 6 files changed, 26 insertions(+), 18 deletions(-)

diff --git a/include/linux/lsm_hook_defs.h b/include/linux/lsm_hook_defs.h
index 41ec4a7c070e..adb25cc63ce3 100644
--- a/include/linux/lsm_hook_defs.h
+++ b/include/linux/lsm_hook_defs.h
@@ -398,8 +398,9 @@ LSM_HOOK(void, LSM_RET_VOID, audit_rule_free, void *lsmrule)
 LSM_HOOK(int, 0, bpf, int cmd, union bpf_attr *attr, unsigned int size)
 LSM_HOOK(int, 0, bpf_map, struct bpf_map *map, fmode_t fmode)
 LSM_HOOK(int, 0, bpf_prog, struct bpf_prog *prog)
-LSM_HOOK(int, 0, bpf_map_alloc_security, struct bpf_map *map)
-LSM_HOOK(void, LSM_RET_VOID, bpf_map_free_security, struct bpf_map *map)
+LSM_HOOK(int, 0, bpf_map_create, struct bpf_map *map, union bpf_attr *attr,
+	 struct bpf_token *token)
+LSM_HOOK(void, LSM_RET_VOID, bpf_map_free, struct bpf_map *map)
 LSM_HOOK(int, 0, bpf_prog_load, struct bpf_prog *prog, union bpf_attr *attr,
 	 struct bpf_token *token)
 LSM_HOOK(void, LSM_RET_VOID, bpf_prog_free, struct bpf_prog *prog)
diff --git a/include/linux/security.h b/include/linux/security.h
index 65467eef6678..08fd777cbe94 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -2025,7 +2025,8 @@ struct bpf_token;
 extern int security_bpf(int cmd, union bpf_attr *attr, unsigned int size);
 extern int security_bpf_map(struct bpf_map *map, fmode_t fmode);
 extern int security_bpf_prog(struct bpf_prog *prog);
-extern int security_bpf_map_alloc(struct bpf_map *map);
+extern int security_bpf_map_create(struct bpf_map *map, union bpf_attr *attr,
+				   struct bpf_token *token);
 extern void security_bpf_map_free(struct bpf_map *map);
 extern int security_bpf_prog_load(struct bpf_prog *prog, union bpf_attr *attr,
 				  struct bpf_token *token);
@@ -2047,7 +2048,8 @@ static inline int security_bpf_prog(struct bpf_prog *prog)
 	return 0;
 }
 
-static inline int security_bpf_map_alloc(struct bpf_map *map)
+static inline int security_bpf_map_create(struct bpf_map *map, union bpf_attr *attr,
+					  struct bpf_token *token)
 {
 	return 0;
 }
diff --git a/kernel/bpf/bpf_lsm.c b/kernel/bpf/bpf_lsm.c
index 7ee0dd011de4..76976908b302 100644
--- a/kernel/bpf/bpf_lsm.c
+++ b/kernel/bpf/bpf_lsm.c
@@ -260,8 +260,8 @@ bpf_lsm_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 BTF_SET_START(sleepable_lsm_hooks)
 BTF_ID(func, bpf_lsm_bpf)
 BTF_ID(func, bpf_lsm_bpf_map)
-BTF_ID(func, bpf_lsm_bpf_map_alloc_security)
-BTF_ID(func, bpf_lsm_bpf_map_free_security)
+BTF_ID(func, bpf_lsm_bpf_map_create)
+BTF_ID(func, bpf_lsm_bpf_map_free)
 BTF_ID(func, bpf_lsm_bpf_prog)
 BTF_ID(func, bpf_lsm_bpf_prog_load)
 BTF_ID(func, bpf_lsm_bpf_prog_free)
@@ -359,7 +359,7 @@ BTF_ID(func, bpf_lsm_userns_create)
 BTF_SET_END(sleepable_lsm_hooks)
 
 BTF_SET_START(untrusted_lsm_hooks)
-BTF_ID(func, bpf_lsm_bpf_map_free_security)
+BTF_ID(func, bpf_lsm_bpf_map_free)
 BTF_ID(func, bpf_lsm_bpf_prog_free)
 BTF_ID(func, bpf_lsm_file_alloc_security)
 BTF_ID(func, bpf_lsm_file_free_security)
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 19a0d5dd4d7e..d6337842006d 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -1318,9 +1318,9 @@ static int map_create(union bpf_attr *attr)
 			attr->btf_vmlinux_value_type_id;
 	}
 
-	err = security_bpf_map_alloc(map);
+	err = security_bpf_map_create(map, attr, token);
 	if (err)
-		goto free_map;
+		goto free_map_sec;
 
 	err = bpf_map_alloc_id(map);
 	if (err)
diff --git a/security/security.c b/security/security.c
index c8a1c66cfaad..ad24cf36da94 100644
--- a/security/security.c
+++ b/security/security.c
@@ -5167,16 +5167,20 @@ int security_bpf_prog(struct bpf_prog *prog)
 }
 
 /**
- * security_bpf_map_alloc() - Allocate a bpf map LSM blob
- * @map: bpf map
+ * security_bpf_map_create() - Check if BPF map creation is allowed
+ * @map: BPF map object
+ * @attr: BPF syscall attributes used to create BPF map
+ * @token: BPF token used to grant user access
  *
- * Initialize the security field inside bpf map.
+ * Do a check when the kernel creates a new BPF map. This is also the
+ * point where LSM blob is allocated for LSMs that need them.
  *
  * Return: Returns 0 on success, error on failure.
  */
-int security_bpf_map_alloc(struct bpf_map *map)
+int security_bpf_map_create(struct bpf_map *map, union bpf_attr *attr,
+			    struct bpf_token *token)
 {
-	return call_int_hook(bpf_map_alloc_security, 0, map);
+	return call_int_hook(bpf_map_create, 0, map, attr, token);
 }
 
 /**
@@ -5205,7 +5209,7 @@ int security_bpf_prog_load(struct bpf_prog *prog, union bpf_attr *attr,
  */
 void security_bpf_map_free(struct bpf_map *map)
 {
-	call_void_hook(bpf_map_free_security, map);
+	call_void_hook(bpf_map_free, map);
 }
 
 /**
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index c2de56ca5ea5..c4ba3f0fcb97 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -6777,7 +6777,8 @@ static int selinux_bpf_prog(struct bpf_prog *prog)
 			    BPF__PROG_RUN, NULL);
 }
 
-static int selinux_bpf_map_alloc(struct bpf_map *map)
+static int selinux_bpf_map_create(struct bpf_map *map, union bpf_attr *attr,
+				  struct bpf_token *token)
 {
 	struct bpf_security_struct *bpfsec;
 
@@ -7174,7 +7175,7 @@ static struct security_hook_list selinux_hooks[] __ro_after_init = {
 	LSM_HOOK_INIT(bpf, selinux_bpf),
 	LSM_HOOK_INIT(bpf_map, selinux_bpf_map),
 	LSM_HOOK_INIT(bpf_prog, selinux_bpf_prog),
-	LSM_HOOK_INIT(bpf_map_free_security, selinux_bpf_map_free),
+	LSM_HOOK_INIT(bpf_map_free, selinux_bpf_map_free),
 	LSM_HOOK_INIT(bpf_prog_free, selinux_bpf_prog_free),
 #endif
 
@@ -7232,7 +7233,7 @@ static struct security_hook_list selinux_hooks[] __ro_after_init = {
 	LSM_HOOK_INIT(audit_rule_init, selinux_audit_rule_init),
 #endif
 #ifdef CONFIG_BPF_SYSCALL
-	LSM_HOOK_INIT(bpf_map_alloc_security, selinux_bpf_map_alloc),
+	LSM_HOOK_INIT(bpf_map_create, selinux_bpf_map_create),
 	LSM_HOOK_INIT(bpf_prog_load, selinux_bpf_prog_load),
 #endif
 #ifdef CONFIG_PERF_EVENTS
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH bpf-next 11/29] bpf,lsm: add BPF token LSM hooks
  2024-01-03 22:20 [PATCH bpf-next 00/29] BPF token Andrii Nakryiko
                   ` (9 preceding siblings ...)
  2024-01-03 22:20 ` [PATCH bpf-next 10/29] bpf,lsm: refactor bpf_map_alloc/bpf_map_free " Andrii Nakryiko
@ 2024-01-03 22:20 ` Andrii Nakryiko
  2024-01-03 22:20 ` [PATCH bpf-next 12/29] libbpf: add bpf_token_create() API Andrii Nakryiko
                   ` (18 subsequent siblings)
  29 siblings, 0 replies; 59+ messages in thread
From: Andrii Nakryiko @ 2024-01-03 22:20 UTC (permalink / raw)
  To: bpf, netdev, paul, brauner, torvalds
  Cc: linux-fsdevel, linux-security-module, kernel-team

Wire up bpf_token_create and bpf_token_free LSM hooks, which allow LSMs
to allocate a security blob (we add a `void *security` field to struct
bpf_token for that), but also to control who can instantiate a BPF
token. This follows the existing pattern for BPF map and BPF prog.

Also add security_bpf_token_cmd() and security_bpf_token_capable() LSM
hooks that allow an LSM implementation to control and veto (if
necessary) a BPF token's delegation of a specific bpf_cmd and
capability, respectively.
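
For illustration, a minimal sketch of an LSM wiring up these hooks might
look as follows (hook names and signatures are the ones added by this
patch; the rejection policy itself is a made-up example):

  /* hypothetical LSM code, not part of this patch */
  static int ex_token_cmd(const struct bpf_token *token, enum bpf_cmd cmd)
  {
  	/* veto delegation of BPF_PROG_LOAD, allow everything else */
  	return cmd == BPF_PROG_LOAD ? -EPERM : 0;
  }

  static int ex_token_capable(const struct bpf_token *token, int cap)
  {
  	/* never let a token stand in for CAP_SYS_ADMIN */
  	return cap == CAP_SYS_ADMIN ? -EPERM : 0;
  }

  /* registered from the LSM's init via security_add_hooks() */
  static struct security_hook_list ex_hooks[] __ro_after_init = {
  	LSM_HOOK_INIT(bpf_token_cmd, ex_token_cmd),
  	LSM_HOOK_INIT(bpf_token_capable, ex_token_capable),
  };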

Acked-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 include/linux/bpf.h           |  3 ++
 include/linux/lsm_hook_defs.h |  5 +++
 include/linux/security.h      | 25 +++++++++++++++
 kernel/bpf/bpf_lsm.c          |  4 +++
 kernel/bpf/token.c            | 18 +++++++----
 security/security.c           | 60 +++++++++++++++++++++++++++++++++++
 6 files changed, 109 insertions(+), 6 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index d1023cd67f65..778d07bb7240 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1633,6 +1633,9 @@ struct bpf_token {
 	u64 allowed_maps;
 	u64 allowed_progs;
 	u64 allowed_attachs;
+#ifdef CONFIG_SECURITY
+	void *security;
+#endif
 };
 
 struct bpf_struct_ops_value;
diff --git a/include/linux/lsm_hook_defs.h b/include/linux/lsm_hook_defs.h
index adb25cc63ce3..3fdd00b452ac 100644
--- a/include/linux/lsm_hook_defs.h
+++ b/include/linux/lsm_hook_defs.h
@@ -404,6 +404,11 @@ LSM_HOOK(void, LSM_RET_VOID, bpf_map_free, struct bpf_map *map)
 LSM_HOOK(int, 0, bpf_prog_load, struct bpf_prog *prog, union bpf_attr *attr,
 	 struct bpf_token *token)
 LSM_HOOK(void, LSM_RET_VOID, bpf_prog_free, struct bpf_prog *prog)
+LSM_HOOK(int, 0, bpf_token_create, struct bpf_token *token, union bpf_attr *attr,
+	 struct path *path)
+LSM_HOOK(void, LSM_RET_VOID, bpf_token_free, struct bpf_token *token)
+LSM_HOOK(int, 0, bpf_token_cmd, const struct bpf_token *token, enum bpf_cmd cmd)
+LSM_HOOK(int, 0, bpf_token_capable, const struct bpf_token *token, int cap)
 #endif /* CONFIG_BPF_SYSCALL */
 
 LSM_HOOK(int, 0, locked_down, enum lockdown_reason what)
diff --git a/include/linux/security.h b/include/linux/security.h
index 08fd777cbe94..00809d2d5c38 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -32,6 +32,7 @@
 #include <linux/string.h>
 #include <linux/mm.h>
 #include <linux/sockptr.h>
+#include <linux/bpf.h>
 
 struct linux_binprm;
 struct cred;
@@ -2031,6 +2032,11 @@ extern void security_bpf_map_free(struct bpf_map *map);
 extern int security_bpf_prog_load(struct bpf_prog *prog, union bpf_attr *attr,
 				  struct bpf_token *token);
 extern void security_bpf_prog_free(struct bpf_prog *prog);
+extern int security_bpf_token_create(struct bpf_token *token, union bpf_attr *attr,
+				     struct path *path);
+extern void security_bpf_token_free(struct bpf_token *token);
+extern int security_bpf_token_cmd(const struct bpf_token *token, enum bpf_cmd cmd);
+extern int security_bpf_token_capable(const struct bpf_token *token, int cap);
 #else
 static inline int security_bpf(int cmd, union bpf_attr *attr,
 					     unsigned int size)
@@ -2065,6 +2071,25 @@ static inline int security_bpf_prog_load(struct bpf_prog *prog, union bpf_attr *
 
 static inline void security_bpf_prog_free(struct bpf_prog *prog)
 { }
+
+static inline int security_bpf_token_create(struct bpf_token *token, union bpf_attr *attr,
+				     struct path *path)
+{
+	return 0;
+}
+
+static inline void security_bpf_token_free(struct bpf_token *token)
+{ }
+
+static inline int security_bpf_token_cmd(const struct bpf_token *token, enum bpf_cmd cmd)
+{
+	return 0;
+}
+
+static inline int security_bpf_token_capable(const struct bpf_token *token, int cap)
+{
+	return 0;
+}
 #endif /* CONFIG_SECURITY */
 #endif /* CONFIG_BPF_SYSCALL */
 
diff --git a/kernel/bpf/bpf_lsm.c b/kernel/bpf/bpf_lsm.c
index 76976908b302..63b4dc495125 100644
--- a/kernel/bpf/bpf_lsm.c
+++ b/kernel/bpf/bpf_lsm.c
@@ -265,6 +265,10 @@ BTF_ID(func, bpf_lsm_bpf_map_free)
 BTF_ID(func, bpf_lsm_bpf_prog)
 BTF_ID(func, bpf_lsm_bpf_prog_load)
 BTF_ID(func, bpf_lsm_bpf_prog_free)
+BTF_ID(func, bpf_lsm_bpf_token_create)
+BTF_ID(func, bpf_lsm_bpf_token_free)
+BTF_ID(func, bpf_lsm_bpf_token_cmd)
+BTF_ID(func, bpf_lsm_bpf_token_capable)
 BTF_ID(func, bpf_lsm_bprm_check_security)
 BTF_ID(func, bpf_lsm_bprm_committed_creds)
 BTF_ID(func, bpf_lsm_bprm_committing_creds)
diff --git a/kernel/bpf/token.c b/kernel/bpf/token.c
index 5a51e6b8f6bf..17212efcde60 100644
--- a/kernel/bpf/token.c
+++ b/kernel/bpf/token.c
@@ -7,6 +7,7 @@
 #include <linux/idr.h>
 #include <linux/namei.h>
 #include <linux/user_namespace.h>
+#include <linux/security.h>
 
 bool bpf_token_capable(const struct bpf_token *token, int cap)
 {
@@ -14,10 +15,9 @@ bool bpf_token_capable(const struct bpf_token *token, int cap)
 	 * token's userns is *exactly* the same as current user's userns
 	 */
 	if (token && current_user_ns() == token->userns) {
-		if (ns_capable(token->userns, cap))
-			return true;
-		if (cap != CAP_SYS_ADMIN && ns_capable(token->userns, CAP_SYS_ADMIN))
-			return true;
+		if (ns_capable(token->userns, cap) ||
+		    (cap != CAP_SYS_ADMIN && ns_capable(token->userns, CAP_SYS_ADMIN)))
+			return security_bpf_token_capable(token, cap) == 0;
 	}
 	/* otherwise fallback to capable() checks */
 	return capable(cap) || (cap != CAP_SYS_ADMIN && capable(CAP_SYS_ADMIN));
@@ -30,6 +30,7 @@ void bpf_token_inc(struct bpf_token *token)
 
 static void bpf_token_free(struct bpf_token *token)
 {
+	security_bpf_token_free(token);
 	put_user_ns(token->userns);
 	kvfree(token);
 }
@@ -186,6 +187,10 @@ int bpf_token_create(union bpf_attr *attr)
 	token->allowed_progs = mnt_opts->delegate_progs;
 	token->allowed_attachs = mnt_opts->delegate_attachs;
 
+	err = security_bpf_token_create(token, attr, &path);
+	if (err)
+		goto out_token;
+
 	fd = get_unused_fd_flags(O_CLOEXEC);
 	if (fd < 0) {
 		err = fd;
@@ -233,8 +238,9 @@ bool bpf_token_allow_cmd(const struct bpf_token *token, enum bpf_cmd cmd)
 	 */
 	if (!token || current_user_ns() != token->userns)
 		return false;
-
-	return token->allowed_cmds & (1ULL << cmd);
+	if (!(token->allowed_cmds & (1ULL << cmd)))
+		return false;
+	return security_bpf_token_cmd(token, cmd) == 0;
 }
 
 bool bpf_token_allow_map_type(const struct bpf_token *token, enum bpf_map_type type)
diff --git a/security/security.c b/security/security.c
index ad24cf36da94..088a79c35c26 100644
--- a/security/security.c
+++ b/security/security.c
@@ -5201,6 +5201,55 @@ int security_bpf_prog_load(struct bpf_prog *prog, union bpf_attr *attr,
 	return call_int_hook(bpf_prog_load, 0, prog, attr, token);
 }
 
+/**
+ * security_bpf_token_create() - Check if creation of a BPF token is allowed
+ * @token: BPF token object
+ * @attr: BPF syscall attributes used to create BPF token
+ * @path: path pointing to BPF FS mount point from which BPF token is created
+ *
+ * Do a check when the kernel instantiates a new BPF token object from BPF FS
+ * instance. This is also the point where LSM blob can be allocated for LSMs.
+ *
+ * Return: Returns 0 on success, error on failure.
+ */
+int security_bpf_token_create(struct bpf_token *token, union bpf_attr *attr,
+			      struct path *path)
+{
+	return call_int_hook(bpf_token_create, 0, token, attr, path);
+}
+
+/**
+ * security_bpf_token_cmd() - Check if BPF token is allowed to delegate
+ * requested BPF syscall command
+ * @token: BPF token object
+ * @cmd: BPF syscall command requested to be delegated by BPF token
+ *
+ * Do a check when the kernel decides whether provided BPF token should allow
+ * delegation of requested BPF syscall command.
+ *
+ * Return: Returns 0 on success, error on failure.
+ */
+int security_bpf_token_cmd(const struct bpf_token *token, enum bpf_cmd cmd)
+{
+	return call_int_hook(bpf_token_cmd, 0, token, cmd);
+}
+
+/**
+ * security_bpf_token_capable() - Check if BPF token is allowed to delegate
+ * requested BPF-related capability
+ * @token: BPF token object
+ * @cap: capabilities requested to be delegated by BPF token
+ *
+ * Do a check when the kernel decides whether provided BPF token should allow
+ * delegation of requested BPF-related capabilities.
+ *
+ * Return: Returns 0 on success, error on failure.
+ */
+int security_bpf_token_capable(const struct bpf_token *token, int cap)
+{
+	return call_int_hook(bpf_token_capable, 0, token, cap);
+}
+
 /**
  * security_bpf_map_free() - Free a bpf map's LSM blob
  * @map: bpf map
@@ -5222,6 +5271,17 @@ void security_bpf_prog_free(struct bpf_prog *prog)
 {
 	call_void_hook(bpf_prog_free, prog);
 }
+
+/**
+ * security_bpf_token_free() - Free a BPF token's LSM blob
+ * @token: BPF token struct
+ *
+ * Clean up the security information stored inside BPF token.
+ */
+void security_bpf_token_free(struct bpf_token *token)
+{
+	call_void_hook(bpf_token_free, token);
+}
 #endif /* CONFIG_BPF_SYSCALL */
 
 /**
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH bpf-next 12/29] libbpf: add bpf_token_create() API
  2024-01-03 22:20 [PATCH bpf-next 00/29] BPF token Andrii Nakryiko
                   ` (10 preceding siblings ...)
  2024-01-03 22:20 ` [PATCH bpf-next 11/29] bpf,lsm: add BPF token " Andrii Nakryiko
@ 2024-01-03 22:20 ` Andrii Nakryiko
  2024-01-03 22:20 ` [PATCH bpf-next 13/29] libbpf: add BPF token support to bpf_map_create() API Andrii Nakryiko
                   ` (17 subsequent siblings)
  29 siblings, 0 replies; 59+ messages in thread
From: Andrii Nakryiko @ 2024-01-03 22:20 UTC (permalink / raw)
  To: bpf, netdev, paul, brauner, torvalds
  Cc: linux-fsdevel, linux-security-module, kernel-team

Add a low-level wrapper API for the BPF_TOKEN_CREATE command of the
bpf() syscall.
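
Expected usage is along these lines (a sketch; the BPF FS path and the
make_token() helper are illustrative only):

  #include <errno.h>
  #include <fcntl.h>
  #include <unistd.h>
  #include <bpf/bpf.h>

  static int make_token(const char *bpffs_path)
  {
  	int bpffs_fd, token_fd;

  	bpffs_fd = open(bpffs_path, O_RDONLY);
  	if (bpffs_fd < 0)
  		return -errno;

  	/* opts can be NULL, all fields default to zero */
  	token_fd = bpf_token_create(bpffs_fd, NULL);
  	close(bpffs_fd);
  	return token_fd; /* token FD > 0 on success, negative error otherwise */
  }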

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 tools/lib/bpf/bpf.c      | 17 +++++++++++++++++
 tools/lib/bpf/bpf.h      | 24 ++++++++++++++++++++++++
 tools/lib/bpf/libbpf.map |  1 +
 3 files changed, 42 insertions(+)

diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
index 9dc9625651dc..d4019928a864 100644
--- a/tools/lib/bpf/bpf.c
+++ b/tools/lib/bpf/bpf.c
@@ -1287,3 +1287,20 @@ int bpf_prog_bind_map(int prog_fd, int map_fd,
 	ret = sys_bpf(BPF_PROG_BIND_MAP, &attr, attr_sz);
 	return libbpf_err_errno(ret);
 }
+
+int bpf_token_create(int bpffs_fd, struct bpf_token_create_opts *opts)
+{
+	const size_t attr_sz = offsetofend(union bpf_attr, token_create);
+	union bpf_attr attr;
+	int fd;
+
+	if (!OPTS_VALID(opts, bpf_token_create_opts))
+		return libbpf_err(-EINVAL);
+
+	memset(&attr, 0, attr_sz);
+	attr.token_create.bpffs_fd = bpffs_fd;
+	attr.token_create.flags = OPTS_GET(opts, flags, 0);
+
+	fd = sys_bpf_fd(BPF_TOKEN_CREATE, &attr, attr_sz);
+	return libbpf_err_errno(fd);
+}
diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
index d0f53772bdc0..e49254c9f68f 100644
--- a/tools/lib/bpf/bpf.h
+++ b/tools/lib/bpf/bpf.h
@@ -640,6 +640,30 @@ struct bpf_test_run_opts {
 LIBBPF_API int bpf_prog_test_run_opts(int prog_fd,
 				      struct bpf_test_run_opts *opts);
 
+struct bpf_token_create_opts {
+	size_t sz; /* size of this struct for forward/backward compatibility */
+	__u32 flags;
+	size_t :0;
+};
+#define bpf_token_create_opts__last_field flags
+
+/**
+ * @brief **bpf_token_create()** creates a new instance of BPF token derived
+ * from specified BPF FS mount point.
+ *
+ * BPF token created with this API can be passed to bpf() syscall for
+ * commands like BPF_PROG_LOAD, BPF_MAP_CREATE, etc.
+ *
+ * @param bpffs_fd FD for BPF FS instance from which to derive a BPF token
+ * instance.
+ * @param opts optional BPF token creation options, can be NULL
+ *
+ * @return BPF token FD > 0, on success; negative error code, otherwise (errno
+ * is also set to the error code)
+ */
+LIBBPF_API int bpf_token_create(int bpffs_fd,
+				struct bpf_token_create_opts *opts);
+
 #ifdef __cplusplus
 } /* extern "C" */
 #endif
diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
index 91c5aef7dae7..d9e1f57534fa 100644
--- a/tools/lib/bpf/libbpf.map
+++ b/tools/lib/bpf/libbpf.map
@@ -411,4 +411,5 @@ LIBBPF_1.3.0 {
 } LIBBPF_1.2.0;
 
 LIBBPF_1.4.0 {
+		bpf_token_create;
 } LIBBPF_1.3.0;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH bpf-next 13/29] libbpf: add BPF token support to bpf_map_create() API
  2024-01-03 22:20 [PATCH bpf-next 00/29] BPF token Andrii Nakryiko
                   ` (11 preceding siblings ...)
  2024-01-03 22:20 ` [PATCH bpf-next 12/29] libbpf: add bpf_token_create() API Andrii Nakryiko
@ 2024-01-03 22:20 ` Andrii Nakryiko
  2024-01-04 19:04   ` Linus Torvalds
  2024-01-03 22:20 ` [PATCH bpf-next 14/29] libbpf: add BPF token support to bpf_btf_load() API Andrii Nakryiko
                   ` (16 subsequent siblings)
  29 siblings, 1 reply; 59+ messages in thread
From: Andrii Nakryiko @ 2024-01-03 22:20 UTC (permalink / raw)
  To: bpf, netdev, paul, brauner, torvalds
  Cc: linux-fsdevel, linux-security-module, kernel-team

Add the ability to provide a token_fd for the BPF_MAP_CREATE command
through the bpf_map_create() API.
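
A sketch of the intended usage (token_fd comes from bpf_token_create();
note that the selftests later in this series also set a BPF_F_TOKEN_FD
map flag, so depending on the kernel that may be required as well):

  LIBBPF_OPTS(bpf_map_create_opts, opts,
  	.token_fd = token_fd,
  );
  int map_fd;

  /* BPF_MAP_TYPE_STACK is privileged, so creation is authorized by token */
  map_fd = bpf_map_create(BPF_MAP_TYPE_STACK, "token_map", 0, 8, 1, &opts);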

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 tools/lib/bpf/bpf.c | 4 +++-
 tools/lib/bpf/bpf.h | 5 ++++-
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
index d4019928a864..1653b64b7015 100644
--- a/tools/lib/bpf/bpf.c
+++ b/tools/lib/bpf/bpf.c
@@ -169,7 +169,7 @@ int bpf_map_create(enum bpf_map_type map_type,
 		   __u32 max_entries,
 		   const struct bpf_map_create_opts *opts)
 {
-	const size_t attr_sz = offsetofend(union bpf_attr, map_extra);
+	const size_t attr_sz = offsetofend(union bpf_attr, map_token_fd);
 	union bpf_attr attr;
 	int fd;
 
@@ -198,6 +198,8 @@ int bpf_map_create(enum bpf_map_type map_type,
 	attr.numa_node = OPTS_GET(opts, numa_node, 0);
 	attr.map_ifindex = OPTS_GET(opts, map_ifindex, 0);
 
+	attr.map_token_fd = OPTS_GET(opts, token_fd, 0);
+
 	fd = sys_bpf_fd(BPF_MAP_CREATE, &attr, attr_sz);
 	return libbpf_err_errno(fd);
 }
diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
index e49254c9f68f..ae2136f596b4 100644
--- a/tools/lib/bpf/bpf.h
+++ b/tools/lib/bpf/bpf.h
@@ -51,8 +51,11 @@ struct bpf_map_create_opts {
 
 	__u32 numa_node;
 	__u32 map_ifindex;
+
+	__u32 token_fd;
+	size_t :0;
 };
-#define bpf_map_create_opts__last_field map_ifindex
+#define bpf_map_create_opts__last_field token_fd
 
 LIBBPF_API int bpf_map_create(enum bpf_map_type map_type,
 			      const char *map_name,
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH bpf-next 14/29] libbpf: add BPF token support to bpf_btf_load() API
  2024-01-03 22:20 [PATCH bpf-next 00/29] BPF token Andrii Nakryiko
                   ` (12 preceding siblings ...)
  2024-01-03 22:20 ` [PATCH bpf-next 13/29] libbpf: add BPF token support to bpf_map_create() API Andrii Nakryiko
@ 2024-01-03 22:20 ` Andrii Nakryiko
  2024-01-03 22:20 ` [PATCH bpf-next 15/29] libbpf: add BPF token support to bpf_prog_load() API Andrii Nakryiko
                   ` (15 subsequent siblings)
  29 siblings, 0 replies; 59+ messages in thread
From: Andrii Nakryiko @ 2024-01-03 22:20 UTC (permalink / raw)
  To: bpf, netdev, paul, brauner, torvalds
  Cc: linux-fsdevel, linux-security-module, kernel-team

Allow the user to specify token_fd for the bpf_btf_load() API that wraps
the kernel's BPF_BTF_LOAD command. This allows loading BTF from an
unprivileged process, as long as it has a BPF token allowing the
BPF_BTF_LOAD command, which can be created and delegated by a privileged
process.

Wire through the new btf_flags field as well, so that the user can
provide the BPF_F_TOKEN_FD flag, if necessary.
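
A sketch of the intended usage (raw_btf_data/raw_btf_size as returned by
btf__raw_data(), token_fd from bpf_token_create()):

  LIBBPF_OPTS(bpf_btf_load_opts, opts,
  	.btf_flags = BPF_F_TOKEN_FD,
  	.token_fd = token_fd,
  );
  int btf_fd;

  /* loads BTF with token-granted privileges, if delegation allows it */
  btf_fd = bpf_btf_load(raw_btf_data, raw_btf_size, &opts);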

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 tools/lib/bpf/bpf.c | 6 +++++-
 tools/lib/bpf/bpf.h | 5 ++++-
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
index 1653b64b7015..cf250cb1d5ef 100644
--- a/tools/lib/bpf/bpf.c
+++ b/tools/lib/bpf/bpf.c
@@ -1184,7 +1184,7 @@ int bpf_raw_tracepoint_open(const char *name, int prog_fd)
 
 int bpf_btf_load(const void *btf_data, size_t btf_size, struct bpf_btf_load_opts *opts)
 {
-	const size_t attr_sz = offsetofend(union bpf_attr, btf_log_true_size);
+	const size_t attr_sz = offsetofend(union bpf_attr, btf_token_fd);
 	union bpf_attr attr;
 	char *log_buf;
 	size_t log_size;
@@ -1209,6 +1209,10 @@ int bpf_btf_load(const void *btf_data, size_t btf_size, struct bpf_btf_load_opts
 
 	attr.btf = ptr_to_u64(btf_data);
 	attr.btf_size = btf_size;
+
+	attr.btf_flags = OPTS_GET(opts, btf_flags, 0);
+	attr.btf_token_fd = OPTS_GET(opts, token_fd, 0);
+
 	/* log_level == 0 and log_buf != NULL means "try loading without
 	 * log_buf, but retry with log_buf and log_level=1 on error", which is
 	 * consistent across low-level and high-level BTF and program loading
diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
index ae2136f596b4..fde54ea08e6f 100644
--- a/tools/lib/bpf/bpf.h
+++ b/tools/lib/bpf/bpf.h
@@ -133,9 +133,12 @@ struct bpf_btf_load_opts {
 	 * If kernel doesn't support this feature, log_size is left unchanged.
 	 */
 	__u32 log_true_size;
+
+	__u32 btf_flags;
+	__u32 token_fd;
 	size_t :0;
 };
-#define bpf_btf_load_opts__last_field log_true_size
+#define bpf_btf_load_opts__last_field token_fd
 
 LIBBPF_API int bpf_btf_load(const void *btf_data, size_t btf_size,
 			    struct bpf_btf_load_opts *opts);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH bpf-next 15/29] libbpf: add BPF token support to bpf_prog_load() API
  2024-01-03 22:20 [PATCH bpf-next 00/29] BPF token Andrii Nakryiko
                   ` (13 preceding siblings ...)
  2024-01-03 22:20 ` [PATCH bpf-next 14/29] libbpf: add BPF token support to bpf_btf_load() API Andrii Nakryiko
@ 2024-01-03 22:20 ` Andrii Nakryiko
  2024-01-03 22:20 ` [PATCH bpf-next 16/29] selftests/bpf: add BPF token-enabled tests Andrii Nakryiko
                   ` (14 subsequent siblings)
  29 siblings, 0 replies; 59+ messages in thread
From: Andrii Nakryiko @ 2024-01-03 22:20 UTC (permalink / raw)
  To: bpf, netdev, paul, brauner, torvalds
  Cc: linux-fsdevel, linux-security-module, kernel-team

Wire token_fd through to bpf_prog_load().
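
A sketch of the intended usage (insn macros as in the selftests'
linux/filter.h, token_fd from bpf_token_create()):

  struct bpf_insn insns[] = {
  	BPF_MOV64_IMM(BPF_REG_0, 0), /* r0 = 0 */
  	BPF_EXIT_INSN(),
  };
  LIBBPF_OPTS(bpf_prog_load_opts, opts,
  	.token_fd = token_fd,
  );
  int prog_fd;

  prog_fd = bpf_prog_load(BPF_PROG_TYPE_XDP, "token_prog", "GPL",
  			  insns, sizeof(insns) / sizeof(insns[0]), &opts);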

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 tools/lib/bpf/bpf.c | 3 ++-
 tools/lib/bpf/bpf.h | 3 ++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
index cf250cb1d5ef..d69137459abf 100644
--- a/tools/lib/bpf/bpf.c
+++ b/tools/lib/bpf/bpf.c
@@ -234,7 +234,7 @@ int bpf_prog_load(enum bpf_prog_type prog_type,
 		  const struct bpf_insn *insns, size_t insn_cnt,
 		  struct bpf_prog_load_opts *opts)
 {
-	const size_t attr_sz = offsetofend(union bpf_attr, log_true_size);
+	const size_t attr_sz = offsetofend(union bpf_attr, prog_token_fd);
 	void *finfo = NULL, *linfo = NULL;
 	const char *func_info, *line_info;
 	__u32 log_size, log_level, attach_prog_fd, attach_btf_obj_fd;
@@ -263,6 +263,7 @@ int bpf_prog_load(enum bpf_prog_type prog_type,
 	attr.prog_flags = OPTS_GET(opts, prog_flags, 0);
 	attr.prog_ifindex = OPTS_GET(opts, prog_ifindex, 0);
 	attr.kern_version = OPTS_GET(opts, kern_version, 0);
+	attr.prog_token_fd = OPTS_GET(opts, token_fd, 0);
 
 	if (prog_name && kernel_supports(NULL, FEAT_PROG_NAME))
 		libbpf_strlcpy(attr.prog_name, prog_name, sizeof(attr.prog_name));
diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
index fde54ea08e6f..5c7439991f57 100644
--- a/tools/lib/bpf/bpf.h
+++ b/tools/lib/bpf/bpf.h
@@ -105,9 +105,10 @@ struct bpf_prog_load_opts {
 	 * If kernel doesn't support this feature, log_size is left unchanged.
 	 */
 	__u32 log_true_size;
+	__u32 token_fd;
 	size_t :0;
 };
-#define bpf_prog_load_opts__last_field log_true_size
+#define bpf_prog_load_opts__last_field token_fd
 
 LIBBPF_API int bpf_prog_load(enum bpf_prog_type prog_type,
 			     const char *prog_name, const char *license,
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH bpf-next 16/29] selftests/bpf: add BPF token-enabled tests
  2024-01-03 22:20 [PATCH bpf-next 00/29] BPF token Andrii Nakryiko
                   ` (14 preceding siblings ...)
  2024-01-03 22:20 ` [PATCH bpf-next 15/29] libbpf: add BPF token support to bpf_prog_load() API Andrii Nakryiko
@ 2024-01-03 22:20 ` Andrii Nakryiko
  2024-01-03 22:20 ` [PATCH bpf-next 17/29] bpf,selinux: allocate bpf_security_struct per BPF token Andrii Nakryiko
                   ` (13 subsequent siblings)
  29 siblings, 0 replies; 59+ messages in thread
From: Andrii Nakryiko @ 2024-01-03 22:20 UTC (permalink / raw)
  To: bpf, netdev, paul, brauner, torvalds
  Cc: linux-fsdevel, linux-security-module, kernel-team

Add a selftest that attempts to conceptually replicate intended BPF
token use cases inside user namespaced container.

A child process is forked and put into its own userns and mountns. The
child creates a BPF FS context object, which ensures the child's userns
is captured as the owning userns for this instance of BPF FS. Given that
setting delegation mount options is a privileged operation, we verify
that the child cannot set them.

This context is passed back to the privileged parent process through a
Unix socket, where the parent sets up delegation options, instantiates
the FS, and mounts it as a detached mount. The mount FD is passed back
to the child to be used for BPF token creation, which allows otherwise
privileged BPF operations to succeed inside the userns.

We validate that all token-enabled privileged commands (BPF_BTF_LOAD,
BPF_MAP_CREATE, and BPF_PROG_LOAD) work as intended. They should only
succeed inside the userns if a) a BPF token is provided with the proper
allowed sets of commands and types; and b) namespaced CAP_BPF and other
required capabilities are set. Lacking either a) or b) should lead to
-EPERM failures.

Based on a workflow suggested by Christian Brauner ([0]).

  [0] https://lore.kernel.org/bpf/20230704-hochverdient-lehne-eeb9eeef785e@brauner/
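
Condensed, the FD-passing workflow the test exercises looks like this
(sendfd()/recvfd() are the test's SCM_RIGHTS helpers; error handling
omitted):

  /* child, inside freshly created userns + mountns */
  fs_fd = sys_fsopen("bpf", 0);      /* owning userns = child's userns */
  sendfd(sock_fd, fs_fd);

  /* privileged parent */
  recvfd(sock_fd, &fs_fd);
  sys_fsconfig(fs_fd, FSCONFIG_SET_STRING, "delegate_cmds", "any", 0);
  sys_fsconfig(fs_fd, FSCONFIG_CMD_CREATE, NULL, NULL, 0);
  mnt_fd = sys_fsmount(fs_fd, 0, 0); /* detached mount */
  sendfd(sock_fd, mnt_fd);

  /* child again */
  recvfd(sock_fd, &mnt_fd);
  bpffs_fd = openat(mnt_fd, ".", O_RDWR);
  token_fd = bpf_token_create(bpffs_fd, NULL);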

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 .../testing/selftests/bpf/prog_tests/token.c  | 683 ++++++++++++++++++
 1 file changed, 683 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/token.c

diff --git a/tools/testing/selftests/bpf/prog_tests/token.c b/tools/testing/selftests/bpf/prog_tests/token.c
new file mode 100644
index 000000000000..5394a0c880a9
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/token.c
@@ -0,0 +1,683 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2023 Meta Platforms, Inc. and affiliates. */
+#define _GNU_SOURCE
+#include <test_progs.h>
+#include <bpf/btf.h>
+#include "cap_helpers.h"
+#include <fcntl.h>
+#include <sched.h>
+#include <signal.h>
+#include <unistd.h>
+#include <linux/filter.h>
+#include <linux/unistd.h>
+#include <linux/mount.h>
+#include <sys/socket.h>
+#include <sys/syscall.h>
+#include <sys/un.h>
+
+static inline int sys_mount(const char *dev_name, const char *dir_name,
+			    const char *type, unsigned long flags,
+			    const void *data)
+{
+	return syscall(__NR_mount, dev_name, dir_name, type, flags, data);
+}
+
+static inline int sys_fsopen(const char *fsname, unsigned flags)
+{
+	return syscall(__NR_fsopen, fsname, flags);
+}
+
+static inline int sys_fspick(int dfd, const char *path, unsigned flags)
+{
+	return syscall(__NR_fspick, dfd, path, flags);
+}
+
+static inline int sys_fsconfig(int fs_fd, unsigned cmd, const char *key, const void *val, int aux)
+{
+	return syscall(__NR_fsconfig, fs_fd, cmd, key, val, aux);
+}
+
+static inline int sys_fsmount(int fs_fd, unsigned flags, unsigned ms_flags)
+{
+	return syscall(__NR_fsmount, fs_fd, flags, ms_flags);
+}
+
+static int drop_priv_caps(__u64 *old_caps)
+{
+	return cap_disable_effective((1ULL << CAP_BPF) |
+				     (1ULL << CAP_PERFMON) |
+				     (1ULL << CAP_NET_ADMIN) |
+				     (1ULL << CAP_SYS_ADMIN), old_caps);
+}
+
+static int restore_priv_caps(__u64 old_caps)
+{
+	return cap_enable_effective(old_caps, NULL);
+}
+
+static int set_delegate_mask(int fs_fd, const char *key, __u64 mask)
+{
+	char buf[32];
+	int err;
+
+	snprintf(buf, sizeof(buf), "0x%llx", (unsigned long long)mask);
+	err = sys_fsconfig(fs_fd, FSCONFIG_SET_STRING, key,
+			   mask == ~0ULL ? "any" : buf, 0);
+	if (err < 0)
+		err = -errno;
+	return err;
+}
+
+#define zclose(fd) do { if (fd >= 0) close(fd); fd = -1; } while (0)
+
+struct bpffs_opts {
+	__u64 cmds;
+	__u64 maps;
+	__u64 progs;
+	__u64 attachs;
+};
+
+static int create_bpffs_fd(void)
+{
+	int fs_fd;
+
+	/* create VFS context */
+	fs_fd = sys_fsopen("bpf", 0);
+	ASSERT_GE(fs_fd, 0, "fs_fd");
+
+	return fs_fd;
+}
+
+static int materialize_bpffs_fd(int fs_fd, struct bpffs_opts *opts)
+{
+	int mnt_fd, err;
+
+	/* set up token delegation mount options */
+	err = set_delegate_mask(fs_fd, "delegate_cmds", opts->cmds);
+	if (!ASSERT_OK(err, "fs_cfg_cmds"))
+		return err;
+	err = set_delegate_mask(fs_fd, "delegate_maps", opts->maps);
+	if (!ASSERT_OK(err, "fs_cfg_maps"))
+		return err;
+	err = set_delegate_mask(fs_fd, "delegate_progs", opts->progs);
+	if (!ASSERT_OK(err, "fs_cfg_progs"))
+		return err;
+	err = set_delegate_mask(fs_fd, "delegate_attachs", opts->attachs);
+	if (!ASSERT_OK(err, "fs_cfg_attachs"))
+		return err;
+
+	/* instantiate FS object */
+	err = sys_fsconfig(fs_fd, FSCONFIG_CMD_CREATE, NULL, NULL, 0);
+	if (err < 0)
+		return -errno;
+
+	/* create O_PATH fd for detached mount */
+	mnt_fd = sys_fsmount(fs_fd, 0, 0);
+	if (mnt_fd < 0)
+		return -errno;
+
+	return mnt_fd;
+}
+
+/* send FD over Unix domain (AF_UNIX) socket */
+static int sendfd(int sockfd, int fd)
+{
+	struct msghdr msg = {};
+	struct cmsghdr *cmsg;
+	int fds[1] = { fd }, err;
+	char iobuf[1];
+	struct iovec io = {
+		.iov_base = iobuf,
+		.iov_len = sizeof(iobuf),
+	};
+	union {
+		char buf[CMSG_SPACE(sizeof(fds))];
+		struct cmsghdr align;
+	} u;
+
+	msg.msg_iov = &io;
+	msg.msg_iovlen = 1;
+	msg.msg_control = u.buf;
+	msg.msg_controllen = sizeof(u.buf);
+	cmsg = CMSG_FIRSTHDR(&msg);
+	cmsg->cmsg_level = SOL_SOCKET;
+	cmsg->cmsg_type = SCM_RIGHTS;
+	cmsg->cmsg_len = CMSG_LEN(sizeof(fds));
+	memcpy(CMSG_DATA(cmsg), fds, sizeof(fds));
+
+	err = sendmsg(sockfd, &msg, 0);
+	if (err < 0)
+		err = -errno;
+	if (!ASSERT_EQ(err, 1, "sendmsg"))
+		return -EINVAL;
+
+	return 0;
+}
+
+/* receive FD over Unix domain (AF_UNIX) socket */
+static int recvfd(int sockfd, int *fd)
+{
+	struct msghdr msg = {};
+	struct cmsghdr *cmsg;
+	int fds[1], err;
+	char iobuf[1];
+	struct iovec io = {
+		.iov_base = iobuf,
+		.iov_len = sizeof(iobuf),
+	};
+	union {
+		char buf[CMSG_SPACE(sizeof(fds))];
+		struct cmsghdr align;
+	} u;
+
+	msg.msg_iov = &io;
+	msg.msg_iovlen = 1;
+	msg.msg_control = u.buf;
+	msg.msg_controllen = sizeof(u.buf);
+
+	err = recvmsg(sockfd, &msg, 0);
+	if (err < 0)
+		err = -errno;
+	if (!ASSERT_EQ(err, 1, "recvmsg"))
+		return -EINVAL;
+
+	cmsg = CMSG_FIRSTHDR(&msg);
+	if (!ASSERT_OK_PTR(cmsg, "cmsg_null") ||
+	    !ASSERT_EQ(cmsg->cmsg_len, CMSG_LEN(sizeof(fds)), "cmsg_len") ||
+	    !ASSERT_EQ(cmsg->cmsg_level, SOL_SOCKET, "cmsg_level") ||
+	    !ASSERT_EQ(cmsg->cmsg_type, SCM_RIGHTS, "cmsg_type"))
+		return -EINVAL;
+
+	memcpy(fds, CMSG_DATA(cmsg), sizeof(fds));
+	*fd = fds[0];
+
+	return 0;
+}
+
+static ssize_t write_nointr(int fd, const void *buf, size_t count)
+{
+	ssize_t ret;
+
+	do {
+		ret = write(fd, buf, count);
+	} while (ret < 0 && errno == EINTR);
+
+	return ret;
+}
+
+static int write_file(const char *path, const void *buf, size_t count)
+{
+	int fd;
+	ssize_t ret;
+
+	fd = open(path, O_WRONLY | O_CLOEXEC | O_NOCTTY | O_NOFOLLOW);
+	if (fd < 0)
+		return -1;
+
+	ret = write_nointr(fd, buf, count);
+	close(fd);
+	if (ret < 0 || (size_t)ret != count)
+		return -1;
+
+	return 0;
+}
+
+static int create_and_enter_userns(void)
+{
+	uid_t uid;
+	gid_t gid;
+	char map[100];
+
+	uid = getuid();
+	gid = getgid();
+
+	if (unshare(CLONE_NEWUSER))
+		return -1;
+
+	if (write_file("/proc/self/setgroups", "deny", sizeof("deny") - 1) &&
+	    errno != ENOENT)
+		return -1;
+
+	snprintf(map, sizeof(map), "0 %d 1", uid);
+	if (write_file("/proc/self/uid_map", map, strlen(map)))
+		return -1;
+
+
+	snprintf(map, sizeof(map), "0 %d 1", gid);
+	if (write_file("/proc/self/gid_map", map, strlen(map)))
+		return -1;
+
+	if (setgid(0))
+		return -1;
+
+	if (setuid(0))
+		return -1;
+
+	return 0;
+}
+
+typedef int (*child_callback_fn)(int);
+
+static void child(int sock_fd, struct bpffs_opts *opts, child_callback_fn callback)
+{
+	LIBBPF_OPTS(bpf_map_create_opts, map_opts);
+	int mnt_fd = -1, fs_fd = -1, err = 0, bpffs_fd = -1;
+
+	/* setup userns with root mappings */
+	err = create_and_enter_userns();
+	if (!ASSERT_OK(err, "create_and_enter_userns"))
+		goto cleanup;
+
+	/* setup mountns to allow creating BPF FS (fsopen("bpf")) from unpriv process */
+	err = unshare(CLONE_NEWNS);
+	if (!ASSERT_OK(err, "create_mountns"))
+		goto cleanup;
+
+	err = sys_mount(NULL, "/", NULL, MS_REC | MS_PRIVATE, 0);
+	if (!ASSERT_OK(err, "remount_root"))
+		goto cleanup;
+
+	fs_fd = create_bpffs_fd();
+	if (!ASSERT_GE(fs_fd, 0, "create_bpffs_fd")) {
+		err = -EINVAL;
+		goto cleanup;
+	}
+
+	/* ensure unprivileged child cannot set delegation options */
+	err = set_delegate_mask(fs_fd, "delegate_cmds", 0x1);
+	ASSERT_EQ(err, -EPERM, "delegate_cmd_eperm");
+	err = set_delegate_mask(fs_fd, "delegate_maps", 0x1);
+	ASSERT_EQ(err, -EPERM, "delegate_maps_eperm");
+	err = set_delegate_mask(fs_fd, "delegate_progs", 0x1);
+	ASSERT_EQ(err, -EPERM, "delegate_progs_eperm");
+	err = set_delegate_mask(fs_fd, "delegate_attachs", 0x1);
+	ASSERT_EQ(err, -EPERM, "delegate_attachs_eperm");
+
+	/* pass BPF FS context object to parent */
+	err = sendfd(sock_fd, fs_fd);
+	if (!ASSERT_OK(err, "send_fs_fd"))
+		goto cleanup;
+	zclose(fs_fd);
+
+	/* avoid mucking around with mount namespaces and mounting at
+	 * well-known path, just get detach-mounted BPF FS fd back from parent
+	 */
+	err = recvfd(sock_fd, &mnt_fd);
+	if (!ASSERT_OK(err, "recv_mnt_fd"))
+		goto cleanup;
+
+	/* try to fspick() BPF FS and try to add some delegation options */
+	fs_fd = sys_fspick(mnt_fd, "", FSPICK_EMPTY_PATH);
+	if (!ASSERT_GE(fs_fd, 0, "bpffs_fspick")) {
+		err = -EINVAL;
+		goto cleanup;
+	}
+
+	/* ensure unprivileged child cannot reconfigure to set delegation options */
+	err = set_delegate_mask(fs_fd, "delegate_cmds", ~0ULL);
+	if (!ASSERT_EQ(err, -EPERM, "delegate_cmd_eperm_reconfig")) {
+		err = -EINVAL;
+		goto cleanup;
+	}
+	err = set_delegate_mask(fs_fd, "delegate_maps", ~0ULL);
+	if (!ASSERT_EQ(err, -EPERM, "delegate_maps_eperm_reconfig")) {
+		err = -EINVAL;
+		goto cleanup;
+	}
+	err = set_delegate_mask(fs_fd, "delegate_progs", ~0ULL);
+	if (!ASSERT_EQ(err, -EPERM, "delegate_progs_eperm_reconfig")) {
+		err = -EINVAL;
+		goto cleanup;
+	}
+	err = set_delegate_mask(fs_fd, "delegate_attachs", ~0ULL);
+	if (!ASSERT_EQ(err, -EPERM, "delegate_attachs_eperm_reconfig")) {
+		err = -EINVAL;
+		goto cleanup;
+	}
+	zclose(fs_fd);
+
+	bpffs_fd = openat(mnt_fd, ".", O_RDWR);
+	if (!ASSERT_GE(bpffs_fd, 0, "bpffs_open")) {
+		err = -EINVAL;
+		goto cleanup;
+	}
+
+	/* do custom test logic with customly set up BPF FS instance */
+	err = callback(bpffs_fd);
+	if (!ASSERT_OK(err, "test_callback"))
+		goto cleanup;
+
+	err = 0;
+cleanup:
+	zclose(sock_fd);
+	zclose(mnt_fd);
+	zclose(fs_fd);
+	zclose(bpffs_fd);
+
+	exit(-err);
+}
+
+static int wait_for_pid(pid_t pid)
+{
+	int status, ret;
+
+again:
+	ret = waitpid(pid, &status, 0);
+	if (ret == -1) {
+		if (errno == EINTR)
+			goto again;
+
+		return -1;
+	}
+
+	if (!WIFEXITED(status))
+		return -1;
+
+	return WEXITSTATUS(status);
+}
+
+static void parent(int child_pid, struct bpffs_opts *bpffs_opts, int sock_fd)
+{
+	int fs_fd = -1, mnt_fd = -1, err;
+
+	err = recvfd(sock_fd, &fs_fd);
+	if (!ASSERT_OK(err, "recv_bpffs_fd"))
+		goto cleanup;
+
+	mnt_fd = materialize_bpffs_fd(fs_fd, bpffs_opts);
+	if (!ASSERT_GE(mnt_fd, 0, "materialize_bpffs_fd")) {
+		err = -EINVAL;
+		goto cleanup;
+	}
+	zclose(fs_fd);
+
+	/* pass BPF FS context object to parent */
+	err = sendfd(sock_fd, mnt_fd);
+	if (!ASSERT_OK(err, "send_mnt_fd"))
+		goto cleanup;
+	zclose(mnt_fd);
+
+	err = wait_for_pid(child_pid);
+	ASSERT_OK(err, "waitpid_child");
+
+cleanup:
+	zclose(sock_fd);
+	zclose(fs_fd);
+	zclose(mnt_fd);
+
+	if (child_pid > 0)
+		(void)kill(child_pid, SIGKILL);
+}
+
+static void subtest_userns(struct bpffs_opts *bpffs_opts, child_callback_fn cb)
+{
+	int sock_fds[2] = { -1, -1 };
+	int child_pid = 0, err;
+
+	err = socketpair(AF_UNIX, SOCK_STREAM, 0, sock_fds);
+	if (!ASSERT_OK(err, "socketpair"))
+		goto cleanup;
+
+	child_pid = fork();
+	if (!ASSERT_GE(child_pid, 0, "fork"))
+		goto cleanup;
+
+	if (child_pid == 0) {
+		zclose(sock_fds[0]);
+		return child(sock_fds[1], bpffs_opts, cb);
+
+	} else {
+		zclose(sock_fds[1]);
+		return parent(child_pid, bpffs_opts, sock_fds[0]);
+	}
+
+cleanup:
+	zclose(sock_fds[0]);
+	zclose(sock_fds[1]);
+	if (child_pid > 0)
+		(void)kill(child_pid, SIGKILL);
+}
+
+static int userns_map_create(int mnt_fd)
+{
+	LIBBPF_OPTS(bpf_map_create_opts, map_opts);
+	int err, token_fd = -1, map_fd = -1;
+	__u64 old_caps = 0;
+
+	/* create BPF token from BPF FS mount */
+	token_fd = bpf_token_create(mnt_fd, NULL);
+	if (!ASSERT_GT(token_fd, 0, "token_create")) {
+		err = -EINVAL;
+		goto cleanup;
+	}
+
+	/* while inside non-init userns, we need both a BPF token *and*
+	 * CAP_BPF inside current userns to create privileged map; let's test
+	 * that neither BPF token alone nor namespaced CAP_BPF is sufficient
+	 */
+	err = drop_priv_caps(&old_caps);
+	if (!ASSERT_OK(err, "drop_caps"))
+		goto cleanup;
+
+	/* no token, no CAP_BPF -> fail */
+	map_opts.map_flags = 0;
+	map_opts.token_fd = 0;
+	map_fd = bpf_map_create(BPF_MAP_TYPE_STACK, "wo_token_wo_bpf", 0, 8, 1, &map_opts);
+	if (!ASSERT_LT(map_fd, 0, "stack_map_wo_token_wo_cap_bpf_should_fail")) {
+		err = -EINVAL;
+		goto cleanup;
+	}
+
+	/* token without CAP_BPF -> fail */
+	map_opts.map_flags = BPF_F_TOKEN_FD;
+	map_opts.token_fd = token_fd;
+	map_fd = bpf_map_create(BPF_MAP_TYPE_STACK, "w_token_wo_bpf", 0, 8, 1, &map_opts);
+	if (!ASSERT_LT(map_fd, 0, "stack_map_w_token_wo_cap_bpf_should_fail")) {
+		err = -EINVAL;
+		goto cleanup;
+	}
+
+	/* get back effective local CAP_BPF (and CAP_SYS_ADMIN) */
+	err = restore_priv_caps(old_caps);
+	if (!ASSERT_OK(err, "restore_caps"))
+		goto cleanup;
+
+	/* CAP_BPF without token -> fail */
+	map_opts.map_flags = 0;
+	map_opts.token_fd = 0;
+	map_fd = bpf_map_create(BPF_MAP_TYPE_STACK, "wo_token_w_bpf", 0, 8, 1, &map_opts);
+	if (!ASSERT_LT(map_fd, 0, "stack_map_wo_token_w_cap_bpf_should_fail")) {
+		err = -EINVAL;
+		goto cleanup;
+	}
+
+	/* finally, namespaced CAP_BPF + token -> success */
+	map_opts.map_flags = BPF_F_TOKEN_FD;
+	map_opts.token_fd = token_fd;
+	map_fd = bpf_map_create(BPF_MAP_TYPE_STACK, "w_token_w_bpf", 0, 8, 1, &map_opts);
+	if (!ASSERT_GT(map_fd, 0, "stack_map_w_token_w_cap_bpf")) {
+		err = -EINVAL;
+		goto cleanup;
+	}
+
+cleanup:
+	zclose(token_fd);
+	zclose(map_fd);
+	return err;
+}
+
+static int userns_btf_load(int mnt_fd)
+{
+	LIBBPF_OPTS(bpf_btf_load_opts, btf_opts);
+	int err, token_fd = -1, btf_fd = -1;
+	const void *raw_btf_data;
+	struct btf *btf = NULL;
+	__u32 raw_btf_size;
+	__u64 old_caps = 0;
+
+	/* create BPF token from BPF FS mount */
+	token_fd = bpf_token_create(mnt_fd, NULL);
+	if (!ASSERT_GT(token_fd, 0, "token_create")) {
+		err = -EINVAL;
+		goto cleanup;
+	}
+
+	/* while inside non-init userns, we need both a BPF token *and*
+	 * CAP_BPF inside current userns to create privileged map; let's test
+	 * that neither BPF token alone nor namespaced CAP_BPF is sufficient
+	 */
+	err = drop_priv_caps(&old_caps);
+	if (!ASSERT_OK(err, "drop_caps"))
+		goto cleanup;
+
+	/* setup a trivial BTF data to load to the kernel */
+	btf = btf__new_empty();
+	if (!ASSERT_OK_PTR(btf, "empty_btf"))
+		goto cleanup;
+
+	ASSERT_GT(btf__add_int(btf, "int", 4, 0), 0, "int_type");
+
+	raw_btf_data = btf__raw_data(btf, &raw_btf_size);
+	if (!ASSERT_OK_PTR(raw_btf_data, "raw_btf_data"))
+		goto cleanup;
+
+	/* no token + no CAP_BPF -> failure */
+	btf_opts.btf_flags = 0;
+	btf_opts.token_fd = 0;
+	btf_fd = bpf_btf_load(raw_btf_data, raw_btf_size, &btf_opts);
+	if (!ASSERT_LT(btf_fd, 0, "no_token_no_cap_should_fail"))
+		goto cleanup;
+
+	/* token + no CAP_BPF -> failure */
+	btf_opts.btf_flags = BPF_F_TOKEN_FD;
+	btf_opts.token_fd = token_fd;
+	btf_fd = bpf_btf_load(raw_btf_data, raw_btf_size, &btf_opts);
+	if (!ASSERT_LT(btf_fd, 0, "token_no_cap_should_fail"))
+		goto cleanup;
+
+	/* get back effective local CAP_BPF (and CAP_SYS_ADMIN) */
+	err = restore_priv_caps(old_caps);
+	if (!ASSERT_OK(err, "restore_caps"))
+		goto cleanup;
+
+	/* token + CAP_BPF -> success */
+	btf_opts.btf_flags = BPF_F_TOKEN_FD;
+	btf_opts.token_fd = token_fd;
+	btf_fd = bpf_btf_load(raw_btf_data, raw_btf_size, &btf_opts);
+	if (!ASSERT_GT(btf_fd, 0, "token_and_cap_success"))
+		goto cleanup;
+
+	err = 0;
+cleanup:
+	btf__free(btf);
+	zclose(btf_fd);
+	zclose(token_fd);
+	return err;
+}
+
+static int userns_prog_load(int mnt_fd)
+{
+	LIBBPF_OPTS(bpf_prog_load_opts, prog_opts);
+	int err, token_fd = -1, prog_fd = -1;
+	struct bpf_insn insns[] = {
+		/* bpf_jiffies64() requires CAP_BPF */
+		BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_jiffies64),
+		/* bpf_get_current_task() requires CAP_PERFMON */
+		BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_get_current_task),
+		/* r0 = 0; exit; */
+		BPF_MOV64_IMM(BPF_REG_0, 0),
+		BPF_EXIT_INSN(),
+	};
+	size_t insn_cnt = ARRAY_SIZE(insns);
+	__u64 old_caps = 0;
+
+	/* create BPF token from BPF FS mount */
+	token_fd = bpf_token_create(mnt_fd, NULL);
+	if (!ASSERT_GT(token_fd, 0, "token_create")) {
+		err = -EINVAL;
+		goto cleanup;
+	}
+
+	/* validate we can successfully load BPF program with token; this
+	 * being XDP program (CAP_NET_ADMIN) using bpf_jiffies64() (CAP_BPF)
+	 * and bpf_get_current_task() (CAP_PERFMON) helpers validates we have
+	 * BPF token wired properly in a bunch of places in the kernel
+	 */
+	prog_opts.prog_flags = BPF_F_TOKEN_FD;
+	prog_opts.token_fd = token_fd;
+	prog_opts.expected_attach_type = BPF_XDP;
+	prog_fd = bpf_prog_load(BPF_PROG_TYPE_XDP, "token_prog", "GPL",
+				insns, insn_cnt, &prog_opts);
+	if (!ASSERT_GT(prog_fd, 0, "prog_fd")) {
+		err = -EPERM;
+		goto cleanup;
+	}
+
+	/* no token + caps -> failure */
+	prog_opts.prog_flags = 0;
+	prog_opts.token_fd = 0;
+	prog_fd = bpf_prog_load(BPF_PROG_TYPE_XDP, "token_prog", "GPL",
+				insns, insn_cnt, &prog_opts);
+	if (!ASSERT_EQ(prog_fd, -EPERM, "prog_fd_eperm")) {
+		err = -EPERM;
+		goto cleanup;
+	}
+
+	err = drop_priv_caps(&old_caps);
+	if (!ASSERT_OK(err, "drop_caps"))
+		goto cleanup;
+
+	/* no caps + token -> failure */
+	prog_opts.prog_flags = BPF_F_TOKEN_FD;
+	prog_opts.token_fd = token_fd;
+	prog_fd = bpf_prog_load(BPF_PROG_TYPE_XDP, "token_prog", "GPL",
+				insns, insn_cnt, &prog_opts);
+	if (!ASSERT_EQ(prog_fd, -EPERM, "prog_fd_eperm")) {
+		err = -EPERM;
+		goto cleanup;
+	}
+
+	/* no caps + no token -> definitely a failure */
+	prog_opts.prog_flags = 0;
+	prog_opts.token_fd = 0;
+	prog_fd = bpf_prog_load(BPF_PROG_TYPE_XDP, "token_prog", "GPL",
+				insns, insn_cnt, &prog_opts);
+	if (!ASSERT_EQ(prog_fd, -EPERM, "prog_fd_eperm")) {
+		err = -EPERM;
+		goto cleanup;
+	}
+
+	err = 0;
+cleanup:
+	zclose(prog_fd);
+	zclose(token_fd);
+	return err;
+}
+
+void test_token(void)
+{
+	if (test__start_subtest("map_token")) {
+		struct bpffs_opts opts = {
+			.cmds = 1ULL << BPF_MAP_CREATE,
+			.maps = 1ULL << BPF_MAP_TYPE_STACK,
+		};
+
+		subtest_userns(&opts, userns_map_create);
+	}
+	if (test__start_subtest("btf_token")) {
+		struct bpffs_opts opts = {
+			.cmds = 1ULL << BPF_BTF_LOAD,
+		};
+
+		subtest_userns(&opts, userns_btf_load);
+	}
+	if (test__start_subtest("prog_token")) {
+		struct bpffs_opts opts = {
+			.cmds = 1ULL << BPF_PROG_LOAD,
+			.progs = 1ULL << BPF_PROG_TYPE_XDP,
+			.attachs = 1ULL << BPF_XDP,
+		};
+
+		subtest_userns(&opts, userns_prog_load);
+	}
+}
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH bpf-next 17/29] bpf,selinux: allocate bpf_security_struct per BPF token
  2024-01-03 22:20 [PATCH bpf-next 00/29] BPF token Andrii Nakryiko
                   ` (15 preceding siblings ...)
  2024-01-03 22:20 ` [PATCH bpf-next 16/29] selftests/bpf: add BPF token-enabled tests Andrii Nakryiko
@ 2024-01-03 22:20 ` Andrii Nakryiko
  2024-01-03 22:20 ` [PATCH bpf-next 18/29] bpf: fail BPF_TOKEN_CREATE if no delegation option was set on BPF FS Andrii Nakryiko
                   ` (12 subsequent siblings)
  29 siblings, 0 replies; 59+ messages in thread
From: Andrii Nakryiko @ 2024-01-03 22:20 UTC (permalink / raw)
  To: bpf, netdev, paul, brauner, torvalds
  Cc: linux-fsdevel, linux-security-module, kernel-team

Utilize the newly added bpf_token_create/bpf_token_free LSM hooks to
allocate a struct bpf_security_struct for each BPF token object in
SELinux. This just follows the pattern already used for BPF prog and
map.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 security/selinux/hooks.c | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index c4ba3f0fcb97..30d8078b1ca1 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -6822,6 +6822,29 @@ static void selinux_bpf_prog_free(struct bpf_prog *prog)
 	prog->aux->security = NULL;
 	kfree(bpfsec);
 }
+
+static int selinux_bpf_token_create(struct bpf_token *token, union bpf_attr *attr,
+				    struct path *path)
+{
+	struct bpf_security_struct *bpfsec;
+
+	bpfsec = kzalloc(sizeof(*bpfsec), GFP_KERNEL);
+	if (!bpfsec)
+		return -ENOMEM;
+
+	bpfsec->sid = current_sid();
+	token->security = bpfsec;
+
+	return 0;
+}
+
+static void selinux_bpf_token_free(struct bpf_token *token)
+{
+	struct bpf_security_struct *bpfsec = token->security;
+
+	token->security = NULL;
+	kfree(bpfsec);
+}
 #endif
 
 struct lsm_blob_sizes selinux_blob_sizes __ro_after_init = {
@@ -7177,6 +7200,7 @@ static struct security_hook_list selinux_hooks[] __ro_after_init = {
 	LSM_HOOK_INIT(bpf_prog, selinux_bpf_prog),
 	LSM_HOOK_INIT(bpf_map_free, selinux_bpf_map_free),
 	LSM_HOOK_INIT(bpf_prog_free, selinux_bpf_prog_free),
+	LSM_HOOK_INIT(bpf_token_free, selinux_bpf_token_free),
 #endif
 
 #ifdef CONFIG_PERF_EVENTS
@@ -7235,6 +7259,7 @@ static struct security_hook_list selinux_hooks[] __ro_after_init = {
 #ifdef CONFIG_BPF_SYSCALL
 	LSM_HOOK_INIT(bpf_map_create, selinux_bpf_map_create),
 	LSM_HOOK_INIT(bpf_prog_load, selinux_bpf_prog_load),
+	LSM_HOOK_INIT(bpf_token_create, selinux_bpf_token_create),
 #endif
 #ifdef CONFIG_PERF_EVENTS
 	LSM_HOOK_INIT(perf_event_alloc, selinux_perf_event_alloc),
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH bpf-next 18/29] bpf: fail BPF_TOKEN_CREATE if no delegation option was set on BPF FS
  2024-01-03 22:20 [PATCH bpf-next 00/29] BPF token Andrii Nakryiko
                   ` (16 preceding siblings ...)
  2024-01-03 22:20 ` [PATCH bpf-next 17/29] bpf,selinux: allocate bpf_security_struct per BPF token Andrii Nakryiko
@ 2024-01-03 22:20 ` Andrii Nakryiko
  2024-01-03 22:20 ` [PATCH bpf-next 19/29] bpf: support symbolic BPF FS delegation mount options Andrii Nakryiko
                   ` (11 subsequent siblings)
  29 siblings, 0 replies; 59+ messages in thread
From: Andrii Nakryiko @ 2024-01-03 22:20 UTC (permalink / raw)
  To: bpf, netdev, paul, brauner, torvalds
  Cc: linux-fsdevel, linux-security-module, kernel-team

It's quite confusing in practice that it's possible to successfully
create a BPF token from a BPF FS instance that didn't have any of the
delegate_xxx mount options set up. While not wrong, it's more meaningful
to reject BPF_TOKEN_CREATE with a specific error code (-ENOENT) to let
user-space know that no token delegation is set up.

So, instead of creating an empty BPF token that will always be ignored
because it doesn't have any of the allow_xxx bits set, reject it with
-ENOENT. If we ever need empty BPF tokens to be possible, we can support
that with an extra flag passed into BPF_TOKEN_CREATE.
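
From user-space's perspective this also makes the "no delegation" case
easy to detect; a sketch:

  int token_fd = bpf_token_create(bpffs_fd, NULL);

  if (token_fd == -ENOENT) {
  	/* BPF FS has no delegate_xxx options set, carry on without token */
  	token_fd = 0;
  }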

Acked-by: Christian Brauner <brauner@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 kernel/bpf/token.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/kernel/bpf/token.c b/kernel/bpf/token.c
index 17212efcde60..a86fccd57e2d 100644
--- a/kernel/bpf/token.c
+++ b/kernel/bpf/token.c
@@ -152,6 +152,15 @@ int bpf_token_create(union bpf_attr *attr)
 		goto out_path;
 	}
 
+	mnt_opts = path.dentry->d_sb->s_fs_info;
+	if (mnt_opts->delegate_cmds == 0 &&
+	    mnt_opts->delegate_maps == 0 &&
+	    mnt_opts->delegate_progs == 0 &&
+	    mnt_opts->delegate_attachs == 0) {
+		err = -ENOENT; /* no BPF token delegation is set up */
+		goto out_path;
+	}
+
 	mode = S_IFREG | ((S_IRUSR | S_IWUSR) & ~current_umask());
 	inode = bpf_get_inode(path.mnt->mnt_sb, NULL, mode);
 	if (IS_ERR(inode)) {
@@ -181,7 +190,6 @@ int bpf_token_create(union bpf_attr *attr)
 	/* remember bpffs owning userns for future ns_capable() checks */
 	token->userns = get_user_ns(userns);
 
-	mnt_opts = path.dentry->d_sb->s_fs_info;
 	token->allowed_cmds = mnt_opts->delegate_cmds;
 	token->allowed_maps = mnt_opts->delegate_maps;
 	token->allowed_progs = mnt_opts->delegate_progs;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH bpf-next 19/29] bpf: support symbolic BPF FS delegation mount options
  2024-01-03 22:20 [PATCH bpf-next 00/29] BPF token Andrii Nakryiko
                   ` (17 preceding siblings ...)
  2024-01-03 22:20 ` [PATCH bpf-next 18/29] bpf: fail BPF_TOKEN_CREATE if no delegation option was set on BPF FS Andrii Nakryiko
@ 2024-01-03 22:20 ` Andrii Nakryiko
  2024-01-03 22:20 ` [PATCH bpf-next 20/29] selftests/bpf: utilize string values for delegate_xxx " Andrii Nakryiko
                   ` (10 subsequent siblings)
  29 siblings, 0 replies; 59+ messages in thread
From: Andrii Nakryiko @ 2024-01-03 22:20 UTC (permalink / raw)
  To: bpf, netdev, paul, brauner, torvalds
  Cc: linux-fsdevel, linux-security-module, kernel-team

Besides the already supported special "any" value and hex bit mask,
support string-based parsing of delegation masks based on exact
enumerator names. Utilize BTF information for the `enum bpf_cmd`,
`enum bpf_map_type`, `enum bpf_prog_type`, and `enum bpf_attach_type`
types to find supported symbolic names (ignoring __MAX_xxx guard values
and stripping repetitive prefixes like BPF_ for cmd and attach types,
BPF_MAP_TYPE_ for maps, and BPF_PROG_TYPE_ for prog types). Case doesn't
matter, but names are normalized to lower case in mount option output.
So "PROG_LOAD", "prog_load", and "MAP_create" are all valid values for
the delegate_cmds option, "array" is among the supported values for map
types, etc.

Besides supporting string values, we also support specifying multiple
values at once, using a colon (':') as separator.

There are corresponding changes on the bpf_show_options side to use the
known values when printing them in human-readable format, falling back
to hex mask printing if there are any unrecognized bits. This shouldn't
be necessary when enum BTF information is present, but in general we
should always be able to fall back to this even if the kernel was built
without BTF. As mentioned, emitted symbolic names are normalized to all
lower case.

The example below shows various ways to specify delegate_cmds options
through the mount command and how the mount options are printed back:

  $ sudo mkdir -p /sys/fs/bpf/token
  $ sudo mount -t bpf bpffs /sys/fs/bpf/token \
               -o delegate_cmds=prog_load:MAP_CREATE \
               -o delegate_progs=kprobe \
               -o delegate_attachs=xdp
  $ mount | grep token
  bpffs on /sys/fs/bpf/token type bpf (rw,relatime,delegate_cmds=map_create:prog_load,delegate_progs=kprobe,delegate_attachs=xdp)
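
The same setup can also be done programmatically through the new mount
API; a sketch using raw syscall wrappers (sys_fsopen()/sys_fsconfig()/
sys_fsmount() as in the selftests, sys_move_mount() being an analogous
hypothetical syscall(__NR_move_mount, ...) wrapper; error handling
omitted):

  fs_fd = sys_fsopen("bpf", 0);
  sys_fsconfig(fs_fd, FSCONFIG_SET_STRING, "delegate_cmds",
  	     "prog_load:map_create", 0);
  sys_fsconfig(fs_fd, FSCONFIG_SET_STRING, "delegate_progs", "kprobe", 0);
  sys_fsconfig(fs_fd, FSCONFIG_SET_STRING, "delegate_attachs", "xdp", 0);
  sys_fsconfig(fs_fd, FSCONFIG_CMD_CREATE, NULL, NULL, 0);
  mnt_fd = sys_fsmount(fs_fd, 0, 0);
  sys_move_mount(mnt_fd, "", AT_FDCWD, "/sys/fs/bpf/token",
  	       MOVE_MOUNT_F_EMPTY_PATH);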

Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 kernel/bpf/inode.c | 249 ++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 211 insertions(+), 38 deletions(-)

diff --git a/kernel/bpf/inode.c b/kernel/bpf/inode.c
index 5fb10da5717f..af5d2ffadd70 100644
--- a/kernel/bpf/inode.c
+++ b/kernel/bpf/inode.c
@@ -595,6 +595,136 @@ struct bpf_prog *bpf_prog_get_type_path(const char *name, enum bpf_prog_type typ
 }
 EXPORT_SYMBOL(bpf_prog_get_type_path);
 
+struct bpffs_btf_enums {
+	const struct btf *btf;
+	const struct btf_type *cmd_t;
+	const struct btf_type *map_t;
+	const struct btf_type *prog_t;
+	const struct btf_type *attach_t;
+};
+
+static int find_bpffs_btf_enums(struct bpffs_btf_enums *info)
+{
+	const struct btf *btf;
+	const struct btf_type *t;
+	const char *name;
+	int i, n;
+
+	memset(info, 0, sizeof(*info));
+
+	btf = bpf_get_btf_vmlinux();
+	if (IS_ERR(btf))
+		return PTR_ERR(btf);
+	if (!btf)
+		return -ENOENT;
+
+	info->btf = btf;
+
+	for (i = 1, n = btf_nr_types(btf); i < n; i++) {
+		t = btf_type_by_id(btf, i);
+		if (!btf_type_is_enum(t))
+			continue;
+
+		name = btf_name_by_offset(btf, t->name_off);
+		if (!name)
+			continue;
+
+		if (strcmp(name, "bpf_cmd") == 0)
+			info->cmd_t = t;
+		else if (strcmp(name, "bpf_map_type") == 0)
+			info->map_t = t;
+		else if (strcmp(name, "bpf_prog_type") == 0)
+			info->prog_t = t;
+		else if (strcmp(name, "bpf_attach_type") == 0)
+			info->attach_t = t;
+		else
+			continue;
+
+		if (info->cmd_t && info->map_t && info->prog_t && info->attach_t)
+			return 0;
+	}
+
+	return -ESRCH;
+}
+
+static bool find_btf_enum_const(const struct btf *btf, const struct btf_type *enum_t,
+				const char *prefix, const char *str, int *value)
+{
+	const struct btf_enum *e;
+	const char *name;
+	int i, n, pfx_len = strlen(prefix);
+
+	*value = 0;
+
+	if (!btf || !enum_t)
+		return false;
+
+	for (i = 0, n = btf_vlen(enum_t); i < n; i++) {
+		e = &btf_enum(enum_t)[i];
+
+		name = btf_name_by_offset(btf, e->name_off);
+		if (!name || strncasecmp(name, prefix, pfx_len) != 0)
+			continue;
+
+		/* match symbolic name case insensitive and ignoring prefix */
+		if (strcasecmp(name + pfx_len, str) == 0) {
+			*value = e->val;
+			return true;
+		}
+	}
+
+	return false;
+}
+
+static void seq_print_delegate_opts(struct seq_file *m,
+				    const char *opt_name,
+				    const struct btf *btf,
+				    const struct btf_type *enum_t,
+				    const char *prefix,
+				    u64 delegate_msk, u64 any_msk)
+{
+	const struct btf_enum *e;
+	bool first = true;
+	const char *name;
+	u64 msk;
+	int i, n, pfx_len = strlen(prefix);
+
+	delegate_msk &= any_msk; /* clear unknown bits */
+
+	if (delegate_msk == 0)
+		return;
+
+	seq_printf(m, ",%s", opt_name);
+	if (delegate_msk == any_msk) {
+		seq_printf(m, "=any");
+		return;
+	}
+
+	if (btf && enum_t) {
+		for (i = 0, n = btf_vlen(enum_t); i < n; i++) {
+			e = &btf_enum(enum_t)[i];
+			name = btf_name_by_offset(btf, e->name_off);
+			if (!name || strncasecmp(name, prefix, pfx_len) != 0)
+				continue;
+			msk = 1ULL << e->val;
+			if (delegate_msk & msk) {
+				/* emit lower-case name without prefix */
+				seq_printf(m, "%c", first ? '=' : ':');
+				name += pfx_len;
+				while (*name) {
+					seq_printf(m, "%c", tolower(*name));
+					name++;
+				}
+
+				delegate_msk &= ~msk;
+				first = false;
+			}
+		}
+	}
+	if (delegate_msk)
+		seq_printf(m, "%c0x%llx", first ? '=' : ':', delegate_msk);
+}
+
 /*
  * Display the mount options in /proc/mounts.
  */
@@ -614,29 +744,34 @@ static int bpf_show_options(struct seq_file *m, struct dentry *root)
 	if (mode != S_IRWXUGO)
 		seq_printf(m, ",mode=%o", mode);
 
-	mask = (1ULL << __MAX_BPF_CMD) - 1;
-	if ((opts->delegate_cmds & mask) == mask)
-		seq_printf(m, ",delegate_cmds=any");
-	else if (opts->delegate_cmds)
-		seq_printf(m, ",delegate_cmds=0x%llx", opts->delegate_cmds);
-
-	mask = (1ULL << __MAX_BPF_MAP_TYPE) - 1;
-	if ((opts->delegate_maps & mask) == mask)
-		seq_printf(m, ",delegate_maps=any");
-	else if (opts->delegate_maps)
-		seq_printf(m, ",delegate_maps=0x%llx", opts->delegate_maps);
-
-	mask = (1ULL << __MAX_BPF_PROG_TYPE) - 1;
-	if ((opts->delegate_progs & mask) == mask)
-		seq_printf(m, ",delegate_progs=any");
-	else if (opts->delegate_progs)
-		seq_printf(m, ",delegate_progs=0x%llx", opts->delegate_progs);
-
-	mask = (1ULL << __MAX_BPF_ATTACH_TYPE) - 1;
-	if ((opts->delegate_attachs & mask) == mask)
-		seq_printf(m, ",delegate_attachs=any");
-	else if (opts->delegate_attachs)
-		seq_printf(m, ",delegate_attachs=0x%llx", opts->delegate_attachs);
+	if (opts->delegate_cmds || opts->delegate_maps ||
+	    opts->delegate_progs || opts->delegate_attachs) {
+		struct bpffs_btf_enums info;
+
+		/* ignore errors, fallback to hex */
+		(void)find_bpffs_btf_enums(&info);
+
+		mask = (1ULL << __MAX_BPF_CMD) - 1;
+		seq_print_delegate_opts(m, "delegate_cmds",
+					info.btf, info.cmd_t, "BPF_",
+					opts->delegate_cmds, mask);
+
+		mask = (1ULL << __MAX_BPF_MAP_TYPE) - 1;
+		seq_print_delegate_opts(m, "delegate_maps",
+					info.btf, info.map_t, "BPF_MAP_TYPE_",
+					opts->delegate_maps, mask);
+
+		mask = (1ULL << __MAX_BPF_PROG_TYPE) - 1;
+		seq_print_delegate_opts(m, "delegate_progs",
+					info.btf, info.prog_t, "BPF_PROG_TYPE_",
+					opts->delegate_progs, mask);
+
+		mask = (1ULL << __MAX_BPF_ATTACH_TYPE) - 1;
+		seq_print_delegate_opts(m, "delegate_attachs",
+					info.btf, info.attach_t, "BPF_",
+					opts->delegate_attachs, mask);
+	}
+
 	return 0;
 }
 
@@ -686,7 +821,6 @@ static int bpf_parse_param(struct fs_context *fc, struct fs_parameter *param)
 	kuid_t uid;
 	kgid_t gid;
 	int opt, err;
-	u64 msk;
 
 	opt = fs_parse(fc, bpf_fs_parameters, param, &result);
 	if (opt < 0) {
@@ -741,24 +875,63 @@ static int bpf_parse_param(struct fs_context *fc, struct fs_parameter *param)
 	case OPT_DELEGATE_CMDS:
 	case OPT_DELEGATE_MAPS:
 	case OPT_DELEGATE_PROGS:
-	case OPT_DELEGATE_ATTACHS:
-		if (strcmp(param->string, "any") == 0) {
-			msk = ~0ULL;
-		} else {
-			err = kstrtou64(param->string, 0, &msk);
-			if (err)
-				return err;
+	case OPT_DELEGATE_ATTACHS: {
+		struct bpffs_btf_enums info;
+		const struct btf_type *enum_t;
+		const char *enum_pfx;
+		u64 *delegate_msk, msk = 0;
+		char *p;
+		int val;
+
+		/* ignore errors, fallback to hex */
+		(void)find_bpffs_btf_enums(&info);
+
+		switch (opt) {
+		case OPT_DELEGATE_CMDS:
+			delegate_msk = &opts->delegate_cmds;
+			enum_t = info.cmd_t;
+			enum_pfx = "BPF_";
+			break;
+		case OPT_DELEGATE_MAPS:
+			delegate_msk = &opts->delegate_maps;
+			enum_t = info.map_t;
+			enum_pfx = "BPF_MAP_TYPE_";
+			break;
+		case OPT_DELEGATE_PROGS:
+			delegate_msk = &opts->delegate_progs;
+			enum_t = info.prog_t;
+			enum_pfx = "BPF_PROG_TYPE_";
+			break;
+		case OPT_DELEGATE_ATTACHS:
+			delegate_msk = &opts->delegate_attachs;
+			enum_t = info.attach_t;
+			enum_pfx = "BPF_";
+			break;
+		default:
+			return -EINVAL;
 		}
+
+		while ((p = strsep(&param->string, ":"))) {
+			if (strcmp(p, "any") == 0) {
+				msk |= ~0ULL;
+			} else if (find_btf_enum_const(info.btf, enum_t, enum_pfx, p, &val)) {
+				msk |= 1ULL << val;
+			} else {
+				err = kstrtou64(p, 0, &msk);
+				if (err)
+					return err;
+			}
+		}
+
 		/* Setting delegation mount options requires privileges */
 		if (msk && !capable(CAP_SYS_ADMIN))
 			return -EPERM;
-		switch (opt) {
-		case OPT_DELEGATE_CMDS: opts->delegate_cmds |= msk; break;
-		case OPT_DELEGATE_MAPS: opts->delegate_maps |= msk; break;
-		case OPT_DELEGATE_PROGS: opts->delegate_progs |= msk; break;
-		case OPT_DELEGATE_ATTACHS: opts->delegate_attachs |= msk; break;
-		default: return -EINVAL;
-		}
+
+		*delegate_msk |= msk;
+		break;
+	}
+	default:
+		/* ignore unknown mount options */
 		break;
 	}
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH bpf-next 20/29] selftests/bpf: utilize string values for delegate_xxx mount options
  2024-01-03 22:20 [PATCH bpf-next 00/29] BPF token Andrii Nakryiko
                   ` (18 preceding siblings ...)
  2024-01-03 22:20 ` [PATCH bpf-next 19/29] bpf: support symbolic BPF FS delegation mount options Andrii Nakryiko
@ 2024-01-03 22:20 ` Andrii Nakryiko
  2024-01-03 22:20 ` [PATCH bpf-next 21/29] libbpf: split feature detectors definitions from cached results Andrii Nakryiko
                   ` (9 subsequent siblings)
  29 siblings, 0 replies; 59+ messages in thread
From: Andrii Nakryiko @ 2024-01-03 22:20 UTC (permalink / raw)
  To: bpf, netdev, paul, brauner, torvalds
  Cc: linux-fsdevel, linux-security-module, kernel-team

Use both hex-based and string-based ways to specify delegate mount
options for BPF FS.
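
As an illustrative sketch (re-using the sys_fsconfig()/sys_fsopen()
helpers this test already relies on; the particular option values are
made up for illustration), symbolic colon-separated names and hex
masks can now be freely mixed:

  int fs_fd = sys_fsopen("bpf", 0);

  /* symbolic enum names, colon-separated */
  sys_fsconfig(fs_fd, FSCONFIG_SET_STRING, "delegate_cmds",
               "map_create:prog_load", 0);
  /* hex bitmask spelling still works */
  sys_fsconfig(fs_fd, FSCONFIG_SET_STRING, "delegate_maps", "0x1", 0);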

Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 .../testing/selftests/bpf/prog_tests/token.c  | 52 ++++++++++++-------
 1 file changed, 32 insertions(+), 20 deletions(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/token.c b/tools/testing/selftests/bpf/prog_tests/token.c
index 5394a0c880a9..185ed2f79315 100644
--- a/tools/testing/selftests/bpf/prog_tests/token.c
+++ b/tools/testing/selftests/bpf/prog_tests/token.c
@@ -55,14 +55,22 @@ static int restore_priv_caps(__u64 old_caps)
 	return cap_enable_effective(old_caps, NULL);
 }
 
-static int set_delegate_mask(int fs_fd, const char *key, __u64 mask)
+static int set_delegate_mask(int fs_fd, const char *key, __u64 mask, const char *mask_str)
 {
 	char buf[32];
 	int err;
 
-	snprintf(buf, sizeof(buf), "0x%llx", (unsigned long long)mask);
+	if (!mask_str) {
+		if (mask == ~0ULL) {
+			mask_str = "any";
+		} else {
+			snprintf(buf, sizeof(buf), "0x%llx", (unsigned long long)mask);
+			mask_str = buf;
+		}
+	}
+
 	err = sys_fsconfig(fs_fd, FSCONFIG_SET_STRING, key,
-			   mask == ~0ULL ? "any" : buf, 0);
+			   mask_str, 0);
 	if (err < 0)
 		err = -errno;
 	return err;
@@ -75,6 +83,10 @@ struct bpffs_opts {
 	__u64 maps;
 	__u64 progs;
 	__u64 attachs;
+	const char *cmds_str;
+	const char *maps_str;
+	const char *progs_str;
+	const char *attachs_str;
 };
 
 static int create_bpffs_fd(void)
@@ -93,16 +105,16 @@ static int materialize_bpffs_fd(int fs_fd, struct bpffs_opts *opts)
 	int mnt_fd, err;
 
 	/* set up token delegation mount options */
-	err = set_delegate_mask(fs_fd, "delegate_cmds", opts->cmds);
+	err = set_delegate_mask(fs_fd, "delegate_cmds", opts->cmds, opts->cmds_str);
 	if (!ASSERT_OK(err, "fs_cfg_cmds"))
 		return err;
-	err = set_delegate_mask(fs_fd, "delegate_maps", opts->maps);
+	err = set_delegate_mask(fs_fd, "delegate_maps", opts->maps, opts->maps_str);
 	if (!ASSERT_OK(err, "fs_cfg_maps"))
 		return err;
-	err = set_delegate_mask(fs_fd, "delegate_progs", opts->progs);
+	err = set_delegate_mask(fs_fd, "delegate_progs", opts->progs, opts->progs_str);
 	if (!ASSERT_OK(err, "fs_cfg_progs"))
 		return err;
-	err = set_delegate_mask(fs_fd, "delegate_attachs", opts->attachs);
+	err = set_delegate_mask(fs_fd, "delegate_attachs", opts->attachs, opts->attachs_str);
 	if (!ASSERT_OK(err, "fs_cfg_attachs"))
 		return err;
 
@@ -284,13 +296,13 @@ static void child(int sock_fd, struct bpffs_opts *opts, child_callback_fn callba
 	}
 
 	/* ensure unprivileged child cannot set delegation options */
-	err = set_delegate_mask(fs_fd, "delegate_cmds", 0x1);
+	err = set_delegate_mask(fs_fd, "delegate_cmds", 0x1, NULL);
 	ASSERT_EQ(err, -EPERM, "delegate_cmd_eperm");
-	err = set_delegate_mask(fs_fd, "delegate_maps", 0x1);
+	err = set_delegate_mask(fs_fd, "delegate_maps", 0x1, NULL);
 	ASSERT_EQ(err, -EPERM, "delegate_maps_eperm");
-	err = set_delegate_mask(fs_fd, "delegate_progs", 0x1);
+	err = set_delegate_mask(fs_fd, "delegate_progs", 0x1, NULL);
 	ASSERT_EQ(err, -EPERM, "delegate_progs_eperm");
-	err = set_delegate_mask(fs_fd, "delegate_attachs", 0x1);
+	err = set_delegate_mask(fs_fd, "delegate_attachs", 0x1, NULL);
 	ASSERT_EQ(err, -EPERM, "delegate_attachs_eperm");
 
 	/* pass BPF FS context object to parent */
@@ -314,22 +326,22 @@ static void child(int sock_fd, struct bpffs_opts *opts, child_callback_fn callba
 	}
 
 	/* ensure unprivileged child cannot reconfigure to set delegation options */
-	err = set_delegate_mask(fs_fd, "delegate_cmds", ~0ULL);
+	err = set_delegate_mask(fs_fd, "delegate_cmds", 0, "any");
 	if (!ASSERT_EQ(err, -EPERM, "delegate_cmd_eperm_reconfig")) {
 		err = -EINVAL;
 		goto cleanup;
 	}
-	err = set_delegate_mask(fs_fd, "delegate_maps", ~0ULL);
+	err = set_delegate_mask(fs_fd, "delegate_maps", 0, "any");
 	if (!ASSERT_EQ(err, -EPERM, "delegate_maps_eperm_reconfig")) {
 		err = -EINVAL;
 		goto cleanup;
 	}
-	err = set_delegate_mask(fs_fd, "delegate_progs", ~0ULL);
+	err = set_delegate_mask(fs_fd, "delegate_progs", 0, "any");
 	if (!ASSERT_EQ(err, -EPERM, "delegate_progs_eperm_reconfig")) {
 		err = -EINVAL;
 		goto cleanup;
 	}
-	err = set_delegate_mask(fs_fd, "delegate_attachs", ~0ULL);
+	err = set_delegate_mask(fs_fd, "delegate_attachs", 0, "any");
 	if (!ASSERT_EQ(err, -EPERM, "delegate_attachs_eperm_reconfig")) {
 		err = -EINVAL;
 		goto cleanup;
@@ -658,8 +670,8 @@ void test_token(void)
 {
 	if (test__start_subtest("map_token")) {
 		struct bpffs_opts opts = {
-			.cmds = 1ULL << BPF_MAP_CREATE,
-			.maps = 1ULL << BPF_MAP_TYPE_STACK,
+			.cmds_str = "map_create",
+			.maps_str = "stack",
 		};
 
 		subtest_userns(&opts, userns_map_create);
@@ -673,9 +685,9 @@ void test_token(void)
 	}
 	if (test__start_subtest("prog_token")) {
 		struct bpffs_opts opts = {
-			.cmds = 1ULL << BPF_PROG_LOAD,
-			.progs = 1ULL << BPF_PROG_TYPE_XDP,
-			.attachs = 1ULL << BPF_XDP,
+			.cmds_str = "PROG_LOAD",
+			.progs_str = "XDP",
+			.attachs_str = "xdp",
 		};
 
 		subtest_userns(&opts, userns_prog_load);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH bpf-next 21/29] libbpf: split feature detectors definitions from cached results
  2024-01-03 22:20 [PATCH bpf-next 00/29] BPF token Andrii Nakryiko
                   ` (19 preceding siblings ...)
  2024-01-03 22:20 ` [PATCH bpf-next 20/29] selftests/bpf: utilize string values for delegate_xxx " Andrii Nakryiko
@ 2024-01-03 22:20 ` Andrii Nakryiko
  2024-01-03 22:20 ` [PATCH bpf-next 22/29] libbpf: further decouple feature checking logic from bpf_object Andrii Nakryiko
                   ` (8 subsequent siblings)
  29 siblings, 0 replies; 59+ messages in thread
From: Andrii Nakryiko @ 2024-01-03 22:20 UTC (permalink / raw)
  To: bpf, netdev, paul, brauner, torvalds
  Cc: linux-fsdevel, linux-security-module, kernel-team

Split the list of supported feature detectors and their corresponding
probe callbacks from the actual cached supported/missing results. This
will allow more flexible per-token or per-object feature detectors in
subsequent refactorings.
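
A condensed sketch of the resulting split (names as in the diff below;
the second, non-global cache is hypothetical at this point and is only
made possible by this separation):

  /* immutable detector definitions: description + probe callback */
  static struct kern_feature_desc feature_probes[__FEAT_CNT];

  /* mutable probe results, now kept separately */
  static struct kern_feature_cache feature_cache;   /* global default */
  struct kern_feature_cache custom_cache = {};      /* e.g. per-token */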

Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 tools/lib/bpf/libbpf.c | 18 ++++++++++++------
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index ebcfb2147fbd..95a7d459b842 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -5001,12 +5001,17 @@ enum kern_feature_result {
 	FEAT_MISSING = 2,
 };
 
+struct kern_feature_cache {
+	enum kern_feature_result res[__FEAT_CNT];
+};
+
 typedef int (*feature_probe_fn)(void);
 
+static struct kern_feature_cache feature_cache;
+
 static struct kern_feature_desc {
 	const char *desc;
 	feature_probe_fn probe;
-	enum kern_feature_result res;
 } feature_probes[__FEAT_CNT] = {
 	[FEAT_PROG_NAME] = {
 		"BPF program name", probe_kern_prog_name,
@@ -5074,6 +5079,7 @@ static struct kern_feature_desc {
 bool kernel_supports(const struct bpf_object *obj, enum kern_feature_id feat_id)
 {
 	struct kern_feature_desc *feat = &feature_probes[feat_id];
+	struct kern_feature_cache *cache = &feature_cache;
 	int ret;
 
 	if (obj && obj->gen_loader)
@@ -5082,19 +5088,19 @@ bool kernel_supports(const struct bpf_object *obj, enum kern_feature_id feat_id)
 		 */
 		return true;
 
-	if (READ_ONCE(feat->res) == FEAT_UNKNOWN) {
+	if (READ_ONCE(cache->res[feat_id]) == FEAT_UNKNOWN) {
 		ret = feat->probe();
 		if (ret > 0) {
-			WRITE_ONCE(feat->res, FEAT_SUPPORTED);
+			WRITE_ONCE(cache->res[feat_id], FEAT_SUPPORTED);
 		} else if (ret == 0) {
-			WRITE_ONCE(feat->res, FEAT_MISSING);
+			WRITE_ONCE(cache->res[feat_id], FEAT_MISSING);
 		} else {
 			pr_warn("Detection of kernel %s support failed: %d\n", feat->desc, ret);
-			WRITE_ONCE(feat->res, FEAT_MISSING);
+			WRITE_ONCE(cache->res[feat_id], FEAT_MISSING);
 		}
 	}
 
-	return READ_ONCE(feat->res) == FEAT_SUPPORTED;
+	return READ_ONCE(cache->res[feat_id]) == FEAT_SUPPORTED;
 }
 
 static bool map_is_reuse_compat(const struct bpf_map *map, int map_fd)
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH bpf-next 22/29] libbpf: further decouple feature checking logic from bpf_object
  2024-01-03 22:20 [PATCH bpf-next 00/29] BPF token Andrii Nakryiko
                   ` (20 preceding siblings ...)
  2024-01-03 22:20 ` [PATCH bpf-next 21/29] libbpf: split feature detectors definitions from cached results Andrii Nakryiko
@ 2024-01-03 22:20 ` Andrii Nakryiko
  2024-01-03 22:20 ` [PATCH bpf-next 23/29] libbpf: move feature detection code into its own file Andrii Nakryiko
                   ` (7 subsequent siblings)
  29 siblings, 0 replies; 59+ messages in thread
From: Andrii Nakryiko @ 2024-01-03 22:20 UTC (permalink / raw)
  To: bpf, netdev, paul, brauner, torvalds
  Cc: linux-fsdevel, linux-security-module, kernel-team

Add a feat_supported() helper that accepts a feature cache instead of
a bpf_object. This allows low-level code in bpf.c to not know or care
about the higher-level concept of bpf_object, while still being able
to utilize custom feature checking in cases where a BPF token might
influence the outcome.
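
Usage-wise, a sketch of the new entry point (the custom-cache variant
is what subsequent patches in this series build on):

  bool ok;

  /* NULL cache falls back to the global feature cache */
  ok = feat_supported(NULL, FEAT_MEMCG_ACCOUNT);

  /* a caller-owned cache keeps results separate, e.g. per BPF token */
  struct kern_feature_cache cache = {};
  ok = feat_supported(&cache, FEAT_PROG_NAME);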

Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 tools/lib/bpf/bpf.c             |  6 +++---
 tools/lib/bpf/libbpf.c          | 22 +++++++++++++++-------
 tools/lib/bpf/libbpf_internal.h |  5 ++++-
 3 files changed, 22 insertions(+), 11 deletions(-)

diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
index d69137459abf..10bf11a758bf 100644
--- a/tools/lib/bpf/bpf.c
+++ b/tools/lib/bpf/bpf.c
@@ -146,7 +146,7 @@ int bump_rlimit_memlock(void)
 	struct rlimit rlim;
 
 	/* if kernel supports memcg-based accounting, skip bumping RLIMIT_MEMLOCK */
-	if (memlock_bumped || kernel_supports(NULL, FEAT_MEMCG_ACCOUNT))
+	if (memlock_bumped || feat_supported(NULL, FEAT_MEMCG_ACCOUNT))
 		return 0;
 
 	memlock_bumped = true;
@@ -181,7 +181,7 @@ int bpf_map_create(enum bpf_map_type map_type,
 		return libbpf_err(-EINVAL);
 
 	attr.map_type = map_type;
-	if (map_name && kernel_supports(NULL, FEAT_PROG_NAME))
+	if (map_name && feat_supported(NULL, FEAT_PROG_NAME))
 		libbpf_strlcpy(attr.map_name, map_name, sizeof(attr.map_name));
 	attr.key_size = key_size;
 	attr.value_size = value_size;
@@ -265,7 +265,7 @@ int bpf_prog_load(enum bpf_prog_type prog_type,
 	attr.kern_version = OPTS_GET(opts, kern_version, 0);
 	attr.prog_token_fd = OPTS_GET(opts, token_fd, 0);
 
-	if (prog_name && kernel_supports(NULL, FEAT_PROG_NAME))
+	if (prog_name && feat_supported(NULL, FEAT_PROG_NAME))
 		libbpf_strlcpy(attr.prog_name, prog_name, sizeof(attr.prog_name));
 	attr.license = ptr_to_u64(license);
 
diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 95a7d459b842..aea40c42b90e 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -5076,17 +5076,14 @@ static struct kern_feature_desc {
 	},
 };
 
-bool kernel_supports(const struct bpf_object *obj, enum kern_feature_id feat_id)
+bool feat_supported(struct kern_feature_cache *cache, enum kern_feature_id feat_id)
 {
 	struct kern_feature_desc *feat = &feature_probes[feat_id];
-	struct kern_feature_cache *cache = &feature_cache;
 	int ret;
 
-	if (obj && obj->gen_loader)
-		/* To generate loader program assume the latest kernel
-		 * to avoid doing extra prog_load, map_create syscalls.
-		 */
-		return true;
+	/* assume global feature cache, unless custom one is provided */
+	if (!cache)
+		cache = &feature_cache;
 
 	if (READ_ONCE(cache->res[feat_id]) == FEAT_UNKNOWN) {
 		ret = feat->probe();
@@ -5103,6 +5100,17 @@ bool kernel_supports(const struct bpf_object *obj, enum kern_feature_id feat_id)
 	return READ_ONCE(cache->res[feat_id]) == FEAT_SUPPORTED;
 }
 
+bool kernel_supports(const struct bpf_object *obj, enum kern_feature_id feat_id)
+{
+	if (obj && obj->gen_loader)
+		/* To generate loader program assume the latest kernel
+		 * to avoid doing extra prog_load, map_create syscalls.
+		 */
+		return true;
+
+	return feat_supported(NULL, feat_id);
+}
+
 static bool map_is_reuse_compat(const struct bpf_map *map, int map_fd)
 {
 	struct bpf_map_info map_info;
diff --git a/tools/lib/bpf/libbpf_internal.h b/tools/lib/bpf/libbpf_internal.h
index b5d334754e5d..754a432335e4 100644
--- a/tools/lib/bpf/libbpf_internal.h
+++ b/tools/lib/bpf/libbpf_internal.h
@@ -360,8 +360,11 @@ enum kern_feature_id {
 	__FEAT_CNT,
 };
 
-int probe_memcg_account(void);
+struct kern_feature_cache;
+bool feat_supported(struct kern_feature_cache *cache, enum kern_feature_id feat_id);
 bool kernel_supports(const struct bpf_object *obj, enum kern_feature_id feat_id);
+
+int probe_memcg_account(void);
 int bump_rlimit_memlock(void);
 
 int parse_cpu_mask_str(const char *s, bool **mask, int *mask_sz);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH bpf-next 23/29] libbpf: move feature detection code into its own file
  2024-01-03 22:20 [PATCH bpf-next 00/29] BPF token Andrii Nakryiko
                   ` (21 preceding siblings ...)
  2024-01-03 22:20 ` [PATCH bpf-next 22/29] libbpf: further decouple feature checking logic from bpf_object Andrii Nakryiko
@ 2024-01-03 22:20 ` Andrii Nakryiko
  2024-01-03 22:20 ` [PATCH bpf-next 24/29] libbpf: wire up token_fd into feature probing logic Andrii Nakryiko
                   ` (6 subsequent siblings)
  29 siblings, 0 replies; 59+ messages in thread
From: Andrii Nakryiko @ 2024-01-03 22:20 UTC (permalink / raw)
  To: bpf, netdev, paul, brauner, torvalds
  Cc: linux-fsdevel, linux-security-module, kernel-team

Feature detection is quite a lot of well-isolated code, which makes it
a good candidate for moving out of libbpf.c to reduce that file's size.

Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 tools/lib/bpf/Build             |   2 +-
 tools/lib/bpf/elf.c             |   2 -
 tools/lib/bpf/features.c        | 463 ++++++++++++++++++++++++++++++++
 tools/lib/bpf/libbpf.c          | 463 +-------------------------------
 tools/lib/bpf/libbpf_internal.h |  12 +-
 tools/lib/bpf/str_error.h       |   3 +
 6 files changed, 479 insertions(+), 466 deletions(-)
 create mode 100644 tools/lib/bpf/features.c

diff --git a/tools/lib/bpf/Build b/tools/lib/bpf/Build
index 2d0c282c8588..b6619199a706 100644
--- a/tools/lib/bpf/Build
+++ b/tools/lib/bpf/Build
@@ -1,4 +1,4 @@
 libbpf-y := libbpf.o bpf.o nlattr.o btf.o libbpf_errno.o str_error.o \
 	    netlink.o bpf_prog_linfo.o libbpf_probes.o hashmap.o \
 	    btf_dump.o ringbuf.o strset.o linker.o gen_loader.o relo_core.o \
-	    usdt.o zip.o elf.o
+	    usdt.o zip.o elf.o features.o
diff --git a/tools/lib/bpf/elf.c b/tools/lib/bpf/elf.c
index b02faec748a5..c92e02394159 100644
--- a/tools/lib/bpf/elf.c
+++ b/tools/lib/bpf/elf.c
@@ -11,8 +11,6 @@
 #include "libbpf_internal.h"
 #include "str_error.h"
 
-#define STRERR_BUFSIZE  128
-
 /* A SHT_GNU_versym section holds 16-bit words. This bit is set if
  * the symbol is hidden and can only be seen when referenced using an
  * explicit version number. This is a GNU extension.
diff --git a/tools/lib/bpf/features.c b/tools/lib/bpf/features.c
new file mode 100644
index 000000000000..338fd0dcd3bd
--- /dev/null
+++ b/tools/lib/bpf/features.c
@@ -0,0 +1,463 @@
+// SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
+/* Copyright (c) 2023 Meta Platforms, Inc. and affiliates. */
+#include <linux/kernel.h>
+#include <linux/filter.h>
+#include "bpf.h"
+#include "libbpf.h"
+#include "libbpf_common.h"
+#include "libbpf_internal.h"
+#include "str_error.h"
+
+static inline __u64 ptr_to_u64(const void *ptr)
+{
+	return (__u64)(unsigned long)ptr;
+}
+
+static int probe_fd(int fd)
+{
+	if (fd >= 0)
+		close(fd);
+	return fd >= 0;
+}
+
+static int probe_kern_prog_name(void)
+{
+	const size_t attr_sz = offsetofend(union bpf_attr, prog_name);
+	struct bpf_insn insns[] = {
+		BPF_MOV64_IMM(BPF_REG_0, 0),
+		BPF_EXIT_INSN(),
+	};
+	union bpf_attr attr;
+	int ret;
+
+	memset(&attr, 0, attr_sz);
+	attr.prog_type = BPF_PROG_TYPE_SOCKET_FILTER;
+	attr.license = ptr_to_u64("GPL");
+	attr.insns = ptr_to_u64(insns);
+	attr.insn_cnt = (__u32)ARRAY_SIZE(insns);
+	libbpf_strlcpy(attr.prog_name, "libbpf_nametest", sizeof(attr.prog_name));
+
+	/* make sure loading with name works */
+	ret = sys_bpf_prog_load(&attr, attr_sz, PROG_LOAD_ATTEMPTS);
+	return probe_fd(ret);
+}
+
+static int probe_kern_global_data(void)
+{
+	char *cp, errmsg[STRERR_BUFSIZE];
+	struct bpf_insn insns[] = {
+		BPF_LD_MAP_VALUE(BPF_REG_1, 0, 16),
+		BPF_ST_MEM(BPF_DW, BPF_REG_1, 0, 42),
+		BPF_MOV64_IMM(BPF_REG_0, 0),
+		BPF_EXIT_INSN(),
+	};
+	int ret, map, insn_cnt = ARRAY_SIZE(insns);
+
+	map = bpf_map_create(BPF_MAP_TYPE_ARRAY, "libbpf_global", sizeof(int), 32, 1, NULL);
+	if (map < 0) {
+		ret = -errno;
+		cp = libbpf_strerror_r(ret, errmsg, sizeof(errmsg));
+		pr_warn("Error in %s():%s(%d). Couldn't create simple array map.\n",
+			__func__, cp, -ret);
+		return ret;
+	}
+
+	insns[0].imm = map;
+
+	ret = bpf_prog_load(BPF_PROG_TYPE_SOCKET_FILTER, NULL, "GPL", insns, insn_cnt, NULL);
+	close(map);
+	return probe_fd(ret);
+}
+
+static int probe_kern_btf(void)
+{
+	static const char strs[] = "\0int";
+	__u32 types[] = {
+		/* int */
+		BTF_TYPE_INT_ENC(1, BTF_INT_SIGNED, 0, 32, 4),
+	};
+
+	return probe_fd(libbpf__load_raw_btf((char *)types, sizeof(types),
+					     strs, sizeof(strs)));
+}
+
+static int probe_kern_btf_func(void)
+{
+	static const char strs[] = "\0int\0x\0a";
+	/* void x(int a) {} */
+	__u32 types[] = {
+		/* int */
+		BTF_TYPE_INT_ENC(1, BTF_INT_SIGNED, 0, 32, 4),  /* [1] */
+		/* FUNC_PROTO */                                /* [2] */
+		BTF_TYPE_ENC(0, BTF_INFO_ENC(BTF_KIND_FUNC_PROTO, 0, 1), 0),
+		BTF_PARAM_ENC(7, 1),
+		/* FUNC x */                                    /* [3] */
+		BTF_TYPE_ENC(5, BTF_INFO_ENC(BTF_KIND_FUNC, 0, 0), 2),
+	};
+
+	return probe_fd(libbpf__load_raw_btf((char *)types, sizeof(types),
+					     strs, sizeof(strs)));
+}
+
+static int probe_kern_btf_func_global(void)
+{
+	static const char strs[] = "\0int\0x\0a";
+	/* static void x(int a) {} */
+	__u32 types[] = {
+		/* int */
+		BTF_TYPE_INT_ENC(1, BTF_INT_SIGNED, 0, 32, 4),  /* [1] */
+		/* FUNC_PROTO */                                /* [2] */
+		BTF_TYPE_ENC(0, BTF_INFO_ENC(BTF_KIND_FUNC_PROTO, 0, 1), 0),
+		BTF_PARAM_ENC(7, 1),
+		/* FUNC x BTF_FUNC_GLOBAL */                    /* [3] */
+		BTF_TYPE_ENC(5, BTF_INFO_ENC(BTF_KIND_FUNC, 0, BTF_FUNC_GLOBAL), 2),
+	};
+
+	return probe_fd(libbpf__load_raw_btf((char *)types, sizeof(types),
+					     strs, sizeof(strs)));
+}
+
+static int probe_kern_btf_datasec(void)
+{
+	static const char strs[] = "\0x\0.data";
+	/* static int a; */
+	__u32 types[] = {
+		/* int */
+		BTF_TYPE_INT_ENC(0, BTF_INT_SIGNED, 0, 32, 4),  /* [1] */
+		/* VAR x */                                     /* [2] */
+		BTF_TYPE_ENC(1, BTF_INFO_ENC(BTF_KIND_VAR, 0, 0), 1),
+		BTF_VAR_STATIC,
+		/* DATASEC val */                               /* [3] */
+		BTF_TYPE_ENC(3, BTF_INFO_ENC(BTF_KIND_DATASEC, 0, 1), 4),
+		BTF_VAR_SECINFO_ENC(2, 0, 4),
+	};
+
+	return probe_fd(libbpf__load_raw_btf((char *)types, sizeof(types),
+					     strs, sizeof(strs)));
+}
+
+static int probe_kern_btf_float(void)
+{
+	static const char strs[] = "\0float";
+	__u32 types[] = {
+		/* float */
+		BTF_TYPE_FLOAT_ENC(1, 4),
+	};
+
+	return probe_fd(libbpf__load_raw_btf((char *)types, sizeof(types),
+					     strs, sizeof(strs)));
+}
+
+static int probe_kern_btf_decl_tag(void)
+{
+	static const char strs[] = "\0tag";
+	__u32 types[] = {
+		/* int */
+		BTF_TYPE_INT_ENC(0, BTF_INT_SIGNED, 0, 32, 4),  /* [1] */
+		/* VAR x */                                     /* [2] */
+		BTF_TYPE_ENC(1, BTF_INFO_ENC(BTF_KIND_VAR, 0, 0), 1),
+		BTF_VAR_STATIC,
+		/* attr */
+		BTF_TYPE_DECL_TAG_ENC(1, 2, -1),
+	};
+
+	return probe_fd(libbpf__load_raw_btf((char *)types, sizeof(types),
+					     strs, sizeof(strs)));
+}
+
+static int probe_kern_btf_type_tag(void)
+{
+	static const char strs[] = "\0tag";
+	__u32 types[] = {
+		/* int */
+		BTF_TYPE_INT_ENC(0, BTF_INT_SIGNED, 0, 32, 4),		/* [1] */
+		/* attr */
+		BTF_TYPE_TYPE_TAG_ENC(1, 1),				/* [2] */
+		/* ptr */
+		BTF_TYPE_ENC(0, BTF_INFO_ENC(BTF_KIND_PTR, 0, 0), 2),	/* [3] */
+	};
+
+	return probe_fd(libbpf__load_raw_btf((char *)types, sizeof(types),
+					     strs, sizeof(strs)));
+}
+
+static int probe_kern_array_mmap(void)
+{
+	LIBBPF_OPTS(bpf_map_create_opts, opts, .map_flags = BPF_F_MMAPABLE);
+	int fd;
+
+	fd = bpf_map_create(BPF_MAP_TYPE_ARRAY, "libbpf_mmap", sizeof(int), sizeof(int), 1, &opts);
+	return probe_fd(fd);
+}
+
+static int probe_kern_exp_attach_type(void)
+{
+	LIBBPF_OPTS(bpf_prog_load_opts, opts, .expected_attach_type = BPF_CGROUP_INET_SOCK_CREATE);
+	struct bpf_insn insns[] = {
+		BPF_MOV64_IMM(BPF_REG_0, 0),
+		BPF_EXIT_INSN(),
+	};
+	int fd, insn_cnt = ARRAY_SIZE(insns);
+
+	/* use any valid combination of program type and (optional)
+	 * non-zero expected attach type (i.e., not a BPF_CGROUP_INET_INGRESS)
+	 * to see if kernel supports expected_attach_type field for
+	 * BPF_PROG_LOAD command
+	 */
+	fd = bpf_prog_load(BPF_PROG_TYPE_CGROUP_SOCK, NULL, "GPL", insns, insn_cnt, &opts);
+	return probe_fd(fd);
+}
+
+static int probe_kern_probe_read_kernel(void)
+{
+	struct bpf_insn insns[] = {
+		BPF_MOV64_REG(BPF_REG_1, BPF_REG_10),	/* r1 = r10 (fp) */
+		BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, -8),	/* r1 += -8 */
+		BPF_MOV64_IMM(BPF_REG_2, 8),		/* r2 = 8 */
+		BPF_MOV64_IMM(BPF_REG_3, 0),		/* r3 = 0 */
+		BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_probe_read_kernel),
+		BPF_EXIT_INSN(),
+	};
+	int fd, insn_cnt = ARRAY_SIZE(insns);
+
+	fd = bpf_prog_load(BPF_PROG_TYPE_TRACEPOINT, NULL, "GPL", insns, insn_cnt, NULL);
+	return probe_fd(fd);
+}
+
+static int probe_prog_bind_map(void)
+{
+	char *cp, errmsg[STRERR_BUFSIZE];
+	struct bpf_insn insns[] = {
+		BPF_MOV64_IMM(BPF_REG_0, 0),
+		BPF_EXIT_INSN(),
+	};
+	int ret, map, prog, insn_cnt = ARRAY_SIZE(insns);
+
+	map = bpf_map_create(BPF_MAP_TYPE_ARRAY, "libbpf_det_bind", sizeof(int), 32, 1, NULL);
+	if (map < 0) {
+		ret = -errno;
+		cp = libbpf_strerror_r(ret, errmsg, sizeof(errmsg));
+		pr_warn("Error in %s():%s(%d). Couldn't create simple array map.\n",
+			__func__, cp, -ret);
+		return ret;
+	}
+
+	prog = bpf_prog_load(BPF_PROG_TYPE_SOCKET_FILTER, NULL, "GPL", insns, insn_cnt, NULL);
+	if (prog < 0) {
+		close(map);
+		return 0;
+	}
+
+	ret = bpf_prog_bind_map(prog, map, NULL);
+
+	close(map);
+	close(prog);
+
+	return ret >= 0;
+}
+
+static int probe_module_btf(void)
+{
+	static const char strs[] = "\0int";
+	__u32 types[] = {
+		/* int */
+		BTF_TYPE_INT_ENC(1, BTF_INT_SIGNED, 0, 32, 4),
+	};
+	struct bpf_btf_info info;
+	__u32 len = sizeof(info);
+	char name[16];
+	int fd, err;
+
+	fd = libbpf__load_raw_btf((char *)types, sizeof(types), strs, sizeof(strs));
+	if (fd < 0)
+		return 0; /* BTF not supported at all */
+
+	memset(&info, 0, sizeof(info));
+	info.name = ptr_to_u64(name);
+	info.name_len = sizeof(name);
+
+	/* check that BPF_OBJ_GET_INFO_BY_FD supports specifying name pointer;
+	 * kernel's module BTF support coincides with support for
+	 * name/name_len fields in struct bpf_btf_info.
+	 */
+	err = bpf_btf_get_info_by_fd(fd, &info, &len);
+	close(fd);
+	return !err;
+}
+
+static int probe_perf_link(void)
+{
+	struct bpf_insn insns[] = {
+		BPF_MOV64_IMM(BPF_REG_0, 0),
+		BPF_EXIT_INSN(),
+	};
+	int prog_fd, link_fd, err;
+
+	prog_fd = bpf_prog_load(BPF_PROG_TYPE_TRACEPOINT, NULL, "GPL",
+				insns, ARRAY_SIZE(insns), NULL);
+	if (prog_fd < 0)
+		return -errno;
+
+	/* use invalid perf_event FD to get EBADF, if link is supported;
+	 * otherwise EINVAL should be returned
+	 */
+	link_fd = bpf_link_create(prog_fd, -1, BPF_PERF_EVENT, NULL);
+	err = -errno; /* close() can clobber errno */
+
+	if (link_fd >= 0)
+		close(link_fd);
+	close(prog_fd);
+
+	return link_fd < 0 && err == -EBADF;
+}
+
+static int probe_uprobe_multi_link(void)
+{
+	LIBBPF_OPTS(bpf_prog_load_opts, load_opts,
+		.expected_attach_type = BPF_TRACE_UPROBE_MULTI,
+	);
+	LIBBPF_OPTS(bpf_link_create_opts, link_opts);
+	struct bpf_insn insns[] = {
+		BPF_MOV64_IMM(BPF_REG_0, 0),
+		BPF_EXIT_INSN(),
+	};
+	int prog_fd, link_fd, err;
+	unsigned long offset = 0;
+
+	prog_fd = bpf_prog_load(BPF_PROG_TYPE_KPROBE, NULL, "GPL",
+				insns, ARRAY_SIZE(insns), &load_opts);
+	if (prog_fd < 0)
+		return -errno;
+
+	/* Creating uprobe in '/' binary should fail with -EBADF. */
+	link_opts.uprobe_multi.path = "/";
+	link_opts.uprobe_multi.offsets = &offset;
+	link_opts.uprobe_multi.cnt = 1;
+
+	link_fd = bpf_link_create(prog_fd, -1, BPF_TRACE_UPROBE_MULTI, &link_opts);
+	err = -errno; /* close() can clobber errno */
+
+	if (link_fd >= 0)
+		close(link_fd);
+	close(prog_fd);
+
+	return link_fd < 0 && err == -EBADF;
+}
+
+static int probe_kern_bpf_cookie(void)
+{
+	struct bpf_insn insns[] = {
+		BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_get_attach_cookie),
+		BPF_EXIT_INSN(),
+	};
+	int ret, insn_cnt = ARRAY_SIZE(insns);
+
+	ret = bpf_prog_load(BPF_PROG_TYPE_KPROBE, NULL, "GPL", insns, insn_cnt, NULL);
+	return probe_fd(ret);
+}
+
+static int probe_kern_btf_enum64(void)
+{
+	static const char strs[] = "\0enum64";
+	__u32 types[] = {
+		BTF_TYPE_ENC(1, BTF_INFO_ENC(BTF_KIND_ENUM64, 0, 0), 8),
+	};
+
+	return probe_fd(libbpf__load_raw_btf((char *)types, sizeof(types),
+					     strs, sizeof(strs)));
+}
+
+typedef int (*feature_probe_fn)(void);
+
+static struct kern_feature_cache feature_cache;
+
+static struct kern_feature_desc {
+	const char *desc;
+	feature_probe_fn probe;
+} feature_probes[__FEAT_CNT] = {
+	[FEAT_PROG_NAME] = {
+		"BPF program name", probe_kern_prog_name,
+	},
+	[FEAT_GLOBAL_DATA] = {
+		"global variables", probe_kern_global_data,
+	},
+	[FEAT_BTF] = {
+		"minimal BTF", probe_kern_btf,
+	},
+	[FEAT_BTF_FUNC] = {
+		"BTF functions", probe_kern_btf_func,
+	},
+	[FEAT_BTF_GLOBAL_FUNC] = {
+		"BTF global function", probe_kern_btf_func_global,
+	},
+	[FEAT_BTF_DATASEC] = {
+		"BTF data section and variable", probe_kern_btf_datasec,
+	},
+	[FEAT_ARRAY_MMAP] = {
+		"ARRAY map mmap()", probe_kern_array_mmap,
+	},
+	[FEAT_EXP_ATTACH_TYPE] = {
+		"BPF_PROG_LOAD expected_attach_type attribute",
+		probe_kern_exp_attach_type,
+	},
+	[FEAT_PROBE_READ_KERN] = {
+		"bpf_probe_read_kernel() helper", probe_kern_probe_read_kernel,
+	},
+	[FEAT_PROG_BIND_MAP] = {
+		"BPF_PROG_BIND_MAP support", probe_prog_bind_map,
+	},
+	[FEAT_MODULE_BTF] = {
+		"module BTF support", probe_module_btf,
+	},
+	[FEAT_BTF_FLOAT] = {
+		"BTF_KIND_FLOAT support", probe_kern_btf_float,
+	},
+	[FEAT_PERF_LINK] = {
+		"BPF perf link support", probe_perf_link,
+	},
+	[FEAT_BTF_DECL_TAG] = {
+		"BTF_KIND_DECL_TAG support", probe_kern_btf_decl_tag,
+	},
+	[FEAT_BTF_TYPE_TAG] = {
+		"BTF_KIND_TYPE_TAG support", probe_kern_btf_type_tag,
+	},
+	[FEAT_MEMCG_ACCOUNT] = {
+		"memcg-based memory accounting", probe_memcg_account,
+	},
+	[FEAT_BPF_COOKIE] = {
+		"BPF cookie support", probe_kern_bpf_cookie,
+	},
+	[FEAT_BTF_ENUM64] = {
+		"BTF_KIND_ENUM64 support", probe_kern_btf_enum64,
+	},
+	[FEAT_SYSCALL_WRAPPER] = {
+		"Kernel using syscall wrapper", probe_kern_syscall_wrapper,
+	},
+	[FEAT_UPROBE_MULTI_LINK] = {
+		"BPF multi-uprobe link support", probe_uprobe_multi_link,
+	},
+};
+
+bool feat_supported(struct kern_feature_cache *cache, enum kern_feature_id feat_id)
+{
+	struct kern_feature_desc *feat = &feature_probes[feat_id];
+	int ret;
+
+	/* assume global feature cache, unless custom one is provided */
+	if (!cache)
+		cache = &feature_cache;
+
+	if (READ_ONCE(cache->res[feat_id]) == FEAT_UNKNOWN) {
+		ret = feat->probe();
+		if (ret > 0) {
+			WRITE_ONCE(cache->res[feat_id], FEAT_SUPPORTED);
+		} else if (ret == 0) {
+			WRITE_ONCE(cache->res[feat_id], FEAT_MISSING);
+		} else {
+			pr_warn("Detection of kernel %s support failed: %d\n", feat->desc, ret);
+			WRITE_ONCE(cache->res[feat_id], FEAT_MISSING);
+		}
+	}
+
+	return READ_ONCE(cache->res[feat_id]) == FEAT_SUPPORTED;
+}
diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index aea40c42b90e..8e70532420a3 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -4639,467 +4639,6 @@ bpf_object__probe_loading(struct bpf_object *obj)
 	return 0;
 }
 
-static int probe_fd(int fd)
-{
-	if (fd >= 0)
-		close(fd);
-	return fd >= 0;
-}
-
-static int probe_kern_prog_name(void)
-{
-	const size_t attr_sz = offsetofend(union bpf_attr, prog_name);
-	struct bpf_insn insns[] = {
-		BPF_MOV64_IMM(BPF_REG_0, 0),
-		BPF_EXIT_INSN(),
-	};
-	union bpf_attr attr;
-	int ret;
-
-	memset(&attr, 0, attr_sz);
-	attr.prog_type = BPF_PROG_TYPE_SOCKET_FILTER;
-	attr.license = ptr_to_u64("GPL");
-	attr.insns = ptr_to_u64(insns);
-	attr.insn_cnt = (__u32)ARRAY_SIZE(insns);
-	libbpf_strlcpy(attr.prog_name, "libbpf_nametest", sizeof(attr.prog_name));
-
-	/* make sure loading with name works */
-	ret = sys_bpf_prog_load(&attr, attr_sz, PROG_LOAD_ATTEMPTS);
-	return probe_fd(ret);
-}
-
-static int probe_kern_global_data(void)
-{
-	char *cp, errmsg[STRERR_BUFSIZE];
-	struct bpf_insn insns[] = {
-		BPF_LD_MAP_VALUE(BPF_REG_1, 0, 16),
-		BPF_ST_MEM(BPF_DW, BPF_REG_1, 0, 42),
-		BPF_MOV64_IMM(BPF_REG_0, 0),
-		BPF_EXIT_INSN(),
-	};
-	int ret, map, insn_cnt = ARRAY_SIZE(insns);
-
-	map = bpf_map_create(BPF_MAP_TYPE_ARRAY, "libbpf_global", sizeof(int), 32, 1, NULL);
-	if (map < 0) {
-		ret = -errno;
-		cp = libbpf_strerror_r(ret, errmsg, sizeof(errmsg));
-		pr_warn("Error in %s():%s(%d). Couldn't create simple array map.\n",
-			__func__, cp, -ret);
-		return ret;
-	}
-
-	insns[0].imm = map;
-
-	ret = bpf_prog_load(BPF_PROG_TYPE_SOCKET_FILTER, NULL, "GPL", insns, insn_cnt, NULL);
-	close(map);
-	return probe_fd(ret);
-}
-
-static int probe_kern_btf(void)
-{
-	static const char strs[] = "\0int";
-	__u32 types[] = {
-		/* int */
-		BTF_TYPE_INT_ENC(1, BTF_INT_SIGNED, 0, 32, 4),
-	};
-
-	return probe_fd(libbpf__load_raw_btf((char *)types, sizeof(types),
-					     strs, sizeof(strs)));
-}
-
-static int probe_kern_btf_func(void)
-{
-	static const char strs[] = "\0int\0x\0a";
-	/* void x(int a) {} */
-	__u32 types[] = {
-		/* int */
-		BTF_TYPE_INT_ENC(1, BTF_INT_SIGNED, 0, 32, 4),  /* [1] */
-		/* FUNC_PROTO */                                /* [2] */
-		BTF_TYPE_ENC(0, BTF_INFO_ENC(BTF_KIND_FUNC_PROTO, 0, 1), 0),
-		BTF_PARAM_ENC(7, 1),
-		/* FUNC x */                                    /* [3] */
-		BTF_TYPE_ENC(5, BTF_INFO_ENC(BTF_KIND_FUNC, 0, 0), 2),
-	};
-
-	return probe_fd(libbpf__load_raw_btf((char *)types, sizeof(types),
-					     strs, sizeof(strs)));
-}
-
-static int probe_kern_btf_func_global(void)
-{
-	static const char strs[] = "\0int\0x\0a";
-	/* static void x(int a) {} */
-	__u32 types[] = {
-		/* int */
-		BTF_TYPE_INT_ENC(1, BTF_INT_SIGNED, 0, 32, 4),  /* [1] */
-		/* FUNC_PROTO */                                /* [2] */
-		BTF_TYPE_ENC(0, BTF_INFO_ENC(BTF_KIND_FUNC_PROTO, 0, 1), 0),
-		BTF_PARAM_ENC(7, 1),
-		/* FUNC x BTF_FUNC_GLOBAL */                    /* [3] */
-		BTF_TYPE_ENC(5, BTF_INFO_ENC(BTF_KIND_FUNC, 0, BTF_FUNC_GLOBAL), 2),
-	};
-
-	return probe_fd(libbpf__load_raw_btf((char *)types, sizeof(types),
-					     strs, sizeof(strs)));
-}
-
-static int probe_kern_btf_datasec(void)
-{
-	static const char strs[] = "\0x\0.data";
-	/* static int a; */
-	__u32 types[] = {
-		/* int */
-		BTF_TYPE_INT_ENC(0, BTF_INT_SIGNED, 0, 32, 4),  /* [1] */
-		/* VAR x */                                     /* [2] */
-		BTF_TYPE_ENC(1, BTF_INFO_ENC(BTF_KIND_VAR, 0, 0), 1),
-		BTF_VAR_STATIC,
-		/* DATASEC val */                               /* [3] */
-		BTF_TYPE_ENC(3, BTF_INFO_ENC(BTF_KIND_DATASEC, 0, 1), 4),
-		BTF_VAR_SECINFO_ENC(2, 0, 4),
-	};
-
-	return probe_fd(libbpf__load_raw_btf((char *)types, sizeof(types),
-					     strs, sizeof(strs)));
-}
-
-static int probe_kern_btf_float(void)
-{
-	static const char strs[] = "\0float";
-	__u32 types[] = {
-		/* float */
-		BTF_TYPE_FLOAT_ENC(1, 4),
-	};
-
-	return probe_fd(libbpf__load_raw_btf((char *)types, sizeof(types),
-					     strs, sizeof(strs)));
-}
-
-static int probe_kern_btf_decl_tag(void)
-{
-	static const char strs[] = "\0tag";
-	__u32 types[] = {
-		/* int */
-		BTF_TYPE_INT_ENC(0, BTF_INT_SIGNED, 0, 32, 4),  /* [1] */
-		/* VAR x */                                     /* [2] */
-		BTF_TYPE_ENC(1, BTF_INFO_ENC(BTF_KIND_VAR, 0, 0), 1),
-		BTF_VAR_STATIC,
-		/* attr */
-		BTF_TYPE_DECL_TAG_ENC(1, 2, -1),
-	};
-
-	return probe_fd(libbpf__load_raw_btf((char *)types, sizeof(types),
-					     strs, sizeof(strs)));
-}
-
-static int probe_kern_btf_type_tag(void)
-{
-	static const char strs[] = "\0tag";
-	__u32 types[] = {
-		/* int */
-		BTF_TYPE_INT_ENC(0, BTF_INT_SIGNED, 0, 32, 4),		/* [1] */
-		/* attr */
-		BTF_TYPE_TYPE_TAG_ENC(1, 1),				/* [2] */
-		/* ptr */
-		BTF_TYPE_ENC(0, BTF_INFO_ENC(BTF_KIND_PTR, 0, 0), 2),	/* [3] */
-	};
-
-	return probe_fd(libbpf__load_raw_btf((char *)types, sizeof(types),
-					     strs, sizeof(strs)));
-}
-
-static int probe_kern_array_mmap(void)
-{
-	LIBBPF_OPTS(bpf_map_create_opts, opts, .map_flags = BPF_F_MMAPABLE);
-	int fd;
-
-	fd = bpf_map_create(BPF_MAP_TYPE_ARRAY, "libbpf_mmap", sizeof(int), sizeof(int), 1, &opts);
-	return probe_fd(fd);
-}
-
-static int probe_kern_exp_attach_type(void)
-{
-	LIBBPF_OPTS(bpf_prog_load_opts, opts, .expected_attach_type = BPF_CGROUP_INET_SOCK_CREATE);
-	struct bpf_insn insns[] = {
-		BPF_MOV64_IMM(BPF_REG_0, 0),
-		BPF_EXIT_INSN(),
-	};
-	int fd, insn_cnt = ARRAY_SIZE(insns);
-
-	/* use any valid combination of program type and (optional)
-	 * non-zero expected attach type (i.e., not a BPF_CGROUP_INET_INGRESS)
-	 * to see if kernel supports expected_attach_type field for
-	 * BPF_PROG_LOAD command
-	 */
-	fd = bpf_prog_load(BPF_PROG_TYPE_CGROUP_SOCK, NULL, "GPL", insns, insn_cnt, &opts);
-	return probe_fd(fd);
-}
-
-static int probe_kern_probe_read_kernel(void)
-{
-	struct bpf_insn insns[] = {
-		BPF_MOV64_REG(BPF_REG_1, BPF_REG_10),	/* r1 = r10 (fp) */
-		BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, -8),	/* r1 += -8 */
-		BPF_MOV64_IMM(BPF_REG_2, 8),		/* r2 = 8 */
-		BPF_MOV64_IMM(BPF_REG_3, 0),		/* r3 = 0 */
-		BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_probe_read_kernel),
-		BPF_EXIT_INSN(),
-	};
-	int fd, insn_cnt = ARRAY_SIZE(insns);
-
-	fd = bpf_prog_load(BPF_PROG_TYPE_TRACEPOINT, NULL, "GPL", insns, insn_cnt, NULL);
-	return probe_fd(fd);
-}
-
-static int probe_prog_bind_map(void)
-{
-	char *cp, errmsg[STRERR_BUFSIZE];
-	struct bpf_insn insns[] = {
-		BPF_MOV64_IMM(BPF_REG_0, 0),
-		BPF_EXIT_INSN(),
-	};
-	int ret, map, prog, insn_cnt = ARRAY_SIZE(insns);
-
-	map = bpf_map_create(BPF_MAP_TYPE_ARRAY, "libbpf_det_bind", sizeof(int), 32, 1, NULL);
-	if (map < 0) {
-		ret = -errno;
-		cp = libbpf_strerror_r(ret, errmsg, sizeof(errmsg));
-		pr_warn("Error in %s():%s(%d). Couldn't create simple array map.\n",
-			__func__, cp, -ret);
-		return ret;
-	}
-
-	prog = bpf_prog_load(BPF_PROG_TYPE_SOCKET_FILTER, NULL, "GPL", insns, insn_cnt, NULL);
-	if (prog < 0) {
-		close(map);
-		return 0;
-	}
-
-	ret = bpf_prog_bind_map(prog, map, NULL);
-
-	close(map);
-	close(prog);
-
-	return ret >= 0;
-}
-
-static int probe_module_btf(void)
-{
-	static const char strs[] = "\0int";
-	__u32 types[] = {
-		/* int */
-		BTF_TYPE_INT_ENC(1, BTF_INT_SIGNED, 0, 32, 4),
-	};
-	struct bpf_btf_info info;
-	__u32 len = sizeof(info);
-	char name[16];
-	int fd, err;
-
-	fd = libbpf__load_raw_btf((char *)types, sizeof(types), strs, sizeof(strs));
-	if (fd < 0)
-		return 0; /* BTF not supported at all */
-
-	memset(&info, 0, sizeof(info));
-	info.name = ptr_to_u64(name);
-	info.name_len = sizeof(name);
-
-	/* check that BPF_OBJ_GET_INFO_BY_FD supports specifying name pointer;
-	 * kernel's module BTF support coincides with support for
-	 * name/name_len fields in struct bpf_btf_info.
-	 */
-	err = bpf_btf_get_info_by_fd(fd, &info, &len);
-	close(fd);
-	return !err;
-}
-
-static int probe_perf_link(void)
-{
-	struct bpf_insn insns[] = {
-		BPF_MOV64_IMM(BPF_REG_0, 0),
-		BPF_EXIT_INSN(),
-	};
-	int prog_fd, link_fd, err;
-
-	prog_fd = bpf_prog_load(BPF_PROG_TYPE_TRACEPOINT, NULL, "GPL",
-				insns, ARRAY_SIZE(insns), NULL);
-	if (prog_fd < 0)
-		return -errno;
-
-	/* use invalid perf_event FD to get EBADF, if link is supported;
-	 * otherwise EINVAL should be returned
-	 */
-	link_fd = bpf_link_create(prog_fd, -1, BPF_PERF_EVENT, NULL);
-	err = -errno; /* close() can clobber errno */
-
-	if (link_fd >= 0)
-		close(link_fd);
-	close(prog_fd);
-
-	return link_fd < 0 && err == -EBADF;
-}
-
-static int probe_uprobe_multi_link(void)
-{
-	LIBBPF_OPTS(bpf_prog_load_opts, load_opts,
-		.expected_attach_type = BPF_TRACE_UPROBE_MULTI,
-	);
-	LIBBPF_OPTS(bpf_link_create_opts, link_opts);
-	struct bpf_insn insns[] = {
-		BPF_MOV64_IMM(BPF_REG_0, 0),
-		BPF_EXIT_INSN(),
-	};
-	int prog_fd, link_fd, err;
-	unsigned long offset = 0;
-
-	prog_fd = bpf_prog_load(BPF_PROG_TYPE_KPROBE, NULL, "GPL",
-				insns, ARRAY_SIZE(insns), &load_opts);
-	if (prog_fd < 0)
-		return -errno;
-
-	/* Creating uprobe in '/' binary should fail with -EBADF. */
-	link_opts.uprobe_multi.path = "/";
-	link_opts.uprobe_multi.offsets = &offset;
-	link_opts.uprobe_multi.cnt = 1;
-
-	link_fd = bpf_link_create(prog_fd, -1, BPF_TRACE_UPROBE_MULTI, &link_opts);
-	err = -errno; /* close() can clobber errno */
-
-	if (link_fd >= 0)
-		close(link_fd);
-	close(prog_fd);
-
-	return link_fd < 0 && err == -EBADF;
-}
-
-static int probe_kern_bpf_cookie(void)
-{
-	struct bpf_insn insns[] = {
-		BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_get_attach_cookie),
-		BPF_EXIT_INSN(),
-	};
-	int ret, insn_cnt = ARRAY_SIZE(insns);
-
-	ret = bpf_prog_load(BPF_PROG_TYPE_KPROBE, NULL, "GPL", insns, insn_cnt, NULL);
-	return probe_fd(ret);
-}
-
-static int probe_kern_btf_enum64(void)
-{
-	static const char strs[] = "\0enum64";
-	__u32 types[] = {
-		BTF_TYPE_ENC(1, BTF_INFO_ENC(BTF_KIND_ENUM64, 0, 0), 8),
-	};
-
-	return probe_fd(libbpf__load_raw_btf((char *)types, sizeof(types),
-					     strs, sizeof(strs)));
-}
-
-static int probe_kern_syscall_wrapper(void);
-
-enum kern_feature_result {
-	FEAT_UNKNOWN = 0,
-	FEAT_SUPPORTED = 1,
-	FEAT_MISSING = 2,
-};
-
-struct kern_feature_cache {
-	enum kern_feature_result res[__FEAT_CNT];
-};
-
-typedef int (*feature_probe_fn)(void);
-
-static struct kern_feature_cache feature_cache;
-
-static struct kern_feature_desc {
-	const char *desc;
-	feature_probe_fn probe;
-} feature_probes[__FEAT_CNT] = {
-	[FEAT_PROG_NAME] = {
-		"BPF program name", probe_kern_prog_name,
-	},
-	[FEAT_GLOBAL_DATA] = {
-		"global variables", probe_kern_global_data,
-	},
-	[FEAT_BTF] = {
-		"minimal BTF", probe_kern_btf,
-	},
-	[FEAT_BTF_FUNC] = {
-		"BTF functions", probe_kern_btf_func,
-	},
-	[FEAT_BTF_GLOBAL_FUNC] = {
-		"BTF global function", probe_kern_btf_func_global,
-	},
-	[FEAT_BTF_DATASEC] = {
-		"BTF data section and variable", probe_kern_btf_datasec,
-	},
-	[FEAT_ARRAY_MMAP] = {
-		"ARRAY map mmap()", probe_kern_array_mmap,
-	},
-	[FEAT_EXP_ATTACH_TYPE] = {
-		"BPF_PROG_LOAD expected_attach_type attribute",
-		probe_kern_exp_attach_type,
-	},
-	[FEAT_PROBE_READ_KERN] = {
-		"bpf_probe_read_kernel() helper", probe_kern_probe_read_kernel,
-	},
-	[FEAT_PROG_BIND_MAP] = {
-		"BPF_PROG_BIND_MAP support", probe_prog_bind_map,
-	},
-	[FEAT_MODULE_BTF] = {
-		"module BTF support", probe_module_btf,
-	},
-	[FEAT_BTF_FLOAT] = {
-		"BTF_KIND_FLOAT support", probe_kern_btf_float,
-	},
-	[FEAT_PERF_LINK] = {
-		"BPF perf link support", probe_perf_link,
-	},
-	[FEAT_BTF_DECL_TAG] = {
-		"BTF_KIND_DECL_TAG support", probe_kern_btf_decl_tag,
-	},
-	[FEAT_BTF_TYPE_TAG] = {
-		"BTF_KIND_TYPE_TAG support", probe_kern_btf_type_tag,
-	},
-	[FEAT_MEMCG_ACCOUNT] = {
-		"memcg-based memory accounting", probe_memcg_account,
-	},
-	[FEAT_BPF_COOKIE] = {
-		"BPF cookie support", probe_kern_bpf_cookie,
-	},
-	[FEAT_BTF_ENUM64] = {
-		"BTF_KIND_ENUM64 support", probe_kern_btf_enum64,
-	},
-	[FEAT_SYSCALL_WRAPPER] = {
-		"Kernel using syscall wrapper", probe_kern_syscall_wrapper,
-	},
-	[FEAT_UPROBE_MULTI_LINK] = {
-		"BPF multi-uprobe link support", probe_uprobe_multi_link,
-	},
-};
-
-bool feat_supported(struct kern_feature_cache *cache, enum kern_feature_id feat_id)
-{
-	struct kern_feature_desc *feat = &feature_probes[feat_id];
-	int ret;
-
-	/* assume global feature cache, unless custom one is provided */
-	if (!cache)
-		cache = &feature_cache;
-
-	if (READ_ONCE(cache->res[feat_id]) == FEAT_UNKNOWN) {
-		ret = feat->probe();
-		if (ret > 0) {
-			WRITE_ONCE(cache->res[feat_id], FEAT_SUPPORTED);
-		} else if (ret == 0) {
-			WRITE_ONCE(cache->res[feat_id], FEAT_MISSING);
-		} else {
-			pr_warn("Detection of kernel %s support failed: %d\n", feat->desc, ret);
-			WRITE_ONCE(cache->res[feat_id], FEAT_MISSING);
-		}
-	}
-
-	return READ_ONCE(cache->res[feat_id]) == FEAT_SUPPORTED;
-}
-
 bool kernel_supports(const struct bpf_object *obj, enum kern_feature_id feat_id)
 {
 	if (obj && obj->gen_loader)
@@ -10628,7 +10167,7 @@ static const char *arch_specific_syscall_pfx(void)
 #endif
 }
 
-static int probe_kern_syscall_wrapper(void)
+int probe_kern_syscall_wrapper(void)
 {
 	char syscall_name[64];
 	const char *ksys_pfx;
diff --git a/tools/lib/bpf/libbpf_internal.h b/tools/lib/bpf/libbpf_internal.h
index 754a432335e4..db4a499c0ec5 100644
--- a/tools/lib/bpf/libbpf_internal.h
+++ b/tools/lib/bpf/libbpf_internal.h
@@ -360,10 +360,20 @@ enum kern_feature_id {
 	__FEAT_CNT,
 };
 
-struct kern_feature_cache;
+enum kern_feature_result {
+	FEAT_UNKNOWN = 0,
+	FEAT_SUPPORTED = 1,
+	FEAT_MISSING = 2,
+};
+
+struct kern_feature_cache {
+	enum kern_feature_result res[__FEAT_CNT];
+};
+
 bool feat_supported(struct kern_feature_cache *cache, enum kern_feature_id feat_id);
 bool kernel_supports(const struct bpf_object *obj, enum kern_feature_id feat_id);
 
+int probe_kern_syscall_wrapper(void);
 int probe_memcg_account(void);
 int bump_rlimit_memlock(void);
 
diff --git a/tools/lib/bpf/str_error.h b/tools/lib/bpf/str_error.h
index a139334d57b6..626d7ffb03d6 100644
--- a/tools/lib/bpf/str_error.h
+++ b/tools/lib/bpf/str_error.h
@@ -2,5 +2,8 @@
 #ifndef __LIBBPF_STR_ERROR_H
 #define __LIBBPF_STR_ERROR_H
 
+#define STRERR_BUFSIZE  128
+
 char *libbpf_strerror_r(int err, char *dst, int len);
+
 #endif /* __LIBBPF_STR_ERROR_H */
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH bpf-next 24/29] libbpf: wire up token_fd into feature probing logic
  2024-01-03 22:20 [PATCH bpf-next 00/29] BPF token Andrii Nakryiko
                   ` (22 preceding siblings ...)
  2024-01-03 22:20 ` [PATCH bpf-next 23/29] libbpf: move feature detection code into its own file Andrii Nakryiko
@ 2024-01-03 22:20 ` Andrii Nakryiko
  2024-01-03 22:20 ` [PATCH bpf-next 25/29] libbpf: wire up BPF token support at BPF object level Andrii Nakryiko
                   ` (5 subsequent siblings)
  29 siblings, 0 replies; 59+ messages in thread
From: Andrii Nakryiko @ 2024-01-03 22:20 UTC (permalink / raw)
  To: bpf, netdev, paul, brauner, torvalds
  Cc: linux-fsdevel, linux-security-module, kernel-team

Adjust feature probing callbacks to take into account an optional
token_fd. In unprivileged contexts, some feature detectors would fail
to detect kernel support just because a BPF program, BPF map, or BTF
object can't be loaded due to the privileged nature of those
operations. So when a BPF object is loaded with a BPF token, that
token should be used for feature probing as well.
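
The recurring pattern this patch applies to each probe looks roughly
like this (condensed from the diff below, not additional API):

  LIBBPF_OPTS(bpf_prog_load_opts, opts,
          .token_fd = token_fd,
          /* set BPF_F_TOKEN_FD only when a token is actually passed */
          .prog_flags = token_fd ? BPF_F_TOKEN_FD : 0,
  );
  fd = bpf_prog_load(BPF_PROG_TYPE_SOCKET_FILTER, NULL, "GPL",
                     insns, insn_cnt, &opts);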

This patch sets up support for this scenario, but we don't yet pass a
non-zero token FD. That will be added in the next patch.

We also switched the BPF cookie detector from using a kprobe program
to a tracepoint one, as a tracepoint is a somewhat less dangerous BPF
program type and has a higher likelihood of being allowed through a
BPF token in the future. This change has no effect on detection
behavior.

Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 tools/lib/bpf/bpf.c             |   5 +-
 tools/lib/bpf/features.c        | 116 +++++++++++++++++++++-----------
 tools/lib/bpf/libbpf.c          |   2 +-
 tools/lib/bpf/libbpf_internal.h |   8 ++-
 tools/lib/bpf/libbpf_probes.c   |  11 ++-
 5 files changed, 96 insertions(+), 46 deletions(-)

diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
index 10bf11a758bf..cc3888c3c914 100644
--- a/tools/lib/bpf/bpf.c
+++ b/tools/lib/bpf/bpf.c
@@ -103,7 +103,7 @@ int sys_bpf_prog_load(union bpf_attr *attr, unsigned int size, int attempts)
  *   [0] https://lore.kernel.org/bpf/20201201215900.3569844-1-guro@fb.com/
  *   [1] d05512618056 ("bpf: Add bpf_ktime_get_coarse_ns helper")
  */
-int probe_memcg_account(void)
+int probe_memcg_account(int token_fd)
 {
 	const size_t attr_sz = offsetofend(union bpf_attr, attach_btf_obj_fd);
 	struct bpf_insn insns[] = {
@@ -120,6 +120,9 @@ int probe_memcg_account(void)
 	attr.insns = ptr_to_u64(insns);
 	attr.insn_cnt = insn_cnt;
 	attr.license = ptr_to_u64("GPL");
+	attr.prog_token_fd = token_fd;
+	if (token_fd)
+		attr.prog_flags |= BPF_F_TOKEN_FD;
 
 	prog_fd = sys_bpf_fd(BPF_PROG_LOAD, &attr, attr_sz);
 	if (prog_fd >= 0) {
diff --git a/tools/lib/bpf/features.c b/tools/lib/bpf/features.c
index 338fd0dcd3bd..7ac83111e47d 100644
--- a/tools/lib/bpf/features.c
+++ b/tools/lib/bpf/features.c
@@ -20,7 +20,7 @@ static int probe_fd(int fd)
 	return fd >= 0;
 }
 
-static int probe_kern_prog_name(void)
+static int probe_kern_prog_name(int token_fd)
 {
 	const size_t attr_sz = offsetofend(union bpf_attr, prog_name);
 	struct bpf_insn insns[] = {
@@ -35,6 +35,9 @@ static int probe_kern_prog_name(void)
 	attr.license = ptr_to_u64("GPL");
 	attr.insns = ptr_to_u64(insns);
 	attr.insn_cnt = (__u32)ARRAY_SIZE(insns);
+	attr.prog_token_fd = token_fd;
+	if (token_fd)
+		attr.prog_flags |= BPF_F_TOKEN_FD;
 	libbpf_strlcpy(attr.prog_name, "libbpf_nametest", sizeof(attr.prog_name));
 
 	/* make sure loading with name works */
@@ -42,7 +45,7 @@ static int probe_kern_prog_name(void)
 	return probe_fd(ret);
 }
 
-static int probe_kern_global_data(void)
+static int probe_kern_global_data(int token_fd)
 {
 	char *cp, errmsg[STRERR_BUFSIZE];
 	struct bpf_insn insns[] = {
@@ -51,9 +54,17 @@ static int probe_kern_global_data(void)
 		BPF_MOV64_IMM(BPF_REG_0, 0),
 		BPF_EXIT_INSN(),
 	};
+	LIBBPF_OPTS(bpf_map_create_opts, map_opts,
+		.token_fd = token_fd,
+		.map_flags = token_fd ? BPF_F_TOKEN_FD : 0,
+	);
+	LIBBPF_OPTS(bpf_prog_load_opts, prog_opts,
+		.token_fd = token_fd,
+		.prog_flags = token_fd ? BPF_F_TOKEN_FD : 0,
+	);
 	int ret, map, insn_cnt = ARRAY_SIZE(insns);
 
-	map = bpf_map_create(BPF_MAP_TYPE_ARRAY, "libbpf_global", sizeof(int), 32, 1, NULL);
+	map = bpf_map_create(BPF_MAP_TYPE_ARRAY, "libbpf_global", sizeof(int), 32, 1, &map_opts);
 	if (map < 0) {
 		ret = -errno;
 		cp = libbpf_strerror_r(ret, errmsg, sizeof(errmsg));
@@ -64,12 +75,12 @@ static int probe_kern_global_data(void)
 
 	insns[0].imm = map;
 
-	ret = bpf_prog_load(BPF_PROG_TYPE_SOCKET_FILTER, NULL, "GPL", insns, insn_cnt, NULL);
+	ret = bpf_prog_load(BPF_PROG_TYPE_SOCKET_FILTER, NULL, "GPL", insns, insn_cnt, &prog_opts);
 	close(map);
 	return probe_fd(ret);
 }
 
-static int probe_kern_btf(void)
+static int probe_kern_btf(int token_fd)
 {
 	static const char strs[] = "\0int";
 	__u32 types[] = {
@@ -78,10 +89,10 @@ static int probe_kern_btf(void)
 	};
 
 	return probe_fd(libbpf__load_raw_btf((char *)types, sizeof(types),
-					     strs, sizeof(strs)));
+					     strs, sizeof(strs), token_fd));
 }
 
-static int probe_kern_btf_func(void)
+static int probe_kern_btf_func(int token_fd)
 {
 	static const char strs[] = "\0int\0x\0a";
 	/* void x(int a) {} */
@@ -96,10 +107,10 @@ static int probe_kern_btf_func(void)
 	};
 
 	return probe_fd(libbpf__load_raw_btf((char *)types, sizeof(types),
-					     strs, sizeof(strs)));
+					     strs, sizeof(strs), token_fd));
 }
 
-static int probe_kern_btf_func_global(void)
+static int probe_kern_btf_func_global(int token_fd)
 {
 	static const char strs[] = "\0int\0x\0a";
 	/* static void x(int a) {} */
@@ -114,10 +125,10 @@ static int probe_kern_btf_func_global(void)
 	};
 
 	return probe_fd(libbpf__load_raw_btf((char *)types, sizeof(types),
-					     strs, sizeof(strs)));
+					     strs, sizeof(strs), token_fd));
 }
 
-static int probe_kern_btf_datasec(void)
+static int probe_kern_btf_datasec(int token_fd)
 {
 	static const char strs[] = "\0x\0.data";
 	/* static int a; */
@@ -133,10 +144,10 @@ static int probe_kern_btf_datasec(void)
 	};
 
 	return probe_fd(libbpf__load_raw_btf((char *)types, sizeof(types),
-					     strs, sizeof(strs)));
+					     strs, sizeof(strs), token_fd));
 }
 
-static int probe_kern_btf_float(void)
+static int probe_kern_btf_float(int token_fd)
 {
 	static const char strs[] = "\0float";
 	__u32 types[] = {
@@ -145,10 +156,10 @@ static int probe_kern_btf_float(void)
 	};
 
 	return probe_fd(libbpf__load_raw_btf((char *)types, sizeof(types),
-					     strs, sizeof(strs)));
+					     strs, sizeof(strs), token_fd));
 }
 
-static int probe_kern_btf_decl_tag(void)
+static int probe_kern_btf_decl_tag(int token_fd)
 {
 	static const char strs[] = "\0tag";
 	__u32 types[] = {
@@ -162,10 +173,10 @@ static int probe_kern_btf_decl_tag(void)
 	};
 
 	return probe_fd(libbpf__load_raw_btf((char *)types, sizeof(types),
-					     strs, sizeof(strs)));
+					     strs, sizeof(strs), token_fd));
 }
 
-static int probe_kern_btf_type_tag(void)
+static int probe_kern_btf_type_tag(int token_fd)
 {
 	static const char strs[] = "\0tag";
 	__u32 types[] = {
@@ -178,21 +189,28 @@ static int probe_kern_btf_type_tag(void)
 	};
 
 	return probe_fd(libbpf__load_raw_btf((char *)types, sizeof(types),
-					     strs, sizeof(strs)));
+					     strs, sizeof(strs), token_fd));
 }
 
-static int probe_kern_array_mmap(void)
+static int probe_kern_array_mmap(int token_fd)
 {
-	LIBBPF_OPTS(bpf_map_create_opts, opts, .map_flags = BPF_F_MMAPABLE);
+	LIBBPF_OPTS(bpf_map_create_opts, opts,
+		.map_flags = BPF_F_MMAPABLE | (token_fd ? BPF_F_TOKEN_FD : 0),
+		.token_fd = token_fd,
+	);
 	int fd;
 
 	fd = bpf_map_create(BPF_MAP_TYPE_ARRAY, "libbpf_mmap", sizeof(int), sizeof(int), 1, &opts);
 	return probe_fd(fd);
 }
 
-static int probe_kern_exp_attach_type(void)
+static int probe_kern_exp_attach_type(int token_fd)
 {
-	LIBBPF_OPTS(bpf_prog_load_opts, opts, .expected_attach_type = BPF_CGROUP_INET_SOCK_CREATE);
+	LIBBPF_OPTS(bpf_prog_load_opts, opts,
+		.expected_attach_type = BPF_CGROUP_INET_SOCK_CREATE,
+		.token_fd = token_fd,
+		.prog_flags = token_fd ? BPF_F_TOKEN_FD : 0,
+	);
 	struct bpf_insn insns[] = {
 		BPF_MOV64_IMM(BPF_REG_0, 0),
 		BPF_EXIT_INSN(),
@@ -208,8 +226,12 @@ static int probe_kern_exp_attach_type(void)
 	return probe_fd(fd);
 }
 
-static int probe_kern_probe_read_kernel(void)
+static int probe_kern_probe_read_kernel(int token_fd)
 {
+	LIBBPF_OPTS(bpf_prog_load_opts, opts,
+		.token_fd = token_fd,
+		.prog_flags = token_fd ? BPF_F_TOKEN_FD : 0,
+	);
 	struct bpf_insn insns[] = {
 		BPF_MOV64_REG(BPF_REG_1, BPF_REG_10),	/* r1 = r10 (fp) */
 		BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, -8),	/* r1 += -8 */
@@ -220,20 +242,28 @@ static int probe_kern_probe_read_kernel(void)
 	};
 	int fd, insn_cnt = ARRAY_SIZE(insns);
 
-	fd = bpf_prog_load(BPF_PROG_TYPE_TRACEPOINT, NULL, "GPL", insns, insn_cnt, NULL);
+	fd = bpf_prog_load(BPF_PROG_TYPE_TRACEPOINT, NULL, "GPL", insns, insn_cnt, &opts);
 	return probe_fd(fd);
 }
 
-static int probe_prog_bind_map(void)
+static int probe_prog_bind_map(int token_fd)
 {
 	char *cp, errmsg[STRERR_BUFSIZE];
 	struct bpf_insn insns[] = {
 		BPF_MOV64_IMM(BPF_REG_0, 0),
 		BPF_EXIT_INSN(),
 	};
+	LIBBPF_OPTS(bpf_map_create_opts, map_opts,
+		.token_fd = token_fd,
+		.map_flags = token_fd ? BPF_F_TOKEN_FD : 0,
+	);
+	LIBBPF_OPTS(bpf_prog_load_opts, prog_opts,
+		.token_fd = token_fd,
+		.prog_flags = token_fd ? BPF_F_TOKEN_FD : 0,
+	);
 	int ret, map, prog, insn_cnt = ARRAY_SIZE(insns);
 
-	map = bpf_map_create(BPF_MAP_TYPE_ARRAY, "libbpf_det_bind", sizeof(int), 32, 1, NULL);
+	map = bpf_map_create(BPF_MAP_TYPE_ARRAY, "libbpf_det_bind", sizeof(int), 32, 1, &map_opts);
 	if (map < 0) {
 		ret = -errno;
 		cp = libbpf_strerror_r(ret, errmsg, sizeof(errmsg));
@@ -242,7 +272,7 @@ static int probe_prog_bind_map(void)
 		return ret;
 	}
 
-	prog = bpf_prog_load(BPF_PROG_TYPE_SOCKET_FILTER, NULL, "GPL", insns, insn_cnt, NULL);
+	prog = bpf_prog_load(BPF_PROG_TYPE_SOCKET_FILTER, NULL, "GPL", insns, insn_cnt, &prog_opts);
 	if (prog < 0) {
 		close(map);
 		return 0;
@@ -256,7 +286,7 @@ static int probe_prog_bind_map(void)
 	return ret >= 0;
 }
 
-static int probe_module_btf(void)
+static int probe_module_btf(int token_fd)
 {
 	static const char strs[] = "\0int";
 	__u32 types[] = {
@@ -268,7 +298,7 @@ static int probe_module_btf(void)
 	char name[16];
 	int fd, err;
 
-	fd = libbpf__load_raw_btf((char *)types, sizeof(types), strs, sizeof(strs));
+	fd = libbpf__load_raw_btf((char *)types, sizeof(types), strs, sizeof(strs), token_fd);
 	if (fd < 0)
 		return 0; /* BTF not supported at all */
 
@@ -285,16 +315,20 @@ static int probe_module_btf(void)
 	return !err;
 }
 
-static int probe_perf_link(void)
+static int probe_perf_link(int token_fd)
 {
 	struct bpf_insn insns[] = {
 		BPF_MOV64_IMM(BPF_REG_0, 0),
 		BPF_EXIT_INSN(),
 	};
+	LIBBPF_OPTS(bpf_prog_load_opts, opts,
+		.token_fd = token_fd,
+		.prog_flags = token_fd ? BPF_F_TOKEN_FD : 0,
+	);
 	int prog_fd, link_fd, err;
 
 	prog_fd = bpf_prog_load(BPF_PROG_TYPE_TRACEPOINT, NULL, "GPL",
-				insns, ARRAY_SIZE(insns), NULL);
+				insns, ARRAY_SIZE(insns), &opts);
 	if (prog_fd < 0)
 		return -errno;
 
@@ -311,10 +345,12 @@ static int probe_perf_link(void)
 	return link_fd < 0 && err == -EBADF;
 }
 
-static int probe_uprobe_multi_link(void)
+static int probe_uprobe_multi_link(int token_fd)
 {
 	LIBBPF_OPTS(bpf_prog_load_opts, load_opts,
 		.expected_attach_type = BPF_TRACE_UPROBE_MULTI,
+		.token_fd = token_fd,
+		.prog_flags = token_fd ? BPF_F_TOKEN_FD : 0,
 	);
 	LIBBPF_OPTS(bpf_link_create_opts, link_opts);
 	struct bpf_insn insns[] = {
@@ -344,19 +380,23 @@ static int probe_uprobe_multi_link(void)
 	return link_fd < 0 && err == -EBADF;
 }
 
-static int probe_kern_bpf_cookie(void)
+static int probe_kern_bpf_cookie(int token_fd)
 {
 	struct bpf_insn insns[] = {
 		BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_get_attach_cookie),
 		BPF_EXIT_INSN(),
 	};
+	LIBBPF_OPTS(bpf_prog_load_opts, opts,
+		.token_fd = token_fd,
+		.prog_flags = token_fd ? BPF_F_TOKEN_FD : 0,
+	);
 	int ret, insn_cnt = ARRAY_SIZE(insns);
 
-	ret = bpf_prog_load(BPF_PROG_TYPE_KPROBE, NULL, "GPL", insns, insn_cnt, NULL);
+	ret = bpf_prog_load(BPF_PROG_TYPE_TRACEPOINT, NULL, "GPL", insns, insn_cnt, &opts);
 	return probe_fd(ret);
 }
 
-static int probe_kern_btf_enum64(void)
+static int probe_kern_btf_enum64(int token_fd)
 {
 	static const char strs[] = "\0enum64";
 	__u32 types[] = {
@@ -364,10 +404,10 @@ static int probe_kern_btf_enum64(void)
 	};
 
 	return probe_fd(libbpf__load_raw_btf((char *)types, sizeof(types),
-					     strs, sizeof(strs)));
+					     strs, sizeof(strs), token_fd));
 }
 
-typedef int (*feature_probe_fn)(void);
+typedef int (*feature_probe_fn)(int /* token_fd */);
 
 static struct kern_feature_cache feature_cache;
 
@@ -448,7 +488,7 @@ bool feat_supported(struct kern_feature_cache *cache, enum kern_feature_id feat_
 		cache = &feature_cache;
 
 	if (READ_ONCE(cache->res[feat_id]) == FEAT_UNKNOWN) {
-		ret = feat->probe();
+		ret = feat->probe(cache->token_fd);
 		if (ret > 0) {
 			WRITE_ONCE(cache->res[feat_id], FEAT_SUPPORTED);
 		} else if (ret == 0) {
diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 8e70532420a3..a1486309b700 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -10167,7 +10167,7 @@ static const char *arch_specific_syscall_pfx(void)
 #endif
 }
 
-int probe_kern_syscall_wrapper(void)
+int probe_kern_syscall_wrapper(int token_fd)
 {
 	char syscall_name[64];
 	const char *ksys_pfx;
diff --git a/tools/lib/bpf/libbpf_internal.h b/tools/lib/bpf/libbpf_internal.h
index db4a499c0ec5..b45566e428d7 100644
--- a/tools/lib/bpf/libbpf_internal.h
+++ b/tools/lib/bpf/libbpf_internal.h
@@ -368,19 +368,21 @@ enum kern_feature_result {
 
 struct kern_feature_cache {
 	enum kern_feature_result res[__FEAT_CNT];
+	int token_fd;
 };
 
 bool feat_supported(struct kern_feature_cache *cache, enum kern_feature_id feat_id);
 bool kernel_supports(const struct bpf_object *obj, enum kern_feature_id feat_id);
 
-int probe_kern_syscall_wrapper(void);
-int probe_memcg_account(void);
+int probe_kern_syscall_wrapper(int token_fd);
+int probe_memcg_account(int token_fd);
 int bump_rlimit_memlock(void);
 
 int parse_cpu_mask_str(const char *s, bool **mask, int *mask_sz);
 int parse_cpu_mask_file(const char *fcpu, bool **mask, int *mask_sz);
 int libbpf__load_raw_btf(const char *raw_types, size_t types_len,
-			 const char *str_sec, size_t str_len);
+			 const char *str_sec, size_t str_len,
+			 int token_fd);
 int btf_load_into_kernel(struct btf *btf, char *log_buf, size_t log_sz, __u32 log_level);
 
 struct btf *btf_get_from_fd(int btf_fd, struct btf *base_btf);
diff --git a/tools/lib/bpf/libbpf_probes.c b/tools/lib/bpf/libbpf_probes.c
index 9c4db90b92b6..abd10a02d420 100644
--- a/tools/lib/bpf/libbpf_probes.c
+++ b/tools/lib/bpf/libbpf_probes.c
@@ -219,7 +219,8 @@ int libbpf_probe_bpf_prog_type(enum bpf_prog_type prog_type, const void *opts)
 }
 
 int libbpf__load_raw_btf(const char *raw_types, size_t types_len,
-			 const char *str_sec, size_t str_len)
+			 const char *str_sec, size_t str_len,
+			 int token_fd)
 {
 	struct btf_header hdr = {
 		.magic = BTF_MAGIC,
@@ -229,6 +230,10 @@ int libbpf__load_raw_btf(const char *raw_types, size_t types_len,
 		.str_off = types_len,
 		.str_len = str_len,
 	};
+	LIBBPF_OPTS(bpf_btf_load_opts, opts,
+		.token_fd = token_fd,
+		.btf_flags = token_fd ? BPF_F_TOKEN_FD : 0,
+	);
 	int btf_fd, btf_len;
 	__u8 *raw_btf;
 
@@ -241,7 +246,7 @@ int libbpf__load_raw_btf(const char *raw_types, size_t types_len,
 	memcpy(raw_btf + hdr.hdr_len, raw_types, hdr.type_len);
 	memcpy(raw_btf + hdr.hdr_len + hdr.type_len, str_sec, hdr.str_len);
 
-	btf_fd = bpf_btf_load(raw_btf, btf_len, NULL);
+	btf_fd = bpf_btf_load(raw_btf, btf_len, &opts);
 
 	free(raw_btf);
 	return btf_fd;
@@ -271,7 +276,7 @@ static int load_local_storage_btf(void)
 	};
 
 	return libbpf__load_raw_btf((char *)types, sizeof(types),
-				     strs, sizeof(strs));
+				     strs, sizeof(strs), 0);
 }
 
 static int probe_map_create(enum bpf_map_type map_type)
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH bpf-next 25/29] libbpf: wire up BPF token support at BPF object level
  2024-01-03 22:20 [PATCH bpf-next 00/29] BPF token Andrii Nakryiko
                   ` (23 preceding siblings ...)
  2024-01-03 22:20 ` [PATCH bpf-next 24/29] libbpf: wire up token_fd into feature probing logic Andrii Nakryiko
@ 2024-01-03 22:20 ` Andrii Nakryiko
  2024-01-03 22:20 ` [PATCH bpf-next 26/29] selftests/bpf: add BPF object loading tests with explicit token passing Andrii Nakryiko
                   ` (4 subsequent siblings)
  29 siblings, 0 replies; 59+ messages in thread
From: Andrii Nakryiko @ 2024-01-03 22:20 UTC (permalink / raw)
  To: bpf, netdev, paul, brauner, torvalds
  Cc: linux-fsdevel, linux-security-module, kernel-team

Add BPF token support to BPF object-level functionality.

BPF object logic supports a BPF token either explicitly provided from
outside (through a BPF FS path) or created implicitly (unless prevented
through bpf_object_open_opts).

Implicit mode is assumed to be the most common one for user namespaced
unprivileged workloads. The assumption is that a privileged container
manager sets up a default BPF FS mount point at /sys/fs/bpf with BPF
token delegation options (delegate_{cmds,maps,progs,attachs} mount
options). During loading, BPF object will attempt to create a BPF token
from the /sys/fs/bpf location and pass it to all relevant operations
(currently, map creation, BTF load, and program load).
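
As an illustrative sketch, the privileged setup assumed here could look
roughly like this, using the new mount APIs the same way the selftests
later in the series do (raw syscalls wrapped as needed; the "any" values
rely on the symbolic delegation mount option support from earlier in the
series, and userns setup is elided):

    fs_fd = fsopen("bpf", 0);
    fsconfig(fs_fd, FSCONFIG_SET_STRING, "delegate_cmds", "any", 0);
    fsconfig(fs_fd, FSCONFIG_SET_STRING, "delegate_maps", "any", 0);
    fsconfig(fs_fd, FSCONFIG_SET_STRING, "delegate_progs", "any", 0);
    fsconfig(fs_fd, FSCONFIG_SET_STRING, "delegate_attachs", "any", 0);
    fsconfig(fs_fd, FSCONFIG_CMD_CREATE, NULL, NULL, 0);
    mnt_fd = fsmount(fs_fd, 0, 0);
    /* attach the new BPF FS instance at /sys/fs/bpf in the container */
    move_mount(mnt_fd, "", AT_FDCWD, "/sys/fs/bpf", MOVE_MOUNT_F_EMPTY_PATH);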

In this implicit mode, if BPF token creation fails for whatever reason
(BPF FS is not mounted, kernel doesn't support BPF token, etc.), this is
not considered an error, and the BPF object loading sequence will
proceed with no BPF token.

In explicit BPF token mode, the user explicitly provides a custom BPF FS
mount point path. In that case, BPF object will attempt to create a BPF
token from the provided BPF FS location. If BPF token creation fails,
that is considered a critical error and BPF object load fails with an
error.

Libbpf provides a way to disable this implicit BPF token creation, if it
causes any trouble (BPF token is designed to be completely optional and
shouldn't cause any problems even if provided, but in the world of BPF
LSM, custom security logic can be installed that might change the
outcome depending on the presence of a BPF token). To disable libbpf's
default BPF token creation behavior, the user should provide either an
invalid BPF token FD (negative) or an empty bpf_token_path option.
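
As a sketch of what the explicit mode looks like from the application
side (the path here is illustrative; an empty string instead would
disable implicit token creation):

    LIBBPF_OPTS(bpf_object_open_opts, opts,
        /* explicitly request BPF token from a custom BPF FS mount */
        .bpf_token_path = "/bpf-token-fs",
    );
    struct bpf_object *obj;

    obj = bpf_object__open_file("prog.bpf.o", &opts);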

BPF token presence can influence libbpf's feature probing, so if a BPF
object has an associated BPF token, feature probing is instructed to use
the BPF object-specific feature detection cache and token FD.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 tools/lib/bpf/btf.c             |  10 +++-
 tools/lib/bpf/libbpf.c          | 102 ++++++++++++++++++++++++++++++--
 tools/lib/bpf/libbpf.h          |  13 +++-
 tools/lib/bpf/libbpf_internal.h |  17 +++++-
 4 files changed, 131 insertions(+), 11 deletions(-)

diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c
index ee95fd379d4d..ec92b87cae01 100644
--- a/tools/lib/bpf/btf.c
+++ b/tools/lib/bpf/btf.c
@@ -1317,7 +1317,9 @@ struct btf *btf__parse_split(const char *path, struct btf *base_btf)
 
 static void *btf_get_raw_data(const struct btf *btf, __u32 *size, bool swap_endian);
 
-int btf_load_into_kernel(struct btf *btf, char *log_buf, size_t log_sz, __u32 log_level)
+int btf_load_into_kernel(struct btf *btf,
+			 char *log_buf, size_t log_sz, __u32 log_level,
+			 int token_fd)
 {
 	LIBBPF_OPTS(bpf_btf_load_opts, opts);
 	__u32 buf_sz = 0, raw_size;
@@ -1367,6 +1369,10 @@ int btf_load_into_kernel(struct btf *btf, char *log_buf, size_t log_sz, __u32 lo
 		opts.log_level = log_level;
 	}
 
+	opts.token_fd = token_fd;
+	if (token_fd)
+		opts.btf_flags |= BPF_F_TOKEN_FD;
+
 	btf->fd = bpf_btf_load(raw_data, raw_size, &opts);
 	if (btf->fd < 0) {
 		/* time to turn on verbose mode and try again */
@@ -1394,7 +1400,7 @@ int btf_load_into_kernel(struct btf *btf, char *log_buf, size_t log_sz, __u32 lo
 
 int btf__load_into_kernel(struct btf *btf)
 {
-	return btf_load_into_kernel(btf, NULL, 0, 0);
+	return btf_load_into_kernel(btf, NULL, 0, 0, 0);
 }
 
 int btf__fd(const struct btf *btf)
diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index a1486309b700..69d87d743557 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -59,6 +59,8 @@
 #define BPF_FS_MAGIC		0xcafe4a11
 #endif
 
+#define BPF_FS_DEFAULT_PATH "/sys/fs/bpf"
+
 #define BPF_INSN_SZ (sizeof(struct bpf_insn))
 
 /* vsprintf() in __base_pr() uses nonliteral format string. It may break
@@ -693,6 +695,10 @@ struct bpf_object {
 
 	struct usdt_manager *usdt_man;
 
+	struct kern_feature_cache *feat_cache;
+	char *token_path;
+	int token_fd;
+
 	char path[];
 };
 
@@ -2192,7 +2198,7 @@ static int build_map_pin_path(struct bpf_map *map, const char *path)
 	int err;
 
 	if (!path)
-		path = "/sys/fs/bpf";
+		path = BPF_FS_DEFAULT_PATH;
 
 	err = pathname_concat(buf, sizeof(buf), path, bpf_map__name(map));
 	if (err)
@@ -3279,7 +3285,7 @@ static int bpf_object__sanitize_and_load_btf(struct bpf_object *obj)
 	} else {
 		/* currently BPF_BTF_LOAD only supports log_level 1 */
 		err = btf_load_into_kernel(kern_btf, obj->log_buf, obj->log_size,
-					   obj->log_level ? 1 : 0);
+					   obj->log_level ? 1 : 0, obj->token_fd);
 	}
 	if (sanitize) {
 		if (!err) {
@@ -4604,6 +4610,58 @@ int bpf_map__set_max_entries(struct bpf_map *map, __u32 max_entries)
 	return 0;
 }
 
+static int bpf_object_prepare_token(struct bpf_object *obj)
+{
+	const char *bpffs_path;
+	int bpffs_fd = -1, token_fd, err;
+	bool mandatory;
+	enum libbpf_print_level level;
+
+	/* token is explicitly prevented */
+	if (obj->token_path && obj->token_path[0] == '\0') {
+		pr_debug("object '%s': token is prevented, skipping...\n", obj->name);
+		return 0;
+	}
+
+	mandatory = obj->token_path != NULL;
+	level = mandatory ? LIBBPF_WARN : LIBBPF_DEBUG;
+
+	bpffs_path = obj->token_path ?: BPF_FS_DEFAULT_PATH;
+	bpffs_fd = open(bpffs_path, O_DIRECTORY, O_RDWR);
+	if (bpffs_fd < 0) {
+		err = -errno;
+		__pr(level, "object '%s': failed (%d) to open BPF FS mount at '%s'%s\n",
+		     obj->name, err, bpffs_path,
+		     mandatory ? "" : ", skipping optional step...");
+		return mandatory ? err : 0;
+	}
+
+	token_fd = bpf_token_create(bpffs_fd, 0);
+	close(bpffs_fd);
+	if (token_fd < 0) {
+		if (!mandatory && token_fd == -ENOENT) {
+			pr_debug("object '%s': BPF FS at '%s' doesn't have BPF token delegation set up, skipping...\n",
+				 obj->name, bpffs_path);
+			return 0;
+		}
+		__pr(level, "object '%s': failed (%d) to create BPF token from '%s'%s\n",
+		     obj->name, token_fd, bpffs_path,
+		     mandatory ? "" : ", skipping optional step...");
+		return mandatory ? token_fd : 0;
+	}
+
+	obj->feat_cache = calloc(1, sizeof(*obj->feat_cache));
+	if (!obj->feat_cache) {
+		close(token_fd);
+		return -ENOMEM;
+	}
+
+	obj->token_fd = token_fd;
+	obj->feat_cache->token_fd = token_fd;
+
+	return 0;
+}
+
 static int
 bpf_object__probe_loading(struct bpf_object *obj)
 {
@@ -4613,6 +4671,10 @@ bpf_object__probe_loading(struct bpf_object *obj)
 		BPF_EXIT_INSN(),
 	};
 	int ret, insn_cnt = ARRAY_SIZE(insns);
+	LIBBPF_OPTS(bpf_prog_load_opts, opts,
+		.token_fd = obj->token_fd,
+		.prog_flags = obj->token_fd ? BPF_F_TOKEN_FD : 0,
+	);
 
 	if (obj->gen_loader)
 		return 0;
@@ -4622,9 +4684,9 @@ bpf_object__probe_loading(struct bpf_object *obj)
 		pr_warn("Failed to bump RLIMIT_MEMLOCK (err = %d), you might need to do it explicitly!\n", ret);
 
 	/* make sure basic loading works */
-	ret = bpf_prog_load(BPF_PROG_TYPE_SOCKET_FILTER, NULL, "GPL", insns, insn_cnt, NULL);
+	ret = bpf_prog_load(BPF_PROG_TYPE_SOCKET_FILTER, NULL, "GPL", insns, insn_cnt, &opts);
 	if (ret < 0)
-		ret = bpf_prog_load(BPF_PROG_TYPE_TRACEPOINT, NULL, "GPL", insns, insn_cnt, NULL);
+		ret = bpf_prog_load(BPF_PROG_TYPE_TRACEPOINT, NULL, "GPL", insns, insn_cnt, &opts);
 	if (ret < 0) {
 		ret = errno;
 		cp = libbpf_strerror_r(ret, errmsg, sizeof(errmsg));
@@ -4647,6 +4709,9 @@ bool kernel_supports(const struct bpf_object *obj, enum kern_feature_id feat_id)
 		 */
 		return true;
 
+	if (obj->token_fd)
+		return feat_supported(obj->feat_cache, feat_id);
+
 	return feat_supported(NULL, feat_id);
 }
 
@@ -4766,6 +4831,9 @@ static int bpf_object__create_map(struct bpf_object *obj, struct bpf_map *map, b
 	create_attr.map_flags = def->map_flags;
 	create_attr.numa_node = map->numa_node;
 	create_attr.map_extra = map->map_extra;
+	create_attr.token_fd = obj->token_fd;
+	if (obj->token_fd)
+		create_attr.map_flags |= BPF_F_TOKEN_FD;
 
 	if (bpf_map__is_struct_ops(map))
 		create_attr.btf_vmlinux_value_type_id = map->btf_vmlinux_value_type_id;
@@ -6617,6 +6685,10 @@ static int bpf_object_load_prog(struct bpf_object *obj, struct bpf_program *prog
 	load_attr.prog_flags = prog->prog_flags;
 	load_attr.fd_array = obj->fd_array;
 
+	load_attr.token_fd = obj->token_fd;
+	if (obj->token_fd)
+		load_attr.prog_flags |= BPF_F_TOKEN_FD;
+
 	/* adjust load_attr if sec_def provides custom preload callback */
 	if (prog->sec_def && prog->sec_def->prog_prepare_load_fn) {
 		err = prog->sec_def->prog_prepare_load_fn(prog, &load_attr, prog->sec_def->cookie);
@@ -7062,7 +7134,7 @@ static int bpf_object_init_progs(struct bpf_object *obj, const struct bpf_object
 static struct bpf_object *bpf_object_open(const char *path, const void *obj_buf, size_t obj_buf_sz,
 					  const struct bpf_object_open_opts *opts)
 {
-	const char *obj_name, *kconfig, *btf_tmp_path;
+	const char *obj_name, *kconfig, *btf_tmp_path, *token_path;
 	struct bpf_object *obj;
 	char tmp_name[64];
 	int err;
@@ -7099,6 +7171,10 @@ static struct bpf_object *bpf_object_open(const char *path, const void *obj_buf,
 	if (log_size && !log_buf)
 		return ERR_PTR(-EINVAL);
 
+	token_path = OPTS_GET(opts, bpf_token_path, NULL);
+	if (token_path && strlen(token_path) >= PATH_MAX)
+		return ERR_PTR(-ENAMETOOLONG);
+
 	obj = bpf_object__new(path, obj_buf, obj_buf_sz, obj_name);
 	if (IS_ERR(obj))
 		return obj;
@@ -7107,6 +7183,14 @@ static struct bpf_object *bpf_object_open(const char *path, const void *obj_buf,
 	obj->log_size = log_size;
 	obj->log_level = log_level;
 
+	if (token_path) {
+		obj->token_path = strdup(token_path);
+		if (!obj->token_path) {
+			err = -ENOMEM;
+			goto out;
+		}
+	}
+
 	btf_tmp_path = OPTS_GET(opts, btf_custom_path, NULL);
 	if (btf_tmp_path) {
 		if (strlen(btf_tmp_path) >= PATH_MAX) {
@@ -7617,7 +7701,8 @@ static int bpf_object_load(struct bpf_object *obj, int extra_log_level, const ch
 	if (obj->gen_loader)
 		bpf_gen__init(obj->gen_loader, extra_log_level, obj->nr_programs, obj->nr_maps);
 
-	err = bpf_object__probe_loading(obj);
+	err = bpf_object_prepare_token(obj);
+	err = err ? : bpf_object__probe_loading(obj);
 	err = err ? : bpf_object__load_vmlinux_btf(obj, false);
 	err = err ? : bpf_object__resolve_externs(obj, obj->kconfig);
 	err = err ? : bpf_object__sanitize_and_load_btf(obj);
@@ -8154,6 +8239,11 @@ void bpf_object__close(struct bpf_object *obj)
 	}
 	zfree(&obj->programs);
 
+	zfree(&obj->feat_cache);
+	zfree(&obj->token_path);
+	if (obj->token_fd > 0)
+		close(obj->token_fd);
+
 	free(obj);
 }
 
diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index 6cd9c501624f..535ae15ed493 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -177,10 +177,21 @@ struct bpf_object_open_opts {
 	 * logs through its print callback.
 	 */
 	__u32 kernel_log_level;
+	/* Path to BPF FS mount point to derive BPF token from.
+	 *
+	 * Created BPF token will be used for all bpf() syscall operations
+	 * that accept BPF token (e.g., map creation, BTF and program loads,
+	 * etc) automatically within instantiated BPF object.
+	 *
+	 * Setting bpf_token_path option to empty string disables libbpf's
+	 * automatic attempt to create BPF token from default BPF FS mount
+	 * point (/sys/fs/bpf), in case this default behavior is undesirable.
+	 */
+	const char *bpf_token_path;
 
 	size_t :0;
 };
-#define bpf_object_open_opts__last_field kernel_log_level
+#define bpf_object_open_opts__last_field bpf_token_path
 
 /**
  * @brief **bpf_object__open()** creates a bpf_object by opening
diff --git a/tools/lib/bpf/libbpf_internal.h b/tools/lib/bpf/libbpf_internal.h
index b45566e428d7..4cda32298c49 100644
--- a/tools/lib/bpf/libbpf_internal.h
+++ b/tools/lib/bpf/libbpf_internal.h
@@ -383,7 +383,9 @@ int parse_cpu_mask_file(const char *fcpu, bool **mask, int *mask_sz);
 int libbpf__load_raw_btf(const char *raw_types, size_t types_len,
 			 const char *str_sec, size_t str_len,
 			 int token_fd);
-int btf_load_into_kernel(struct btf *btf, char *log_buf, size_t log_sz, __u32 log_level);
+int btf_load_into_kernel(struct btf *btf,
+			 char *log_buf, size_t log_sz, __u32 log_level,
+			 int token_fd);
 
 struct btf *btf_get_from_fd(int btf_fd, struct btf *base_btf);
 void btf_get_kernel_prefix_kind(enum bpf_attach_type attach_type,
@@ -547,6 +549,17 @@ static inline bool is_ldimm64_insn(struct bpf_insn *insn)
 	return insn->code == (BPF_LD | BPF_IMM | BPF_DW);
 }
 
+/* Unconditionally dup FD, ensuring it doesn't use [0, 2] range.
+ * Original FD is not closed or altered in any other way.
+ * Preserves original FD value, if it's invalid (negative).
+ */
+static inline int dup_good_fd(int fd)
+{
+	if (fd < 0)
+		return fd;
+	return fcntl(fd, F_DUPFD_CLOEXEC, 3);
+}
+
 /* if fd is stdin, stdout, or stderr, dup to a fd greater than 2
  * Takes ownership of the fd passed in, and closes it if calling
  * fcntl(fd, F_DUPFD_CLOEXEC, 3).
@@ -558,7 +571,7 @@ static inline int ensure_good_fd(int fd)
 	if (fd < 0)
 		return fd;
 	if (fd < 3) {
-		fd = fcntl(fd, F_DUPFD_CLOEXEC, 3);
+		fd = dup_good_fd(fd);
 		saved_errno = errno;
 		close(old_fd);
 		errno = saved_errno;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH bpf-next 26/29] selftests/bpf: add BPF object loading tests with explicit token passing
  2024-01-03 22:20 [PATCH bpf-next 00/29] BPF token Andrii Nakryiko
                   ` (24 preceding siblings ...)
  2024-01-03 22:20 ` [PATCH bpf-next 25/29] libbpf: wire up BPF token support at BPF object level Andrii Nakryiko
@ 2024-01-03 22:20 ` Andrii Nakryiko
  2024-01-03 22:20 ` [PATCH bpf-next 27/29] selftests/bpf: add tests for BPF object load with implicit token Andrii Nakryiko
                   ` (3 subsequent siblings)
  29 siblings, 0 replies; 59+ messages in thread
From: Andrii Nakryiko @ 2024-01-03 22:20 UTC (permalink / raw)
  To: bpf, netdev, paul, brauner, torvalds
  Cc: linux-fsdevel, linux-security-module, kernel-team

Add a few tests that attempt to load a BPF object containing a
privileged map, a privileged program, and one requiring mandatory BTF
upload into the kernel (to validate token FD propagation to the
BPF_BTF_LOAD command).

Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 .../testing/selftests/bpf/prog_tests/token.c  | 140 ++++++++++++++++++
 tools/testing/selftests/bpf/progs/priv_map.c  |  13 ++
 tools/testing/selftests/bpf/progs/priv_prog.c |  13 ++
 3 files changed, 166 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/progs/priv_map.c
 create mode 100644 tools/testing/selftests/bpf/progs/priv_prog.c

diff --git a/tools/testing/selftests/bpf/prog_tests/token.c b/tools/testing/selftests/bpf/prog_tests/token.c
index 185ed2f79315..1594d9b94b13 100644
--- a/tools/testing/selftests/bpf/prog_tests/token.c
+++ b/tools/testing/selftests/bpf/prog_tests/token.c
@@ -14,6 +14,9 @@
 #include <sys/socket.h>
 #include <sys/syscall.h>
 #include <sys/un.h>
+#include "priv_map.skel.h"
+#include "priv_prog.skel.h"
+#include "dummy_st_ops_success.skel.h"
 
 static inline int sys_mount(const char *dev_name, const char *dir_name,
 			    const char *type, unsigned long flags,
@@ -666,6 +669,104 @@ static int userns_prog_load(int mnt_fd)
 	return err;
 }
 
+static int userns_obj_priv_map(int mnt_fd)
+{
+	LIBBPF_OPTS(bpf_object_open_opts, opts);
+	char buf[256];
+	struct priv_map *skel;
+	int err;
+
+	skel = priv_map__open_and_load();
+	if (!ASSERT_ERR_PTR(skel, "obj_tokenless_load")) {
+		priv_map__destroy(skel);
+		return -EINVAL;
+	}
+
+	/* use bpf_token_path to provide BPF FS path */
+	snprintf(buf, sizeof(buf), "/proc/self/fd/%d", mnt_fd);
+	opts.bpf_token_path = buf;
+	skel = priv_map__open_opts(&opts);
+	if (!ASSERT_OK_PTR(skel, "obj_token_path_open"))
+		return -EINVAL;
+
+	err = priv_map__load(skel);
+	priv_map__destroy(skel);
+	if (!ASSERT_OK(err, "obj_token_path_load"))
+		return -EINVAL;
+
+	return 0;
+}
+
+static int userns_obj_priv_prog(int mnt_fd)
+{
+	LIBBPF_OPTS(bpf_object_open_opts, opts);
+	char buf[256];
+	struct priv_prog *skel;
+	int err;
+
+	skel = priv_prog__open_and_load();
+	if (!ASSERT_ERR_PTR(skel, "obj_tokenless_load")) {
+		priv_prog__destroy(skel);
+		return -EINVAL;
+	}
+
+	/* use bpf_token_path to provide BPF FS path */
+	snprintf(buf, sizeof(buf), "/proc/self/fd/%d", mnt_fd);
+	opts.bpf_token_path = buf;
+	skel = priv_prog__open_opts(&opts);
+	if (!ASSERT_OK_PTR(skel, "obj_token_path_open"))
+		return -EINVAL;
+
+	err = priv_prog__load(skel);
+	priv_prog__destroy(skel);
+	if (!ASSERT_OK(err, "obj_token_path_load"))
+		return -EINVAL;
+
+	return 0;
+}
+
+/* this test is called with BPF FS that doesn't delegate BPF_BTF_LOAD command,
+ * which should cause struct_ops application to fail, as BTF won't be uploaded
+ * into the kernel, even if STRUCT_OPS programs themselves are allowed
+ */
+static int validate_struct_ops_load(int mnt_fd, bool expect_success)
+{
+	LIBBPF_OPTS(bpf_object_open_opts, opts);
+	char buf[256];
+	struct dummy_st_ops_success *skel;
+	int err;
+
+	snprintf(buf, sizeof(buf), "/proc/self/fd/%d", mnt_fd);
+	opts.bpf_token_path = buf;
+	skel = dummy_st_ops_success__open_opts(&opts);
+	if (!ASSERT_OK_PTR(skel, "obj_token_path_open"))
+		return -EINVAL;
+
+	err = dummy_st_ops_success__load(skel);
+	dummy_st_ops_success__destroy(skel);
+	if (expect_success) {
+		if (!ASSERT_OK(err, "obj_token_path_load"))
+			return -EINVAL;
+	} else /* expect failure */ {
+		if (!ASSERT_ERR(err, "obj_token_path_load"))
+			return -EINVAL;
+	}
+
+	return 0;
+}
+
+static int userns_obj_priv_btf_fail(int mnt_fd)
+{
+	return validate_struct_ops_load(mnt_fd, false /* should fail */);
+}
+
+static int userns_obj_priv_btf_success(int mnt_fd)
+{
+	return validate_struct_ops_load(mnt_fd, true /* should succeed */);
+}
+
+#define bit(n) (1ULL << (n))
+
 void test_token(void)
 {
 	if (test__start_subtest("map_token")) {
@@ -692,4 +793,43 @@ void test_token(void)
 
 		subtest_userns(&opts, userns_prog_load);
 	}
+	if (test__start_subtest("obj_priv_map")) {
+		struct bpffs_opts opts = {
+			.cmds = bit(BPF_MAP_CREATE),
+			.maps = bit(BPF_MAP_TYPE_QUEUE),
+		};
+
+		subtest_userns(&opts, userns_obj_priv_map);
+	}
+	if (test__start_subtest("obj_priv_prog")) {
+		struct bpffs_opts opts = {
+			.cmds = bit(BPF_PROG_LOAD),
+			.progs = bit(BPF_PROG_TYPE_KPROBE),
+			.attachs = ~0ULL,
+		};
+
+		subtest_userns(&opts, userns_obj_priv_prog);
+	}
+	if (test__start_subtest("obj_priv_btf_fail")) {
+		struct bpffs_opts opts = {
+			/* disallow BTF loading */
+			.cmds = bit(BPF_MAP_CREATE) | bit(BPF_PROG_LOAD),
+			.maps = bit(BPF_MAP_TYPE_STRUCT_OPS),
+			.progs = bit(BPF_PROG_TYPE_STRUCT_OPS),
+			.attachs = ~0ULL,
+		};
+
+		subtest_userns(&opts, userns_obj_priv_btf_fail);
+	}
+	if (test__start_subtest("obj_priv_btf_success")) {
+		struct bpffs_opts opts = {
+			/* allow BTF loading */
+			.cmds = bit(BPF_BTF_LOAD) | bit(BPF_MAP_CREATE) | bit(BPF_PROG_LOAD),
+			.maps = bit(BPF_MAP_TYPE_STRUCT_OPS),
+			.progs = bit(BPF_PROG_TYPE_STRUCT_OPS),
+			.attachs = ~0ULL,
+		};
+
+		subtest_userns(&opts, userns_obj_priv_btf_success);
+	}
 }
diff --git a/tools/testing/selftests/bpf/progs/priv_map.c b/tools/testing/selftests/bpf/progs/priv_map.c
new file mode 100644
index 000000000000..9085be50f03b
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/priv_map.c
@@ -0,0 +1,13 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2023 Meta Platforms, Inc. and affiliates. */
+
+#include "vmlinux.h"
+#include <bpf/bpf_helpers.h>
+
+char _license[] SEC("license") = "GPL";
+
+struct {
+	__uint(type, BPF_MAP_TYPE_QUEUE);
+	__uint(max_entries, 1);
+	__type(value, __u32);
+} priv_map SEC(".maps");
diff --git a/tools/testing/selftests/bpf/progs/priv_prog.c b/tools/testing/selftests/bpf/progs/priv_prog.c
new file mode 100644
index 000000000000..3c7b2b618c8a
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/priv_prog.c
@@ -0,0 +1,13 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2023 Meta Platforms, Inc. and affiliates. */
+
+#include "vmlinux.h"
+#include <bpf/bpf_helpers.h>
+
+char _license[] SEC("license") = "GPL";
+
+SEC("kprobe")
+int kprobe_prog(void *ctx)
+{
+	return 1;
+}
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH bpf-next 27/29] selftests/bpf: add tests for BPF object load with implicit token
  2024-01-03 22:20 [PATCH bpf-next 00/29] BPF token Andrii Nakryiko
                   ` (25 preceding siblings ...)
  2024-01-03 22:20 ` [PATCH bpf-next 26/29] selftests/bpf: add BPF object loading tests with explicit token passing Andrii Nakryiko
@ 2024-01-03 22:20 ` Andrii Nakryiko
  2024-01-03 22:20 ` [PATCH bpf-next 28/29] libbpf: support BPF token path setting through LIBBPF_BPF_TOKEN_PATH envvar Andrii Nakryiko
                   ` (2 subsequent siblings)
  29 siblings, 0 replies; 59+ messages in thread
From: Andrii Nakryiko @ 2024-01-03 22:20 UTC (permalink / raw)
  To: bpf, netdev, paul, brauner, torvalds
  Cc: linux-fsdevel, linux-security-module, kernel-team

Add a test to validate libbpf's implicit BPF token creation from default
BPF FS location (/sys/fs/bpf). Also validate that disabling this
implicit BPF token creation works.

Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 .../testing/selftests/bpf/prog_tests/token.c  | 64 +++++++++++++++++++
 1 file changed, 64 insertions(+)

diff --git a/tools/testing/selftests/bpf/prog_tests/token.c b/tools/testing/selftests/bpf/prog_tests/token.c
index 1594d9b94b13..003f7c208f4c 100644
--- a/tools/testing/selftests/bpf/prog_tests/token.c
+++ b/tools/testing/selftests/bpf/prog_tests/token.c
@@ -12,6 +12,7 @@
 #include <linux/unistd.h>
 #include <linux/mount.h>
 #include <sys/socket.h>
+#include <sys/stat.h>
 #include <sys/syscall.h>
 #include <sys/un.h>
 #include "priv_map.skel.h"
@@ -45,6 +46,13 @@ static inline int sys_fsmount(int fs_fd, unsigned flags, unsigned ms_flags)
 	return syscall(__NR_fsmount, fs_fd, flags, ms_flags);
 }
 
+static inline int sys_move_mount(int from_dfd, const char *from_path,
+				 int to_dfd, const char *to_path,
+				 unsigned flags)
+{
+	return syscall(__NR_move_mount, from_dfd, from_path, to_dfd, to_path, flags);
+}
+
 static int drop_priv_caps(__u64 *old_caps)
 {
 	return cap_disable_effective((1ULL << CAP_BPF) |
@@ -765,6 +773,51 @@ static int userns_obj_priv_btf_success(int mnt_fd)
 	return validate_struct_ops_load(mnt_fd, true /* should succeed */);
 }
 
+static int userns_obj_priv_implicit_token(int mnt_fd)
+{
+	LIBBPF_OPTS(bpf_object_open_opts, opts);
+	struct dummy_st_ops_success *skel;
+	int err;
+
+	/* before we mount BPF FS with token delegation, struct_ops skeleton
+	 * should fail to load
+	 */
+	skel = dummy_st_ops_success__open_and_load();
+	if (!ASSERT_ERR_PTR(skel, "obj_tokenless_load")) {
+		dummy_st_ops_success__destroy(skel);
+		return -EINVAL;
+	}
+
+	/* mount custom BPF FS over /sys/fs/bpf so that libbpf can create BPF
+	 * token automatically and implicitly
+	 */
+	err = sys_move_mount(mnt_fd, "", AT_FDCWD, "/sys/fs/bpf", MOVE_MOUNT_F_EMPTY_PATH);
+	if (!ASSERT_OK(err, "move_mount_bpffs"))
+		return -EINVAL;
+
+	/* now the same struct_ops skeleton should succeed thanks to libbpf
+	 * creating BPF token from /sys/fs/bpf mount point
+	 */
+	skel = dummy_st_ops_success__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "obj_implicit_token_load"))
+		return -EINVAL;
+
+	dummy_st_ops_success__destroy(skel);
+
+	/* now disable implicit token through empty bpf_token_path, should fail */
+	opts.bpf_token_path = "";
+	skel = dummy_st_ops_success__open_opts(&opts);
+	if (!ASSERT_OK_PTR(skel, "obj_empty_token_path_open"))
+		return -EINVAL;
+
+	err = dummy_st_ops_success__load(skel);
+	dummy_st_ops_success__destroy(skel);
+	if (!ASSERT_ERR(err, "obj_empty_token_path_load"))
+		return -EINVAL;
+
+	return 0;
+}
+
 #define bit(n) (1ULL << (n))
 
 void test_token(void)
@@ -832,4 +885,15 @@ void test_token(void)
 
 		subtest_userns(&opts, userns_obj_priv_btf_success);
 	}
+	if (test__start_subtest("obj_priv_implicit_token")) {
+		struct bpffs_opts opts = {
+			/* allow BTF loading */
+			.cmds = bit(BPF_BTF_LOAD) | bit(BPF_MAP_CREATE) | bit(BPF_PROG_LOAD),
+			.maps = bit(BPF_MAP_TYPE_STRUCT_OPS),
+			.progs = bit(BPF_PROG_TYPE_STRUCT_OPS),
+			.attachs = ~0ULL,
+		};
+
+		subtest_userns(&opts, userns_obj_priv_implicit_token);
+	}
 }
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH bpf-next 28/29] libbpf: support BPF token path setting through LIBBPF_BPF_TOKEN_PATH envvar
  2024-01-03 22:20 [PATCH bpf-next 00/29] BPF token Andrii Nakryiko
                   ` (26 preceding siblings ...)
  2024-01-03 22:20 ` [PATCH bpf-next 27/29] selftests/bpf: add tests for BPF object load with implicit token Andrii Nakryiko
@ 2024-01-03 22:20 ` Andrii Nakryiko
  2024-01-03 22:20 ` [PATCH bpf-next 29/29] selftests/bpf: add tests for " Andrii Nakryiko
  2024-01-03 23:49 ` [PATCH bpf-next 00/29] BPF token Jakub Kicinski
  29 siblings, 0 replies; 59+ messages in thread
From: Andrii Nakryiko @ 2024-01-03 22:20 UTC (permalink / raw)
  To: bpf, netdev, paul, brauner, torvalds
  Cc: linux-fsdevel, linux-security-module, kernel-team

To allow an external admin authority to override the default BPF FS
location (/sys/fs/bpf) for implicit BPF token creation, teach libbpf to
recognize the LIBBPF_BPF_TOKEN_PATH envvar. If it is set and the user
application didn't explicitly specify the bpf_token_path option, the
envvar value is treated exactly like the bpf_token_path option,
overriding the default /sys/fs/bpf location and making BPF token
creation mandatory.
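
E.g. (paths here are illustrative):

    # force BPF token creation from a custom BPF FS mount point
    LIBBPF_BPF_TOKEN_PATH=/bpf-token-fs ./my_bpf_app

    # disable implicit BPF token creation altogether
    LIBBPF_BPF_TOKEN_PATH= ./my_bpf_app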

Suggested-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 tools/lib/bpf/libbpf.c | 6 ++++++
 tools/lib/bpf/libbpf.h | 8 ++++++++
 2 files changed, 14 insertions(+)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 69d87d743557..85d6ac99ce01 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -7172,6 +7172,12 @@ static struct bpf_object *bpf_object_open(const char *path, const void *obj_buf,
 		return ERR_PTR(-EINVAL);
 
 	token_path = OPTS_GET(opts, bpf_token_path, NULL);
+	/* if user didn't specify bpf_token_path explicitly, check if
+	 * LIBBPF_BPF_TOKEN_PATH envvar was set and treat it as bpf_token_path
+	 * option
+	 */
+	if (!token_path)
+		token_path = getenv("LIBBPF_BPF_TOKEN_PATH");
 	if (token_path && strlen(token_path) >= PATH_MAX)
 		return ERR_PTR(-ENAMETOOLONG);
 
diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index 535ae15ed493..5723cbbfcc41 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -183,6 +183,14 @@ struct bpf_object_open_opts {
 	 * that accept BPF token (e.g., map creation, BTF and program loads,
 	 * etc) automatically within instantiated BPF object.
 	 *
+	 * If bpf_token_path is not specified, libbpf will consult
+	 * LIBBPF_BPF_TOKEN_PATH environment variable. If set, it will be
+	 * taken as a value of bpf_token_path option and will force libbpf to
+	 * either create BPF token from provided custom BPF FS path, or will
+	 * disable implicit BPF token creation, if envvar value is an empty
+	 * string. bpf_token_path overrides LIBBPF_BPF_TOKEN_PATH, if both are
+	 * set at the same time.
+	 *
 	 * Setting bpf_token_path option to empty string disables libbpf's
 	 * automatic attempt to create BPF token from default BPF FS mount
 	 * point (/sys/fs/bpf), in case this default behavior is undesirable.
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH bpf-next 29/29] selftests/bpf: add tests for LIBBPF_BPF_TOKEN_PATH envvar
  2024-01-03 22:20 [PATCH bpf-next 00/29] BPF token Andrii Nakryiko
                   ` (27 preceding siblings ...)
  2024-01-03 22:20 ` [PATCH bpf-next 28/29] libbpf: support BPF token path setting through LIBBPF_BPF_TOKEN_PATH envvar Andrii Nakryiko
@ 2024-01-03 22:20 ` Andrii Nakryiko
  2024-01-03 23:49 ` [PATCH bpf-next 00/29] BPF token Jakub Kicinski
  29 siblings, 0 replies; 59+ messages in thread
From: Andrii Nakryiko @ 2024-01-03 22:20 UTC (permalink / raw)
  To: bpf, netdev, paul, brauner, torvalds
  Cc: linux-fsdevel, linux-security-module, kernel-team

Add a new subtest validating LIBBPF_BPF_TOKEN_PATH envvar semantics.
Extend the existing test to validate that LIBBPF_BPF_TOKEN_PATH allows
disabling implicit BPF token creation by setting the envvar to an empty
string.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 .../testing/selftests/bpf/prog_tests/token.c  | 98 +++++++++++++++++++
 1 file changed, 98 insertions(+)

diff --git a/tools/testing/selftests/bpf/prog_tests/token.c b/tools/testing/selftests/bpf/prog_tests/token.c
index 003f7c208f4c..1f6aa685e6f7 100644
--- a/tools/testing/selftests/bpf/prog_tests/token.c
+++ b/tools/testing/selftests/bpf/prog_tests/token.c
@@ -773,6 +773,9 @@ static int userns_obj_priv_btf_success(int mnt_fd)
 	return validate_struct_ops_load(mnt_fd, true /* should succeed */);
 }
 
+#define TOKEN_ENVVAR "LIBBPF_BPF_TOKEN_PATH"
+#define TOKEN_BPFFS_CUSTOM "/bpf-token-fs"
+
 static int userns_obj_priv_implicit_token(int mnt_fd)
 {
 	LIBBPF_OPTS(bpf_object_open_opts, opts);
@@ -795,6 +798,20 @@ static int userns_obj_priv_implicit_token(int mnt_fd)
 	if (!ASSERT_OK(err, "move_mount_bpffs"))
 		return -EINVAL;
 
+	/* disable implicit BPF token creation by setting
+	 * LIBBPF_BPF_TOKEN_PATH envvar to empty value, load should fail
+	 */
+	err = setenv(TOKEN_ENVVAR, "", 1 /*overwrite*/);
+	if (!ASSERT_OK(err, "setenv_token_path"))
+		return -EINVAL;
+	skel = dummy_st_ops_success__open_and_load();
+	if (!ASSERT_ERR_PTR(skel, "obj_token_envvar_disabled_load")) {
+		unsetenv(TOKEN_ENVVAR);
+		dummy_st_ops_success__destroy(skel);
+		return -EINVAL;
+	}
+	unsetenv(TOKEN_ENVVAR);
+
+	/* now the same struct_ops skeleton should succeed thanks to libbpf
 	 * creating BPF token from /sys/fs/bpf mount point
 	 */
@@ -818,6 +835,76 @@ static int userns_obj_priv_implicit_token(int mnt_fd)
 	return 0;
 }
 
+static int userns_obj_priv_implicit_token_envvar(int mnt_fd)
+{
+	LIBBPF_OPTS(bpf_object_open_opts, opts);
+	struct dummy_st_ops_success *skel;
+	int err;
+
+	/* before we mount BPF FS with token delegation, struct_ops skeleton
+	 * should fail to load
+	 */
+	skel = dummy_st_ops_success__open_and_load();
+	if (!ASSERT_ERR_PTR(skel, "obj_tokenless_load")) {
+		dummy_st_ops_success__destroy(skel);
+		return -EINVAL;
+	}
+
+	/* mount custom BPF FS over custom location, so libbpf can't create
+	 * BPF token implicitly, unless pointed to it through
+	 * LIBBPF_BPF_TOKEN_PATH envvar
+	 */
+	rmdir(TOKEN_BPFFS_CUSTOM);
+	if (!ASSERT_OK(mkdir(TOKEN_BPFFS_CUSTOM, 0777), "mkdir_bpffs_custom"))
+		goto err_out;
+	err = sys_move_mount(mnt_fd, "", AT_FDCWD, TOKEN_BPFFS_CUSTOM, MOVE_MOUNT_F_EMPTY_PATH);
+	if (!ASSERT_OK(err, "move_mount_bpffs"))
+		goto err_out;
+
+	/* even though we have BPF FS with delegation, it's not at default
+	 * /sys/fs/bpf location, so we still fail to load until envvar is set up
+	 */
+	skel = dummy_st_ops_success__open_and_load();
+	if (!ASSERT_ERR_PTR(skel, "obj_tokenless_load2")) {
+		dummy_st_ops_success__destroy(skel);
+		goto err_out;
+	}
+
+	err = setenv(TOKEN_ENVVAR, TOKEN_BPFFS_CUSTOM, 1 /*overwrite*/);
+	if (!ASSERT_OK(err, "setenv_token_path"))
+		goto err_out;
+
+	/* now the same struct_ops skeleton should succeed thanks to libbpf
+	 * creating BPF token from custom mount point
+	 */
+	skel = dummy_st_ops_success__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "obj_implicit_token_load"))
+		goto err_out;
+
+	dummy_st_ops_success__destroy(skel);
+
+	/* now disable implicit token through empty bpf_token_path, envvar
+	 * will be ignored, should fail
+	 */
+	opts.bpf_token_path = "";
+	skel = dummy_st_ops_success__open_opts(&opts);
+	if (!ASSERT_OK_PTR(skel, "obj_empty_token_path_open"))
+		goto err_out;
+
+	err = dummy_st_ops_success__load(skel);
+	dummy_st_ops_success__destroy(skel);
+	if (!ASSERT_ERR(err, "obj_empty_token_path_load"))
+		goto err_out;
+
+	rmdir(TOKEN_BPFFS_CUSTOM);
+	unsetenv(TOKEN_ENVVAR);
+	return 0;
+err_out:
+	rmdir(TOKEN_BPFFS_CUSTOM);
+	unsetenv(TOKEN_ENVVAR);
+	return -EINVAL;
+}
+
 #define bit(n) (1ULL << (n))
 
 void test_token(void)
@@ -896,4 +983,15 @@ void test_token(void)
 
 		subtest_userns(&opts, userns_obj_priv_implicit_token);
 	}
+	if (test__start_subtest("obj_priv_implicit_token_envvar")) {
+		struct bpffs_opts opts = {
+			/* allow BTF loading */
+			.cmds = bit(BPF_BTF_LOAD) | bit(BPF_MAP_CREATE) | bit(BPF_PROG_LOAD),
+			.maps = bit(BPF_MAP_TYPE_STRUCT_OPS),
+			.progs = bit(BPF_PROG_TYPE_STRUCT_OPS),
+			.attachs = ~0ULL,
+		};
+
+		subtest_userns(&opts, userns_obj_priv_implicit_token_envvar);
+	}
 }
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* Re: [PATCH bpf-next 00/29] BPF token
  2024-01-03 22:20 [PATCH bpf-next 00/29] BPF token Andrii Nakryiko
                   ` (28 preceding siblings ...)
  2024-01-03 22:20 ` [PATCH bpf-next 29/29] selftests/bpf: add tests for " Andrii Nakryiko
@ 2024-01-03 23:49 ` Jakub Kicinski
  29 siblings, 0 replies; 59+ messages in thread
From: Jakub Kicinski @ 2024-01-03 23:49 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, netdev, paul, brauner, torvalds, linux-fsdevel,
	linux-security-module, kernel-team

On Wed, 3 Jan 2024 14:20:05 -0800 Andrii Nakryiko wrote:
> Subject: [PATCH bpf-next 00/29]

bpf-next? It should go directly to Linus, right?
The merge window starts in 4 days.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH bpf-next 13/29] libbpf: add BPF token support to bpf_map_create() API
  2024-01-03 22:20 ` [PATCH bpf-next 13/29] libbpf: add BPF token support to bpf_map_create() API Andrii Nakryiko
@ 2024-01-04 19:04   ` Linus Torvalds
  2024-01-04 19:23     ` Andrii Nakryiko
  0 siblings, 1 reply; 59+ messages in thread
From: Linus Torvalds @ 2024-01-04 19:04 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, netdev, paul, brauner, linux-fsdevel, linux-security-module,
	kernel-team

On Wed, 3 Jan 2024 at 14:24, Andrii Nakryiko <andrii@kernel.org> wrote:
>
> Add ability to provide token_fd for BPF_MAP_CREATE command through
> bpf_map_create() API.

I'll try to look through the series later, but this email was marked
as spam for me.

And it seems to be due to all your emails failing DMARC, even though
the others came through:

       dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=kernel.org

there's no DKIM signature at all, looks like you never went through
the kernel.org smtp servers.

             Linus

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH bpf-next 13/29] libbpf: add BPF token support to bpf_map_create() API
  2024-01-04 19:04   ` Linus Torvalds
@ 2024-01-04 19:23     ` Andrii Nakryiko
  0 siblings, 0 replies; 59+ messages in thread
From: Andrii Nakryiko @ 2024-01-04 19:23 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andrii Nakryiko, bpf, netdev, paul, brauner, linux-fsdevel,
	linux-security-module, kernel-team

On Thu, Jan 4, 2024 at 11:04 AM Linus Torvalds
<torvalds@linuxfoundation.org> wrote:
>
> On Wed, 3 Jan 2024 at 14:24, Andrii Nakryiko <andrii@kernel.org> wrote:
> >
> > Add ability to provide token_fd for BPF_MAP_CREATE command through
> > bpf_map_create() API.
>
> I'll try to look through the series later, but this email was marked
> as spam for me.

Great, thanks for taking a look!

>
> And it seems to be due to all your emails failing DMARC, even though
> the others came through:
>
>        dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=kernel.org
>
> there's no DKIM signature at all, looks like you never went through
> the kernel.org smtp servers.

Yep, thanks for flagging, I guess I'll need to go read Konstantin's
instructions and adjust my git send-email workflow.

>
>              Linus

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH bpf-next 03/29] bpf: introduce BPF token object
  2024-01-03 22:20 ` [PATCH bpf-next 03/29] bpf: introduce BPF token object Andrii Nakryiko
@ 2024-01-05 20:25   ` Linus Torvalds
  2024-01-05 20:32     ` Matthew Wilcox
  2024-01-05 22:05     ` Andrii Nakryiko
  2024-01-05 21:45   ` Linus Torvalds
  2024-01-08 11:44   ` Christian Brauner
  2 siblings, 2 replies; 59+ messages in thread
From: Linus Torvalds @ 2024-01-05 20:25 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, netdev, paul, brauner, linux-fsdevel, linux-security-module,
	kernel-team

I'm still looking through the patches, but in the early parts I do
note this oddity:

On Wed, 3 Jan 2024 at 14:21, Andrii Nakryiko <andrii@kernel.org> wrote:
>
> +struct bpf_token {
> +       struct work_struct work;
> +       atomic64_t refcnt;
> +       struct user_namespace *userns;
> +       u64 allowed_cmds;
> +};

Ok, not huge, and makes sense, although I wonder if that

        atomic64_t refcnt;

should just be 'atomic_long_t' since presumably on 32-bit
architectures you can't create enough references for a 64-bit atomic
to make much sense.

Or are there references to tokens that might not use any memory?

Not a big deal, but 'atomic64_t' is very expensive on 32-bit
architectures, and doesn't seem to make much sense unless you really
specifically need 64 bits for some reason.

But regardless, this is odd:

> diff --git a/kernel/bpf/token.c b/kernel/bpf/token.c
> +
> +static void bpf_token_free(struct bpf_token *token)
> +{
> +       put_user_ns(token->userns);
> +       kvfree(token);
> +}

> +int bpf_token_create(union bpf_attr *attr)
> +{
> ....
> +       token = kvzalloc(sizeof(*token), GFP_USER);

Ok, so the kvzalloc() and kvfree() certainly line up, but why use them at all?

kvmalloc() and friends are for "use kmalloc, and fall back on vmalloc
for big allocations when that fails".

For just a structure, a plain 'kzalloc()/kfree()' pair would seem to
make much more sense.
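
IOW, just something like:

        token = kzalloc(sizeof(*token), GFP_USER);
        ...
        kfree(token);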

Neither of these issues are at all important, but I mention them
because they made me go "What?" when reading through the patches.

                  Linus

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH bpf-next 03/29] bpf: introduce BPF token object
  2024-01-05 20:25   ` Linus Torvalds
@ 2024-01-05 20:32     ` Matthew Wilcox
  2024-01-05 20:45       ` Linus Torvalds
  2024-01-05 22:05     ` Andrii Nakryiko
  1 sibling, 1 reply; 59+ messages in thread
From: Matthew Wilcox @ 2024-01-05 20:32 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andrii Nakryiko, bpf, netdev, paul, brauner, linux-fsdevel,
	linux-security-module, kernel-team

On Fri, Jan 05, 2024 at 12:25:42PM -0800, Linus Torvalds wrote:
> > diff --git a/kernel/bpf/token.c b/kernel/bpf/token.c
> > +
> > +static void bpf_token_free(struct bpf_token *token)
> > +{
> > +       put_user_ns(token->userns);
> > +       kvfree(token);
> > +}
> 
> > +int bpf_token_create(union bpf_attr *attr)
> > +{
> > ....
> > +       token = kvzalloc(sizeof(*token), GFP_USER);
> 
> Ok, so the kvzalloc() and kvfree() certainly line up, but why use them at all?
> 
> kvmalloc() and friends are for "use kmalloc, and fall back on vmalloc
> for big allocations when that fails".
> 
> For just a structure, a plain 'kzalloc()/kfree()' pair would seem to
> make much more sense.

I can't tell from the description whether there are going to be a lot of
these.  If there are, it might make sense to create a slab cache for
them rather than get them from the general-purpose kmalloc caches.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH bpf-next 03/29] bpf: introduce BPF token object
  2024-01-05 20:32     ` Matthew Wilcox
@ 2024-01-05 20:45       ` Linus Torvalds
  2024-01-05 22:06         ` Andrii Nakryiko
  0 siblings, 1 reply; 59+ messages in thread
From: Linus Torvalds @ 2024-01-05 20:45 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Andrii Nakryiko, bpf, netdev, paul, brauner, linux-fsdevel,
	linux-security-module, kernel-team

On Fri, 5 Jan 2024 at 12:32, Matthew Wilcox <willy@infradead.org> wrote:
>
> I can't tell from the description whether there are going to be a lot of
> these.  If there are, it might make sense to create a slab cache for
> them rather than get them from the general-purpose kmalloc caches.

I suspect it's a "count on the fingers of your hand" thing, and having
a slab cache would be more overhead than you'd ever win.

           Linus

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH bpf-next 03/29] bpf: introduce BPF token object
  2024-01-03 22:20 ` [PATCH bpf-next 03/29] bpf: introduce BPF token object Andrii Nakryiko
  2024-01-05 20:25   ` Linus Torvalds
@ 2024-01-05 21:45   ` Linus Torvalds
  2024-01-05 22:18     ` Andrii Nakryiko
                       ` (2 more replies)
  2024-01-08 11:44   ` Christian Brauner
  2 siblings, 3 replies; 59+ messages in thread
From: Linus Torvalds @ 2024-01-05 21:45 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, netdev, paul, brauner, linux-fsdevel, linux-security-module,
	kernel-team

Ok, I've gone through the whole series now, and I don't find anything
objectionable.

Which may only mean that I didn't notice something, of course, but at
least there's nothing I'd consider obvious.

I keep coming back to this 03/29 patch, because it's kind of the heart
of it, and I have one more small nit, but it's also purely stylistic:

On Wed, 3 Jan 2024 at 14:21, Andrii Nakryiko <andrii@kernel.org> wrote:
>
> +bool bpf_token_capable(const struct bpf_token *token, int cap)
> +{
> +       /* BPF token allows ns_capable() level of capabilities, but only if
> +        * token's userns is *exactly* the same as current user's userns
> +        */
> +       if (token && current_user_ns() == token->userns) {
> +               if (ns_capable(token->userns, cap))
> +                       return true;
> +               if (cap != CAP_SYS_ADMIN && ns_capable(token->userns, CAP_SYS_ADMIN))
> +                       return true;
> +       }
> +       /* otherwise fallback to capable() checks */
> +       return capable(cap) || (cap != CAP_SYS_ADMIN && capable(CAP_SYS_ADMIN));
> +}

This *feels* like it should be written as

    bool bpf_token_capable(const struct bpf_token *token, int cap)
    {
        struct user_namespace *ns = &init_user_ns;

        /* BPF token allows ns_capable() level of capabilities, but only if
         * token's userns is *exactly* the same as current user's userns
         */
        if (token && current_user_ns() == token->userns)
                ns = token->userns;
        return ns_capable(ns, cap) ||
                (cap != CAP_SYS_ADMIN && capable(CAP_SYS_ADMIN));
    }

And yes, I realize that the function will end up later growing a

        security_bpf_token_capable(token, cap)

test inside that 'if (token ..)' statement, and this would change the
order of that test so that the LSM hook would now be done before the
capability checks are done, but that all still seems just more of an
argument for the simplification.

So the end result would be something like

    bool bpf_token_capable(const struct bpf_token *token, int cap)
    {
        struct user_namespace *ns = &init_user_ns;

        if (token && current_user_ns() == token->userns) {
                if (security_bpf_token_capable(token, cap) < 0)
                        return false;
                ns = token->userns;
        }
        return ns_capable(ns, cap) ||
                (cap != CAP_SYS_ADMIN && capable(CAP_SYS_ADMIN));
    }

although I feel that with that LSM hook, maybe this all should return
the error code (zero or negative), not a bool for success?

Also, should "current_user_ns() != token->userns" perhaps be an error
condition, rather than a "fall back to init_ns" condition?

Again, none of this is a big deal. I do think you're dropping the LSM
error code on the floor, and are duplicating the "ns_capable()" vs
"capable()" logic as-is, but none of this is a deal breaker, just more
of my commentary on the patch and about the logic here.

And yeah, I don't exactly love how you say "ok, if there's a token and
it doesn't match, I'll not use it" rather than "if the token namespace
doesn't match, it's an error", but maybe there's some usability issue
here?

              Linus

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH bpf-next 03/29] bpf: introduce BPF token object
  2024-01-05 20:25   ` Linus Torvalds
  2024-01-05 20:32     ` Matthew Wilcox
@ 2024-01-05 22:05     ` Andrii Nakryiko
  2024-01-05 22:27       ` Alexei Starovoitov
  1 sibling, 1 reply; 59+ messages in thread
From: Andrii Nakryiko @ 2024-01-05 22:05 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andrii Nakryiko, bpf, netdev, paul, brauner, linux-fsdevel,
	linux-security-module, kernel-team

On Fri, Jan 5, 2024 at 12:26 PM Linus Torvalds
<torvalds@linuxfoundation.org> wrote:
>
> I'm still looking through the patches, but in the early parts I do
> note this oddity:
>
> On Wed, 3 Jan 2024 at 14:21, Andrii Nakryiko <andrii@kernel.org> wrote:
> >
> > +struct bpf_token {
> > +       struct work_struct work;
> > +       atomic64_t refcnt;
> > +       struct user_namespace *userns;
> > +       u64 allowed_cmds;
> > +};
>
> Ok, not huge, and makes sense, although I wonder if that
>
>         atomic64_t refcnt;
>
> should just be 'atomic_long_t' since presumably on 32-bit
> architectures you can't create enough references for a 64-bit atomic
> to make much sense.
>
> Or are there references to tokens that might not use any memory?
>
> Not a big deal, but 'atomic64_t' is very expensive on 32-bit
> architectures, and doesn't seem to make much sense unless you really
> specifically need 64 bits for some reason.

I used atomic64_t for consistency with other BPF objects (program,
etc.) and to not have to worry even about hypothetical overflows.
32-bit atomic performance doesn't seem to be a big concern, as a token
is passed into pretty heavy-weight operations that create new BPF
objects (map, program, BTF object); no matter how slow refcounting is,
it will be lost in the noise for those operations.

>
> But regardless, this is odd:
>
> > diff --git a/kernel/bpf/token.c b/kernel/bpf/token.c
> > +
> > +static void bpf_token_free(struct bpf_token *token)
> > +{
> > +       put_user_ns(token->userns);
> > +       kvfree(token);
> > +}
>
> > +int bpf_token_create(union bpf_attr *attr)
> > +{
> > ....
> > +       token = kvzalloc(sizeof(*token), GFP_USER);
>
> Ok, so the kvzalloc() and kvfree() certainly line up, but why use them at all?

No particular reason, kzalloc/kfree should totally work, it's a small
struct anyways. I will switch.

>
> kvmalloc() and friends are for "use kmalloc, and fall back on vmalloc
> for big allocations when that fails".
>
> For just a structure, a plain 'kzalloc()/kfree()' pair would seem to
> make much more sense.
>
> Neither of these issues are at all important, but I mention them
> because they made me go "What?" when reading through the patches.
>
>                   Linus

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH bpf-next 03/29] bpf: introduce BPF token object
  2024-01-05 20:45       ` Linus Torvalds
@ 2024-01-05 22:06         ` Andrii Nakryiko
  0 siblings, 0 replies; 59+ messages in thread
From: Andrii Nakryiko @ 2024-01-05 22:06 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Matthew Wilcox, Andrii Nakryiko, bpf, netdev, paul, brauner,
	linux-fsdevel, linux-security-module, kernel-team

On Fri, Jan 5, 2024 at 12:46 PM Linus Torvalds
<torvalds@linuxfoundation.org> wrote:
>
> On Fri, 5 Jan 2024 at 12:32, Matthew Wilcox <willy@infradead.org> wrote:
> >
> > I can't tell from the description whether there are going to be a lot of
> > these.  If there are, it might make sense to create a slab cache for
> > them rather than get them from the general-purpose kmalloc caches.
>
> I suspect it's a "count on the fingers of your hand" thing, and having
> a slab cache would be more overhead than you'd ever win.

Yes, you suspect right. It will mostly be one BPF token instance per
application, and even then only if the application is running within a
container that has BPF token set up (through a BPF FS instance). So
yeah, a slab cache seems like overkill.

>
>            Linus

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH bpf-next 03/29] bpf: introduce BPF token object
  2024-01-05 21:45   ` Linus Torvalds
@ 2024-01-05 22:18     ` Andrii Nakryiko
  2024-01-08 12:02       ` Christian Brauner
  2024-01-08 12:01     ` Christian Brauner
  2024-01-08 16:45     ` Paul Moore
  2 siblings, 1 reply; 59+ messages in thread
From: Andrii Nakryiko @ 2024-01-05 22:18 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andrii Nakryiko, bpf, netdev, paul, brauner, linux-fsdevel,
	linux-security-module, kernel-team

On Fri, Jan 5, 2024 at 1:45 PM Linus Torvalds
<torvalds@linuxfoundation.org> wrote:
>
> Ok, I've gone through the whole series now, and I don't find anything
> objectionable.

That's great, thanks for reviewing!

>
> Which may only mean that I didn't notice something, of course, but at
> least there's nothing I'd consider obvious.
>
> I keep coming back to this 03/29 patch, because it's kind of the heart
> of it, and I have one more small nit, but it's also purely stylistic:
>
> On Wed, 3 Jan 2024 at 14:21, Andrii Nakryiko <andrii@kernel.org> wrote:
> >
> > +bool bpf_token_capable(const struct bpf_token *token, int cap)
> > +{
> > +       /* BPF token allows ns_capable() level of capabilities, but only if
> > +        * token's userns is *exactly* the same as current user's userns
> > +        */
> > +       if (token && current_user_ns() == token->userns) {
> > +               if (ns_capable(token->userns, cap))
> > +                       return true;
> > +               if (cap != CAP_SYS_ADMIN && ns_capable(token->userns, CAP_SYS_ADMIN))
> > +                       return true;
> > +       }
> > +       /* otherwise fallback to capable() checks */
> > +       return capable(cap) || (cap != CAP_SYS_ADMIN && capable(CAP_SYS_ADMIN));
> > +}
>
> This *feels* like it should be written as
>
>     bool bpf_token_capable(const struct bpf_token *token, int cap)
>     {
>         struct user_namespace *ns = &init_user_ns;
>
>         /* BPF token allows ns_capable() level of capabilities, but only if
>          * token's userns is *exactly* the same as current user's userns
>          */
>         if (token && current_user_ns() == token->userns)
>                 ns = token->userns;
>         return ns_capable(ns, cap) ||
>                 (cap != CAP_SYS_ADMIN && capable(CAP_SYS_ADMIN));
>     }
>
> And yes, I realize that the function will end up later growing a
>
>         security_bpf_token_capable(token, cap)
>
> test inside that 'if (token ..)' statement, and this would change the
> order of that test so that the LSM hook would now be done before the
> capability checks are done, but that all still seems just more of an
> argument for the simplification.
>
> So the end result would be something like
>
>     bool bpf_token_capable(const struct bpf_token *token, int cap)
>     {
>         struct user_namespace *ns = &init_user_ns;
>
>         if (token && current_user_ns() == token->userns) {
>                 if (security_bpf_token_capable(token, cap) < 0)
>                         return false;
>                 ns = token->userns;
>         }
>         return ns_capable(ns, cap) ||
>                 (cap != CAP_SYS_ADMIN && capable(CAP_SYS_ADMIN));
>     }

Yep, it makes sense to use ns_capable with init_ns. I'll change those
two patches to end up with something like what you suggested here.

>
> although I feel that with that LSM hook, maybe this all should return
> the error code (zero or negative), not a bool for success?
>
> Also, should "current_user_ns() != token->userns" perhaps be an error
> condition, rather than a "fall back to init_ns" condition?
>
> Again, none of this is a big deal. I do think you're dropping the LSM
> error code on the floor, and are duplicating the "ns_capable()" vs
> "capable()" logic as-is, but none of this is a deal breaker, just more
> of my commentary on the patch and about the logic here.
>
> And yeah, I don't exactly love how you say "ok, if there's a token and
> it doesn't match, I'll not use it" rather than "if the token namespace
> doesn't match, it's an error", but maybe there's some usability issue
> here?

Yes, usability was the primary concern. The overall idea with BPF
token is that most BPF applications shouldn't have to care about, or
even know of, its existence, and that it's mostly up to administrators
and/or container managers to set up an environment with BPF token
delegation. To make that all possible, libbpf will opportunistically
try to create a BPF token from the BPF FS in the container (typically
/sys/fs/bpf, but it can be tuned, of course). And so if a BPF token
could actually prevent, say, BPF program loading, because it didn't
allow a particular program type or whatnot, that would be a behavioral
regression relative to never using a BPF token at all.
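
To make that flow concrete, here is a minimal sketch of what such
opportunistic token creation could look like (assuming the
BPF_TOKEN_CREATE uapi from this series with a token_create.bpffs_fd
attr field; the helper name is made up):

    #include <fcntl.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/syscall.h>
    #include <linux/bpf.h>

    /* Hypothetical helper: opportunistically derive a BPF token from the
     * container's BPF FS mount; on any failure just proceed without a
     * token, preserving pre-token behavior.
     */
    static int try_get_bpf_token(const char *bpffs_path)
    {
            union bpf_attr attr;
            int bpffs_fd, token_fd;

            bpffs_fd = open(bpffs_path, O_RDONLY);
            if (bpffs_fd < 0)
                    return -1; /* no (delegated) BPF FS, silently skip token */

            memset(&attr, 0, sizeof(attr));
            attr.token_create.bpffs_fd = bpffs_fd;
            token_fd = syscall(__NR_bpf, BPF_TOKEN_CREATE, &attr, sizeof(attr));
            close(bpffs_fd);
            return token_fd; /* may be -1; callers treat token as optional */
    }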

So I consciously wanted behavior in which a BPF token can grant a set
of potential/additional rights, but otherwise we just fall back to the
current behavior based on capable(CAP_BPF) and the other caps we use.

The alternative to the above would be creating a few more APIs to
proactively check whether a given BPF token instance would allow
whatever operation libbpf needs to perform, and, if not, not using it.
That would achieve the exact same behavior, just in a more roundabout
way.

And the last piece of thinking was that if the user actually wanted
the bpf() operation to fail when the BPF token doesn't grant such
permissions, we could add a flag that forces this behavior, some sort
of optionally-specified BPF_F_TOKEN_STRICT. But I wanted to wait for an
actual production use case that needs that (I'm not aware of any right
now).

>
>               Linus

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH bpf-next 03/29] bpf: introduce BPF token object
  2024-01-05 22:05     ` Andrii Nakryiko
@ 2024-01-05 22:27       ` Alexei Starovoitov
  0 siblings, 0 replies; 59+ messages in thread
From: Alexei Starovoitov @ 2024-01-05 22:27 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Linus Torvalds, Andrii Nakryiko, bpf, Network Development,
	Paul Moore, Christian Brauner, Linux-Fsdevel, LSM List,
	Kernel Team

On Fri, Jan 5, 2024 at 2:06 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Fri, Jan 5, 2024 at 12:26 PM Linus Torvalds
> <torvalds@linuxfoundation.org> wrote:
> >
> > I'm still looking through the patches, but in the early parts I do
> > note this oddity:
> >
> > On Wed, 3 Jan 2024 at 14:21, Andrii Nakryiko <andrii@kernel.org> wrote:
> > >
> > > +struct bpf_token {
> > > +       struct work_struct work;
> > > +       atomic64_t refcnt;
> > > +       struct user_namespace *userns;
> > > +       u64 allowed_cmds;
> > > +};
> >
> > Ok, not huge, and makes sense, although I wonder if that
> >
> >         atomic64_t refcnt;
> >
> > should just be 'atomic_long_t' since presumably on 32-bit
> > architectures you can't create enough references for a 64-bit atomic
> > to make much sense.
> >
> > Or are there references to tokens that might not use any memory?
> >
> > Not a big deal, but 'atomic64_t' is very expensive on 32-bit
> > architectures, and doesn't seem to make much sense unless you really
> > specifically need 64 bits for some reason.
>
> I used atomic64_t for consistency with other BPF objects (program,
> etc.) and to not have to worry even about hypothetical overflows.
> 32-bit atomic performance doesn't seem to be a big concern, as a token
> is passed into pretty heavy-weight operations that create a new BPF
> object (map, program, BTF object); no matter how slow the refcounting
> is, it will be lost in the noise of those operations.

To add a bit more context here...

Back in 2016 Jann managed to overflow 32-bit prog/map counters:
"
On a system with >32Gbyte of physical memory,
the malicious application may overflow 32-bit bpf program refcnt.
It's also possible to overflow map refcnt on 1Tb system.
"
We mitigated that with fixed limits:
-       atomic_inc(&map->refcnt);
+       if (atomic_inc_return(&map->refcnt) > BPF_MAX_REFCNT) {
+               atomic_dec(&map->refcnt);
+               return ERR_PTR(-EBUSY);
+       }
but it created quite a lot of error handling pain throughout
the code, so eventually in 2019 we switched to atomic64_t refcnt
and never looked back.
I suspect Jann will be able to overflow 32-bit token refcnt,
so atomic64 was chosen for simplicity.
atomic_long_t might work too, but the effort to think it through
is not worth it at this point, since performance of
inc/dec doesn't matter here.

Eventually we can do a follow-up and consistently update
all such counters to atomic_long_t.
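
To illustrate the difference in caller-side complexity, here is a
userspace analogue of the two styles using C11 atomics (a sketch; the
kernel uses atomic_t/atomic64_t, and MAX_REFCNT below stands in for
BPF_MAX_REFCNT):

    #include <stdatomic.h>
    #include <stdint.h>

    #define MAX_REFCNT (INT32_MAX - 1) /* analogue of BPF_MAX_REFCNT */

    /* 32-bit style: every acquire can fail, so all callers grow error paths */
    static int obj_get_32(atomic_int *refcnt)
    {
            if (atomic_fetch_add(refcnt, 1) + 1 > MAX_REFCNT) {
                    atomic_fetch_sub(refcnt, 1);
                    return -1; /* -EBUSY in the kernel version above */
            }
            return 0;
    }

    /* 64-bit style: overflow is unreachable in practice, acquire can't fail */
    static void obj_get_64(atomic_llong *refcnt)
    {
            atomic_fetch_add(refcnt, 1);
    }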

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH bpf-next 03/29] bpf: introduce BPF token object
  2024-01-03 22:20 ` [PATCH bpf-next 03/29] bpf: introduce BPF token object Andrii Nakryiko
  2024-01-05 20:25   ` Linus Torvalds
  2024-01-05 21:45   ` Linus Torvalds
@ 2024-01-08 11:44   ` Christian Brauner
  2 siblings, 0 replies; 59+ messages in thread
From: Christian Brauner @ 2024-01-08 11:44 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, netdev, paul, torvalds, linux-fsdevel, linux-security-module,
	kernel-team

On Wed, Jan 03, 2024 at 02:20:08PM -0800, Andrii Nakryiko wrote:
> Add a new kind of BPF kernel object, BPF token. BPF token is meant to
> allow delegating privileged BPF functionality, like loading a BPF
> program or creating a BPF map, from a privileged process to a *trusted*
> unprivileged process, all while retaining a good amount of control over
> which privileged operations can be performed using the provided BPF
> token.
> 
> This is achieved by mounting a BPF FS instance with extra delegation
> mount options, which determine what operations are delegatable, and by
> constraining it to the owning user namespace (as mentioned in the
> previous patch).
> 
> BPF token itself is just a derivative of BPF FS and can be created
> through a new bpf() syscall command, BPF_TOKEN_CREATE, which accepts a
> BPF FS FD, attainable through the open() API by opening the BPF FS
> mount point. Currently, BPF token "inherits" the delegated command,
> map type, prog type, and attach type bit sets from BPF FS as is. In the
> future, with BPF token being a separate object with its own FD, we can
> allow further restricting a BPF token's allowable set of things, either
> at creation time or after the fact, letting a process guard itself from
> unintentionally trying to load undesired kinds of BPF programs. But for
> now we keep things simple and just copy the bit sets as is.
> 
> When a BPF token is created from a BPF FS mount, we take a reference to
> the BPF super block's owning user namespace, and then use that
> namespace for checking all the {CAP_BPF, CAP_PERFMON, CAP_NET_ADMIN,
> CAP_SYS_ADMIN} capabilities that are normally only checked against the
> init userns (using capable()), but are now checked using ns_capable()
> instead (if a BPF token is provided). See bpf_token_capable() for
> details.
> 
> Such a setup means that a BPF token by itself is not sufficient to
> grant BPF functionality. The user-namespaced process *also* has to have
> the necessary combination of capabilities inside that user namespace.
> So while previously CAP_BPF was useless when granted within a user
> namespace, now it gains a meaning and allows container managers and
> sysadmins flexible control over which processes can and need to use BPF
> functionality within the user namespace (i.e., a container, in
> practice). And BPF FS delegation mount options and derived BPF tokens
> serve as a per-container "flag" to grant the overall ability to use
> bpf() (plus further restrict which parts of the bpf() syscall are
> treated as namespaced).
> 
> Note also that the BPF_TOKEN_CREATE command itself requires
> ns_capable(CAP_BPF) within the BPF FS owning user namespace, rounding
> out the ns_capable() story of BPF token.
> 
> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> ---
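
As an illustration of the setup flow the description above implies, a
privileged container manager might do roughly the following (a sketch
assuming the delegate_* mount options from this series and the new
mount API; error handling omitted):

    #include <unistd.h>
    #include <sys/mount.h> /* fsopen()/fsconfig()/fsmount(), glibc >= 2.36 */

    /* Create a userns-bound BPF FS superblock with delegation enabled.
     * This must run inside (or on behalf of) the container's user
     * namespace so the superblock ends up owned by it.
     */
    static int setup_delegated_bpffs(void)
    {
            int fs_fd, mnt_fd;

            fs_fd = fsopen("bpf", 0);
            /* "any" is the catch-all spelling for the delegate options */
            fsconfig(fs_fd, FSCONFIG_SET_STRING, "delegate_cmds", "any", 0);
            fsconfig(fs_fd, FSCONFIG_SET_STRING, "delegate_maps", "any", 0);
            fsconfig(fs_fd, FSCONFIG_SET_STRING, "delegate_progs", "any", 0);
            fsconfig(fs_fd, FSCONFIG_SET_STRING, "delegate_attachs", "any", 0);
            fsconfig(fs_fd, FSCONFIG_CMD_CREATE, NULL, NULL, 0);
            mnt_fd = fsmount(fs_fd, 0, 0);
            close(fs_fd);
            /* mnt_fd can then be move_mount()ed at /sys/fs/bpf in the container */
            return mnt_fd;
    }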

Acked-by: Christian Brauner <brauner@kernel.org>

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH bpf-next 03/29] bpf: introduce BPF token object
  2024-01-05 21:45   ` Linus Torvalds
  2024-01-05 22:18     ` Andrii Nakryiko
@ 2024-01-08 12:01     ` Christian Brauner
  2024-01-08 16:45     ` Paul Moore
  2 siblings, 0 replies; 59+ messages in thread
From: Christian Brauner @ 2024-01-08 12:01 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andrii Nakryiko, bpf, netdev, paul, linux-fsdevel,
	linux-security-module, kernel-team

> Also, should "current_user_ns() != token->userns" perhaps be an error
> condition, rather than a "fall back to init_ns" condition?

Yes, I've pointed this out before:

"Please enforce that in order to use a token the caller must be in the
same user namespace as the token as well. IOW, we don't want to yet make
it possible to use a token created in an ancestor user namespace to load
or attach bpf programs in a descendant user namespace. Let's be as
restrictive as we can: tokens are only valid within the user namespace
they were created in."

[1] Re: [PATCH v11 bpf-next 03/17] bpf: introduce BPF token object
    https://lore.kernel.org/r/20231130-katzen-anhand-7ad530f187da@brauner

> 
> Again, none of this is a big deal. I do think you're dropping the LSM
> error code on the floor, and are duplicating the "ns_capable()" vs
> "capable()" logic as-is, but none of this is a deal breaker, just more
> of my commentary on the patch and about the logic here.
> 
> And yeah, I don't exactly love how you say "ok, if there's a token and
> it doesn't match, I'll not use it" rather than "if the token namespace
> doesn't match, it's an error", but maybe there's some usability issue
> here?

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH bpf-next 03/29] bpf: introduce BPF token object
  2024-01-05 22:18     ` Andrii Nakryiko
@ 2024-01-08 12:02       ` Christian Brauner
  2024-01-08 23:58         ` Andrii Nakryiko
  0 siblings, 1 reply; 59+ messages in thread
From: Christian Brauner @ 2024-01-08 12:02 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Linus Torvalds, Andrii Nakryiko, bpf, netdev, paul, linux-fsdevel,
	linux-security-module, kernel-team

On Fri, Jan 05, 2024 at 02:18:40PM -0800, Andrii Nakryiko wrote:
> On Fri, Jan 5, 2024 at 1:45 PM Linus Torvalds
> <torvalds@linuxfoundation.org> wrote:
> >
> > Ok, I've gone through the whole series now, and I don't find anything
> > objectionable.
> 
> That's great, thanks for reviewing!
> 
> >
> > Which may only mean that I didn't notice something, of course, but at
> > least there's nothing I'd consider obvious.
> >
> > I keep coming back to this 03/29 patch, because it's kind of the heart
> > of it, and I have one more small nit, but it's also purely stylistic:
> >
> > On Wed, 3 Jan 2024 at 14:21, Andrii Nakryiko <andrii@kernel.org> wrote:
> > >
> > > +bool bpf_token_capable(const struct bpf_token *token, int cap)
> > > +{
> > > +       /* BPF token allows ns_capable() level of capabilities, but only if
> > > +        * token's userns is *exactly* the same as current user's userns
> > > +        */
> > > +       if (token && current_user_ns() == token->userns) {
> > > +               if (ns_capable(token->userns, cap))
> > > +                       return true;
> > > +               if (cap != CAP_SYS_ADMIN && ns_capable(token->userns, CAP_SYS_ADMIN))
> > > +                       return true;
> > > +       }
> > > +       /* otherwise fallback to capable() checks */
> > > +       return capable(cap) || (cap != CAP_SYS_ADMIN && capable(CAP_SYS_ADMIN));
> > > +}
> >
> > This *feels* like it should be written as
> >
> >     bool bpf_token_capable(const struct bpf_token *token, int cap)
> >     {
> >         struct user_namespace *ns = &init_ns;
> >
> >         /* BPF token allows ns_capable() level of capabilities, but only if
> >          * token's userns is *exactly* the same as current user's userns
> >          */
> >         if (token && current_user_ns() == token->userns)
> >                 ns = token->userns;
> >         return ns_capable(ns, cap) ||
> >                 (cap != CAP_SYS_ADMIN && capable(CAP_SYS_ADMIN));
> >     }
> >
> > And yes, I realize that the function will end up later growing a
> >
> >         security_bpf_token_capable(token, cap)
> >
> > test inside that 'if (token ..)' statement, and this would change the
> > order of that test so that the LSM hook would now be done before the
> > capability checks are done, but that all still seems just more of an
> > argument for the simplification.
> >
> > So the end result would be something like
> >
> >     bool bpf_token_capable(const struct bpf_token *token, int cap)
> >     {
> >         struct user_namespace *ns = &init_ns;
> >
> >         if (token && current_user_ns() == token->userns) {
> >                 if (security_bpf_token_capable(token, cap) < 0)
> >                         return false;
> >                 ns = token->userns;
> >         }
> >         return ns_capable(ns, cap) ||
> >                 (cap != CAP_SYS_ADMIN && capable(CAP_SYS_ADMIN));
> >     }
> 
> Yep, it makes sense to use ns_capable with init_ns. I'll change those
> two patches to end up with something like what you suggested here.
> 
> >
> > although I feel that with that LSM hook, maybe this all should return
> > the error code (zero or negative), not a bool for success?
> >
> > Also, should "current_user_ns() != token->userns" perhaps be an error
> > condition, rather than a "fall back to init_ns" condition?
> >
> > Again, none of this is a big deal. I do think you're dropping the LSM
> > error code on the floor, and are duplicating the "ns_capable()" vs
> > "capable()" logic as-is, but none of this is a deal breaker, just more
> > of my commentary on the patch and about the logic here.
> >
> > And yeah, I don't exactly love how you say "ok, if there's a token and
> > it doesn't match, I'll not use it" rather than "if the token namespace
> > doesn't match, it's an error", but maybe there's some usability issue
> > here?
> 
> Yes, usability was the primary concern. The overall idea with BPF

NAK on not restricting this to not erroring out on current_user_ns()
!= token->user_ns. I've said this multiple times before.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH bpf-next 03/29] bpf: introduce BPF token object
  2024-01-05 21:45   ` Linus Torvalds
  2024-01-05 22:18     ` Andrii Nakryiko
  2024-01-08 12:01     ` Christian Brauner
@ 2024-01-08 16:45     ` Paul Moore
  2024-01-09  0:07       ` Andrii Nakryiko
  2 siblings, 1 reply; 59+ messages in thread
From: Paul Moore @ 2024-01-08 16:45 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andrii Nakryiko, bpf, netdev, brauner, linux-fsdevel,
	linux-security-module, kernel-team

On Fri, Jan 5, 2024 at 4:45 PM Linus Torvalds
<torvalds@linuxfoundation.org> wrote:
> On Wed, 3 Jan 2024 at 14:21, Andrii Nakryiko <andrii@kernel.org> wrote:
> >
> > +bool bpf_token_capable(const struct bpf_token *token, int cap)
> > +{
> > +       /* BPF token allows ns_capable() level of capabilities, but only if
> > +        * token's userns is *exactly* the same as current user's userns
> > +        */
> > +       if (token && current_user_ns() == token->userns) {
> > +               if (ns_capable(token->userns, cap))
> > +                       return true;
> > +               if (cap != CAP_SYS_ADMIN && ns_capable(token->userns, CAP_SYS_ADMIN))
> > +                       return true;
> > +       }
> > +       /* otherwise fallback to capable() checks */
> > +       return capable(cap) || (cap != CAP_SYS_ADMIN && capable(CAP_SYS_ADMIN));
> > +}
>
> This *feels* like it should be written as
>
>     bool bpf_token_capable(const struct bpf_token *token, int cap)
>     {
>         struct user_namespace *ns = &init_ns;
>
>         /* BPF token allows ns_capable() level of capabilities, but only if
>          * token's userns is *exactly* the same as current user's userns
>          */
>         if (token && current_user_ns() == token->userns)
>                 ns = token->userns;
>         return ns_capable(ns, cap) ||
>                 (cap != CAP_SYS_ADMIN && capable(CAP_SYS_ADMIN));
>     }
>
> And yes, I realize that the function will end up later growing a
>
>         security_bpf_token_capable(token, cap)
>
> test inside that 'if (token ..)' statement, and this would change the
> order of that test so that the LSM hook would now be done before the
> capability checks are done, but that all still seems just more of an
> argument for the simplification.

I have no problem with rewriting things; my only ask is that we stick
with the idea of doing the capability checks before the LSM hook.  The
DAC-before-MAC (capability-before-LSM) pattern is one we try to follow
almost everywhere in the kernel, and deviating from it here could
potentially result in some odd/unexpected behavior from a user
perspective.
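
Concretely, keeping DAC before MAC with the suggested shape would look
something like this (a sketch, not the final code):

    bool bpf_token_capable(const struct bpf_token *token, int cap)
    {
            struct user_namespace *ns = &init_user_ns;

            if (token && current_user_ns() == token->userns)
                    ns = token->userns;
            /* DAC first: the capability checks... */
            if (!ns_capable(ns, cap) &&
                (cap == CAP_SYS_ADMIN || !ns_capable(ns, CAP_SYS_ADMIN)))
                    return false;
            /* ...then MAC: give LSMs a chance to veto use of the token */
            if (token && current_user_ns() == token->userns)
                    return security_bpf_token_capable(token, cap) == 0;
            return true;
    }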

-- 
paul-moore.com

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH bpf-next 03/29] bpf: introduce BPF token object
  2024-01-08 12:02       ` Christian Brauner
@ 2024-01-08 23:58         ` Andrii Nakryiko
  2024-01-09 14:52           ` Christian Brauner
  0 siblings, 1 reply; 59+ messages in thread
From: Andrii Nakryiko @ 2024-01-08 23:58 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Linus Torvalds, Andrii Nakryiko, bpf, netdev, paul, linux-fsdevel,
	linux-security-module, kernel-team

On Mon, Jan 8, 2024 at 4:02 AM Christian Brauner <brauner@kernel.org> wrote:
>
> On Fri, Jan 05, 2024 at 02:18:40PM -0800, Andrii Nakryiko wrote:
> > On Fri, Jan 5, 2024 at 1:45 PM Linus Torvalds
> > <torvalds@linuxfoundation.org> wrote:
> > >
> > > Ok, I've gone through the whole series now, and I don't find anything
> > > objectionable.
> >
> > That's great, thanks for reviewing!
> >
> > >
> > > Which may only mean that I didn't notice something, of course, but at
> > > least there's nothing I'd consider obvious.
> > >
> > > I keep coming back to this 03/29 patch, because it's kind of the heart
> > > of it, and I have one more small nit, but it's also purely stylistic:
> > >
> > > On Wed, 3 Jan 2024 at 14:21, Andrii Nakryiko <andrii@kernel.org> wrote:
> > > >
> > > > +bool bpf_token_capable(const struct bpf_token *token, int cap)
> > > > +{
> > > > +       /* BPF token allows ns_capable() level of capabilities, but only if
> > > > +        * token's userns is *exactly* the same as current user's userns
> > > > +        */
> > > > +       if (token && current_user_ns() == token->userns) {
> > > > +               if (ns_capable(token->userns, cap))
> > > > +                       return true;
> > > > +               if (cap != CAP_SYS_ADMIN && ns_capable(token->userns, CAP_SYS_ADMIN))
> > > > +                       return true;
> > > > +       }
> > > > +       /* otherwise fallback to capable() checks */
> > > > +       return capable(cap) || (cap != CAP_SYS_ADMIN && capable(CAP_SYS_ADMIN));
> > > > +}
> > >
> > > This *feels* like it should be written as
> > >
> > >     bool bpf_token_capable(const struct bpf_token *token, int cap)
> > >     {
> > >         struct user_namespace *ns = &init_ns;
> > >
> > >         /* BPF token allows ns_capable() level of capabilities, but only if
> > >          * token's userns is *exactly* the same as current user's userns
> > >          */
> > >         if (token && current_user_ns() == token->userns)
> > >                 ns = token->userns;
> > >         return ns_capable(ns, cap) ||
> > >                 (cap != CAP_SYS_ADMIN && capable(CAP_SYS_ADMIN));
> > >     }
> > >
> > > And yes, I realize that the function will end up later growing a
> > >
> > >         security_bpf_token_capable(token, cap)
> > >
> > > test inside that 'if (token ..)' statement, and this would change the
> > > order of that test so that the LSM hook would now be done before the
> > > capability checks are done, but that all still seems just more of an
> > > argument for the simplification.
> > >
> > > So the end result would be something like
> > >
> > >     bool bpf_token_capable(const struct bpf_token *token, int cap)
> > >     {
> > >         struct user_namespace *ns = &init_ns;
> > >
> > >         if (token && current_user_ns() == token->userns) {
> > >                 if (security_bpf_token_capable(token, cap) < 0)
> > >                         return false;
> > >                 ns = token->userns;
> > >         }
> > >         return ns_capable(ns, cap) ||
> > >                 (cap != CAP_SYS_ADMIN && capable(CAP_SYS_ADMIN));
> > >     }
> >
> > Yep, it makes sense to use ns_capable with init_ns. I'll change those
> > two patches to end up with something like what you suggested here.
> >
> > >
> > > although I feel that with that LSM hook, maybe this all should return
> > > the error code (zero or negative), not a bool for success?
> > >
> > > Also, should "current_user_ns() != token->userns" perhaps be an error
> > > condition, rather than a "fall back to init_ns" condition?
> > >
> > > Again, none of this is a big deal. I do think you're dropping the LSM
> > > error code on the floor, and are duplicating the "ns_capable()" vs
> > > "capable()" logic as-is, but none of this is a deal breaker, just more
> > > of my commentary on the patch and about the logic here.
> > >
> > > And yeah, I don't exactly love how you say "ok, if there's a token and
> > > it doesn't match, I'll not use it" rather than "if the token namespace
> > > doesn't match, it's an error", but maybe there's some usability issue
> > > here?
> >
> > Yes, usability was the primary concern. The overall idea with BPF
>
> NAK on not restricting this to not erroring out on current_user_ns()
> != token->user_ns. I've said this multiple times before.

I do restrict token usage to the *exact* userns in which the token was
created. See bpf_token_capable()'s

if (token && current_user_ns() == token->userns) { ... }

and in bpf_token_allow_cmd():

if (!token || current_user_ns() != token->userns)
    return false;
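
For context, the full helper in the series is roughly that check plus
a bitmask test against the allowed_cmds field quoted earlier,
approximately:

    bool bpf_token_allow_cmd(const struct bpf_token *token, enum bpf_cmd cmd)
    {
            if (!token || current_user_ns() != token->userns)
                    return false;
            return token->allowed_cmds & (1ULL << cmd);
    }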

So I followed what you asked in [1] (just like I said I will in [2]),
unless I made some stupid mistake which I cannot even see.


What we are discussing here is a different question. It's the
difference between erroring out (that is, failing whatever BPF
operation was attempted with such a token, i.e., program loading or map
creation) vs. ignoring the token altogether and just using
init_ns-based capable() checks. The latter is vastly more user-friendly
when considering end-to-end integration with user-space applications
and tooling, and it doesn't seem to open any security holes.

  [1] https://lore.kernel.org/r/20231130-katzen-anhand-7ad530f187da@brauner
  [2] https://lore.kernel.org/all/CAEf4BzZA2or352VkAaBsr+fsWAGO1Cs_gonH7Ffm5emXGE+2Ug@mail.gmail.com/

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH bpf-next 03/29] bpf: introduce BPF token object
  2024-01-08 16:45     ` Paul Moore
@ 2024-01-09  0:07       ` Andrii Nakryiko
  2024-01-10 19:29         ` Paul Moore
  0 siblings, 1 reply; 59+ messages in thread
From: Andrii Nakryiko @ 2024-01-09  0:07 UTC (permalink / raw)
  To: Paul Moore
  Cc: Linus Torvalds, Andrii Nakryiko, bpf, netdev, brauner,
	linux-fsdevel, linux-security-module, kernel-team

On Mon, Jan 8, 2024 at 8:45 AM Paul Moore <paul@paul-moore.com> wrote:
>
> On Fri, Jan 5, 2024 at 4:45 PM Linus Torvalds
> <torvalds@linuxfoundation.org> wrote:
> > On Wed, 3 Jan 2024 at 14:21, Andrii Nakryiko <andrii@kernel.org> wrote:
> > >
> > > +bool bpf_token_capable(const struct bpf_token *token, int cap)
> > > +{
> > > +       /* BPF token allows ns_capable() level of capabilities, but only if
> > > +        * token's userns is *exactly* the same as current user's userns
> > > +        */
> > > +       if (token && current_user_ns() == token->userns) {
> > > +               if (ns_capable(token->userns, cap))
> > > +                       return true;
> > > +               if (cap != CAP_SYS_ADMIN && ns_capable(token->userns, CAP_SYS_ADMIN))
> > > +                       return true;
> > > +       }
> > > +       /* otherwise fallback to capable() checks */
> > > +       return capable(cap) || (cap != CAP_SYS_ADMIN && capable(CAP_SYS_ADMIN));
> > > +}
> >
> > This *feels* like it should be written as
> >
> >     bool bpf_token_capable(const struct bpf_token *token, int cap)
> >     {
> >         struct user_namespace *ns = &init_ns;
> >
> >         /* BPF token allows ns_capable() level of capabilities, but only if
> >          * token's userns is *exactly* the same as current user's userns
> >          */
> >         if (token && current_user_ns() == token->userns)
> >                 ns = token->userns;
> >         return ns_capable(ns, cap) ||
> >                 (cap != CAP_SYS_ADMIN && capable(CAP_SYS_ADMIN));
> >     }
> >
> > And yes, I realize that the function will end up later growing a
> >
> >         security_bpf_token_capable(token, cap)
> >
> > test inside that 'if (token ..)' statement, and this would change the
> > order of that test so that the LSM hook would now be done before the
> > capability checks are done, but that all still seems just more of an
> > argument for the simplification.
>
> I have no problem with rewriting things, my only ask is that we stick
> with the idea of doing the capability checks before the LSM hook.  The
> DAC-before-MAC (capability-before-LSM) pattern is one we try to stick
> to most everywhere in the kernel and deviating from it here could
> potentially result in some odd/unexpected behavior from a user
> perspective.

Makes sense, Paul. With the suggested rewrite we'd get an LSM call
before we get to ns_capable() (which, generally speaking, we avoid
doing in the BPF code base after someone called this out earlier).
Hmm...

I guess it will be better to keep this logic as is, then; I believe it
was more of a subjective stylistic nit from Linus, so it's probably OK
to keep the existing code.

Alternatively we could do something like:

struct user_namespace *ns = &init_user_ns;

if (token && current_user_ns() == token->userns)
    ns = token->userns;
else
    token = NULL;

if (ns_capable(ns, cap) ||
    (cap != CAP_SYS_ADMIN && ns_capable(ns, CAP_SYS_ADMIN))) {
    if (token)
        return security_bpf_token_capable(token, cap) == 0;
    return true;
}
return false;

Or something along those lines? I don't particularly care (though the
latter seems a bit more ceremonious), so please let me know the
preference, if any.


>
> --
> paul-moore.com

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH bpf-next 03/29] bpf: introduce BPF token object
  2024-01-08 23:58         ` Andrii Nakryiko
@ 2024-01-09 14:52           ` Christian Brauner
  2024-01-09 19:00             ` Andrii Nakryiko
  0 siblings, 1 reply; 59+ messages in thread
From: Christian Brauner @ 2024-01-09 14:52 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Linus Torvalds, Andrii Nakryiko, bpf, netdev, paul, linux-fsdevel,
	linux-security-module, kernel-team

On Mon, Jan 08, 2024 at 03:58:47PM -0800, Andrii Nakryiko wrote:
> On Mon, Jan 8, 2024 at 4:02 AM Christian Brauner <brauner@kernel.org> wrote:
> >
> > On Fri, Jan 05, 2024 at 02:18:40PM -0800, Andrii Nakryiko wrote:
> > > On Fri, Jan 5, 2024 at 1:45 PM Linus Torvalds
> > > <torvalds@linuxfoundation.org> wrote:
> > > >
> > > > Ok, I've gone through the whole series now, and I don't find anything
> > > > objectionable.
> > >
> > > That's great, thanks for reviewing!
> > >
> > > >
> > > > Which may only mean that I didn't notice something, of course, but at
> > > > least there's nothing I'd consider obvious.
> > > >
> > > > I keep coming back to this 03/29 patch, because it's kind of the heart
> > > > of it, and I have one more small nit, but it's also purely stylistic:
> > > >
> > > > On Wed, 3 Jan 2024 at 14:21, Andrii Nakryiko <andrii@kernel.org> wrote:
> > > > >
> > > > > +bool bpf_token_capable(const struct bpf_token *token, int cap)
> > > > > +{
> > > > > +       /* BPF token allows ns_capable() level of capabilities, but only if
> > > > > +        * token's userns is *exactly* the same as current user's userns
> > > > > +        */
> > > > > +       if (token && current_user_ns() == token->userns) {
> > > > > +               if (ns_capable(token->userns, cap))
> > > > > +                       return true;
> > > > > +               if (cap != CAP_SYS_ADMIN && ns_capable(token->userns, CAP_SYS_ADMIN))
> > > > > +                       return true;
> > > > > +       }
> > > > > +       /* otherwise fallback to capable() checks */
> > > > > +       return capable(cap) || (cap != CAP_SYS_ADMIN && capable(CAP_SYS_ADMIN));
> > > > > +}
> > > >
> > > > This *feels* like it should be written as
> > > >
> > > >     bool bpf_token_capable(const struct bpf_token *token, int cap)
> > > >     {
> > > >         struct user_namespace *ns = &init_ns;
> > > >
> > > >         /* BPF token allows ns_capable() level of capabilities, but only if
> > > >          * token's userns is *exactly* the same as current user's userns
> > > >          */
> > > >         if (token && current_user_ns() == token->userns)
> > > >                 ns = token->userns;
> > > >         return ns_capable(ns, cap) ||
> > > >                 (cap != CAP_SYS_ADMIN && capable(CAP_SYS_ADMIN));
> > > >     }
> > > >
> > > > And yes, I realize that the function will end up later growing a
> > > >
> > > >         security_bpf_token_capable(token, cap)
> > > >
> > > > test inside that 'if (token ..)' statement, and this would change the
> > > > order of that test so that the LSM hook would now be done before the
> > > > capability checks are done, but that all still seems just more of an
> > > > argument for the simplification.
> > > >
> > > > So the end result would be something like
> > > >
> > > >     bool bpf_token_capable(const struct bpf_token *token, int cap)
> > > >     {
> > > >         struct user_namespace *ns = &init_ns;
> > > >
> > > >         if (token && current_user_ns() == token->userns) {
> > > >                 if (security_bpf_token_capable(token, cap) < 0)
> > > >                         return false;
> > > >                 ns = token->userns;
> > > >         }
> > > >         return ns_capable(ns, cap) ||
> > > >                 (cap != CAP_SYS_ADMIN && capable(CAP_SYS_ADMIN));
> > > >     }
> > >
> > > Yep, it makes sense to use ns_capable with init_ns. I'll change those
> > > two patches to end up with something like what you suggested here.
> > >
> > > >
> > > > although I feel that with that LSM hook, maybe this all should return
> > > > the error code (zero or negative), not a bool for success?
> > > >
> > > > Also, should "current_user_ns() != token->userns" perhaps be an error
> > > > condition, rather than a "fall back to init_ns" condition?
> > > >
> > > > Again, none of this is a big deal. I do think you're dropping the LSM
> > > > error code on the floor, and are duplicating the "ns_capable()" vs
> > > > "capable()" logic as-is, but none of this is a deal breaker, just more
> > > > of my commentary on the patch and about the logic here.
> > > >
> > > > And yeah, I don't exactly love how you say "ok, if there's a token and
> > > > it doesn't match, I'll not use it" rather than "if the token namespace
> > > > doesn't match, it's an error", but maybe there's some usability issue
> > > > here?
> > >
> > > Yes, usability was the primary concern. The overall idea with BPF
> >
> > NAK on not restricting this to not erroring out on current_user_ns()
> > != token->user_ns. I've said this multiple times before.
> 
> I do restrict token usage to *exact* userns in which the token was
> created. See bpf_token_capable()'s
> 
> if (token && current_user_ns() == token->userns) { ... }
> 
> and in bpf_token_allow_cmd():
> 
> if (!token || current_user_ns() != token->userns)
>     return false;
> 
> So I followed what you asked in [1] (just like I said I will in [2]),
> unless I made some stupid mistake which I cannot even see.
> 
> 
> What we are discussing here is a different question. It's the
> difference between erroring out (that is, failing whatever BPF
> operation was attempted with such token, i.e., program loading or map
> creation) vs ignoring the token altogether and just using
> init_ns-based capable() checks. And the latter is vastly more user

Look at this:

+bool bpf_token_capable(const struct bpf_token *token, int cap)
+{
+       /* BPF token allows ns_capable() level of capabilities, but only if
+        * token's userns is *exactly* the same as current user's userns
+        */
+       if (token && current_user_ns() == token->userns) {
+               if (ns_capable(token->userns, cap))
+                       return true;
+               if (cap != CAP_SYS_ADMIN && ns_capable(token->userns, CAP_SYS_ADMIN))
+                       return true;
+       }
+       /* otherwise fallback to capable() checks */
+       return capable(cap) || (cap != CAP_SYS_ADMIN && capable(CAP_SYS_ADMIN));
+}

How on earth is it possible that the calling task is in a user
namespace, i.e. current_user_ns() == token->userns, while at the same
time being capable in the initial user namespace? When you enter an
unprivileged user namespace you lose all capabilities against your
ancestor user namespace, and you can't reenter your ancestor user
namespace.

IOW, if current_user_ns() == token->userns and token->userns !=
init_user_ns, then current_user_ns() != init_user_ns. And therefore the
capable() fallback is essentially always false for all interesting
cases, no?
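
That loss of privilege is easy to demonstrate from userspace (a tiny
demo, assuming unprivileged user namespace creation is enabled):

    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
            /* After unshare(2) the task holds a full capability set in the
             * *new* userns (even though its uid maps to nobody until a
             * uid_map is written), but it has irrevocably lost all
             * capabilities in the ancestor userns, so any init-ns
             * capable()-gated operation now fails.
             */
            if (unshare(CLONE_NEWUSER)) {
                    perror("unshare");
                    return 1;
            }
            printf("euid as seen in the new userns: %d\n", (int)geteuid());
            return 0;
    }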

Aside from that, it would be semantically completely unclean. The user
has specified a token, so permission checking should be based on that
token and not magically fall back to a capable() check in the initial
user namespace, even if that worked.

Because the only scenario where that is maybe useful is if an
unprivileged container has dropped _both_ CAP_BPF and CAP_SYS_ADMIN from
the user namespace of the container.

First off, why? What threat model do you have then? Second, if you do
stupid stuff like that then you don't get bpf in the container via BPF
tokens. Period.

Restrict the meaning and validity of a bpf token to the user namespace
and do not include escape hatches such as this. Especially not in this
initial version, please.

I'm not trying to be difficult, but it's clear that the implications of
user namespaces aren't well understood here. And historically they are
exploit facilitators as much as exploit preventers.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH bpf-next 03/29] bpf: introduce BPF token object
  2024-01-09 14:52           ` Christian Brauner
@ 2024-01-09 19:00             ` Andrii Nakryiko
  2024-01-10 14:59               ` Christian Brauner
  0 siblings, 1 reply; 59+ messages in thread
From: Andrii Nakryiko @ 2024-01-09 19:00 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Linus Torvalds, Andrii Nakryiko, bpf, netdev, paul, linux-fsdevel,
	linux-security-module, kernel-team

On Tue, Jan 9, 2024 at 6:52 AM Christian Brauner <brauner@kernel.org> wrote:
>
> On Mon, Jan 08, 2024 at 03:58:47PM -0800, Andrii Nakryiko wrote:
> > On Mon, Jan 8, 2024 at 4:02 AM Christian Brauner <brauner@kernel.org> wrote:
> > >
> > > On Fri, Jan 05, 2024 at 02:18:40PM -0800, Andrii Nakryiko wrote:
> > > > On Fri, Jan 5, 2024 at 1:45 PM Linus Torvalds
> > > > <torvalds@linuxfoundation.org> wrote:
> > > > >
> > > > > Ok, I've gone through the whole series now, and I don't find anything
> > > > > objectionable.
> > > >
> > > > That's great, thanks for reviewing!
> > > >
> > > > >
> > > > > Which may only mean that I didn't notice something, of course, but at
> > > > > least there's nothing I'd consider obvious.
> > > > >
> > > > > I keep coming back to this 03/29 patch, because it's kind of the heart
> > > > > of it, and I have one more small nit, but it's also purely stylistic:
> > > > >
> > > > > On Wed, 3 Jan 2024 at 14:21, Andrii Nakryiko <andrii@kernel.org> wrote:
> > > > > >
> > > > > > +bool bpf_token_capable(const struct bpf_token *token, int cap)
> > > > > > +{
> > > > > > +       /* BPF token allows ns_capable() level of capabilities, but only if
> > > > > > +        * token's userns is *exactly* the same as current user's userns
> > > > > > +        */
> > > > > > +       if (token && current_user_ns() == token->userns) {
> > > > > > +               if (ns_capable(token->userns, cap))
> > > > > > +                       return true;
> > > > > > +               if (cap != CAP_SYS_ADMIN && ns_capable(token->userns, CAP_SYS_ADMIN))
> > > > > > +                       return true;
> > > > > > +       }
> > > > > > +       /* otherwise fallback to capable() checks */
> > > > > > +       return capable(cap) || (cap != CAP_SYS_ADMIN && capable(CAP_SYS_ADMIN));
> > > > > > +}
> > > > >
> > > > > This *feels* like it should be written as
> > > > >
> > > > >     bool bpf_token_capable(const struct bpf_token *token, int cap)
> > > > >     {
> > > > >         struct user_namespace *ns = &init_ns;
> > > > >
> > > > >         /* BPF token allows ns_capable() level of capabilities, but only if
> > > > >          * token's userns is *exactly* the same as current user's userns
> > > > >          */
> > > > >         if (token && current_user_ns() == token->userns)
> > > > >                 ns = token->userns;
> > > > >         return ns_capable(ns, cap) ||
> > > > >                 (cap != CAP_SYS_ADMIN && capable(CAP_SYS_ADMIN));
> > > > >     }
> > > > >
> > > > > And yes, I realize that the function will end up later growing a
> > > > >
> > > > >         security_bpf_token_capable(token, cap)
> > > > >
> > > > > test inside that 'if (token ..)' statement, and this would change the
> > > > > order of that test so that the LSM hook would now be done before the
> > > > > capability checks are done, but that all still seems just more of an
> > > > > argument for the simplification.
> > > > >
> > > > > So the end result would be something like
> > > > >
> > > > >     bool bpf_token_capable(const struct bpf_token *token, int cap)
> > > > >     {
> > > > >         struct user_namespace *ns = &init_ns;
> > > > >
> > > > >         if (token && current_user_ns() == token->userns) {
> > > > >                 if (security_bpf_token_capable(token, cap) < 0)
> > > > >                         return false;
> > > > >                 ns = token->userns;
> > > > >         }
> > > > >         return ns_capable(ns, cap) ||
> > > > >                 (cap != CAP_SYS_ADMIN && capable(CAP_SYS_ADMIN));
> > > > >     }
> > > >
> > > > Yep, it makes sense to use ns_capable with init_ns. I'll change those
> > > > two patches to end up with something like what you suggested here.
> > > >
> > > > >
> > > > > although I feel that with that LSM hook, maybe this all should return
> > > > > the error code (zero or negative), not a bool for success?
> > > > >
> > > > > Also, should "current_user_ns() != token->userns" perhaps be an error
> > > > > condition, rather than a "fall back to init_ns" condition?
> > > > >
> > > > > Again, none of this is a big deal. I do think you're dropping the LSM
> > > > > error code on the floor, and are duplicating the "ns_capable()" vs
> > > > > "capable()" logic as-is, but none of this is a deal breaker, just more
> > > > > of my commentary on the patch and about the logic here.
> > > > >
> > > > > And yeah, I don't exactly love how you say "ok, if there's a token and
> > > > > it doesn't match, I'll not use it" rather than "if the token namespace
> > > > > doesn't match, it's an error", but maybe there's some usability issue
> > > > > here?
> > > >
> > > > Yes, usability was the primary concern. The overall idea with BPF
> > >
> > > NAK on not restricting this to not erroring out on current_user_ns()
> > > != token->user_ns. I've said this multiple times before.
> >
> > I do restrict token usage to *exact* userns in which the token was
> > created. See bpf_token_capable()'s
> >
> > if (token && current_user_ns() == token->userns) { ... }
> >
> > and in bpf_token_allow_cmd():
> >
> > if (!token || current_user_ns() != token->userns)
> >     return false;
> >
> > So I followed what you asked in [1] (just like I said I will in [2]),
> > unless I made some stupid mistake which I cannot even see.
> >
> >
> > What we are discussing here is a different question. It's the
> > difference between erroring out (that is, failing whatever BPF
> > operation was attempted with such token, i.e., program loading or map
> > creation) vs ignoring the token altogether and just using
> > init_ns-based capable() checks. And the latter is vastly more user
>
> Look at this:
>
> +bool bpf_token_capable(const struct bpf_token *token, int cap)
> +{
> +       /* BPF token allows ns_capable() level of capabilities, but only if
> +        * token's userns is *exactly* the same as current user's userns
> +        */
> +       if (token && current_user_ns() == token->userns) {
> +               if (ns_capable(token->userns, cap))
> +                       return true;
> +               if (cap != CAP_SYS_ADMIN && ns_capable(token->userns, CAP_SYS_ADMIN))
> +                       return true;
> +       }
> +       /* otherwise fallback to capable() checks */
> +       return capable(cap) || (cap != CAP_SYS_ADMIN && capable(CAP_SYS_ADMIN));
> +}
>
> How on earth is it possible that the calling task is in a user namespace
> aka current_user_ns() == token->userns while at the same time being
> capable in the initial user namespace? When you enter an
> unprivileged user namespace you lose all capabilities against your
> ancestor user namespace and you can't reenter your ancestor user
> namespace.
>
> IOW, if current_user_ns() == token->userns and token->userns !=
> init_user_ns, then current_user_ns() != init_user_ns. And therefore that
> thing is essentially always false for all interesting cases, no?
>

Are you saying that this would be better?

   if (token && current_user_ns() == token->userns) {
       if (ns_capable(token->userns, cap))
           return true;
       if (cap != CAP_SYS_ADMIN && ns_capable(token->userns, CAP_SYS_ADMIN))
           return true;
       if (token->userns != &init_user_ns)
           return false;
   }
   /* otherwise fallback to capable() checks */
   return capable(cap) || (cap != CAP_SYS_ADMIN && capable(CAP_SYS_ADMIN));


I.e., return false directly if token's userns is not initns (there
will be also LSM check before this condition later on)? Falling back
to capable() checks and letting it return false if we are not in
init_ns or don't have capabilities seemed fine to me, that's all.


> Aside from that, it would be semantically completely unclean. The user
> has specified a token, so permission checking should be based on that
> token and not magically fall back to a capable() check in the initial
> user namespace, even if that worked.

I tried to explain the higher-level integration setup in [0]. The
thing is that most of the time users won't be explicitly passing a
token; the BPF library will be passing it if /sys/fs/bpf happens to be
mounted with delegation options.

So I wanted to avoid potential regressions (unintended and avoidable
failures) from using a BPF token, because it might be hard to tell
whether a BPF token is "beneficial" and grants the required permissions
(especially once you take LSM interactions into account). That's why I
consistently treat the BPF token as optional/add-on permissions, not a
replacement for capable() checks.

It's true that it's unlikely that a BPF token will be set up in the
init ns (except for testing, perhaps), but is that a reason to return
-EPERM without doing the same checks that would be done if the BPF
token wasn't provided?


  [0] https://lore.kernel.org/bpf/CAEf4Bzb6jnJL98SLPJB7Vjxo_O33W8HjJuAsyP3+6xigZtsTkA@mail.gmail.com/

>
> Because the only scenario where that is maybe useful is if an
> unprivileged container has dropped _both_ CAP_BPF and CAP_SYS_ADMIN from
> the user namespace of the container.
>
> First off, why? What threat model do you have then? Second, if you do
> stupid stuff like that then you don't get bpf in the container via BPF
> tokens. Period.
>
> Restrict the meaning and validity of a bpf token to the user namespace
> and do not include escape hatches such as this. Especially not in this
> initial version, please.

This decision fundamentally changes how BPF loader libraries like
libbpf will have to approach BPF token integration. It's not a small
thing and not something that will be easy to change later.

>
> I'm not trying to be difficult, but it's clear that the implications of
> user namespaces aren't well understood here. And historically they are
I don't know why you are saying this. You haven't pointed out anything
that is actually broken in the existing implementation. Sure, you
might not be a fan of the approach, but is there anything
*technically* wrong with ignoring the BPF token if it doesn't provide
the necessary permissions for a BPF operation, and consistently falling
back to the checks that would be performed without a BPF token?

> exploit facilitators as much as exploit preventers.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH bpf-next 03/29] bpf: introduce BPF token object
  2024-01-09 19:00             ` Andrii Nakryiko
@ 2024-01-10 14:59               ` Christian Brauner
  2024-01-11  0:42                 ` Andrii Nakryiko
  0 siblings, 1 reply; 59+ messages in thread
From: Christian Brauner @ 2024-01-10 14:59 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Linus Torvalds, Andrii Nakryiko, bpf, netdev, paul, linux-fsdevel,
	linux-security-module, kernel-team

On Tue, Jan 09, 2024 at 11:00:24AM -0800, Andrii Nakryiko wrote:
> On Tue, Jan 9, 2024 at 6:52 AM Christian Brauner <brauner@kernel.org> wrote:
> >
> > On Mon, Jan 08, 2024 at 03:58:47PM -0800, Andrii Nakryiko wrote:
> > > On Mon, Jan 8, 2024 at 4:02 AM Christian Brauner <brauner@kernel.org> wrote:
> > > >
> > > > On Fri, Jan 05, 2024 at 02:18:40PM -0800, Andrii Nakryiko wrote:
> > > > > On Fri, Jan 5, 2024 at 1:45 PM Linus Torvalds
> > > > > <torvalds@linuxfoundation.org> wrote:
> > > > > >
> > > > > > Ok, I've gone through the whole series now, and I don't find anything
> > > > > > objectionable.
> > > > >
> > > > > That's great, thanks for reviewing!
> > > > >
> > > > > >
> > > > > > Which may only mean that I didn't notice something, of course, but at
> > > > > > least there's nothing I'd consider obvious.
> > > > > >
> > > > > > I keep coming back to this 03/29 patch, because it's kind of the heart
> > > > > > of it, and I have one more small nit, but it's also purely stylistic:
> > > > > >
> > > > > > On Wed, 3 Jan 2024 at 14:21, Andrii Nakryiko <andrii@kernel.org> wrote:
> > > > > > >
> > > > > > > +bool bpf_token_capable(const struct bpf_token *token, int cap)
> > > > > > > +{
> > > > > > > +       /* BPF token allows ns_capable() level of capabilities, but only if
> > > > > > > +        * token's userns is *exactly* the same as current user's userns
> > > > > > > +        */
> > > > > > > +       if (token && current_user_ns() == token->userns) {
> > > > > > > +               if (ns_capable(token->userns, cap))
> > > > > > > +                       return true;
> > > > > > > +               if (cap != CAP_SYS_ADMIN && ns_capable(token->userns, CAP_SYS_ADMIN))
> > > > > > > +                       return true;
> > > > > > > +       }
> > > > > > > +       /* otherwise fallback to capable() checks */
> > > > > > > +       return capable(cap) || (cap != CAP_SYS_ADMIN && capable(CAP_SYS_ADMIN));
> > > > > > > +}
> > > > > >
> > > > > > This *feels* like it should be written as
> > > > > >
> > > > > >     bool bpf_token_capable(const struct bpf_token *token, int cap)
> > > > > >     {
> > > > > >         struct user_namespace *ns = &init_ns;
> > > > > >
> > > > > >         /* BPF token allows ns_capable() level of capabilities, but only if
> > > > > >          * token's userns is *exactly* the same as current user's userns
> > > > > >          */
> > > > > >         if (token && current_user_ns() == token->userns)
> > > > > >                 ns = token->userns;
> > > > > >         return ns_capable(ns, cap) ||
> > > > > >                 (cap != CAP_SYS_ADMIN && capable(CAP_SYS_ADMIN));
> > > > > >     }
> > > > > >
> > > > > > And yes, I realize that the function will end up later growing a
> > > > > >
> > > > > >         security_bpf_token_capable(token, cap)
> > > > > >
> > > > > > test inside that 'if (token ..)' statement, and this would change the
> > > > > > order of that test so that the LSM hook would now be done before the
> > > > > > capability checks are done, but that all still seems just more of an
> > > > > > argument for the simplification.
> > > > > >
> > > > > > So the end result would be something like
> > > > > >
> > > > > >     bool bpf_token_capable(const struct bpf_token *token, int cap)
> > > > > >     {
> > > > > >         struct user_namespace *ns = &init_ns;
> > > > > >
> > > > > >         if (token && current_user_ns() == token->userns) {
> > > > > >                 if (security_bpf_token_capable(token, cap) < 0)
> > > > > >                         return false;
> > > > > >                 ns = token->userns;
> > > > > >         }
> > > > > >         return ns_capable(ns, cap) ||
> > > > > >                 (cap != CAP_SYS_ADMIN && capable(CAP_SYS_ADMIN));
> > > > > >     }
> > > > >
> > > > > Yep, it makes sense to use ns_capable with init_ns. I'll change those
> > > > > two patches to end up with something like what you suggested here.
> > > > >
> > > > > >
> > > > > > although I feel that with that LSM hook, maybe this all should return
> > > > > > the error code (zero or negative), not a bool for success?
> > > > > >
> > > > > > Also, should "current_user_ns() != token->userns" perhaps be an error
> > > > > > condition, rather than a "fall back to init_ns" condition?
> > > > > >
> > > > > > Again, none of this is a big deal. I do think you're dropping the LSM
> > > > > > error code on the floor, and are duplicating the "ns_capable()" vs
> > > > > > "capable()" logic as-is, but none of this is a deal breaker, just more
> > > > > > of my commentary on the patch and about the logic here.
> > > > > >
> > > > > > And yeah, I don't exactly love how you say "ok, if there's a token and
> > > > > > it doesn't match, I'll not use it" rather than "if the token namespace
> > > > > > doesn't match, it's an error", but maybe there's some usability issue
> > > > > > here?
> > > > >
> > > > > Yes, usability was the primary concern. The overall idea with BPF
> > > >
> > > > NAK on not restricting this to not erroring out on current_user_ns()
> > > > != token->user_ns. I've said this multiple times before.
> > >
> > > I do restrict token usage to *exact* userns in which the token was
> > > created. See bpf_token_capable()'s
> > >
> > > if (token && current_user_ns() == token->userns) { ... }
> > >
> > > and in bpf_token_allow_cmd():
> > >
> > > if (!token || current_user_ns() != token->userns)
> > >     return false;
> > >
> > > So I followed what you asked in [1] (just like I said I will in [2]),
> > > unless I made some stupid mistake which I cannot even see.
> > >
> > >
> > > What we are discussing here is a different question. It's the
> > > difference between erroring out (that is, failing whatever BPF
> > > operation was attempted with such token, i.e., program loading or map
> > > creation) vs ignoring the token altogether and just using
> > > init_ns-based capable() checks. And the latter is vastly more user
> >
> > Look at this:
> >
> > +bool bpf_token_capable(const struct bpf_token *token, int cap)
> > +{
> > +       /* BPF token allows ns_capable() level of capabilities, but only if
> > +        * token's userns is *exactly* the same as current user's userns
> > +        */
> > +       if (token && current_user_ns() == token->userns) {
> > +               if (ns_capable(token->userns, cap))
> > +                       return true;
> > +               if (cap != CAP_SYS_ADMIN && ns_capable(token->userns, CAP_SYS_ADMIN))
> > +                       return true;
> > +       }
> > +       /* otherwise fallback to capable() checks */
> > +       return capable(cap) || (cap != CAP_SYS_ADMIN && capable(CAP_SYS_ADMIN));
> > +}
> >
> > How on earth is it possible that the calling task is in a user namespace
> > aka current_user_ns() == token->userns while at the same time being
> > capable in the initial user namespace? When you enter an
> > unprivileged user namespace you lose all capabilities against your
> > ancestor user namespace and you can't reenter your ancestor user
> > namespace.
> >
> > IOW, if current_user_ns() == token->userns and token->userns !=
> > init_user_ns, then current_user_ns() != init_user_ns. And therefore that
> > thing is essentially always false for all interesting cases, no?
> >
> 
> Are you saying that this would be better?
> 
>    if (token && current_user_ns() == token->userns) {
>        if (ns_capable(token->userns, cap))
>            return true;
>        if (cap != CAP_SYS_ADMIN && ns_capable(token->userns, CAP_SYS_ADMIN))
>            return true;
>        if (token->userns != &init_user_ns)
>            return false;
>    }
>    /* otherwise fallback to capable() checks */
>    return capable(cap) || (cap != CAP_SYS_ADMIN && capable(CAP_SYS_ADMIN));
> 
> 
> I.e., return false directly if the token's userns is not the init ns
> (there will also be an LSM check before this condition later on)?
> Falling back to capable() checks and letting them return false if we
> are not in the init ns or don't have the capabilities seemed fine to
> me, that's all.
> 
> 
> > Aside from that, it would be semantically completely unclean. The user
> > has specified a token, so permission checking should be based on that
> > token and not magically fall back to a capable() check in the initial
> > user namespace, even if that worked.
> 
> I tried to explain the higher-level integration setup in [0]. The
> thing is that users most of the time won't be explicitly passing a
> token, BPF library will be passing it, if /sys/fs/bpf happens to be
> mounted with delegation options.
> 
> So I wanted to avoid potential regressions (unintended and avoidable
> failures) from using BPF token, because it might be hard to tell if a
> BPF token is "beneficial" and is granting required permissions
> (especially if you take into account LSM interactions). So I
> consistently treat BPF token as optional/add-on permissions, not the
> replacement for capable() checks.

You can always just perform the same call again without specifying the
token.
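
I.e., something like this on the userspace side (a sketch; assuming
the prog_token_fd attr field this series adds for BPF_PROG_LOAD, and
an illustrative, not actual libbpf, helper name):

    #include <unistd.h>
    #include <sys/syscall.h>
    #include <linux/bpf.h>

    /* Try loading with the token first; on failure retry without it,
     * restoring pre-token behavior.
     */
    static int load_prog_with_optional_token(union bpf_attr *attr, int token_fd)
    {
            int fd;

            attr->prog_token_fd = token_fd;
            fd = syscall(__NR_bpf, BPF_PROG_LOAD, attr, sizeof(*attr));
            if (fd < 0 && token_fd > 0) {
                    attr->prog_token_fd = 0;
                    fd = syscall(__NR_bpf, BPF_PROG_LOAD, attr, sizeof(*attr));
            }
            return fd;
    }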

> 
> It's true that it's unlikely that BPF token will be set up in init_ns
> (except for testing, perhaps), but is it a reason to return -EPERM
> without doing the same checks that would be done if BPF token wasn't
> provided?
> 
> 
>   [0] https://lore.kernel.org/bpf/CAEf4Bzb6jnJL98SLPJB7Vjxo_O33W8HjJuAsyP3+6xigZtsTkA@mail.gmail.com/
> 
> >
> > Because the only scenario where that is maybe useful is if an
> > unprivileged container has dropped _both_ CAP_BPF and CAP_SYS_ADMIN from
> > the user namespace of the container.
> >
> > First off, why? What threat model do you have then? Second, if you do
> > stupid stuff like that then you don't get bpf in the container via BPF
> > tokens. Period.
> >
> > Restrict the meaning and validity of a bpf token to the user namespace
> > and do not include escape hatches such as this. Especially not in this
> > initial version, please.
> 
> This decision fundamentally changes how BPF loader libraries like
> libbpf will have to approach BPF token integration. It's not a small
> thing and not something that will be easy to change later.

Why? It would be relaxing permissions, not restricting it.

> 
> >
> > I'm not trying to be difficult, but it's clear that the implications of
> > user namespaces aren't well understood here. And historically they are
> 
> I don't know why you are saying this. You haven't pointed out anything
> that is actually broken in the existing implementation. Sure, you
> might not be a fan of the approach, but is there anything
> *technically* wrong with ignoring the BPF token if it doesn't provide
> the necessary permissions for a BPF operation, and consistently falling
> back to the checks that would be performed without a BPF token?

The current check is inconsistent. It special-cases init_user_ns. The
correct thing to do for what you're intending, imho, is:

bool bpf_token_capable(const struct bpf_token *token, int cap)
{
        struct user_namespace *userns = &init_user_ns;

        if (token)
                userns = token->userns;
        if (ns_capable(userns, cap))
                return true;
        return cap != CAP_SYS_ADMIN && ns_capable(userns, CAP_SYS_ADMIN);
}

Because any caller located in an ancestor user namespace of
token->user_ns will be privileged wrt to the token's userns as long as
they have that capability in their user namespace.

For example, if the caller is in the init_user_ns and permissions
for CAP_WHATEVER is checked for in token->user_ns and the caller has
CAP_WHATEVER in init_user_ns then they also have it in all
descendant user namespaces.

The original intention had been to align with what we require during
token creation meaning that once a token has been created interacting
with this token is specifically confined to caller's located in the
token's user namespace.

If that's not the case then it doesn't make sense to not allow
permission checking based on regular capability semantics. IOW, why
special case init_user_ns if you're breaking the confinement restriction
anyway.
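
For reference, this is exactly the walk the capability core already
does; a lightly abridged sketch of cap_capable() from
security/commoncap.c (error paths and LSM opts elided):

int cap_capable(const struct cred *cred, struct user_namespace *targ_ns,
		int cap, unsigned int opts)
{
	struct user_namespace *ns = targ_ns;

	for (;;) {
		/* The caller's own namespace: check the effective set. */
		if (ns == cred->user_ns)
			return cap_raised(cred->cap_effective, cap) ? 0 : -EPERM;

		/* Ran past the caller's level: no ancestor relationship. */
		if (ns->level <= cred->user_ns->level)
			return -EPERM;

		/* The owner of a child userns has all caps over it. */
		if ((ns->parent == cred->user_ns) && uid_eq(ns->owner, cred->euid))
			return 0;

		/* A capability in a parent userns is implied in all children. */
		ns = ns->parent;
	}
}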

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH bpf-next 03/29] bpf: introduce BPF token object
  2024-01-09  0:07       ` Andrii Nakryiko
@ 2024-01-10 19:29         ` Paul Moore
  0 siblings, 0 replies; 59+ messages in thread
From: Paul Moore @ 2024-01-10 19:29 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Linus Torvalds, Andrii Nakryiko, bpf, netdev, brauner,
	linux-fsdevel, linux-security-module, kernel-team

On Mon, Jan 8, 2024 at 7:07 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
> On Mon, Jan 8, 2024 at 8:45 AM Paul Moore <paul@paul-moore.com> wrote:
> >
> > On Fri, Jan 5, 2024 at 4:45 PM Linus Torvalds
> > <torvalds@linuxfoundation.org> wrote:
> > > On Wed, 3 Jan 2024 at 14:21, Andrii Nakryiko <andrii@kernel.org> wrote:
> > > >
> > > > +bool bpf_token_capable(const struct bpf_token *token, int cap)
> > > > +{
> > > > +       /* BPF token allows ns_capable() level of capabilities, but only if
> > > > +        * token's userns is *exactly* the same as current user's userns
> > > > +        */
> > > > +       if (token && current_user_ns() == token->userns) {
> > > > +               if (ns_capable(token->userns, cap))
> > > > +                       return true;
> > > > +               if (cap != CAP_SYS_ADMIN && ns_capable(token->userns, CAP_SYS_ADMIN))
> > > > +                       return true;
> > > > +       }
> > > > +       /* otherwise fallback to capable() checks */
> > > > +       return capable(cap) || (cap != CAP_SYS_ADMIN && capable(CAP_SYS_ADMIN));
> > > > +}
> > >
> > > This *feels* like it should be written as
> > >
> > >     bool bpf_token_capable(const struct bpf_token *token, int cap)
> > >     {
> > >         struct user_namespace *ns = &init_user_ns;
> > >
> > >         /* BPF token allows ns_capable() level of capabilities, but only if
> > >          * token's userns is *exactly* the same as current user's userns
> > >          */
> > >         if (token && current_user_ns() == token->userns)
> > >                 ns = token->userns;
> > >         return ns_capable(ns, cap) ||
> > >                 (cap != CAP_SYS_ADMIN && capable(CAP_SYS_ADMIN));
> > >     }
> > >
> > > And yes, I realize that the function will end up later growing a
> > >
> > >         security_bpf_token_capable(token, cap)
> > >
> > > test inside that 'if (token ..)' statement, and this would change the
> > > order of that test so that the LSM hook would now be done before the
> > > capability checks are done, but that all still seems just more of an
> > > argument for the simplification.
> >
> > I have no problem with rewriting things, my only ask is that we stick
> > with the idea of doing the capability checks before the LSM hook.  The
> > DAC-before-MAC (capability-before-LSM) pattern is one we try to stick
> > to most everywhere in the kernel and deviating from it here could
> > potentially result in some odd/unexpected behavior from a user
> > perspective.
>
> Makes sense, Paul. With the suggested rewrite we'll get an LSM call
> before we get to ns_capable() (which, generally speaking, we avoid
> doing in the BPF code base after someone called this out earlier). Hmm...
> 
> I guess it will be better to keep this logic as is, then; I believe it
> was more of a subjective stylistic nit from Linus, so it's probably
> ok to keep the existing code.

I didn't read Linus' reply as a mandate, more as a
this-would-be-nice-to-have, and considering the access control
ordering I would just stick with what you have (ignoring Christian's
concerns, I'm only commenting on the LSM related stuff here).

If Linus is *really* upset with how the code is written I suspect
we'll hear from him on that.

-- 
paul-moore.com

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH bpf-next 03/29] bpf: introduce BPF token object
  2024-01-10 14:59               ` Christian Brauner
@ 2024-01-11  0:42                 ` Andrii Nakryiko
  2024-01-11 10:38                   ` Christian Brauner
  0 siblings, 1 reply; 59+ messages in thread
From: Andrii Nakryiko @ 2024-01-11  0:42 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Linus Torvalds, Andrii Nakryiko, bpf, netdev, paul, linux-fsdevel,
	linux-security-module, kernel-team

On Wed, Jan 10, 2024 at 6:59 AM Christian Brauner <brauner@kernel.org> wrote:
>
> On Tue, Jan 09, 2024 at 11:00:24AM -0800, Andrii Nakryiko wrote:
> > On Tue, Jan 9, 2024 at 6:52 AM Christian Brauner <brauner@kernel.org> wrote:
> > >
> > > On Mon, Jan 08, 2024 at 03:58:47PM -0800, Andrii Nakryiko wrote:
> > > > On Mon, Jan 8, 2024 at 4:02 AM Christian Brauner <brauner@kernel.org> wrote:
> > > > >
> > > > > On Fri, Jan 05, 2024 at 02:18:40PM -0800, Andrii Nakryiko wrote:
> > > > > > On Fri, Jan 5, 2024 at 1:45 PM Linus Torvalds
> > > > > > <torvalds@linuxfoundation.org> wrote:
> > > > > > >
> > > > > > > Ok, I've gone through the whole series now, and I don't find anything
> > > > > > > objectionable.
> > > > > >
> > > > > > That's great, thanks for reviewing!
> > > > > >
> > > > > > >
> > > > > > > Which may only mean that I didn't notice something, of course, but at
> > > > > > > least there's nothing I'd consider obvious.
> > > > > > >
> > > > > > > I keep coming back to this 03/29 patch, because it's kind of the heart
> > > > > > > of it, and I have one more small nit, but it's also purely stylistic:
> > > > > > >
> > > > > > > On Wed, 3 Jan 2024 at 14:21, Andrii Nakryiko <andrii@kernel.org> wrote:
> > > > > > > >
> > > > > > > > +bool bpf_token_capable(const struct bpf_token *token, int cap)
> > > > > > > > +{
> > > > > > > > +       /* BPF token allows ns_capable() level of capabilities, but only if
> > > > > > > > +        * token's userns is *exactly* the same as current user's userns
> > > > > > > > +        */
> > > > > > > > +       if (token && current_user_ns() == token->userns) {
> > > > > > > > +               if (ns_capable(token->userns, cap))
> > > > > > > > +                       return true;
> > > > > > > > +               if (cap != CAP_SYS_ADMIN && ns_capable(token->userns, CAP_SYS_ADMIN))
> > > > > > > > +                       return true;
> > > > > > > > +       }
> > > > > > > > +       /* otherwise fallback to capable() checks */
> > > > > > > > +       return capable(cap) || (cap != CAP_SYS_ADMIN && capable(CAP_SYS_ADMIN));
> > > > > > > > +}
> > > > > > >
> > > > > > > This *feels* like it should be written as
> > > > > > >
> > > > > > >     bool bpf_token_capable(const struct bpf_token *token, int cap)
> > > > > > >     {
> > > > > > >         struct user_namespace *ns = &init_user_ns;
> > > > > > >
> > > > > > >         /* BPF token allows ns_capable() level of capabilities, but only if
> > > > > > >          * token's userns is *exactly* the same as current user's userns
> > > > > > >          */
> > > > > > >         if (token && current_user_ns() == token->userns)
> > > > > > >                 ns = token->userns;
> > > > > > >         return ns_capable(ns, cap) ||
> > > > > > >                 (cap != CAP_SYS_ADMIN && capable(CAP_SYS_ADMIN));
> > > > > > >     }
> > > > > > >
> > > > > > > And yes, I realize that the function will end up later growing a
> > > > > > >
> > > > > > >         security_bpf_token_capable(token, cap)
> > > > > > >
> > > > > > > test inside that 'if (token ..)' statement, and this would change the
> > > > > > > order of that test so that the LSM hook would now be done before the
> > > > > > > capability checks are done, but that all still seems just more of an
> > > > > > > argument for the simplification.
> > > > > > >
> > > > > > > So the end result would be something like
> > > > > > >
> > > > > > >     bool bpf_token_capable(const struct bpf_token *token, int cap)
> > > > > > >     {
> > > > > > >         struct user_namespace *ns = &init_user_ns;
> > > > > > >
> > > > > > >         if (token && current_user_ns() == token->userns) {
> > > > > > >                 if (security_bpf_token_capable(token, cap) < 0)
> > > > > > >                         return false;
> > > > > > >                 ns = token->userns;
> > > > > > >         }
> > > > > > >         return ns_capable(ns, cap) ||
> > > > > > >                 (cap != CAP_SYS_ADMIN && capable(CAP_SYS_ADMIN));
> > > > > > >     }
> > > > > >
> > > > > > Yep, it makes sense to use ns_capable with init_user_ns. I'll change those
> > > > > > two patches to end up with something like what you suggested here.
> > > > > >
> > > > > > >
> > > > > > > although I feel that with that LSM hook, maybe this all should return
> > > > > > > the error code (zero or negative), not a bool for success?
> > > > > > >
> > > > > > > Also, should "current_user_ns() != token->userns" perhaps be an error
> > > > > > > condition, rather than a "fall back to init_ns" condition?
> > > > > > >
> > > > > > > Again, none of this is a big deal. I do think you're dropping the LSM
> > > > > > > error code on the floor, and are duplicating the "ns_capable()" vs
> > > > > > > "capable()" logic as-is, but none of this is a deal breaker, just more
> > > > > > > of my commentary on the patch and about the logic here.
> > > > > > >
> > > > > > > And yeah, I don't exactly love how you say "ok, if there's a token and
> > > > > > > it doesn't match, I'll not use it" rather than "if the token namespace
> > > > > > > doesn't match, it's an error", but maybe there's some usability issue
> > > > > > > here?
> > > > > >
> > > > > > Yes, usability was the primary concern. The overall idea with BPF
> > > > >
> > > > > NAK on this not erroring out when current_user_ns() !=
> > > > > token->user_ns. I've said this multiple times before.
> > > >
> > > > I do restrict token usage to *exact* userns in which the token was
> > > > created. See bpf_token_capable()'s
> > > >
> > > > if (token && current_user_ns() == token->userns) { ... }
> > > >
> > > > and in bpf_token_allow_cmd():
> > > >
> > > > if (!token || current_user_ns() != token->userns)
> > > >     return false;
> > > >
> > > > So I followed what you asked in [1] (just like I said I would in [2]),
> > > > unless I made some stupid mistake which I cannot even see.
> > > >
> > > >
> > > > What we are discussing here is a different question. It's the
> > > > difference between erroring out (that is, failing whatever BPF
> > > > operation was attempted with such a token, i.e., program loading or map
> > > > creation) vs ignoring the token altogether and just using
> > > > init_ns-based capable() checks. And the latter is vastly more user
> > >
> > > Look at this:
> > >
> > > +bool bpf_token_capable(const struct bpf_token *token, int cap)
> > > +{
> > > +       /* BPF token allows ns_capable() level of capabilities, but only if
> > > +        * token's userns is *exactly* the same as current user's userns
> > > +        */
> > > +       if (token && current_user_ns() == token->userns) {
> > > +               if (ns_capable(token->userns, cap))
> > > +                       return true;
> > > +               if (cap != CAP_SYS_ADMIN && ns_capable(token->userns, CAP_SYS_ADMIN))
> > > +                       return true;
> > > +       }
> > > +       /* otherwise fallback to capable() checks */
> > > +       return capable(cap) || (cap != CAP_SYS_ADMIN && capable(CAP_SYS_ADMIN));
> > > +}
> > >
> > > How on earth is it possible that the calling task is in a user namespace
> > > aka current_user_ns() == token->userns while at the same time being
> > > capable in the initial user namespace? When you enter an
> > > unprivileged user namespace you lose all capabilities against your
> > > ancestor user namespace and you can't reenter your ancestor user
> > > namespace.
> > >
> > > IOW, if current_user_ns() == token->userns and token->userns !=
> > > init_user_ns, then current_user_ns() != init_user_ns. And therefore that
> > > thing is essentially always false for all interesting cases, no?
> > >
> >
> > Are you saying that this would be better?
> >
> >    if (token && current_user_ns() == token->userns) {
> >        if (ns_capable(token->userns, cap))
> >            return true;
> >        if (cap != CAP_SYS_ADMIN && ns_capable(token->userns, CAP_SYS_ADMIN))
> >            return true;
> >        if (token->userns != &init_user_ns)
> >            return false;
> >    }
> >    /* otherwise fallback to capable() checks */
> >    return capable(cap) || (cap != CAP_SYS_ADMIN && capable(CAP_SYS_ADMIN));
> >
> >
> > I.e., return false directly if the token's userns is not the init_ns (there
> > will also be an LSM check before this condition later on)? Falling back
> > to capable() checks and letting it return false if we are not in
> > init_ns or don't have capabilities seemed fine to me, that's all.
> >
> >
> > > Aside from that it would be semantically completely unclean. The user
> > > has specified a token and permission checking should be based on that
> > > token and not magically fall back to a capable check in the initial user
> > > namespace even if that worked.
> >
> > I tried to explain the higher-level integration setup in [0]. The
> > thing is that users most of the time won't be explicitly passing a
> > token, BPF library will be passing it, if /sys/fs/bpf happens to be
> > mounted with delegation options.
> >
> > So I wanted to avoid potential regressions (unintended and avoidable
> > failures) from using BPF token, because it might be hard to tell if a
> > BPF token is "beneficial" and is granting required permissions
> > (especially if you take into account LSM interactions). So I
> > consistently treat BPF token as optional/add-on permissions, not the
> > replacement for capable() checks.
>
> You can always just perform the same call again without specifying the
> token.

This has a bunch of problematic implications.

Retrying on any EPERM leads to inefficiency and potential confusion
for users. EPERM can be returned deep inside the verifier after
spending tons of memory and CPU doing verification, so it's a waste to
try with a token and then try again without one. Retrying without a
token can also change the specific failure reason. E.g., for a BPF
program with a token we can get deep enough into the verification
process that the verifier will provide a log with details about what
the program is doing wrong (allowing the user to fix it relatively
easily), while without a token we can bail out much earlier but with
no details about what's wrong.

Similarly for other operations (map creation, BTF upload): a token can
enable log details, etc., but retrying without the token will strip
users of these helpful details. That's just to say that automatically
dropping the token and retrying doesn't necessarily provide the same
user experience as the kernel ignoring an ineffective token (besides
the performance and resource waste implications above).

We have a similar precedent with optional BTF. Libbpf will silently
and automatically retry map creation/program loading without BTF. We
had enough pain with this and invested a lot of work to avoid the need
to retry: we preventively sanitize or drop BTF, etc., for the same
reasons as above (performance and debugging experience).

So in short, it's a significant regression in usability and user
experience if the token isn't treated as add-on permissions,
forcing (otherwise avoidable) complications into user-space libraries
and applications.
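
To make the retry cost concrete, here is roughly the fallback pattern
a loader library would be forced into (a sketch; bpf_prog_load() is
libbpf's existing API, and the token_fd option is the one this series
adds, so treat that field name as tentative):

#include <errno.h>
#include <bpf/bpf.h>

static int load_prog(const struct bpf_insn *insns, size_t insn_cnt,
		     int token_fd)
{
	LIBBPF_OPTS(bpf_prog_load_opts, opts, .token_fd = token_fd);
	int fd;

	/* Opportunistic attempt with the token. A full (expensive)
	 * verification pass may happen before this returns -EPERM.
	 */
	fd = bpf_prog_load(BPF_PROG_TYPE_KPROBE, "prog", "GPL",
			   insns, insn_cnt, &opts);
	if (fd >= 0 || errno != EPERM)
		return fd;

	/* Retry without the token: duplicated work, and any verifier
	 * log from the first attempt no longer applies.
	 */
	opts.token_fd = 0;
	return bpf_prog_load(BPF_PROG_TYPE_KPROBE, "prog", "GPL",
			     insns, insn_cnt, &opts);
}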

>
> >
> > It's true that it's unlikely that BPF token will be set up in init_ns
> > (except for testing, perhaps), but is it a reason to return -EPERM
> > without doing the same checks that would be done if BPF token wasn't
> > provided?
> >
> >
> >   [0] https://lore.kernel.org/bpf/CAEf4Bzb6jnJL98SLPJB7Vjxo_O33W8HjJuAsyP3+6xigZtsTkA@mail.gmail.com/
> >
> > >
> > > Because the only scenario where that is maybe useful is if an
> > > unprivileged container has dropped _both_ CAP_BPF and CAP_SYS_ADMIN from
> > > the user namespace of the container.
> > >
> > > First off, why? What threat model do you have then? Second, if you do
> > > stupid stuff like that then you don't get bpf in the container via bpf
> > > tokens. Period.
> > >
> > > Restrict the meaning and validity of a bpf token to the user namespace
> > > and do not include escape hatches such as this. Especially not in this
> > > initial version, please.
> >
> > This decision fundamentally changes how BPF loader libraries like
> > libbpf will have to approach BPF token integration. It's not a small
> > thing and not something that will be easy to change later.
>
> Why? It would be relaxing permissions, not restricting them.

See above, I tried to highlight some implications, though I realize
it's a bit hard to go over these intricate libbpf implications in a
succinct email without going into excessive details about how BPF
development and debugging is done nowadays with libbpf and
libbpf-based tooling.

>
> >
> > >
> > > I'm not trying to be difficult but it's clear that the implications of
> > user namespaces aren't well understood here. And historically they are
> >
> > I don't know why you are saying this. You haven't pointed out anything
> > that is actually broken in the existing implementation. Sure, you
> > might not be a fan of the approach, but is there anything
> *technically* wrong with ignoring the BPF token if it doesn't provide
> the necessary permissions for a BPF operation and consistently using
> the checks that would be performed if no BPF token was provided?
>
> The current check is inconsistent. It special-cases init_user_ns. The
> correct thing to do for what you're intending imho is:
>
> bool bpf_token_capable(const struct bpf_token *token, int cap)
> {
>         struct user_namespace *userns = &init_user_ns;
>
>         if (token)
>                 userns = token->userns;
>         if (ns_capable(userns, cap))
>                 return true;
>         return cap != CAP_SYS_ADMIN && ns_capable(userns, CAP_SYS_ADMIN);
> }

Unfortunately the above becomes significantly more hairy when LSM
(security_bpf_token_capable) gets involved, while preserving the rule
"if token doesn't give rights, fall back to init userns checks".

I'm happy to accommodate any implementation of bpf_token_capable() as
long as it behaves as discussed above and also satisfies Paul's
requirement that capability checks should happen before LSM checks.

>
> Because any caller located in an ancestor user namespace of
> token->user_ns will be privileged wrt the token's userns as long as
> they have that capability in their user namespace.

And with `current_user_ns() == token->userns` check we won't be using
token->userns while the caller is in an ancestor user namespace; we'll
use the capable() check, which will succeed only in the init user_ns, assuming
corresponding CAP_xxx is actually set.

>
> For example, if the caller is in the init_user_ns and permissions
> for CAP_WHATEVER are checked in token->user_ns and the caller has
> CAP_WHATEVER in init_user_ns then they also have it in all
> descendant user namespaces.

Right, so if they didn't use a token they would still pass
capable(CAP_WHATEVER), right?

>
> The original intention had been to align with what we require during
> token creation, meaning that once a token has been created, interacting
> with this token is specifically confined to callers located in the
> token's user namespace.
>
> If that's not the case then it doesn't make sense to not allow
> permission checking based on regular capability semantics. IOW, why
> special case init_user_ns if you're breaking the confinement restriction
> anyway.

I'm sorry, perhaps I'm dense, but with `current_user_ns() ==
token->userns` check I think we do fulfill the intention to not allow
using a token in a userns different from the one in which it was
created. If that condition isn't satisfied, the token is immediately
ignored. So you can't use a token from another userns for anything,
it's just not there, effectively.

And as I tried to explain above, I do think that ignoring the token
instead of erroring out early is what we want in order to provide good
user-space ecosystem integration of the BPF token.
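
To illustrate that contract, a hedged sketch of a raw BPF_MAP_CREATE
call passing a token (the map_token_fd attr field is the one this
series adds, so its exact name is an assumption): with "token as
add-on" semantics, an ineffective token behaves exactly as if the
field were zero.

#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/bpf.h>

static int map_create_with_token(int token_fd)
{
	union bpf_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.map_type = BPF_MAP_TYPE_HASH;
	attr.key_size = 4;
	attr.value_size = 8;
	attr.max_entries = 16;
	attr.map_token_fd = token_fd;	/* 0 means "no token" */

	return syscall(__NR_bpf, BPF_MAP_CREATE, &attr, sizeof(attr));
}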

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH bpf-next 03/29] bpf: introduce BPF token object
  2024-01-11  0:42                 ` Andrii Nakryiko
@ 2024-01-11 10:38                   ` Christian Brauner
  2024-01-11 17:41                     ` Andrii Nakryiko
  0 siblings, 1 reply; 59+ messages in thread
From: Christian Brauner @ 2024-01-11 10:38 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Linus Torvalds, Andrii Nakryiko, bpf, netdev, paul, linux-fsdevel,
	linux-security-module, kernel-team

> > The current check is inconsistent. It special-cases init_user_ns. The
> > correct thing to do for what you're intending imho is:
> >
> > bool bpf_token_capable(const struct bpf_token *token, int cap)
> > {
> >         struct user_namespace *userns = &init_user_ns;
> >
> >         if (token)
> >                 userns = token->userns;
> >         if (ns_capable(userns, cap))
> >                 return true;
> >         return cap != CAP_SYS_ADMIN && ns_capable(userns, CAP_SYS_ADMIN);
> > }
> 
> Unfortunately the above becomes significantly more hairy when LSM
> (security_bpf_token_capable) gets involved, while preserving the rule
> "if token doesn't give rights, fall back to init userns checks".

Why? Please explain your reasoning in detail.

> 
> I'm happy to accommodate any implementation of bpf_token_capable() as
> long as it behaves as discussed above and also satisfies Paul's
> requirement that capability checks should happen before LSM checks.
> 
> >
> > Because any caller located in an ancestor user namespace of
> > token->user_ns will be privileged wrt the token's userns as long as
> > they have that capability in their user namespace.
> 
> And with `current_user_ns() == token->userns` check we won't be using
> token->userns while the caller is in an ancestor user namespace; we'll
> use the capable() check, which will succeed only in the init user_ns, assuming
> corresponding CAP_xxx is actually set.

Why? This isn't how any of our ns_capable() logic works.

This basically argues that anyone in an ancestor user namespace is not
allowed to operate on any of their descendant child namespaces unless
they are in the init_user_ns.

But that's nonsense as I'm trying to tell you. Any process in an
ancestor user namespace that is privileged over the child namespace can
just setns() into it and then pass that bpf_token_capable() check by
supplying the token.

At this point just do it properly and allow callers that are privileged
in the token user namespace to load bpf programs. It also means you get
user namespace nesting done properly.
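
A minimal userspace sketch of that setns() route (hypothetical helper;
joining a user namespace requires CAP_SYS_ADMIN in the target userns,
which an ancestor-privileged caller has by definition):

#define _GNU_SOURCE
#include <fcntl.h>
#include <sched.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>

/* Join the user namespace of a (hypothetical) container process. */
static int join_userns(pid_t container_pid)
{
	char path[64];
	int fd, err;

	snprintf(path, sizeof(path), "/proc/%d/ns/user", (int)container_pid);
	fd = open(path, O_RDONLY | O_CLOEXEC);
	if (fd < 0)
		return -1;

	/* Succeeds iff the caller has CAP_SYS_ADMIN in the target userns. */
	err = setns(fd, CLONE_NEWUSER);
	close(fd);
	return err;	/* on success, current_user_ns() == token->userns */
}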

> 
> >
> > For example, if the caller is in the init_user_ns and permissions
> > for CAP_WHATEVER are checked in token->user_ns and the caller has
> > CAP_WHATEVER in init_user_ns then they also have it in all
> > descendant user namespaces.
> 
> Right, so if they didn't use a token they would still pass
> capable(CAP_WHATEVER), right?

Yes, I'm trying to accommodate your request while making it work
consistently.

> 
> >
> > The original intention had been to align with what we require during
> > token creation, meaning that once a token has been created, interacting
> > with this token is specifically confined to callers located in the
> > token's user namespace.
> >
> > If that's not the case then it doesn't make sense to not allow
> > permission checking based on regular capability semantics. IOW, why
> > special case init_user_ns if you're breaking the confinement restriction
> > anyway.
> 
> I'm sorry, perhaps I'm dense, but with `current_user_ns() ==
> token->userns` check I think we do fulfill the intention to not allow
> using a token in a userns different from the one in which it was
> created. If that condition isn't satisfied, the token is immediately

My request originally was about never being able to interact with a
token outside of that userns. This is different as you provide an escape
hatch to init_user_ns. But if you need that and ignore the token then
please do it properly. That's what I'm trying to tell you. See below.

> ignored. So you can't use a token from another userns for anything,
> it's just not there, effectively.
> 
> And as I tried to explain above, I do think that ignoring the token
> instead of erroring out early is what we want in order to provide good
> user-space ecosystem integration of the BPF token.

There is no erroring out early here. It's:

(1) Has a token been provided and is the caller capable wrt the
    namespace of the token? Any caller in an ancestor user namespace
    that has the capability in that user namespace is capable wrt
    that token. That __includes__ callers in the init_user_ns. IOW,
    you don't need to fall back to any special checking for init_user_ns.
    It is literally covered in the if (token) branch with the added
    consistency that a process in an ancestor user namespace is
    privileged wrt that token as well.

(2) No token has been provided. Then do what we always did and perform
    the capability checks based on the initial user namespace.

The only thing that you then still need is add that token_capable() hook
in there:

bool bpf_token_capable(const struct bpf_token *token, int cap)
{
        struct user_namespace *userns = &init_user_ns;

        if (token)
                userns = token->userns;
        if (ns_capable(userns, cap))
                return true;
        if (cap != CAP_SYS_ADMIN && ns_capable(userns, CAP_SYS_ADMIN))
		return token ? security_bpf_token_capable(token, cap) == 0 : true;
	return false;
}

Or write it however you like. I think this is way more consistent and
gives you a more flexible permission model.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH bpf-next 03/29] bpf: introduce BPF token object
  2024-01-11 10:38                   ` Christian Brauner
@ 2024-01-11 17:41                     ` Andrii Nakryiko
  2024-01-12  7:58                       ` Christian Brauner
  0 siblings, 1 reply; 59+ messages in thread
From: Andrii Nakryiko @ 2024-01-11 17:41 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Linus Torvalds, Andrii Nakryiko, bpf, netdev, paul, linux-fsdevel,
	linux-security-module, kernel-team

On Thu, Jan 11, 2024 at 2:38 AM Christian Brauner <brauner@kernel.org> wrote:
>
> > > The current check is inconsistent. It special-cases init_user_ns. The
> > > correct thing to do for what you're intending imho is:
> > >
> > > bool bpf_token_capable(const struct bpf_token *token, int cap)
> > > {
> > >         struct user_namespace *userns = &init_user_ns;
> > >
> > >         if (token)
> > >                 userns = token->userns;
> > >         if (ns_capable(userns, cap))
> > >                 return true;
> > >         return cap != CAP_SYS_ADMIN && ns_capable(userns, CAP_SYS_ADMIN);
> > > }
> >
> > Unfortunately the above becomes significantly more hairy when LSM
> > (security_bpf_token_capable) gets involved, while preserving the rule
> > "if token doesn't give rights, fall back to init userns checks".
>
> Why? Please explain your reasoning in detail.

Why which part? About LSM interaction making this much hairier? Then see below.

But if your "why?" is about "pretend no token, if token doesn't give
rights", then that's what I tried to explain in my last email(s). It
significantly alters (for the worse) the user-space integration story
(providing a token can be a regression, so it's no longer safe to
opportunistically try to create and use a BPF token; on the other hand,
automatically retrying inside libbpf makes for a confusing user
experience and inefficiencies). Please let me know which parts are not
clear.

>
> >
> > I'm happy to accommodate any implementation of bpf_token_capable() as
> > long as it behaves as discussed above and also satisfies Paul's
> > requirement that capability checks should happen before LSM checks.
> >
> > >
> > > Because any caller located in an ancestor user namespace of
> > > token->user_ns will be privileged wrt the token's userns as long as
> > > they have that capability in their user namespace.
> >
> > And with `current_user_ns() == token->userns` check we won't be using
> > token->userns while the caller is in an ancestor user namespace; we'll
> > use the capable() check, which will succeed only in the init user_ns, assuming
> > corresponding CAP_xxx is actually set.
>
> Why? This isn't how any of our ns_capable() logic works.
>
> This basically argues that anyone in an ancestor user namespace is not
> allowed to operate on any of their descendant child namespaces unless
> they are in the init_user_ns.
>
> But that's nonsense as I'm trying to tell you. Any process in an
> ancestor user namespace that is privileged over the child namespace can
> just setns() into it and then pass that bpf_token_capable() check by
> supplying the token.
>
> At this point just do it properly and allow callers that are privileged
> in the token user namespace to load bpf programs. It also means you get
> user namespace nesting done properly.

Ok, I see. This `current_user_ns() == token->userns` check prevents
this part of cap_capable() from ever being exercised:

 if ((ns->parent == cred->user_ns) && uid_eq(ns->owner, cred->euid))
    return 0;

Got it. I'm all for not adding any unnecessary restrictions.

>
> >
> > >
> > > For example, if the caller is in the init_user_ns and permissions
> > > for CAP_WHATEVER are checked in token->user_ns and the caller has
> > > CAP_WHATEVER in init_user_ns then they also have it in all
> > > descendant user namespaces.
> >
> > Right, so if they didn't use a token they would still pass
> > capable(CAP_WHATEVER), right?
>
> Yes, I'm trying to accommodate your request while making it work
> consistently.
>
> >
> > >
> > > The original intention had been to align with what we require during
> > > token creation, meaning that once a token has been created, interacting
> > > with this token is specifically confined to callers located in the
> > > token's user namespace.
> > >
> > > If that's not the case then it doesn't make sense to not allow
> > > permission checking based on regular capability semantics. IOW, why
> > > special case init_user_ns if you're breaking the confinement restriction
> > > anyway.
> >
> > I'm sorry, perhaps I'm dense, but with `current_user_ns() ==
> > token->userns` check I think we do fulfill the intention to not allow
> > using a token in a userns different from the one in which it was
> > created. If that condition isn't satisfied, the token is immediately
>
> My request originally was about never being able to interact with a
> token outside of that userns. This is different as you provide an escape
> hatch to init_user_ns. But if you need that and ignore the token then
> please do it properly. That's what I'm trying to tell you. See below.

Yes, I do need that. Thanks for providing the full code implementation
(including LSM), it's much easier this way to converge. Let's see
below.

>
> > ignored. So you can't use a token from another userns for anything,
> > it's just not there, effectively.
> >
> > And as I tried to explain above, I do think that ignoring the token
> > instead of erroring out early is what we want in order to provide good
> > user-space ecosystem integration of the BPF token.
>
> There is no erroring out early here. It's:
>
> (1) Has a token been provided and is the caller capable wrt the
>     namespace of the token? Any caller in an ancestor user namespace
>     that has the capability in that user namespace is capable wrt
>     that token. That __includes__ callers in the init_user_ns. IOW,
>     you don't need to fall back to any special checking for init_user_ns.
>     It is literally covered in the if (token) branch with the added
>     consistency that a process in an ancestor user namespace is
>     privileged wrt that token as well.
>
> (2) No token has been provided. Then do what we always did and perform
>     the capability checks based on the initial user namespace.
>
> The only thing that you then still need is add that token_capable() hook
> in there:
>
> bool bpf_token_capable(const struct bpf_token *token, int cap)
> {
>         struct user_namespace *userns = &init_user_ns;
>
>         if (token)
>                 userns = token->userns;
>         if (ns_capable(userns, cap))

Here, we still need to check security_bpf_token_capable(token, cap)
result (and only if token != NULL). And if LSM returns < 0, then drop
the token and do the original init userns check.

And I just realized that my original implementation has the same
problem. In my current implementation if we have a token we will
terminate at the LSM call, regardless of whether the LSM allows or disallows the
token. But that's inconsistent behavior and shouldn't be like that.

I will add new tests that validate LSM interactions in the next revision.

>                 return true;
>         if (cap != CAP_SYS_ADMIN && ns_capable(userns, CAP_SYS_ADMIN))
>                 return token ? security_bpf_token_capable(token, cap) == 0 : true;

here as well: even if we have a token that passes the ns_capable()
check but the LSM rejects it, we still need to forget about the token
and do capable() checks in the init userns.

>         return false;
> }
>
> Or write it however you like. I think this is way more consistent and
> gives you a more flexible permission model.

Yes, I like it, thanks. Taking into account fixed LSM interactions,
here's what I came up with. Yell if you can spot anything wrong (or
just hate the style). I did have a version without the extra function,
just setting the token to NULL with a "goto again" approach, but I think
it's way less readable and harder to follow. So this is my version
right now:

static bool bpf_ns_capable(struct user_namespace *ns, int cap)
{
        return ns_capable(ns, cap) ||
               (cap != CAP_SYS_ADMIN && ns_capable(ns, CAP_SYS_ADMIN));
}

static bool token_capable(const struct bpf_token *token, int cap)
{
        struct user_namespace *userns;

        userns = token ? token->userns : &init_user_ns;
        if (!bpf_ns_capable(userns, cap))
                return false;
        if (token && security_bpf_token_capable(token, cap) < 0)
                return false;
        return true;
}

bool bpf_token_capable(const struct bpf_token *token, int cap)
{
        /* BPF token allows ns_capable() level of capabilities, but if it
         * doesn't grant required capabilities, ignore token and fallback to
         * init userns-based checks
         */
        if (token && token_capable(token, cap))
                return true;
        return token_capable(NULL, cap);
}
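
For completeness, here is a hedged sketch of the LSM side this has to
interact with -- a BPF LSM program gating the new hook (the hook name
follows this series; section naming and constants are assumptions
modeled on typical selftests):

// SPDX-License-Identifier: GPL-2.0
#include "vmlinux.h"
#include <errno.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

#define CAP_PERFMON 38	/* include/uapi/linux/capability.h */

/* Deny CAP_PERFMON via the token; with the fallback semantics above,
 * such a denial makes the kernel ignore the token rather than fail.
 */
SEC("lsm/bpf_token_capable")
int BPF_PROG(deny_token_perfmon, struct bpf_token *token, int cap)
{
	if (cap == CAP_PERFMON)
		return -EPERM;
	return 0;
}

char _license[] SEC("license") = "GPL";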

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH bpf-next 03/29] bpf: introduce BPF token object
  2024-01-11 17:41                     ` Andrii Nakryiko
@ 2024-01-12  7:58                       ` Christian Brauner
  2024-01-12 18:32                         ` Andrii Nakryiko
  0 siblings, 1 reply; 59+ messages in thread
From: Christian Brauner @ 2024-01-12  7:58 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Linus Torvalds, Andrii Nakryiko, bpf, netdev, paul, linux-fsdevel,
	linux-security-module, kernel-team

On Thu, Jan 11, 2024 at 09:41:25AM -0800, Andrii Nakryiko wrote:
> On Thu, Jan 11, 2024 at 2:38 AM Christian Brauner <brauner@kernel.org> wrote:
> >
> > > > The current check is inconsistent. It special-cases init_user_ns. The
> > > > correct thing to do for what you're intending imho is:
> > > >
> > > > bool bpf_token_capable(const struct bpf_token *token, int cap)
> > > > {
> > > >         struct user_namespace *userns = &init_user_ns;
> > > >
> > > >         if (token)
> > > >                 userns = token->userns;
> > > >         if (ns_capable(userns, cap))
> > > >                 return true;
> > > >         return cap != CAP_SYS_ADMIN && ns_capable(userns, CAP_SYS_ADMIN);
> > > > }
> > >
> > > Unfortunately the above becomes significantly more hairy when LSM
> > > (security_bpf_token_capable) gets involved, while preserving the rule
> > > "if token doesn't give rights, fall back to init userns checks".
> >
> > Why? Please explain your reasoning in detail.
> 
> Why which part? About LSM interaction making this much hairier? Then see below.
> 
> But if your "why?" is about "pretend no token, if token doesn't give
> rights", then that's what I tried to explain in my last email(s). It
> significantly alters (for the worse) the user-space integration story
> (providing a token can be a regression, so it's no longer safe to
> opportunistically try to create and use a BPF token; on the other hand,
> automatically retrying inside libbpf makes for a confusing user
> experience and inefficiencies). Please let me know which parts are not
> clear.
> 
> >
> > >
> > > I'm happy to accommodate any implementation of bpf_token_capable() as
> > > long as it behaves as discussed above and also satisfies Paul's
> > > requirement that capability checks should happen before LSM checks.
> > >
> > > >
> > > > Because any caller located in an ancestor user namespace of
> > > > token->user_ns will be privileged wrt the token's userns as long as
> > > > they have that capability in their user namespace.
> > >
> > > And with `current_user_ns() == token->userns` check we won't be using
> > > token->userns while the caller is in an ancestor user namespace; we'll
> > > use the capable() check, which will succeed only in the init user_ns, assuming
> > > corresponding CAP_xxx is actually set.
> >
> > Why? This isn't how any of our ns_capable() logic works.
> >
> > This basically argues that anyone in an ancestor user namespace is not
> > allowed to operate on any of their descendant child namespaces unless
> > they are in the init_user_ns.
> >
> > But that's nonsense as I'm trying to tell you. Any process in an
> > ancestor user namespace that is privileged over the child namespace can
> > just setns() into it and then pass that bpf_token_capable() check by
> > supplying the token.
> >
> > At this point just do it properly and allow callers that are privileged
> > in the token user namespace to load bpf programs. It also means you get
> > user namespace nesting done properly.
> 
> Ok, I see. This `current_user_ns() == token->userns` check prevents
> > this part of cap_capable() from ever being exercised:
> 
>  if ((ns->parent == cred->user_ns) && uid_eq(ns->owner, cred->euid))
>     return 0;
> 
> Got it. I'm all for not adding any unnecessary restrictions.
> 
> >
> > >
> > > >
> > > > For example, if the caller is in the init_user_ns and permissions
> > > > for CAP_WHATEVER are checked in token->user_ns and the caller has
> > > > CAP_WHATEVER in init_user_ns then they also have it in all
> > > > descendant user namespaces.
> > >
> > > Right, so if they didn't use a token they would still pass
> > > capable(CAP_WHATEVER), right?
> >
> > Yes, I'm trying to accommodate your request while making it work
> > consistently.
> >
> > >
> > > >
> > > > The original intention had been to align with what we require during
> > > token creation, meaning that once a token has been created, interacting
> > > with this token is specifically confined to callers located in the
> > > > token's user namespace.
> > > >
> > > > If that's not the case then it doesn't make sense to not allow
> > > > permission checking based on regular capability semantics. IOW, why
> > > > special case init_user_ns if you're breaking the confinement restriction
> > > > anyway.
> > >
> > > I'm sorry, perhaps I'm dense, but with `current_user_ns() ==
> > > token->userns` check I think we do fulfill the intention to not allow
> > > using a token in a userns different from the one in which it was
> > > created. If that condition isn't satisfied, the token is immediately
> >
> > My request originally was about never being able to interact with a
> > token outside of that userns. This is different as you provide an escape
> > hatch to init_user_ns. But if you need that and ignore the token then
> > please do it properly. That's what I'm trying to tell you. See below.
> 
> Yes, I do need that. Thanks for providing the full code implementation
> (including LSM), it's much easier this way to converge. Let's see
> below.
> 
> >
> > > ignored. So you can't use a token from another userns for anything,
> > > it's just not there, effectively.
> > >
> > > And as I tried to explain above, I do think that ignoring the token
> > > instead of erroring out early is what we want in order to provide good
> > > user-space ecosystem integration of the BPF token.
> >
> > There is no erroring out early here. It's:
> >
> > (1) Has a token been provided and is the caller capable wrt the
> >     namespace of the token? Any caller in an ancestor user namespace
> >     that has the capability in that user namespace is capable wrt
> >     that token. That __includes__ callers in the init_user_ns. IOW,
> >     you don't need to fall back to any special checking for init_user_ns.
> >     It is literally covered in the if (token) branch with the added
> >     consistency that a process in an ancestor user namespace is
> >     privileged wrt that token as well.
> >
> > (2) No token has been provided. Then do what we always did and perform
> >     the capability checks based on the initial user namespace.
> >
> > The only thing that you then still need is add that token_capable() hook
> > in there:
> >
> > bool bpf_token_capable(const struct bpf_token *token, int cap)
> > {
> >         struct user_namespace *userns = &init_user_ns;
> >
> >         if (token)
> >                 userns = token->userns;
> >         if (ns_capable(userns, cap))
> 
> Here, we still need to check security_bpf_token_capable(token, cap)
> result (and only if token != NULL). And if LSM returns < 0, then drop
> the token and do the original init userns check.
> 
> And I just realized that my original implementation has the same
> problem. In my current implementation if we have a token we will
> terminate at the LSM call, regardless of whether the LSM allows or disallows the
> token. But that's inconsistent behavior and shouldn't be like that.
> 
> I will add new tests that validate LSM interactions in the next revision.
> 
> >                 return true;
> >         if (cap != CAP_SYS_ADMIN && ns_capable(userns, CAP_SYS_ADMIN))
> >                 return token ? security_bpf_token_capable(token, cap) == 0 : true;
> 
> here as well: even if we have a token that passes the ns_capable()
> check but the LSM rejects it, we still need to forget about the token
> and do capable() checks in the init userns.
> 
> >         return false;
> > }
> >
> > Or write it however you like. I think this is way more consistent and
> > gives you a more flexible permission model.
> 
> Yes, I like it, thanks. Taking into account fixed LSM interactions,
> here's what I came up with. Yell if you can spot anything wrong (or
> just hate the style). I did have a version without the extra function,
> just setting the token to NULL with a "goto again" approach, but I think
> it's way less readable and harder to follow. So this is my version
> right now:
> 
> static bool bpf_ns_capable(struct user_namespace *ns, int cap)
> {
> >         return ns_capable(ns, cap) ||
> >                (cap != CAP_SYS_ADMIN && ns_capable(ns, CAP_SYS_ADMIN));
> }
> 
> static bool token_capable(const struct bpf_token *token, int cap)
> {
>         struct user_namespace *userns;
> 
>         userns = token ? token->userns : &init_user_ns;
>         if (!bpf_ns_capable(userns, cap))
>                 return false;
>         if (token && security_bpf_token_capable(token, cap) < 0)
>                 return false;
>         return true;
> }
> 
> bool bpf_token_capable(const struct bpf_token *token, int cap)
> {
>         /* BPF token allows ns_capable() level of capabilities, but if it
>          * doesn't grant required capabilities, ignore token and fallback to
>          * init userns-based checks
>          */
>         if (token && token_capable(token, cap))
>                 return true;
>         return token_capable(NULL, cap);
> }

My point is that the capable logic will walk up the user namespace
hierarchy from the token->userns until the user namespace of the caller
and terminate when it reaches the init_user_ns.

A caller is located in some namespace at the point where they call this
function. They provided a token. The caller isn't capable in the
namespace of the token so the function falls back to init_user_ns. Two
interesting cases:

(1) The caller wasn't in an ancestor userns of the token. If that's the
    case then it follows that the caller also wasn't in the init_user_ns
    because the init_user_ns is an ancestor of all other user
    namespaces. So falling back will fail.

(2) The caller was in the same or an ancestor user namespace of the
    token but didn't have the capability in that user namespace:
    
     (i) They were in a non-init_user_ns. Therefore they can't be
         privileged in init_user_ns.
    (ii) They were in init_user_ns. Therefore, they lacked privileges in
         the init_user_ns.
    
In both cases your fallback will do nothing iiuc.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH bpf-next 03/29] bpf: introduce BPF token object
  2024-01-12  7:58                       ` Christian Brauner
@ 2024-01-12 18:32                         ` Andrii Nakryiko
  2024-01-12 19:16                           ` Christian Brauner
  0 siblings, 1 reply; 59+ messages in thread
From: Andrii Nakryiko @ 2024-01-12 18:32 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Linus Torvalds, Andrii Nakryiko, bpf, netdev, paul, linux-fsdevel,
	linux-security-module, kernel-team

On Thu, Jan 11, 2024 at 11:58 PM Christian Brauner <brauner@kernel.org> wrote:
>
> On Thu, Jan 11, 2024 at 09:41:25AM -0800, Andrii Nakryiko wrote:
> > On Thu, Jan 11, 2024 at 2:38 AM Christian Brauner <brauner@kernel.org> wrote:
> > >
> > > > > The current check is inconsistent. It special-cases init_user_ns. The
> > > > > correct thing to do for what you're intending imho is:
> > > > >
> > > > > bool bpf_token_capable(const struct bpf_token *token, int cap)
> > > > > {
> > > > >         struct user_namespace *userns = &init_user_ns;
> > > > >
> > > > >         if (token)
> > > > >                 userns = token->userns;
> > > > >         if (ns_capable(userns, cap))
> > > > >                 return true;
> > > > >         return cap != CAP_SYS_ADMIN && ns_capable(userns, CAP_SYS_ADMIN);
> > > > > }
> > > >
> > > > Unfortunately the above becomes significantly more hairy when LSM
> > > > (security_bpf_token_capable) gets involved, while preserving the rule
> > > > "if token doesn't give rights, fall back to init userns checks".
> > >
> > > Why? Please explain your reasoning in detail.
> >
> > Why which part? About LSM interaction making this much hairier? Then see below.
> >
> > But if your "why?" is about "pretend no token, if token doesn't give
> > rights", then that's what I tried to explain in my last email(s). It
> > significantly alters (for the worse) the user-space integration story
> > (providing a token can be a regression, so it's no longer safe to
> > opportunistically try to create and use a BPF token; on the other hand,
> > automatically retrying inside libbpf makes for a confusing user
> > experience and inefficiencies). Please let me know which parts are not
> > clear.
> >
> > >
> > > >
> > > > I'm happy to accommodate any implementation of bpf_token_capable() as
> > > > long as it behaves as discussed above and also satisfies Paul's
> > > > requirement that capability checks should happen before LSM checks.
> > > >
> > > > >
> > > > > Because any caller located in an ancestor user namespace of
> > > > > token->user_ns will be privileged wrt the token's userns as long as
> > > > > they have that capability in their user namespace.
> > > >
> > > > And with `current_user_ns() == token->userns` check we won't be using
> > > > token->userns while the caller is in an ancestor user namespace; we'll
> > > > use the capable() check, which will succeed only in the init user_ns, assuming
> > > > corresponding CAP_xxx is actually set.
> > >
> > > Why? This isn't how any of our ns_capable() logic works.
> > >
> > > This basically argues that anyone in an ancestor user namespace is not
> > > allowed to operate on any of their descendant child namespaces unless
> > > they are in the init_user_ns.
> > >
> > > But that's nonsense as I'm trying to tell you. Any process in an
> > > ancestor user namespace that is privileged over the child namespace can
> > > just setns() into it and then pass that bpf_token_capable() check by
> > > supplying the token.
> > >
> > > At this point just do it properly and allow callers that are privileged
> > > in the token user namespace to load bpf programs. It also means you get
> > > user namespace nesting done properly.
> >
> > Ok, I see. This `current_user_ns() == token->userns` check prevents
> > this part of cap_capable() from ever being exercised:
> >
> >  if ((ns->parent == cred->user_ns) && uid_eq(ns->owner, cred->euid))
> >     return 0;
> >
> > Got it. I'm all for not adding any unnecessary restrictions.
> >
> > >
> > > >
> > > > >
> > > > > For example, if the caller is in the init_user_ns and permissions
> > > > > for CAP_WHATEVER are checked in token->user_ns and the caller has
> > > > > CAP_WHATEVER in init_user_ns then they also have it in all
> > > > > descendant user namespaces.
> > > >
> > > > Right, so if they didn't use a token they would still pass
> > > > capable(CAP_WHATEVER), right?
> > >
> > > Yes, I'm trying to accommodate your request while making it work
> > > consistently.
> > >
> > > >
> > > > >
> > > > > The original intention had been to align with what we require during
> > > > token creation, meaning that once a token has been created, interacting
> > > > with this token is specifically confined to callers located in the
> > > > > token's user namespace.
> > > > >
> > > > > If that's not the case then it doesn't make sense to not allow
> > > > > permission checking based on regular capability semantics. IOW, why
> > > > > special case init_user_ns if you're breaking the confinement restriction
> > > > > anyway.
> > > >
> > > > I'm sorry, perhaps I'm dense, but with `current_user_ns() ==
> > > > token->userns` check I think we do fulfill the intention to not allow
> > > > using a token in a userns different from the one in which it was
> > > > created. If that condition isn't satisfied, the token is immediately
> > >
> > > My request originally was about never being able to interact with a
> > > token outside of that userns. This is different as you provide an escape
> > > hatch to init_user_ns. But if you need that and ignore the token then
> > > please do it properly. That's what I'm trying to tell you. See below.
> >
> > Yes, I do need that. Thanks for providing the full code implementation
> > (including LSM), it's much easier this way to converge. Let's see
> > below.
> >
> > >
> > > > ignored. So you can't use a token from another userns for anything,
> > > > it's just not there, effectively.
> > > >
> > > > And as I tried to explain above, I do think that ignoring the token
> > > > instead of erroring out early is what we want in order to provide good
> > > > user-space ecosystem integration of the BPF token.
> > >
> > > There is no erroring out early here. It's:
> > >
> > > (1) Has a token been provided and is the caller capable wrt the
> > >     namespace of the token? Any caller in an ancestor user namespace
> > >     that has the capability in that user namespace is capable wrt
> > >     that token. That __includes__ callers in the init_user_ns. IOW,
> > >     you don't need to fall back to any special checking for init_user_ns.
> > >     It is literally covered in the if (token) branch with the added
> > >     consistency that a process in an ancestor user namespace is
> > >     privileged wrt that token as well.
> > >
> > > (2) No token has been provided. Then do what we always did and perform
> > >     the capability checks based on the initial user namespace.
> > >
> > > The only thing that you then still need is add that token_capable() hook
> > > in there:
> > >
> > > bool bpf_token_capable(const struct bpf_token *token, int cap)
> > > {
> > >         struct user_namespace *userns = &init_user_ns;
> > >
> > >         if (token)
> > >                 userns = token->userns;
> > >         if (ns_capable(userns, cap))
> >
> > Here, we still need to check security_bpf_token_capable(token, cap)
> > result (and only if token != NULL). And if LSM returns < 0, then drop
> > the token and do the original init userns check.
> >
> > And I just realized that my original implementation has the same
> > problem. In my current implementation if we have a token we will
> > terminate at the LSM call, regardless of whether the LSM allows or disallows the
> > token. But that's inconsistent behavior and shouldn't be like that.
> >
> > I will add new tests that validate LSM interactions in the next revision.
> >
> > >                 return true;
> > >         if (cap != CAP_SYS_ADMIN && ns_capable(userns, CAP_SYS_ADMIN))
> > >                 return token ? security_bpf_token_capable(token, cap) == 0 : true;
> >
> > here as well: even if we have a token that passes the ns_capable()
> > check but the LSM rejects it, we still need to forget about the token
> > and do capable() checks in the init userns.
> >
> > >         return false;
> > > }
> > >
> > > Or write it however you like. I think this is way more consistent and
> > > gives you a more flexible permission model.
> >
> > Yes, I like it, thanks. Taking into account fixed LSM interactions,
> > here's what I came up with. Yell if you can spot anything wrong (or
> > just hate the style). I did have a version without the extra function,
> > just setting the token to NULL with a "goto again" approach, but I think
> > it's way less readable and harder to follow. So this is my version
> > right now:
> >
> > static bool bpf_ns_capable(struct user_namespace *ns, int cap)
> > {
> >         return ns_capable(ns, cap) ||
> >                (cap != CAP_SYS_ADMIN && ns_capable(ns, CAP_SYS_ADMIN));
> > }
> >
> > static bool token_capable(const struct bpf_token *token, int cap)
> > {
> >         struct user_namespace *userns;
> >
> >         userns = token ? token->userns : &init_user_ns;
> >         if (!bpf_ns_capable(userns, cap))
> >                 return false;
> >         if (token && security_bpf_token_capable(token, cap) < 0)
> >                 return false;
> >         return true;
> > }
> >
> > bool bpf_token_capable(const struct bpf_token *token, int cap)
> > {
> >         /* BPF token allows ns_capable() level of capabilities, but if it
> >          * doesn't grant required capabilities, ignore token and fallback to
> >          * init userns-based checks
> >          */
> >         if (token && token_capable(token, cap))
> >                 return true;
> >         return token_capable(NULL, cap);
> > }
>
> My point is that the capable logic will walk up the user namespace
> hierarchy from the token->userns until the user namespace of the caller
> and terminate when it reaches the init_user_ns.
>
> A caller is located in some namespace at the point where they call this
> function. They provided a token. The caller isn't capable in the
> namespace of the token so the function falls back to init_user_ns. Two
> interesting cases:
>
> (1) The caller wasn't in an ancestor userns of the token. If that's the
>     case then it follows that the caller also wasn't in the init_user_ns
>     because the init_user_ns is an ancestor of all other user
>     namespaces. So falling back will fail.

agreed

>
> (2) The caller was in the same or an ancestor user namespace of the
>     token but didn't have the capability in that user namespace:
>
>      (i) They were in a non-init_user_ns. Therefore they can't be
>          privileged in init_user_ns.
>     (ii) They were in init_user_ns. Therefore, they lacked privileges in
>          the init_user_ns.
>
> In both cases your fallback will do nothing iiuc.

agreed as well

And I agree in general that there isn't a *practically useful* case
where this would matter much. But there is still (at least one) case
where there could be a regression: a token is created in the
init_user_ns, the caller has CAP_BPF in the init_user_ns, the caller
passes that token to BPF_PROG_LOAD, and an LSM policy rejects the
token in security_bpf_token_capable(). Without the above
implementation such an operation will be rejected, even though it
would succeed if no token was passed. With my implementation above it
will succeed as expected.
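
Spelling that scenario out against the implementation above (a
walkthrough, assuming the sketched functions):

/* Scenario: token->userns == &init_user_ns; the caller is in the
 * init_user_ns with CAP_BPF; an LSM denies the token.
 *
 * Without the fallback:
 *   ns_capable(&init_user_ns, CAP_BPF)          -> true
 *   security_bpf_token_capable(token, CAP_BPF)  -> -EPERM
 *   => the load fails, although the exact same call with no token
 *      would have succeeded.
 *
 * With the fallback (bpf_token_capable() above):
 *   token_capable(token, CAP_BPF) -> false (LSM denied the token)
 *   token_capable(NULL, CAP_BPF)  -> init_user_ns check -> true
 *   => the load succeeds, matching the no-token behavior.
 */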

Again, I get that those are unlikely corner cases. But this is a kernel
API and it should behave consistently. I'd like to avoid an asterisk
listing exceptional cases where the behavior might not be logical
(however unlikely) and so on. The promise is simple: "providing a token
can never regress the permissions an application would have without
a token". Libraries like libbpf can then take that as a contract to
work with.
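
To illustrate, with that contract in place callers can pass a token
opportunistically and never need a "retry without the token" path.
A rough userspace sketch, assuming the token_fd option plumbing from
the libbpf patches later in this series (exact field names may still
shift during review):

#include <bpf/bpf.h>

/* Load a program, attaching a BPF token when we have one. Because a
 * token can never regress permissions, it is always safe to pass it
 * along -- no fallback reload without the token is needed.
 */
static int prog_load_with_optional_token(int token_fd,
                                         const struct bpf_insn *insns,
                                         size_t insn_cnt)
{
        LIBBPF_OPTS(bpf_prog_load_opts, opts);

        if (token_fd > 0)
                opts.token_fd = token_fd;

        return bpf_prog_load(BPF_PROG_TYPE_KPROBE, "noop_kprobe", "GPL",
                             insns, insn_cnt, &opts);
}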

I like this latest implementation above as it is straightforward to
follow and satisfies that contract "by construction":

bool bpf_token_capable(const struct bpf_token *token, int cap)
{
        if (token && token_capable(token, cap))
                return true;
        return token_capable(NULL, cap);
}

So in summary, assuming we are converging and the above
bpf_token_capable() implementation doesn't have any more hidden
issues, I only have the s/kvzalloc/kzalloc/ and s/kvfree/kfree/
changes that Linus asked for on the kernel side. I'm planning to roll
those into the corresponding existing patches. Besides that, I'm
adding a few new tests to validate LSM interactions, which I'll add as
a separate patch to the series.

I'm thinking of sending the bpf/bpf-next patch set as v2, just like I
did with v1. But if a PR directly to Linus is preferable, please let
me know, thank you!


* Re: [PATCH bpf-next 03/29] bpf: introduce BPF token object
  2024-01-12 18:32                         ` Andrii Nakryiko
@ 2024-01-12 19:16                           ` Christian Brauner
  2024-01-14  2:29                             ` Andrii Nakryiko
  0 siblings, 1 reply; 59+ messages in thread
From: Christian Brauner @ 2024-01-12 19:16 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Linus Torvalds, Andrii Nakryiko, bpf, netdev, paul, linux-fsdevel,
	linux-security-module, kernel-team

> > My point is that the capable logic will walk up the user namespace
> > hierarchy from the token->userns to the user namespace of the caller
> > and terminate when it reaches the init_user_ns.
> >
> > A caller is located in some namespace at the point where they call this
> > function. They provided a token. The caller isn't capable in the
> > namespace of the token so the function falls back to init_user_ns. Two
> > interesting cases:
> >
> > (1) The caller wasn't in an ancestor userns of the token. If that's the
> >     case then it follows that the caller also wasn't in the init_user_ns,
> >     because the init_user_ns is an ancestor of all other user
> >     namespaces. So falling back will fail.
> 
> agreed
> 
> >
> > (2) The caller was in the same or an ancestor user namespace of the
> >     token but didn't have the capability in that user namespace:
> >
> >      (i) They were in a non-init_user_ns. Therefore they can't be
> >          privileged in init_user_ns.
> >     (ii) They were in init_user_ns. Therefore, they lacked privileges in
> >          the init_user_ns.
> >
> > In both cases your fallback will do nothing iiuc.
> 
> agreed as well
> 
> And I agree in general that there isn't a *practically useful* case
> where this would matter much. But there is still (at least) one case
> where there could be a regression: a token is created in init_user_ns,
> the caller has CAP_BPF in init_user_ns, the caller passes that token
> to BPF_PROG_LOAD, and LSM policy rejects that token in
> security_bpf_token_capable(). Without the above implementation such an
> operation will be rejected, even though it would succeed if no token
> were passed. With my implementation above it will succeed as expected.

If that's the case then prevent the creation of tokens in the
init_user_ns and be done with it. If you fall back anyway then this is
the correct solution.

Make this change, please. I'm not willing to support this weird
fallback stuff, which is hard to reason about on top of everything
else.


* Re: [PATCH bpf-next 03/29] bpf: introduce BPF token object
  2024-01-12 19:16                           ` Christian Brauner
@ 2024-01-14  2:29                             ` Andrii Nakryiko
  2024-01-16 16:37                               ` Christian Brauner
  0 siblings, 1 reply; 59+ messages in thread
From: Andrii Nakryiko @ 2024-01-14  2:29 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Linus Torvalds, Andrii Nakryiko, bpf, netdev, paul, linux-fsdevel,
	linux-security-module, kernel-team

On Fri, Jan 12, 2024 at 11:17 AM Christian Brauner <brauner@kernel.org> wrote:
>
> > > My point is that the capable logic will walk up the user namespace
> > > hierarchy from the token->userns to the user namespace of the caller
> > > and terminate when it reaches the init_user_ns.
> > >
> > > A caller is located in some namespace at the point where they call this
> > > function. They provided a token. The caller isn't capable in the
> > > namespace of the token so the function falls back to init_user_ns. Two
> > > interesting cases:
> > >
> > > (1) The caller wasn't in an ancestor userns of the token. If that's the
> > >     case then it follows that the caller also wasn't in the init_user_ns,
> > >     because the init_user_ns is an ancestor of all other user
> > >     namespaces. So falling back will fail.
> >
> > agreed
> >
> > >
> > > (2) The caller was in the same or an ancestor user namespace of the
> > >     token but didn't have the capability in that user namespace:
> > >
> > >      (i) They were in a non-init_user_ns. Therefore they can't be
> > >          privileged in init_user_ns.
> > >     (ii) They were in init_user_ns. Therefore, they lacked privileges in
> > >          the init_user_ns.
> > >
> > > In both cases your fallback will do nothing iiuc.
> >
> > agreed as well
> >
> > And I agree in general that there isn't a *practically useful* case
> > where this would matter much. But there is still (at least) one case
> > where there could be a regression: a token is created in init_user_ns,
> > the caller has CAP_BPF in init_user_ns, the caller passes that token
> > to BPF_PROG_LOAD, and LSM policy rejects that token in
> > security_bpf_token_capable(). Without the above implementation such an
> > operation will be rejected, even though it would succeed if no token
> > were passed. With my implementation above it will succeed as expected.
>
> If that's the case then prevent the creation of tokens in the
> init_user_ns and be done with it. If you fall back anyway then this is
> the correct solution.
>
> Make this change, please. I'm not willing to support this weird
> fallback stuff, which is hard to reason about on top of everything
> else.

Alright, added an extra check. OK, so in summary I have the changes
below compared to v1 (plus a few extra LSM-related test cases):

diff --git a/kernel/bpf/token.c b/kernel/bpf/token.c
index a86fccd57e2d..7d04378560fd 100644
--- a/kernel/bpf/token.c
+++ b/kernel/bpf/token.c
@@ -9,18 +9,22 @@
 #include <linux/user_namespace.h>
 #include <linux/security.h>

+static bool bpf_ns_capable(struct user_namespace *ns, int cap)
+{
+       return ns_capable(ns, cap) || (cap != CAP_SYS_ADMIN && ns_capable(ns, CAP_SYS_ADMIN));
+}
+
 bool bpf_token_capable(const struct bpf_token *token, int cap)
 {
-       /* BPF token allows ns_capable() level of capabilities, but only if
-        * token's userns is *exactly* the same as current user's userns
-        */
-       if (token && current_user_ns() == token->userns) {
-               if (ns_capable(token->userns, cap) ||
-                   (cap != CAP_SYS_ADMIN && ns_capable(token->userns, CAP_SYS_ADMIN)))
-                       return security_bpf_token_capable(token, cap) == 0;
-       }
-       /* otherwise fallback to capable() checks */
-       return capable(cap) || (cap != CAP_SYS_ADMIN && capable(CAP_SYS_ADMIN));
+       struct user_namespace *userns;
+
+       /* BPF token allows ns_capable() level of capabilities */
+       userns = token ? token->userns : &init_user_ns;
+       if (!bpf_ns_capable(userns, cap))
+               return false;
+       if (token && security_bpf_token_capable(token, cap) < 0)
+               return false;
+       return true;
 }

 void bpf_token_inc(struct bpf_token *token)
@@ -32,7 +36,7 @@ static void bpf_token_free(struct bpf_token *token)
 {
        security_bpf_token_free(token);
        put_user_ns(token->userns);
-       kvfree(token);
+       kfree(token);
 }

 static void bpf_token_put_deferred(struct work_struct *work)
@@ -152,6 +156,12 @@ int bpf_token_create(union bpf_attr *attr)
                goto out_path;
        }

+       /* Creating BPF token in init_user_ns doesn't make much sense. */
+       if (current_user_ns() == &init_user_ns) {
+               err = -EOPNOTSUPP;
+               goto out_path;
+       }
+
        mnt_opts = path.dentry->d_sb->s_fs_info;
        if (mnt_opts->delegate_cmds == 0 &&
            mnt_opts->delegate_maps == 0 &&
@@ -179,7 +189,7 @@ int bpf_token_create(union bpf_attr *attr)
                goto out_path;
        }

-       token = kvzalloc(sizeof(*token), GFP_USER);
+       token = kzalloc(sizeof(*token), GFP_USER);
        if (!token) {
                err = -ENOMEM;
                goto out_file;
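
For completeness, from userspace the new restriction just shows up as
a failing BPF_TOKEN_CREATE. A rough sketch using the bpf_token_create()
wrapper added earlier in this series (the /sys/fs/bpf path is purely
illustrative and assumes a delegation-enabled BPF FS instance is
mounted there):

#include <errno.h>
#include <fcntl.h>
#include <unistd.h>
#include <bpf/bpf.h>

/* Derive a BPF token from an already-mounted, delegation-enabled
 * BPF FS instance. With the check above, calling this from
 * init_user_ns now fails with -EOPNOTSUPP.
 */
static int get_bpf_token(const char *bpffs_path)
{
        int bpffs_fd, token_fd;

        bpffs_fd = open(bpffs_path, O_RDONLY);
        if (bpffs_fd < 0)
                return -errno;

        token_fd = bpf_token_create(bpffs_fd, NULL);
        close(bpffs_fd);
        return token_fd;
}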


* Re: [PATCH bpf-next 03/29] bpf: introduce BPF token object
  2024-01-14  2:29                             ` Andrii Nakryiko
@ 2024-01-16 16:37                               ` Christian Brauner
  0 siblings, 0 replies; 59+ messages in thread
From: Christian Brauner @ 2024-01-16 16:37 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Linus Torvalds, Andrii Nakryiko, bpf, netdev, paul, linux-fsdevel,
	linux-security-module, kernel-team

On Sat, Jan 13, 2024 at 06:29:33PM -0800, Andrii Nakryiko wrote:
> On Fri, Jan 12, 2024 at 11:17 AM Christian Brauner <brauner@kernel.org> wrote:
> >
> > > > My point is that the capable logic will walk up the user namespace
> > > > hierarchy from the token->userns to the user namespace of the caller
> > > > and terminate when it reaches the init_user_ns.
> > > >
> > > > A caller is located in some namespace at the point where they call this
> > > > function. They provided a token. The caller isn't capable in the
> > > > namespace of the token so the function falls back to init_user_ns. Two
> > > > interesting cases:
> > > >
> > > > (1) The caller wasn't in an ancestor userns of the token. If that's the
> > > >     case then it follows that the caller also wasn't in the init_user_ns,
> > > >     because the init_user_ns is an ancestor of all other user
> > > >     namespaces. So falling back will fail.
> > >
> > > agreed
> > >
> > > >
> > > > (2) The caller was in the same or an ancestor user namespace of the
> > > >     token but didn't have the capability in that user namespace:
> > > >
> > > >      (i) They were in a non-init_user_ns. Therefore they can't be
> > > >          privileged in init_user_ns.
> > > >     (ii) They were in init_user_ns. Therefore, they lacked privileges in
> > > >          the init_user_ns.
> > > >
> > > > In both cases your fallback will do nothing iiuc.
> > >
> > > agreed as well
> > >
> > > And I agree in general that there isn't a *practically useful* case
> > > where this would matter much. But there is still (at least) one case
> > > where there could be a regression: a token is created in init_user_ns,
> > > the caller has CAP_BPF in init_user_ns, the caller passes that token
> > > to BPF_PROG_LOAD, and LSM policy rejects that token in
> > > security_bpf_token_capable(). Without the above implementation such an
> > > operation will be rejected, even though it would succeed if no token
> > > were passed. With my implementation above it will succeed as expected.
> >
> > If that's the case then prevent the creation of tokens in the
> > init_user_ns and be done with it. If you fall back anyway then this is
> > the correct solution.
> >
> > Make this change, please. I'm not willing to support this weird
> > fallback stuff, which is hard to reason about on top of everything
> > else.
> 
> Alright, added an extra check. OK, so in summary I have the changes
> below compared to v1 (plus a few extra LSM-related test cases):
> 
> diff --git a/kernel/bpf/token.c b/kernel/bpf/token.c
> index a86fccd57e2d..7d04378560fd 100644
> --- a/kernel/bpf/token.c
> +++ b/kernel/bpf/token.c
> @@ -9,18 +9,22 @@
>  #include <linux/user_namespace.h>
>  #include <linux/security.h>
> 
> +static bool bpf_ns_capable(struct user_namespace *ns, int cap)
> +{
> +       return ns_capable(ns, cap) || (cap != CAP_SYS_ADMIN && ns_capable(ns, CAP_SYS_ADMIN));
> +}
> +
>  bool bpf_token_capable(const struct bpf_token *token, int cap)
>  {
> -       /* BPF token allows ns_capable() level of capabilities, but only if
> -        * token's userns is *exactly* the same as current user's userns
> -        */
> -       if (token && current_user_ns() == token->userns) {
> -               if (ns_capable(token->userns, cap) ||
> -                   (cap != CAP_SYS_ADMIN && ns_capable(token->userns, CAP_SYS_ADMIN)))
> -                       return security_bpf_token_capable(token, cap) == 0;
> -       }
> -       /* otherwise fallback to capable() checks */
> -       return capable(cap) || (cap != CAP_SYS_ADMIN && capable(CAP_SYS_ADMIN));
> +       struct user_namespace *userns;
> +
> +       /* BPF token allows ns_capable() level of capabilities */
> +       userns = token ? token->userns : &init_user_ns;
> +       if (!bpf_ns_capable(userns, cap))
> +               return false;
> +       if (token && security_bpf_token_capable(token, cap) < 0)
> +               return false;
> +       return true;
>  }
> 
>  void bpf_token_inc(struct bpf_token *token)
> @@ -32,7 +36,7 @@ static void bpf_token_free(struct bpf_token *token)
>  {
>         security_bpf_token_free(token);
>         put_user_ns(token->userns);
> -       kvfree(token);
> +       kfree(token);
>  }
> 
>  static void bpf_token_put_deferred(struct work_struct *work)
> @@ -152,6 +156,12 @@ int bpf_token_create(union bpf_attr *attr)
>                 goto out_path;
>         }
> 
> +       /* Creating BPF token in init_user_ns doesn't make much sense. */
> +       if (current_user_ns() == &init_user_ns) {
> +               err = -EOPNOTSUPP;
> +               goto out_path;
> +       }
> +
>         mnt_opts = path.dentry->d_sb->s_fs_info;
>         if (mnt_opts->delegate_cmds == 0 &&
>             mnt_opts->delegate_maps == 0 &&
> @@ -179,7 +189,7 @@ int bpf_token_create(union bpf_attr *attr)
>                 goto out_path;
>         }
> 
> -       token = kvzalloc(sizeof(*token), GFP_USER);
> +       token = kzalloc(sizeof(*token), GFP_USER);
>         if (!token) {
>                 err = -ENOMEM;
>                 goto out_file;

Thank you! Looks good,

Acked-by: Christian Brauner <brauner@kernel.org>


end of thread (newest message: 2024-01-16 16:37 UTC)

Thread overview: 59+ messages
2024-01-03 22:20 [PATCH bpf-next 00/29] BPF token Andrii Nakryiko
2024-01-03 22:20 ` [PATCH bpf-next 01/29] bpf: align CAP_NET_ADMIN checks with bpf_capable() approach Andrii Nakryiko
2024-01-03 22:20 ` [PATCH bpf-next 02/29] bpf: add BPF token delegation mount options to BPF FS Andrii Nakryiko
2024-01-03 22:20 ` [PATCH bpf-next 03/29] bpf: introduce BPF token object Andrii Nakryiko
2024-01-05 20:25   ` Linus Torvalds
2024-01-05 20:32     ` Matthew Wilcox
2024-01-05 20:45       ` Linus Torvalds
2024-01-05 22:06         ` Andrii Nakryiko
2024-01-05 22:05     ` Andrii Nakryiko
2024-01-05 22:27       ` Alexei Starovoitov
2024-01-05 21:45   ` Linus Torvalds
2024-01-05 22:18     ` Andrii Nakryiko
2024-01-08 12:02       ` Christian Brauner
2024-01-08 23:58         ` Andrii Nakryiko
2024-01-09 14:52           ` Christian Brauner
2024-01-09 19:00             ` Andrii Nakryiko
2024-01-10 14:59               ` Christian Brauner
2024-01-11  0:42                 ` Andrii Nakryiko
2024-01-11 10:38                   ` Christian Brauner
2024-01-11 17:41                     ` Andrii Nakryiko
2024-01-12  7:58                       ` Christian Brauner
2024-01-12 18:32                         ` Andrii Nakryiko
2024-01-12 19:16                           ` Christian Brauner
2024-01-14  2:29                             ` Andrii Nakryiko
2024-01-16 16:37                               ` Christian Brauner
2024-01-08 12:01     ` Christian Brauner
2024-01-08 16:45     ` Paul Moore
2024-01-09  0:07       ` Andrii Nakryiko
2024-01-10 19:29         ` Paul Moore
2024-01-08 11:44   ` Christian Brauner
2024-01-03 22:20 ` [PATCH bpf-next 04/29] bpf: add BPF token support to BPF_MAP_CREATE command Andrii Nakryiko
2024-01-03 22:20 ` [PATCH bpf-next 05/29] bpf: add BPF token support to BPF_BTF_LOAD command Andrii Nakryiko
2024-01-03 22:20 ` [PATCH bpf-next 06/29] bpf: add BPF token support to BPF_PROG_LOAD command Andrii Nakryiko
2024-01-03 22:20 ` [PATCH bpf-next 07/29] bpf: take into account BPF token when fetching helper protos Andrii Nakryiko
2024-01-03 22:20 ` [PATCH bpf-next 08/29] bpf: consistently use BPF token throughout BPF verifier logic Andrii Nakryiko
2024-01-03 22:20 ` [PATCH bpf-next 09/29] bpf,lsm: refactor bpf_prog_alloc/bpf_prog_free LSM hooks Andrii Nakryiko
2024-01-03 22:20 ` [PATCH bpf-next 10/29] bpf,lsm: refactor bpf_map_alloc/bpf_map_free " Andrii Nakryiko
2024-01-03 22:20 ` [PATCH bpf-next 11/29] bpf,lsm: add BPF token " Andrii Nakryiko
2024-01-03 22:20 ` [PATCH bpf-next 12/29] libbpf: add bpf_token_create() API Andrii Nakryiko
2024-01-03 22:20 ` [PATCH bpf-next 13/29] libbpf: add BPF token support to bpf_map_create() API Andrii Nakryiko
2024-01-04 19:04   ` Linus Torvalds
2024-01-04 19:23     ` Andrii Nakryiko
2024-01-03 22:20 ` [PATCH bpf-next 14/29] libbpf: add BPF token support to bpf_btf_load() API Andrii Nakryiko
2024-01-03 22:20 ` [PATCH bpf-next 15/29] libbpf: add BPF token support to bpf_prog_load() API Andrii Nakryiko
2024-01-03 22:20 ` [PATCH bpf-next 16/29] selftests/bpf: add BPF token-enabled tests Andrii Nakryiko
2024-01-03 22:20 ` [PATCH bpf-next 17/29] bpf,selinux: allocate bpf_security_struct per BPF token Andrii Nakryiko
2024-01-03 22:20 ` [PATCH bpf-next 18/29] bpf: fail BPF_TOKEN_CREATE if no delegation option was set on BPF FS Andrii Nakryiko
2024-01-03 22:20 ` [PATCH bpf-next 19/29] bpf: support symbolic BPF FS delegation mount options Andrii Nakryiko
2024-01-03 22:20 ` [PATCH bpf-next 20/29] selftests/bpf: utilize string values for delegate_xxx " Andrii Nakryiko
2024-01-03 22:20 ` [PATCH bpf-next 21/29] libbpf: split feature detectors definitions from cached results Andrii Nakryiko
2024-01-03 22:20 ` [PATCH bpf-next 22/29] libbpf: further decouple feature checking logic from bpf_object Andrii Nakryiko
2024-01-03 22:20 ` [PATCH bpf-next 23/29] libbpf: move feature detection code into its own file Andrii Nakryiko
2024-01-03 22:20 ` [PATCH bpf-next 24/29] libbpf: wire up token_fd into feature probing logic Andrii Nakryiko
2024-01-03 22:20 ` [PATCH bpf-next 25/29] libbpf: wire up BPF token support at BPF object level Andrii Nakryiko
2024-01-03 22:20 ` [PATCH bpf-next 26/29] selftests/bpf: add BPF object loading tests with explicit token passing Andrii Nakryiko
2024-01-03 22:20 ` [PATCH bpf-next 27/29] selftests/bpf: add tests for BPF object load with implicit token Andrii Nakryiko
2024-01-03 22:20 ` [PATCH bpf-next 28/29] libbpf: support BPF token path setting through LIBBPF_BPF_TOKEN_PATH envvar Andrii Nakryiko
2024-01-03 22:20 ` [PATCH bpf-next 29/29] selftests/bpf: add tests for " Andrii Nakryiko
2024-01-03 23:49 ` [PATCH bpf-next 00/29] BPF token Jakub Kicinski
