[PATCH bpf-next v2 00/14] Exceptions

* [PATCH bpf-next v2 00/14] Exceptions - 1/2
@ 2023-08-09 11:41 Kumar Kartikeya Dwivedi
  2023-08-09 11:41 ` [PATCH bpf-next v2 01/14] arch/x86: Implement arch_bpf_stack_walk Kumar Kartikeya Dwivedi
                   ` (14 more replies)
  0 siblings, 15 replies; 37+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2023-08-09 11:41 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Yonghong Song, David Vernet

This series implements the _first_ part of the runtime and verifier
support needed to enable BPF exceptions. Exceptions thrown from programs
are processed as an immediate exit from the program, which unwinds all
the active stack frames until the main stack frame, and returns to the
BPF program's caller. The ability to perform this unwinding safely
allows the program to test conditions that are always true at runtime
but which the verifier has no visibility into.

Thus, it also reduces verification effort by safely terminating
redundant paths that can be taken within a program.

The patches to perform runtime resource cleanup during the
frame-by-frame unwinding will be posted as a follow-up to this set.

It must be noted that exceptions are not an error handling mechanism for
unlikely runtime conditions, but a way to safely terminate the execution
of a program in presence of conditions that should never occur at
runtime. They are meant to serve higher-level primitives such as program
assertions.
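
The control flow described above can be modelled in plain userspace C.
The sketch below is only an analogy, not the kernel implementation:
setjmp/longjmp stand in for the stack unwinding, and every model_* name
is hypothetical. It shows a throw from a nested frame leaving all
frames at once, with the cookie becoming the program's return value.

```c
#include <setjmp.h>
#include <stdint.h>

/* Userspace model only: a jmp_buf stands in for the kernel unwinder. */
jmp_buf model_entry;			/* marks the main frame of the "program" */
volatile uint64_t model_cookie;		/* cookie handed over by the throw */

/* Stand-in for bpf_throw(): never returns to its caller. */
void model_throw(uint64_t cookie)
{
	model_cookie = cookie;
	longjmp(model_entry, 1);
}

/* A nested "subprog" asserting a condition that should always hold. */
int model_subprog(int len)
{
	if (len < 0)
		model_throw(42);	/* should never happen at runtime */
	return len * 2;
}

/* Stand-in for the kernel invoking the program. */
uint64_t model_run(int len)
{
	if (setjmp(model_entry))
		return model_cookie;	/* default behaviour: return the cookie */
	return (uint64_t)model_subprog(len) + 1;
}
```

With this model, model_run(3) takes the normal path, while
model_run(-1) skips the rest of model_subprog and model_run and returns
the cookie directly.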

The following kfuncs and macros are introduced:

Assertion macros are also introduced; please see patch 13 for their
documentation.

/* Description
 *	Throw a BPF exception from the program, immediately terminating its
 *	execution and unwinding the stack. The supplied 'cookie' parameter
 *	will be the return value of the program when an exception is thrown,
 *	and the default exception callback is used. Otherwise, if an exception
 *	callback is set using the '__exception_cb(callback)' declaration tag
 *	on the main program, the 'cookie' parameter will be the callback's only
 *	input argument.
 *
 *	Thus, in case of default exception callback, 'cookie' is subjected to
 *	constraints on the program's return value (as with R0 on exit).
 *	Otherwise, the return value of the marked exception callback will be
 *	subjected to the same checks.
 *
 *	Note that throwing an exception with lingering resources (locks,
 *	references, etc.) will lead to a verification error.
 *
 *	Note that callbacks *cannot* call this helper.
 * Returns
 *	Never.
 * Throws
 *	An exception with the specified 'cookie' value.
 */
extern void bpf_throw(u64 cookie) __ksym;

/* This macro must be used to mark the exception callback corresponding to the
 * main program. For example:
 *
 * int exception_cb(u64 cookie) {
 *	return cookie;
 * }
 *
 * SEC("tc")
 * __exception_cb(exception_cb)
 * int main_prog(struct __sk_buff *ctx) {
 *	...
 *	return TC_ACT_OK;
 * }
 *
 * Here, exception callback for the main program will be 'exception_cb'. Note
 * that this attribute can only be used once, and multiple exception callbacks
 * specified for the main program will lead to verification error.
 */
#define __exception_cb(name) __attribute__((btf_decl_tag("exception_callback:" #name)))

As such, an exception handler can be installed only once for the
lifetime of a BPF program, and it cannot be changed at runtime. The
purpose of the handler is simply to interpret the cookie
value supplied by the bpf_throw call, and execute user-defined logic
corresponding to it. The primary purpose of allowing a handler is to
control the return value of the program. The default handler returns the
cookie value passed to bpf_throw when an exception is thrown.
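
As a rough illustration of a handler that interprets the cookie, the
sketch below maps cookie values to return codes. It is plain C rather
than a loadable BPF program, and the CB_* constants and the
cb_exception_cb name are assumptions made for this example (the verdict
values merely mirror TC_ACT_OK and TC_ACT_SHOT).

```c
#include <stdint.h>

/* Hypothetical verdicts, chosen to mirror TC_ACT_OK / TC_ACT_SHOT. */
enum { CB_ACT_OK = 0, CB_ACT_SHOT = 2 };

/* Hypothetical cookie values that a program might pass to bpf_throw. */
enum { CB_COOKIE_BAD_HEADER = 1, CB_COOKIE_BAD_STATE = 2 };

/* Translate the cookie of a thrown exception into the program's
 * return value. Unknown cookies fall back to letting the packet pass.
 */
int cb_exception_cb(uint64_t cookie)
{
	switch (cookie) {
	case CB_COOKIE_BAD_HEADER:
	case CB_COOKIE_BAD_STATE:
		return CB_ACT_SHOT;
	default:
		return CB_ACT_OK;
	}
}
```

In a real program, a function of this shape would be the one named in
the __exception_cb() tag on the main program.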

Fixing the handler for the lifetime of the program eliminates tricky and
expensive handling in case of runtime changes of the handler callback
when programs begin to nest, where it becomes more complex to save and
restore the active handler at runtime.

This version of offline unwinding based BPF exceptions is truly zero
overhead, with the exception of generation of a default callback which
contains a few instructions to return the cookie passed to bpf_throw
when no exception callback is supplied by the user.

Callbacks are disallowed from throwing BPF exceptions for now, since
such exceptions need to cross the callback helper boundary (and
therefore must care about unwinding kernel state). However, it is
possible to lift this restriction in a future follow-up.

Exceptions stop propagating at program boundaries, hence both
BPF_PROG_TYPE_EXT and tail call targets return to their caller context
the return value of the exception callback, in the event that they throw
an exception. Thus, exceptions do not cross extension or tail call
boundaries.

However, this is mostly an implementation choice, and can be changed to
suit more user-friendly semantics.
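
The boundary behaviour can also be sketched in userspace. This is an
analogy under stated assumptions, not kernel code: each bnd_invoke()
call models one program invocation with its own unwinding target, and
all bnd_* names are hypothetical. A throw in the inner program ends
only that program; its caller receives the value and keeps running.

```c
#include <setjmp.h>
#include <stdint.h>

/* One modelled program invocation. The cookie is volatile because it
 * is written between setjmp() and longjmp().
 */
struct bnd_prog {
	jmp_buf entry;
	volatile uint64_t cookie;
};

struct bnd_prog *bnd_current;	/* innermost running program */

/* Stand-in for bpf_throw(): unwinds to the innermost boundary only. */
void bnd_throw(uint64_t cookie)
{
	bnd_current->cookie = cookie;
	longjmp(bnd_current->entry, 1);
}

/* Run 'body' as a separate program; exceptions stop at this boundary. */
uint64_t bnd_invoke(uint64_t (*body)(void))
{
	struct bnd_prog prog;
	struct bnd_prog *outer = bnd_current;
	uint64_t ret;

	bnd_current = &prog;
	if (setjmp(prog.entry))
		ret = prog.cookie;	/* default callback: return the cookie */
	else
		ret = body();
	bnd_current = outer;
	return ret;
}

/* Models an extension program that throws. */
uint64_t bnd_ext_body(void)
{
	bnd_throw(7);
	return 0;	/* not reached */
}

/* Models the calling program, which continues after the inner throw. */
uint64_t bnd_main_body(void)
{
	return 100 + bnd_invoke(bnd_ext_body);
}
```

Here bnd_invoke(bnd_main_body) yields 107: the inner throw produced 7
for the caller, and the outer program completed normally.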

Known issues
------------

 * Just asm volatile ("call bpf_throw" :::) does not emit DATASEC .ksyms
   for bpf_throw, which leads to errors during compilation. There needs
   to be an explicit call in C for clang to emit the DATASEC info in BTF.

Changelog:
----------
v1 -> v2
v1: https://lore.kernel.org/bpf/20230713023232.1411523-1-memxor@gmail.com

 * Address all comments from Alexei.
 * Fix a few bugs and corner cases in the implementations found during
   testing. Also add new selftests for these cases.
 * Reinstate patch to consider ksym.end part of the program (but
   reworked to cover other corner cases).
 * Implement new style of tagging exception callbacks, add libbpf
   support for the new declaration tag.
 * Limit support to 64-bit integer types for assertion macros. The
   compiler ends up performing shifts or bitwise and operations when
   finally making use of the value, which defeats the purpose of the
   macro. On noalu32 mode, the shifts may also happen before use,
   hurting reliability.
 * Comprehensively test assertion macros and their side effects on the
   verifier state, register bounds, etc.
 * Fix a KASAN false positive warning.

RFC v1 -> v1
RFC v1: https://lore.kernel.org/bpf/20230405004239.1375399-1-memxor@gmail.com

 * Completely rework the unwinding infrastructure to use offline
   unwinding support.
 * Remove the runtime exception state and program rewriting code.
 * Make bpf_set_exception_callback idempotent to avoid vexing
   synchronization and state clobbering issues in presence of program
   nesting.
 * Disable bpf_throw within callback functions, for now.
 * Allow bpf_throw in tail call programs and extension programs,
   removing limitations of rewrite based unwinding.
 * Expand selftests.

Kumar Kartikeya Dwivedi (14):
  arch/x86: Implement arch_bpf_stack_walk
  bpf: Implement support for adding hidden subprogs
  bpf: Implement BPF exceptions
  bpf: Refactor check_btf_func and split into two phases
  bpf: Add support for custom exception callbacks
  bpf: Perform CFG walk for exception callback
  bpf: Treat first argument as return value for bpf_throw
  bpf: Prevent KASAN false positive with bpf_throw
  bpf: Detect IP == ksym.end as part of BPF program
  bpf: Disallow extensions to exception callbacks
  bpf: Fix kfunc callback register type handling
  libbpf: Add support for custom exception callbacks
  selftests/bpf: Add BPF assertion macros
  selftests/bpf: Add tests for BPF exceptions

 arch/x86/net/bpf_jit_comp.c                   | 118 ++++-
 include/linux/bpf.h                           |   8 +-
 include/linux/bpf_verifier.h                  |   8 +-
 include/linux/filter.h                        |   8 +
 include/linux/kasan.h                         |   2 +
 kernel/bpf/btf.c                              |  29 +-
 kernel/bpf/core.c                             |  29 +-
 kernel/bpf/helpers.c                          |  45 ++
 kernel/bpf/syscall.c                          |   2 +-
 kernel/bpf/verifier.c                         | 455 +++++++++++++++---
 tools/lib/bpf/libbpf.c                        | 166 ++++++-
 tools/testing/selftests/bpf/DENYLIST.aarch64  |   1 +
 tools/testing/selftests/bpf/DENYLIST.s390x    |   1 +
 .../testing/selftests/bpf/bpf_experimental.h  | 287 +++++++++++
 .../selftests/bpf/prog_tests/exceptions.c     | 324 +++++++++++++
 .../testing/selftests/bpf/progs/exceptions.c  | 368 ++++++++++++++
 .../selftests/bpf/progs/exceptions_assert.c   | 135 ++++++
 .../selftests/bpf/progs/exceptions_ext.c      |  59 +++
 .../selftests/bpf/progs/exceptions_fail.c     | 347 +++++++++++++
 19 files changed, 2271 insertions(+), 121 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/exceptions.c
 create mode 100644 tools/testing/selftests/bpf/progs/exceptions.c
 create mode 100644 tools/testing/selftests/bpf/progs/exceptions_assert.c
 create mode 100644 tools/testing/selftests/bpf/progs/exceptions_ext.c
 create mode 100644 tools/testing/selftests/bpf/progs/exceptions_fail.c

base-commit: 2adbb7637fd1fcec93f4680ddb5ddbbd1a91aefb
-- 
2.41.0

* [PATCH bpf-next v2 01/14] arch/x86: Implement arch_bpf_stack_walk
  2023-08-09 11:41 [PATCH bpf-next v2 00/14] Exceptions - 1/2 Kumar Kartikeya Dwivedi
@ 2023-08-09 11:41 ` Kumar Kartikeya Dwivedi
  2023-08-09 11:41 ` [PATCH bpf-next v2 02/14] bpf: Implement support for adding hidden subprogs Kumar Kartikeya Dwivedi
                   ` (13 subsequent siblings)
  14 siblings, 0 replies; 37+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2023-08-09 11:41 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Yonghong Song, David Vernet

The plumbing for offline unwinding when we throw an exception in
programs would require walking the stack, hence introduce a new
arch_bpf_stack_walk function. This is provided when the JIT supports
exceptions, i.e. bpf_jit_supports_exceptions is true. The arch-specific
code is really minimal, hence it should be straightforward to extend
this support to other architectures as well, as it reuses the logic of
arch_stack_walk, but allowing access to unwind_state data.

Once the stack pointer and frame pointer are known for the main subprog
during the unwinding, we know the stack layout and location of any
callee-saved registers which must be restored before we return to
the kernel. This handling will be added in the subsequent patches.

Note that while we primarily unwind through BPF frames, which are
effectively CONFIG_UNWINDER_FRAME_POINTER, we still need either this or
CONFIG_UNWINDER_ORC to be able to unwind through the bpf_throw frame
from which we begin walking the stack. We also require both sp and bp
(stack and frame pointers) from the unwind_state structure, which are
only available when one of these two options is enabled.
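
To illustrate the consume_fn contract, here is a userspace sketch that
walks a recorded list of frames instead of a live stack. Everything
named walk_* is hypothetical, including the consumer, which assumes the
main prog can be recognised by an address range; the consumer actually
used by bpf_throw is added in a later patch and may differ.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct walk_frame {
	uint64_t ip, sp, bp;
};

/* Hypothetical consumer state: text range of the main prog and results. */
struct walk_ctx {
	uint64_t main_start, main_end;
	uint64_t sp, bp;
	int visited;
	bool found;
};

/* Model of the walk loop: stop on a zero address or when the consumer
 * returns false, just like the loop in arch_bpf_stack_walk.
 */
void walk_frames(const struct walk_frame *frames, size_t n,
		 bool (*consume_fn)(void *cookie, uint64_t ip, uint64_t sp, uint64_t bp),
		 void *cookie)
{
	for (size_t i = 0; i < n; i++) {
		if (!frames[i].ip ||
		    !consume_fn(cookie, frames[i].ip, frames[i].sp, frames[i].bp))
			break;
	}
}

/* Keep walking until the frame of the main prog is found, then record
 * its stack and frame pointers and stop.
 */
bool walk_find_main(void *cookie, uint64_t ip, uint64_t sp, uint64_t bp)
{
	struct walk_ctx *ctx = cookie;

	ctx->visited++;
	if (ip < ctx->main_start || ip >= ctx->main_end)
		return true;
	ctx->sp = sp;
	ctx->bp = bp;
	ctx->found = true;
	return false;
}

/* Made-up frames, innermost first. */
const struct walk_frame walk_demo_frames[] = {
	{ 0x1000, 0x7f00, 0x7f10 },	/* kernel frame (bpf_throw) */
	{ 0x5010, 0x7f20, 0x7f30 },	/* BPF subprog */
	{ 0x5100, 0x7f40, 0x7f60 },	/* BPF main prog */
	{ 0x2000, 0x7f70, 0x7f90 },	/* kernel caller, never visited */
};

struct walk_ctx walk_demo_ctx = { .main_start = 0x5100, .main_end = 0x5200 };

/* Run the demo walk and report how many frames were visited. */
int walk_demo(void)
{
	walk_frames(walk_demo_frames, 4, walk_find_main, &walk_demo_ctx);
	return walk_demo_ctx.visited;
}
```

The walk stops at the third frame, so the caller's frame is never
handed to the consumer.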

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 arch/x86/net/bpf_jit_comp.c | 28 ++++++++++++++++++++++++++++
 include/linux/filter.h      |  2 ++
 kernel/bpf/core.c           |  9 +++++++++
 3 files changed, 39 insertions(+)

diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index a5930042139d..a0a0054014e0 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -16,6 +16,7 @@
 #include <asm/set_memory.h>
 #include <asm/nospec-branch.h>
 #include <asm/text-patching.h>
+#include <asm/unwind.h>
 
 static u8 *emit_code(u8 *ptr, u32 bytes, unsigned int len)
 {
@@ -2913,3 +2914,30 @@ void bpf_jit_free(struct bpf_prog *prog)
 
 	bpf_prog_unlock_free(prog);
 }
+
+bool bpf_jit_supports_exceptions(void)
+{
+	/* We unwind through both kernel frames (starting from within bpf_throw
+	 * call) and BPF frames. Therefore we require one of ORC or FP unwinder
+	 * to be enabled to walk kernel frames and reach BPF frames in the stack
+	 * trace.
+	 */
+	return IS_ENABLED(CONFIG_UNWINDER_ORC) || IS_ENABLED(CONFIG_UNWINDER_FRAME_POINTER);
+}
+
+void arch_bpf_stack_walk(bool (*consume_fn)(void *cookie, u64 ip, u64 sp, u64 bp), void *cookie)
+{
+#if defined(CONFIG_UNWINDER_ORC) || defined(CONFIG_UNWINDER_FRAME_POINTER)
+	struct unwind_state state;
+	unsigned long addr;
+
+	for (unwind_start(&state, current, NULL, NULL); !unwind_done(&state);
+	     unwind_next_frame(&state)) {
+		addr = unwind_get_return_address(&state);
+		if (!addr || !consume_fn(cookie, (u64)addr, (u64)state.sp, (u64)state.bp))
+			break;
+	}
+	return;
+#endif
+	WARN(1, "verification of programs using bpf_throw should have failed\n");
+}
diff --git a/include/linux/filter.h b/include/linux/filter.h
index 761af6b3cf2b..9fd8f0dc4077 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -912,6 +912,8 @@ bool bpf_jit_needs_zext(void);
 bool bpf_jit_supports_subprog_tailcalls(void);
 bool bpf_jit_supports_kfunc_call(void);
 bool bpf_jit_supports_far_kfunc_call(void);
+bool bpf_jit_supports_exceptions(void);
+void arch_bpf_stack_walk(bool (*consume_fn)(void *cookie, u64 ip, u64 sp, u64 bp), void *cookie);
 bool bpf_helper_changes_pkt_data(void *func);
 
 static inline bool bpf_dump_raw_ok(const struct cred *cred)
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 0f8f036d8bd1..526059386e9d 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -2914,6 +2914,15 @@ int __weak bpf_arch_text_invalidate(void *dst, size_t len)
 	return -ENOTSUPP;
 }
 
+bool __weak bpf_jit_supports_exceptions(void)
+{
+	return false;
+}
+
+void __weak arch_bpf_stack_walk(bool (*consume_fn)(void *cookie, u64 ip, u64 sp, u64 bp), void *cookie)
+{
+}
+
 #ifdef CONFIG_BPF_SYSCALL
 static int __init bpf_global_ma_init(void)
 {
-- 
2.41.0


* [PATCH bpf-next v2 02/14] bpf: Implement support for adding hidden subprogs
  2023-08-09 11:41 [PATCH bpf-next v2 00/14] Exceptions - 1/2 Kumar Kartikeya Dwivedi
  2023-08-09 11:41 ` [PATCH bpf-next v2 01/14] arch/x86: Implement arch_bpf_stack_walk Kumar Kartikeya Dwivedi
@ 2023-08-09 11:41 ` Kumar Kartikeya Dwivedi
  2023-08-09 11:41 ` [PATCH bpf-next v2 03/14] bpf: Implement BPF exceptions Kumar Kartikeya Dwivedi
                   ` (12 subsequent siblings)
  14 siblings, 0 replies; 37+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2023-08-09 11:41 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Yonghong Song, David Vernet

Introduce support in the verifier for generating a subprogram and
include it as part of a BPF program dynamically after the do_check phase
is complete. The first user will be the next patch which generates
default exception callbacks if none are set for the program. The phase
of invocation will be do_misc_fixups. Note that this is an internal
verifier function, and should be used with instruction blocks which
uphold the invariants stated in check_subprogs.

Since these subprogs are always appended to the end of the instruction
sequence of the program, it becomes relatively inexpensive to do the
related adjustments to the subprog_info of the program. Only the fake
exit subprogram is shifted forward, making room for our new subprog.

This is useful to insert a new subprogram, get it JITed, and obtain its
function pointer. The next patch will use this functionality to insert a
default exception callback which will be invoked after unwinding the
stack.

Note that these added subprograms are invisible to userspace, and never
reported in BPF_OBJ_GET_INFO_BY_ID etc. For now, only a single
subprogram is supported, but more can be easily supported in the future.

To this end, two function counts are introduced now, the existing
func_cnt, and real_func_cnt, the latter including hidden subprogs. This
allows us to convert the JIT code to use the real_func_cnt for management
of resources while the syscall path continues working with the existing
func_cnt.
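
The bookkeeping can be sketched with a small userspace model. The hs_*
names and the tiny HS_MAX_SUBPROGS limit are hypothetical, and the
model assumes that the fake exit entry always equals the program length
after patching; it only mirrors the index arithmetic, not the verifier.

```c
#define HS_MAX_SUBPROGS 4	/* hypothetical small limit for the model */

struct hs_env {
	/* start[subprog_cnt] is the fake exit entry */
	int start[HS_MAX_SUBPROGS + 2];
	int subprog_cnt;
	int hidden_subprog_cnt;
	int prog_len;
};

/* Append a hidden subprog made of 'len' patch instructions, where the
 * first patch instruction replaces the current last instruction.
 */
int hs_add_hidden_subprog(struct hs_env *env, int len)
{
	int cnt = env->subprog_cnt;

	if (env->hidden_subprog_cnt)
		return -1;	/* only one hidden subprog is supported */

	/* The program grows by len - 1 instructions. Assumption: the
	 * fake exit entry follows the new program length.
	 */
	env->prog_len += len - 1;
	env->start[cnt] = env->prog_len;

	/* Shift the fake exit forward and make room for the new subprog. */
	env->start[cnt + 1] = env->start[cnt];
	env->start[cnt] = env->prog_len - len + 1;
	env->subprog_cnt++;
	env->hidden_subprog_cnt++;
	return 0;
}

/* Count visible to the syscall path. */
int hs_func_cnt(const struct hs_env *env)
{
	return env->subprog_cnt - env->hidden_subprog_cnt;
}

/* Count used for JIT and freeing, including the hidden subprog. */
int hs_real_func_cnt(const struct hs_env *env)
{
	return env->subprog_cnt;
}

/* Two subprogs starting at 0 and 10 in a 20-instruction program. */
struct hs_env hs_demo_env = {
	.start = { 0, 10, 20 },
	.subprog_cnt = 2,
	.prog_len = 20,
};
```

After adding a 3-instruction patch to hs_demo_env, the hidden subprog
starts at 20, the fake exit moves to 22, and the two counts differ by
one.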

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 include/linux/bpf.h          |  1 +
 include/linux/bpf_verifier.h |  3 ++-
 kernel/bpf/core.c            | 12 ++++++------
 kernel/bpf/syscall.c         |  2 +-
 kernel/bpf/verifier.c        | 36 +++++++++++++++++++++++++++++++++---
 5 files changed, 43 insertions(+), 11 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index db3fe5a61b05..751f565037f9 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1377,6 +1377,7 @@ struct bpf_prog_aux {
 	u32 stack_depth;
 	u32 id;
 	u32 func_cnt; /* used by non-func prog as the number of func progs */
+	u32 real_func_cnt; /* includes hidden progs, only used for JIT and freeing progs */
 	u32 func_idx; /* 0 for non-func prog, the index in func array for func prog */
 	u32 attach_btf_id; /* in-kernel BTF type id to attach to */
 	u32 ctx_arg_info_size;
diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index f70f9ac884d2..beb0e9e01bd5 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -587,6 +587,7 @@ struct bpf_verifier_env {
 	u32 used_map_cnt;		/* number of used maps */
 	u32 used_btf_cnt;		/* number of used BTF objects */
 	u32 id_gen;			/* used to generate unique reg IDs */
+	u32 hidden_subprog_cnt;		/* number of hidden subprogs */
 	bool explore_alu_limits;
 	bool allow_ptr_leaks;
 	bool allow_uninit_stack;
@@ -597,7 +598,7 @@ struct bpf_verifier_env {
 	struct bpf_insn_aux_data *insn_aux_data; /* array of per-insn state */
 	const struct bpf_line_info *prev_linfo;
 	struct bpf_verifier_log log;
-	struct bpf_subprog_info subprog_info[BPF_MAX_SUBPROGS + 1];
+	struct bpf_subprog_info subprog_info[BPF_MAX_SUBPROGS + 2]; /* max + 2 for the fake and exception subprogs */
 	union {
 		struct bpf_idmap idmap_scratch;
 		struct bpf_idset idset_scratch;
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 526059386e9d..2e5907d15118 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -212,7 +212,7 @@ void bpf_prog_fill_jited_linfo(struct bpf_prog *prog,
 	const struct bpf_line_info *linfo;
 	void **jited_linfo;
 
-	if (!prog->aux->jited_linfo)
+	if (!prog->aux->jited_linfo || prog->aux->func_idx > prog->aux->func_cnt)
 		/* Userspace did not provide linfo */
 		return;
 
@@ -539,7 +539,7 @@ static void bpf_prog_kallsyms_del_subprogs(struct bpf_prog *fp)
 {
 	int i;
 
-	for (i = 0; i < fp->aux->func_cnt; i++)
+	for (i = 0; i < fp->aux->real_func_cnt; i++)
 		bpf_prog_kallsyms_del(fp->aux->func[i]);
 }
 
@@ -589,7 +589,7 @@ bpf_prog_ksym_set_name(struct bpf_prog *prog)
 	sym  = bin2hex(sym, prog->tag, sizeof(prog->tag));
 
 	/* prog->aux->name will be ignored if full btf name is available */
-	if (prog->aux->func_info_cnt) {
+	if (prog->aux->func_info_cnt && prog->aux->func_idx < prog->aux->func_info_cnt) {
 		type = btf_type_by_id(prog->aux->btf,
 				      prog->aux->func_info[prog->aux->func_idx].type_id);
 		func_name = btf_name_by_offset(prog->aux->btf, type->name_off);
@@ -1208,7 +1208,7 @@ int bpf_jit_get_func_addr(const struct bpf_prog *prog,
 		if (!extra_pass)
 			addr = NULL;
 		else if (prog->aux->func &&
-			 off >= 0 && off < prog->aux->func_cnt)
+			 off >= 0 && off < prog->aux->real_func_cnt)
 			addr = (u8 *)prog->aux->func[off]->bpf_func;
 		else
 			return -EINVAL;
@@ -2721,7 +2721,7 @@ static void bpf_prog_free_deferred(struct work_struct *work)
 #endif
 	if (aux->dst_trampoline)
 		bpf_trampoline_put(aux->dst_trampoline);
-	for (i = 0; i < aux->func_cnt; i++) {
+	for (i = 0; i < aux->real_func_cnt; i++) {
 		/* We can just unlink the subprog poke descriptor table as
 		 * it was originally linked to the main program and is also
 		 * released along with it.
@@ -2729,7 +2729,7 @@ static void bpf_prog_free_deferred(struct work_struct *work)
 		aux->func[i]->aux->poke_tab = NULL;
 		bpf_jit_free(aux->func[i]);
 	}
-	if (aux->func_cnt) {
+	if (aux->real_func_cnt) {
 		kfree(aux->func);
 		bpf_prog_unlock_free(aux->prog);
 	} else {
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 7f4e8c357a6a..d90f5001da83 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -2746,7 +2746,7 @@ static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr, u32 uattr_size)
 	 * period before we can tear down JIT memory since symbols
 	 * are already exposed under kallsyms.
 	 */
-	__bpf_prog_put_noref(prog, prog->aux->func_cnt);
+	__bpf_prog_put_noref(prog, prog->aux->real_func_cnt);
 	return err;
 free_prog_sec:
 	free_uid(prog->aux->user);
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 4ccca1f6c998..ed90e22d7600 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -15108,7 +15108,8 @@ static void adjust_btf_func(struct bpf_verifier_env *env)
 	if (!aux->func_info)
 		return;
 
-	for (i = 0; i < env->subprog_cnt; i++)
+	/* func_info is not available for hidden subprogs */
+	for (i = 0; i < env->subprog_cnt - env->hidden_subprog_cnt; i++)
 		aux->func_info[i].insn_off = env->subprog_info[i].start;
 }
 
@@ -18053,7 +18054,8 @@ static int jit_subprogs(struct bpf_verifier_env *env)
 		 * the call instruction, as an index for this list
 		 */
 		func[i]->aux->func = func;
-		func[i]->aux->func_cnt = env->subprog_cnt;
+		func[i]->aux->func_cnt = env->subprog_cnt - env->hidden_subprog_cnt;
+		func[i]->aux->real_func_cnt = env->subprog_cnt;
 	}
 	for (i = 0; i < env->subprog_cnt; i++) {
 		old_bpf_func = func[i]->bpf_func;
@@ -18099,7 +18101,8 @@ static int jit_subprogs(struct bpf_verifier_env *env)
 	prog->aux->extable = func[0]->aux->extable;
 	prog->aux->num_exentries = func[0]->aux->num_exentries;
 	prog->aux->func = func;
-	prog->aux->func_cnt = env->subprog_cnt;
+	prog->aux->func_cnt = env->subprog_cnt - env->hidden_subprog_cnt;
+	prog->aux->real_func_cnt = env->subprog_cnt;
 	bpf_prog_jit_attempt_done(prog);
 	return 0;
 out_free:
@@ -18307,6 +18310,33 @@ static int fixup_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 	return 0;
 }
 
+/* The function requires that first instruction in 'patch' is insnsi[prog->len - 1] */
+static __maybe_unused int add_hidden_subprog(struct bpf_verifier_env *env, struct bpf_insn *patch, int len)
+{
+	struct bpf_subprog_info *info = env->subprog_info;
+	int cnt = env->subprog_cnt;
+	struct bpf_prog *prog;
+
+	/* We only reserve one slot for hidden subprogs in subprog_info. */
+	if (env->hidden_subprog_cnt) {
+		verbose(env, "verifier internal error: only one hidden subprog supported\n");
+		return -EFAULT;
+	}
+	/* We're not patching any existing instruction, just appending the new
+	 * ones for the hidden subprog. Hence all of the adjustment operations
+	 * in bpf_patch_insn_data are no-ops.
+	 */
+	prog = bpf_patch_insn_data(env, env->prog->len - 1, patch, len);
+	if (!prog)
+		return -ENOMEM;
+	env->prog = prog;
+	info[cnt + 1].start = info[cnt].start;
+	info[cnt].start = prog->len - len + 1;
+	env->subprog_cnt++;
+	env->hidden_subprog_cnt++;
+	return 0;
+}
+
 /* Do various post-verification rewrites in a single program pass.
  * These rewrites simplify JIT and interpreter implementations.
  */
-- 
2.41.0


* [PATCH bpf-next v2 03/14] bpf: Implement BPF exceptions
  2023-08-09 11:41 [PATCH bpf-next v2 00/14] Exceptions - 1/2 Kumar Kartikeya Dwivedi
  2023-08-09 11:41 ` [PATCH bpf-next v2 01/14] arch/x86: Implement arch_bpf_stack_walk Kumar Kartikeya Dwivedi
  2023-08-09 11:41 ` [PATCH bpf-next v2 02/14] bpf: Implement support for adding hidden subprogs Kumar Kartikeya Dwivedi
@ 2023-08-09 11:41 ` Kumar Kartikeya Dwivedi
  2023-08-22  5:12   ` Alexei Starovoitov
  2023-08-09 11:41 ` [PATCH bpf-next v2 04/14] bpf: Refactor check_btf_func and split into two phases Kumar Kartikeya Dwivedi
                   ` (11 subsequent siblings)
  14 siblings, 1 reply; 37+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2023-08-09 11:41 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Yonghong Song, David Vernet

This patch implements BPF exceptions, and introduces a bpf_throw kfunc
to allow programs to throw exceptions during their execution at runtime.
A bpf_throw invocation is treated as an immediate termination of the
program, returning back to its caller within the kernel, unwinding all
stack frames.

This allows the program to simplify its implementation, by testing for
runtime conditions which the verifier has no visibility into, and assert
that they are true. In case they are not, the program can simply throw
an exception from the other branch.

BPF exceptions are explicitly *NOT* an unlikely slowpath error handling
primitive, and this objective has guided design choices of their
implementation within the kernel (with the bulk of the cost for
unwinding the stack offloaded to the bpf_throw kfunc).

The implementation of this mechanism requires use of add_hidden_subprog
mechanism introduced in the previous patch, which generates a couple of
instructions to move R1 to R0 and exit. The JIT then rewrites the
prologue of this subprog to take the stack pointer and frame pointer as
inputs and reset the stack frame, popping all callee-saved registers
saved by the main subprog. The bpf_throw function then walks the stack
at runtime, and invokes this exception subprog with the stack and frame
pointers as parameters.

Reviewers must take note that currently the main program is made to save
all callee-saved registers on x86_64 during entry into the program. This
is because we must do an equivalent of a lightweight context switch when
unwinding the stack, therefore we need the callee-saved registers of the
caller of the BPF program to be able to return with a sane state.

Note that we have to additionally handle r12, even though it is not used
by the program, because when throwing the exception the program makes an
entry into the kernel which could clobber r12 after saving it on the
stack. To be able to preserve the value we received on program entry, we
push r12 and restore it from the generated subprogram when unwinding the
stack.

For now, a bpf_throw invocation fails verification when lingering
resources or locks exist on that path of the program. In a future
follow-up, bpf_throw will be extended to perform frame-by-frame
unwinding to release lingering resources for each stack frame, removing
this limitation.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 arch/x86/net/bpf_jit_comp.c                   |  90 ++++++++++++--
 include/linux/bpf.h                           |   3 +
 include/linux/bpf_verifier.h                  |   4 +
 include/linux/filter.h                        |   6 +
 kernel/bpf/core.c                             |   2 +-
 kernel/bpf/helpers.c                          |  38 ++++++
 kernel/bpf/verifier.c                         | 116 +++++++++++++++---
 .../testing/selftests/bpf/bpf_experimental.h  |  16 +++
 8 files changed, 248 insertions(+), 27 deletions(-)

diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index a0a0054014e0..7f28705da26e 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -18,6 +18,8 @@
 #include <asm/text-patching.h>
 #include <asm/unwind.h>
 
+static bool all_callee_regs_used[4] = {true, true, true, true};
+
 static u8 *emit_code(u8 *ptr, u32 bytes, unsigned int len)
 {
 	if (len == 1)
@@ -256,6 +258,14 @@ struct jit_context {
 /* Number of bytes that will be skipped on tailcall */
 #define X86_TAIL_CALL_OFFSET	(11 + ENDBR_INSN_SIZE)
 
+static void push_r12(u8 **pprog)
+{
+	u8 *prog = *pprog;
+
+	EMIT2(0x41, 0x54);   /* push r12 */
+	*pprog = prog;
+}
+
 static void push_callee_regs(u8 **pprog, bool *callee_regs_used)
 {
 	u8 *prog = *pprog;
@@ -271,6 +281,14 @@ static void push_callee_regs(u8 **pprog, bool *callee_regs_used)
 	*pprog = prog;
 }
 
+static void pop_r12(u8 **pprog)
+{
+	u8 *prog = *pprog;
+
+	EMIT2(0x41, 0x5C);   /* pop r12 */
+	*pprog = prog;
+}
+
 static void pop_callee_regs(u8 **pprog, bool *callee_regs_used)
 {
 	u8 *prog = *pprog;
@@ -292,7 +310,8 @@ static void pop_callee_regs(u8 **pprog, bool *callee_regs_used)
  * while jumping to another program
  */
 static void emit_prologue(u8 **pprog, u32 stack_depth, bool ebpf_from_cbpf,
-			  bool tail_call_reachable, bool is_subprog)
+			  bool tail_call_reachable, bool is_subprog,
+			  bool is_exception_cb)
 {
 	u8 *prog = *pprog;
 
@@ -308,8 +327,22 @@ static void emit_prologue(u8 **pprog, u32 stack_depth, bool ebpf_from_cbpf,
 		else
 			EMIT2(0x66, 0x90); /* nop2 */
 	}
-	EMIT1(0x55);             /* push rbp */
-	EMIT3(0x48, 0x89, 0xE5); /* mov rbp, rsp */
+	/* Exception callback receives FP as second parameter */
+	if (is_exception_cb) {
+		EMIT3(0x48, 0x89, 0xF4); /* mov rsp, rsi */
+		EMIT3(0x48, 0x89, 0xD5); /* mov rbp, rdx */
+		/* The main frame must have exception_boundary as true, so we
+		 * first restore those callee-saved regs from stack, before
+		 * reusing the stack frame.
+		 */
+		pop_callee_regs(&prog, all_callee_regs_used);
+		pop_r12(&prog);
+		/* Reset the stack frame. */
+		EMIT3(0x48, 0x89, 0xEC); /* mov rsp, rbp */
+	} else {
+		EMIT1(0x55);             /* push rbp */
+		EMIT3(0x48, 0x89, 0xE5); /* mov rbp, rsp */
+	}
 
 	/* X86_TAIL_CALL_OFFSET is here */
 	EMIT_ENDBR();
@@ -468,7 +501,8 @@ static void emit_return(u8 **pprog, u8 *ip)
  *   goto *(prog->bpf_func + prologue_size);
  * out:
  */
-static void emit_bpf_tail_call_indirect(u8 **pprog, bool *callee_regs_used,
+static void emit_bpf_tail_call_indirect(struct bpf_prog *bpf_prog,
+					u8 **pprog, bool *callee_regs_used,
 					u32 stack_depth, u8 *ip,
 					struct jit_context *ctx)
 {
@@ -518,7 +552,12 @@ static void emit_bpf_tail_call_indirect(u8 **pprog, bool *callee_regs_used,
 	offset = ctx->tail_call_indirect_label - (prog + 2 - start);
 	EMIT2(X86_JE, offset);                    /* je out */
 
-	pop_callee_regs(&prog, callee_regs_used);
+	if (bpf_prog->aux->exception_boundary) {
+		pop_callee_regs(&prog, all_callee_regs_used);
+		pop_r12(&prog);
+	} else {
+		pop_callee_regs(&prog, callee_regs_used);
+	}
 
 	EMIT1(0x58);                              /* pop rax */
 	if (stack_depth)
@@ -542,7 +581,8 @@ static void emit_bpf_tail_call_indirect(u8 **pprog, bool *callee_regs_used,
 	*pprog = prog;
 }
 
-static void emit_bpf_tail_call_direct(struct bpf_jit_poke_descriptor *poke,
+static void emit_bpf_tail_call_direct(struct bpf_prog *bpf_prog,
+				      struct bpf_jit_poke_descriptor *poke,
 				      u8 **pprog, u8 *ip,
 				      bool *callee_regs_used, u32 stack_depth,
 				      struct jit_context *ctx)
@@ -571,7 +611,13 @@ static void emit_bpf_tail_call_direct(struct bpf_jit_poke_descriptor *poke,
 	emit_jump(&prog, (u8 *)poke->tailcall_target + X86_PATCH_SIZE,
 		  poke->tailcall_bypass);
 
-	pop_callee_regs(&prog, callee_regs_used);
+	if (bpf_prog->aux->exception_boundary) {
+		pop_callee_regs(&prog, all_callee_regs_used);
+		pop_r12(&prog);
+	} else {
+		pop_callee_regs(&prog, callee_regs_used);
+	}
+
 	EMIT1(0x58);                                  /* pop rax */
 	if (stack_depth)
 		EMIT3_off32(0x48, 0x81, 0xC4, round_up(stack_depth, 8));
@@ -1042,8 +1088,21 @@ static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image, u8 *rw_image
 
 	emit_prologue(&prog, bpf_prog->aux->stack_depth,
 		      bpf_prog_was_classic(bpf_prog), tail_call_reachable,
-		      bpf_prog->aux->func_idx != 0);
-	push_callee_regs(&prog, callee_regs_used);
+		      bpf_prog->aux->func_idx != 0, bpf_prog->aux->exception_cb);
+	/* Exception callback will clobber callee regs for its own use, and
+	 * restore the original callee regs from main prog's stack frame.
+	 */
+	if (bpf_prog->aux->exception_boundary) {
+		/* We also need to save r12, which is not mapped to any BPF
+		 * register, as we throw after entry into the kernel, which may
+		 * overwrite r12.
+		 */
+		push_r12(&prog);
+		push_callee_regs(&prog, all_callee_regs_used);
+	} else {
+		push_callee_regs(&prog, callee_regs_used);
+	}
+
 
 	ilen = prog - temp;
 	if (rw_image)
@@ -1642,13 +1701,15 @@ st:			if (is_imm8(insn->off))
 
 		case BPF_JMP | BPF_TAIL_CALL:
 			if (imm32)
-				emit_bpf_tail_call_direct(&bpf_prog->aux->poke_tab[imm32 - 1],
+				emit_bpf_tail_call_direct(bpf_prog,
+							  &bpf_prog->aux->poke_tab[imm32 - 1],
 							  &prog, image + addrs[i - 1],
 							  callee_regs_used,
 							  bpf_prog->aux->stack_depth,
 							  ctx);
 			else
-				emit_bpf_tail_call_indirect(&prog,
+				emit_bpf_tail_call_indirect(bpf_prog,
+							    &prog,
 							    callee_regs_used,
 							    bpf_prog->aux->stack_depth,
 							    image + addrs[i - 1],
@@ -1901,7 +1962,12 @@ st:			if (is_imm8(insn->off))
 			seen_exit = true;
 			/* Update cleanup_addr */
 			ctx->cleanup_addr = proglen;
-			pop_callee_regs(&prog, callee_regs_used);
+			if (bpf_prog->aux->exception_boundary) {
+				pop_callee_regs(&prog, all_callee_regs_used);
+				pop_r12(&prog);
+			} else {
+				pop_callee_regs(&prog, callee_regs_used);
+			}
 			EMIT1(0xC9);         /* leave */
 			emit_return(&prog, image + addrs[i - 1] + (prog - temp));
 			break;
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 751f565037f9..e938e75b0998 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1398,6 +1398,8 @@ struct bpf_prog_aux {
 	bool sleepable;
 	bool tail_call_reachable;
 	bool xdp_has_frags;
+	bool exception_cb;
+	bool exception_boundary;
 	/* BTF_KIND_FUNC_PROTO for valid attach_btf_id */
 	const struct btf_type *attach_func_proto;
 	/* function name for valid attach_btf_id */
@@ -1420,6 +1422,7 @@ struct bpf_prog_aux {
 	int cgroup_atype; /* enum cgroup_bpf_attach_type */
 	struct bpf_map *cgroup_storage[MAX_BPF_CGROUP_STORAGE_TYPE];
 	char name[BPF_OBJ_NAME_LEN];
+	unsigned int (*bpf_exception_cb)(u64 cookie, u64 sp, u64 bp);
 #ifdef CONFIG_SECURITY
 	void *security;
 #endif
diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index beb0e9e01bd5..9e6c25ecac9f 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -540,7 +540,9 @@ struct bpf_subprog_info {
 	bool has_tail_call;
 	bool tail_call_reachable;
 	bool has_ld_abs;
+	bool is_cb;
 	bool is_async_cb;
+	bool is_exception_cb;
 };
 
 struct bpf_verifier_env;
@@ -588,6 +590,7 @@ struct bpf_verifier_env {
 	u32 used_btf_cnt;		/* number of used BTF objects */
 	u32 id_gen;			/* used to generate unique reg IDs */
 	u32 hidden_subprog_cnt;		/* number of hidden subprogs */
+	int exception_callback_subprog;
 	bool explore_alu_limits;
 	bool allow_ptr_leaks;
 	bool allow_uninit_stack;
@@ -595,6 +598,7 @@ struct bpf_verifier_env {
 	bool bypass_spec_v1;
 	bool bypass_spec_v4;
 	bool seen_direct_write;
+	bool seen_exception;
 	struct bpf_insn_aux_data *insn_aux_data; /* array of per-insn state */
 	const struct bpf_line_info *prev_linfo;
 	struct bpf_verifier_log log;
diff --git a/include/linux/filter.h b/include/linux/filter.h
index 9fd8f0dc4077..389e550a6a25 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -1129,6 +1129,7 @@ const char *__bpf_address_lookup(unsigned long addr, unsigned long *size,
 bool is_bpf_text_address(unsigned long addr);
 int bpf_get_kallsym(unsigned int symnum, unsigned long *value, char *type,
 		    char *sym);
+struct bpf_prog *bpf_prog_ksym_find(unsigned long addr);
 
 static inline const char *
 bpf_address_lookup(unsigned long addr, unsigned long *size,
@@ -1196,6 +1197,11 @@ static inline int bpf_get_kallsym(unsigned int symnum, unsigned long *value,
 	return -ERANGE;
 }
 
+static inline struct bpf_prog *bpf_prog_ksym_find(unsigned long addr)
+{
+	return NULL;
+}
+
 static inline const char *
 bpf_address_lookup(unsigned long addr, unsigned long *size,
 		   unsigned long *off, char **modname, char *sym)
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 2e5907d15118..ef362d7b09a5 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -733,7 +733,7 @@ bool is_bpf_text_address(unsigned long addr)
 	return ret;
 }
 
-static struct bpf_prog *bpf_prog_ksym_find(unsigned long addr)
+struct bpf_prog *bpf_prog_ksym_find(unsigned long addr)
 {
 	struct bpf_ksym *ksym = bpf_ksym_find(addr);
 
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index eb91cae0612a..af4add1e3a31 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -2429,6 +2429,43 @@ __bpf_kfunc void bpf_rcu_read_unlock(void)
 	rcu_read_unlock();
 }
 
+struct bpf_throw_ctx {
+	struct bpf_prog_aux *aux;
+	u64 sp;
+	u64 bp;
+	int cnt;
+};
+
+static bool bpf_stack_walker(void *cookie, u64 ip, u64 sp, u64 bp)
+{
+	struct bpf_throw_ctx *ctx = cookie;
+	struct bpf_prog *prog;
+
+	if (!is_bpf_text_address(ip))
+		return !ctx->cnt;
+	prog = bpf_prog_ksym_find(ip);
+	ctx->cnt++;
+	if (!prog->aux->id)
+		return true;
+	ctx->aux = prog->aux;
+	ctx->sp = sp;
+	ctx->bp = bp;
+	return false;
+}
+
+__bpf_kfunc void bpf_throw(u64 cookie)
+{
+	struct bpf_throw_ctx ctx = {};
+
+	arch_bpf_stack_walk(bpf_stack_walker, &ctx);
+	WARN_ON_ONCE(!ctx.aux);
+	if (ctx.aux)
+		WARN_ON_ONCE(!ctx.aux->exception_boundary);
+	WARN_ON_ONCE(!ctx.bp);
+	WARN_ON_ONCE(!ctx.cnt);
+	ctx.aux->bpf_exception_cb(cookie, ctx.sp, ctx.bp);
+}
+
 __diag_pop();
 
 BTF_SET8_START(generic_btf_ids)
@@ -2456,6 +2493,7 @@ BTF_ID_FLAGS(func, bpf_cgroup_from_id, KF_ACQUIRE | KF_RET_NULL)
 BTF_ID_FLAGS(func, bpf_task_under_cgroup, KF_RCU)
 #endif
 BTF_ID_FLAGS(func, bpf_task_from_pid, KF_ACQUIRE | KF_RET_NULL)
+BTF_ID_FLAGS(func, bpf_throw)
 BTF_SET8_END(generic_btf_ids)
 
 static const struct btf_kfunc_id_set generic_kfunc_set = {
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index ed90e22d7600..2ac0be088dd5 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -543,6 +543,7 @@ static bool is_dynptr_ref_function(enum bpf_func_id func_id)
 }
 
 static bool is_callback_calling_kfunc(u32 btf_id);
+static bool is_bpf_throw_kfunc(struct bpf_insn *insn);
 
 static bool is_callback_calling_function(enum bpf_func_id func_id)
 {
@@ -1748,7 +1749,9 @@ static int copy_verifier_state(struct bpf_verifier_state *dst_state,
 		return -ENOMEM;
 	dst_state->jmp_history_cnt = src->jmp_history_cnt;
 
-	/* if dst has more stack frames then src frame, free them */
+	/* if dst has more stack frames than src frame, free them, this is also
+	 * necessary in case of exceptional exits using bpf_throw.
+	 */
 	for (i = src->curframe + 1; i <= dst_state->curframe; i++) {
 		free_func_state(dst_state->frame[i]);
 		dst_state->frame[i] = NULL;
@@ -2868,7 +2871,7 @@ static int check_subprogs(struct bpf_verifier_env *env)
 		if (i == subprog_end - 1) {
 			/* to avoid fall-through from one subprog into another
 			 * the last insn of the subprog should be either exit
-			 * or unconditional jump back
+			 * or unconditional jump back or bpf_throw call
 			 */
 			if (code != (BPF_JMP | BPF_EXIT) &&
 			    code != (BPF_JMP32 | BPF_JA) &&
@@ -5645,6 +5648,27 @@ static int check_max_stack_depth_subprog(struct bpf_verifier_env *env, int idx)
 	for (; i < subprog_end; i++) {
 		int next_insn, sidx;
 
+		if (bpf_pseudo_kfunc_call(insn + i) && !insn[i].off) {
+			bool err = false;
+
+			if (!is_bpf_throw_kfunc(insn + i))
+				continue;
+			if (subprog[idx].is_cb)
+				err = true;
+			for (int c = 0; c < frame && !err; c++) {
+				if (subprog[ret_prog[c]].is_cb) {
+					err = true;
+					break;
+				}
+			}
+			if (!err)
+				continue;
+			verbose(env,
+				"bpf_throw kfunc (insn %d) cannot be called from callback subprog %d\n",
+				i, idx);
+			return -EINVAL;
+		}
+
 		if (!bpf_pseudo_call(insn + i) && !bpf_pseudo_func(insn + i))
 			continue;
 		/* remember insn and function to return to */
@@ -8905,6 +8929,7 @@ static int __check_func_call(struct bpf_verifier_env *env, struct bpf_insn *insn
 	 * callbacks
 	 */
 	if (set_callee_state_cb != set_callee_state) {
+		env->subprog_info[subprog].is_cb = true;
 		if (bpf_pseudo_kfunc_call(insn) &&
 		    !is_callback_calling_kfunc(insn->imm)) {
 			verbose(env, "verifier bug: kfunc %s#%d not marked as callback-calling\n",
@@ -9294,7 +9319,8 @@ static int prepare_func_exit(struct bpf_verifier_env *env, int *insn_idx)
 		verbose(env, "to caller at %d:\n", *insn_idx);
 		print_verifier_state(env, caller, true);
 	}
-	/* clear everything in the callee */
+	/* clear everything in the callee. In case of exceptional exits using
+	 * bpf_throw, this will be done by copy_verifier_state for extra frames. */
 	free_func_state(callee);
 	state->frame[state->curframe--] = NULL;
 	return 0;
@@ -9418,17 +9444,17 @@ record_func_key(struct bpf_verifier_env *env, struct bpf_call_arg_meta *meta,
 	return 0;
 }
 
-static int check_reference_leak(struct bpf_verifier_env *env)
+static int check_reference_leak(struct bpf_verifier_env *env, bool exception_exit)
 {
 	struct bpf_func_state *state = cur_func(env);
 	bool refs_lingering = false;
 	int i;
 
-	if (state->frameno && !state->in_callback_fn)
+	if (!exception_exit && state->frameno && !state->in_callback_fn)
 		return 0;
 
 	for (i = 0; i < state->acquired_refs; i++) {
-		if (state->in_callback_fn && state->refs[i].callback_ref != state->frameno)
+		if (!exception_exit && state->in_callback_fn && state->refs[i].callback_ref != state->frameno)
 			continue;
 		verbose(env, "Unreleased reference id=%d alloc_insn=%d\n",
 			state->refs[i].id, state->refs[i].insn_idx);
@@ -9662,7 +9688,7 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
 
 	switch (func_id) {
 	case BPF_FUNC_tail_call:
-		err = check_reference_leak(env);
+		err = check_reference_leak(env, false);
 		if (err) {
 			verbose(env, "tail_call would lead to reference leak\n");
 			return err;
@@ -10271,6 +10297,7 @@ enum special_kfunc_type {
 	KF_bpf_dynptr_slice,
 	KF_bpf_dynptr_slice_rdwr,
 	KF_bpf_dynptr_clone,
+	KF_bpf_throw,
 };
 
 BTF_SET_START(special_kfunc_set)
@@ -10291,6 +10318,7 @@ BTF_ID(func, bpf_dynptr_from_xdp)
 BTF_ID(func, bpf_dynptr_slice)
 BTF_ID(func, bpf_dynptr_slice_rdwr)
 BTF_ID(func, bpf_dynptr_clone)
+BTF_ID(func, bpf_throw)
 BTF_SET_END(special_kfunc_set)
 
 BTF_ID_LIST(special_kfunc_list)
@@ -10313,6 +10341,7 @@ BTF_ID(func, bpf_dynptr_from_xdp)
 BTF_ID(func, bpf_dynptr_slice)
 BTF_ID(func, bpf_dynptr_slice_rdwr)
 BTF_ID(func, bpf_dynptr_clone)
+BTF_ID(func, bpf_throw)
 
 static bool is_kfunc_ret_null(struct bpf_kfunc_call_arg_meta *meta)
 {
@@ -10626,6 +10655,12 @@ static bool is_callback_calling_kfunc(u32 btf_id)
 	return btf_id == special_kfunc_list[KF_bpf_rbtree_add_impl];
 }
 
+static bool is_bpf_throw_kfunc(struct bpf_insn *insn)
+{
+	return bpf_pseudo_kfunc_call(insn) && insn->off == 0 &&
+	       insn->imm == special_kfunc_list[KF_bpf_throw];
+}
+
 static bool is_rbtree_lock_required_kfunc(u32 btf_id)
 {
 	return is_bpf_rbtree_api_kfunc(btf_id);
@@ -11400,6 +11435,15 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 		}
 	}
 
+	if (meta.func_id == special_kfunc_list[KF_bpf_throw]) {
+		if (!bpf_jit_supports_exceptions()) {
+			verbose(env, "JIT does not support calling kfunc %s#%d\n",
+				func_name, meta.func_id);
+			return -ENOTSUPP;
+		}
+		env->seen_exception = true;
+	}
+
 	for (i = 0; i < CALLER_SAVED_REGS; i++)
 		mark_reg_not_init(env, regs, caller_saved[i]);
 
@@ -14423,7 +14467,7 @@ static int check_ld_abs(struct bpf_verifier_env *env, struct bpf_insn *insn)
 	 * gen_ld_abs() may terminate the program at runtime, leading to
 	 * reference leak.
 	 */
-	err = check_reference_leak(env);
+	err = check_reference_leak(env, false);
 	if (err) {
 		verbose(env, "BPF_LD_[ABS|IND] cannot be mixed with socket references\n");
 		return err;
@@ -16437,6 +16481,7 @@ static int do_check(struct bpf_verifier_env *env)
 	int prev_insn_idx = -1;
 
 	for (;;) {
+		bool exception_exit = false;
 		struct bpf_insn *insn;
 		u8 class;
 		int err;
@@ -16651,12 +16696,17 @@ static int do_check(struct bpf_verifier_env *env)
 						return -EINVAL;
 					}
 				}
-				if (insn->src_reg == BPF_PSEUDO_CALL)
+				if (insn->src_reg == BPF_PSEUDO_CALL) {
 					err = check_func_call(env, insn, &env->insn_idx);
-				else if (insn->src_reg == BPF_PSEUDO_KFUNC_CALL)
+				} else if (insn->src_reg == BPF_PSEUDO_KFUNC_CALL) {
 					err = check_kfunc_call(env, insn, &env->insn_idx);
-				else
+					if (!err && is_bpf_throw_kfunc(insn)) {
+						exception_exit = true;
+						goto process_bpf_exit_full;
+					}
+				} else {
 					err = check_helper_call(env, insn, &env->insn_idx);
+				}
 				if (err)
 					return err;
 
@@ -16686,7 +16736,7 @@ static int do_check(struct bpf_verifier_env *env)
 					verbose(env, "BPF_EXIT uses reserved fields\n");
 					return -EINVAL;
 				}
-
+process_bpf_exit_full:
 				if (env->cur_state->active_lock.ptr &&
 				    !in_rbtree_lock_required_cb(env)) {
 					verbose(env, "bpf_spin_unlock is missing\n");
@@ -16704,10 +16754,23 @@ static int do_check(struct bpf_verifier_env *env)
 				 * function, for which reference_state must
 				 * match caller reference state when it exits.
 				 */
-				err = check_reference_leak(env);
+				err = check_reference_leak(env, exception_exit);
 				if (err)
 					return err;
 
+				/* The side effect of the prepare_func_exit
+				 * which is being skipped is that it frees
+				 * bpf_func_state. Typically, process_bpf_exit
+				 * will only be hit with outermost exit.
+				 * copy_verifier_state in pop_stack will handle
+				 * freeing of any extra bpf_func_state left over
+				 * from not processing all nested function
+				 * exits. We also skip return code checks as
+				 * they are not needed for exceptional exits.
+				 */
+				if (exception_exit)
+					goto process_bpf_exit;
+
 				if (state->curframe) {
 					/* exit from nested function */
 					err = prepare_func_exit(env, &env->insn_idx);
@@ -18015,6 +18078,9 @@ static int jit_subprogs(struct bpf_verifier_env *env)
 		}
 		func[i]->aux->num_exentries = num_exentries;
 		func[i]->aux->tail_call_reachable = env->subprog_info[i].tail_call_reachable;
+		func[i]->aux->exception_cb = env->subprog_info[i].is_exception_cb;
+		if (!i)
+			func[i]->aux->exception_boundary = env->seen_exception;
 		func[i] = bpf_int_jit_compile(func[i]);
 		if (!func[i]->jited) {
 			err = -ENOTSUPP;
@@ -18103,6 +18169,8 @@ static int jit_subprogs(struct bpf_verifier_env *env)
 	prog->aux->func = func;
 	prog->aux->func_cnt = env->subprog_cnt - env->hidden_subprog_cnt;
 	prog->aux->real_func_cnt = env->subprog_cnt;
+	prog->aux->bpf_exception_cb = (void *)func[env->exception_callback_subprog]->bpf_func;
+	prog->aux->exception_boundary = func[0]->aux->exception_boundary;
 	bpf_prog_jit_attempt_done(prog);
 	return 0;
 out_free:
@@ -18311,7 +18379,7 @@ static int fixup_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 }
 
 /* The function requires that first instruction in 'patch' is insnsi[prog->len - 1] */
-static __maybe_unused int add_hidden_subprog(struct bpf_verifier_env *env, struct bpf_insn *patch, int len)
+static int add_hidden_subprog(struct bpf_verifier_env *env, struct bpf_insn *patch, int len)
 {
 	struct bpf_subprog_info *info = env->subprog_info;
 	int cnt = env->subprog_cnt;
@@ -18355,6 +18423,26 @@ static int do_misc_fixups(struct bpf_verifier_env *env)
 	struct bpf_map *map_ptr;
 	int i, ret, cnt, delta = 0;
 
+	if (env->seen_exception && !env->exception_callback_subprog) {
+		struct bpf_insn patch[] = {
+			env->prog->insnsi[insn_cnt - 1],
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_EXIT_INSN(),
+		};
+
+		ret = add_hidden_subprog(env, patch, ARRAY_SIZE(patch));
+		if (ret < 0)
+			return ret;
+		prog = env->prog;
+		insn = prog->insnsi;
+
+		env->exception_callback_subprog = env->subprog_cnt - 1;
+		/* Don't update insn_cnt, as add_hidden_subprog always appends insns */
+		env->subprog_info[env->exception_callback_subprog].is_cb = true;
+		env->subprog_info[env->exception_callback_subprog].is_async_cb = true;
+		env->subprog_info[env->exception_callback_subprog].is_exception_cb = true;
+	}
+
 	for (i = 0; i < insn_cnt; i++, insn++) {
 		/* Make divide-by-zero exceptions impossible. */
 		if (insn->code == (BPF_ALU64 | BPF_MOD | BPF_X) ||
diff --git a/tools/testing/selftests/bpf/bpf_experimental.h b/tools/testing/selftests/bpf/bpf_experimental.h
index 209811b1993a..952a40cbe09c 100644
--- a/tools/testing/selftests/bpf/bpf_experimental.h
+++ b/tools/testing/selftests/bpf/bpf_experimental.h
@@ -131,4 +131,20 @@ extern int bpf_rbtree_add_impl(struct bpf_rb_root *root, struct bpf_rb_node *nod
  */
 extern struct bpf_rb_node *bpf_rbtree_first(struct bpf_rb_root *root) __ksym;
 
+/* Description
+ *	Throw a BPF exception from the program, immediately terminating its
+ *	execution and unwinding the stack. The supplied 'cookie' parameter
+ *	will be the return value of the program when an exception is thrown.
+ *
+ *	Note that throwing an exception with lingering resources (locks,
+ *	references, etc.) will lead to a verification error.
+ *
+ *	Note that callbacks *cannot* call this helper.
+ * Returns
+ *	Never.
+ * Throws
+ *	An exception with the specified 'cookie' value.
+ */
+extern void bpf_throw(u64 cookie) __ksym;
+
 #endif
-- 
2.41.0



* [PATCH bpf-next v2 04/14] bpf: Refactor check_btf_func and split into two phases
  2023-08-09 11:41 [PATCH bpf-next v2 00/14] Exceptions - 1/2 Kumar Kartikeya Dwivedi
                   ` (2 preceding siblings ...)
  2023-08-09 11:41 ` [PATCH bpf-next v2 03/14] bpf: Implement BPF exceptions Kumar Kartikeya Dwivedi
@ 2023-08-09 11:41 ` Kumar Kartikeya Dwivedi
  2023-08-09 11:41 ` [PATCH bpf-next v2 05/14] bpf: Add support for custom exception callbacks Kumar Kartikeya Dwivedi
                   ` (10 subsequent siblings)
  14 siblings, 0 replies; 37+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2023-08-09 11:41 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Yonghong Song, David Vernet

This patch splits check_btf_info's check_btf_func step into two
separate phases. The first phase sets up the BTF and prepares
func_info, but does not yet validate any of the invariants required
for subprogs. That validation is left to the second phase, which runs
where check_btf_info executes currently and also performs the line_info
and CO-RE relocation checks.

The reason to perform this split is to obtain the userspace supplied
func_info information before we perform the add_subprog call, where we
would now require finding and adding subprogs that may not have a
bpf_pseudo_call or bpf_pseudo_func instruction in the program.

We require this as we want to enable userspace to supply exception
callbacks that can override the default hidden subprogram generated by
the verifier (which performs a hardcoded action). In such a case, the
exception callback may never be referenced in an instruction, but will
still be suitably annotated (by way of BTF declaration tags). For
finding this exception callback, we would require the program's BTF
information, and the supplied func_info information which maps BTF type
IDs to subprograms.

Since the exception callback won't actually be referenced through
instructions, later checks in check_cfg and do_check_subprogs will not
verify the subprog. This means that add_subprog needs to add it in the
add_subprog_and_kfunc phase before we move forward, which is why the BTF
and func_info are required at that point.
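
To illustrate the split, here is a minimal userspace sketch of the two
phases. It is not the kernel code: struct func_rec, check_func_early()
and check_func_late() are hypothetical stand-ins, and the per-record
checks in the early phase (first offset is zero, offsets strictly
increase, non-zero type ID) are assumptions chosen to represent
"checks that need only the records themselves".

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct bpf_func_info: one record per function. */
struct func_rec {
	unsigned int insn_off;	/* first instruction of the function */
	unsigned int type_id;	/* BTF type ID; 0 is treated as invalid here */
};

/* Phase 1: checks that need only the records, not the subprog layout. */
int check_func_early(const struct func_rec *rec, size_t n)
{
	size_t i;

	assert(rec != NULL);
	for (i = 0; i < n; i++) {
		/* Assumed rule: the first record describes the main program. */
		if (i == 0 && rec[i].insn_off != 0)
			return -EINVAL;
		/* Assumed rule: offsets must strictly increase. */
		if (i > 0 && rec[i].insn_off <= rec[i - 1].insn_off)
			return -EINVAL;
		if (rec[i].type_id == 0)
			return -EINVAL;
	}
	return 0;
}

/* Phase 2: checks that need the final subprog layout. */
int check_func_late(const struct func_rec *rec, size_t n,
		    const unsigned int *subprog_start, size_t subprog_cnt)
{
	size_t i;

	assert(rec != NULL && subprog_start != NULL);
	/* Every subprog must have a record, and vice versa. */
	if (n != subprog_cnt)
		return -EINVAL;
	for (i = 0; i < n; i++)
		if (subprog_start[i] != rec[i].insn_off)
			return -EINVAL;
	return 0;
}
```

The early phase can run before any subprog has been discovered, so its
output is available when subprogs are added; only the late phase
compares the records against the discovered layout.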

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 kernel/bpf/verifier.c | 128 +++++++++++++++++++++++++++++++++---------
 1 file changed, 100 insertions(+), 28 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 2ac0be088dd5..d0f6c984272b 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -15013,20 +15013,18 @@ static int check_abnormal_return(struct bpf_verifier_env *env)
 #define MIN_BPF_FUNCINFO_SIZE	8
 #define MAX_FUNCINFO_REC_SIZE	252
 
-static int check_btf_func(struct bpf_verifier_env *env,
-			  const union bpf_attr *attr,
-			  bpfptr_t uattr)
+static int check_btf_func_early(struct bpf_verifier_env *env,
+				const union bpf_attr *attr,
+				bpfptr_t uattr)
 {
-	const struct btf_type *type, *func_proto, *ret_type;
-	u32 i, nfuncs, urec_size, min_size;
 	u32 krec_size = sizeof(struct bpf_func_info);
+	const struct btf_type *type, *func_proto;
+	u32 i, nfuncs, urec_size, min_size;
 	struct bpf_func_info *krecord;
-	struct bpf_func_info_aux *info_aux = NULL;
 	struct bpf_prog *prog;
 	const struct btf *btf;
-	bpfptr_t urecord;
 	u32 prev_offset = 0;
-	bool scalar_return;
+	bpfptr_t urecord;
 	int ret = -ENOMEM;
 
 	nfuncs = attr->func_info_cnt;
@@ -15036,11 +15034,6 @@ static int check_btf_func(struct bpf_verifier_env *env,
 		return 0;
 	}
 
-	if (nfuncs != env->subprog_cnt) {
-		verbose(env, "number of funcs in func_info doesn't match number of subprogs\n");
-		return -EINVAL;
-	}
-
 	urec_size = attr->func_info_rec_size;
 	if (urec_size < MIN_BPF_FUNCINFO_SIZE ||
 	    urec_size > MAX_FUNCINFO_REC_SIZE ||
@@ -15058,9 +15051,6 @@ static int check_btf_func(struct bpf_verifier_env *env,
 	krecord = kvcalloc(nfuncs, krec_size, GFP_KERNEL | __GFP_NOWARN);
 	if (!krecord)
 		return -ENOMEM;
-	info_aux = kcalloc(nfuncs, sizeof(*info_aux), GFP_KERNEL | __GFP_NOWARN);
-	if (!info_aux)
-		goto err_free;
 
 	for (i = 0; i < nfuncs; i++) {
 		ret = bpf_check_uarg_tail_zero(urecord, krec_size, urec_size);
@@ -15099,11 +15089,6 @@ static int check_btf_func(struct bpf_verifier_env *env,
 			goto err_free;
 		}
 
-		if (env->subprog_info[i].start != krecord[i].insn_off) {
-			verbose(env, "func_info BTF section doesn't match subprog layout in BPF program\n");
-			goto err_free;
-		}
-
 		/* check type_id */
 		type = btf_type_by_id(btf, krecord[i].type_id);
 		if (!type || !btf_type_is_func(type)) {
@@ -15111,12 +15096,80 @@ static int check_btf_func(struct bpf_verifier_env *env,
 				krecord[i].type_id);
 			goto err_free;
 		}
-		info_aux[i].linkage = BTF_INFO_VLEN(type->info);
 
 		func_proto = btf_type_by_id(btf, type->type);
 		if (unlikely(!func_proto || !btf_type_is_func_proto(func_proto)))
 			/* btf_func_check() already verified it during BTF load */
 			goto err_free;
+
+		prev_offset = krecord[i].insn_off;
+		bpfptr_add(&urecord, urec_size);
+	}
+
+	prog->aux->func_info = krecord;
+	prog->aux->func_info_cnt = nfuncs;
+	return 0;
+
+err_free:
+	kvfree(krecord);
+	return ret;
+}
+
+static int check_btf_func(struct bpf_verifier_env *env,
+			  const union bpf_attr *attr,
+			  bpfptr_t uattr)
+{
+	const struct btf_type *type, *func_proto, *ret_type;
+	u32 i, nfuncs, urec_size, min_size;
+	u32 krec_size = sizeof(struct bpf_func_info);
+	struct bpf_func_info *krecord;
+	struct bpf_func_info_aux *info_aux = NULL;
+	struct bpf_prog *prog;
+	const struct btf *btf;
+	bpfptr_t urecord;
+	u32 prev_offset = 0;
+	bool scalar_return;
+	int ret = -ENOMEM;
+
+	nfuncs = attr->func_info_cnt;
+	if (!nfuncs) {
+		if (check_abnormal_return(env))
+			return -EINVAL;
+		return 0;
+	}
+	if (nfuncs != env->subprog_cnt) {
+		verbose(env, "number of funcs in func_info doesn't match number of subprogs\n");
+		return -EINVAL;
+	}
+
+	urec_size = attr->func_info_rec_size;
+
+	prog = env->prog;
+	btf = prog->aux->btf;
+
+	urecord = make_bpfptr(attr->func_info, uattr.is_kernel);
+	min_size = min_t(u32, krec_size, urec_size);
+
+	krecord = prog->aux->func_info;
+	info_aux = kcalloc(nfuncs, sizeof(*info_aux), GFP_KERNEL | __GFP_NOWARN);
+	if (!info_aux)
+		return -ENOMEM;
+
+	for (i = 0; i < nfuncs; i++) {
+		/* check insn_off */
+		ret = -EINVAL;
+
+		if (env->subprog_info[i].start != krecord[i].insn_off) {
+			verbose(env, "func_info BTF section doesn't match subprog layout in BPF program\n");
+			goto err_free;
+		}
+
+		/* Already checked type_id */
+		type = btf_type_by_id(btf, krecord[i].type_id);
+		info_aux[i].linkage = BTF_INFO_VLEN(type->info);
+		/* Already checked func_proto */
+		func_proto = btf_type_by_id(btf, type->type);
+
 		ret_type = btf_type_skip_modifiers(btf, func_proto->type, NULL);
 		scalar_return =
 			btf_type_is_small_int(ret_type) || btf_is_any_enum(ret_type);
@@ -15133,13 +15186,10 @@ static int check_btf_func(struct bpf_verifier_env *env,
 		bpfptr_add(&urecord, urec_size);
 	}
 
-	prog->aux->func_info = krecord;
-	prog->aux->func_info_cnt = nfuncs;
 	prog->aux->func_info_aux = info_aux;
 	return 0;
 
 err_free:
-	kvfree(krecord);
 	kfree(info_aux);
 	return ret;
 }
@@ -15357,9 +15407,9 @@ static int check_core_relo(struct bpf_verifier_env *env,
 	return err;
 }
 
-static int check_btf_info(struct bpf_verifier_env *env,
-			  const union bpf_attr *attr,
-			  bpfptr_t uattr)
+static int check_btf_info_early(struct bpf_verifier_env *env,
+				const union bpf_attr *attr,
+				bpfptr_t uattr)
 {
 	struct btf *btf;
 	int err;
@@ -15379,6 +15429,24 @@ static int check_btf_info(struct bpf_verifier_env *env,
 	}
 	env->prog->aux->btf = btf;
 
+	err = check_btf_func_early(env, attr, uattr);
+	if (err)
+		return err;
+	return 0;
+}
+
+static int check_btf_info(struct bpf_verifier_env *env,
+			  const union bpf_attr *attr,
+			  bpfptr_t uattr)
+{
+	int err;
+
+	if (!attr->func_info_cnt && !attr->line_info_cnt) {
+		if (check_abnormal_return(env))
+			return -EINVAL;
+		return 0;
+	}
+
 	err = check_btf_func(env, attr, uattr);
 	if (err)
 		return err;
@@ -19842,6 +19910,10 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr *attr, bpfptr_t uattr, __u3
 	if (!env->explored_states)
 		goto skip_full_check;
 
+	ret = check_btf_info_early(env, attr, uattr);
+	if (ret < 0)
+		goto skip_full_check;
+
 	ret = add_subprog_and_kfunc(env);
 	if (ret < 0)
 		goto skip_full_check;
-- 
2.41.0



* [PATCH bpf-next v2 05/14] bpf: Add support for custom exception callbacks
  2023-08-09 11:41 [PATCH bpf-next v2 00/14] Exceptions - 1/2 Kumar Kartikeya Dwivedi
                   ` (3 preceding siblings ...)
  2023-08-09 11:41 ` [PATCH bpf-next v2 04/14] bpf: Refactor check_btf_func and split into two phases Kumar Kartikeya Dwivedi
@ 2023-08-09 11:41 ` Kumar Kartikeya Dwivedi
  2023-08-28 22:11   ` Martin KaFai Lau
  2023-08-09 11:41 ` [PATCH bpf-next v2 06/14] bpf: Perform CFG walk for exception callback Kumar Kartikeya Dwivedi
                   ` (9 subsequent siblings)
  14 siblings, 1 reply; 37+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2023-08-09 11:41 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Yonghong Song, David Vernet

By default, the subprog generated by the verifier to handle a thrown
exception hardcodes a return value of 0. To allow user-defined logic
and modification of the return value when an exception is thrown,
introduce the 'exception_callback:' declaration tag, which marks a
callback as the default exception handler for the program.

The format of the declaration tag is 'exception_callback:<value>', where
<value> is the name of the exception callback. Each main program can be
tagged using this BTF declaration tag to associate it with an exception
callback. If the tag is absent, the default callback is used.
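
As a rough illustration of how a tag of this form resolves to a callback
name, here is a minimal userspace sketch. It is loosely modelled on the
btf_find_decl_tag_value() change in this patch (a missing tag yields
-ENOENT, a duplicate yields -EEXIST), but find_tag_value() and the plain
string array it scans are hypothetical stand-ins, not kernel APIs.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: 'tags' stands in for the decl tags attached to one
 * function. On success, *value points just past the 'key' prefix.
 */
int find_tag_value(const char *const *tags, size_t ntags,
		   const char *key, const char **value)
{
	const char *found = NULL;
	size_t len, i;

	assert(tags != NULL && key != NULL && value != NULL);
	len = strlen(key);
	for (i = 0; i < ntags; i++) {
		/* Skip tags that do not start with the requested key. */
		if (strncmp(tags[i], key, len))
			continue;
		/* A second match means the tag was given more than once. */
		if (found)
			return -EEXIST;
		found = tags[i] + len;
	}
	if (!found)
		return -ENOENT;
	*value = found;
	return 0;
}
```

With this shape, the caller can treat -ENOENT as "use the default
callback" and any other error as a rejected program, which mirrors how
the verifier side of this patch handles the two cases.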

As such, the exception callback cannot be modified at runtime, only set
during verification.

Allowing modification of the callback for the current program execution
at runtime leads to issues when the programs begin to nest, as any
per-CPU state maintaining this information will have to be saved and
restored. We don't want it to stay in bpf_prog_aux as this would take
effect globally for all programs. An alternative solution is spilling
the callback pointer at a known location on the program stack on entry,
and then passing this location to bpf_throw as a parameter.

However, since exceptions are geared more towards a use case where they
are ideally never invoked, optimizing for this use case and adding to
the complexity has diminishing returns.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 include/linux/bpf.h                           |   4 +-
 include/linux/bpf_verifier.h                  |   1 +
 kernel/bpf/btf.c                              |  29 +++--
 kernel/bpf/verifier.c                         | 118 ++++++++++++++++--
 .../testing/selftests/bpf/bpf_experimental.h  |  31 ++++-
 5 files changed, 165 insertions(+), 18 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index e938e75b0998..2125d77ce2e1 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -2364,9 +2364,11 @@ int btf_check_subprog_arg_match(struct bpf_verifier_env *env, int subprog,
 int btf_check_subprog_call(struct bpf_verifier_env *env, int subprog,
 			   struct bpf_reg_state *regs);
 int btf_prepare_func_args(struct bpf_verifier_env *env, int subprog,
-			  struct bpf_reg_state *reg);
+			  struct bpf_reg_state *reg, bool is_ex_cb);
 int btf_check_type_match(struct bpf_verifier_log *log, const struct bpf_prog *prog,
 			 struct btf *btf, const struct btf_type *t);
+const char *btf_find_decl_tag_value(const struct btf *btf, const struct btf_type *pt,
+				    int comp_idx, const char *tag_key);
 
 struct bpf_prog *bpf_prog_by_id(u32 id);
 struct bpf_link *bpf_link_by_id(u32 id);
diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index 9e6c25ecac9f..801ada8e614e 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -300,6 +300,7 @@ struct bpf_func_state {
 	bool in_callback_fn;
 	struct tnum callback_ret_range;
 	bool in_async_callback_fn;
+	bool in_exception_callback_fn;
 
 	/* The following fields should be last. See copy_func_state() */
 	int acquired_refs;
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index 249657c466dd..8da0eac3dcbd 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -3308,10 +3308,10 @@ static int btf_find_kptr(const struct btf *btf, const struct btf_type *t,
 	return BTF_FIELD_FOUND;
 }
 
-static const char *btf_find_decl_tag_value(const struct btf *btf,
-					   const struct btf_type *pt,
-					   int comp_idx, const char *tag_key)
+const char *btf_find_decl_tag_value(const struct btf *btf, const struct btf_type *pt,
+				    int comp_idx, const char *tag_key)
 {
+	const char *value = NULL;
 	int i;
 
 	for (i = 1; i < btf_nr_types(btf); i++) {
@@ -3325,9 +3325,14 @@ static const char *btf_find_decl_tag_value(const struct btf *btf,
 			continue;
 		if (strncmp(__btf_name_by_offset(btf, t->name_off), tag_key, len))
 			continue;
-		return __btf_name_by_offset(btf, t->name_off) + len;
+		/* Prevent duplicate entries for same type */
+		if (value)
+			return ERR_PTR(-EEXIST);
+		value = __btf_name_by_offset(btf, t->name_off) + len;
 	}
-	return NULL;
+	if (!value)
+		return ERR_PTR(-ENOENT);
+	return value;
 }
 
 static int
@@ -3345,7 +3350,7 @@ btf_find_graph_root(const struct btf *btf, const struct btf_type *pt,
 	if (t->size != sz)
 		return BTF_FIELD_IGNORE;
 	value_type = btf_find_decl_tag_value(btf, pt, comp_idx, "contains:");
-	if (!value_type)
+	if (IS_ERR(value_type))
 		return -EINVAL;
 	node_field_name = strstr(value_type, ":");
 	if (!node_field_name)
@@ -6949,7 +6954,7 @@ int btf_check_subprog_call(struct bpf_verifier_env *env, int subprog,
  * (either PTR_TO_CTX or SCALAR_VALUE).
  */
 int btf_prepare_func_args(struct bpf_verifier_env *env, int subprog,
-			  struct bpf_reg_state *regs)
+			  struct bpf_reg_state *regs, bool is_ex_cb)
 {
 	struct bpf_verifier_log *log = &env->log;
 	struct bpf_prog *prog = env->prog;
@@ -7006,7 +7011,7 @@ int btf_prepare_func_args(struct bpf_verifier_env *env, int subprog,
 			tname, nargs, MAX_BPF_FUNC_REG_ARGS);
 		return -EINVAL;
 	}
-	/* check that function returns int */
+	/* check that function returns int, exception cb also requires this */
 	t = btf_type_by_id(btf, t->type);
 	while (btf_type_is_modifier(t))
 		t = btf_type_by_id(btf, t->type);
@@ -7055,6 +7060,14 @@ int btf_prepare_func_args(struct bpf_verifier_env *env, int subprog,
 			i, btf_type_str(t), tname);
 		return -EINVAL;
 	}
+	/* We have already ensured that the callback returns an integer, just
+	 * like all global subprogs. We need to determine it only has a single
+	 * scalar argument.
+	 */
+	if (is_ex_cb && (nargs != 1 || regs[BPF_REG_1].type != SCALAR_VALUE)) {
+		bpf_log(log, "exception cb only supports single integer argument\n");
+		return -EINVAL;
+	}
 	return 0;
 }
 
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index d0f6c984272b..9d67d0633c59 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -2457,6 +2457,73 @@ static int add_subprog(struct bpf_verifier_env *env, int off)
 	return env->subprog_cnt - 1;
 }
 
+static int bpf_find_exception_callback_insn_off(struct bpf_verifier_env *env)
+{
+	struct bpf_prog_aux *aux = env->prog->aux;
+	struct btf *btf = aux->btf;
+	const struct btf_type *t;
+	const char *name;
+	u32 main_btf_id;
+	int ret, i, j;
+
+	/* Non-zero func_info_cnt implies valid btf */
+	if (!aux->func_info_cnt)
+		return 0;
+	main_btf_id = aux->func_info[0].type_id;
+
+	t = btf_type_by_id(btf, main_btf_id);
+	if (!t) {
+		verbose(env, "invalid btf id for main subprog in func_info\n");
+		return -EINVAL;
+	}
+
+	name = btf_find_decl_tag_value(btf, t, -1, "exception_callback:");
+	if (IS_ERR(name)) {
+		ret = PTR_ERR(name);
+		/* If there is no tag present, there is no exception callback */
+		if (ret == -ENOENT)
+			ret = 0;
+		else if (ret == -EEXIST)
+			verbose(env, "multiple exception callback tags for main subprog\n");
+		return ret;
+	}
+
+	ret = -ENOENT;
+	for (i = 0; i < btf_nr_types(btf); i++) {
+		t = btf_type_by_id(btf, i);
+		if (!btf_type_is_func(t))
+			continue;
+		if (strcmp(name, btf_name_by_offset(btf, t->name_off)))
+			continue;
+		if (btf_func_linkage(t) != BTF_FUNC_GLOBAL) {
+			verbose(env, "exception callback '%s' must have global linkage\n", name);
+			return -EINVAL;
+		}
+
+		ret = 0;
+		for (j = 0; j < aux->func_info_cnt; j++) {
+			if (aux->func_info[j].type_id != i)
+				continue;
+			ret = aux->func_info[j].insn_off;
+			/* Further func_info and subprog checks will also happen
+			 * later, so assume this is the right insn_off for now.
+			 */
+			if (!ret) {
+				verbose(env, "invalid exception callback insn_off in func_info: 0\n");
+				ret = -EINVAL;
+			}
+		}
+		if (!ret) {
+			verbose(env, "exception callback type id not found in func_info\n");
+			ret = -EINVAL;
+		}
+		break;
+	}
+	if (ret == -ENOENT)
+		verbose(env, "exception callback '%s' could not be found in BTF\n", name);
+	return ret;
+}
+
 #define MAX_KFUNC_DESCS 256
 #define MAX_KFUNC_BTFS	256
 
@@ -2796,8 +2863,8 @@ bpf_jit_find_kfunc_model(const struct bpf_prog *prog,
 static int add_subprog_and_kfunc(struct bpf_verifier_env *env)
 {
 	struct bpf_subprog_info *subprog = env->subprog_info;
+	int i, ret, insn_cnt = env->prog->len, ex_cb_insn;
 	struct bpf_insn *insn = env->prog->insnsi;
-	int i, ret, insn_cnt = env->prog->len;
 
 	/* Add entry function. */
 	ret = add_subprog(env, 0);
@@ -2823,6 +2890,26 @@ static int add_subprog_and_kfunc(struct bpf_verifier_env *env)
 			return ret;
 	}
 
+	ret = bpf_find_exception_callback_insn_off(env);
+	if (ret < 0)
+		return ret;
+	ex_cb_insn = ret;
+
+	/* If ex_cb_insn > 0, this means that the main program has a subprog
+	 * marked using BTF decl tag to serve as the exception callback.
+	 */
+	if (ex_cb_insn) {
+		ret = add_subprog(env, ex_cb_insn);
+		if (ret < 0)
+			return ret;
+		for (i = 1; i < env->subprog_cnt; i++) {
+			if (env->subprog_info[i].start != ex_cb_insn)
+				continue;
+			env->exception_callback_subprog = i;
+			break;
+		}
+	}
+
 	/* Add a fake 'exit' subprog which could simplify subprog iteration
 	 * logic. 'subprog_cnt' should not be increased.
 	 */
@@ -5691,6 +5778,10 @@ static int check_max_stack_depth_subprog(struct bpf_verifier_env *env, int idx)
 			/* async callbacks don't increase bpf prog stack size unless called directly */
 			if (!bpf_pseudo_call(insn + i))
 				continue;
+			if (subprog[sidx].is_exception_cb) {
+				verbose(env, "insn %d cannot call exception cb directly\n", i);
+				return -EINVAL;
+			}
 		}
 		i = next_insn;
 		idx = sidx;
@@ -5712,8 +5803,13 @@ static int check_max_stack_depth_subprog(struct bpf_verifier_env *env, int idx)
 	 * tail call counter throughout bpf2bpf calls combined with tailcalls
 	 */
 	if (tail_call_reachable)
-		for (j = 0; j < frame; j++)
+		for (j = 0; j < frame; j++) {
+			if (subprog[ret_prog[j]].is_exception_cb) {
+				verbose(env, "cannot tail call within exception cb\n");
+				return -EINVAL;
+			}
 			subprog[ret_prog[j]].tail_call_reachable = true;
+		}
 	if (subprog[0].tail_call_reachable)
 		env->prog->aux->tail_call_reachable = true;
 
@@ -14528,7 +14624,7 @@ static int check_return_code(struct bpf_verifier_env *env)
 	const bool is_subprog = frame->subprogno;
 
 	/* LSM and struct_ops func-ptr's return type could be "void" */
-	if (!is_subprog) {
+	if (!is_subprog || frame->in_exception_callback_fn) {
 		switch (prog_type) {
 		case BPF_PROG_TYPE_LSM:
 			if (prog->expected_attach_type == BPF_LSM_CGROUP)
@@ -14576,7 +14672,7 @@ static int check_return_code(struct bpf_verifier_env *env)
 		return 0;
 	}
 
-	if (is_subprog) {
+	if (is_subprog && !frame->in_exception_callback_fn) {
 		if (reg->type != SCALAR_VALUE) {
 			verbose(env, "At subprogram exit the register R0 is not a scalar value (%s)\n",
 				reg_type_str(env, reg->type));
@@ -19189,7 +19285,7 @@ static void free_states(struct bpf_verifier_env *env)
 	}
 }
 
-static int do_check_common(struct bpf_verifier_env *env, int subprog)
+static int do_check_common(struct bpf_verifier_env *env, int subprog, bool is_ex_cb)
 {
 	bool pop_log = !(env->log.level & BPF_LOG_LEVEL2);
 	struct bpf_verifier_state *state;
@@ -19220,7 +19316,7 @@ static int do_check_common(struct bpf_verifier_env *env, int subprog)
 
 	regs = state->frame[state->curframe]->regs;
 	if (subprog || env->prog->type == BPF_PROG_TYPE_EXT) {
-		ret = btf_prepare_func_args(env, subprog, regs);
+		ret = btf_prepare_func_args(env, subprog, regs, is_ex_cb);
 		if (ret)
 			goto out;
 		for (i = BPF_REG_1; i <= BPF_REG_5; i++) {
@@ -19236,6 +19332,12 @@ static int do_check_common(struct bpf_verifier_env *env, int subprog)
 				regs[i].id = ++env->id_gen;
 			}
 		}
+		if (is_ex_cb) {
+			state->frame[0]->in_exception_callback_fn = true;
+			env->subprog_info[subprog].is_cb = true;
+			env->subprog_info[subprog].is_async_cb = true;
+			env->subprog_info[subprog].is_exception_cb = true;
+		}
 	} else {
 		/* 1st arg to a function */
 		regs[BPF_REG_1].type = PTR_TO_CTX;
@@ -19300,7 +19402,7 @@ static int do_check_subprogs(struct bpf_verifier_env *env)
 			continue;
 		env->insn_idx = env->subprog_info[i].start;
 		WARN_ON_ONCE(env->insn_idx == 0);
-		ret = do_check_common(env, i);
+		ret = do_check_common(env, i, env->exception_callback_subprog == i);
 		if (ret) {
 			return ret;
 		} else if (env->log.level & BPF_LOG_LEVEL) {
@@ -19317,7 +19419,7 @@ static int do_check_main(struct bpf_verifier_env *env)
 	int ret;
 
 	env->insn_idx = 0;
-	ret = do_check_common(env, 0);
+	ret = do_check_common(env, 0, false);
 	if (!ret)
 		env->prog->aux->stack_depth = env->subprog_info[0].stack_depth;
 	return ret;
diff --git a/tools/testing/selftests/bpf/bpf_experimental.h b/tools/testing/selftests/bpf/bpf_experimental.h
index 952a40cbe09c..612ac86873af 100644
--- a/tools/testing/selftests/bpf/bpf_experimental.h
+++ b/tools/testing/selftests/bpf/bpf_experimental.h
@@ -134,7 +134,16 @@ extern struct bpf_rb_node *bpf_rbtree_first(struct bpf_rb_root *root) __ksym;
 /* Description
  *	Throw a BPF exception from the program, immediately terminating its
  *	execution and unwinding the stack. The supplied 'cookie' parameter
- *	will be the return value of the program when an exception is thrown.
+ *	will be the return value of the program when an exception is thrown,
+ *	and the default exception callback is used. Otherwise, if an exception
+ *	callback is set using the '__exception_cb(callback)' declaration tag
+ *	on the main program, the 'cookie' parameter will be the callback's only
+ *	input argument.
+ *
+ *	Thus, in case of default exception callback, 'cookie' is subjected to
+ *	constraints on the program's return value (as with R0 on exit).
+ *	Otherwise, the return value of the marked exception callback will be
+ *	subjected to the same checks.
  *
  *	Note that throwing an exception with lingering resources (locks,
  *	references, etc.) will lead to a verification error.
@@ -147,4 +156,24 @@ extern struct bpf_rb_node *bpf_rbtree_first(struct bpf_rb_root *root) __ksym;
  */
 extern void bpf_throw(u64 cookie) __ksym;
 
+/* This macro must be used to mark the exception callback corresponding to the
+ * main program. For example:
+ *
+ * int exception_cb(u64 cookie) {
+ *	return cookie;
+ * }
+ *
+ * SEC("tc")
+ * __exception_cb(exception_cb)
+ * int main_prog(struct __sk_buff *ctx) {
+ *	...
+ *	return TC_ACT_OK;
+ * }
+ *
+ * Here, exception callback for the main program will be 'exception_cb'. Note
+ * that this attribute can only be used once, and multiple exception callbacks
+ * specified for the main program will lead to verification error.
+ */
+#define __exception_cb(name) __attribute__((btf_decl_tag("exception_callback:" #name)))
+
 #endif
-- 
2.41.0



* [PATCH bpf-next v2 06/14] bpf: Perform CFG walk for exception callback
  2023-08-09 11:41 [PATCH bpf-next v2 00/14] Exceptions - 1/2 Kumar Kartikeya Dwivedi
                   ` (4 preceding siblings ...)
  2023-08-09 11:41 ` [PATCH bpf-next v2 05/14] bpf: Add support for custom exception callbacks Kumar Kartikeya Dwivedi
@ 2023-08-09 11:41 ` Kumar Kartikeya Dwivedi
  2023-08-09 11:41 ` [PATCH bpf-next v2 07/14] bpf: Treat first argument as return value for bpf_throw Kumar Kartikeya Dwivedi
                   ` (8 subsequent siblings)
  14 siblings, 0 replies; 37+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2023-08-09 11:41 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Yonghong Song, David Vernet

Since exception callbacks are not referenced using bpf_pseudo_func and
bpf_pseudo_call instructions, check_cfg traversal will never explore
instructions of the exception callback. Even after adding the subprog,
the program will then fail with an 'unreachable insn' error.

We thus need to begin walking from the start of the exception callback
again in check_cfg after a complete CFG traversal finishes, so as to
explore the CFG rooted at the exception callback.
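
As a rough illustration of the second walk described above, here is a
minimal sketch in plain C. It is not the verifier's check_cfg; the
successor table, N_INSN, and the helper names are hypothetical and exist
only for this example. It shows that instructions reachable only from an
uncalled callback stay unexplored until a walk is started from the
callback's first instruction.

```c
#include <assert.h>
#include <stdbool.h>

enum { UNSEEN = 0, DISCOVERED = 1, EXPLORED = 2 };

#define N_INSN 6

/* Hypothetical successor table: succ[i][k] is the k-th successor of
 * instruction i, or -1 if there is none. Instructions 0..2 form the main
 * program; 3..5 form a callback that nothing calls directly.
 */
static const int succ[N_INSN][2] = {
	{ 1, -1 }, { 2, -1 }, { -1, -1 },	/* main: 0 -> 1 -> 2 (exit) */
	{ 4, 5 }, { 5, -1 }, { -1, -1 },	/* cb: 3 -> {4, 5}, 4 -> 5 (exit) */
};

/* Iterative DFS from 'root', marking everything reachable as EXPLORED. */
static void walk_from(int root, int *state)
{
	int stack[N_INSN], top = 0;

	state[root] = DISCOVERED;
	stack[top++] = root;
	while (top > 0) {
		int t = stack[top - 1];
		bool pushed = false;

		for (int k = 0; k < 2; k++) {
			int s = succ[t][k];

			if (s >= 0 && state[s] == UNSEEN) {
				/* Each insn is pushed at most once. */
				assert(top < N_INSN);
				state[s] = DISCOVERED;
				stack[top++] = s;
				pushed = true;
				break;
			}
		}
		if (!pushed) {
			state[t] = EXPLORED;
			top--;
		}
	}
}

/* Number of instructions left unexplored after walking from insn 0 and,
 * if extra_root >= 0, from extra_root as well.
 */
static int count_unreachable(int extra_root)
{
	int state[N_INSN] = { 0 };
	int i, n = 0;

	walk_from(0, state);
	if (extra_root >= 0 && state[extra_root] == UNSEEN)
		walk_from(extra_root, state);
	for (i = 0; i < N_INSN; i++)
		if (state[i] != EXPLORED)
			n++;
	return n;
}
```

With only the walk from instruction 0, count_unreachable(-1) reports the
three callback instructions as unexplored; count_unreachable(3) adds the
second root and leaves none.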

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 kernel/bpf/verifier.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 9d67d0633c59..c22ba0423d27 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -15029,8 +15029,8 @@ static int check_cfg(struct bpf_verifier_env *env)
 {
 	int insn_cnt = env->prog->len;
 	int *insn_stack, *insn_state;
-	int ret = 0;
-	int i;
+	int ex_insn_beg, i, ret = 0;
+	bool ex_done = false;
 
 	insn_state = env->cfg.insn_state = kvcalloc(insn_cnt, sizeof(int), GFP_KERNEL);
 	if (!insn_state)
@@ -15046,6 +15046,7 @@ static int check_cfg(struct bpf_verifier_env *env)
 	insn_stack[0] = 0; /* 0 is the first instruction */
 	env->cfg.cur_stack = 1;
 
+walk_cfg:
 	while (env->cfg.cur_stack > 0) {
 		int t = insn_stack[env->cfg.cur_stack - 1];
 
@@ -15072,6 +15073,16 @@ static int check_cfg(struct bpf_verifier_env *env)
 		goto err_free;
 	}
 
+	if (env->exception_callback_subprog && !ex_done) {
+		ex_insn_beg = env->subprog_info[env->exception_callback_subprog].start;
+
+		insn_state[ex_insn_beg] = DISCOVERED;
+		insn_stack[0] = ex_insn_beg;
+		env->cfg.cur_stack = 1;
+		ex_done = true;
+		goto walk_cfg;
+	}
+
 	for (i = 0; i < insn_cnt; i++) {
 		if (insn_state[i] != EXPLORED) {
 			verbose(env, "unreachable insn %d\n", i);
-- 
2.41.0



* [PATCH bpf-next v2 07/14] bpf: Treat first argument as return value for bpf_throw
  2023-08-09 11:41 [PATCH bpf-next v2 00/14] Exceptions - 1/2 Kumar Kartikeya Dwivedi
                   ` (5 preceding siblings ...)
  2023-08-09 11:41 ` [PATCH bpf-next v2 06/14] bpf: Perform CFG walk for exception callback Kumar Kartikeya Dwivedi
@ 2023-08-09 11:41 ` Kumar Kartikeya Dwivedi
  2023-08-09 11:41 ` [PATCH bpf-next v2 08/14] bpf: Prevent KASAN false positive with bpf_throw Kumar Kartikeya Dwivedi
                   ` (7 subsequent siblings)
  14 siblings, 0 replies; 37+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2023-08-09 11:41 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Yonghong Song, David Vernet

In case of the default exception callback, change the behavior of
bpf_throw so that the passed cookie value is no longer ignored, but
instead becomes the return value of the default exception callback. As
such, we need to place restrictions on the value being passed into
bpf_throw in such a case, only allowing those permitted by the
check_return_code function.

Thus, bpf_throw can now control the return value of the program from
each call site without having the user install a custom exception
callback just to override the return value when an exception is thrown.

We also modify the hidden subprog instructions to now move BPF_REG_1 to
BPF_REG_0, so as to set the return value before exit in the default
callback.
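
To make the data flow concrete, here is a small model in plain C. It is
only a sketch: the real check is done statically by the verifier on the
register state at the bpf_throw call site, not on a runtime value, and
the range bounds and function names below are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the call-site check: is 'cookie' inside the inclusive
 * range [min, max] that the program type permits as a return value?
 * The range itself is an assumption of this example.
 */
static bool cookie_allowed(uint64_t cookie, uint64_t min, uint64_t max)
{
	assert(min <= max);
	return cookie >= min && cookie <= max;
}

/* Models the hidden default callback after this change:
 *	r0 = r1
 *	exit
 * so whatever was passed as the cookie becomes the return value.
 */
static uint64_t default_exception_cb(uint64_t cookie)
{
	return cookie;
}
```

For a hypothetical program type that may only return 0 or 1, a throw
with cookie 1 passes the check and the program returns 1, while a throw
with cookie 2 would be rejected at load time.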

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 kernel/bpf/verifier.c | 37 ++++++++++++++++++++++++-------------
 1 file changed, 24 insertions(+), 13 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index c22ba0423d27..a0e1a1d1f5d3 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -11415,6 +11415,8 @@ static int fetch_kfunc_meta(struct bpf_verifier_env *env,
 	return 0;
 }
 
+static int check_return_code(struct bpf_verifier_env *env, int regno);
+
 static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 			    int *insn_idx_p)
 {
@@ -11538,6 +11540,15 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 			return -ENOTSUPP;
 		}
 		env->seen_exception = true;
+
+		/* In the case of the default callback, the cookie value passed
+		 * to bpf_throw becomes the return value of the program.
+		 */
+		if (!env->exception_callback_subprog) {
+			err = check_return_code(env, BPF_REG_1);
+			if (err < 0)
+				return err;
+		}
 	}
 
 	for (i = 0; i < CALLER_SAVED_REGS; i++)
@@ -14612,7 +14623,7 @@ static int check_ld_abs(struct bpf_verifier_env *env, struct bpf_insn *insn)
 	return 0;
 }
 
-static int check_return_code(struct bpf_verifier_env *env)
+static int check_return_code(struct bpf_verifier_env *env, int regno)
 {
 	struct tnum enforce_attach_type_range = tnum_unknown;
 	const struct bpf_prog *prog = env->prog;
@@ -14646,22 +14657,22 @@ static int check_return_code(struct bpf_verifier_env *env)
 	 * of bpf_exit, which means that program wrote
 	 * something into it earlier
 	 */
-	err = check_reg_arg(env, BPF_REG_0, SRC_OP);
+	err = check_reg_arg(env, regno, SRC_OP);
 	if (err)
 		return err;
 
-	if (is_pointer_value(env, BPF_REG_0)) {
-		verbose(env, "R0 leaks addr as return value\n");
+	if (is_pointer_value(env, regno)) {
+		verbose(env, "R%d leaks addr as return value\n", regno);
 		return -EACCES;
 	}
 
-	reg = cur_regs(env) + BPF_REG_0;
+	reg = cur_regs(env) + regno;
 
 	if (frame->in_async_callback_fn) {
 		/* enforce return zero from async callbacks like timer */
 		if (reg->type != SCALAR_VALUE) {
-			verbose(env, "In async callback the register R0 is not a known value (%s)\n",
-				reg_type_str(env, reg->type));
+			verbose(env, "In async callback the register R%d is not a known value (%s)\n",
+				regno, reg_type_str(env, reg->type));
 			return -EINVAL;
 		}
 
@@ -14674,8 +14685,8 @@ static int check_return_code(struct bpf_verifier_env *env)
 
 	if (is_subprog && !frame->in_exception_callback_fn) {
 		if (reg->type != SCALAR_VALUE) {
-			verbose(env, "At subprogram exit the register R0 is not a scalar value (%s)\n",
-				reg_type_str(env, reg->type));
+			verbose(env, "At subprogram exit the register R%d is not a scalar value (%s)\n",
+				regno, reg_type_str(env, reg->type));
 			return -EINVAL;
 		}
 		return 0;
@@ -14757,8 +14768,8 @@ static int check_return_code(struct bpf_verifier_env *env)
 	}
 
 	if (reg->type != SCALAR_VALUE) {
-		verbose(env, "At program exit the register R0 is not a known value (%s)\n",
-			reg_type_str(env, reg->type));
+		verbose(env, "At program exit the register R%d is not a known value (%s)\n",
+			regno, reg_type_str(env, reg->type));
 		return -EINVAL;
 	}
 
@@ -16955,7 +16966,7 @@ static int do_check(struct bpf_verifier_env *env)
 					continue;
 				}
 
-				err = check_return_code(env);
+				err = check_return_code(env, BPF_REG_0);
 				if (err)
 					return err;
 process_bpf_exit:
@@ -18601,7 +18612,7 @@ static int do_misc_fixups(struct bpf_verifier_env *env)
 	if (env->seen_exception && !env->exception_callback_subprog) {
 		struct bpf_insn patch[] = {
 			env->prog->insnsi[insn_cnt - 1],
-			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_MOV64_REG(BPF_REG_0, BPF_REG_1),
 			BPF_EXIT_INSN(),
 		};
 
-- 
2.41.0



* [PATCH bpf-next v2 08/14] bpf: Prevent KASAN false positive with bpf_throw
  2023-08-09 11:41 [PATCH bpf-next v2 00/14] Exceptions - 1/2 Kumar Kartikeya Dwivedi
                   ` (6 preceding siblings ...)
  2023-08-09 11:41 ` [PATCH bpf-next v2 07/14] bpf: Treat first argument as return value for bpf_throw Kumar Kartikeya Dwivedi
@ 2023-08-09 11:41 ` Kumar Kartikeya Dwivedi
  2023-08-22 16:23   ` Alexei Starovoitov
  2023-08-30 16:53   ` Andrey Konovalov
  2023-08-09 11:41 ` [PATCH bpf-next v2 09/14] bpf: Detect IP == ksym.end as part of BPF program Kumar Kartikeya Dwivedi
                   ` (6 subsequent siblings)
  14 siblings, 2 replies; 37+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2023-08-09 11:41 UTC (permalink / raw)
  To: bpf
  Cc: Andrey Ryabinin, Alexander Potapenko, Andrey Konovalov,
	Dmitry Vyukov, Vincenzo Frascino, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Yonghong Song,
	David Vernet

The KASAN stack instrumentation, when CONFIG_KASAN_STACK is enabled,
poisons the stack of a function when it is entered and unpoisons it when
leaving. However, in the case of bpf_throw, we will never return as we
switch our stack frame to the BPF exception callback. Later, this
discrepancy will lead to confusing KASAN splats when the kernel resumes
execution on return from the BPF program.

Fix this by unpoisoning everything below the stack pointer of the BPF
program, which should cover the range that would not be unpoisoned. An
example splat is below:

BUG: KASAN: stack-out-of-bounds in stack_trace_consume_entry+0x14e/0x170
Write of size 8 at addr ffffc900013af958 by task test_progs/227

CPU: 0 PID: 227 Comm: test_progs Not tainted 6.5.0-rc2-g43f1c6c9052a-dirty #26
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-2.fc39 04/01/2014
Call Trace:
 <TASK>
 dump_stack_lvl+0x4a/0x80
 print_report+0xcf/0x670
 ? arch_stack_walk+0x79/0x100
 kasan_report+0xda/0x110
 ? stack_trace_consume_entry+0x14e/0x170
 ? stack_trace_consume_entry+0x14e/0x170
 ? __pfx_stack_trace_consume_entry+0x10/0x10
 stack_trace_consume_entry+0x14e/0x170
 ? __sys_bpf+0xf2e/0x41b0
 arch_stack_walk+0x8b/0x100
 ? __sys_bpf+0xf2e/0x41b0
 ? bpf_prog_test_run_skb+0x341/0x1c70
 ? bpf_prog_test_run_skb+0x341/0x1c70
 stack_trace_save+0x9b/0xd0
 ? __pfx_stack_trace_save+0x10/0x10
 ? __kasan_slab_free+0x109/0x180
 ? bpf_prog_test_run_skb+0x341/0x1c70
 ? __sys_bpf+0xf2e/0x41b0
 ? __x64_sys_bpf+0x78/0xc0
 ? do_syscall_64+0x3c/0x90
 ? entry_SYSCALL_64_after_hwframe+0x6e/0xd8
 kasan_save_stack+0x33/0x60
 ? kasan_save_stack+0x33/0x60
 ? kasan_set_track+0x25/0x30
 ? kasan_save_free_info+0x2b/0x50
 ? __kasan_slab_free+0x109/0x180
 ? kmem_cache_free+0x191/0x460
 ? bpf_prog_test_run_skb+0x341/0x1c70
 kasan_set_track+0x25/0x30
 kasan_save_free_info+0x2b/0x50
 __kasan_slab_free+0x109/0x180
 kmem_cache_free+0x191/0x460
 bpf_prog_test_run_skb+0x341/0x1c70
 ? __pfx_bpf_prog_test_run_skb+0x10/0x10
 ? __fget_light+0x51/0x220
 __sys_bpf+0xf2e/0x41b0
 ? __might_fault+0xa2/0x170
 ? __pfx___sys_bpf+0x10/0x10
 ? lock_release+0x1de/0x620
 ? __might_fault+0xcd/0x170
 ? __pfx_lock_release+0x10/0x10
 ? __pfx_blkcg_maybe_throttle_current+0x10/0x10
 __x64_sys_bpf+0x78/0xc0
 ? syscall_enter_from_user_mode+0x20/0x50
 do_syscall_64+0x3c/0x90
 entry_SYSCALL_64_after_hwframe+0x6e/0xd8
RIP: 0033:0x7f0fbb38880d
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d
89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d f3 45 12 00 f7 d8 64
89 01 48
RSP: 002b:00007ffe13907de8 EFLAGS: 00000206 ORIG_RAX: 0000000000000141
RAX: ffffffffffffffda RBX: 00007ffe13908708 RCX: 00007f0fbb38880d
RDX: 0000000000000050 RSI: 00007ffe13907e20 RDI: 000000000000000a
RBP: 00007ffe13907e00 R08: 0000000000000000 R09: 00007ffe13907e20
R10: 0000000000000064 R11: 0000000000000206 R12: 0000000000000003
R13: 0000000000000000 R14: 00007f0fbb532000 R15: 0000000000cfbd90
 </TASK>

The buggy address belongs to stack of task test_progs/227
KASAN internal error: frame info validation failed; invalid marker: 0

The buggy address belongs to the virtual mapping at
 [ffffc900013a8000, ffffc900013b1000) created by:
 kernel_clone+0xcd/0x600

The buggy address belongs to the physical page:
page:00000000b70f4332 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x11418f
flags: 0x2fffe0000000000(node=0|zone=2|lastcpupid=0x7fff)
page_type: 0xffffffff()
raw: 02fffe0000000000 0000000000000000 dead000000000122 0000000000000000
raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffffc900013af800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 ffffc900013af880: 00 00 00 f1 f1 f1 f1 00 00 00 f3 f3 f3 f3 f3 00
>ffffc900013af900: 00 00 00 00 00 00 00 00 00 00 00 f1 00 00 00 00
                                                    ^
 ffffc900013af980: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 ffffc900013afa00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
==================================================================
Disabling lock debugging due to kernel taint
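
The idea of unpoisoning everything below a watermark can be pictured
with a toy shadow-memory model in plain C. This is only a sketch: the
array, slot granularity, poison value, and toy_* names are hypothetical
and do not reflect how KASAN lays out its shadow memory.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_STACK_SIZE 64	/* hypothetical stack size, in slots */

/* One shadow byte per stack slot: 0 is accessible, non-zero is poisoned. */
static unsigned char toy_shadow[TOY_STACK_SIZE];

/* Poison slots [lo, hi), as instrumented function entry would. */
static void toy_poison(size_t lo, size_t hi)
{
	assert(lo <= hi && hi <= TOY_STACK_SIZE);
	memset(toy_shadow + lo, 0xf1, hi - lo);
}

/* Clear poison for every slot below 'watermark'. On a downward-growing
 * stack, lower slots belong to deeper frames.
 */
static void toy_unpoison_below(size_t watermark)
{
	assert(watermark <= TOY_STACK_SIZE);
	memset(toy_shadow, 0, watermark);
}

static size_t toy_count_poisoned(void)
{
	size_t i, n = 0;

	for (i = 0; i < TOY_STACK_SIZE; i++)
		if (toy_shadow[i])
			n++;
	return n;
}

/* Poison two redzones, unpoison below 'watermark', report what is left. */
static size_t toy_scenario(size_t watermark)
{
	memset(toy_shadow, 0, sizeof(toy_shadow));
	toy_poison(8, 12);	/* deeper frame that never returned */
	toy_poison(40, 44);	/* shallower frame that is still live */
	toy_unpoison_below(watermark);
	return toy_count_poisoned();
}
```

With a watermark of 32, only the stale poison of the abandoned deeper
frame is cleared; the four slots of the live frame stay poisoned.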

Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 include/linux/kasan.h | 2 ++
 kernel/bpf/helpers.c  | 6 ++++++
 2 files changed, 8 insertions(+)

diff --git a/include/linux/kasan.h b/include/linux/kasan.h
index 819b6bc8ac08..7a463f814db2 100644
--- a/include/linux/kasan.h
+++ b/include/linux/kasan.h
@@ -283,8 +283,10 @@ static inline bool kasan_check_byte(const void *address)
 
 #if defined(CONFIG_KASAN) && defined(CONFIG_KASAN_STACK)
 void kasan_unpoison_task_stack(struct task_struct *task);
+asmlinkage void kasan_unpoison_task_stack_below(const void *watermark);
 #else
 static inline void kasan_unpoison_task_stack(struct task_struct *task) {}
+static inline void kasan_unpoison_task_stack_below(const void *watermark) {}
 #endif
 
 #ifdef CONFIG_KASAN_GENERIC
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index af4add1e3a31..64a07232c58f 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -22,6 +22,7 @@
 #include <linux/security.h>
 #include <linux/btf_ids.h>
 #include <linux/bpf_mem_alloc.h>
+#include <linux/kasan.h>
 
 #include "../../lib/kstrtox.h"
 
@@ -2463,6 +2464,11 @@ __bpf_kfunc void bpf_throw(u64 cookie)
 		WARN_ON_ONCE(!ctx.aux->exception_boundary);
 	WARN_ON_ONCE(!ctx.bp);
 	WARN_ON_ONCE(!ctx.cnt);
+	/* Prevent KASAN false positives for CONFIG_KASAN_STACK by unpoisoning
+	 * deeper stack depths than ctx.sp as we do not return from bpf_throw,
+	 * which skips compiler generated instrumentation to do the same.
+	 */
+	kasan_unpoison_task_stack_below((void *)ctx.sp);
 	ctx.aux->bpf_exception_cb(cookie, ctx.sp, ctx.bp);
 }
 
-- 
2.41.0



* [PATCH bpf-next v2 09/14] bpf: Detect IP == ksym.end as part of BPF program
  2023-08-09 11:41 [PATCH bpf-next v2 00/14] Exceptions - 1/2 Kumar Kartikeya Dwivedi
                   ` (7 preceding siblings ...)
  2023-08-09 11:41 ` [PATCH bpf-next v2 08/14] bpf: Prevent KASAN false positive with bpf_throw Kumar Kartikeya Dwivedi
@ 2023-08-09 11:41 ` Kumar Kartikeya Dwivedi
  2023-08-09 11:41 ` [PATCH bpf-next v2 10/14] bpf: Disallow extensions to exception callbacks Kumar Kartikeya Dwivedi
                   ` (5 subsequent siblings)
  14 siblings, 0 replies; 37+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2023-08-09 11:41 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Yonghong Song, David Vernet

Now that bpf_throw kfunc is the first such call instruction that has
noreturn semantics within the verifier, this also kicks in dead code
elimination in unprecedented ways. For one, any instruction following
a bpf_throw call will never be marked as seen. Moreover, if a callchain
ends up throwing, any instructions after the call instruction to the
eventually throwing subprog in callers will also never be marked as
seen.

The tempting way to fix this would be to emit extra 'int3' instructions
which bump the jited_len of a program, and ensure that during runtime
when a program throws, we can discover its boundaries even if the call
instruction to bpf_throw (or to subprogs that always throw) is emitted
as the final instruction in the program.

An example of such a program would be this:

do_something():
	...
	r0 = 0
	exit

foo():
	r1 = 0
	call bpf_throw
	r0 = 0
	exit

bar(cond):
	if r1 != 0 goto pc+2
	call do_something
	exit
	call foo
	r0 = 0  // Never seen by verifier
	exit	//

main(ctx):
	r1 = ...
	call bar
	r0 = 0
	exit

Here, if we do end up throwing, the stacktrace would be the following:

bpf_throw
foo
bar
main

In bar, the final instruction emitted will be the call to foo. As such,
the return address will be the subsequent instruction (which the JIT
emits as int3 on x86). This will end up lying outside the jited_len of
the program. Thus, when unwinding, we will fail to discover the return
address as belonging to any program and end up in a panic due to the
unreliable stack unwinding of BPF programs that we never expect.

To remedy this case, make bpf_prog_ksym_find treat IP == ksym.end as
part of the BPF program, so that is_bpf_text_address returns true when
such a case occurs, and we are able to unwind reliably when the final
instruction ends up being a call instruction.
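
The effect of the boundary change can be sketched with a stand-alone
three-way comparison in plain C. The function name and the end_inclusive
switch are hypothetical and only serve to contrast the two behaviors;
'end' is assumed to be one past the last byte of program text.

```c
#include <assert.h>
#include <stdint.h>

/* Returns -1 if 'val' lies below the range, 1 if above, and 0 if it is
 * attributed to the range. With end_inclusive set, val == end is treated
 * as part of the range, which is what lets a return address right after
 * a trailing call instruction match.
 */
static int toy_ksym_comp(uintptr_t val, uintptr_t start, uintptr_t end,
			 int end_inclusive)
{
	assert(start <= end);
	if (val < start)
		return -1;
	if (end_inclusive ? val > end : val >= end)
		return 1;
	return 0;
}
```

For a range [0x1000, 0x1040), the address 0x1040 compares as above the
range with the exclusive check and as inside it with the inclusive one;
0x1041 is outside in both cases.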

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 kernel/bpf/core.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index ef362d7b09a5..08d52059655c 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -623,7 +623,11 @@ static __always_inline int bpf_tree_comp(void *key, struct latch_tree_node *n)

 	if (val < ksym->start)
 		return -1;
-	if (val >= ksym->end)
+	/* Ensure that we detect return addresses as part of the program, when
+	 * the final instruction is a call for a program part of the stack
+	 * trace. Therefore, do val > ksym->end instead of val >= ksym->end.
+	 */
+	if (val > ksym->end)
 		return  1;

 	return 0;
-- 
2.41.0


* [PATCH bpf-next v2 10/14] bpf: Disallow extensions to exception callbacks
  2023-08-09 11:41 [PATCH bpf-next v2 00/14] Exceptions - 1/2 Kumar Kartikeya Dwivedi
                   ` (8 preceding siblings ...)
  2023-08-09 11:41 ` [PATCH bpf-next v2 09/14] bpf: Detect IP == ksym.end as part of BPF program Kumar Kartikeya Dwivedi
@ 2023-08-09 11:41 ` Kumar Kartikeya Dwivedi
  2023-08-22  5:09   ` Alexei Starovoitov
  2023-08-09 11:41 ` [PATCH bpf-next v2 11/14] bpf: Fix kfunc callback register type handling Kumar Kartikeya Dwivedi
                   ` (4 subsequent siblings)
  14 siblings, 1 reply; 37+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2023-08-09 11:41 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Yonghong Song, David Vernet

During testing, it was discovered that extensions to exception callbacks
had no checks. Upon running a testcase, the kernel ended up running off
the end of a program whose final call was bpf_throw, and hitting int3
instructions.

The reason is that while the default exception callback would have reset
the stack frame to return to the main program's caller, the replacing
extension program will simply return to bpf_throw, which will in turn
return to the program, and the program will continue execution, now in
an undefined state where anything could happen.

The way to support extensions to an exception callback would be to mark
the BPF_PROG_TYPE_EXT main subprog as an exception_cb, and prevent it
from calling bpf_throw. This would make the JIT produce a prologue that
restores saved registers and resets the stack frame. But let's not do
that until there is a concrete use case for this, and simply disallow
this for now.

One key point to note here is that X86_TAIL_CALL_OFFSET currently does
not require any modifications, even though we emit instructions before
the corresponding endbr64 instruction. This is because we ensure that a
main subprog never serves as an exception callback, and therefore the
exception callback (which will be a global subprog) can never serve as
the tail call target, eliminating any discrepancies. However, once we
support a BPF_PROG_TYPE_EXT to also act as an exception callback, it
will end up requiring a change to the tail call offset to account for
the extra instructions. For simplicity, tail calls could be disabled for
such targets.

Noting the above, it appears better to wait for a concrete use case
before choosing to permit extension programs to replace exception
callbacks.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 kernel/bpf/helpers.c  | 1 +
 kernel/bpf/verifier.c | 5 +++++
 2 files changed, 6 insertions(+)

diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 64a07232c58f..a04eff53354c 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -2470,6 +2470,7 @@ __bpf_kfunc void bpf_throw(u64 cookie)
 	 */
 	kasan_unpoison_task_stack_below((void *)ctx.sp);
 	ctx.aux->bpf_exception_cb(cookie, ctx.sp, ctx.bp);
+	WARN(1, "A call to BPF exception callback should never return\n");
 }
 
 __diag_pop();
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index a0e1a1d1f5d3..13db1fa4163c 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -19622,6 +19622,11 @@ int bpf_check_attach_target(struct bpf_verifier_log *log,
 					"Extension programs should be JITed\n");
 				return -EINVAL;
 			}
+			if (aux->func && aux->func[subprog]->aux->exception_cb) {
+				bpf_log(log,
+					"Extension programs cannot replace exception callback\n");
+				return -EINVAL;
+			}
 		}
 		if (!tgt_prog->jited) {
 			bpf_log(log, "Can attach to only JITed progs\n");
-- 
2.41.0



* [PATCH bpf-next v2 11/14] bpf: Fix kfunc callback register type handling
  2023-08-09 11:41 [PATCH bpf-next v2 00/14] Exceptions - 1/2 Kumar Kartikeya Dwivedi
                   ` (9 preceding siblings ...)
  2023-08-09 11:41 ` [PATCH bpf-next v2 10/14] bpf: Disallow extensions to exception callbacks Kumar Kartikeya Dwivedi
@ 2023-08-09 11:41 ` Kumar Kartikeya Dwivedi
  2023-08-10 21:12   ` David Marchevsky
  2023-08-09 11:41 ` [PATCH bpf-next v2 12/14] libbpf: Add support for custom exception callbacks Kumar Kartikeya Dwivedi
                   ` (3 subsequent siblings)
  14 siblings, 1 reply; 37+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2023-08-09 11:41 UTC (permalink / raw)
  To: bpf
  Cc: Dave Marchevsky, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Yonghong Song, David Vernet

The kfunc code to handle KF_ARG_PTR_TO_CALLBACK does not check the reg
type before using reg->subprogno. This can accidentally permit invalid
pointers to be passed into callback helpers (e.g. silently from
different paths). Likewise, reg->subprogno from the per-register type
union may not be meaningful either. We need to reject any type other
than PTR_TO_FUNC.
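
As a rough user-space illustration of the problem (not the verifier code
itself), the sketch below models a register state as a tagged union. All
model_* names are hypothetical and exist only for this example; the point
is that the union member is read only after the type tag has been checked.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the verifier's register types. */
enum model_reg_type {
	MODEL_SCALAR_VALUE,
	MODEL_PTR_TO_MAP_VALUE,
	MODEL_PTR_TO_FUNC,
};

struct model_reg_state {
	enum model_reg_type type;
	union {
		uint32_t subprogno;	/* meaningful only for MODEL_PTR_TO_FUNC */
		uint32_t map_uid;	/* meaningful only for MODEL_PTR_TO_MAP_VALUE */
	};
};

/* Reject every register type except a function pointer before touching
 * the union member that is only valid for that type.
 */
static int model_check_callback_arg(const struct model_reg_state *reg,
				    uint32_t *subprogno)
{
	if (reg->type != MODEL_PTR_TO_FUNC)
		return -EINVAL;
	*subprogno = reg->subprogno;
	return 0;
}

/* Convenience wrapper: returns the subprog number, or a negative errno. */
static int model_callback_subprogno(enum model_reg_type type, uint32_t raw)
{
	struct model_reg_state reg = { .type = type };
	uint32_t subprogno;
	int err;

	reg.subprogno = raw;	/* writes the shared union storage */
	err = model_check_callback_arg(&reg, &subprogno);
	return err ? err : (int)subprogno;
}
```

Without the type check, the second case below would hand back whatever
happens to sit in the shared union storage as if it were a valid subprog
number: model_callback_subprogno(MODEL_PTR_TO_FUNC, 3) yields 3, while
model_callback_subprogno(MODEL_SCALAR_VALUE, 3) yields -EINVAL.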

Cc: Dave Marchevsky <davemarchevsky@fb.com>
Fixes: 5d92ddc3de1b ("bpf: Add callback validation to kfunc verifier logic")
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 kernel/bpf/verifier.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 13db1fa4163c..1c9a7a6ef906 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -11334,6 +11334,10 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
 			break;
 		}
 		case KF_ARG_PTR_TO_CALLBACK:
+			if (reg->type != PTR_TO_FUNC) {
+				verbose(env, "arg%d expected pointer to func\n", i);
+				return -EINVAL;
+			}
 			meta->subprogno = reg->subprogno;
 			break;
 		case KF_ARG_PTR_TO_REFCOUNTED_KPTR:
-- 
2.41.0



* [PATCH bpf-next v2 12/14] libbpf: Add support for custom exception callbacks
  2023-08-09 11:41 [PATCH bpf-next v2 00/14] Exceptions - 1/2 Kumar Kartikeya Dwivedi
                   ` (10 preceding siblings ...)
  2023-08-09 11:41 ` [PATCH bpf-next v2 11/14] bpf: Fix kfunc callback register type handling Kumar Kartikeya Dwivedi
@ 2023-08-09 11:41 ` Kumar Kartikeya Dwivedi
  2023-08-22 16:34   ` Alexei Starovoitov
  2023-08-25 18:43   ` Andrii Nakryiko
  2023-08-09 11:41 ` [PATCH bpf-next v2 13/14] selftests/bpf: Add BPF assertion macros Kumar Kartikeya Dwivedi
                   ` (2 subsequent siblings)
  14 siblings, 2 replies; 37+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2023-08-09 11:41 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Yonghong Song, David Vernet

Add support to libbpf to append exception callbacks when loading a
program. The exception callback is found by discovering the declaration
tag 'exception_callback:<value>' and finding the callback in the value
of the tag.
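
The tag matching can be pictured with the small user-space sketch below.
It is an illustration only: model_exception_cb_name and
MODEL_EXC_CB_PREFIX are hypothetical names, and unlike the patch (which
skips non-matching tags but fails with -EINVAL on an empty value) the
sketch folds both cases into a NULL return.

```c
#include <stddef.h>
#include <string.h>

#define MODEL_EXC_CB_PREFIX "exception_callback:"

/* Return the callback name carried by a decl tag string, or NULL when the
 * string is not an exception_callback tag or its value is empty.
 */
static const char *model_exception_cb_name(const char *tag)
{
	size_t len = strlen(MODEL_EXC_CB_PREFIX);

	/* Only tags starting with the prefix are of interest. */
	if (strncmp(tag, MODEL_EXC_CB_PREFIX, len))
		return NULL;

	/* Everything after the prefix names the callback subprog. */
	tag += len;
	return *tag ? tag : NULL;
}
```

The returned name is then compared against the names of the object's
subprograms to locate the callback.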

The process is done in two steps. First, for each main program, the
bpf_object__sanitize_and_load_btf function finds and marks its
corresponding exception callback as defined by the declaration tag on
it. Second, bpf_object__reloc_code is modified to append the indicated
exception callback at the end of the instruction iteration (the
exception callback is never appended in that loop, as it is not
directly referenced by any instruction).
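
The second step can be sketched in user space roughly as follows. This is
a simplified model under stated assumptions, not the libbpf code: struct
model_prog and the model_* functions are hypothetical, plain ints stand in
for struct bpf_insn, and relocation handling is left out entirely. As in
libbpf, a sub_insn_off of 0 is taken to mean "not appended yet".

```c
#include <stdlib.h>
#include <string.h>

struct model_prog {
	int *insns;		/* stand-in for struct bpf_insn */
	size_t insns_cnt;
	size_t sub_insn_off;	/* 0 means "not appended yet" */
};

/* Grow the main program and copy the subprog's instructions to its end. */
static int model_append_subprog(struct model_prog *main_prog,
				struct model_prog *subprog)
{
	size_t new_cnt = main_prog->insns_cnt + subprog->insns_cnt;
	int *insns;

	insns = realloc(main_prog->insns, new_cnt * sizeof(*insns));
	if (!insns)
		return -1;
	subprog->sub_insn_off = main_prog->insns_cnt;
	memcpy(insns + subprog->sub_insn_off, subprog->insns,
	       subprog->insns_cnt * sizeof(*insns));
	main_prog->insns = insns;
	main_prog->insns_cnt = new_cnt;
	return 0;
}

/* Append the exception callback only if nothing appended it earlier. */
static int model_append_exception_cb(struct model_prog *main_prog,
				     struct model_prog *cb)
{
	if (!cb || cb->sub_insn_off != 0)
		return 0;
	return model_append_subprog(main_prog, cb);
}

/* Demo: a 2-insn main program and a 3-insn callback. Returns the offset
 * at which the callback landed, or -1 on any failure.
 */
static long model_demo(void)
{
	int cb_insns[] = { 7, 8, 9 };
	struct model_prog main_prog = { 0 }, cb = { cb_insns, 3, 0 };
	long off = -1;

	main_prog.insns = calloc(2, sizeof(int));
	if (!main_prog.insns)
		return -1;
	main_prog.insns_cnt = 2;

	if (!model_append_exception_cb(&main_prog, &cb) &&
	    !model_append_exception_cb(&main_prog, &cb) && /* no-op */
	    main_prog.insns_cnt == 5 && main_prog.insns[4] == 9)
		off = (long)cb.sub_insn_off;

	free(main_prog.insns);
	return off;
}
```

The second call in model_demo() mirrors the "if it has not been done
already" check: once the callback has an offset, appending it again is
skipped.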

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 tools/lib/bpf/libbpf.c | 166 +++++++++++++++++++++++++++++++++++------
 1 file changed, 142 insertions(+), 24 deletions(-)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 17883f5a44b9..7c607bac8204 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -432,9 +432,11 @@ struct bpf_program {
 	int fd;
 	bool autoload;
 	bool autoattach;
+	bool sym_global;
 	bool mark_btf_static;
 	enum bpf_prog_type type;
 	enum bpf_attach_type expected_attach_type;
+	int exception_cb_idx;
 
 	int prog_ifindex;
 	__u32 attach_btf_obj_fd;
@@ -760,6 +762,7 @@ bpf_object__init_prog(struct bpf_object *obj, struct bpf_program *prog,
 
 	prog->type = BPF_PROG_TYPE_UNSPEC;
 	prog->fd = -1;
+	prog->exception_cb_idx = -1;
 
 	/* libbpf's convention for SEC("?abc...") is that it's just like
 	 * SEC("abc...") but the corresponding bpf_program starts out with
@@ -866,20 +869,28 @@ bpf_object__add_programs(struct bpf_object *obj, Elf_Data *sec_data,
 		if (err)
 			return err;
 
+		if (ELF64_ST_BIND(sym->st_info) != STB_LOCAL)
+			prog->sym_global = true;
+
 		/* if function is a global/weak symbol, but has restricted
 		 * (STV_HIDDEN or STV_INTERNAL) visibility, mark its BTF FUNC
 		 * as static to enable more permissive BPF verification mode
 		 * with more outside context available to BPF verifier
 		 */
-		if (ELF64_ST_BIND(sym->st_info) != STB_LOCAL
-		    && (ELF64_ST_VISIBILITY(sym->st_other) == STV_HIDDEN
-			|| ELF64_ST_VISIBILITY(sym->st_other) == STV_INTERNAL))
+		if (prog->sym_global && (ELF64_ST_VISIBILITY(sym->st_other) == STV_HIDDEN
+		    || ELF64_ST_VISIBILITY(sym->st_other) == STV_INTERNAL))
 			prog->mark_btf_static = true;
 
 		nr_progs++;
 		obj->nr_programs = nr_progs;
 	}
 
+	/* After adding all programs, now pair them with their exception
+	 * callbacks if specified.
+	 */
+	if (!kernel_supports(obj, FEAT_BTF_DECL_TAG))
+		goto out;
+out:
 	return 0;
 }
 
@@ -3137,6 +3148,80 @@ static int bpf_object__sanitize_and_load_btf(struct bpf_object *obj)
 		}
 	}
 
+	if (!kernel_supports(obj, FEAT_BTF_DECL_TAG))
+		goto skip_exception_cb;
+	for (i = 0; i < obj->nr_programs; i++) {
+		struct bpf_program *prog = &obj->programs[i];
+		int j, k, n;
+
+		if (prog_is_subprog(obj, prog))
+			continue;
+		n = btf__type_cnt(obj->btf);
+		for (j = 1; j < n; j++) {
+			const char *str = "exception_callback:", *name;
+			size_t len = strlen(str);
+			struct btf_type *t;
+
+			t = btf_type_by_id(obj->btf, j);
+			if (!btf_is_decl_tag(t) || btf_decl_tag(t)->component_idx != -1)
+				continue;
+
+			name = btf__str_by_offset(obj->btf, t->name_off);
+			if (strncmp(name, str, len))
+				continue;
+
+			t = btf_type_by_id(obj->btf, t->type);
+			if (!btf_is_func(t) || btf_func_linkage(t) != BTF_FUNC_GLOBAL) {
+				pr_warn("prog '%s': exception_callback:<value> decl tag not applied to the main program\n",
+					prog->name);
+				return -EINVAL;
+			}
+			if (strcmp(prog->name, btf__str_by_offset(obj->btf, t->name_off)))
+				continue;
+			/* Multiple callbacks are specified for the same prog,
+			 * the verifier will eventually return an error for this
+			 * case, hence simply skip appending a subprog.
+			 */
+			if (prog->exception_cb_idx >= 0) {
+				prog->exception_cb_idx = -1;
+				break;
+			}
+
+			name += len;
+			if (str_is_empty(name)) {
+				pr_warn("prog '%s': exception_callback:<value> decl tag contains empty value\n",
+					prog->name);
+				return -EINVAL;
+			}
+
+			for (k = 0; k < obj->nr_programs; k++) {
+				struct bpf_program *subprog = &obj->programs[k];
+
+				if (!prog_is_subprog(obj, subprog))
+					continue;
+				if (strcmp(name, subprog->name))
+					continue;
+				/* Enforce non-hidden, as from verifier point of
+				 * view it expects global functions, whereas the
+				 * mark_btf_static fixes up linkage as static.
+				 */
+				if (!subprog->sym_global || subprog->mark_btf_static) {
+					pr_warn("prog '%s': exception callback %s must be a global non-hidden function\n",
+						prog->name, subprog->name);
+					return -EINVAL;
+				}
+				prog->exception_cb_idx = k;
+				break;
+			}
+
+			if (prog->exception_cb_idx >= 0)
+				continue;
+			pr_warn("prog '%s': cannot find exception callback '%s'\n", prog->name, name);
+			return -ENOENT;
+		}
+	}
+skip_exception_cb:
+
 	sanitize = btf_needs_sanitization(obj);
 	if (sanitize) {
 		const void *raw_data;
@@ -6184,14 +6269,46 @@ static int append_subprog_relos(struct bpf_program *main_prog, struct bpf_progra
 	return 0;
 }
 
+static int
+bpf_object__append_subprog_code(struct bpf_object *obj, struct bpf_program *main_prog,
+				struct bpf_program *subprog)
+{
+	struct bpf_insn *insns;
+	size_t new_cnt;
+	int err;
+
+	subprog->sub_insn_off = main_prog->insns_cnt;
+
+	new_cnt = main_prog->insns_cnt + subprog->insns_cnt;
+	insns = libbpf_reallocarray(main_prog->insns, new_cnt, sizeof(*insns));
+	if (!insns) {
+		pr_warn("prog '%s': failed to realloc prog code\n", main_prog->name);
+		return -ENOMEM;
+	}
+	main_prog->insns = insns;
+	main_prog->insns_cnt = new_cnt;
+
+	memcpy(main_prog->insns + subprog->sub_insn_off, subprog->insns,
+	       subprog->insns_cnt * sizeof(*insns));
+
+	pr_debug("prog '%s': added %zu insns from sub-prog '%s'\n",
+		 main_prog->name, subprog->insns_cnt, subprog->name);
+
+	/* The subprog insns are now appended. Append its relos too. */
+	err = append_subprog_relos(main_prog, subprog);
+	if (err)
+		return err;
+	return 0;
+}
+
 static int
 bpf_object__reloc_code(struct bpf_object *obj, struct bpf_program *main_prog,
 		       struct bpf_program *prog)
 {
-	size_t sub_insn_idx, insn_idx, new_cnt;
+	size_t sub_insn_idx, insn_idx;
 	struct bpf_program *subprog;
-	struct bpf_insn *insns, *insn;
 	struct reloc_desc *relo;
+	struct bpf_insn *insn;
 	int err;
 
 	err = reloc_prog_func_and_line_info(obj, main_prog, prog);
@@ -6266,25 +6383,7 @@ bpf_object__reloc_code(struct bpf_object *obj, struct bpf_program *main_prog,
 		 *   and relocate.
 		 */
 		if (subprog->sub_insn_off == 0) {
-			subprog->sub_insn_off = main_prog->insns_cnt;
-
-			new_cnt = main_prog->insns_cnt + subprog->insns_cnt;
-			insns = libbpf_reallocarray(main_prog->insns, new_cnt, sizeof(*insns));
-			if (!insns) {
-				pr_warn("prog '%s': failed to realloc prog code\n", main_prog->name);
-				return -ENOMEM;
-			}
-			main_prog->insns = insns;
-			main_prog->insns_cnt = new_cnt;
-
-			memcpy(main_prog->insns + subprog->sub_insn_off, subprog->insns,
-			       subprog->insns_cnt * sizeof(*insns));
-
-			pr_debug("prog '%s': added %zu insns from sub-prog '%s'\n",
-				 main_prog->name, subprog->insns_cnt, subprog->name);
-
-			/* The subprog insns are now appended. Append its relos too. */
-			err = append_subprog_relos(main_prog, subprog);
+			err = bpf_object__append_subprog_code(obj, main_prog, subprog);
 			if (err)
 				return err;
 			err = bpf_object__reloc_code(obj, main_prog, subprog);
@@ -6308,6 +6407,25 @@ bpf_object__reloc_code(struct bpf_object *obj, struct bpf_program *main_prog,
 			 prog->name, insn_idx, insn->imm, subprog->name, subprog->sub_insn_off);
 	}
 
+	/* Now, also append exception callback if it has not been done already. */
+	if (main_prog == prog && main_prog->exception_cb_idx >= 0) {
+		subprog = &obj->programs[main_prog->exception_cb_idx];
+
+		/* Calling exception callback directly is disallowed, which the
+		 * verifier will reject later. In case it was processed already,
+		 * we can skip this step, otherwise for all other valid cases we
+		 * have to append exception callback now.
+		 */
+		if (subprog->sub_insn_off == 0) {
+			err = bpf_object__append_subprog_code(obj, main_prog, subprog);
+			if (err)
+				return err;
+			err = bpf_object__reloc_code(obj, main_prog, subprog);
+			if (err)
+				return err;
+		}
+	}
+
 	return 0;
 }
 
-- 
2.41.0



* [PATCH bpf-next v2 13/14] selftests/bpf: Add BPF assertion macros
  2023-08-09 11:41 [PATCH bpf-next v2 00/14] Exceptions - 1/2 Kumar Kartikeya Dwivedi
                   ` (11 preceding siblings ...)
  2023-08-09 11:41 ` [PATCH bpf-next v2 12/14] libbpf: Add support for custom exception callbacks Kumar Kartikeya Dwivedi
@ 2023-08-09 11:41 ` Kumar Kartikeya Dwivedi
  2023-08-09 11:41 ` [PATCH bpf-next v2 14/14] selftests/bpf: Add tests for BPF exceptions Kumar Kartikeya Dwivedi
  2023-08-22 21:22 ` [PATCH bpf-next v2 00/14] Exceptions - 1/2 Kumar Kartikeya Dwivedi
  14 siblings, 0 replies; 37+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2023-08-09 11:41 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Yonghong Song, David Vernet

Add macros implementing an 'assert' statement primitive, built on top
of the BPF exceptions support introduced in previous patches.

The bpf_assert_*_with variants allow supplying a value which can then be
inspected within the exception handler to signify the assert statement
that led to the program being terminated abruptly, or be returned by the
default exception handler.

Note that only 64-bit scalar values are supported with these assertion
macros. During testing I found other cases quite unreliable: the
shifts and other manipulations the compiler emits to extract a value of
the right width from a register scrub the verifier's bounds information
and knowledge about the value in the register.

Thus, it is easier to reliably support this feature with only the full
register width, and support both signed and unsigned variants.

bpf_assert_range is particularly interesting: it clamps the value to
the [begin, end] range (both inclusive) within verifier state, and
emits a check for the same at runtime.
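
The runtime half of this behaviour can be modelled in plain user-space C
as in the sketch below. This is only a model: setjmp/longjmp stand in for
bpf_throw and the unwinding to the main frame, every model_* name is
hypothetical, and the verifier-side effect (bounds tracking via the inline
asm in the real macros) is not captured at all. It assumes the default
exception callback, where the thrown value becomes the program's return
value.

```c
#include <setjmp.h>
#include <stdint.h>

/* The jump buffer plays the role of the main program frame that a thrown
 * exception unwinds to.
 */
static jmp_buf model_main_frame;
static uint64_t model_cookie;

/* Hypothetical stand-in for bpf_throw(): record the cookie and unwind. */
static void model_throw(uint64_t cookie)
{
	model_cookie = cookie;
	longjmp(model_main_frame, 1);
}

/* Model of bpf_assert_range_with(): two inclusive bound checks, each of
 * which throws the supplied value on failure.
 */
#define model_assert_range_with(lhs, beg, end, value)			\
	do {								\
		_Static_assert((beg) <= (end), "BEG must be <= END");	\
		if (!((lhs) >= (beg)))					\
			model_throw(value);				\
		if (!((lhs) <= (end)))					\
			model_throw(value);				\
	} while (0)

static const int model_table[8] = { 100, 101, 102, 103, 104, 105, 106, 107 };

/* Body of the modelled program: past the assertion, idx is within [0, 7],
 * so the array access needs no further check.
 */
static uint64_t model_prog_body(int64_t idx)
{
	model_assert_range_with(idx, 0, 7, 10);
	return model_table[idx];
}

/* Run the modelled program. On a throw, the cookie becomes the return
 * value, as with the default exception callback.
 */
static uint64_t model_run(int64_t idx)
{
	if (setjmp(model_main_frame))
		return model_cookie;
	return model_prog_body(idx);
}
```

In this model an in-range index returns the table entry, while any index
outside [0, 7] terminates the "program" with the value 10 passed to the
assertion.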

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 .../testing/selftests/bpf/bpf_experimental.h  | 242 ++++++++++++++++++
 1 file changed, 242 insertions(+)

diff --git a/tools/testing/selftests/bpf/bpf_experimental.h b/tools/testing/selftests/bpf/bpf_experimental.h
index 612ac86873af..faa0e785a331 100644
--- a/tools/testing/selftests/bpf/bpf_experimental.h
+++ b/tools/testing/selftests/bpf/bpf_experimental.h
@@ -176,4 +176,246 @@ extern void bpf_throw(u64 cookie) __ksym;
  */
 #define __exception_cb(name) __attribute__((btf_decl_tag("exception_callback:" #name)))
 
+#define __bpf_assert_signed(x) _Generic((x), \
+    unsigned long: 0,       \
+    unsigned long long: 0,  \
+    signed long: 1,         \
+    signed long long: 1     \
+)
+
+#define __bpf_assert_check(LHS, op, RHS)								 \
+	_Static_assert(sizeof(&(LHS)), "1st argument must be an lvalue expression");			 \
+	_Static_assert(sizeof(LHS) == 8, "Only 8-byte integers are supported\n");			 \
+	_Static_assert(__builtin_constant_p(__bpf_assert_signed(LHS)), "internal static assert");	 \
+	_Static_assert(__builtin_constant_p((RHS)), "2nd argument must be a constant expression")
+
+#define __bpf_assert(LHS, op, cons, RHS, VAL)							\
+	({											\
+		asm volatile ("if %[lhs] " op " %[rhs] goto +2; r1 = %[value]; call bpf_throw"	\
+			       : : [lhs] "r"(LHS), [rhs] cons(RHS), [value] "ri"(VAL) : );	\
+	})
+
+#define __bpf_assert_op_sign(LHS, op, cons, RHS, VAL, supp_sign)			\
+	({										\
+		__bpf_assert_check(LHS, op, RHS);					\
+		if (__bpf_assert_signed(LHS) && !(supp_sign))				\
+			__bpf_assert(LHS, "s" #op, cons, RHS, VAL);			\
+		else									\
+			__bpf_assert(LHS, #op, cons, RHS, VAL);				\
+	 })
+
+#define __bpf_assert_op(LHS, op, RHS, VAL, supp_sign)					\
+	({										\
+		if (sizeof(typeof(RHS)) == 8) {						\
+			const typeof(RHS) rhs_var = (RHS);				\
+			__bpf_assert_op_sign(LHS, op, "r", rhs_var, VAL, supp_sign);	\
+		} else {								\
+			__bpf_assert_op_sign(LHS, op, "i", RHS, VAL, supp_sign);	\
+		}									\
+	 })
+
+/* Description
+ *	Assert that a conditional expression is true.
+ * Returns
+ *	Void.
+ * Throws
+ *	An exception with the value zero when the assertion fails.
+ */
+#define bpf_assert(cond) if (!(cond)) bpf_throw(0);
+
+/* Description
+ *	Assert that a conditional expression is true.
+ * Returns
+ *	Void.
+ * Throws
+ *	An exception with the specified value when the assertion fails.
+ */
+#define bpf_assert_with(cond, value) if (!(cond)) bpf_throw(value);
+
+/* Description
+ *	Assert that LHS is equal to RHS. This statement updates the known value
+ *	of LHS during verification. Note that RHS must be a constant value, and
+ *	must fit within the data type of LHS.
+ * Returns
+ *	Void.
+ * Throws
+ *	An exception with the value zero when the assertion fails.
+ */
+#define bpf_assert_eq(LHS, RHS)						\
+	({								\
+		barrier_var(LHS);					\
+		__bpf_assert_op(LHS, ==, RHS, 0, true);			\
+	})
+
+/* Description
+ *	Assert that LHS is equal to RHS. This statement updates the known value
+ *	of LHS during verification. Note that RHS must be a constant value, and
+ *	must fit within the data type of LHS.
+ * Returns
+ *	Void.
+ * Throws
+ *	An exception with the specified value when the assertion fails.
+ */
+#define bpf_assert_eq_with(LHS, RHS, value)				\
+	({								\
+		barrier_var(LHS);					\
+		__bpf_assert_op(LHS, ==, RHS, value, true);		\
+	})
+
+/* Description
+ *	Assert that LHS is less than RHS. This statement updates the known
+ *	bounds of LHS during verification. Note that RHS must be a constant
+ *	value, and must fit within the data type of LHS.
+ * Returns
+ *	Void.
+ * Throws
+ *	An exception with the value zero when the assertion fails.
+ */
+#define bpf_assert_lt(LHS, RHS)						\
+	({								\
+		barrier_var(LHS);					\
+		__bpf_assert_op(LHS, <, RHS, 0, false);			\
+	})
+
+/* Description
+ *	Assert that LHS is less than RHS. This statement updates the known
+ *	bounds of LHS during verification. Note that RHS must be a constant
+ *	value, and must fit within the data type of LHS.
+ * Returns
+ *	Void.
+ * Throws
+ *	An exception with the specified value when the assertion fails.
+ */
+#define bpf_assert_lt_with(LHS, RHS, value)				\
+	({								\
+		barrier_var(LHS);					\
+		__bpf_assert_op(LHS, <, RHS, value, false);		\
+	})
+
+/* Description
+ *	Assert that LHS is greater than RHS. This statement updates the known
+ *	bounds of LHS during verification. Note that RHS must be a constant
+ *	value, and must fit within the data type of LHS.
+ * Returns
+ *	Void.
+ * Throws
+ *	An exception with the value zero when the assertion fails.
+ */
+#define bpf_assert_gt(LHS, RHS)						\
+	({								\
+		barrier_var(LHS);					\
+		__bpf_assert_op(LHS, >, RHS, 0, false);			\
+	})
+
+/* Description
+ *	Assert that LHS is greater than RHS. This statement updates the known
+ *	bounds of LHS during verification. Note that RHS must be a constant
+ *	value, and must fit within the data type of LHS.
+ * Returns
+ *	Void.
+ * Throws
+ *	An exception with the specified value when the assertion fails.
+ */
+#define bpf_assert_gt_with(LHS, RHS, value)				\
+	({								\
+		barrier_var(LHS);					\
+		__bpf_assert_op(LHS, >, RHS, value, false);		\
+	})
+
+/* Description
+ *	Assert that LHS is less than or equal to RHS. This statement updates the
+ *	known bounds of LHS during verification. Note that RHS must be a
+ *	constant value, and must fit within the data type of LHS.
+ * Returns
+ *	Void.
+ * Throws
+ *	An exception with the value zero when the assertion fails.
+ */
+#define bpf_assert_le(LHS, RHS)						\
+	({								\
+		barrier_var(LHS);					\
+		__bpf_assert_op(LHS, <=, RHS, 0, false);		\
+	})
+
+/* Description
+ *	Assert that LHS is less than or equal to RHS. This statement updates the
+ *	known bounds of LHS during verification. Note that RHS must be a
+ *	constant value, and must fit within the data type of LHS.
+ * Returns
+ *	Void.
+ * Throws
+ *	An exception with the specified value when the assertion fails.
+ */
+#define bpf_assert_le_with(LHS, RHS, value)				\
+	({								\
+		barrier_var(LHS);					\
+		__bpf_assert_op(LHS, <=, RHS, value, false);		\
+	})
+
+/* Description
+ *	Assert that LHS is greater than or equal to RHS. This statement updates
+ *	the known bounds of LHS during verification. Note that RHS must be a
+ *	constant value, and must fit within the data type of LHS.
+ * Returns
+ *	Void.
+ * Throws
+ *	An exception with the value zero when the assertion fails.
+ */
+#define bpf_assert_ge(LHS, RHS)						\
+	({								\
+		barrier_var(LHS);					\
+		__bpf_assert_op(LHS, >=, RHS, 0, false);		\
+	})
+
+/* Description
+ *	Assert that LHS is greater than or equal to RHS. This statement updates
+ *	the known bounds of LHS during verification. Note that RHS must be a
+ *	constant value, and must fit within the data type of LHS.
+ * Returns
+ *	Void.
+ * Throws
+ *	An exception with the specified value when the assertion fails.
+ */
+#define bpf_assert_ge_with(LHS, RHS, value)				\
+	({								\
+		barrier_var(LHS);					\
+		__bpf_assert_op(LHS, >=, RHS, value, false);		\
+	})
+
+/* Description
+ *	Assert that LHS is in the range [BEG, END] (inclusive of both). This
+ *	statement updates the known bounds of LHS during verification. Note
+ *	that both BEG and END must be constant values, and must fit within the
+ *	data type of LHS.
+ * Returns
+ *	Void.
+ * Throws
+ *	An exception with the value zero when the assertion fails.
+ */
+#define bpf_assert_range(LHS, BEG, END)					\
+	({								\
+		_Static_assert(BEG <= END, "BEG must be <= END");	\
+		barrier_var(LHS);					\
+		__bpf_assert_op(LHS, >=, BEG, 0, false);		\
+		__bpf_assert_op(LHS, <=, END, 0, false);		\
+	})
+
+/* Description
+ *	Assert that LHS is in the range [BEG, END] (inclusive of both). This
+ *	statement updates the known bounds of LHS during verification. Note
+ *	that both BEG and END must be constant values, and must fit within the
+ *	data type of LHS.
+ * Returns
+ *	Void.
+ * Throws
+ *	An exception with the specified value when the assertion fails.
+ */
+#define bpf_assert_range_with(LHS, BEG, END, value)			\
+	({								\
+		_Static_assert(BEG <= END, "BEG must be <= END");	\
+		barrier_var(LHS);					\
+		__bpf_assert_op(LHS, >=, BEG, value, false);		\
+		__bpf_assert_op(LHS, <=, END, value, false);		\
+	})
+
 #endif
-- 
2.41.0



* [PATCH bpf-next v2 14/14] selftests/bpf: Add tests for BPF exceptions
  2023-08-09 11:41 [PATCH bpf-next v2 00/14] Exceptions - 1/2 Kumar Kartikeya Dwivedi
                   ` (12 preceding siblings ...)
  2023-08-09 11:41 ` [PATCH bpf-next v2 13/14] selftests/bpf: Add BPF assertion macros Kumar Kartikeya Dwivedi
@ 2023-08-09 11:41 ` Kumar Kartikeya Dwivedi
  2023-08-22 21:22 ` [PATCH bpf-next v2 00/14] Exceptions - 1/2 Kumar Kartikeya Dwivedi
  14 siblings, 0 replies; 37+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2023-08-09 11:41 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Yonghong Song, David Vernet

Add selftests to cover success and failure cases of API usage, runtime
behavior and invariants that need to be maintained for implementation
correctness.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 tools/testing/selftests/bpf/DENYLIST.aarch64  |   1 +
 tools/testing/selftests/bpf/DENYLIST.s390x    |   1 +
 .../selftests/bpf/prog_tests/exceptions.c     | 324 +++++++++++++++
 .../testing/selftests/bpf/progs/exceptions.c  | 368 ++++++++++++++++++
 .../selftests/bpf/progs/exceptions_assert.c   | 135 +++++++
 .../selftests/bpf/progs/exceptions_ext.c      |  59 +++
 .../selftests/bpf/progs/exceptions_fail.c     | 347 +++++++++++++++++
 7 files changed, 1235 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/exceptions.c
 create mode 100644 tools/testing/selftests/bpf/progs/exceptions.c
 create mode 100644 tools/testing/selftests/bpf/progs/exceptions_assert.c
 create mode 100644 tools/testing/selftests/bpf/progs/exceptions_ext.c
 create mode 100644 tools/testing/selftests/bpf/progs/exceptions_fail.c

diff --git a/tools/testing/selftests/bpf/DENYLIST.aarch64 b/tools/testing/selftests/bpf/DENYLIST.aarch64
index 3b61e8b35d62..432af22973b1 100644
--- a/tools/testing/selftests/bpf/DENYLIST.aarch64
+++ b/tools/testing/selftests/bpf/DENYLIST.aarch64
@@ -1,5 +1,6 @@
 bpf_cookie/multi_kprobe_attach_api               # kprobe_multi_link_api_subtest:FAIL:fentry_raw_skel_load unexpected error: -3
 bpf_cookie/multi_kprobe_link_api                 # kprobe_multi_link_api_subtest:FAIL:fentry_raw_skel_load unexpected error: -3
+exceptions					 # JIT does not support calling kfunc bpf_throw: -524
 fexit_sleep                                      # The test never returns. The remaining tests cannot start.
 kprobe_multi_bench_attach                        # bpf_program__attach_kprobe_multi_opts unexpected error: -95
 kprobe_multi_test/attach_api_addrs               # bpf_program__attach_kprobe_multi_opts unexpected error: -95
diff --git a/tools/testing/selftests/bpf/DENYLIST.s390x b/tools/testing/selftests/bpf/DENYLIST.s390x
index 5061d9e24c16..ce6f291665cf 100644
--- a/tools/testing/selftests/bpf/DENYLIST.s390x
+++ b/tools/testing/selftests/bpf/DENYLIST.s390x
@@ -6,6 +6,7 @@ bpf_loop                                 # attaches to __x64_sys_nanosleep
 cgrp_local_storage                       # prog_attach unexpected error: -524                                          (trampoline)
 dynptr/test_dynptr_skb_data
 dynptr/test_skb_readonly
+exceptions				 # JIT does not support calling kfunc bpf_throw				       (exceptions)
 fexit_sleep                              # fexit_skel_load fexit skeleton failed                                       (trampoline)
 get_stack_raw_tp                         # user_stack corrupted user stack                                             (no backchain userspace)
 iters/testmod_seq*                       # s390x doesn't support kfuncs in modules yet
diff --git a/tools/testing/selftests/bpf/prog_tests/exceptions.c b/tools/testing/selftests/bpf/prog_tests/exceptions.c
new file mode 100644
index 000000000000..4a0b9910dab3
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/exceptions.c
@@ -0,0 +1,324 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <test_progs.h>
+#include <network_helpers.h>
+
+#include "exceptions.skel.h"
+#include "exceptions_ext.skel.h"
+#include "exceptions_fail.skel.h"
+#include "exceptions_assert.skel.h"
+
+static char log_buf[1024 * 1024];
+
+static void test_exceptions_failure(void)
+{
+	RUN_TESTS(exceptions_fail);
+}
+
+static void test_exceptions_success(void)
+{
+	LIBBPF_OPTS(bpf_test_run_opts, ropts,
+		.data_in = &pkt_v4,
+		.data_size_in = sizeof(pkt_v4),
+		.repeat = 1,
+	);
+	struct exceptions_ext *eskel = NULL;
+	struct exceptions *skel;
+	int ret;
+
+	skel = exceptions__open();
+	if (!ASSERT_OK_PTR(skel, "exceptions__open"))
+		return;
+
+	ret = exceptions__load(skel);
+	if (!ASSERT_OK(ret, "exceptions__load"))
+		goto done;
+
+	if (!ASSERT_OK(bpf_map_update_elem(bpf_map__fd(skel->maps.jmp_table), &(int){0},
+					   &(int){bpf_program__fd(skel->progs.exception_tail_call_target)}, BPF_ANY),
+		       "bpf_map_update_elem jmp_table"))
+		goto done;
+
+#define RUN_SUCCESS(_prog, return_val)						  \
+	if (!test__start_subtest(#_prog)) goto _prog##_##return_val;		  \
+	ret = bpf_prog_test_run_opts(bpf_program__fd(skel->progs._prog), &ropts); \
+	ASSERT_OK(ret, #_prog " prog run ret");					  \
+	ASSERT_EQ(ropts.retval, return_val, #_prog " prog run retval");		  \
+	_prog##_##return_val:
+
+	RUN_SUCCESS(exception_throw_always_1, 64);
+	RUN_SUCCESS(exception_throw_always_2, 32);
+	RUN_SUCCESS(exception_throw_unwind_1, 16);
+	RUN_SUCCESS(exception_throw_unwind_2, 32);
+	RUN_SUCCESS(exception_throw_default, 0);
+	RUN_SUCCESS(exception_throw_default_value, 5);
+	RUN_SUCCESS(exception_tail_call, 24);
+	RUN_SUCCESS(exception_ext, 0);
+	RUN_SUCCESS(exception_ext_mod_cb_runtime, 35);
+	RUN_SUCCESS(exception_throw_subprog, 1);
+	RUN_SUCCESS(exception_assert_nz_gfunc, 1);
+	RUN_SUCCESS(exception_assert_zero_gfunc, 1);
+	RUN_SUCCESS(exception_assert_neg_gfunc, 1);
+	RUN_SUCCESS(exception_assert_pos_gfunc, 1);
+	RUN_SUCCESS(exception_assert_negeq_gfunc, 1);
+	RUN_SUCCESS(exception_assert_poseq_gfunc, 1);
+	RUN_SUCCESS(exception_assert_nz_gfunc_with, 1);
+	RUN_SUCCESS(exception_assert_zero_gfunc_with, 1);
+	RUN_SUCCESS(exception_assert_neg_gfunc_with, 1);
+	RUN_SUCCESS(exception_assert_pos_gfunc_with, 1);
+	RUN_SUCCESS(exception_assert_negeq_gfunc_with, 1);
+	RUN_SUCCESS(exception_assert_poseq_gfunc_with, 1);
+	RUN_SUCCESS(exception_bad_assert_nz_gfunc, 0);
+	RUN_SUCCESS(exception_bad_assert_zero_gfunc, 0);
+	RUN_SUCCESS(exception_bad_assert_neg_gfunc, 0);
+	RUN_SUCCESS(exception_bad_assert_pos_gfunc, 0);
+	RUN_SUCCESS(exception_bad_assert_negeq_gfunc, 0);
+	RUN_SUCCESS(exception_bad_assert_poseq_gfunc, 0);
+	RUN_SUCCESS(exception_bad_assert_nz_gfunc_with, 100);
+	RUN_SUCCESS(exception_bad_assert_zero_gfunc_with, 105);
+	RUN_SUCCESS(exception_bad_assert_neg_gfunc_with, 200);
+	RUN_SUCCESS(exception_bad_assert_pos_gfunc_with, 0);
+	RUN_SUCCESS(exception_bad_assert_negeq_gfunc_with, 101);
+	RUN_SUCCESS(exception_bad_assert_poseq_gfunc_with, 99);
+	RUN_SUCCESS(exception_assert_range, 1);
+	RUN_SUCCESS(exception_assert_range_with, 1);
+	RUN_SUCCESS(exception_bad_assert_range, 0);
+	RUN_SUCCESS(exception_bad_assert_range_with, 10);
+
+#define RUN_EXT(load_ret, attach_err, expr, msg, after_link)			  \
+	{									  \
+		LIBBPF_OPTS(bpf_object_open_opts, o, .kernel_log_buf = log_buf,		 \
+						     .kernel_log_size = sizeof(log_buf), \
+						     .kernel_log_level = 2);		 \
+		exceptions_ext__destroy(eskel);					  \
+		eskel = exceptions_ext__open_opts(&o);				  \
+		struct bpf_program *prog = NULL;				  \
+		struct bpf_link *link = NULL;					  \
+		if (!ASSERT_OK_PTR(eskel, "exceptions_ext__open"))		  \
+			goto done;						  \
+		(expr);								  \
+		ASSERT_OK_PTR(bpf_program__name(prog), bpf_program__name(prog));  \
+		if (!ASSERT_EQ(exceptions_ext__load(eskel), load_ret,		  \
+			       "exceptions_ext__load"))	{			  \
+			printf("%s\n", log_buf);				  \
+			goto done;						  \
+		}								  \
+		if (load_ret != 0) {						  \
+			printf("%s\n", log_buf);				  \
+			if (!ASSERT_OK_PTR(strstr(log_buf, msg), "strstr"))	  \
+				goto done;					  \
+		}								  \
+		if (!load_ret && attach_err) {					  \
+			if (!ASSERT_ERR_PTR(link = bpf_program__attach(prog), "attach err")) \
+				goto done;					  \
+		} else if (!load_ret) {						  \
+			if (!ASSERT_OK_PTR(link = bpf_program__attach(prog), "attach ok"))  \
+				goto done;					  \
+			(void)(after_link);					  \
+			bpf_link__destroy(link);				  \
+		}								  \
+	}
+
+	if (test__start_subtest("throwing extension with exception_cb"))
+		RUN_EXT(0, false, ({
+			prog = eskel->progs.throwing_extension;
+			bpf_program__set_autoload(prog, true);
+			if (!ASSERT_OK(bpf_program__set_attach_target(prog,
+				       bpf_program__fd(skel->progs.exception_ext),
+				       "exception_ext_global"), "set_attach_target"))
+				goto done;
+		}), "", ({ RUN_SUCCESS(exception_ext, 128); }));
+
+	if (test__start_subtest("throwing exception_cb extension with exception_cb"))
+		RUN_EXT(-EINVAL, true, ({
+			prog = eskel->progs.throwing_exception_cb_extension;
+			bpf_program__set_autoload(prog, true);
+			if (!ASSERT_OK(bpf_program__set_attach_target(prog,
+				       bpf_program__fd(skel->progs.exception_ext_mod_cb_runtime),
+				       "exception_cb_mod"), "set_attach_target"))
+				goto done;
+		}), "Extension programs cannot replace exception callback", 0);
+
+	if (test__start_subtest("throwing extension with exception_cb global"))
+		RUN_EXT(0, false, ({
+			prog = eskel->progs.throwing_exception_cb_extension;
+			bpf_program__set_autoload(prog, true);
+			if (!ASSERT_OK(bpf_program__set_attach_target(prog,
+				       bpf_program__fd(skel->progs.exception_ext_mod_cb_runtime),
+				       "exception_cb_mod_global"), "set_attach_target"))
+				goto done;
+		}), "", ({ RUN_SUCCESS(exception_ext_mod_cb_runtime, 131); }));
+
+	if (test__start_subtest("non-throwing fexit -> non-throwing subprog"))
+		/* non-throwing fexit -> non-throwing subprog : OK */
+		RUN_EXT(0, false, ({
+			prog = eskel->progs.pfexit;
+			bpf_program__set_autoload(prog, true);
+			if (!ASSERT_OK(bpf_program__set_attach_target(prog,
+				       bpf_program__fd(skel->progs.exception_throw_subprog),
+				       "subprog"), "set_attach_target"))
+				goto done;
+		}), "", 0);
+
+	if (test__start_subtest("throwing fexit -> non-throwing subprog"))
+		/* throwing fexit -> non-throwing subprog : OK */
+		RUN_EXT(0, false, ({
+			prog = eskel->progs.throwing_fexit;
+			bpf_program__set_autoload(prog, true);
+			if (!ASSERT_OK(bpf_program__set_attach_target(prog,
+				       bpf_program__fd(skel->progs.exception_throw_subprog),
+				       "subprog"), "set_attach_target"))
+				goto done;
+		}), "", 0);
+
+	if (test__start_subtest("non-throwing fexit -> throwing subprog"))
+		/* non-throwing fexit -> throwing subprog : OK */
+		RUN_EXT(0, false, ({
+			prog = eskel->progs.pfexit;
+			bpf_program__set_autoload(prog, true);
+			if (!ASSERT_OK(bpf_program__set_attach_target(prog,
+				       bpf_program__fd(skel->progs.exception_throw_subprog),
+				       "throwing_subprog"), "set_attach_target"))
+				goto done;
+		}), "", 0);
+
+	if (test__start_subtest("throwing fexit -> throwing subprog"))
+		/* throwing fexit -> throwing subprog : OK */
+		RUN_EXT(0, false, ({
+			prog = eskel->progs.throwing_fexit;
+			bpf_program__set_autoload(prog, true);
+			if (!ASSERT_OK(bpf_program__set_attach_target(prog,
+				       bpf_program__fd(skel->progs.exception_throw_subprog),
+				       "throwing_subprog"), "set_attach_target"))
+				goto done;
+		}), "", 0);
+
+	/* fmod_ret not allowed for subprog - Check so we remember to handle its
+	 * throwing specification compatibility with target when supported.
+	 */
+	if (test__start_subtest("non-throwing fmod_ret -> non-throwing subprog"))
+		RUN_EXT(-EINVAL, true, ({
+			prog = eskel->progs.pfmod_ret;
+			bpf_program__set_autoload(prog, true);
+			if (!ASSERT_OK(bpf_program__set_attach_target(prog,
+				       bpf_program__fd(skel->progs.exception_throw_subprog),
+				       "subprog"), "set_attach_target"))
+				goto done;
+		}), "can't modify return codes of BPF program", 0);
+
+	/* fmod_ret not allowed for subprog - Check so we remember to handle its
+	 * throwing specification compatibility with target when supported.
+	 */
+	if (test__start_subtest("non-throwing fmod_ret -> non-throwing global subprog"))
+		RUN_EXT(-EINVAL, true, ({
+			prog = eskel->progs.pfmod_ret;
+			bpf_program__set_autoload(prog, true);
+			if (!ASSERT_OK(bpf_program__set_attach_target(prog,
+				       bpf_program__fd(skel->progs.exception_throw_subprog),
+				       "global_subprog"), "set_attach_target"))
+				goto done;
+		}), "can't modify return codes of BPF program", 0);
+
+	if (test__start_subtest("non-throwing extension -> non-throwing subprog"))
+		/* non-throwing extension -> non-throwing subprog : BAD (!global) */
+		RUN_EXT(-EINVAL, true, ({
+			prog = eskel->progs.extension;
+			bpf_program__set_autoload(prog, true);
+			if (!ASSERT_OK(bpf_program__set_attach_target(prog,
+				       bpf_program__fd(skel->progs.exception_throw_subprog),
+				       "subprog"), "set_attach_target"))
+				goto done;
+		}), "subprog() is not a global function", 0);
+
+	if (test__start_subtest("non-throwing extension -> throwing subprog"))
+		/* non-throwing extension -> throwing subprog : BAD (!global) */
+		RUN_EXT(-EINVAL, true, ({
+			prog = eskel->progs.extension;
+			bpf_program__set_autoload(prog, true);
+			if (!ASSERT_OK(bpf_program__set_attach_target(prog,
+				       bpf_program__fd(skel->progs.exception_throw_subprog),
+				       "throwing_subprog"), "set_attach_target"))
+				goto done;
+		}), "throwing_subprog() is not a global function", 0);
+
+	if (test__start_subtest("non-throwing extension -> non-throwing global subprog"))
+		/* non-throwing extension -> non-throwing global subprog : OK */
+		RUN_EXT(0, false, ({
+			prog = eskel->progs.extension;
+			bpf_program__set_autoload(prog, true);
+			if (!ASSERT_OK(bpf_program__set_attach_target(prog,
+				       bpf_program__fd(skel->progs.exception_throw_subprog),
+				       "global_subprog"), "set_attach_target"))
+				goto done;
+		}), "", 0);
+
+	if (test__start_subtest("non-throwing extension -> throwing global subprog"))
+		/* non-throwing extension -> throwing global subprog : OK */
+		RUN_EXT(0, false, ({
+			prog = eskel->progs.extension;
+			bpf_program__set_autoload(prog, true);
+			if (!ASSERT_OK(bpf_program__set_attach_target(prog,
+				       bpf_program__fd(skel->progs.exception_throw_subprog),
+				       "throwing_global_subprog"), "set_attach_target"))
+				goto done;
+		}), "", 0);
+
+	if (test__start_subtest("throwing extension -> throwing global subprog"))
+		/* throwing extension -> throwing global subprog : OK */
+		RUN_EXT(0, false, ({
+			prog = eskel->progs.throwing_extension;
+			bpf_program__set_autoload(prog, true);
+			if (!ASSERT_OK(bpf_program__set_attach_target(prog,
+				       bpf_program__fd(skel->progs.exception_throw_subprog),
+				       "throwing_global_subprog"), "set_attach_target"))
+				goto done;
+		}), "", 0);
+
+	if (test__start_subtest("throwing extension -> non-throwing global subprog"))
+		/* throwing extension -> non-throwing global subprog : OK */
+		RUN_EXT(0, false, ({
+			prog = eskel->progs.throwing_extension;
+			bpf_program__set_autoload(prog, true);
+			if (!ASSERT_OK(bpf_program__set_attach_target(prog,
+				       bpf_program__fd(skel->progs.exception_throw_subprog),
+				       "global_subprog"), "set_attach_target"))
+				goto done;
+		}), "", 0);
+
+	if (test__start_subtest("non-throwing extension -> main subprog"))
+		/* non-throwing extension -> main subprog : OK */
+		RUN_EXT(0, false, ({
+			prog = eskel->progs.extension;
+			bpf_program__set_autoload(prog, true);
+			if (!ASSERT_OK(bpf_program__set_attach_target(prog,
+				       bpf_program__fd(skel->progs.exception_throw_subprog),
+				       "exception_throw_subprog"), "set_attach_target"))
+				goto done;
+		}), "", 0);
+
+	if (test__start_subtest("throwing extension -> main subprog"))
+		/* throwing extension -> main subprog : OK */
+		RUN_EXT(0, false, ({
+			prog = eskel->progs.throwing_extension;
+			bpf_program__set_autoload(prog, true);
+			if (!ASSERT_OK(bpf_program__set_attach_target(prog,
+				       bpf_program__fd(skel->progs.exception_throw_subprog),
+				       "exception_throw_subprog"), "set_attach_target"))
+				goto done;
+		}), "", 0);
+
+done:
+	exceptions_ext__destroy(eskel);
+	exceptions__destroy(skel);
+}
+
+static void test_exceptions_assertions(void)
+{
+	RUN_TESTS(exceptions_assert);
+}
+
+void test_exceptions(void)
+{
+	test_exceptions_success();
+	test_exceptions_failure();
+	test_exceptions_assertions();
+}
diff --git a/tools/testing/selftests/bpf/progs/exceptions.c b/tools/testing/selftests/bpf/progs/exceptions.c
new file mode 100644
index 000000000000..2811ee842b01
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/exceptions.c
@@ -0,0 +1,368 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <vmlinux.h>
+#include <bpf/bpf_tracing.h>
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_core_read.h>
+#include <bpf/bpf_endian.h>
+#include "bpf_misc.h"
+#include "bpf_experimental.h"
+
+#ifndef ETH_P_IP
+#define ETH_P_IP 0x0800
+#endif
+
+struct {
+	__uint(type, BPF_MAP_TYPE_PROG_ARRAY);
+	__uint(max_entries, 4);
+	__uint(key_size, sizeof(__u32));
+	__uint(value_size, sizeof(__u32));
+} jmp_table SEC(".maps");
+
+static __noinline int static_func(u64 i)
+{
+	bpf_throw(32);
+	return i;
+}
+
+__noinline int global2static_simple(u64 i)
+{
+	static_func(i + 2);
+	return i - 1;
+}
+
+__noinline int global2static(u64 i)
+{
+	if (i == ETH_P_IP)
+		bpf_throw(16);
+	return static_func(i);
+}
+
+static __noinline int static2global(u64 i)
+{
+	return global2static(i) + i;
+}
+
+SEC("tc")
+int exception_throw_always_1(struct __sk_buff *ctx)
+{
+	bpf_throw(64);
+	return 0;
+}
+
+/* In this case, the global func is never seen executing after the call to the
+ * static subprog, so the verifier will DCE the remaining instructions. Ensure
+ * we are resilient to that.
+ */
+SEC("tc")
+int exception_throw_always_2(struct __sk_buff *ctx)
+{
+	return global2static_simple(ctx->protocol);
+}
+
+SEC("tc")
+int exception_throw_unwind_1(struct __sk_buff *ctx)
+{
+	return static2global(bpf_ntohs(ctx->protocol));
+}
+
+SEC("tc")
+int exception_throw_unwind_2(struct __sk_buff *ctx)
+{
+	return static2global(bpf_ntohs(ctx->protocol) - 1);
+}
+
+SEC("tc")
+int exception_throw_default(struct __sk_buff *ctx)
+{
+	bpf_throw(0);
+	return 1;
+}
+
+SEC("tc")
+int exception_throw_default_value(struct __sk_buff *ctx)
+{
+	bpf_throw(5);
+	return 1;
+}
+
+SEC("tc")
+int exception_tail_call_target(struct __sk_buff *ctx)
+{
+	bpf_throw(16);
+	return 0;
+}
+
+static __noinline
+int exception_tail_call_subprog(struct __sk_buff *ctx)
+{
+	volatile int ret = 10;
+
+	bpf_tail_call_static(ctx, &jmp_table, 0);
+	return ret;
+}
+
+SEC("tc")
+int exception_tail_call(struct __sk_buff *ctx) {
+	volatile int ret = 0;
+
+	ret = exception_tail_call_subprog(ctx);
+	return ret + 8;
+}
+
+__noinline int exception_ext_global(struct __sk_buff *ctx)
+{
+	volatile int ret = 0;
+
+	return ret;
+}
+
+static __noinline int exception_ext_static(struct __sk_buff *ctx)
+{
+	return exception_ext_global(ctx);
+}
+
+SEC("tc")
+int exception_ext(struct __sk_buff *ctx)
+{
+	return exception_ext_static(ctx);
+}
+
+__noinline int exception_cb_mod_global(u64 cookie)
+{
+	volatile int ret = 0;
+
+	return ret;
+}
+
+/* Example of how the exception callback supplied during verification can still
+ * introduce extensions by calling dummy global functions, and alter runtime
+ * behavior.
+ *
+ * Right now we don't allow freplace attachment to exception callback itself,
+ * but if the need arises this restriction is technically feasible to relax in
+ * the future.
+ */
+__noinline int exception_cb_mod(u64 cookie)
+{
+	return exception_cb_mod_global(cookie) + cookie + 10;
+}
+
+SEC("tc")
+__exception_cb(exception_cb_mod)
+int exception_ext_mod_cb_runtime(struct __sk_buff *ctx)
+{
+	bpf_throw(25);
+	return 0;
+}
+
+__noinline static int subprog(struct __sk_buff *ctx)
+{
+	return bpf_ktime_get_ns();
+}
+
+__noinline static int throwing_subprog(struct __sk_buff *ctx)
+{
+	if (ctx->tstamp)
+		bpf_throw(0);
+	return bpf_ktime_get_ns();
+}
+
+__noinline int global_subprog(struct __sk_buff *ctx)
+{
+	return bpf_ktime_get_ns();
+}
+
+__noinline int throwing_global_subprog(struct __sk_buff *ctx)
+{
+	if (ctx->tstamp)
+		bpf_throw(0);
+	return bpf_ktime_get_ns();
+}
+
+SEC("tc")
+int exception_throw_subprog(struct __sk_buff *ctx)
+{
+	switch (ctx->protocol) {
+	case 1:
+		return subprog(ctx);
+	case 2:
+		return global_subprog(ctx);
+	case 3:
+		return throwing_subprog(ctx);
+	case 4:
+		return throwing_global_subprog(ctx);
+	default:
+		break;
+	}
+	bpf_throw(1);
+	return 0;
+}
+
+__noinline int assert_nz_gfunc(u64 c)
+{
+	volatile u64 cookie = c;
+
+	bpf_assert(cookie != 0);
+	return 0;
+}
+
+__noinline int assert_zero_gfunc(u64 c)
+{
+	volatile u64 cookie = c;
+
+	bpf_assert_eq(cookie, 0);
+	return 0;
+}
+
+__noinline int assert_neg_gfunc(s64 c)
+{
+	volatile s64 cookie = c;
+
+	bpf_assert_lt(cookie, 0);
+	return 0;
+}
+
+__noinline int assert_pos_gfunc(s64 c)
+{
+	volatile s64 cookie = c;
+
+	bpf_assert_gt(cookie, 0);
+	return 0;
+}
+
+__noinline int assert_negeq_gfunc(s64 c)
+{
+	volatile s64 cookie = c;
+
+	bpf_assert_le(cookie, -1);
+	return 0;
+}
+
+__noinline int assert_poseq_gfunc(s64 c)
+{
+	volatile s64 cookie = c;
+
+	bpf_assert_ge(cookie, 1);
+	return 0;
+}
+
+__noinline int assert_nz_gfunc_with(u64 c)
+{
+	volatile u64 cookie = c;
+
+	bpf_assert_with(cookie != 0, cookie + 100);
+	return 0;
+}
+
+__noinline int assert_zero_gfunc_with(u64 c)
+{
+	volatile u64 cookie = c;
+
+	bpf_assert_eq_with(cookie, 0, cookie + 100);
+	return 0;
+}
+
+__noinline int assert_neg_gfunc_with(s64 c)
+{
+	volatile s64 cookie = c;
+
+	bpf_assert_lt_with(cookie, 0, cookie + 100);
+	return 0;
+}
+
+__noinline int assert_pos_gfunc_with(s64 c)
+{
+	volatile s64 cookie = c;
+
+	bpf_assert_gt_with(cookie, 0, cookie + 100);
+	return 0;
+}
+
+__noinline int assert_negeq_gfunc_with(s64 c)
+{
+	volatile s64 cookie = c;
+
+	bpf_assert_le_with(cookie, -1, cookie + 100);
+	return 0;
+}
+
+__noinline int assert_poseq_gfunc_with(s64 c)
+{
+	volatile s64 cookie = c;
+
+	bpf_assert_ge_with(cookie, 1, cookie + 100);
+	return 0;
+}
+
+#define check_assert(name, cookie, tag)				\
+SEC("tc")							\
+int exception##tag##name(struct __sk_buff *ctx)			\
+{								\
+	return name(cookie) + 1;				\
+}
+
+check_assert(assert_nz_gfunc, 5, _);
+check_assert(assert_zero_gfunc, 0, _);
+check_assert(assert_neg_gfunc, -100, _);
+check_assert(assert_pos_gfunc, 100, _);
+check_assert(assert_negeq_gfunc, -1, _);
+check_assert(assert_poseq_gfunc, 1, _);
+
+check_assert(assert_nz_gfunc_with, 5, _);
+check_assert(assert_zero_gfunc_with, 0, _);
+check_assert(assert_neg_gfunc_with, -100, _);
+check_assert(assert_pos_gfunc_with, 100, _);
+check_assert(assert_negeq_gfunc_with, -1, _);
+check_assert(assert_poseq_gfunc_with, 1, _);
+
+check_assert(assert_nz_gfunc, 0, _bad_);
+check_assert(assert_zero_gfunc, 5, _bad_);
+check_assert(assert_neg_gfunc, 100, _bad_);
+check_assert(assert_pos_gfunc, -100, _bad_);
+check_assert(assert_negeq_gfunc, 1, _bad_);
+check_assert(assert_poseq_gfunc, -1, _bad_);
+
+check_assert(assert_nz_gfunc_with, 0, _bad_);
+check_assert(assert_zero_gfunc_with, 5, _bad_);
+check_assert(assert_neg_gfunc_with, 100, _bad_);
+check_assert(assert_pos_gfunc_with, -100, _bad_);
+check_assert(assert_negeq_gfunc_with, 1, _bad_);
+check_assert(assert_poseq_gfunc_with, -1, _bad_);
+
+SEC("tc")
+int exception_assert_range(struct __sk_buff *ctx)
+{
+	u64 time = bpf_ktime_get_ns();
+
+	bpf_assert_range(time, 0, ~0ULL);
+	return 1;
+}
+
+SEC("tc")
+int exception_assert_range_with(struct __sk_buff *ctx)
+{
+	u64 time = bpf_ktime_get_ns();
+
+	bpf_assert_range_with(time, 0, ~0ULL, 10);
+	return 1;
+}
+
+SEC("tc")
+int exception_bad_assert_range(struct __sk_buff *ctx)
+{
+	u64 time = bpf_ktime_get_ns();
+
+	bpf_assert_range(time, -100, 100);
+	return 1;
+}
+
+SEC("tc")
+int exception_bad_assert_range_with(struct __sk_buff *ctx)
+{
+	u64 time = bpf_ktime_get_ns();
+
+	bpf_assert_range_with(time, -1000, 1000, 10);
+	return 1;
+}
+
+char _license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/progs/exceptions_assert.c b/tools/testing/selftests/bpf/progs/exceptions_assert.c
new file mode 100644
index 000000000000..fa35832e6748
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/exceptions_assert.c
@@ -0,0 +1,135 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <vmlinux.h>
+#include <limits.h>
+#include <bpf/bpf_tracing.h>
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_core_read.h>
+#include <bpf/bpf_endian.h>
+#include "bpf_misc.h"
+#include "bpf_experimental.h"
+
+#define check_assert(type, op, name, value)				\
+	SEC("?tc")							\
+	__log_level(2) __failure					\
+	int check_assert_##op##_##name(void *ctx)			\
+	{								\
+		type num = bpf_ktime_get_ns();				\
+		bpf_assert_##op(num, value);				\
+		return *(u64 *)num;					\
+	}
+
+__msg(": R0_w=-2147483648 R10=fp0")
+check_assert(s64, eq, int_min, INT_MIN);
+__msg(": R0_w=2147483647 R10=fp0")
+check_assert(s64, eq, int_max, INT_MAX);
+__msg(": R0_w=0 R10=fp0")
+check_assert(s64, eq, zero, 0);
+__msg(": R0_w=-9223372036854775808 R1_w=-9223372036854775808 R10=fp0")
+check_assert(s64, eq, llong_min, LLONG_MIN);
+__msg(": R0_w=9223372036854775807 R1_w=9223372036854775807 R10=fp0")
+check_assert(s64, eq, llong_max, LLONG_MAX);
+
+__msg(": R0_w=scalar(smax=2147483646) R10=fp0")
+check_assert(s64, lt, pos, INT_MAX);
+__msg(": R0_w=scalar(umin=9223372036854775808,var_off=(0x8000000000000000; 0x7fffffffffffffff))")
+check_assert(s64, lt, zero, 0);
+__msg(": R0_w=scalar(umin=9223372036854775808,umax=18446744071562067967,var_off=(0x8000000000000000; 0x7fffffffffffffff))")
+check_assert(s64, lt, neg, INT_MIN);
+
+__msg(": R0_w=scalar(smax=2147483647) R10=fp0")
+check_assert(s64, le, pos, INT_MAX);
+__msg(": R0_w=scalar(smax=0) R10=fp0")
+check_assert(s64, le, zero, 0);
+__msg(": R0_w=scalar(umin=9223372036854775808,umax=18446744071562067968,var_off=(0x8000000000000000; 0x7fffffffffffffff))")
+check_assert(s64, le, neg, INT_MIN);
+
+__msg(": R0_w=scalar(umin=2147483648,umax=9223372036854775807,var_off=(0x0; 0x7fffffffffffffff))")
+check_assert(s64, gt, pos, INT_MAX);
+__msg(": R0_w=scalar(umin=1,umax=9223372036854775807,var_off=(0x0; 0x7fffffffffffffff))")
+check_assert(s64, gt, zero, 0);
+__msg(": R0_w=scalar(smin=-2147483647) R10=fp0")
+check_assert(s64, gt, neg, INT_MIN);
+
+__msg(": R0_w=scalar(umin=2147483647,umax=9223372036854775807,var_off=(0x0; 0x7fffffffffffffff))")
+check_assert(s64, ge, pos, INT_MAX);
+__msg(": R0_w=scalar(umax=9223372036854775807,var_off=(0x0; 0x7fffffffffffffff)) R10=fp0")
+check_assert(s64, ge, zero, 0);
+__msg(": R0_w=scalar(smin=-2147483648) R10=fp0")
+check_assert(s64, ge, neg, INT_MIN);
+
+SEC("?tc")
+__log_level(2) __failure
+__msg(": R0=0 R1=ctx(off=0,imm=0) R2=scalar(smin=-2147483646,smax=2147483645) R10=fp0")
+int check_assert_range_s64(struct __sk_buff *ctx)
+{
+	struct bpf_sock *sk = ctx->sk;
+	s64 num;
+
+	_Static_assert(_Generic((sk->rx_queue_mapping), s32: 1, default: 0), "type match");
+	if (!sk)
+		return 0;
+	num = sk->rx_queue_mapping;
+	bpf_assert_range(num, INT_MIN + 2, INT_MAX - 2);
+	return *((u8 *)ctx + num);
+}
+
+SEC("?tc")
+__log_level(2) __failure
+__msg(": R1=ctx(off=0,imm=0) R2=scalar(umin=4096,umax=8192,var_off=(0x0; 0x3fff))")
+int check_assert_range_u64(struct __sk_buff *ctx)
+{
+	u64 num = ctx->len;
+
+	bpf_assert_range(num, 4096, 8192);
+	return *((u8 *)ctx + num);
+}
+
+SEC("?tc")
+__log_level(2) __failure
+__msg(": R0=0 R1=ctx(off=0,imm=0) R2=4096 R10=fp0")
+int check_assert_single_range_s64(struct __sk_buff *ctx)
+{
+	struct bpf_sock *sk = ctx->sk;
+	s64 num;
+
+	_Static_assert(_Generic((sk->rx_queue_mapping), s32: 1, default: 0), "type match");
+	if (!sk)
+		return 0;
+	num = sk->rx_queue_mapping;
+
+	bpf_assert_range(num, 4096, 4096);
+	return *((u8 *)ctx + num);
+}
+
+SEC("?tc")
+__log_level(2) __failure
+__msg(": R1=ctx(off=0,imm=0) R2=4096 R10=fp0")
+int check_assert_single_range_u64(struct __sk_buff *ctx)
+{
+	u64 num = ctx->len;
+
+	bpf_assert_range(num, 4096, 4096);
+	return *((u8 *)ctx + num);
+}
+
+SEC("?tc")
+__log_level(2) __failure
+__msg(": R1=pkt(off=64,r=64,imm=0) R2=pkt_end(off=0,imm=0) R6=pkt(off=0,r=64,imm=0) R10=fp0")
+int check_assert_generic(struct __sk_buff *ctx)
+{
+	u8 *data_end = (void *)(long)ctx->data_end;
+	u8 *data = (void *)(long)ctx->data;
+
+	bpf_assert(data + 64 <= data_end);
+	return data[128];
+}
+
+SEC("?fentry/bpf_check")
+__failure __msg("At program exit the register R0 has value (0x40; 0x0)")
+int check_assert_with_return(void *ctx)
+{
+	bpf_assert_with(!ctx, 64);
+	return 0;
+}
+
+char _license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/progs/exceptions_ext.c b/tools/testing/selftests/bpf/progs/exceptions_ext.c
new file mode 100644
index 000000000000..e37f05a03d86
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/exceptions_ext.c
@@ -0,0 +1,59 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <vmlinux.h>
+#include <bpf/bpf_helpers.h>
+#include "bpf_experimental.h"
+
+__noinline int exception_cb(u64 cookie)
+{
+	return cookie + 64;
+}
+
+SEC("?freplace")
+int extension(struct __sk_buff *ctx)
+{
+	return 0;
+}
+
+SEC("?freplace")
+__exception_cb(exception_cb)
+int throwing_exception_cb_extension(u64 cookie)
+{
+	bpf_throw(32);
+	return 0;
+}
+
+SEC("?freplace")
+__exception_cb(exception_cb)
+int throwing_extension(struct __sk_buff *ctx)
+{
+	bpf_throw(64);
+	return 0;
+}
+
+SEC("?fexit")
+int pfexit(void *ctx)
+{
+	return 0;
+}
+
+SEC("?fexit")
+int throwing_fexit(void *ctx)
+{
+	bpf_throw(0);
+	return 0;
+}
+
+SEC("?fmod_ret")
+int pfmod_ret(void *ctx)
+{
+	return 0;
+}
+
+SEC("?fmod_ret")
+int throwing_fmod_ret(void *ctx)
+{
+	bpf_throw(0);
+	return 0;
+}
+
+char _license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/progs/exceptions_fail.c b/tools/testing/selftests/bpf/progs/exceptions_fail.c
new file mode 100644
index 000000000000..4c39e920dac2
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/exceptions_fail.c
@@ -0,0 +1,347 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <vmlinux.h>
+#include <bpf/bpf_tracing.h>
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_core_read.h>
+
+#include "bpf_misc.h"
+#include "bpf_experimental.h"
+
+extern void bpf_rcu_read_lock(void) __ksym;
+
+#define private(name) SEC(".bss." #name) __hidden __attribute__((aligned(8)))
+
+struct foo {
+	struct bpf_rb_node node;
+};
+
+struct hmap_elem {
+	struct bpf_timer timer;
+};
+
+struct {
+	__uint(type, BPF_MAP_TYPE_HASH);
+	__uint(max_entries, 64);
+	__type(key, int);
+	__type(value, struct hmap_elem);
+} hmap SEC(".maps");
+
+private(A) struct bpf_spin_lock lock;
+private(A) struct bpf_rb_root rbtree __contains(foo, node);
+
+__noinline void *exception_cb_bad_ret_type(u64 cookie)
+{
+	return NULL;
+}
+
+__noinline int exception_cb_bad_arg_0(void)
+{
+	return 0;
+}
+
+__noinline int exception_cb_bad_arg_2(int a, int b)
+{
+	return 0;
+}
+
+__noinline int exception_cb_ok_arg_small(int a)
+{
+	return 0;
+}
+
+SEC("?tc")
+__exception_cb(exception_cb_bad_ret_type)
+__failure __msg("Global function exception_cb_bad_ret_type() doesn't return scalar.")
+int reject_exception_cb_type_1(struct __sk_buff *ctx)
+{
+	bpf_throw(0);
+	return 0;
+}
+
+SEC("?tc")
+__exception_cb(exception_cb_bad_arg_0)
+__failure __msg("exception cb only supports single integer argument")
+int reject_exception_cb_type_2(struct __sk_buff *ctx)
+{
+	bpf_throw(0);
+	return 0;
+}
+
+SEC("?tc")
+__exception_cb(exception_cb_bad_arg_2)
+__failure __msg("exception cb only supports single integer argument")
+int reject_exception_cb_type_3(struct __sk_buff *ctx)
+{
+	bpf_throw(0);
+	return 0;
+}
+
+SEC("?tc")
+__exception_cb(exception_cb_ok_arg_small)
+__success
+int reject_exception_cb_type_4(struct __sk_buff *ctx)
+{
+	bpf_throw(0);
+	return 0;
+}
+
+__noinline
+static int timer_cb(void *map, int *key, struct bpf_timer *timer)
+{
+	bpf_throw(0);
+	return 0;
+}
+
+SEC("?tc")
+__failure __msg("cannot be called from callback subprog")
+int reject_async_callback_throw(struct __sk_buff *ctx)
+{
+	struct hmap_elem *elem;
+
+	elem = bpf_map_lookup_elem(&hmap, &(int){0});
+	if (!elem)
+		return 0;
+	return bpf_timer_set_callback(&elem->timer, timer_cb);
+}
+
+__noinline static int subprog_lock(struct __sk_buff *ctx)
+{
+	volatile int ret = 0;
+
+	bpf_spin_lock(&lock);
+	if (ctx->len)
+		bpf_throw(0);
+	return ret;
+}
+
+SEC("?tc")
+__failure __msg("function calls are not allowed while holding a lock")
+int reject_with_lock(void *ctx)
+{
+	bpf_spin_lock(&lock);
+	bpf_throw(0);
+	return 0;
+}
+
+SEC("?tc")
+__failure __msg("function calls are not allowed while holding a lock")
+int reject_subprog_with_lock(void *ctx)
+{
+	return subprog_lock(ctx);
+}
+
+SEC("?tc")
+__failure __msg("bpf_rcu_read_unlock is missing")
+int reject_with_rcu_read_lock(void *ctx)
+{
+	bpf_rcu_read_lock();
+	bpf_throw(0);
+	return 0;
+}
+
+__noinline static int throwing_subprog(struct __sk_buff *ctx)
+{
+	if (ctx->len)
+		bpf_throw(0);
+	return 0;
+}
+
+SEC("?tc")
+__failure __msg("bpf_rcu_read_unlock is missing")
+int reject_subprog_with_rcu_read_lock(void *ctx)
+{
+	bpf_rcu_read_lock();
+	return throwing_subprog(ctx);
+}
+
+static bool rbless(struct bpf_rb_node *n1, const struct bpf_rb_node *n2)
+{
+	bpf_throw(0);
+	return true;
+}
+
+SEC("?tc")
+__failure __msg("function calls are not allowed while holding a lock")
+int reject_with_rbtree_add_throw(void *ctx)
+{
+	struct foo *f;
+
+	f = bpf_obj_new(typeof(*f));
+	if (!f)
+		return 0;
+	bpf_spin_lock(&lock);
+	bpf_rbtree_add(&rbtree, &f->node, rbless);
+	return 0;
+}
+
+SEC("?tc")
+__failure __msg("Unreleased reference")
+int reject_with_reference(void *ctx)
+{
+	struct foo *f;
+
+	f = bpf_obj_new(typeof(*f));
+	if (!f)
+		return 0;
+	bpf_throw(0);
+	return 0;
+}
+
+__noinline static int subprog_ref(struct __sk_buff *ctx)
+{
+	struct foo *f;
+
+	f = bpf_obj_new(typeof(*f));
+	if (!f)
+		return 0;
+	bpf_throw(0);
+	return 0;
+}
+
+__noinline static int subprog_cb_ref(u32 i, void *ctx)
+{
+	bpf_throw(0);
+	return 0;
+}
+
+SEC("?tc")
+__failure __msg("Unreleased reference")
+int reject_with_cb_reference(void *ctx)
+{
+	struct foo *f;
+
+	f = bpf_obj_new(typeof(*f));
+	if (!f)
+		return 0;
+	bpf_loop(5, subprog_cb_ref, NULL, 0);
+	return 0;
+}
+
+SEC("?tc")
+__failure __msg("cannot be called from callback")
+int reject_with_cb(void *ctx)
+{
+	bpf_loop(5, subprog_cb_ref, NULL, 0);
+	return 0;
+}
+
+SEC("?tc")
+__failure __msg("Unreleased reference")
+int reject_with_subprog_reference(void *ctx)
+{
+	return subprog_ref(ctx) + 1;
+}
+
+__noinline int throwing_exception_cb(u64 c)
+{
+	bpf_throw(0);
+	return c;
+}
+
+__noinline int exception_cb1(u64 c)
+{
+	return c;
+}
+
+__noinline int exception_cb2(u64 c)
+{
+	return c;
+}
+
+static __noinline int static_func(struct __sk_buff *ctx)
+{
+	return exception_cb1(ctx->tstamp);
+}
+
+__noinline int global_func(struct __sk_buff *ctx)
+{
+	return exception_cb1(ctx->tstamp);
+}
+
+SEC("?tc")
+__exception_cb(throwing_exception_cb)
+__failure __msg("cannot be called from callback subprog")
+int reject_throwing_exception_cb(struct __sk_buff *ctx)
+{
+	return 0;
+}
+
+SEC("?tc")
+__exception_cb(exception_cb1)
+__failure __msg("cannot call exception cb directly")
+int reject_exception_cb_call_global_func(struct __sk_buff *ctx)
+{
+	return global_func(ctx);
+}
+
+SEC("?tc")
+__exception_cb(exception_cb1)
+__failure __msg("cannot call exception cb directly")
+int reject_exception_cb_call_static_func(struct __sk_buff *ctx)
+{
+	return static_func(ctx);
+}
+
+SEC("?tc")
+__exception_cb(exception_cb1)
+__exception_cb(exception_cb2)
+__failure __msg("multiple exception callback tags for main subprog")
+int reject_multiple_exception_cb(struct __sk_buff *ctx)
+{
+	bpf_throw(0);
+	return 16;
+}
+
+__noinline int exception_cb_bad_ret(u64 c)
+{
+	return c;
+}
+
+SEC("?fentry/bpf_check")
+__exception_cb(exception_cb_bad_ret)
+__failure __msg("At program exit the register R0 has unknown scalar value should")
+int reject_set_exception_cb_bad_ret1(void *ctx)
+{
+	return 0;
+}
+
+SEC("?fentry/bpf_check")
+__failure __msg("At program exit the register R0 has value (0x40; 0x0) should")
+int reject_set_exception_cb_bad_ret2(void *ctx)
+{
+	bpf_throw(64);
+	return 0;
+}
+
+__noinline static int loop_cb1(u32 index, int *ctx)
+{
+	bpf_throw(0);
+	return 0;
+}
+
+__noinline static int loop_cb2(u32 index, int *ctx)
+{
+	bpf_throw(0);
+	return 0;
+}
+
+SEC("?tc")
+__failure __msg("cannot be called from callback")
+int reject_exception_throw_cb(struct __sk_buff *ctx)
+{
+	bpf_loop(5, loop_cb1, NULL, 0);
+	return 0;
+}
+
+SEC("?tc")
+__failure __msg("cannot be called from callback")
+int reject_exception_throw_cb_diff(struct __sk_buff *ctx)
+{
+	if (ctx->protocol)
+		bpf_loop(5, loop_cb1, NULL, 0);
+	else
+		bpf_loop(5, loop_cb2, NULL, 0);
+	return 0;
+}
+
+char _license[] SEC("license") = "GPL";
-- 
2.41.0
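
For readers trying to follow where the expected value 131 in the
"throwing extension with exception_cb global" subtest comes from, the
arithmetic can be modeled in plain userspace C. This is only a sketch under
stated assumptions: setjmp/longjmp stands in for the kernel unwinding to the
main frame, and every model_* name is hypothetical and not part of the patch.
The model assumes the extension's throw unwinds only to the extension's own
main frame, which is consistent with the expected result.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdint.h>

/* Userspace model only: longjmp stands in for unwinding to the main frame.
 * None of these model_* names exist in the patch.
 */
struct model_prog {
	jmp_buf boundary;			/* main frame of the program */
	int (*exception_cb)(uint64_t cookie);	/* callback run on throw */
	uint64_t cookie;			/* value passed to the throw */
};

static struct model_prog *model_cur;		/* program currently running */

static void model_throw(uint64_t cookie)
{
	assert(model_cur);
	model_cur->cookie = cookie;
	longjmp(model_cur->boundary, 1);
}

/* Run 'body' as the main frame of 'p'; on throw, return the callback's value. */
static int model_run(struct model_prog *p, int (*body)(void))
{
	struct model_prog *volatile prev = model_cur;
	int ret;

	model_cur = p;
	if (setjmp(p->boundary) == 0)
		ret = body();
	else
		ret = p->exception_cb(p->cookie);
	model_cur = prev;
	return ret;
}

/* Mirrors exception_cb() in exceptions_ext.c: cookie + 64. */
static int model_ext_cb(uint64_t cookie)
{
	return (int)(cookie + 64);
}

/* Mirrors throwing_exception_cb_extension(): throws 32. */
static int model_ext_body(void)
{
	model_throw(32);
	return 0;
}

static struct model_prog model_ext = { .exception_cb = model_ext_cb };

/* exception_cb_mod_global() after freplace: the extension runs instead. */
static int model_cb_mod_global(uint64_t cookie)
{
	(void)cookie;
	return model_run(&model_ext, model_ext_body);
}

/* Mirrors exception_cb_mod(): global(cookie) + cookie + 10. */
static int model_cb_mod(uint64_t cookie)
{
	return (int)(model_cb_mod_global(cookie) + cookie + 10);
}

/* Mirrors exception_ext_mod_cb_runtime(): throws 25. */
static int model_main_body(void)
{
	model_throw(25);
	return 0;
}

int model_demo(void)
{
	struct model_prog main_prog = { .exception_cb = model_cb_mod };

	return model_run(&main_prog, model_main_body);
}
```

In this model, model_demo() evaluates (32 + 64) + 25 + 10, i.e. the value
the subtest checks for.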



* Re: [PATCH bpf-next v2 11/14] bpf: Fix kfunc callback register type handling
  2023-08-09 11:41 ` [PATCH bpf-next v2 11/14] bpf: Fix kfunc callback register type handling Kumar Kartikeya Dwivedi
@ 2023-08-10 21:12   ` David Marchevsky
  0 siblings, 0 replies; 37+ messages in thread
From: David Marchevsky @ 2023-08-10 21:12 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi, bpf
  Cc: Dave Marchevsky, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Yonghong Song, David Vernet

On 8/9/23 7:41 AM, Kumar Kartikeya Dwivedi wrote:
> The kfunc code to handle KF_ARG_PTR_TO_CALLBACK does not check the reg
> type before using reg->subprogno. This can accidentally permit invalid
> pointers to be passed into callback helpers (e.g. silently from
> different paths). Likewise, reg->subprogno from the per-register type
> union may not be meaningful either. We need to reject any other type
> except PTR_TO_FUNC.
> 
> Cc: Dave Marchevsky <davemarchevsky@fb.com>
> Fixes: 5d92ddc3de1b ("bpf: Add callback validation to kfunc verifier logic")
> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> ---

Acked-by: Dave Marchevsky <davemarchevsky@fb.com>
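
The rule described in the commit message (check the register type before
reading a field of the per-register type union) can be sketched outside the
kernel as below. This is an illustration only: the mock_* types and the
mock_check_callback_arg() function are hypothetical stand-ins, not the
verifier's actual structures or code.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-ins for the verifier's register type and state. */
enum mock_reg_type {
	MOCK_SCALAR_VALUE,
	MOCK_PTR_TO_MAP_VALUE,
	MOCK_PTR_TO_FUNC,
};

struct mock_reg_state {
	enum mock_reg_type type;
	/* Only one member is meaningful, depending on 'type'. */
	union {
		uint32_t subprogno;	/* valid only for MOCK_PTR_TO_FUNC */
		int32_t map_off;	/* valid only for MOCK_PTR_TO_MAP_VALUE */
	};
};

/* Reject anything that is not a function pointer before touching subprogno. */
static int mock_check_callback_arg(const struct mock_reg_state *reg,
				   uint32_t *subprogno)
{
	assert(reg && subprogno);
	if (reg->type != MOCK_PTR_TO_FUNC)
		return -EINVAL;
	*subprogno = reg->subprogno;
	return 0;
}
```

Without the type check, a map value register whose union bytes happen to
look like a valid subprog number would be accepted, which is the kind of
silent acceptance the fix is meant to rule out.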


* Re: [PATCH bpf-next v2 10/14] bpf: Disallow extensions to exception callbacks
  2023-08-09 11:41 ` [PATCH bpf-next v2 10/14] bpf: Disallow extensions to exception callbacks Kumar Kartikeya Dwivedi
@ 2023-08-22  5:09   ` Alexei Starovoitov
  2023-08-22 12:53     ` Kumar Kartikeya Dwivedi
  0 siblings, 1 reply; 37+ messages in thread
From: Alexei Starovoitov @ 2023-08-22  5:09 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Yonghong Song, David Vernet

On Wed, Aug 09, 2023 at 05:11:12PM +0530, Kumar Kartikeya Dwivedi wrote:
> During testing, it was discovered that extensions to exception callbacks
> had no checks. Upon running a testcase, the kernel ended up running off
> the end of a program whose final call was bpf_throw, and hitting int3
> instructions.
> 
> The reason is that while the default exception callback would have reset
> the stack frame to return back to the main program's caller, the
> replacing extension program will simply return back to bpf_throw, which
> will instead return back to the program and the program will continue
> execution, now in an undefined state where anything could happen.
> 
> The way to support extensions to an exception callback would be to mark
> the BPF_PROG_TYPE_EXT main subprog as an exception_cb, and prevent it
> from calling bpf_throw. This would make the JIT produce a prologue that
> restores saved registers and resets the stack frame. But let's not do
> that until there is a concrete use case for this, and simply disallow
> this for now.
> 
> One key point to note here is that currently X86_TAIL_CALL_OFFSET doesn't
> require any modifications, even though we emit instructions before the
> corresponding endbr64 instruction. This is because we ensure that a main
> subprog never serves as an exception callback, and therefore the
> exception callback (which will be a global subprog) can never serve as
> the tail call target, eliminating any discrepancies. However, once we
> support a BPF_PROG_TYPE_EXT to also act as an exception callback, it
> will end up requiring a change to the tail call offset to account for the
> extra instructions. For simplicity, tail calls could be disabled for
> such targets.
> 
> Noting the above, it appears better to wait for a concrete use case
> before choosing to permit extension programs to replace exception
> callbacks.
> 
> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> ---
>  kernel/bpf/helpers.c  | 1 +
>  kernel/bpf/verifier.c | 5 +++++
>  2 files changed, 6 insertions(+)
> 
> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> index 64a07232c58f..a04eff53354c 100644
> --- a/kernel/bpf/helpers.c
> +++ b/kernel/bpf/helpers.c
> @@ -2470,6 +2470,7 @@ __bpf_kfunc void bpf_throw(u64 cookie)
>  	 */
>  	kasan_unpoison_task_stack_below((void *)ctx.sp);
>  	ctx.aux->bpf_exception_cb(cookie, ctx.sp, ctx.bp);
> +	WARN(1, "A call to BPF exception callback should never return\n");
>  }
>  
>  __diag_pop();
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index a0e1a1d1f5d3..13db1fa4163c 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -19622,6 +19622,11 @@ int bpf_check_attach_target(struct bpf_verifier_log *log,
>  					"Extension programs should be JITed\n");
>  				return -EINVAL;
>  			}
> +			if (aux->func && aux->func[subprog]->aux->exception_cb) {
> +				bpf_log(log,
> +					"Extension programs cannot replace exception callback\n");
> +				return -EINVAL;

Should we disallow fentry/fexit to exception cb as well?
Probably things will go wrong for similar reasons as freplace.

And also disallow fentry/fexit for main prog that is exception_boundary ?
since bpf trampoline doesn't know that it needs to save r12.


* Re: [PATCH bpf-next v2 03/14] bpf: Implement BPF exceptions
  2023-08-09 11:41 ` [PATCH bpf-next v2 03/14] bpf: Implement BPF exceptions Kumar Kartikeya Dwivedi
@ 2023-08-22  5:12   ` Alexei Starovoitov
  2023-08-22 12:53     ` Kumar Kartikeya Dwivedi
  0 siblings, 1 reply; 37+ messages in thread
From: Alexei Starovoitov @ 2023-08-22  5:12 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Yonghong Song, David Vernet

On Wed, Aug 09, 2023 at 05:11:05PM +0530, Kumar Kartikeya Dwivedi wrote:
> +
> +static bool bpf_stack_walker(void *cookie, u64 ip, u64 sp, u64 bp)
> +{
> +	struct bpf_throw_ctx *ctx = cookie;
> +	struct bpf_prog *prog;
> +
> +	if (!is_bpf_text_address(ip))
> +		return !ctx->cnt;
> +	prog = bpf_prog_ksym_find(ip);
> +	ctx->cnt++;
> +	if (!prog->aux->id)
> +		return true;
> +	ctx->aux = prog->aux;
> +	ctx->sp = sp;
> +	ctx->bp = bp;
> +	return false;
> +}

Took me some time to understand what !prog->aux->id is doing.
Let's add a helper: is_subprog() and check:
prog->aux->func_idx != 0
since that's what arm64, x64, s390 JITs are using.
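
A minimal userspace sketch of the suggested helper follows. It is not the kernel code: the mock_* structures and mock_stack_walker() are hypothetical stand-ins that carry only the fields needed for illustration, and the walker omits the is_bpf_text_address() and bpf_prog_ksym_find() steps. It only shows how the func_idx check could replace the !prog->aux->id test.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel structures; illustration only. */
struct mock_prog_aux {
	uint32_t func_idx; /* 0 for the main prog, non-zero for subprogs */
};

struct mock_prog {
	struct mock_prog_aux *aux;
};

struct mock_throw_ctx {
	struct mock_prog_aux *aux;
	uint64_t sp;
	uint64_t bp;
	int cnt;
};

/* The helper proposed in the review: a subprog has a non-zero func_idx. */
static bool is_subprog(const struct mock_prog *prog)
{
	return prog->aux->func_idx != 0;
}

/*
 * Simplified walker step: returns true to keep walking (frame belongs to
 * a subprog), false once the main prog frame has been recorded.
 */
static bool mock_stack_walker(struct mock_throw_ctx *ctx,
			      const struct mock_prog *prog,
			      uint64_t sp, uint64_t bp)
{
	ctx->cnt++;
	if (is_subprog(prog))
		return true;
	ctx->aux = prog->aux;
	ctx->sp = sp;
	ctx->bp = bp;
	return false;
}
```

With a named helper, the intent of the check (skip subprog frames until the main prog frame is reached) is visible at the call site.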

* Re: [PATCH bpf-next v2 10/14] bpf: Disallow extensions to exception callbacks
  2023-08-22  5:09   ` Alexei Starovoitov
@ 2023-08-22 12:53     ` Kumar Kartikeya Dwivedi
  0 siblings, 0 replies; 37+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2023-08-22 12:53 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Yonghong Song, David Vernet

On Tue, 22 Aug 2023 at 10:40, Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Wed, Aug 09, 2023 at 05:11:12PM +0530, Kumar Kartikeya Dwivedi wrote:
> > During testing, it was discovered that extensions to exception callbacks
> > had no checks, upon running a testcase, the kernel ended up running off
> > the end of a program having final call as bpf_throw, and hitting int3
> > instructions.
> >
> > The reason is that while the default exception callback would have reset
> > the stack frame to return back to the main program's caller, the
> > replacing extension program will simply return back to bpf_throw, which
> > will instead return back to the program and the program will continue
> > execution, now in an undefined state where anything could happen.
> >
> > The way to support extensions to an exception callback would be to mark
> > the BPF_PROG_TYPE_EXT main subprog as an exception_cb, and prevent it
> > from calling bpf_throw. This would make the JIT produce a prologue that
> > restores saved registers and reset the stack frame. But let's not do
> > that until there is a concrete use case for this, and simply disallow
> > this for now.
> >
> > One key point here to note is that currently X86_TAIL_CALL_OFFSET didn't
> > require any modifications, even though we emit instructions before the
> > corresponding endbr64 instruction. This is because we ensure that a main
> > subprog never serves as an exception callback, and therefore the
> > exception callback (which will be a global subprog) can never serve as
> > the tail call target, eliminating any discrepancies. However, once we
> > support a BPF_PROG_TYPE_EXT to also act as an exception callback, it
> > will end up requiring change to the tail call offset to account for the
> > extra instructions. For simplicity, tail calls could be disabled for
> > such targets.
> >
> > Noting the above, it appears better to wait for a concrete use case
> > before choosing to permit extension programs to replace exception
> > callbacks.
> >
> > Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> > ---
> >  kernel/bpf/helpers.c  | 1 +
> >  kernel/bpf/verifier.c | 5 +++++
> >  2 files changed, 6 insertions(+)
> >
> > diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> > index 64a07232c58f..a04eff53354c 100644
> > --- a/kernel/bpf/helpers.c
> > +++ b/kernel/bpf/helpers.c
> > @@ -2470,6 +2470,7 @@ __bpf_kfunc void bpf_throw(u64 cookie)
> >        */
> >       kasan_unpoison_task_stack_below((void *)ctx.sp);
> >       ctx.aux->bpf_exception_cb(cookie, ctx.sp, ctx.bp);
> > +     WARN(1, "A call to BPF exception callback should never return\n");
> >  }
> >
> >  __diag_pop();
> > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > index a0e1a1d1f5d3..13db1fa4163c 100644
> > --- a/kernel/bpf/verifier.c
> > +++ b/kernel/bpf/verifier.c
> > @@ -19622,6 +19622,11 @@ int bpf_check_attach_target(struct bpf_verifier_log *log,
> >                                       "Extension programs should be JITed\n");
> >                               return -EINVAL;
> >                       }
> > +                     if (aux->func && aux->func[subprog]->aux->exception_cb) {
> > +                             bpf_log(log,
> > +                                     "Extension programs cannot replace exception callback\n");
> > +                             return -EINVAL;
>
> Should we disallow fentry/fexit to exception cb as well?
> Probably things will go wrong for similar reasons as freplace.
>

Yes, great catch. I think you are right. I will disable both of them as well.
Trampoline does not expect the stack frame to be reset as it pushes
data to it and will need to restore it after the call to exception cb.

> And also disallow fentry/fexit for main prog that is exception_boundary ?
> since bpf trampoline doesn't know that it needs to save r12.

Hmm, I think I should probably enable that instead of blocking it. I
think it's a common enough use case. Compared to exception cb it also
seems a valid one. We can enable pushing of r12 for such generated
trampolines. IIUC it's not a lot of complexity and we will know if we
are attaching to exception boundary prog.

In any case, I will add more selftests to ensure these cases are
handled/rejected properly in v3, thanks!

* Re: [PATCH bpf-next v2 03/14] bpf: Implement BPF exceptions
  2023-08-22  5:12   ` Alexei Starovoitov
@ 2023-08-22 12:53     ` Kumar Kartikeya Dwivedi
  0 siblings, 0 replies; 37+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2023-08-22 12:53 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Yonghong Song, David Vernet

On Tue, 22 Aug 2023 at 10:43, Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Wed, Aug 09, 2023 at 05:11:05PM +0530, Kumar Kartikeya Dwivedi wrote:
> > +
> > +static bool bpf_stack_walker(void *cookie, u64 ip, u64 sp, u64 bp)
> > +{
> > +     struct bpf_throw_ctx *ctx = cookie;
> > +     struct bpf_prog *prog;
> > +
> > +     if (!is_bpf_text_address(ip))
> > +             return !ctx->cnt;
> > +     prog = bpf_prog_ksym_find(ip);
> > +     ctx->cnt++;
> > +     if (!prog->aux->id)
> > +             return true;
> > +     ctx->aux = prog->aux;
> > +     ctx->sp = sp;
> > +     ctx->bp = bp;
> > +     return false;
> > +}
>
> Took me some time to understand what !prog->aux->id is doing.
> Let's add a helper: is_subprog() and check:
> prog->aux->func_idx != 0
> since that's what arm64, x64, s390 JITs are using.

Ack, I will fix it.

* Re: [PATCH bpf-next v2 08/14] bpf: Prevent KASAN false positive with bpf_throw
  2023-08-09 11:41 ` [PATCH bpf-next v2 08/14] bpf: Prevent KASAN false positive with bpf_throw Kumar Kartikeya Dwivedi
@ 2023-08-22 16:23   ` Alexei Starovoitov
  2023-08-30 16:53   ` Andrey Konovalov
  1 sibling, 0 replies; 37+ messages in thread
From: Alexei Starovoitov @ 2023-08-22 16:23 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, Andrey Ryabinin, Alexander Potapenko, Andrey Konovalov,
	Dmitry Vyukov, Vincenzo Frascino, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Yonghong Song,
	David Vernet

Andrey, Dmitry,

Please help review this patch.


On Wed, Aug 9, 2023 at 4:43 AM Kumar Kartikeya Dwivedi <memxor@gmail.com> wrote:
>
> The KASAN stack instrumentation when CONFIG_KASAN_STACK is true poisons
> the stack of a function when it is entered and unpoisons it when
> leaving. However, in the case of bpf_throw, we will never return as we
> switch our stack frame to the BPF exception callback. Later, this
> discrepancy will lead to confusing KASAN splats when kernel resumes
> execution on return from the BPF program.
>
> Fix this by unpoisoning everything below the stack pointer of the BPF
> program, which should cover the range that would not be unpoisoned. An
> example splat is below:
>
> BUG: KASAN: stack-out-of-bounds in stack_trace_consume_entry+0x14e/0x170
> Write of size 8 at addr ffffc900013af958 by task test_progs/227
>
> CPU: 0 PID: 227 Comm: test_progs Not tainted 6.5.0-rc2-g43f1c6c9052a-dirty #26
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-2.fc39 04/01/2014
> Call Trace:
>  <TASK>
>  dump_stack_lvl+0x4a/0x80
>  print_report+0xcf/0x670
>  ? arch_stack_walk+0x79/0x100
>  kasan_report+0xda/0x110
>  ? stack_trace_consume_entry+0x14e/0x170
>  ? stack_trace_consume_entry+0x14e/0x170
>  ? __pfx_stack_trace_consume_entry+0x10/0x10
>  stack_trace_consume_entry+0x14e/0x170
>  ? __sys_bpf+0xf2e/0x41b0
>  arch_stack_walk+0x8b/0x100
>  ? __sys_bpf+0xf2e/0x41b0
>  ? bpf_prog_test_run_skb+0x341/0x1c70
>  ? bpf_prog_test_run_skb+0x341/0x1c70
>  stack_trace_save+0x9b/0xd0
>  ? __pfx_stack_trace_save+0x10/0x10
>  ? __kasan_slab_free+0x109/0x180
>  ? bpf_prog_test_run_skb+0x341/0x1c70
>  ? __sys_bpf+0xf2e/0x41b0
>  ? __x64_sys_bpf+0x78/0xc0
>  ? do_syscall_64+0x3c/0x90
>  ? entry_SYSCALL_64_after_hwframe+0x6e/0xd8
>  kasan_save_stack+0x33/0x60
>  ? kasan_save_stack+0x33/0x60
>  ? kasan_set_track+0x25/0x30
>  ? kasan_save_free_info+0x2b/0x50
>  ? __kasan_slab_free+0x109/0x180
>  ? kmem_cache_free+0x191/0x460
>  ? bpf_prog_test_run_skb+0x341/0x1c70
>  kasan_set_track+0x25/0x30
>  kasan_save_free_info+0x2b/0x50
>  __kasan_slab_free+0x109/0x180
>  kmem_cache_free+0x191/0x460
>  bpf_prog_test_run_skb+0x341/0x1c70
>  ? __pfx_bpf_prog_test_run_skb+0x10/0x10
>  ? __fget_light+0x51/0x220
>  __sys_bpf+0xf2e/0x41b0
>  ? __might_fault+0xa2/0x170
>  ? __pfx___sys_bpf+0x10/0x10
>  ? lock_release+0x1de/0x620
>  ? __might_fault+0xcd/0x170
>  ? __pfx_lock_release+0x10/0x10
>  ? __pfx_blkcg_maybe_throttle_current+0x10/0x10
>  __x64_sys_bpf+0x78/0xc0
>  ? syscall_enter_from_user_mode+0x20/0x50
>  do_syscall_64+0x3c/0x90
>  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
> RIP: 0033:0x7f0fbb38880d
> Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d
> 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d f3 45 12 00 f7 d8 64
> 89 01 48
> RSP: 002b:00007ffe13907de8 EFLAGS: 00000206 ORIG_RAX: 0000000000000141
> RAX: ffffffffffffffda RBX: 00007ffe13908708 RCX: 00007f0fbb38880d
> RDX: 0000000000000050 RSI: 00007ffe13907e20 RDI: 000000000000000a
> RBP: 00007ffe13907e00 R08: 0000000000000000 R09: 00007ffe13907e20
> R10: 0000000000000064 R11: 0000000000000206 R12: 0000000000000003
> R13: 0000000000000000 R14: 00007f0fbb532000 R15: 0000000000cfbd90
>  </TASK>
>
> The buggy address belongs to stack of task test_progs/227
> KASAN internal error: frame info validation failed; invalid marker: 0
>
> The buggy address belongs to the virtual mapping at
>  [ffffc900013a8000, ffffc900013b1000) created by:
>  kernel_clone+0xcd/0x600
>
> The buggy address belongs to the physical page:
> page:00000000b70f4332 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x11418f
> flags: 0x2fffe0000000000(node=0|zone=2|lastcpupid=0x7fff)
> page_type: 0xffffffff()
> raw: 02fffe0000000000 0000000000000000 dead000000000122 0000000000000000
> raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
> page dumped because: kasan: bad access detected
>
> Memory state around the buggy address:
>  ffffc900013af800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>  ffffc900013af880: 00 00 00 f1 f1 f1 f1 00 00 00 f3 f3 f3 f3 f3 00
> >ffffc900013af900: 00 00 00 00 00 00 00 00 00 00 00 f1 00 00 00 00
>                                                     ^
>  ffffc900013af980: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>  ffffc900013afa00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> ==================================================================
> Disabling lock debugging due to kernel taint
>
> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
> Cc: Alexander Potapenko <glider@google.com>
> Cc: Andrey Konovalov <andreyknvl@gmail.com>
> Cc: Dmitry Vyukov <dvyukov@google.com>
> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> ---
>  include/linux/kasan.h | 2 ++
>  kernel/bpf/helpers.c  | 6 ++++++
>  2 files changed, 8 insertions(+)
>
> diff --git a/include/linux/kasan.h b/include/linux/kasan.h
> index 819b6bc8ac08..7a463f814db2 100644
> --- a/include/linux/kasan.h
> +++ b/include/linux/kasan.h
> @@ -283,8 +283,10 @@ static inline bool kasan_check_byte(const void *address)
>
>  #if defined(CONFIG_KASAN) && defined(CONFIG_KASAN_STACK)
>  void kasan_unpoison_task_stack(struct task_struct *task);
> +asmlinkage void kasan_unpoison_task_stack_below(const void *watermark);
>  #else
>  static inline void kasan_unpoison_task_stack(struct task_struct *task) {}
> +static inline void kasan_unpoison_task_stack_below(const void *watermark) {}
>  #endif
>
>  #ifdef CONFIG_KASAN_GENERIC
> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> index af4add1e3a31..64a07232c58f 100644
> --- a/kernel/bpf/helpers.c
> +++ b/kernel/bpf/helpers.c
> @@ -22,6 +22,7 @@
>  #include <linux/security.h>
>  #include <linux/btf_ids.h>
>  #include <linux/bpf_mem_alloc.h>
> +#include <linux/kasan.h>
>
>  #include "../../lib/kstrtox.h"
>
> @@ -2463,6 +2464,11 @@ __bpf_kfunc void bpf_throw(u64 cookie)
>                 WARN_ON_ONCE(!ctx.aux->exception_boundary);
>         WARN_ON_ONCE(!ctx.bp);
>         WARN_ON_ONCE(!ctx.cnt);
> +       /* Prevent KASAN false positives for CONFIG_KASAN_STACK by unpoisoning
> +        * deeper stack depths than ctx.sp as we do not return from bpf_throw,
> +        * which skips compiler generated instrumentation to do the same.
> +        */
> +       kasan_unpoison_task_stack_below((void *)ctx.sp);
>         ctx.aux->bpf_exception_cb(cookie, ctx.sp, ctx.bp);
>  }
>
> --
> 2.41.0
>

* Re: [PATCH bpf-next v2 12/14] libbpf: Add support for custom exception callbacks
  2023-08-09 11:41 ` [PATCH bpf-next v2 12/14] libbpf: Add support for custom exception callbacks Kumar Kartikeya Dwivedi
@ 2023-08-22 16:34   ` Alexei Starovoitov
  2023-08-22 16:58     ` Kumar Kartikeya Dwivedi
  2023-08-25 18:43   ` Andrii Nakryiko
  1 sibling, 1 reply; 37+ messages in thread
From: Alexei Starovoitov @ 2023-08-22 16:34 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Yonghong Song, David Vernet

On Wed, Aug 9, 2023 at 4:44 AM Kumar Kartikeya Dwivedi <memxor@gmail.com> wrote:
>
> Add support to libbpf to append exception callbacks when loading a
> program. The exception callback is found by discovering the declaration
> tag 'exception_callback:<value>' and finding the callback in the value
> of the tag.

...

> +       /* After adding all programs, now pair them with their exception
> +        * callbacks if specified.
> +        */
> +       if (!kernel_supports(obj, FEAT_BTF_DECL_TAG))
> +               goto out;
> +out:

The above looks odd. Accidental leftover?

>         return 0;
>  }
>
> @@ -3137,6 +3148,80 @@ static int bpf_object__sanitize_and_load_btf(struct bpf_object *obj)
>                 }
>         }
>
> +       if (!kernel_supports(obj, FEAT_BTF_DECL_TAG))
> +               goto skip_exception_cb;
> +       for (i = 0; i < obj->nr_programs; i++) {
> +               struct bpf_program *prog = &obj->programs[i];
> +               int j, k, n;
> +
> +               if (prog_is_subprog(obj, prog))
> +                       continue;
> +               n = btf__type_cnt(obj->btf);
> +               for (j = 1; j < n; j++) {
> +                       const char *str = "exception_callback:", *name;

On the first read of this patch and corresponding kernel support
I started doubting my earlier suggestion to use decl_tag and
reconsidered going back to fake bpf_register_except_cb() call,
but after sleeping on it I think it is a useful extension for both
kernel and libbpf to support such tagging.
We might specify ctors and dtors with decl_tag in the future
and other various callbacks that are never explicitly referenced
in bpf_call, ld_imm64 or other bpf insns.
So having libbpf and kernel support such tagging will help in the long run.
It's not going to be limited to exceptions.
Despite the extra complexity this is a good step forward.

> +static int
> +bpf_object__append_subprog_code(struct bpf_object *obj, struct bpf_program *main_prog,
> +                               struct bpf_program *subprog)
> +{

Please split this refactoring into a separate patch for ease of review.

* Re: [PATCH bpf-next v2 12/14] libbpf: Add support for custom exception callbacks
  2023-08-22 16:34   ` Alexei Starovoitov
@ 2023-08-22 16:58     ` Kumar Kartikeya Dwivedi
  2023-08-22 19:20       ` Alexei Starovoitov
  0 siblings, 1 reply; 37+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2023-08-22 16:58 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Yonghong Song, David Vernet

On Tue, 22 Aug 2023 at 22:05, Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Wed, Aug 9, 2023 at 4:44 AM Kumar Kartikeya Dwivedi <memxor@gmail.com> wrote:
> >
> > Add support to libbpf to append exception callbacks when loading a
> > program. The exception callback is found by discovering the declaration
> > tag 'exception_callback:<value>' and finding the callback in the value
> > of the tag.
>
> ...
>
> > +       /* After adding all programs, now pair them with their exception
> > +        * callbacks if specified.
> > +        */
> > +       if (!kernel_supports(obj, FEAT_BTF_DECL_TAG))
> > +               goto out;
> > +out:
>
> The above looks odd. Accidental leftover?
>

Oops, yes. I have dropped it now.

> >         return 0;
> >  }
> >
> > @@ -3137,6 +3148,80 @@ static int bpf_object__sanitize_and_load_btf(struct bpf_object *obj)
> >                 }
> >         }
> >
> > +       if (!kernel_supports(obj, FEAT_BTF_DECL_TAG))
> > +               goto skip_exception_cb;
> > +       for (i = 0; i < obj->nr_programs; i++) {
> > +               struct bpf_program *prog = &obj->programs[i];
> > +               int j, k, n;
> > +
> > +               if (prog_is_subprog(obj, prog))
> > +                       continue;
> > +               n = btf__type_cnt(obj->btf);
> > +               for (j = 1; j < n; j++) {
> > +                       const char *str = "exception_callback:", *name;
>
> On the first read of this patch and corresponding kernel support
> I started doubting my earlier suggestion to use decl_tag and
> reconsidered going back to fake bpf_register_except_cb() call,

This is exactly what I thought when I realised it's not that simple
when implementing it :).

> but after sleeping on it I think it is a useful extension for both
> kernel and libbpf to support such tagging.
> We might specify ctors and dtors with decl_tag in the future
> and other various callbacks that are never explicitly referenced
> in bpf_call, ld_imm64 or other bpf insns.
> So having libbpf and kernel support such tagging will help in the long run.
> It's not going to be limited to exceptions.
> Despite the extra complexity this is a good step forward.
>

I agree. This same code can also be reused to establish an edge from
one BTF type to some other BTF type (by name).
function -> exception_cb. struct -> ctor, struct -> dtor etc.
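
As a rough illustration of the by-name edge, the lookup key can be taken from the tag string itself. The sketch below is not the libbpf implementation: exception_cb_name() is a hypothetical helper, and rejecting an empty value is an assumption made here rather than something stated in the patch. It only shows splitting an 'exception_callback:<value>' tag into its prefix and the callback name.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: return a pointer to the callback name inside an
 * "exception_callback:<value>" tag, or NULL if the tag does not carry
 * that prefix or the value is empty (assumed to be invalid here).
 */
static const char *exception_cb_name(const char *tag)
{
	static const char prefix[] = "exception_callback:";
	size_t len = sizeof(prefix) - 1;

	if (strncmp(tag, prefix, len) != 0)
		return NULL;
	if (tag[len] == '\0')
		return NULL;
	return tag + len;
}
```

The same shape would presumably work for other prefixes (for example a ctor or dtor tag), with the returned name then looked up among the object's functions.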

I did have some questions though.
First of all this is explicitly an unstable feature. How do we feel
about putting related support for it in libbpf and making breaking
changes later?
Will the expectation be to pair the libbpf with its corresponding
kernel release to use such features? Or do we have to make the changes
in a backwards compatible fashion?

Secondly, due to proliferation of BTF tag usage, do you think it's
time we reserve a namespace for all tags that would be recognized by
the verifier? E.g. require all of them to be prefixed with "bpf."
similar to "llvm." etc.? Since they are simply attributes for a
specific BTF type (or component of a type).
It may be too late for some BTF tags, but we could do better going forward.

It may also allow us to indicate that a tag is experimental until its
effect on the program becomes stabilized. E.g.
bpf.experimental.exception_callback instead of exception_callback? Or
do you feel it's unnecessary?

> > +static int
> > +bpf_object__append_subprog_code(struct bpf_object *obj, struct bpf_program *main_prog,
> > +                               struct bpf_program *subprog)
> > +{
>
> Please split this refactoring into a separate patch for ease of review.

Ack, will do.

* Re: [PATCH bpf-next v2 12/14] libbpf: Add support for custom exception callbacks
  2023-08-22 16:58     ` Kumar Kartikeya Dwivedi
@ 2023-08-22 19:20       ` Alexei Starovoitov
  0 siblings, 0 replies; 37+ messages in thread
From: Alexei Starovoitov @ 2023-08-22 19:20 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Yonghong Song, David Vernet

On Tue, Aug 22, 2023 at 9:58 AM Kumar Kartikeya Dwivedi
<memxor@gmail.com> wrote:
>
> On Tue, 22 Aug 2023 at 22:05, Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Wed, Aug 9, 2023 at 4:44 AM Kumar Kartikeya Dwivedi <memxor@gmail.com> wrote:
> > >
> > > Add support to libbpf to append exception callbacks when loading a
> > > program. The exception callback is found by discovering the declaration
> > > tag 'exception_callback:<value>' and finding the callback in the value
> > > of the tag.
> >
> > ...
> >
> > > +       /* After adding all programs, now pair them with their exception
> > > +        * callbacks if specified.
> > > +        */
> > > +       if (!kernel_supports(obj, FEAT_BTF_DECL_TAG))
> > > +               goto out;
> > > +out:
> >
> > The above looks odd. Accidental leftover?
> >
>
> Oops, yes. I have dropped it now.
>
> > >         return 0;
> > >  }
> > >
> > > @@ -3137,6 +3148,80 @@ static int bpf_object__sanitize_and_load_btf(struct bpf_object *obj)
> > >                 }
> > >         }
> > >
> > > +       if (!kernel_supports(obj, FEAT_BTF_DECL_TAG))
> > > +               goto skip_exception_cb;
> > > +       for (i = 0; i < obj->nr_programs; i++) {
> > > +               struct bpf_program *prog = &obj->programs[i];
> > > +               int j, k, n;
> > > +
> > > +               if (prog_is_subprog(obj, prog))
> > > +                       continue;
> > > +               n = btf__type_cnt(obj->btf);
> > > +               for (j = 1; j < n; j++) {
> > > +                       const char *str = "exception_callback:", *name;
> >
> > On the first read of this patch and corresponding kernel support
> > I started doubting my earlier suggestion to use decl_tag and
> > reconsidered going back to fake bpf_register_except_cb() call,
>
> This is exactly what I thought when I realised it's not that simple
> when implementing it :).
>
> > but after sleeping on it I think it is a useful extension for both
> > kernel and libbpf to support such tagging.
> > We might specify ctors and dtors with decl_tag in the future
> > and other various callbacks that are never explicitly referenced
> > in bpf_call, ld_imm64 or other bpf insns.
> > So having libbpf and kernel support such tagging will help in the long run.
> > It's not going to be limited to exceptions.
> > Despite the extra complexity this is a good step forward.
> >
>
> I agree. This same code can also be reused to establish an edge from
> one BTF type to some other BTF type (by name).
> function -> exception_cb. struct -> ctor, struct -> dtor etc.
>
> I did have some questions though.
> First of all this is explicitly an unstable feature. How do we feel
> about putting related support for it in libbpf and making breaking
> changes later?
> Will the expectation be to pair the libbpf with its corresponding
> kernel release to use such features? Or do we have to make the changes
> in a backwards compatible fashion?

We should always do backwards compatible changes in both kernel and libbpf,
but sooner or later we might hit an issue where we would have to break things.
At that time the special prefix won't save us, so...

>
> Secondly, due to proliferation of BTF tag usage, do you think it's
> time we reserve a namespace for all tags that would be recognized by
> the verifier? E.g. require all of them to be prefixed with "bpf."
> similar to "llvm." etc.? Since they are simply attributes for a
> specific BTF type (or component of a type).
> It may be too late for some BTF tags, but we could do better going forward.
>
> It may also allow us to indicate that a tag is experimental until its
> effect on the program becomes stabilized. E.g.
> bpf.experimental.exception_callback instead of exception_callback? Or
> do you feel it's unnecessary?

... I don't think any of that is necessary.
Whether btf tag is prefixed with "exception_callback:" or
"bpf.experimental.debug.exception_callback:" we will be doing the same thing.
We'll keep backward compat if trade-offs allow.

* Re: [PATCH bpf-next v2 00/14] Exceptions - 1/2
  2023-08-09 11:41 [PATCH bpf-next v2 00/14] Exceptions - 1/2 Kumar Kartikeya Dwivedi
                   ` (13 preceding siblings ...)
  2023-08-09 11:41 ` [PATCH bpf-next v2 14/14] selftests/bpf: Add tests for BPF exceptions Kumar Kartikeya Dwivedi
@ 2023-08-22 21:22 ` Kumar Kartikeya Dwivedi
  2023-08-22 22:07   ` Jose E. Marchesi
  14 siblings, 1 reply; 37+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2023-08-22 21:22 UTC (permalink / raw)
  To: bpf, Yonghong Song
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, David Vernet

On Wed, 9 Aug 2023 at 17:11, Kumar Kartikeya Dwivedi <memxor@gmail.com> wrote:
>
> [...]
>
> Known issues
> ------------
>
>  * Just asm volatile ("call bpf_throw" :::) does not emit DATASEC .ksyms
>    for bpf_throw, there needs to be explicit call in C for clang to emit
>    the DATASEC info in BTF, leading to errors during compilation.
>

Hi Yonghong, I'd like to ask you about this issue to figure out
whether this is something worth fixing in clang or not.
It pops up in programs which only use bpf_assert macros (which emit
the call to bpf_throw using inline assembly) and not bpf_throw kfunc
directly.

I believe in case we emit a call bpf_throw instruction, the BPF
backend code will not see any DWARF debug info for the respective
symbol, so it will also not be able to convert it and emit anything to
.BTF section in case no direct call without asm volatile is present.
Therefore my guess is that this isn't something that can be fixed in clang/LLVM.

There are also options like the one below to work around it.
if ((volatile int){0}) bpf_throw();
asm volatile ("call bpf_throw");
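
To show why the guard never fires at runtime, here is a small userspace sketch. bpf_throw_stub() and throw_calls are hypothetical stand-ins for the kfunc; in a real BPF program the reference would be to the extern bpf_throw declaration, which is what makes the compiler emit debug info for it. This sketch demonstrates only the runtime behaviour of the volatile guard.

```c
#include <stdint.h>

static int throw_calls; /* counts how often the stub was invoked */

/* Hypothetical userspace stub standing in for the bpf_throw kfunc. */
static void bpf_throw_stub(uint64_t cookie)
{
	(void)cookie;
	throw_calls++;
}

/*
 * The compound literal is volatile, so the compiler has to load it at
 * runtime and cannot fold the branch away. The call stays referenced in
 * the generated code even though the condition is always false.
 */
static int reference_without_calling(void)
{
	if ((volatile int){0})
		bpf_throw_stub(0);
	return throw_calls;
}
```

The cost of this approach is the extra store, load and branch that the compiler has to keep for the volatile object.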

* Re: [PATCH bpf-next v2 00/14] Exceptions - 1/2
  2023-08-22 21:22 ` [PATCH bpf-next v2 00/14] Exceptions - 1/2 Kumar Kartikeya Dwivedi
@ 2023-08-22 22:07   ` Jose E. Marchesi
  2023-08-22 22:39     ` Yonghong Song
  2023-08-22 22:54     ` Kumar Kartikeya Dwivedi
  0 siblings, 2 replies; 37+ messages in thread
From: Jose E. Marchesi @ 2023-08-22 22:07 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, Yonghong Song, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, David Vernet


> On Wed, 9 Aug 2023 at 17:11, Kumar Kartikeya Dwivedi <memxor@gmail.com> wrote:
>>
>> [...]
>>
>> Known issues
>> ------------
>>
>>  * Just asm volatile ("call bpf_throw" :::) does not emit DATASEC .ksyms
>>    for bpf_throw, there needs to be explicit call in C for clang to emit
>>    the DATASEC info in BTF, leading to errors during compilation.
>>
>
> Hi Yonghong, I'd like to ask you about this issue to figure out
> whether this is something worth fixing in clang or not.
> It pops up in programs which only use bpf_assert macros (which emit
> the call to bpf_throw using inline assembly) and not bpf_throw kfunc
> directly.
>
> I believe in case we emit a call bpf_throw instruction, the BPF
> backend code will not see any DWARF debug info for the respective
> symbol, so it will also not be able to convert it and emit anything to
> .BTF section in case no direct call without asm volatile is present.
> Therefore my guess is that this isn't something that can be fixed in
> clang/LLVM.

Besides, please keep in mind that GCC doesn't have an integrated
assembler, and therefore relying on clang's understanding of the
instructions in inline assembly is something to avoid.

> There are also options like the one below to work around it.
> if ((volatile int){0}) bpf_throw();
> asm volatile ("call bpf_throw");

I can confirm the above results in a BTF entry for bpf_throw with
bpf-unknown-none-gcc -gbtf.

* Re: [PATCH bpf-next v2 00/14] Exceptions - 1/2
  2023-08-22 22:07   ` Jose E. Marchesi
@ 2023-08-22 22:39     ` Yonghong Song
  2023-08-22 22:53       ` Kumar Kartikeya Dwivedi
  2023-08-22 22:54     ` Kumar Kartikeya Dwivedi
  1 sibling, 1 reply; 37+ messages in thread
From: Yonghong Song @ 2023-08-22 22:39 UTC (permalink / raw)
  To: Jose E. Marchesi, Kumar Kartikeya Dwivedi
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, David Vernet



On 8/22/23 3:07 PM, Jose E. Marchesi wrote:
> 
>> On Wed, 9 Aug 2023 at 17:11, Kumar Kartikeya Dwivedi <memxor@gmail.com> wrote:
>>>
>>> [...]
>>>
>>> Known issues
>>> ------------
>>>
>>>   * Just asm volatile ("call bpf_throw" :::) does not emit DATASEC .ksyms
>>>     for bpf_throw, there needs to be explicit call in C for clang to emit
>>>     the DATASEC info in BTF, leading to errors during compilation.
>>>
>>
>> Hi Yonghong, I'd like to ask you about this issue to figure out
>> whether this is something worth fixing in clang or not.
>> It pops up in programs which only use bpf_assert macros (which emit
>> the call to bpf_throw using inline assembly) and not bpf_throw kfunc
>> directly.
>>
>> I believe in case we emit a call bpf_throw instruction, the BPF
>> backend code will not see any DWARF debug info for the respective
>> symbol, so it will also not be able to convert it and emit anything to
>> .BTF section in case no direct call without asm volatile is present.
>> Therefore my guess is that this isn't something that can be fixed in
>> clang/LLVM.
> 
> Besides, please keep in mind that GCC doesn't have an integrated
> assembler, and therefore relying on clang's understanding of the
> instructions in inline assembly is something to avoid.
> 
>> There are also options like the one below to work around it.
>> if ((volatile int){0}) bpf_throw();
>> asm volatile ("call bpf_throw");
> 
> I can confirm the above results in a BTF entry for bpf_throw with
> bpf-unknown-none-gcc -gbtf.

Kumar, you are correct.
For clang, symbols inside 'asm volatile' statement or generally
inside any asm code (e.g., kernel .s files) won't generate an entry
in dwarf. The
   if ((volatile int){0}) bpf_throw();
will force a dwarf, hence btf, entry.

The unfortunate thing is that the above code will generate redundant code
like
   0000000000000000 <foo>:
        0:       b7 01 00 00 00 00 00 00 r1 = 0x0
        1:       63 1a fc ff 00 00 00 00 *(u32 *)(r10 - 0x4) = r1
        2:       61 a1 fc ff 00 00 00 00 r1 = *(u32 *)(r10 - 0x4)
        3:       15 01 01 00 00 00 00 00 if r1 == 0x0 goto +0x1 <LBB0_2>
        4:       85 10 00 00 ff ff ff ff call -0x1

0000000000000028 <LBB0_2>:
        5:       85 10 00 00 ff ff ff ff call -0x1
        6:       b7 00 00 00 00 00 00 00 r0 = 0x0
        7:       95 00 00 00 00 00 00 00 exit

I am curious why the bpf_throw() kfunc cannot be used in the
bpf_assert macro?


* Re: [PATCH bpf-next v2 00/14] Exceptions - 1/2
  2023-08-22 22:39     ` Yonghong Song
@ 2023-08-22 22:53       ` Kumar Kartikeya Dwivedi
  2023-08-22 23:06         ` Alexei Starovoitov
  0 siblings, 1 reply; 37+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2023-08-22 22:53 UTC (permalink / raw)
  To: yonghong.song
  Cc: Jose E. Marchesi, bpf, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, David Vernet

On Wed, 23 Aug 2023 at 04:09, Yonghong Song <yonghong.song@linux.dev> wrote:
>
>
>
> On 8/22/23 3:07 PM, Jose E. Marchesi wrote:
> >
> >> On Wed, 9 Aug 2023 at 17:11, Kumar Kartikeya Dwivedi <memxor@gmail.com> wrote:
> >>>
> >>> [...]
> >>>
> >>> Known issues
> >>> ------------
> >>>
> >>>   * Just asm volatile ("call bpf_throw" :::) does not emit DATASEC .ksyms
> >>>     for bpf_throw, there needs to be explicit call in C for clang to emit
> >>>     the DATASEC info in BTF, leading to errors during compilation.
> >>>
> >>
> >> Hi Yonghong, I'd like to ask you about this issue to figure out
> >> whether this is something worth fixing in clang or not.
> >> It pops up in programs which only use bpf_assert macros (which emit
> >> the call to bpf_throw using inline assembly) and not bpf_throw kfunc
> >> directly.
> >>
> >> I believe in case we emit a call bpf_throw instruction, the BPF
> >> backend code will not see any DWARF debug info for the respective
> >> symbol, so it will also not be able to convert it and emit anything to
> >> .BTF section in case no direct call without asm volatile is present.
> >> Therefore my guess is that this isn't something that can be fixed in
> >> clang/LLVM.
> >
> > Besides, please keep in mind that GCC doesn't have an integrated
> > assembler, and therefore relying on clang's understanding of the
> > instructions in inline assembly is something to avoid.
> >
> >> There are also options like the one below to work around it.
> >> if ((volatile int){0}) bpf_throw();
> >> asm volatile ("call bpf_throw");
> >
> > I can confirm the above results in a BTF entry for bpf_throw with
> > bpf-unknown-none-gcc -gbtf.
>
> Kumar, you are correct.
> For clang, symbols inside 'asm volatile' statement or generally
> inside any asm code (e.g., kernel .s files) won't generate an entry
> in dwarf. The
>    if ((volatile int){0}) bpf_throw();
> will force a dwarf, hence btf, entry.
>
> The unfortunate thing is the above code will generate redundant code
> like
>    0000000000000000 <foo>:
>         0:       b7 01 00 00 00 00 00 00 r1 = 0x0
>         1:       63 1a fc ff 00 00 00 00 *(u32 *)(r10 - 0x4) = r1
>         2:       61 a1 fc ff 00 00 00 00 r1 = *(u32 *)(r10 - 0x4)
>         3:       15 01 01 00 00 00 00 00 if r1 == 0x0 goto +0x1 <LBB0_2>
>         4:       85 10 00 00 ff ff ff ff call -0x1
>
> 0000000000000028 <LBB0_2>:
>         5:       85 10 00 00 ff ff ff ff call -0x1
>         6:       b7 00 00 00 00 00 00 00 r0 = 0x0
>         7:       95 00 00 00 00 00 00 00 exit
>

Yes, I am relying on the verifier to eliminate dead code later, but it
is obviously a hack.

> I am curious why the bpf_throw() kfunc cannot be used in the
> bpf_assert macro?

The reason was to force the compiler to emit a specific branch for the
assertion check without being influenced by compiler optimizations,
and to tie the comparison to the register holding the value being
tested in the assertion. Secondly, we also enforce that the first
argument is in a register and the second is a constant, so that the
verifier bounds gained after the comparison op are applied to the
original register.

I am aware (though correct me if this is wrong) that the compiler can
select a different register for the input operand of the asm
constraint, but find_equal_id_scalars should still do correct
propagation. This is partly why I disabled usage of this macro for
variable widths < 64-bit, because then the compiler sometimes performs
shifts etc. for integer promotion implicitly, sometimes destroying
whatever information we gained through an assertion check. Depending
on the signedness of the variable, we emit the signed/unsigned
comparison.

I suppose we could switch to the ' if (!(LHS <op> RHS)) bpf_throw(); '
sequence in C, force volatile load for LHS and __builtin_constant_p
for RHS to get the same behavior. Emitting these redundant checks is
definitely a bit weird just to emit BTF.


* Re: [PATCH bpf-next v2 00/14] Exceptions - 1/2
  2023-08-22 22:07   ` Jose E. Marchesi
  2023-08-22 22:39     ` Yonghong Song
@ 2023-08-22 22:54     ` Kumar Kartikeya Dwivedi
  1 sibling, 0 replies; 37+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2023-08-22 22:54 UTC (permalink / raw)
  To: Jose E. Marchesi
  Cc: bpf, Yonghong Song, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, David Vernet

On Wed, 23 Aug 2023 at 03:37, Jose E. Marchesi <jose.marchesi@oracle.com> wrote:
>
>
> > On Wed, 9 Aug 2023 at 17:11, Kumar Kartikeya Dwivedi <memxor@gmail.com> wrote:
> >>
> >> [...]
> >>
> >> Known issues
> >> ------------
> >>
> >>  * Just asm volatile ("call bpf_throw" :::) does not emit DATASEC .ksyms
> >>    for bpf_throw, there needs to be explicit call in C for clang to emit
> >>    the DATASEC info in BTF, leading to errors during compilation.
> >>
> >
> > Hi Yonghong, I'd like to ask you about this issue to figure out
> > whether this is something worth fixing in clang or not.
> > It pops up in programs which only use bpf_assert macros (which emit
> > the call to bpf_throw using inline assembly) and not bpf_throw kfunc
> > directly.
> >
> > I believe in case we emit a call bpf_throw instruction, the BPF
> > backend code will not see any DWARF debug info for the respective
> > symbol, so it will also not be able to convert it and emit anything to
> > .BTF section in case no direct call without asm volatile is present.
> > Therefore my guess is that this isn't something that can be fixed in
> > clang/LLVM.
>
> Besides, please keep in mind that GCC doesn't have an integrated
> assembler, and therefore relying on clang's understanding of the
> instructions in inline assembly is something to avoid.
>

Thank you for reminding me of that. I will be more careful about this.
We certainly cannot rely on clang-specific behavior for this.

> > There are also options like the one below to work around it.
> > if ((volatile int){0}) bpf_throw();
> > asm volatile ("call bpf_throw");
>
> I can confirm the above results in a BTF entry for bpf_throw with
> bpf-unknown-none-gcc -gbtf.

Thanks for confirming.


* Re: [PATCH bpf-next v2 00/14] Exceptions - 1/2
  2023-08-22 22:53       ` Kumar Kartikeya Dwivedi
@ 2023-08-22 23:06         ` Alexei Starovoitov
  2023-08-25 18:55           ` Andrii Nakryiko
  0 siblings, 1 reply; 37+ messages in thread
From: Alexei Starovoitov @ 2023-08-22 23:06 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: Yonghong Song, Jose E. Marchesi, bpf, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, David Vernet

On Tue, Aug 22, 2023 at 3:54 PM Kumar Kartikeya Dwivedi
<memxor@gmail.com> wrote:
>
>
> I suppose we could switch to the ' if (!(LHS <op> RHS)) bpf_throw(); '
> sequence in C, force volatile load for LHS and __builtin_constant_p
> for RHS to get the same behavior. Emitting these redundant checks is
> definitely a bit weird just to emit BTF.

I guess we can try
#define bpf_assert(LHS, OP, RHS) if (!(LHS OP RHS)) bpf_throw();
with barrier_var(LHS) and __builtin_constant_p(RHS) and
keep things completely in C,
but there is no guarantee that the compiler will not convert == to !=,
swap lhs and rhs, etc.
Maybe we can have both asm and C style macros, then recommend C to start
and switch to asm if things are dodgy.
Feels like dangerous ambiguity.
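
A minimal host-side sketch of the all-C variant suggested above, for
illustration only. It assumes a barrier_var() equivalent to the one in
libbpf's bpf_helpers.h and replaces the bpf_throw() kfunc with a
hypothetical userspace stub built on setjmp/longjmp, so that the control
flow can be exercised outside the kernel. The names bpf_assert_c,
check_lt_10 and run_checked are made up for this sketch and are not part
of the series.

```c
#include <setjmp.h>

/* Mirrors barrier_var() from libbpf's bpf_helpers.h: keeps the value in a
 * register and hides it from the optimizer. */
#define barrier_var(var) __asm__ __volatile__("" : "+r"(var))

/* Hypothetical userspace stand-in for the bpf_throw() kfunc. The real
 * kfunc is provided by the kernel and never returns to the caller. */
static jmp_buf throw_env;
static unsigned long long thrown_cookie;

static void bpf_throw(unsigned long long cookie)
{
	thrown_cookie = cookie;
	longjmp(throw_env, 1);
}

/* C-style assertion: RHS must be a compile-time constant, LHS must be an
 * lvalue so that barrier_var() can be applied to it. */
#define bpf_assert_c(LHS, OP, RHS)					\
	do {								\
		_Static_assert(__builtin_constant_p((RHS)),		\
			       "RHS must be a constant expression");	\
		barrier_var(LHS);					\
		if (!((LHS) OP (RHS)))					\
			bpf_throw(0);					\
	} while (0)

/* Code under test: throws unless x < 10. */
static void check_lt_10(long x)
{
	bpf_assert_c(x, <, 10);
}

/* Returns 1 if the assertion held, 0 if it threw. */
static int run_checked(long x)
{
	if (setjmp(throw_env))
		return 0;
	check_lt_10(x);
	return 1;
}
```

Nothing in this form stops the compiler from rewriting the comparison
(inverting the condition, swapping operands), which is the concern
raised above; the sketch only shows the shape of the macro.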


* Re: [PATCH bpf-next v2 12/14] libbpf: Add support for custom exception callbacks
  2023-08-09 11:41 ` [PATCH bpf-next v2 12/14] libbpf: Add support for custom exception callbacks Kumar Kartikeya Dwivedi
  2023-08-22 16:34   ` Alexei Starovoitov
@ 2023-08-25 18:43   ` Andrii Nakryiko
  2023-08-26 22:41     ` Kumar Kartikeya Dwivedi
  1 sibling, 1 reply; 37+ messages in thread
From: Andrii Nakryiko @ 2023-08-25 18:43 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Yonghong Song, David Vernet

On Wed, Aug 9, 2023 at 4:44 AM Kumar Kartikeya Dwivedi <memxor@gmail.com> wrote:
>
> Add support to libbpf to append exception callbacks when loading a
> program. The exception callback is found by discovering the declaration
> tag 'exception_callback:<value>' and finding the callback in the value
> of the tag.
>
> The process is done in two steps. First, for each main program, the
> bpf_object__sanitize_and_load_btf function finds and marks its
> corresponding exception callback as defined by the declaration tag on
> it. Second, bpf_object__reloc_code is modified to append the indicated
> exception callback at the end of the instruction iteration (since
> exception callback will never be appended in that loop, as it is not
> directly referenced).
>
> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> ---

Just two points before you send the next version:

a) it seems like this appending of exception callback logically fits
bpf_object__relocate() step, where other subprogs are appended. Any
reason we can't do it there?

b) all the callbacks are static functions, right? Which means in the
case of static linking, we can have multiple subprogs with the same
name. So this whole look up by name thing doesn't guarantee unique
match. At the very least libbpf should check that the match is unique
and error out otherwise.
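
A rough sketch of the uniqueness check asked for in (b), assuming a
simplified, hypothetical view of subprogs as a flat array of names. It is
not libbpf code; struct subprog, sample_progs and find_unique_subprog are
illustrative only.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified subprog record; not libbpf's struct bpf_program. */
struct subprog {
	const char *name;
};

/* Sample data: two static functions from different object files that ended
 * up with the same name after static linking. */
static const struct subprog sample_progs[] = {
	{ "handler" }, { "helper" }, { "handler" }, { "main_prog" },
};

/* Returns the index of the only subprog called 'name', -ENOENT if there is
 * none, or -EEXIST if more than one subprog carries that name. */
static int find_unique_subprog(const struct subprog *progs, size_t cnt,
			       const char *name)
{
	int found = -ENOENT;
	size_t i;

	for (i = 0; i < cnt; i++) {
		if (strcmp(progs[i].name, name))
			continue;
		if (found >= 0)
			return -EEXIST;	/* second match: ambiguous */
		found = (int)i;
	}
	return found;
}
```

With the sample table, looking up "helper" yields index 1, while
"handler" is rejected as ambiguous instead of silently picking the first
match.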

Ideally though, would be great if something like this would be
supported instead (but I know it's way more complex, Alexei already
mentioned that in person and on the list):

try (my_callback) {
    ... code that throws ...
}

try (my_other_callback) {
    ... some other code that throws ...
}


This try() macro can be implemented in a form similar to bpf_for() by
using fancy for() loop. It would look and feel way more like
try/catch.
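
To make the "fancy for() loop" idea concrete, here is a host-side sketch
of the macro shape only. Everything in it is hypothetical (bpf_try,
try_enter, try_leave, active_cb); it does not show how the verifier or
runtime would learn the block boundaries, and it handles neither nesting
nor an early break/return out of the block.

```c
#include <stddef.h>

typedef int (*exception_cb_t)(unsigned long long cookie);

/* Hypothetical slot holding the callback of the currently active block. */
static exception_cb_t active_cb;

static int try_enter(exception_cb_t cb)
{
	active_cb = cb;
	return 1;	/* run the loop body once */
}

static int try_leave(void)
{
	active_cb = NULL;
	return 0;	/* terminate the loop */
}

/* Runs the attached block exactly once with 'cb' installed. */
#define bpf_try(cb) \
	for (int ___once = try_enter(cb); ___once; ___once = try_leave())

static int my_callback(unsigned long long cookie)
{
	return (int)cookie + 1;
}

/* Returns 1 if the callback was visible inside the block and cleared
 * again after it. */
static int demo(void)
{
	int seen_inside = 0;

	bpf_try(my_callback) {
		seen_inside = (active_cb == my_callback);
	}
	return seen_inside && active_cb == NULL;
}
```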

>  tools/lib/bpf/libbpf.c | 166 +++++++++++++++++++++++++++++++++++------
>  1 file changed, 142 insertions(+), 24 deletions(-)
>
> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> index 17883f5a44b9..7c607bac8204 100644
> --- a/tools/lib/bpf/libbpf.c
> +++ b/tools/lib/bpf/libbpf.c
> @@ -432,9 +432,11 @@ struct bpf_program {
>         int fd;
>         bool autoload;
>         bool autoattach;
> +       bool sym_global;
>         bool mark_btf_static;
>         enum bpf_prog_type type;
>         enum bpf_attach_type expected_attach_type;
> +       int exception_cb_idx;
>
>         int prog_ifindex;
>         __u32 attach_btf_obj_fd;

[...]


* Re: [PATCH bpf-next v2 00/14] Exceptions - 1/2
  2023-08-22 23:06         ` Alexei Starovoitov
@ 2023-08-25 18:55           ` Andrii Nakryiko
  2023-08-26 22:42             ` Kumar Kartikeya Dwivedi
  0 siblings, 1 reply; 37+ messages in thread
From: Andrii Nakryiko @ 2023-08-25 18:55 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Kumar Kartikeya Dwivedi, Yonghong Song, Jose E. Marchesi, bpf,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, David Vernet

On Tue, Aug 22, 2023 at 4:06 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Tue, Aug 22, 2023 at 3:54 PM Kumar Kartikeya Dwivedi
> <memxor@gmail.com> wrote:
> >
> >
> > I suppose we could switch to the ' if (!(LHS <op> RHS)) bpf_throw(); '
> > sequence in C, force volatile load for LHS and __builtin_constant_p
> > for RHS to get the same behavior. Emitting these redundant checks is
> > definitely a bit weird just to emit BTF.
>
> I guess we can try
> #define bpf_assert(LHS, OP, RHS) if (!(LHS OP RHS)) bpf_throw();
> with barrier_var(LHS) and __builtin_constant_p(RHS) and
> keep things completely in C,
> but there is no guarantee that the compiler will not convert == to !=,
> swap lhs and rhs, etc.
> Maybe we can have both asm and C style macros, then recommend C to start
> and switch to asm if things are dodgy.
> Feels like dangerous ambiguity.

This seems similar to the issue I had with
__attribute__((cleanup(some_kfunc))) not emitting BTF info for that
some_kfunc. See bpf_for_each(): just adding
`(void)bpf_iter_##type##_destroy` seems to make Clang emit BTF info.

It would be nice to have this fixed for the cleanup() attribute and asm,
of course. But this is a simple workaround.
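
A small sketch of that reference-only idiom, compiled for the host. The
bpf_throw() here is a hypothetical counting stub standing in for the
extern kfunc declaration a BPF program would have, and bpf_ref_symbol is
an invented name. The point is only that naming the function in a void
expression references the symbol in C without emitting a call; whether
the compiler then emits BTF for it is as reported above, not something
this host build can show.

```c
/* Hypothetical userspace stand-in; a BPF program would instead carry an
 * extern declaration of the kfunc. */
static int throw_calls;

static void bpf_throw(unsigned long long cookie)
{
	(void)cookie;
	throw_calls++;
}

/* Mentions the symbol in C without calling it. */
#define bpf_ref_symbol(fn) ((void)(fn))

/* Returns the number of bpf_throw() calls after only referencing it. */
static int reference_only(void)
{
	bpf_ref_symbol(bpf_throw);
	return throw_calls;
}
```

Unlike the `if ((volatile int){0})` form, this leaves no load, branch or
call behind in the generated code.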


* Re: [PATCH bpf-next v2 12/14] libbpf: Add support for custom exception callbacks
  2023-08-25 18:43   ` Andrii Nakryiko
@ 2023-08-26 22:41     ` Kumar Kartikeya Dwivedi
  2023-08-27 22:27       ` Alexei Starovoitov
  0 siblings, 1 reply; 37+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2023-08-26 22:41 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Yonghong Song, David Vernet

On Sat, 26 Aug 2023 at 00:13, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
>
> On Wed, Aug 9, 2023 at 4:44 AM Kumar Kartikeya Dwivedi <memxor@gmail.com> wrote:
> >
> > Add support to libbpf to append exception callbacks when loading a
> > program. The exception callback is found by discovering the declaration
> > tag 'exception_callback:<value>' and finding the callback in the value
> > of the tag.
> >
> > The process is done in two steps. First, for each main program, the
> > bpf_object__sanitize_and_load_btf function finds and marks its
> > corresponding exception callback as defined by the declaration tag on
> > it. Second, bpf_object__reloc_code is modified to append the indicated
> > exception callback at the end of the instruction iteration (since
> > exception callback will never be appended in that loop, as it is not
> > directly referenced).
> >
> > Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> > ---
>
> Just two points before you send the next version:
>
> a) it seems like this appending of exception callback logically fits
> bpf_object__relocate() step, where other subprogs are appended. Any
> reason we can't do it there?
>

We should be able to do it there as well. But I felt it is better to
do it in bpf_object__reloc_code as the logic is similar to the
handling of bpf_pseudo_func/bpf_pseudo_call insns. And then we need to
recurse using bpf_object__reloc_code for exception cb again.

> b) all the callbacks are static functions, right? Which means in the
> case of static linking, we can have multiple subprogs with the same
> name. So this whole look up by name thing doesn't guarantee unique
> match. At the very least libbpf should check that the match is unique
> and error out otherwise.

Ack, will fix this in v3.

>
> Ideally though, would be great if something like this would be
> supported instead (but I know it's way more complex, Alexei already
> mentioned that in person and on the list):
>
> try (my_callback) {
>     ... code that throws ...
> }
>
> try (my_other_callback) {
>     ... some other code that throws ...
> }
>
>
> This try() macro can be implemented in a form similar to bpf_for() by
> using fancy for() loop. It would look and feel way more like
> try/catch.
>

These try blocks are easier than having a try/catch block which the
execution jumps to when the exception is thrown. I think the latter
will involve some form of compiler support, because otherwise there is
no control flow that is seen by the compiler into the catch block,
which will mess up things, and I plan to at least explore that approach
(already looking at LLVM) once I am done with the second part of this
feature.

Having just these try (callback) {} blocks is easier as we can record
which subprog corresponds to [begin_ip, end_ip] (per frame) and stop
unwinding when we find a suitable handler for the ip of a parent
frame. The callback will be invoked and will return to the parent
frame (or kernel if it's the main frame). So if this seems like a more
useful thing, I can make this work and send it out as a follow up to
this set.
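
The range-based lookup described above can be sketched as follows. This
is a host-side illustration with hypothetical names (struct try_range,
find_handler, unwind_to_handler) and made-up sample values; the actual
kernel data structures and unwinder are not shown in this thread.

```c
#include <stddef.h>

/* Hypothetical record: a try block covering [begin_ip, end_ip] and the
 * subprog index of its callback. */
struct try_range {
	unsigned long begin_ip;
	unsigned long end_ip;
	int handler_subprog;
};

/* Made-up sample data: two try blocks, and the IPs of three frames listed
 * from the innermost frame to the outermost one. */
static const struct try_range sample_tab[] = {
	{ 100, 120, 3 },
	{ 200, 260, 4 },
};
static const unsigned long sample_frames[] = { 500, 210, 110 };

/* Returns the handler subprog covering 'ip', or -1 if none does. */
static int find_handler(const struct try_range *tab, size_t cnt,
			unsigned long ip)
{
	size_t i;

	for (i = 0; i < cnt; i++)
		if (ip >= tab[i].begin_ip && ip <= tab[i].end_ip)
			return tab[i].handler_subprog;
	return -1;
}

/* Walks frames from innermost to outermost and stops at the first frame
 * whose IP lies inside a try block. Returns that block's handler, or -1
 * if the whole stack is unwound without a match. If 'stop_frame' is not
 * NULL, it receives the index of the frame where unwinding stopped. */
static int unwind_to_handler(const struct try_range *tab, size_t cnt,
			     const unsigned long *frame_ips, size_t nr_frames,
			     size_t *stop_frame)
{
	size_t f;

	for (f = 0; f < nr_frames; f++) {
		int h = find_handler(tab, cnt, frame_ips[f]);

		if (h >= 0) {
			if (stop_frame)
				*stop_frame = f;
			return h;
		}
	}
	return -1;
}
```

With the sample data, the innermost frame (IP 500) is not covered, so
unwinding continues to its parent (IP 210), which falls inside the second
block and selects handler 4.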

> [...]


* Re: [PATCH bpf-next v2 00/14] Exceptions - 1/2
  2023-08-25 18:55           ` Andrii Nakryiko
@ 2023-08-26 22:42             ` Kumar Kartikeya Dwivedi
  0 siblings, 0 replies; 37+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2023-08-26 22:42 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Alexei Starovoitov, Yonghong Song, Jose E. Marchesi, bpf,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, David Vernet

On Sat, 26 Aug 2023 at 00:25, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
>
> On Tue, Aug 22, 2023 at 4:06 PM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Tue, Aug 22, 2023 at 3:54 PM Kumar Kartikeya Dwivedi
> > <memxor@gmail.com> wrote:
> > >
> > >
> > > I suppose we could switch to the ' if (!(LHS <op> RHS)) bpf_throw(); '
> > > sequence in C, force volatile load for LHS and __builtin_constant_p
> > > for RHS to get the same behavior. Emitting these redundant checks is
> > > definitely a bit weird just to emit BTF.
> >
> > I guess we can try
> > #define bpf_assert(LHS, OP, RHS) if (!(LHS OP RHS)) bpf_throw();
> > with barrier_var(LHS) and __builtin_constant_p(RHS) and
> > keep things completely in C,
> > but there is no guarantee that the compiler will not convert == to !=,
> > swap lhs and rhs, etc.
> > Maybe we can have both asm and C style macros, then recommend C to start
> > and switch to asm if things are dodgy.
> > Feels like dangerous ambiguity.
>
> This seems similar to the issue I had with
> __attribute__((cleanup(some_kfunc))) not emitting BTF info for that
> some_kfunc. See bpf_for_each(): just adding
> `(void)bpf_iter_##type##_destroy` seems to make Clang emit BTF info.
>
> It would be nice to have this fixed for the cleanup() attribute and asm,
> of course. But this is a simple workaround.

Good to know, this is cleaner than my solution. But I am planning to
switch to the direct C approach, so it should not be needed anymore.


* Re: [PATCH bpf-next v2 12/14] libbpf: Add support for custom exception callbacks
  2023-08-26 22:41     ` Kumar Kartikeya Dwivedi
@ 2023-08-27 22:27       ` Alexei Starovoitov
  0 siblings, 0 replies; 37+ messages in thread
From: Alexei Starovoitov @ 2023-08-27 22:27 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: Andrii Nakryiko, bpf, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Yonghong Song, David Vernet

On Sat, Aug 26, 2023 at 3:42 PM Kumar Kartikeya Dwivedi
<memxor@gmail.com> wrote:
>
> >
> > Ideally though, would be great if something like this would be
> > supported instead (but I know it's way more complex, Alexei already
> > mentioned that in person and on the list):
> >
> > try (my_callback) {
> >     ... code that throws ...
> > }
> >
> > try (my_other_callback) {
> >     ... some other code that throws ...
> > }
> >
> >
> > This try() macro can be implemented in a form similar to bpf_for() by
> > using fancy for() loop. It would look and feel way more like
> > try/catch.
> >
>
> These try blocks are easier than having a try/catch block which the
> execution jumps to when the exception is thrown. I think the latter
> will involve some form of compiler support, because otherwise there is
> no control flow that is seen by the compiler into the catch block,
> which will mess up things, and I plan to at least explore that approach
> (already looking at LLVM) once I am done with the second part of this
> feature.
>
> Having just these try (callback) {} blocks is easier as we can record
> which subprog corresponds to [begin_ip, end_ip] (per frame) and stop
> unwinding when we find a suitable handler for the ip of a parent
> frame. The callback will be invoked and will return to the parent
> frame (or kernel if it's the main frame). So if this seems like a more
> useful thing, I can make this work and send it out as a follow up to
> this set.

I suspect even "just try {}" will not work without compiler support.
{} is a semantic structure from the compiler's point of view. It has no
delineation in LLVM IR, so it does not exist for optimization passes.


* Re: [PATCH bpf-next v2 05/14] bpf: Add support for custom exception callbacks
  2023-08-09 11:41 ` [PATCH bpf-next v2 05/14] bpf: Add support for custom exception callbacks Kumar Kartikeya Dwivedi
@ 2023-08-28 22:11   ` Martin KaFai Lau
  0 siblings, 0 replies; 37+ messages in thread
From: Martin KaFai Lau @ 2023-08-28 22:11 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Yonghong Song, David Vernet, bpf

On 8/9/23 4:41 AM, Kumar Kartikeya Dwivedi wrote:
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index d0f6c984272b..9d67d0633c59 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -2457,6 +2457,73 @@ static int add_subprog(struct bpf_verifier_env *env, int off)
>   	return env->subprog_cnt - 1;
>   }
>   
> +static int bpf_find_exception_callback_insn_off(struct bpf_verifier_env *env)
> +{
> +	struct bpf_prog_aux *aux = env->prog->aux;
> +	struct btf *btf = aux->btf;
> +	const struct btf_type *t;
> +	const char *name;
> +	u32 main_btf_id;
> +	int ret, i, j;
> +
> +	/* Non-zero func_info_cnt implies valid btf */
> +	if (!aux->func_info_cnt)
> +		return 0;
> +	main_btf_id = aux->func_info[0].type_id;
> +
> +	t = btf_type_by_id(btf, main_btf_id);
> +	if (!t) {
> +		verbose(env, "invalid btf id for main subprog in func_info\n");
> +		return -EINVAL;
> +	}
> +
> +	name = btf_find_decl_tag_value(btf, t, -1, "exception_callback:");
> +	if (IS_ERR(name)) {
> +		ret = PTR_ERR(name);
> +		/* If there is no tag present, there is no exception callback */
> +		if (ret == -ENOENT)
> +			ret = 0;
> +		else if (ret == -EEXIST)
> +			verbose(env, "multiple exception callback tags for main subprog\n");
> +		return ret;
> +	}
> +
> +	ret = -ENOENT;
> +	for (i = 0; i < btf_nr_types(btf); i++) {
> +		t = btf_type_by_id(btf, i);
> +		if (!btf_type_is_func(t))
> +			continue;
> +		if (strcmp(name, btf_name_by_offset(btf, t->name_off)))
> +			continue;

nit. btf_find_by_name_kind() could be used here.


> +		if (btf_func_linkage(t) != BTF_FUNC_GLOBAL) {
> +			verbose(env, "exception callback '%s' must have global linkage\n", name);
> +			return -EINVAL;
> +		}
> +
> +		ret = 0;
> +		for (j = 0; j < aux->func_info_cnt; j++) {
> +			if (aux->func_info[j].type_id != i)
> +				continue;
> +			ret = aux->func_info[j].insn_off;
> +			/* Further func_info and subprog checks will also happen
> +			 * later, so assume this is the right insn_off for now.
> +			 */
> +			if (!ret) {
> +				verbose(env, "invalid exception callback insn_off in func_info: 0\n");
> +				ret = -EINVAL;
> +			}
> +		}
> +		if (!ret) {
> +			verbose(env, "exception callback type id not found in func_info\n");
> +			ret = -EINVAL;
> +		}
> +		break;
> +	}
> +	if (ret == -ENOENT)
> +		verbose(env, "exception callback '%s' could not be found in BTF\n", name);
> +	return ret;
> +}



* Re: [PATCH bpf-next v2 08/14] bpf: Prevent KASAN false positive with bpf_throw
  2023-08-09 11:41 ` [PATCH bpf-next v2 08/14] bpf: Prevent KASAN false positive with bpf_throw Kumar Kartikeya Dwivedi
  2023-08-22 16:23   ` Alexei Starovoitov
@ 2023-08-30 16:53   ` Andrey Konovalov
  1 sibling, 0 replies; 37+ messages in thread
From: Andrey Konovalov @ 2023-08-30 16:53 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, Andrey Ryabinin, Alexander Potapenko, Dmitry Vyukov,
	Vincenzo Frascino, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Yonghong Song, David Vernet

On Wed, Aug 9, 2023 at 1:43 PM Kumar Kartikeya Dwivedi <memxor@gmail.com> wrote:
>

Hi Kumar,

> @@ -283,8 +283,10 @@ static inline bool kasan_check_byte(const void *address)
>
>  #if defined(CONFIG_KASAN) && defined(CONFIG_KASAN_STACK)
>  void kasan_unpoison_task_stack(struct task_struct *task);
> +asmlinkage void kasan_unpoison_task_stack_below(const void *watermark);
>  #else
>  static inline void kasan_unpoison_task_stack(struct task_struct *task) {}
> +static inline void kasan_unpoison_task_stack_below(const void *watermark) {}
>  #endif

Please also drop the kasan_unpoison_task_stack_below declaration from
mm/kasan/kasan.h.

Also, could you please split this change into 2: one that exposes
kasan_unpoison_task_stack_below and another that changes the bpf code.
This will greatly simplify backporting KASAN changes for older
kernels.

Thank you!


end of thread, other threads:[~2023-08-30 16:53 UTC | newest]

Thread overview: 37+ messages
2023-08-09 11:41 [PATCH bpf-next v2 00/14] Exceptions - 1/2 Kumar Kartikeya Dwivedi
2023-08-09 11:41 ` [PATCH bpf-next v2 01/14] arch/x86: Implement arch_bpf_stack_walk Kumar Kartikeya Dwivedi
2023-08-09 11:41 ` [PATCH bpf-next v2 02/14] bpf: Implement support for adding hidden subprogs Kumar Kartikeya Dwivedi
2023-08-09 11:41 ` [PATCH bpf-next v2 03/14] bpf: Implement BPF exceptions Kumar Kartikeya Dwivedi
2023-08-22  5:12   ` Alexei Starovoitov
2023-08-22 12:53     ` Kumar Kartikeya Dwivedi
2023-08-09 11:41 ` [PATCH bpf-next v2 04/14] bpf: Refactor check_btf_func and split into two phases Kumar Kartikeya Dwivedi
2023-08-09 11:41 ` [PATCH bpf-next v2 05/14] bpf: Add support for custom exception callbacks Kumar Kartikeya Dwivedi
2023-08-28 22:11   ` Martin KaFai Lau
2023-08-09 11:41 ` [PATCH bpf-next v2 06/14] bpf: Perform CFG walk for exception callback Kumar Kartikeya Dwivedi
2023-08-09 11:41 ` [PATCH bpf-next v2 07/14] bpf: Treat first argument as return value for bpf_throw Kumar Kartikeya Dwivedi
2023-08-09 11:41 ` [PATCH bpf-next v2 08/14] bpf: Prevent KASAN false positive with bpf_throw Kumar Kartikeya Dwivedi
2023-08-22 16:23   ` Alexei Starovoitov
2023-08-30 16:53   ` Andrey Konovalov
2023-08-09 11:41 ` [PATCH bpf-next v2 09/14] bpf: Detect IP == ksym.end as part of BPF program Kumar Kartikeya Dwivedi
2023-08-09 11:41 ` [PATCH bpf-next v2 10/14] bpf: Disallow extensions to exception callbacks Kumar Kartikeya Dwivedi
2023-08-22  5:09   ` Alexei Starovoitov
2023-08-22 12:53     ` Kumar Kartikeya Dwivedi
2023-08-09 11:41 ` [PATCH bpf-next v2 11/14] bpf: Fix kfunc callback register type handling Kumar Kartikeya Dwivedi
2023-08-10 21:12   ` David Marchevsky
2023-08-09 11:41 ` [PATCH bpf-next v2 12/14] libbpf: Add support for custom exception callbacks Kumar Kartikeya Dwivedi
2023-08-22 16:34   ` Alexei Starovoitov
2023-08-22 16:58     ` Kumar Kartikeya Dwivedi
2023-08-22 19:20       ` Alexei Starovoitov
2023-08-25 18:43   ` Andrii Nakryiko
2023-08-26 22:41     ` Kumar Kartikeya Dwivedi
2023-08-27 22:27       ` Alexei Starovoitov
2023-08-09 11:41 ` [PATCH bpf-next v2 13/14] selftests/bpf: Add BPF assertion macros Kumar Kartikeya Dwivedi
2023-08-09 11:41 ` [PATCH bpf-next v2 14/14] selftests/bpf: Add tests for BPF exceptions Kumar Kartikeya Dwivedi
2023-08-22 21:22 ` [PATCH bpf-next v2 00/14] Exceptions - 1/2 Kumar Kartikeya Dwivedi
2023-08-22 22:07   ` Jose E. Marchesi
2023-08-22 22:39     ` Yonghong Song
2023-08-22 22:53       ` Kumar Kartikeya Dwivedi
2023-08-22 23:06         ` Alexei Starovoitov
2023-08-25 18:55           ` Andrii Nakryiko
2023-08-26 22:42             ` Kumar Kartikeya Dwivedi
2023-08-22 22:54     ` Kumar Kartikeya Dwivedi
