* [PATCH bpf-next v1 0/2] Handle possible NULL trusted raw_tp arguments
@ 2024-11-01 0:00 Kumar Kartikeya Dwivedi
2024-11-01 0:00 ` [PATCH bpf-next v1 1/2] bpf: Mark raw_tp arguments with PTR_MAYBE_NULL Kumar Kartikeya Dwivedi
` (2 more replies)
0 siblings, 3 replies; 19+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2024-11-01 0:00 UTC (permalink / raw)
To: bpf
Cc: kkd, Alexei Starovoitov, Andrii Nakryiko, Martin KaFai Lau,
Eduard Zingerman, Song Liu, Yonghong Song, Steven Rostedt,
Jiri Olsa, Juri Lelli, kernel-team
More context is available in [0], but the TL;DR is that the verifier
incorrectly assumes that any raw tracepoint argument will always be
non-NULL. This means that even when users correctly check for possibly
NULL arguments, the verifier can remove the NULL check due to incorrect
knowledge of the NULL-ness of the pointer. Secondly, kernel helpers or
kfuncs taking these trusted tracepoint arguments incorrectly assume that
all arguments will always be valid, non-NULL pointers.
In this set, we mark raw_tp arguments as PTR_MAYBE_NULL on top of
PTR_TRUSTED, but special case their behavior when dereferencing them or
pointer arithmetic over them is involved. When passing trusted args to
helpers or kfuncs, raw_tp programs are permitted to pass possibly NULL
pointers in such cases.
Any load from such a maybe-NULL trusted PTR_TO_BTF_ID is promoted to a
PROBE_MEM load to handle any resulting page faults. The verifier will ensure
NULL checks on such pointers are preserved and do not lead to dead code
elimination.
This new behavior is not applied when ref_obj_id is non-zero, as those
pointers do not belong to raw_tp arguments, but instead acquired
objects.
Since helpers and kfuncs already require attention for PTR_TO_BTF_ID
(non-trusted) pointers, we do not implement any protection for such
cases in this patch set, and leave it as future work for an upcoming
series.
A selftest is included with this patch set to verify the new behavior,
and it crashes the kernel without the first patch.
[0]: https://lore.kernel.org/bpf/CAADnVQLMPPavJQR6JFsi3dtaaLHB816JN4HCV_TFWohJ61D+wQ@mail.gmail.com
Kumar Kartikeya Dwivedi (2):
bpf: Mark raw_tp arguments with PTR_MAYBE_NULL
selftests/bpf: Add tests for raw_tp null handling
include/linux/bpf.h | 6 ++
kernel/bpf/btf.c | 5 +-
kernel/bpf/verifier.c | 75 +++++++++++++++++--
.../bpf/bpf_testmod/bpf_testmod-events.h | 8 ++
.../selftests/bpf/bpf_testmod/bpf_testmod.c | 2 +
.../selftests/bpf/prog_tests/raw_tp_null.c | 25 +++++++
.../testing/selftests/bpf/progs/raw_tp_null.c | 27 +++++++
.../bpf/progs/test_tp_btf_nullable.c | 6 +-
8 files changed, 145 insertions(+), 9 deletions(-)
create mode 100644 tools/testing/selftests/bpf/prog_tests/raw_tp_null.c
create mode 100644 tools/testing/selftests/bpf/progs/raw_tp_null.c
base-commit: e626a13f6fbb4697f8734333432dca577628d09a
--
2.43.5
^ permalink raw reply [flat|nested] 19+ messages in thread
* [PATCH bpf-next v1 1/2] bpf: Mark raw_tp arguments with PTR_MAYBE_NULL
2024-11-01 0:00 [PATCH bpf-next v1 0/2] Handle possible NULL trusted raw_tp arguments Kumar Kartikeya Dwivedi
@ 2024-11-01 0:00 ` Kumar Kartikeya Dwivedi
2024-11-01 19:16 ` Andrii Nakryiko
2024-11-01 0:00 ` [PATCH bpf-next v1 2/2] selftests/bpf: Add tests for raw_tp null handling Kumar Kartikeya Dwivedi
2024-11-01 13:18 ` [PATCH bpf-next v1 0/2] Handle possible NULL trusted raw_tp arguments Kumar Kartikeya Dwivedi
2 siblings, 1 reply; 19+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2024-11-01 0:00 UTC (permalink / raw)
To: bpf
Cc: kkd, Juri Lelli, Alexei Starovoitov, Andrii Nakryiko,
Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
Steven Rostedt, Jiri Olsa, kernel-team
Arguments to a raw tracepoint are tagged as trusted, which carries the
semantics that the pointer will be non-NULL. However, in certain cases,
a raw tracepoint argument may end up being NULL. More context about this
issue is available in [0].
Thus, there is a discrepancy between the reality, that raw_tp arguments
can actually be NULL, and the verifier's knowledge, that they are never
NULL, causing explicit NULL checks to be deleted, and accesses to such
pointers potentially crashing the kernel.
To fix this, mark raw_tp arguments as PTR_MAYBE_NULL, and then special
case the dereference and pointer arithmetic to permit it, and allow
passing them into helpers/kfuncs; these exceptions are made for raw_tp
programs only. Ensure that we don't do this when ref_obj_id > 0, as in
that case this is an acquired object and doesn't need such adjustment.
The reason we need the mask_raw_tp_reg logic is that other code will
recheck in places whether the register is a trusted_reg, and would then
consider our register as untrusted upon detecting the presence of the
PTR_MAYBE_NULL flag.
To allow safe dereference, we enable PROBE_MEM marking when we see loads
into trusted pointers with PTR_MAYBE_NULL.
While trusted raw_tp arguments can also be passed into helpers or kfuncs
where such a broken assumption may cause issues, a future patch set will
tackle their case separately, as PTR_TO_BTF_ID (without PTR_TRUSTED) can
already be passed into helpers and cause similar problems. Thus, they
are left alone for now.
It is possible that these checks also permit passing non-raw_tp args
that are trusted PTR_TO_BTF_ID with null marking. In such a case,
allowing dereference when the pointer is NULL expands allowed behavior,
so it won't regress existing programs, and the case of passing these into
helpers is the same as above and will be dealt with later.
Also update the failure case in tp_btf_nullable selftest to capture the
new behavior, as the verifier will no longer cause an error when
directly dereferencing a raw tracepoint argument marked as __nullable.
[0]: https://lore.kernel.org/bpf/ZrCZS6nisraEqehw@jlelli-thinkpadt14gen4.remote.csb
Reported-by: Juri Lelli <juri.lelli@redhat.com>
Fixes: 3f00c5239344 ("bpf: Allow trusted pointers to be passed to KF_TRUSTED_ARGS kfuncs")
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
include/linux/bpf.h | 6 ++
kernel/bpf/btf.c | 5 +-
kernel/bpf/verifier.c | 75 +++++++++++++++++--
.../bpf/progs/test_tp_btf_nullable.c | 6 +-
4 files changed, 83 insertions(+), 9 deletions(-)
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index c3ba4d475174..1b84613b10ac 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -3495,4 +3495,10 @@ static inline bool bpf_is_subprog(const struct bpf_prog *prog)
return prog->aux->func_idx != 0;
}
+static inline bool bpf_prog_is_raw_tp(const struct bpf_prog *prog)
+{
+ return prog->type == BPF_PROG_TYPE_TRACING &&
+ prog->expected_attach_type == BPF_TRACE_RAW_TP;
+}
+
#endif /* _LINUX_BPF_H */
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index ed3219da7181..e7a59e6462a9 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -6588,7 +6588,10 @@ bool btf_ctx_access(int off, int size, enum bpf_access_type type,
if (prog_args_trusted(prog))
info->reg_type |= PTR_TRUSTED;
- if (btf_param_match_suffix(btf, &args[arg], "__nullable"))
+ /* Raw tracepoint arguments always get marked as maybe NULL */
+ if (bpf_prog_is_raw_tp(prog))
+ info->reg_type |= PTR_MAYBE_NULL;
+ else if (btf_param_match_suffix(btf, &args[arg], "__nullable"))
info->reg_type |= PTR_MAYBE_NULL;
if (tgt_prog) {
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 797cf3ed32e0..36776624710f 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -418,6 +418,21 @@ static struct btf_record *reg_btf_record(const struct bpf_reg_state *reg)
return rec;
}
+static bool mask_raw_tp_reg(const struct bpf_verifier_env *env, struct bpf_reg_state *reg)
+{
+ if (reg->type != (PTR_TO_BTF_ID | PTR_TRUSTED | PTR_MAYBE_NULL) ||
+ !bpf_prog_is_raw_tp(env->prog) || reg->ref_obj_id)
+ return false;
+ reg->type &= ~PTR_MAYBE_NULL;
+ return true;
+}
+
+static void unmask_raw_tp_reg(struct bpf_reg_state *reg, bool result)
+{
+ if (result)
+ reg->type |= PTR_MAYBE_NULL;
+}
+
static bool subprog_is_global(const struct bpf_verifier_env *env, int subprog)
{
struct bpf_func_info_aux *aux = env->prog->aux->func_info_aux;
@@ -6622,6 +6637,7 @@ static int check_ptr_to_btf_access(struct bpf_verifier_env *env,
const char *field_name = NULL;
enum bpf_type_flag flag = 0;
u32 btf_id = 0;
+ bool mask;
int ret;
if (!env->allow_ptr_leaks) {
@@ -6693,7 +6709,21 @@ static int check_ptr_to_btf_access(struct bpf_verifier_env *env,
if (ret < 0)
return ret;
-
+ /* For raw_tp progs, we allow dereference of PTR_MAYBE_NULL
+ * trusted PTR_TO_BTF_ID, these are the ones that are possibly
+ * arguments to the raw_tp. Since internal checks for trusted
+ * reg in check_ptr_to_btf_access would consider PTR_MAYBE_NULL
+ * modifier as problematic, mask it out temporarily for the
+ * check. Don't apply this to pointers with ref_obj_id > 0, as
+ * those won't be raw_tp args.
+ *
+ * We may end up applying this relaxation to other trusted
+ * PTR_TO_BTF_ID with maybe null flag, since we cannot
+ * distinguish PTR_MAYBE_NULL tagged for arguments vs normal
+ * tagging, but that should expand allowed behavior, and not
+ * cause regression for existing behavior.
+ */
+ mask = mask_raw_tp_reg(env, reg);
if (ret != PTR_TO_BTF_ID) {
/* just mark; */
@@ -6754,8 +6784,13 @@ static int check_ptr_to_btf_access(struct bpf_verifier_env *env,
clear_trusted_flags(&flag);
}
- if (atype == BPF_READ && value_regno >= 0)
+ if (atype == BPF_READ && value_regno >= 0) {
mark_btf_ld_reg(env, regs, value_regno, ret, reg->btf, btf_id, flag);
+ /* We've assigned a new type to regno, so don't undo masking. */
+ if (regno == value_regno)
+ mask = false;
+ }
+ unmask_raw_tp_reg(reg, mask);
return 0;
}
@@ -7140,7 +7175,7 @@ static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regn
if (!err && t == BPF_READ && value_regno >= 0)
mark_reg_unknown(env, regs, value_regno);
} else if (base_type(reg->type) == PTR_TO_BTF_ID &&
- !type_may_be_null(reg->type)) {
+ (bpf_prog_is_raw_tp(env->prog) || !type_may_be_null(reg->type))) {
err = check_ptr_to_btf_access(env, regs, regno, off, size, t,
value_regno);
} else if (reg->type == CONST_PTR_TO_MAP) {
@@ -8833,6 +8868,7 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
enum bpf_reg_type type = reg->type;
u32 *arg_btf_id = NULL;
int err = 0;
+ bool mask;
if (arg_type == ARG_DONTCARE)
return 0;
@@ -8873,11 +8909,11 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
base_type(arg_type) == ARG_PTR_TO_SPIN_LOCK)
arg_btf_id = fn->arg_btf_id[arg];
+ mask = mask_raw_tp_reg(env, reg);
err = check_reg_type(env, regno, arg_type, arg_btf_id, meta);
- if (err)
- return err;
- err = check_func_arg_reg_off(env, reg, regno, arg_type);
+ err = err ?: check_func_arg_reg_off(env, reg, regno, arg_type);
+ unmask_raw_tp_reg(reg, mask);
if (err)
return err;
@@ -9672,14 +9708,17 @@ static int btf_check_func_arg_match(struct bpf_verifier_env *env, int subprog,
return ret;
} else if (base_type(arg->arg_type) == ARG_PTR_TO_BTF_ID) {
struct bpf_call_arg_meta meta;
+ bool mask;
int err;
if (register_is_null(reg) && type_may_be_null(arg->arg_type))
continue;
memset(&meta, 0, sizeof(meta)); /* leave func_id as zero */
+ mask = mask_raw_tp_reg(env, reg);
err = check_reg_type(env, regno, arg->arg_type, &arg->btf_id, &meta);
err = err ?: check_func_arg_reg_off(env, reg, regno, arg->arg_type);
+ unmask_raw_tp_reg(reg, mask);
if (err)
return err;
} else {
@@ -11981,6 +12020,7 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
enum bpf_arg_type arg_type = ARG_DONTCARE;
u32 regno = i + 1, ref_id, type_size;
bool is_ret_buf_sz = false;
+ bool mask = false;
int kf_arg_type;
t = btf_type_skip_modifiers(btf, args[i].type, NULL);
@@ -12039,12 +12079,15 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
return -EINVAL;
}
+ mask = mask_raw_tp_reg(env, reg);
if ((is_kfunc_trusted_args(meta) || is_kfunc_rcu(meta)) &&
(register_is_null(reg) || type_may_be_null(reg->type)) &&
!is_kfunc_arg_nullable(meta->btf, &args[i])) {
verbose(env, "Possibly NULL pointer passed to trusted arg%d\n", i);
+ unmask_raw_tp_reg(reg, mask);
return -EACCES;
}
+ unmask_raw_tp_reg(reg, mask);
if (reg->ref_obj_id) {
if (is_kfunc_release(meta) && meta->ref_obj_id) {
@@ -12102,16 +12145,24 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
if (!is_kfunc_trusted_args(meta) && !is_kfunc_rcu(meta))
break;
+ /* Allow passing maybe NULL raw_tp arguments to
+ * kfuncs for compatibility. Don't apply this to
+ * arguments with ref_obj_id > 0.
+ */
+ mask = mask_raw_tp_reg(env, reg);
if (!is_trusted_reg(reg)) {
if (!is_kfunc_rcu(meta)) {
verbose(env, "R%d must be referenced or trusted\n", regno);
+ unmask_raw_tp_reg(reg, mask);
return -EINVAL;
}
if (!is_rcu_reg(reg)) {
verbose(env, "R%d must be a rcu pointer\n", regno);
+ unmask_raw_tp_reg(reg, mask);
return -EINVAL;
}
}
+ unmask_raw_tp_reg(reg, mask);
fallthrough;
case KF_ARG_PTR_TO_CTX:
case KF_ARG_PTR_TO_DYNPTR:
@@ -12134,7 +12185,9 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
if (is_kfunc_release(meta) && reg->ref_obj_id)
arg_type |= OBJ_RELEASE;
+ mask = mask_raw_tp_reg(env, reg);
ret = check_func_arg_reg_off(env, reg, regno, arg_type);
+ unmask_raw_tp_reg(reg, mask);
if (ret < 0)
return ret;
@@ -12311,6 +12364,7 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
ref_tname = btf_name_by_offset(btf, ref_t->name_off);
fallthrough;
case KF_ARG_PTR_TO_BTF_ID:
+ mask = mask_raw_tp_reg(env, reg);
/* Only base_type is checked, further checks are done here */
if ((base_type(reg->type) != PTR_TO_BTF_ID ||
(bpf_type_has_unsafe_modifiers(reg->type) && !is_rcu_reg(reg))) &&
@@ -12319,9 +12373,11 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
verbose(env, "expected %s or socket\n",
reg_type_str(env, base_type(reg->type) |
(type_flag(reg->type) & BPF_REG_TRUSTED_MODIFIERS)));
+ unmask_raw_tp_reg(reg, mask);
return -EINVAL;
}
ret = process_kf_arg_ptr_to_btf_id(env, reg, ref_t, ref_tname, ref_id, meta, i);
+ unmask_raw_tp_reg(reg, mask);
if (ret < 0)
return ret;
break;
@@ -13294,7 +13350,7 @@ static int sanitize_check_bounds(struct bpf_verifier_env *env,
*/
static int adjust_ptr_min_max_vals(struct bpf_verifier_env *env,
struct bpf_insn *insn,
- const struct bpf_reg_state *ptr_reg,
+ struct bpf_reg_state *ptr_reg,
const struct bpf_reg_state *off_reg)
{
struct bpf_verifier_state *vstate = env->cur_state;
@@ -13308,6 +13364,7 @@ static int adjust_ptr_min_max_vals(struct bpf_verifier_env *env,
struct bpf_sanitize_info info = {};
u8 opcode = BPF_OP(insn->code);
u32 dst = insn->dst_reg;
+ bool mask;
int ret;
dst_reg = &regs[dst];
@@ -13334,11 +13391,14 @@ static int adjust_ptr_min_max_vals(struct bpf_verifier_env *env,
return -EACCES;
}
+ mask = mask_raw_tp_reg(env, ptr_reg);
if (ptr_reg->type & PTR_MAYBE_NULL) {
verbose(env, "R%d pointer arithmetic on %s prohibited, null-check it first\n",
dst, reg_type_str(env, ptr_reg->type));
+ unmask_raw_tp_reg(ptr_reg, mask);
return -EACCES;
}
+ unmask_raw_tp_reg(ptr_reg, mask);
switch (base_type(ptr_reg->type)) {
case PTR_TO_CTX:
@@ -19873,6 +19933,7 @@ static int convert_ctx_accesses(struct bpf_verifier_env *env)
* for this case.
*/
case PTR_TO_BTF_ID | MEM_ALLOC | PTR_UNTRUSTED:
+ case PTR_TO_BTF_ID | PTR_TRUSTED | PTR_MAYBE_NULL:
if (type == BPF_READ) {
if (BPF_MODE(insn->code) == BPF_MEM)
insn->code = BPF_LDX | BPF_PROBE_MEM |
diff --git a/tools/testing/selftests/bpf/progs/test_tp_btf_nullable.c b/tools/testing/selftests/bpf/progs/test_tp_btf_nullable.c
index bba3e37f749b..5aaf2b065f86 100644
--- a/tools/testing/selftests/bpf/progs/test_tp_btf_nullable.c
+++ b/tools/testing/selftests/bpf/progs/test_tp_btf_nullable.c
@@ -7,7 +7,11 @@
#include "bpf_misc.h"
SEC("tp_btf/bpf_testmod_test_nullable_bare")
-__failure __msg("R1 invalid mem access 'trusted_ptr_or_null_'")
+/* This used to be a failure test, but raw_tp nullable arguments can now
+ * directly be dereferenced, whether they have nullable annotation or not,
+ * and don't need to be explicitly checked.
+ */
+__success
int BPF_PROG(handle_tp_btf_nullable_bare1, struct bpf_testmod_test_read_ctx *nullable_ctx)
{
return nullable_ctx->len;
--
2.43.5
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH bpf-next v1 2/2] selftests/bpf: Add tests for raw_tp null handling
2024-11-01 0:00 [PATCH bpf-next v1 0/2] Handle possible NULL trusted raw_tp arguments Kumar Kartikeya Dwivedi
2024-11-01 0:00 ` [PATCH bpf-next v1 1/2] bpf: Mark raw_tp arguments with PTR_MAYBE_NULL Kumar Kartikeya Dwivedi
@ 2024-11-01 0:00 ` Kumar Kartikeya Dwivedi
2024-11-01 19:19 ` Andrii Nakryiko
2024-11-01 13:18 ` [PATCH bpf-next v1 0/2] Handle possible NULL trusted raw_tp arguments Kumar Kartikeya Dwivedi
2 siblings, 1 reply; 19+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2024-11-01 0:00 UTC (permalink / raw)
To: bpf
Cc: kkd, Alexei Starovoitov, Andrii Nakryiko, Martin KaFai Lau,
Eduard Zingerman, Song Liu, Yonghong Song, Steven Rostedt,
Jiri Olsa, Juri Lelli, kernel-team
Ensure that trusted PTR_TO_BTF_ID accesses perform PROBE_MEM handling in
a raw_tp program. Without the previous fix, this selftest crashes the
kernel due to a NULL-pointer dereference. Also ensure that dead code
elimination does not kick in for checks on the pointer.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
.../bpf/bpf_testmod/bpf_testmod-events.h | 8 ++++++
.../selftests/bpf/bpf_testmod/bpf_testmod.c | 2 ++
.../selftests/bpf/prog_tests/raw_tp_null.c | 25 +++++++++++++++++
.../testing/selftests/bpf/progs/raw_tp_null.c | 27 +++++++++++++++++++
4 files changed, 62 insertions(+)
create mode 100644 tools/testing/selftests/bpf/prog_tests/raw_tp_null.c
create mode 100644 tools/testing/selftests/bpf/progs/raw_tp_null.c
diff --git a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod-events.h b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod-events.h
index 6c3b4d4f173a..aeef86b3da74 100644
--- a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod-events.h
+++ b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod-events.h
@@ -40,6 +40,14 @@ DECLARE_TRACE(bpf_testmod_test_nullable_bare,
TP_ARGS(ctx__nullable)
);
+struct sk_buff;
+
+DECLARE_TRACE(bpf_testmod_test_raw_tp_null,
+ TP_PROTO(struct sk_buff *skb),
+ TP_ARGS(skb)
+);
+
+
#undef BPF_TESTMOD_DECLARE_TRACE
#ifdef DECLARE_TRACE_WRITABLE
#define BPF_TESTMOD_DECLARE_TRACE(call, proto, args, size) \
diff --git a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c
index 8835761d9a12..4e6a9e9c0368 100644
--- a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c
+++ b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c
@@ -380,6 +380,8 @@ bpf_testmod_test_read(struct file *file, struct kobject *kobj,
(void)bpf_testmod_test_arg_ptr_to_struct(&struct_arg1_2);
+ (void)trace_bpf_testmod_test_raw_tp_null(NULL);
+
struct_arg3 = kmalloc((sizeof(struct bpf_testmod_struct_arg_3) +
sizeof(int)), GFP_KERNEL);
if (struct_arg3 != NULL) {
diff --git a/tools/testing/selftests/bpf/prog_tests/raw_tp_null.c b/tools/testing/selftests/bpf/prog_tests/raw_tp_null.c
new file mode 100644
index 000000000000..b9068fee7d8a
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/raw_tp_null.c
@@ -0,0 +1,25 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2024 Meta Platforms, Inc. and affiliates. */
+
+#include <test_progs.h>
+#include "raw_tp_null.skel.h"
+
+void test_raw_tp_null(void)
+{
+ struct raw_tp_null *skel;
+
+ skel = raw_tp_null__open_and_load();
+ if (!ASSERT_OK_PTR(skel, "raw_tp_null__open_and_load"))
+ return;
+
+ skel->bss->tid = gettid();
+
+ if (!ASSERT_OK(raw_tp_null__attach(skel), "raw_tp_null__attach"))
+ goto end;
+
+ ASSERT_OK(trigger_module_test_read(2), "trigger testmod read");
+ ASSERT_EQ(skel->bss->i, 3, "invocations");
+
+end:
+ raw_tp_null__destroy(skel);
+}
diff --git a/tools/testing/selftests/bpf/progs/raw_tp_null.c b/tools/testing/selftests/bpf/progs/raw_tp_null.c
new file mode 100644
index 000000000000..c7c9ad4ec3b7
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/raw_tp_null.c
@@ -0,0 +1,27 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2024 Meta Platforms, Inc. and affiliates. */
+
+#include <vmlinux.h>
+#include <bpf/bpf_tracing.h>
+
+char _license[] SEC("license") = "GPL";
+
+int tid;
+int i;
+
+SEC("tp_btf/bpf_testmod_test_raw_tp_null")
+int BPF_PROG(test_raw_tp_null, struct sk_buff *skb)
+{
+ if (bpf_get_current_task_btf()->pid == tid) {
+ i = i + skb->mark + 1;
+
+ /* If dead code elimination kicks in, the increment below will
+ * be removed. For raw_tp programs, we mark input arguments as
+ * PTR_MAYBE_NULL, so branch prediction should never kick in.
+ */
+ if (!skb)
+ i += 2;
+ }
+
+ return 0;
+}
--
2.43.5
^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: [PATCH bpf-next v1 0/2] Handle possible NULL trusted raw_tp arguments
2024-11-01 0:00 [PATCH bpf-next v1 0/2] Handle possible NULL trusted raw_tp arguments Kumar Kartikeya Dwivedi
2024-11-01 0:00 ` [PATCH bpf-next v1 1/2] bpf: Mark raw_tp arguments with PTR_MAYBE_NULL Kumar Kartikeya Dwivedi
2024-11-01 0:00 ` [PATCH bpf-next v1 2/2] selftests/bpf: Add tests for raw_tp null handling Kumar Kartikeya Dwivedi
@ 2024-11-01 13:18 ` Kumar Kartikeya Dwivedi
2024-11-02 0:21 ` Eduard Zingerman
2 siblings, 1 reply; 19+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2024-11-01 13:18 UTC (permalink / raw)
To: bpf
Cc: kkd, Alexei Starovoitov, Andrii Nakryiko, Martin KaFai Lau,
Eduard Zingerman, Song Liu, Yonghong Song, Steven Rostedt,
Jiri Olsa, Juri Lelli, kernel-team
On Fri, 1 Nov 2024 at 01:00, Kumar Kartikeya Dwivedi <memxor@gmail.com> wrote:
>
> More context is available in [0], but the TL;DR is that the verifier
> incorrectly assumes that any raw tracepoint argument will always be
> non-NULL. This means that even when users correctly check for possibly
> NULL arguments, the verifier can remove the NULL check due to incorrect
> knowledge of the NULL-ness of the pointer. Secondly, kernel helpers or
> kfuncs taking these trusted tracepoint arguments incorrectly assume that
> all arguments will always be valid, non-NULL pointers.
>
> In this set, we mark raw_tp arguments as PTR_MAYBE_NULL on top of
> PTR_TRUSTED, but special case their behavior when dereferencing them or
> pointer arithmetic over them is involved. When passing trusted args to
> helpers or kfuncs, raw_tp programs are permitted to pass possibly NULL
> pointers in such cases.
>
> Any load from such a maybe-NULL trusted PTR_TO_BTF_ID is promoted to a
> PROBE_MEM load to handle any resulting page faults. The verifier will ensure
> NULL checks on such pointers are preserved and do not lead to dead code
> elimination.
>
> This new behavior is not applied when ref_obj_id is non-zero, as those
> pointers do not belong to raw_tp arguments, but instead acquired
> objects.
>
> Since helpers and kfuncs already require attention for PTR_TO_BTF_ID
> (non-trusted) pointers, we do not implement any protection for such
> cases in this patch set, and leave it as future work for an upcoming
> series.
>
> A selftest is included with this patch set to verify the new behavior,
> and it crashes the kernel without the first patch.
I see that all selftests except one passed. The one that didn't pass
appears to have been cancelled after running for an hour, having stalled
after select_reuseport:OK.
Looking at the LLVM 18
(https://github.com/kernel-patches/bpf/actions/runs/11621768944/job/32366412581?pr=7999)
run instead of LLVM 17
(https://github.com/kernel-patches/bpf/actions/runs/11621768944/job/32366400714?pr=7999,
which failed), it seems the next test is send_signal_tracepoint.
Is this known to be flaky? I'm guessing not and it is probably caused
by my patch, but just want to confirm before I begin debugging.
>
> [...]
>
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH bpf-next v1 1/2] bpf: Mark raw_tp arguments with PTR_MAYBE_NULL
2024-11-01 0:00 ` [PATCH bpf-next v1 1/2] bpf: Mark raw_tp arguments with PTR_MAYBE_NULL Kumar Kartikeya Dwivedi
@ 2024-11-01 19:16 ` Andrii Nakryiko
2024-11-01 22:55 ` Alexei Starovoitov
0 siblings, 1 reply; 19+ messages in thread
From: Andrii Nakryiko @ 2024-11-01 19:16 UTC (permalink / raw)
To: Kumar Kartikeya Dwivedi
Cc: bpf, kkd, Juri Lelli, Alexei Starovoitov, Andrii Nakryiko,
Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
Steven Rostedt, Jiri Olsa, kernel-team
On Thu, Oct 31, 2024 at 5:00 PM Kumar Kartikeya Dwivedi
<memxor@gmail.com> wrote:
>
> Arguments to a raw tracepoint are tagged as trusted, which carries the
> semantics that the pointer will be non-NULL. However, in certain cases,
> a raw tracepoint argument may end up being NULL. More context about this
> issue is available in [0].
>
> Thus, there is a discrepancy between the reality, that raw_tp arguments
> can actually be NULL, and the verifier's knowledge, that they are never
> NULL, causing explicit NULL checks to be deleted, and accesses to such
> pointers potentially crashing the kernel.
>
> To fix this, mark raw_tp arguments as PTR_MAYBE_NULL, and then special
> case the dereference and pointer arithmetic to permit it, and allow
> passing them into helpers/kfuncs; these exceptions are made for raw_tp
> programs only. Ensure that we don't do this when ref_obj_id > 0, as in
> that case this is an acquired object and doesn't need such adjustment.
>
> The reason we need the mask_raw_tp_reg logic is that other code will
> recheck in places whether the register is a trusted_reg, and would then
> consider our register as untrusted upon detecting the presence of the
> PTR_MAYBE_NULL flag.
>
> To allow safe dereference, we enable PROBE_MEM marking when we see loads
> into trusted pointers with PTR_MAYBE_NULL.
>
> While trusted raw_tp arguments can also be passed into helpers or kfuncs
> where such a broken assumption may cause issues, a future patch set will
> tackle their case separately, as PTR_TO_BTF_ID (without PTR_TRUSTED) can
> already be passed into helpers and cause similar problems. Thus, they
> are left alone for now.
>
> It is possible that these checks also permit passing non-raw_tp args
> that are trusted PTR_TO_BTF_ID with null marking. In such a case,
> allowing dereference when the pointer is NULL expands allowed behavior,
> so it won't regress existing programs, and the case of passing these into
> helpers is the same as above and will be dealt with later.
>
> Also update the failure case in tp_btf_nullable selftest to capture the
> new behavior, as the verifier will no longer cause an error when
> directly dereferencing a raw tracepoint argument marked as __nullable.
>
> [0]: https://lore.kernel.org/bpf/ZrCZS6nisraEqehw@jlelli-thinkpadt14gen4.remote.csb
>
> Reported-by: Juri Lelli <juri.lelli@redhat.com>
> Fixes: 3f00c5239344 ("bpf: Allow trusted pointers to be passed to KF_TRUSTED_ARGS kfuncs")
> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> ---
> include/linux/bpf.h | 6 ++
> kernel/bpf/btf.c | 5 +-
> kernel/bpf/verifier.c | 75 +++++++++++++++++--
> .../bpf/progs/test_tp_btf_nullable.c | 6 +-
> 4 files changed, 83 insertions(+), 9 deletions(-)
>
[...]
> @@ -6693,7 +6709,21 @@ static int check_ptr_to_btf_access(struct bpf_verifier_env *env,
>
> if (ret < 0)
> return ret;
> -
> + /* For raw_tp progs, we allow dereference of PTR_MAYBE_NULL
> + * trusted PTR_TO_BTF_ID, these are the ones that are possibly
> + * arguments to the raw_tp. Since internal checks in for trusted
> + * reg in check_ptr_to_btf_access would consider PTR_MAYBE_NULL
> + * modifier as problematic, mask it out temporarily for the
> + * check. Don't apply this to pointers with ref_obj_id > 0, as
> + * those won't be raw_tp args.
> + *
> + * We may end up applying this relaxation to other trusted
> + * PTR_TO_BTF_ID with maybe null flag, since we cannot
> + * distinguish PTR_MAYBE_NULL tagged for arguments vs normal
> + * tagging, but that should expand allowed behavior, and not
> + * cause regression for existing behavior.
> + */
Yeah, I'm not sure why this has to be raw tp-specific?.. What's wrong
with the same behavior for BPF iterator programs, for example?
It seems nicer if we can avoid this temporary masking and instead
support this as a generic functionality? Or are there complications?
> + mask = mask_raw_tp_reg(env, reg);
> if (ret != PTR_TO_BTF_ID) {
> /* just mark; */
>
> @@ -6754,8 +6784,13 @@ static int check_ptr_to_btf_access(struct bpf_verifier_env *env,
> clear_trusted_flags(&flag);
> }
>
> - if (atype == BPF_READ && value_regno >= 0)
> + if (atype == BPF_READ && value_regno >= 0) {
> mark_btf_ld_reg(env, regs, value_regno, ret, reg->btf, btf_id, flag);
> + /* We've assigned a new type to regno, so don't undo masking. */
> + if (regno == value_regno)
> + mask = false;
> + }
> + unmask_raw_tp_reg(reg, mask);
>
> return 0;
> }
[...]
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH bpf-next v1 2/2] selftests/bpf: Add tests for raw_tp null handling
2024-11-01 0:00 ` [PATCH bpf-next v1 2/2] selftests/bpf: Add tests for raw_tp null handling Kumar Kartikeya Dwivedi
@ 2024-11-01 19:19 ` Andrii Nakryiko
2024-11-03 15:58 ` Kumar Kartikeya Dwivedi
0 siblings, 1 reply; 19+ messages in thread
From: Andrii Nakryiko @ 2024-11-01 19:19 UTC (permalink / raw)
To: Kumar Kartikeya Dwivedi
Cc: bpf, kkd, Alexei Starovoitov, Andrii Nakryiko, Martin KaFai Lau,
Eduard Zingerman, Song Liu, Yonghong Song, Steven Rostedt,
Jiri Olsa, Juri Lelli, kernel-team
On Thu, Oct 31, 2024 at 5:00 PM Kumar Kartikeya Dwivedi
<memxor@gmail.com> wrote:
>
> Ensure that trusted PTR_TO_BTF_ID accesses perform PROBE_MEM handling in
> a raw_tp program. Without the previous fix, this selftest crashes the
> kernel due to a NULL-pointer dereference. Also ensure that dead code
> elimination does not kick in for checks on the pointer.
>
> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> ---
> .../bpf/bpf_testmod/bpf_testmod-events.h | 8 ++++++
> .../selftests/bpf/bpf_testmod/bpf_testmod.c | 2 ++
> .../selftests/bpf/prog_tests/raw_tp_null.c | 25 +++++++++++++++++
> .../testing/selftests/bpf/progs/raw_tp_null.c | 27 +++++++++++++++++++
> 4 files changed, 62 insertions(+)
> create mode 100644 tools/testing/selftests/bpf/prog_tests/raw_tp_null.c
> create mode 100644 tools/testing/selftests/bpf/progs/raw_tp_null.c
>
> diff --git a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod-events.h b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod-events.h
> index 6c3b4d4f173a..aeef86b3da74 100644
> --- a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod-events.h
> +++ b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod-events.h
> @@ -40,6 +40,14 @@ DECLARE_TRACE(bpf_testmod_test_nullable_bare,
> TP_ARGS(ctx__nullable)
> );
>
> +struct sk_buff;
> +
> +DECLARE_TRACE(bpf_testmod_test_raw_tp_null,
> + TP_PROTO(struct sk_buff *skb),
> + TP_ARGS(skb)
> +);
> +
> +
> #undef BPF_TESTMOD_DECLARE_TRACE
> #ifdef DECLARE_TRACE_WRITABLE
> #define BPF_TESTMOD_DECLARE_TRACE(call, proto, args, size) \
> diff --git a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c
> index 8835761d9a12..4e6a9e9c0368 100644
> --- a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c
> +++ b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c
> @@ -380,6 +380,8 @@ bpf_testmod_test_read(struct file *file, struct kobject *kobj,
>
> (void)bpf_testmod_test_arg_ptr_to_struct(&struct_arg1_2);
>
> + (void)trace_bpf_testmod_test_raw_tp_null(NULL);
> +
> struct_arg3 = kmalloc((sizeof(struct bpf_testmod_struct_arg_3) +
> sizeof(int)), GFP_KERNEL);
> if (struct_arg3 != NULL) {
> diff --git a/tools/testing/selftests/bpf/prog_tests/raw_tp_null.c b/tools/testing/selftests/bpf/prog_tests/raw_tp_null.c
> new file mode 100644
> index 000000000000..b9068fee7d8a
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/prog_tests/raw_tp_null.c
> @@ -0,0 +1,25 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/* Copyright (c) 2024 Meta Platforms, Inc. and affiliates. */
> +
> +#include <test_progs.h>
> +#include "raw_tp_null.skel.h"
> +
> +void test_raw_tp_null(void)
> +{
> + struct raw_tp_null *skel;
> +
> + skel = raw_tp_null__open_and_load();
> + if (!ASSERT_OK_PTR(skel, "raw_tp_null__open_and_load"))
> + return;
> +
> + skel->bss->tid = gettid();
this is not available everywhere, we just recently had a fix. Other
tests call syscall() directly. It might be time to add a macro in one of
the helper headers, though. But that can be done as a separate cleanup
patch outside of this change (there is enough to review and discuss)
> +
> + if (!ASSERT_OK(raw_tp_null__attach(skel), "raw_tp_null__attach"))
> + goto end;
> +
> + ASSERT_OK(trigger_module_test_read(2), "trigger testmod read");
> + ASSERT_EQ(skel->bss->i, 3, "invocations");
> +
> +end:
> + raw_tp_null__destroy(skel);
> +}
> diff --git a/tools/testing/selftests/bpf/progs/raw_tp_null.c b/tools/testing/selftests/bpf/progs/raw_tp_null.c
> new file mode 100644
> index 000000000000..c7c9ad4ec3b7
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/progs/raw_tp_null.c
> @@ -0,0 +1,27 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/* Copyright (c) 2024 Meta Platforms, Inc. and affiliates. */
> +
> +#include <vmlinux.h>
> +#include <bpf/bpf_tracing.h>
> +
> +char _license[] SEC("license") = "GPL";
> +
> +int tid;
> +int i;
> +
> +SEC("tp_btf/bpf_testmod_test_raw_tp_null")
> +int BPF_PROG(test_raw_tp_null, struct sk_buff *skb)
> +{
> + if (bpf_get_current_task_btf()->pid == tid) {
nit: avoid unnecessary nesting. Check the condition and return early. Also,
it seems nicer to have a task_struct local variable for this, tbh:
struct task_struct *t = bpf_get_current_task_btf();
if (t->pid != tid)
return 0;
/* the rest follows, unnested */
> + i = i + skb->mark + 1;
> +
> + /* If dead code elimination kicks in, the increment below will
> + * be removed. For raw_tp programs, we mark input arguments as
> + * PTR_MAYBE_NULL, so branch prediction should never kick in.
> + */
> + if (!skb)
> + i += 2;
> + }
> +
> + return 0;
> +}
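For what it's worth, folding the early-return suggestion above into this program would give roughly the following sketch (untested; same tracepoint, globals, and NULL-check comment as in the patch):

```c
SEC("tp_btf/bpf_testmod_test_raw_tp_null")
int BPF_PROG(test_raw_tp_null, struct sk_buff *skb)
{
	struct task_struct *t = bpf_get_current_task_btf();

	if (t->pid != tid)
		return 0;

	i = i + skb->mark + 1;

	/* If dead code elimination kicks in, the increment below will be
	 * removed. Marking raw_tp arguments as PTR_MAYBE_NULL must keep
	 * this branch alive.
	 */
	if (!skb)
		i += 2;

	return 0;
}
```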
> --
> 2.43.5
>
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH bpf-next v1 1/2] bpf: Mark raw_tp arguments with PTR_MAYBE_NULL
2024-11-01 19:16 ` Andrii Nakryiko
@ 2024-11-01 22:55 ` Alexei Starovoitov
2024-11-03 16:16 ` Kumar Kartikeya Dwivedi
0 siblings, 1 reply; 19+ messages in thread
From: Alexei Starovoitov @ 2024-11-01 22:55 UTC (permalink / raw)
To: Andrii Nakryiko
Cc: Kumar Kartikeya Dwivedi, bpf, kkd, Juri Lelli, Alexei Starovoitov,
Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman, Song Liu,
Yonghong Song, Steven Rostedt, Jiri Olsa, Kernel Team
On Fri, Nov 1, 2024 at 12:16 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> > @@ -6693,7 +6709,21 @@ static int check_ptr_to_btf_access(struct bpf_verifier_env *env,
> >
> > if (ret < 0)
> > return ret;
> > -
> > + /* For raw_tp progs, we allow dereference of PTR_MAYBE_NULL
> > + * trusted PTR_TO_BTF_ID, these are the ones that are possibly
> > + * arguments to the raw_tp. Since internal checks in for trusted
> > + * reg in check_ptr_to_btf_access would consider PTR_MAYBE_NULL
> > + * modifier as problematic, mask it out temporarily for the
> > + * check. Don't apply this to pointers with ref_obj_id > 0, as
> > + * those won't be raw_tp args.
> > + *
> > + * We may end up applying this relaxation to other trusted
> > + * PTR_TO_BTF_ID with maybe null flag, since we cannot
> > + * distinguish PTR_MAYBE_NULL tagged for arguments vs normal
> > + * tagging, but that should expand allowed behavior, and not
> > + * cause regression for existing behavior.
> > + */
>
> Yeah, I'm not sure why this has to be raw tp-specific?.. What's wrong
> with the same behavior for BPF iterator programs, for example?
>
> It seems nicer if we can avoid this temporary masking and instead
> support this as a generic functionality? Or are there complications?
>
> > + mask = mask_raw_tp_reg(env, reg);
> > if (ret != PTR_TO_BTF_ID) {
> > /* just mark; */
> >
> > @@ -6754,8 +6784,13 @@ static int check_ptr_to_btf_access(struct bpf_verifier_env *env,
> > clear_trusted_flags(&flag);
> > }
> >
> > - if (atype == BPF_READ && value_regno >= 0)
> > + if (atype == BPF_READ && value_regno >= 0) {
> > mark_btf_ld_reg(env, regs, value_regno, ret, reg->btf, btf_id, flag);
> > + /* We've assigned a new type to regno, so don't undo masking. */
> > + if (regno == value_regno)
> > + mask = false;
> > + }
> > + unmask_raw_tp_reg(reg, mask);
Kumar,
I chatted with Andrii offline. All other cases of mask/unmask
should probably stay raw_tp specific, but it seems we can make
this particular case to be generic.
Something like the following:
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 797cf3ed32e0..bbd4c03460e3 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -6703,7 +6703,11 @@ static int check_ptr_to_btf_access(struct bpf_verifier_env *env,
 		 */
 		flag = PTR_UNTRUSTED;

+	} else if (reg->type == (PTR_TO_BTF_ID | PTR_TRUSTED | PTR_MAYBE_NULL)) {
+		flag |= PTR_MAYBE_NULL;
+		goto trusted;
 	} else if (is_trusted_reg(reg) || is_rcu_reg(reg)) {
+trusted:
With the idea that trusted_or_null stays that way for all prog
types and bpf_iter__task->task deref stays trusted_or_null
instead of being downgraded to ptr_to_btf_id without any flags.
So progs can do few less != null checks.
Need to think it through.
* Re: [PATCH bpf-next v1 0/2] Handle possible NULL trusted raw_tp arguments
2024-11-01 13:18 ` [PATCH bpf-next v1 0/2] Handle possible NULL trusted raw_tp arguments Kumar Kartikeya Dwivedi
@ 2024-11-02 0:21 ` Eduard Zingerman
2024-11-02 0:29 ` Alexei Starovoitov
0 siblings, 1 reply; 19+ messages in thread
From: Eduard Zingerman @ 2024-11-02 0:21 UTC (permalink / raw)
To: Kumar Kartikeya Dwivedi, bpf
Cc: kkd, Alexei Starovoitov, Andrii Nakryiko, Martin KaFai Lau,
Song Liu, Yonghong Song, Steven Rostedt, Jiri Olsa, Juri Lelli,
kernel-team
On Fri, 2024-11-01 at 14:18 +0100, Kumar Kartikeya Dwivedi wrote:
[...]
> I see that all selftests except one passed. The one that didn't
> appears to have been cancelled after running for an hour, and stalled
> after select_reuseport:OK.
> Looking at the LLVM 18
> (https://github.com/kernel-patches/bpf/actions/runs/11621768944/job/32366412581?pr=7999)
> run instead of LLVM 17
> (https://github.com/kernel-patches/bpf/actions/runs/11621768944/job/32366400714?pr=7999,
> which failed), it seems the next test is send_signal_tracepoint.
>
> Is this known to be flaky? I'm guessing not and it is probably caused
> by my patch, but just want to confirm before I begin debugging.
I suspect this is the send_signal test.
It started hanging for me yesterday without any apparent reason (on the master branch).
I added a workaround to avoid the stalls, but it does not address the
underlying issue with the test. The workaround follows.
---
diff --git a/tools/testing/selftests/bpf/prog_tests/send_signal.c b/tools/testing/selftests/bpf/prog_tests/send_signal.c
index ee5a221b4103..4af127945417 100644
--- a/tools/testing/selftests/bpf/prog_tests/send_signal.c
+++ b/tools/testing/selftests/bpf/prog_tests/send_signal.c
@@ -18,6 +18,38 @@ static void sigusr1_siginfo_handler(int s, siginfo_t *i, void *v)
static char log_buf[64 * 1024];
+static ssize_t read_with_timeout(int fd, void *buf, size_t count)
+{
+ struct timeval tv = { 1, 0 };
+ fd_set fds;
+ int err;
+
+ FD_ZERO(&fds);
+ FD_SET(fd, &fds);
+ err = select(fd + 1, &fds, NULL, NULL, &tv);
+ if (!ASSERT_GE(err, 0, "read select"))
+ return err;
+ if (FD_ISSET(fd, &fds))
+ return read(fd, buf, count);
+ return -EAGAIN;
+}
+
+static ssize_t write_with_timeout(int fd, void *buf, size_t count)
+{
+ struct timeval tv = { 1, 0 };
+ fd_set fds;
+ int err;
+
+ FD_ZERO(&fds);
+ FD_SET(fd, &fds);
+ err = select(fd + 1, NULL, &fds, NULL, &tv);
+ if (!ASSERT_GE(err, 0, "write select"))
+ return err;
+ if (FD_ISSET(fd, &fds))
+ return write(fd, buf, count);
+ return -EAGAIN;
+}
+
static void test_send_signal_common(struct perf_event_attr *attr,
bool signal_thread, bool remote)
{
@@ -75,10 +107,10 @@ static void test_send_signal_common(struct perf_event_attr *attr,
}
/* notify parent signal handler is installed */
- ASSERT_EQ(write(pipe_c2p[1], buf, 1), 1, "pipe_write");
+ ASSERT_EQ(write_with_timeout(pipe_c2p[1], buf, 1), 1, "pipe_write");
/* make sure parent enabled bpf program to send_signal */
- ASSERT_EQ(read(pipe_p2c[0], buf, 1), 1, "pipe_read");
+ ASSERT_EQ(read_with_timeout(pipe_p2c[0], buf, 1), 1, "pipe_read");
/* wait a little for signal handler */
for (int i = 0; i < 1000000000 && !sigusr1_received; i++) {
@@ -94,10 +126,10 @@ static void test_send_signal_common(struct perf_event_attr *attr,
buf[0] = sigusr1_received;
ASSERT_EQ(sigusr1_received, 8, "sigusr1_received");
- ASSERT_EQ(write(pipe_c2p[1], buf, 1), 1, "pipe_write");
+ ASSERT_EQ(write_with_timeout(pipe_c2p[1], buf, 1), 1, "pipe_write");
/* wait for parent notification and exit */
- ASSERT_EQ(read(pipe_p2c[0], buf, 1), 1, "pipe_read");
+ ASSERT_EQ(read_with_timeout(pipe_p2c[0], buf, 1), 1, "pipe_read");
/* restore the old priority */
if (!remote)
@@ -158,7 +190,7 @@ static void test_send_signal_common(struct perf_event_attr *attr,
}
/* wait until child signal handler installed */
- ASSERT_EQ(read(pipe_c2p[0], buf, 1), 1, "pipe_read");
+ ASSERT_EQ(read_with_timeout(pipe_c2p[0], buf, 1), 1, "pipe_read");
/* trigger the bpf send_signal */
skel->bss->signal_thread = signal_thread;
@@ -172,7 +204,7 @@ static void test_send_signal_common(struct perf_event_attr *attr,
}
/* notify child that bpf program can send_signal now */
- ASSERT_EQ(write(pipe_p2c[1], buf, 1), 1, "pipe_write");
+ ASSERT_EQ(write_with_timeout(pipe_p2c[1], buf, 1), 1, "pipe_write");
/* For the remote test, the BPF program is triggered from this
* process but the other process/thread is signaled.
@@ -188,7 +220,7 @@ static void test_send_signal_common(struct perf_event_attr *attr,
}
/* wait for result */
- err = read(pipe_c2p[0], buf, 1);
+ err = read_with_timeout(pipe_c2p[0], buf, 1);
if (!ASSERT_GE(err, 0, "reading pipe"))
goto disable_pmu;
if (!ASSERT_GT(err, 0, "reading pipe error: size 0")) {
@@ -199,7 +231,7 @@ static void test_send_signal_common(struct perf_event_attr *attr,
ASSERT_EQ(buf[0], 8, "incorrect result");
/* notify child safe to exit */
- ASSERT_EQ(write(pipe_p2c[1], buf, 1), 1, "pipe_write");
+ ASSERT_EQ(write_with_timeout(pipe_p2c[1], buf, 1), 1, "pipe_write");
disable_pmu:
close(pmu_fd);
* Re: [PATCH bpf-next v1 0/2] Handle possible NULL trusted raw_tp arguments
2024-11-02 0:21 ` Eduard Zingerman
@ 2024-11-02 0:29 ` Alexei Starovoitov
2024-11-02 0:32 ` Eduard Zingerman
0 siblings, 1 reply; 19+ messages in thread
From: Alexei Starovoitov @ 2024-11-02 0:29 UTC (permalink / raw)
To: Eduard Zingerman, Puranjay Mohan
Cc: Kumar Kartikeya Dwivedi, bpf, kkd, Alexei Starovoitov,
Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
Steven Rostedt, Jiri Olsa, Juri Lelli, Kernel Team
On Fri, Nov 1, 2024 at 5:21 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Fri, 2024-11-01 at 14:18 +0100, Kumar Kartikeya Dwivedi wrote:
>
> [...]
>
> > I see that all selftests except one passed. The one that didn't
> > appears to have been cancelled after running for an hour, and stalled
> > after select_reuseport:OK.
> > Looking at the LLVM 18
> > (https://github.com/kernel-patches/bpf/actions/runs/11621768944/job/32366412581?pr=7999)
> > run instead of LLVM 17
> > (https://github.com/kernel-patches/bpf/actions/runs/11621768944/job/32366400714?pr=7999,
> > which failed), it seems the next test is send_signal_tracepoint.
> >
> > Is this known to be flaky? I'm guessing not and it is probably caused
> > by my patch, but just want to confirm before I begin debugging.
>
> I suspect this is the send_signal test.
> It started hanging for me yesterday without any apparent reason (on the master branch).
> I added a workaround to avoid the stalls, but it does not address the
> underlying issue with the test. The workaround follows.
Hmm.
Puranjay touched it last with extra logic.
And before that, David Vernet tried to address the flakiness
in commit 4a54de65964d.
Yonghong also noticed lockups in paravirt
and added a workaround in commit 7015843afc.
Your additional timeout/workaround makes sense to me,
but it would be good to bisect whether Puranjay's change caused it.
> ---
>
> diff --git a/tools/testing/selftests/bpf/prog_tests/send_signal.c b/tools/testing/selftests/bpf/prog_tests/send_signal.c
> index ee5a221b4103..4af127945417 100644
> --- a/tools/testing/selftests/bpf/prog_tests/send_signal.c
> +++ b/tools/testing/selftests/bpf/prog_tests/send_signal.c
> @@ -18,6 +18,38 @@ static void sigusr1_siginfo_handler(int s, siginfo_t *i, void *v)
>
> static char log_buf[64 * 1024];
>
> +static ssize_t read_with_timeout(int fd, void *buf, size_t count)
> +{
> + struct timeval tv = { 1, 0 };
> + fd_set fds;
> + int err;
> +
> + FD_ZERO(&fds);
> + FD_SET(fd, &fds);
> + err = select(fd + 1, &fds, NULL, NULL, &tv);
> + if (!ASSERT_GE(err, 0, "read select"))
> + return err;
> + if (FD_ISSET(fd, &fds))
> + return read(fd, buf, count);
> + return -EAGAIN;
> +}
> +
> +static ssize_t write_with_timeout(int fd, void *buf, size_t count)
> +{
> + struct timeval tv = { 1, 0 };
> + fd_set fds;
> + int err;
> +
> + FD_ZERO(&fds);
> + FD_SET(fd, &fds);
> + err = select(fd + 1, NULL, &fds, NULL, &tv);
> + if (!ASSERT_GE(err, 0, "write select"))
> + return err;
> + if (FD_ISSET(fd, &fds))
> + return write(fd, buf, count);
> + return -EAGAIN;
> +}
> +
> static void test_send_signal_common(struct perf_event_attr *attr,
> bool signal_thread, bool remote)
> {
> @@ -75,10 +107,10 @@ static void test_send_signal_common(struct perf_event_attr *attr,
> }
>
> /* notify parent signal handler is installed */
> - ASSERT_EQ(write(pipe_c2p[1], buf, 1), 1, "pipe_write");
> + ASSERT_EQ(write_with_timeout(pipe_c2p[1], buf, 1), 1, "pipe_write");
>
> /* make sure parent enabled bpf program to send_signal */
> - ASSERT_EQ(read(pipe_p2c[0], buf, 1), 1, "pipe_read");
> + ASSERT_EQ(read_with_timeout(pipe_p2c[0], buf, 1), 1, "pipe_read");
>
> /* wait a little for signal handler */
> for (int i = 0; i < 1000000000 && !sigusr1_received; i++) {
> @@ -94,10 +126,10 @@ static void test_send_signal_common(struct perf_event_attr *attr,
> buf[0] = sigusr1_received;
>
> ASSERT_EQ(sigusr1_received, 8, "sigusr1_received");
> - ASSERT_EQ(write(pipe_c2p[1], buf, 1), 1, "pipe_write");
> + ASSERT_EQ(write_with_timeout(pipe_c2p[1], buf, 1), 1, "pipe_write");
>
> /* wait for parent notification and exit */
> - ASSERT_EQ(read(pipe_p2c[0], buf, 1), 1, "pipe_read");
> + ASSERT_EQ(read_with_timeout(pipe_p2c[0], buf, 1), 1, "pipe_read");
>
> /* restore the old priority */
> if (!remote)
> @@ -158,7 +190,7 @@ static void test_send_signal_common(struct perf_event_attr *attr,
> }
>
> /* wait until child signal handler installed */
> - ASSERT_EQ(read(pipe_c2p[0], buf, 1), 1, "pipe_read");
> + ASSERT_EQ(read_with_timeout(pipe_c2p[0], buf, 1), 1, "pipe_read");
>
> /* trigger the bpf send_signal */
> skel->bss->signal_thread = signal_thread;
> @@ -172,7 +204,7 @@ static void test_send_signal_common(struct perf_event_attr *attr,
> }
>
> /* notify child that bpf program can send_signal now */
> - ASSERT_EQ(write(pipe_p2c[1], buf, 1), 1, "pipe_write");
> + ASSERT_EQ(write_with_timeout(pipe_p2c[1], buf, 1), 1, "pipe_write");
>
> /* For the remote test, the BPF program is triggered from this
> * process but the other process/thread is signaled.
> @@ -188,7 +220,7 @@ static void test_send_signal_common(struct perf_event_attr *attr,
> }
>
> /* wait for result */
> - err = read(pipe_c2p[0], buf, 1);
> + err = read_with_timeout(pipe_c2p[0], buf, 1);
> if (!ASSERT_GE(err, 0, "reading pipe"))
> goto disable_pmu;
> if (!ASSERT_GT(err, 0, "reading pipe error: size 0")) {
> @@ -199,7 +231,7 @@ static void test_send_signal_common(struct perf_event_attr *attr,
> ASSERT_EQ(buf[0], 8, "incorrect result");
>
> /* notify child safe to exit */
> - ASSERT_EQ(write(pipe_p2c[1], buf, 1), 1, "pipe_write");
> + ASSERT_EQ(write_with_timeout(pipe_p2c[1], buf, 1), 1, "pipe_write");
>
> disable_pmu:
> close(pmu_fd);
>
* Re: [PATCH bpf-next v1 0/2] Handle possible NULL trusted raw_tp arguments
2024-11-02 0:29 ` Alexei Starovoitov
@ 2024-11-02 0:32 ` Eduard Zingerman
2024-11-08 5:08 ` Eduard Zingerman
0 siblings, 1 reply; 19+ messages in thread
From: Eduard Zingerman @ 2024-11-02 0:32 UTC (permalink / raw)
To: Alexei Starovoitov, Puranjay Mohan
Cc: Kumar Kartikeya Dwivedi, bpf, kkd, Alexei Starovoitov,
Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
Steven Rostedt, Jiri Olsa, Juri Lelli, Kernel Team
On Fri, 2024-11-01 at 17:29 -0700, Alexei Starovoitov wrote:
[...]
> Hmm.
> Puranjay touched it last with extra logic.
>
> And before that, David Vernet tried to address the flakiness
> in commit 4a54de65964d.
> Yonghong also noticed lockups in paravirt
> and added a workaround in commit 7015843afc.
>
> Your additional timeout/workaround makes sense to me,
> but it would be good to bisect whether Puranjay's change caused it.
I'll debug what's going on some time later today or on Sat.
* Re: [PATCH bpf-next v1 2/2] selftests/bpf: Add tests for raw_tp null handling
2024-11-01 19:19 ` Andrii Nakryiko
@ 2024-11-03 15:58 ` Kumar Kartikeya Dwivedi
0 siblings, 0 replies; 19+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2024-11-03 15:58 UTC (permalink / raw)
To: Andrii Nakryiko
Cc: bpf, kkd, Alexei Starovoitov, Andrii Nakryiko, Martin KaFai Lau,
Eduard Zingerman, Song Liu, Yonghong Song, Steven Rostedt,
Jiri Olsa, Juri Lelli, kernel-team
On Fri, 1 Nov 2024 at 14:19, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
>
> On Thu, Oct 31, 2024 at 5:00 PM Kumar Kartikeya Dwivedi
> <memxor@gmail.com> wrote:
> >
> > Ensure that trusted PTR_TO_BTF_ID accesses perform PROBE_MEM handling in
> > raw_tp program. Without the previous fix, this selftest crashes the
> > kernel due to a NULL-pointer dereference. Also ensure that dead code
> > elimination does not kick in for checks on the pointer.
> >
> > Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> > ---
> > .../bpf/bpf_testmod/bpf_testmod-events.h | 8 ++++++
> > .../selftests/bpf/bpf_testmod/bpf_testmod.c | 2 ++
> > .../selftests/bpf/prog_tests/raw_tp_null.c | 25 +++++++++++++++++
> > .../testing/selftests/bpf/progs/raw_tp_null.c | 27 +++++++++++++++++++
> > 4 files changed, 62 insertions(+)
> > create mode 100644 tools/testing/selftests/bpf/prog_tests/raw_tp_null.c
> > create mode 100644 tools/testing/selftests/bpf/progs/raw_tp_null.c
> >
> > diff --git a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod-events.h b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod-events.h
> > index 6c3b4d4f173a..aeef86b3da74 100644
> > --- a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod-events.h
> > +++ b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod-events.h
> > @@ -40,6 +40,14 @@ DECLARE_TRACE(bpf_testmod_test_nullable_bare,
> > TP_ARGS(ctx__nullable)
> > );
> >
> > +struct sk_buff;
> > +
> > +DECLARE_TRACE(bpf_testmod_test_raw_tp_null,
> > + TP_PROTO(struct sk_buff *skb),
> > + TP_ARGS(skb)
> > +);
> > +
> > +
> > #undef BPF_TESTMOD_DECLARE_TRACE
> > #ifdef DECLARE_TRACE_WRITABLE
> > #define BPF_TESTMOD_DECLARE_TRACE(call, proto, args, size) \
> > diff --git a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c
> > index 8835761d9a12..4e6a9e9c0368 100644
> > --- a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c
> > +++ b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c
> > @@ -380,6 +380,8 @@ bpf_testmod_test_read(struct file *file, struct kobject *kobj,
> >
> > (void)bpf_testmod_test_arg_ptr_to_struct(&struct_arg1_2);
> >
> > + (void)trace_bpf_testmod_test_raw_tp_null(NULL);
> > +
> > struct_arg3 = kmalloc((sizeof(struct bpf_testmod_struct_arg_3) +
> > sizeof(int)), GFP_KERNEL);
> > if (struct_arg3 != NULL) {
> > diff --git a/tools/testing/selftests/bpf/prog_tests/raw_tp_null.c b/tools/testing/selftests/bpf/prog_tests/raw_tp_null.c
> > new file mode 100644
> > index 000000000000..b9068fee7d8a
> > --- /dev/null
> > +++ b/tools/testing/selftests/bpf/prog_tests/raw_tp_null.c
> > @@ -0,0 +1,25 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/* Copyright (c) 2024 Meta Platforms, Inc. and affiliates. */
> > +
> > +#include <test_progs.h>
> > +#include "raw_tp_null.skel.h"
> > +
> > +void test_raw_tp_null(void)
> > +{
> > + struct raw_tp_null *skel;
> > +
> > + skel = raw_tp_null__open_and_load();
> > + if (!ASSERT_OK_PTR(skel, "raw_tp_null__open_and_load"))
> > + return;
> > +
> > + skel->bss->tid = gettid();
>
> this is not available everywhere, we just recently had a fix. Other
> tests call syscall() directly. It might be time to add a macro in one of
> the helper headers, though. But that can be done as a separate cleanup
> patch outside of this change (there is enough to review and discuss)
Ok, I can include a clean up patch in the next revision.
>
> > +
> > + if (!ASSERT_OK(raw_tp_null__attach(skel), "raw_tp_null__attach"))
> > + goto end;
> > +
> > + ASSERT_OK(trigger_module_test_read(2), "trigger testmod read");
> > + ASSERT_EQ(skel->bss->i, 3, "invocations");
> > +
> > +end:
> > + raw_tp_null__destroy(skel);
> > +}
> > diff --git a/tools/testing/selftests/bpf/progs/raw_tp_null.c b/tools/testing/selftests/bpf/progs/raw_tp_null.c
> > new file mode 100644
> > index 000000000000..c7c9ad4ec3b7
> > --- /dev/null
> > +++ b/tools/testing/selftests/bpf/progs/raw_tp_null.c
> > @@ -0,0 +1,27 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/* Copyright (c) 2024 Meta Platforms, Inc. and affiliates. */
> > +
> > +#include <vmlinux.h>
> > +#include <bpf/bpf_tracing.h>
> > +
> > +char _license[] SEC("license") = "GPL";
> > +
> > +int tid;
> > +int i;
> > +
> > +SEC("tp_btf/bpf_testmod_test_raw_tp_null")
> > +int BPF_PROG(test_raw_tp_null, struct sk_buff *skb)
> > +{
> > + if (bpf_get_current_task_btf()->pid == tid) {
>
> nit: avoid unnecessary nesting. Check the condition and return early. Also,
> it seems nicer to have a task_struct local variable for this, tbh:
>
> struct task_struct *t = bpf_get_current_task_btf();
>
> if (t->pid != tid)
> return 0;
>
> /* the rest follows, unnested */
Ack.
>
> > + i = i + skb->mark + 1;
> > +
> > + /* If dead code elimination kicks in, the increment below will
> > + * be removed. For raw_tp programs, we mark input arguments as
> > + * PTR_MAYBE_NULL, so branch prediction should never kick in.
> > + */
> > + if (!skb)
> > + i += 2;
> > + }
> > +
> > + return 0;
> > +}
> > --
> > 2.43.5
> >
* Re: [PATCH bpf-next v1 1/2] bpf: Mark raw_tp arguments with PTR_MAYBE_NULL
2024-11-01 22:55 ` Alexei Starovoitov
@ 2024-11-03 16:16 ` Kumar Kartikeya Dwivedi
2024-11-03 16:40 ` Kumar Kartikeya Dwivedi
0 siblings, 1 reply; 19+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2024-11-03 16:16 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: Andrii Nakryiko, bpf, kkd, Juri Lelli, Alexei Starovoitov,
Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman, Song Liu,
Yonghong Song, Steven Rostedt, Jiri Olsa, Kernel Team
On Fri, 1 Nov 2024 at 17:56, Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Fri, Nov 1, 2024 at 12:16 PM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> >
> > > @@ -6693,7 +6709,21 @@ static int check_ptr_to_btf_access(struct bpf_verifier_env *env,
> > >
> > > if (ret < 0)
> > > return ret;
> > > -
> > > + /* For raw_tp progs, we allow dereference of PTR_MAYBE_NULL
> > > + * trusted PTR_TO_BTF_ID, these are the ones that are possibly
> > > + * arguments to the raw_tp. Since internal checks in for trusted
> > > + * reg in check_ptr_to_btf_access would consider PTR_MAYBE_NULL
> > > + * modifier as problematic, mask it out temporarily for the
> > > + * check. Don't apply this to pointers with ref_obj_id > 0, as
> > > + * those won't be raw_tp args.
> > > + *
> > > + * We may end up applying this relaxation to other trusted
> > > + * PTR_TO_BTF_ID with maybe null flag, since we cannot
> > > + * distinguish PTR_MAYBE_NULL tagged for arguments vs normal
> > > + * tagging, but that should expand allowed behavior, and not
> > > + * cause regression for existing behavior.
> > > + */
> >
> > Yeah, I'm not sure why this has to be raw tp-specific?.. What's wrong
> > with the same behavior for BPF iterator programs, for example?
> >
> > It seems nicer if we can avoid this temporary masking and instead
> > support this as a generic functionality? Or are there complications?
> >
We _can_ do this for all programs. The thought process here was to
leave existing raw_tp programs unbroken if possible while marking
their arguments as PTR_MAYBE_NULL, since most of them won't be
performing any NULL checks at all.
> > > + mask = mask_raw_tp_reg(env, reg);
> > > if (ret != PTR_TO_BTF_ID) {
> > > /* just mark; */
> > >
> > > @@ -6754,8 +6784,13 @@ static int check_ptr_to_btf_access(struct bpf_verifier_env *env,
> > > clear_trusted_flags(&flag);
> > > }
> > >
> > > - if (atype == BPF_READ && value_regno >= 0)
> > > + if (atype == BPF_READ && value_regno >= 0) {
> > > mark_btf_ld_reg(env, regs, value_regno, ret, reg->btf, btf_id, flag);
> > > + /* We've assigned a new type to regno, so don't undo masking. */
> > > + if (regno == value_regno)
> > > + mask = false;
> > > + }
> > > + unmask_raw_tp_reg(reg, mask);
>
> Kumar,
>
> I chatted with Andrii offline. All other cases of mask/unmask
> should probably stay raw_tp specific, but it seems we can make
> this particular case to be generic.
> Something like the following:
>
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 797cf3ed32e0..bbd4c03460e3 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -6703,7 +6703,11 @@ static int check_ptr_to_btf_access(struct bpf_verifier_env *env,
>  		 */
>  		flag = PTR_UNTRUSTED;
>
> +	} else if (reg->type == (PTR_TO_BTF_ID | PTR_TRUSTED | PTR_MAYBE_NULL)) {
> +		flag |= PTR_MAYBE_NULL;
> +		goto trusted;
>  	} else if (is_trusted_reg(reg) || is_rcu_reg(reg)) {
> +trusted:
>
> With the idea that trusted_or_null stays that way for all prog
> types and bpf_iter__task->task deref stays trusted_or_null
> instead of being downgraded to ptr_to_btf_id without any flags.
> So progs can do few less != null checks.
> Need to think it through.
Ok. But don't allow passing such pointers into helpers, right?
We do that for raw_tp to preserve compat, but it would just exacerbate
the issue if we start doing it everywhere.
So it's just that dereferencing a _or_null pointer becomes an ok thing to do?
Let me mull over this for a bit.
I'm not sure whether not doing the NULL check is better or worse
though. On one hand, everything will work without checking for NULL; on
the other hand, people may also assume the verifier isn't complaining
because the pointer is valid, and then they read data from the pointer
which always ends up being zero, meaning different things for
different kinds of fields.
Just thinking out loud, but one of the other concerns would be that
we're encouraging people not to do these NULL checks, which means a
potential page fault penalty every time that pointer _is_ NULL, instead
of a simple branch, which would certainly be a bit expensive. If this
becomes the common case, I think the prog execution latency penalty
will be big. It is something to consider.
* Re: [PATCH bpf-next v1 1/2] bpf: Mark raw_tp arguments with PTR_MAYBE_NULL
2024-11-03 16:16 ` Kumar Kartikeya Dwivedi
@ 2024-11-03 16:40 ` Kumar Kartikeya Dwivedi
2024-11-03 17:00 ` Kumar Kartikeya Dwivedi
0 siblings, 1 reply; 19+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2024-11-03 16:40 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: Andrii Nakryiko, bpf, kkd, Juri Lelli, Alexei Starovoitov,
Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman, Song Liu,
Yonghong Song, Steven Rostedt, Jiri Olsa, Kernel Team
On Sun, 3 Nov 2024 at 10:16, Kumar Kartikeya Dwivedi <memxor@gmail.com> wrote:
>
> On Fri, 1 Nov 2024 at 17:56, Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Fri, Nov 1, 2024 at 12:16 PM Andrii Nakryiko
> > <andrii.nakryiko@gmail.com> wrote:
> > >
> > > > @@ -6693,7 +6709,21 @@ static int check_ptr_to_btf_access(struct bpf_verifier_env *env,
> > > >
> > > > if (ret < 0)
> > > > return ret;
> > > > -
> > > > + /* For raw_tp progs, we allow dereference of PTR_MAYBE_NULL
> > > > + * trusted PTR_TO_BTF_ID, these are the ones that are possibly
> > > > + * arguments to the raw_tp. Since internal checks in for trusted
> > > > + * reg in check_ptr_to_btf_access would consider PTR_MAYBE_NULL
> > > > + * modifier as problematic, mask it out temporarily for the
> > > > + * check. Don't apply this to pointers with ref_obj_id > 0, as
> > > > + * those won't be raw_tp args.
> > > > + *
> > > > + * We may end up applying this relaxation to other trusted
> > > > + * PTR_TO_BTF_ID with maybe null flag, since we cannot
> > > > + * distinguish PTR_MAYBE_NULL tagged for arguments vs normal
> > > > + * tagging, but that should expand allowed behavior, and not
> > > > + * cause regression for existing behavior.
> > > > + */
> > >
> > > Yeah, I'm not sure why this has to be raw tp-specific?.. What's wrong
> > > with the same behavior for BPF iterator programs, for example?
> > >
> > > It seems nicer if we can avoid this temporary masking and instead
> > > support this as a generic functionality? Or are there complications?
> > >
>
> We _can_ do this for all programs. The thought process here was to
> leave existing raw_tp programs unbroken if possible while marking
> their arguments as PTR_MAYBE_NULL, since most of them won't be
> performing any NULL checks at all.
>
> > > > + mask = mask_raw_tp_reg(env, reg);
> > > > if (ret != PTR_TO_BTF_ID) {
> > > > /* just mark; */
> > > >
> > > > @@ -6754,8 +6784,13 @@ static int check_ptr_to_btf_access(struct bpf_verifier_env *env,
> > > > clear_trusted_flags(&flag);
> > > > }
> > > >
> > > > - if (atype == BPF_READ && value_regno >= 0)
> > > > + if (atype == BPF_READ && value_regno >= 0) {
> > > > mark_btf_ld_reg(env, regs, value_regno, ret, reg->btf, btf_id, flag);
> > > > + /* We've assigned a new type to regno, so don't undo masking. */
> > > > + if (regno == value_regno)
> > > > + mask = false;
> > > > + }
> > > > + unmask_raw_tp_reg(reg, mask);
> >
> > Kumar,
> >
> > I chatted with Andrii offline. All other cases of mask/unmask
> > should probably stay raw_tp specific, but it seems we can make
> > this particular case to be generic.
> > Something like the following:
> >
> > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > index 797cf3ed32e0..bbd4c03460e3 100644
> > --- a/kernel/bpf/verifier.c
> > +++ b/kernel/bpf/verifier.c
> > @@ -6703,7 +6703,11 @@ static int check_ptr_to_btf_access(struct bpf_verifier_env *env,
> >  		 */
> >  		flag = PTR_UNTRUSTED;
> >
> > +	} else if (reg->type == (PTR_TO_BTF_ID | PTR_TRUSTED | PTR_MAYBE_NULL)) {
> > +		flag |= PTR_MAYBE_NULL;
> > +		goto trusted;
> >  	} else if (is_trusted_reg(reg) || is_rcu_reg(reg)) {
> > +trusted:
> >
> > With the idea that trusted_or_null stays that way for all prog
> > types and bpf_iter__task->task deref stays trusted_or_null
> > instead of being downgraded to ptr_to_btf_id without any flags.
> > So progs can do few less != null checks.
> > Need to think it through.
>
> Ok. But don't allow passing such pointers into helpers, right?
> We do that for raw_tp to preserve compat, but it would just exacerbate
> the issue if we start doing it everywhere.
> So it's just that dereferencing a _or_null pointer becomes an ok thing to do?
> Let me mull over this for a bit.
>
> I'm not sure whether not doing the NULL check is better or worse
> though. On one hand everything will work without checking for NULL, on
> the other hand people may also assume the verifier isn't complaining
> because the pointer is valid, and then they read data from the pointer
> which always ends up being zero, meaning different things for
> different kinds of fields.
>
> Just thinking out loud, but one of the other concerns would be that
> we're encouraging people not to do these NULL checks, which means a
> potential page fault penalty every time that pointer _is_ NULL, instead
> of a simple branch, which would certainly be a bit expensive. If this
> becomes the common case, I think the prog execution latency penalty
> will be big. It is something to consider.
Ah, no, my bad, this won't be a problem now, as the JIT does emit a
branch to check for kernel addresses, but it probably will be if
https://lore.kernel.org/bpf/20240619092216.1780946-1-memxor@gmail.com/
gets accepted.
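For readers skimming the quoted hunks: the mask/unmask helpers under discussion temporarily hide the PTR_MAYBE_NULL flag so that the existing trusted-pointer checks accept raw_tp arguments, then restore it afterwards. A minimal plain-C model of that mechanic (flag values, struct layout, and helper names here are illustrative, not the kernel's actual encoding):

```c
#include <stdbool.h>

/* Illustrative flag bits; the kernel's bpf_type_flag encoding differs. */
enum reg_type_flags {
	PTR_TO_BTF_ID  = 1 << 0,
	PTR_TRUSTED    = 1 << 1,
	PTR_MAYBE_NULL = 1 << 2,
};

struct reg_state {
	unsigned int type;
	unsigned int ref_obj_id;
};

/* Temporarily hide PTR_MAYBE_NULL on a trusted pointer so the existing
 * trusted-reg checks accept it; skip registers with ref_obj_id set,
 * since acquired objects are never raw_tp arguments. Returns whether
 * masking happened so the caller knows to undo it later. */
static bool mask_raw_tp_reg(struct reg_state *reg)
{
	if (reg->type != (PTR_TO_BTF_ID | PTR_TRUSTED | PTR_MAYBE_NULL) ||
	    reg->ref_obj_id)
		return false;
	reg->type &= ~PTR_MAYBE_NULL;
	return true;
}

static void unmask_raw_tp_reg(struct reg_state *reg, bool mask)
{
	if (mask)
		reg->type |= PTR_MAYBE_NULL;
}
```

Note the caller-owned boolean mirroring the patch's `mask = mask_raw_tp_reg(...); ... unmask_raw_tp_reg(reg, mask);` pattern, including the "don't undo masking" case where the caller simply sets the boolean to false.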
* Re: [PATCH bpf-next v1 1/2] bpf: Mark raw_tp arguments with PTR_MAYBE_NULL
2024-11-03 16:40 ` Kumar Kartikeya Dwivedi
@ 2024-11-03 17:00 ` Kumar Kartikeya Dwivedi
2024-11-03 17:37 ` Alexei Starovoitov
0 siblings, 1 reply; 19+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2024-11-03 17:00 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: Andrii Nakryiko, bpf, kkd, Juri Lelli, Alexei Starovoitov,
Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman, Song Liu,
Yonghong Song, Steven Rostedt, Jiri Olsa, Kernel Team
On Sun, 3 Nov 2024 at 10:40, Kumar Kartikeya Dwivedi <memxor@gmail.com> wrote:
>
> On Sun, 3 Nov 2024 at 10:16, Kumar Kartikeya Dwivedi <memxor@gmail.com> wrote:
> >
> > On Fri, 1 Nov 2024 at 17:56, Alexei Starovoitov
> > <alexei.starovoitov@gmail.com> wrote:
> > >
> > > On Fri, Nov 1, 2024 at 12:16 PM Andrii Nakryiko
> > > <andrii.nakryiko@gmail.com> wrote:
> > > >
> > > > > @@ -6693,7 +6709,21 @@ static int check_ptr_to_btf_access(struct bpf_verifier_env *env,
> > > > >
> > > > > if (ret < 0)
> > > > > return ret;
> > > > > -
> > > > > + /* For raw_tp progs, we allow dereference of PTR_MAYBE_NULL
> > > > > + * trusted PTR_TO_BTF_ID, these are the ones that are possibly
> > > > > + * arguments to the raw_tp. Since internal checks for trusted
> > > > > + * reg in check_ptr_to_btf_access would consider PTR_MAYBE_NULL
> > > > > + * modifier as problematic, mask it out temporarily for the
> > > > > + * check. Don't apply this to pointers with ref_obj_id > 0, as
> > > > > + * those won't be raw_tp args.
> > > > > + *
> > > > > + * We may end up applying this relaxation to other trusted
> > > > > + * PTR_TO_BTF_ID with maybe null flag, since we cannot
> > > > > + * distinguish PTR_MAYBE_NULL tagged for arguments vs normal
> > > > > + * tagging, but that should expand allowed behavior, and not
> > > > > + * cause regression for existing behavior.
> > > > > + */
> > > >
> > > > Yeah, I'm not sure why this has to be raw tp-specific?.. What's wrong
> > > > with the same behavior for BPF iterator programs, for example?
> > > >
> > > > It seems nicer if we can avoid this temporary masking and instead
> > > > support this as a generic functionality? Or are there complications?
> > > >
> >
> > We _can_ do this for all programs. The thought process here was to
> > leave existing raw_tp programs unbroken if possible if we're marking
> > their arguments as PTR_MAYBE_NULL, since most of them won't be
> > performing any NULL checks at all.
> >
> > > > > + mask = mask_raw_tp_reg(env, reg);
> > > > > if (ret != PTR_TO_BTF_ID) {
> > > > > /* just mark; */
> > > > >
> > > > > @@ -6754,8 +6784,13 @@ static int check_ptr_to_btf_access(struct bpf_verifier_env *env,
> > > > > clear_trusted_flags(&flag);
> > > > > }
> > > > >
> > > > > - if (atype == BPF_READ && value_regno >= 0)
> > > > > + if (atype == BPF_READ && value_regno >= 0) {
> > > > > mark_btf_ld_reg(env, regs, value_regno, ret, reg->btf, btf_id, flag);
> > > > > + /* We've assigned a new type to regno, so don't undo masking. */
> > > > > + if (regno == value_regno)
> > > > > + mask = false;
> > > > > + }
> > > > > + unmask_raw_tp_reg(reg, mask);
> > >
> > > Kumar,
> > >
> > > I chatted with Andrii offline. All other cases of mask/unmask
> > > should probably stay raw_tp specific, but it seems we can make
> > > this particular case to be generic.
> > > Something like the following:
> > >
> > > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > > index 797cf3ed32e0..bbd4c03460e3 100644
> > > --- a/kernel/bpf/verifier.c
> > > +++ b/kernel/bpf/verifier.c
> > > @@ -6703,7 +6703,11 @@ static int check_ptr_to_btf_access(struct
> > > bpf_verifier_env *env,
> > > */
> > > flag = PTR_UNTRUSTED;
> > >
> > > + } else if (reg->type == (PTR_TO_BTF_ID | PTR_TRUSTED |
> > > PTR_MAYBE_NULL)) {
> > > + flag |= PTR_MAYBE_NULL;
> > > + goto trusted;
> > > } else if (is_trusted_reg(reg) || is_rcu_reg(reg)) {
> > > +trusted:
> > >
> > > With the idea that trusted_or_null stays that way for all prog
> > > types and bpf_iter__task->task deref stays trusted_or_null
> > > instead of being downgraded to ptr_to_btf_id without any flags.
> > > So progs can do few less != null checks.
> > > Need to think it through.
> >
> > Ok. But don't allow passing such pointers into helpers, right?
> > We do that for raw_tp to preserve compat, but it would just exacerbate
> > the issue if we start doing it everywhere.
> > So it's just that dereferencing a _or_null pointer becomes an ok thing to do?
> > Let me mull over this for a bit.
> >
> > I'm not sure whether not doing the NULL check is better or worse
> > though. On one hand everything will work without checking for NULL, on
> > the other hand people may also assume the verifier isn't complaining
> > because the pointer is valid, and then they read data from the pointer
> > which always ends up being zero, meaning different things for
> > different kinds of fields.
> >
> > Just thinking out loud, but one of the other concerns would be that
> > we're encouraging people not to do these NULL checks, which means a
> > potential page fault penalty every time that pointer _is_ NULL, instead
> > of a simple branch, which would certainly be a bit expensive. If this
> > becomes the common case, I think the prog execution latency penalty
> > will be big. It is something to consider.
>
> Ah, no, my bad, this won't be a problem now, as the JIT does emit a
> branch to check for kernel addresses, but it probably will be if
> https://lore.kernel.org/bpf/20240619092216.1780946-1-memxor@gmail.com/
> gets accepted.
I applied this and measured the time it takes to dereference
a NULL pointer, using bpf_ktime_get_ns:
With the patch above: time=3345 ns
Without (bpf-next right now): time=170 ns
So I guess that means I should probably drop the patch above if we
decide to allow dereferencing NULL for all programs.
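The latency gap measured above is the difference between taking a fault (and its fixup path) and taking a well-predicted branch. In plain C terms, the explicit check that preserving NULL checks encourages looks like this (a sketch; the struct and field names are illustrative):

```c
#include <stddef.h>

struct task { int pid; };

/* With an explicit check, a NULL argument costs one well-predicted
 * branch and the caller sees a sentinel value. Without the check, the
 * verifier must rewrite the load into a PROBE_MEM-style access whose
 * NULL case is absorbed by the kernel's fault-handling machinery,
 * which is far slower when the pointer actually is NULL. */
static int read_pid_checked(const struct task *t)
{
	if (!t)
		return -1;
	return t->pid;
}
```

The BPF equivalent is simply an `if (!arg) return 0;` at the top of the program, which is what marking raw_tp arguments PTR_MAYBE_NULL nudges authors toward.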
* Re: [PATCH bpf-next v1 1/2] bpf: Mark raw_tp arguments with PTR_MAYBE_NULL
2024-11-03 17:00 ` Kumar Kartikeya Dwivedi
@ 2024-11-03 17:37 ` Alexei Starovoitov
2024-11-06 20:17 ` Andrii Nakryiko
0 siblings, 1 reply; 19+ messages in thread
From: Alexei Starovoitov @ 2024-11-03 17:37 UTC (permalink / raw)
To: Kumar Kartikeya Dwivedi
Cc: Andrii Nakryiko, bpf, kkd, Juri Lelli, Alexei Starovoitov,
Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman, Song Liu,
Yonghong Song, Steven Rostedt, Jiri Olsa, Kernel Team
On Sun, Nov 3, 2024 at 9:01 AM Kumar Kartikeya Dwivedi <memxor@gmail.com> wrote:
>
> On Sun, 3 Nov 2024 at 10:40, Kumar Kartikeya Dwivedi <memxor@gmail.com> wrote:
> >
> > On Sun, 3 Nov 2024 at 10:16, Kumar Kartikeya Dwivedi <memxor@gmail.com> wrote:
> > >
> > > On Fri, 1 Nov 2024 at 17:56, Alexei Starovoitov
> > > <alexei.starovoitov@gmail.com> wrote:
> > > >
> > > > On Fri, Nov 1, 2024 at 12:16 PM Andrii Nakryiko
> > > > <andrii.nakryiko@gmail.com> wrote:
> > > > >
> > > > > > @@ -6693,7 +6709,21 @@ static int check_ptr_to_btf_access(struct bpf_verifier_env *env,
> > > > > >
> > > > > > if (ret < 0)
> > > > > > return ret;
> > > > > > -
> > > > > > + /* For raw_tp progs, we allow dereference of PTR_MAYBE_NULL
> > > > > > + * trusted PTR_TO_BTF_ID, these are the ones that are possibly
> > > > > > + * arguments to the raw_tp. Since internal checks for trusted
> > > > > > + * reg in check_ptr_to_btf_access would consider PTR_MAYBE_NULL
> > > > > > + * modifier as problematic, mask it out temporarily for the
> > > > > > + * check. Don't apply this to pointers with ref_obj_id > 0, as
> > > > > > + * those won't be raw_tp args.
> > > > > > + *
> > > > > > + * We may end up applying this relaxation to other trusted
> > > > > > + * PTR_TO_BTF_ID with maybe null flag, since we cannot
> > > > > > + * distinguish PTR_MAYBE_NULL tagged for arguments vs normal
> > > > > > + * tagging, but that should expand allowed behavior, and not
> > > > > > + * cause regression for existing behavior.
> > > > > > + */
> > > > >
> > > > > Yeah, I'm not sure why this has to be raw tp-specific?.. What's wrong
> > > > > with the same behavior for BPF iterator programs, for example?
> > > > >
> > > > > It seems nicer if we can avoid this temporary masking and instead
> > > > > support this as a generic functionality? Or are there complications?
> > > > >
> > >
> > > We _can_ do this for all programs. The thought process here was to
> > > leave existing raw_tp programs unbroken if possible if we're marking
> > > their arguments as PTR_MAYBE_NULL, since most of them won't be
> > > performing any NULL checks at all.
> > >
> > > > > > + mask = mask_raw_tp_reg(env, reg);
> > > > > > if (ret != PTR_TO_BTF_ID) {
> > > > > > /* just mark; */
> > > > > >
> > > > > > @@ -6754,8 +6784,13 @@ static int check_ptr_to_btf_access(struct bpf_verifier_env *env,
> > > > > > clear_trusted_flags(&flag);
> > > > > > }
> > > > > >
> > > > > > - if (atype == BPF_READ && value_regno >= 0)
> > > > > > + if (atype == BPF_READ && value_regno >= 0) {
> > > > > > mark_btf_ld_reg(env, regs, value_regno, ret, reg->btf, btf_id, flag);
> > > > > > + /* We've assigned a new type to regno, so don't undo masking. */
> > > > > > + if (regno == value_regno)
> > > > > > + mask = false;
> > > > > > + }
> > > > > > + unmask_raw_tp_reg(reg, mask);
> > > >
> > > > Kumar,
> > > >
> > > > I chatted with Andrii offline. All other cases of mask/unmask
> > > > should probably stay raw_tp specific, but it seems we can make
> > > > this particular case to be generic.
> > > > Something like the following:
> > > >
> > > > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > > > index 797cf3ed32e0..bbd4c03460e3 100644
> > > > --- a/kernel/bpf/verifier.c
> > > > +++ b/kernel/bpf/verifier.c
> > > > @@ -6703,7 +6703,11 @@ static int check_ptr_to_btf_access(struct
> > > > bpf_verifier_env *env,
> > > > */
> > > > flag = PTR_UNTRUSTED;
> > > >
> > > > + } else if (reg->type == (PTR_TO_BTF_ID | PTR_TRUSTED |
> > > > PTR_MAYBE_NULL)) {
> > > > + flag |= PTR_MAYBE_NULL;
> > > > + goto trusted;
> > > > } else if (is_trusted_reg(reg) || is_rcu_reg(reg)) {
> > > > +trusted:
> > > >
> > > > With the idea that trusted_or_null stays that way for all prog
> > > > types and bpf_iter__task->task deref stays trusted_or_null
> > > > instead of being downgraded to ptr_to_btf_id without any flags.
> > > > So progs can do few less != null checks.
> > > > Need to think it through.
> > >
> > > Ok. But don't allow passing such pointers into helpers, right?
> > > We do that for raw_tp to preserve compat, but it would just exacerbate
> > > the issue if we start doing it everywhere.
> > > So it's just that dereferencing a _or_null pointer becomes an ok thing to do?
> > > Let me mull over this for a bit.
> > >
> > > I'm not sure whether not doing the NULL check is better or worse
> > > though. On one hand everything will work without checking for NULL, on
> > > the other hand people may also assume the verifier isn't complaining
> > > because the pointer is valid, and then they read data from the pointer
> > > which always ends up being zero, meaning different things for
> > > different kinds of fields.
> > >
> > > Just thinking out loud, but one of the other concerns would be that
> > > we're encouraging people not to do these NULL checks, which means a
> > > potential page fault penalty every time that pointer _is_ NULL, instead
> > > of a simple branch, which would certainly be a bit expensive. If this
> > > becomes the common case, I think the prog execution latency penalty
> > > will be big. It is something to consider.
> >
> > Ah, no, my bad, this won't be a problem now, as the JIT does emit a
> > branch to check for kernel addresses, but it probably will be if
> > https://lore.kernel.org/bpf/20240619092216.1780946-1-memxor@gmail.com/
> > gets accepted.
>
> I applied this and tried to find out the time it takes to dereference
> a NULL pointer using bpf_ktime_get_ns
> With the patch above: time=3345 ns
> Without (bpf-next right now): time=170 ns
>
> So I guess that means I should probably drop the patch above if we
> decide to allow dereferencing NULL for all programs.
Your concerns are valid.
Accepting deref of trusted_or_null generically will cause these issues
long term. It's indeed better to make programmers add explicit != NULL
checks to their programs.
So scratch my earlier suggestion. Let's keep this patch as-is with a special
hack for raw_tp only, one that we will hopefully resolve later
either with a __nullable suffix or compiler detection of nullability.
The latter we discussed briefly with Eduard.
It's doable to teach LLVM to emit a btf_tag to aid the verifier.
GCC may catch up eventually.
* Re: [PATCH bpf-next v1 1/2] bpf: Mark raw_tp arguments with PTR_MAYBE_NULL
2024-11-03 17:37 ` Alexei Starovoitov
@ 2024-11-06 20:17 ` Andrii Nakryiko
0 siblings, 0 replies; 19+ messages in thread
From: Andrii Nakryiko @ 2024-11-06 20:17 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: Kumar Kartikeya Dwivedi, bpf, kkd, Juri Lelli, Alexei Starovoitov,
Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman, Song Liu,
Yonghong Song, Steven Rostedt, Jiri Olsa, Kernel Team
On Sun, Nov 3, 2024 at 9:37 AM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Sun, Nov 3, 2024 at 9:01 AM Kumar Kartikeya Dwivedi <memxor@gmail.com> wrote:
> >
> > On Sun, 3 Nov 2024 at 10:40, Kumar Kartikeya Dwivedi <memxor@gmail.com> wrote:
> > >
> > > On Sun, 3 Nov 2024 at 10:16, Kumar Kartikeya Dwivedi <memxor@gmail.com> wrote:
> > > >
> > > > On Fri, 1 Nov 2024 at 17:56, Alexei Starovoitov
> > > > <alexei.starovoitov@gmail.com> wrote:
> > > > >
> > > > > On Fri, Nov 1, 2024 at 12:16 PM Andrii Nakryiko
> > > > > <andrii.nakryiko@gmail.com> wrote:
> > > > > >
> > > > > > > @@ -6693,7 +6709,21 @@ static int check_ptr_to_btf_access(struct bpf_verifier_env *env,
> > > > > > >
> > > > > > > if (ret < 0)
> > > > > > > return ret;
> > > > > > > -
> > > > > > > + /* For raw_tp progs, we allow dereference of PTR_MAYBE_NULL
> > > > > > > + * trusted PTR_TO_BTF_ID, these are the ones that are possibly
> > > > > > > + * arguments to the raw_tp. Since internal checks for trusted
> > > > > > > + * reg in check_ptr_to_btf_access would consider PTR_MAYBE_NULL
> > > > > > > + * modifier as problematic, mask it out temporarily for the
> > > > > > > + * check. Don't apply this to pointers with ref_obj_id > 0, as
> > > > > > > + * those won't be raw_tp args.
> > > > > > > + *
> > > > > > > + * We may end up applying this relaxation to other trusted
> > > > > > > + * PTR_TO_BTF_ID with maybe null flag, since we cannot
> > > > > > > + * distinguish PTR_MAYBE_NULL tagged for arguments vs normal
> > > > > > > + * tagging, but that should expand allowed behavior, and not
> > > > > > > + * cause regression for existing behavior.
> > > > > > > + */
> > > > > >
> > > > > > Yeah, I'm not sure why this has to be raw tp-specific?.. What's wrong
> > > > > > with the same behavior for BPF iterator programs, for example?
> > > > > >
> > > > > > It seems nicer if we can avoid this temporary masking and instead
> > > > > > support this as a generic functionality? Or are there complications?
> > > > > >
> > > >
> > > > We _can_ do this for all programs. The thought process here was to
> > > > leave existing raw_tp programs unbroken if possible if we're marking
> > > > their arguments as PTR_MAYBE_NULL, since most of them won't be
> > > > performing any NULL checks at all.
> > > >
> > > > > > > + mask = mask_raw_tp_reg(env, reg);
> > > > > > > if (ret != PTR_TO_BTF_ID) {
> > > > > > > /* just mark; */
> > > > > > >
> > > > > > > @@ -6754,8 +6784,13 @@ static int check_ptr_to_btf_access(struct bpf_verifier_env *env,
> > > > > > > clear_trusted_flags(&flag);
> > > > > > > }
> > > > > > >
> > > > > > > - if (atype == BPF_READ && value_regno >= 0)
> > > > > > > + if (atype == BPF_READ && value_regno >= 0) {
> > > > > > > mark_btf_ld_reg(env, regs, value_regno, ret, reg->btf, btf_id, flag);
> > > > > > > + /* We've assigned a new type to regno, so don't undo masking. */
> > > > > > > + if (regno == value_regno)
> > > > > > > + mask = false;
> > > > > > > + }
> > > > > > > + unmask_raw_tp_reg(reg, mask);
> > > > >
> > > > > Kumar,
> > > > >
> > > > > I chatted with Andrii offline. All other cases of mask/unmask
> > > > > should probably stay raw_tp specific, but it seems we can make
> > > > > this particular case to be generic.
> > > > > Something like the following:
> > > > >
> > > > > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > > > > index 797cf3ed32e0..bbd4c03460e3 100644
> > > > > --- a/kernel/bpf/verifier.c
> > > > > +++ b/kernel/bpf/verifier.c
> > > > > @@ -6703,7 +6703,11 @@ static int check_ptr_to_btf_access(struct
> > > > > bpf_verifier_env *env,
> > > > > */
> > > > > flag = PTR_UNTRUSTED;
> > > > >
> > > > > + } else if (reg->type == (PTR_TO_BTF_ID | PTR_TRUSTED |
> > > > > PTR_MAYBE_NULL)) {
> > > > > + flag |= PTR_MAYBE_NULL;
> > > > > + goto trusted;
> > > > > } else if (is_trusted_reg(reg) || is_rcu_reg(reg)) {
> > > > > +trusted:
> > > > >
> > > > > With the idea that trusted_or_null stays that way for all prog
> > > > > types and bpf_iter__task->task deref stays trusted_or_null
> > > > > instead of being downgraded to ptr_to_btf_id without any flags.
> > > > > So progs can do few less != null checks.
> > > > > Need to think it through.
> > > >
> > > > Ok. But don't allow passing such pointers into helpers, right?
> > > > We do that for raw_tp to preserve compat, but it would just exacerbate
> > > > the issue if we start doing it everywhere.
> > > > So it's just that dereferencing a _or_null pointer becomes an ok thing to do?
> > > > Let me mull over this for a bit.
> > > >
> > > > I'm not sure whether not doing the NULL check is better or worse
> > > > though. On one hand everything will work without checking for NULL, on
> > > > the other hand people may also assume the verifier isn't complaining
> > > > because the pointer is valid, and then they read data from the pointer
> > > > which always ends up being zero, meaning different things for
> > > > different kinds of fields.
> > > >
> > > > Just thinking out loud, but one of the other concerns would be that
> > > > we're encouraging people not to do these NULL checks, which means a
> > > > potential page fault penalty every time that pointer _is_ NULL, instead
> > > > of a simple branch, which would certainly be a bit expensive. If this
> > > > becomes the common case, I think the prog execution latency penalty
> > > > will be big. It is something to consider.
> > >
> > > Ah, no, my bad, this won't be a problem now, as the JIT does emit a
> > > branch to check for kernel addresses, but it probably will be if
> > > https://lore.kernel.org/bpf/20240619092216.1780946-1-memxor@gmail.com/
> > > gets accepted.
> >
> > I applied this and tried to find out the time it takes to dereference
> > a NULL pointer using bpf_ktime_get_ns
> > With the patch above: time=3345 ns
> > Without (bpf-next right now): time=170 ns
> >
> > So I guess that means I should probably drop the patch above if we
> > decide to allow dereferencing NULL for all programs.
>
> Your concerns are valid.
> Accepting deref of trusted_or_null generically will cause these issues
> long term. It's indeed better to make programmers add explicit !=NULL
> to their programs.
Agreed. Not a huge fan of mutating reg->type in place with
mask/unmask, but it's fine overall. I think we could have removed half
of the unmasks, since they are done on error paths where it no longer
matters, but that's a minor complaint.
> So scratch my earlier suggestion. Let's keep this patch as-is with special
> hack for raw_tp only that we will hopefully resolve later
> either with __nullable suffix or compiler detection of nullability.
> The latter we discussed briefly with Eduard.
> It's doable to teach llvm to emit btf_tag to aid the verifier.
> gcc may catch up eventually.
* Re: [PATCH bpf-next v1 0/2] Handle possible NULL trusted raw_tp arguments
2024-11-02 0:32 ` Eduard Zingerman
@ 2024-11-08 5:08 ` Eduard Zingerman
2024-11-08 20:13 ` Alexei Starovoitov
0 siblings, 1 reply; 19+ messages in thread
From: Eduard Zingerman @ 2024-11-08 5:08 UTC (permalink / raw)
To: Alexei Starovoitov, Puranjay Mohan, Yonghong Song
Cc: Kumar Kartikeya Dwivedi, bpf, kkd, Alexei Starovoitov,
Andrii Nakryiko, Martin KaFai Lau, Song Liu, Steven Rostedt,
Jiri Olsa, Juri Lelli, Kernel Team
On Fri, 2024-11-01 at 17:32 -0700, Eduard Zingerman wrote:
> On Fri, 2024-11-01 at 17:29 -0700, Alexei Starovoitov wrote:
>
> [...]
>
> > Hmm.
> > Puranjay touched it last with extra logic.
> >
> > And before that David Vernet tried to address flakiness
> > in commit 4a54de65964d.
> > Yonghong also noticed lockups in paravirt
> > and added workaround 7015843afc.
> >
> > Your additional timeout/workaround makes sense to me,
> > but would be good to bisect whether Puranjay's change caused it.
>
> I'll debug what's going on some time later today or on Sat.
I finally had time to investigate this a bit.
First, here is how to trigger the lockup:
t1=send_signal/send_signal_perf_thread_remote; \
t2=send_signal/send_signal_nmi_thread_remote; \
for i in $(seq 1 100); do ./test_progs -t $t1,$t2; done
It must be both tests, for whatever reason.
The failing test is 'send_signal_nmi_thread_remote'.
The test is organized as parent and child processes communicating
various events to each other. The intended sequence of events:
- child:
- install SIGUSR1 handler
- notify parent
- wait for parent
- parent:
- open PERF_COUNT_SW_CPU_CLOCK event
- attach BPF program to the event
- notify child
- enter busy loop for 10^8 iterations
- wait for child
- BPF program:
- send SIGUSR1 to child
- child:
- poll for SIGUSR1 in a busy loop
- notify parent
- parent:
- check value communicated by child,
terminate test.
The lockup happens because, on every other test run, the perf event is
not triggered, the child does not receive SIGUSR1, and thus both the
parent and the child are stuck.
For 'send_signal_nmi_thread_remote' the perf event is defined as:
struct perf_event_attr attr = {
.sample_period = 1,
.type = PERF_TYPE_HARDWARE,
.config = PERF_COUNT_HW_CPU_CYCLES,
};
And it is opened for the parent process pid.
Apparently, the perf event is not always triggered between lines
send_signal.c:165-180. And at line 180 the parent enters a system call,
so CPU cycles stop ticking for the parent; thus, if the perf event
has not been triggered by then, it won't be triggered at all
(as far as I understand).
Applying the same fix Yonghong did in commit 7015843afc is sufficient to
reliably trigger the perf event:
--- a/tools/testing/selftests/bpf/prog_tests/send_signal.c
+++ b/tools/testing/selftests/bpf/prog_tests/send_signal.c
@@ -223,7 +223,8 @@ static void test_send_signal_perf(bool signal_thread, bool remote)
static void test_send_signal_nmi(bool signal_thread, bool remote)
{
struct perf_event_attr attr = {
- .sample_period = 1,
+ .freq = 1,
+ .sample_freq = 1000,
.type = PERF_TYPE_HARDWARE,
.config = PERF_COUNT_HW_CPU_CYCLES,
};
But I don't understand why.
As far as I can figure from the kernel source code,
sample_period is measured in nanoseconds (is it?),
so the busy loop at send_signal.c:174-175 should run long enough for
the perf event to have been triggered by then.
Can someone with an understanding of how perf events work explain why
the above change helps?
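For reference, the freq-based configuration from the hunk above can be written out as a complete attr initialization. This sketch only builds the struct and inspects its fields; it does not call perf_event_open (the helper name is illustrative):

```c
#include <string.h>
#include <linux/perf_event.h>

/* Frequency-based sampling: instead of firing on every single counter
 * increment (as sample_period = 1 requests), ask the kernel to
 * auto-adjust the period so the event fires roughly sample_freq times
 * per second. */
static struct perf_event_attr make_nmi_attr(void)
{
	struct perf_event_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	attr.type = PERF_TYPE_HARDWARE;
	attr.config = PERF_COUNT_HW_CPU_CYCLES;
	attr.freq = 1;           /* interpret sample_freq, not sample_period */
	attr.sample_freq = 1000; /* target ~1000 samples per second */
	return attr;
}
```

Since sample_period and sample_freq share a union in struct perf_event_attr, the freq bit is what tells the kernel which interpretation to use.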
* Re: [PATCH bpf-next v1 0/2] Handle possible NULL trusted raw_tp arguments
2024-11-08 5:08 ` Eduard Zingerman
@ 2024-11-08 20:13 ` Alexei Starovoitov
2024-11-08 21:57 ` Yonghong Song
0 siblings, 1 reply; 19+ messages in thread
From: Alexei Starovoitov @ 2024-11-08 20:13 UTC (permalink / raw)
To: Eduard Zingerman
Cc: Puranjay Mohan, Yonghong Song, Kumar Kartikeya Dwivedi, bpf, kkd,
Alexei Starovoitov, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
Steven Rostedt, Jiri Olsa, Juri Lelli, Kernel Team
On Thu, Nov 7, 2024 at 9:08 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Fri, 2024-11-01 at 17:32 -0700, Eduard Zingerman wrote:
> > On Fri, 2024-11-01 at 17:29 -0700, Alexei Starovoitov wrote:
> >
> > [...]
> >
> > > Hmm.
> > > Puranjay touched it last with extra logic.
> > >
> > > And before that David Vernet tried to address flakiness
> > > in commit 4a54de65964d.
> > > Yonghong also noticed lockups in paravirt
> > > and added workaround 7015843afc.
> > >
> > > Your additional timeout/workaround makes sense to me,
> > > but would be good to bisect whether Puranjay's change caused it.
> >
> > I'll debug what's going on some time later today or on Sat.
>
> I finally had time to investigate this a bit.
> First, here is how to trigger lockup:
>
> t1=send_signal/send_signal_perf_thread_remote; \
> t2=send_signal/send_signal_nmi_thread_remote; \
> for i in $(seq 1 100); do ./test_progs -t $t1,$t2; done
>
> Must be both tests for whatever reason.
> The failing test is 'send_signal_nmi_thread_remote'.
>
> The test is organized as parent and child processes communicating
> various events to each other. The intended sequence of events:
> - child:
> - install SIGUSR1 handler
> - notify parent
> - wait for parent
> - parent:
> - open PERF_COUNT_SW_CPU_CLOCK event
> - attach BPF program to the event
> - notify child
> - enter busy loop for 10^8 iterations
> - wait for child
> - BPF program:
> - send SIGUSR1 to child
> - child:
> - poll for SIGUSR1 in a busy loop
> - notify parent
> - parent:
> - check value communicated by child,
> terminate test.
>
> The lockup happens because on every other test run perf event is not
> triggered, child does not receive SIGUSR1 and thus both parent and
> child are stuck.
>
> For 'send_signal_nmi_thread_remote' perf event is defined as:
>
> struct perf_event_attr attr = {
> .sample_period = 1,
> .type = PERF_TYPE_HARDWARE,
> .config = PERF_COUNT_HW_CPU_CYCLES,
> };
>
> And is opened for parent process pid.
>
> Apparently, the perf event is not always triggered between lines
> send_signal.c:165-180. And at line 180 parent enters system call,
> so cpu cycles stop ticking for 'parent', thus if perf event
> had not been triggered already it won't be triggered at all
> (as far as I understand).
>
> Applying same fix as Yonghong did in 7015843afc is sufficient to
> reliably trigger perf event:
>
> --- a/tools/testing/selftests/bpf/prog_tests/send_signal.c
> +++ b/tools/testing/selftests/bpf/prog_tests/send_signal.c
> @@ -223,7 +223,8 @@ static void test_send_signal_perf(bool signal_thread, bool remote)
> static void test_send_signal_nmi(bool signal_thread, bool remote)
> {
> struct perf_event_attr attr = {
> - .sample_period = 1,
> + .freq = 1,
> + .sample_freq = 1000,
> .type = PERF_TYPE_HARDWARE,
> .config = PERF_COUNT_HW_CPU_CYCLES,
> };
>
> But I don't understand why.
> As far as I can figure from kernel source code,
> sample_period is measured in nanoseconds (is it?),
I believe sample_period is a number of samples.
1 means that perf is supposed to generate events very often.
It means nanoseconds only for SW cpu_cycles.
Let's apply the above workaround and move on. Please send a patch.
* Re: [PATCH bpf-next v1 0/2] Handle possible NULL trusted raw_tp arguments
2024-11-08 20:13 ` Alexei Starovoitov
@ 2024-11-08 21:57 ` Yonghong Song
0 siblings, 0 replies; 19+ messages in thread
From: Yonghong Song @ 2024-11-08 21:57 UTC (permalink / raw)
To: Alexei Starovoitov, Eduard Zingerman
Cc: Puranjay Mohan, Kumar Kartikeya Dwivedi, bpf, kkd,
Alexei Starovoitov, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
Steven Rostedt, Jiri Olsa, Juri Lelli, Kernel Team
On 11/8/24 12:13 PM, Alexei Starovoitov wrote:
> On Thu, Nov 7, 2024 at 9:08 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
>> On Fri, 2024-11-01 at 17:32 -0700, Eduard Zingerman wrote:
>>> On Fri, 2024-11-01 at 17:29 -0700, Alexei Starovoitov wrote:
>>>
>>> [...]
>>>
>>>> Hmm.
>>>> Puranjay touched it last with extra logic.
>>>>
>>>> And before that David Vernet tried to address flakiness
>>>> in commit 4a54de65964d.
>>>> Yonghong also noticed lockups in paravirt
>>>> and added workaround 7015843afc.
>>>>
>>>> Your additional timeout/workaround makes sense to me,
>>>> but would be good to bisect whether Puranjay's change caused it.
>>> I'll debug what's going on some time later today or on Sat.
>> I finally had time to investigate this a bit.
>> First, here is how to trigger lockup:
>>
>> t1=send_signal/send_signal_perf_thread_remote; \
>> t2=send_signal/send_signal_nmi_thread_remote; \
>> for i in $(seq 1 100); do ./test_progs -t $t1,$t2; done
>>
>> Must be both tests for whatever reason.
>> The failing test is 'send_signal_nmi_thread_remote'.
>>
>> The test is organized as parent and child processes communicating
>> various events to each other. The intended sequence of events:
>> - child:
>> - install SIGUSR1 handler
>> - notify parent
>> - wait for parent
>> - parent:
>> - open PERF_COUNT_SW_CPU_CLOCK event
>> - attach BPF program to the event
>> - notify child
>> - enter busy loop for 10^8 iterations
>> - wait for child
>> - BPF program:
>> - send SIGUSR1 to child
>> - child:
>> - poll for SIGUSR1 in a busy loop
>> - notify parent
>> - parent:
>> - check value communicated by child,
>> terminate test.
>>
>> The lockup happens because on every other test run perf event is not
>> triggered, child does not receive SIGUSR1 and thus both parent and
>> child are stuck.
>>
>> For 'send_signal_nmi_thread_remote' perf event is defined as:
>>
>> struct perf_event_attr attr = {
>> .sample_period = 1,
>> .type = PERF_TYPE_HARDWARE,
>> .config = PERF_COUNT_HW_CPU_CYCLES,
>> };
>>
>> And is opened for parent process pid.
>>
>> Apparently, the perf event is not always triggered between lines
>> send_signal.c:165-180. And at line 180 parent enters system call,
>> so cpu cycles stop ticking for 'parent', thus if perf event
>> had not been triggered already it won't be triggered at all
>> (as far as I understand).
>>
>> Applying same fix as Yonghong did in 7015843afc is sufficient to
>> reliably trigger perf event:
>>
>> --- a/tools/testing/selftests/bpf/prog_tests/send_signal.c
>> +++ b/tools/testing/selftests/bpf/prog_tests/send_signal.c
>> @@ -223,7 +223,8 @@ static void test_send_signal_perf(bool signal_thread, bool remote)
>> static void test_send_signal_nmi(bool signal_thread, bool remote)
>> {
>> struct perf_event_attr attr = {
>> - .sample_period = 1,
>> + .freq = 1,
>> + .sample_freq = 1000,
>> .type = PERF_TYPE_HARDWARE,
>> .config = PERF_COUNT_HW_CPU_CYCLES,
>> };
>>
>> But I don't understand why.
>> As far as I can figure from kernel source code,
>> sample_period is measured in nanoseconds (is it?),
> I believe sample_period is a number of samples.
> 1 means that perf is supposed to generate events very often.
> It means nanoseconds only for SW cpu_cycles.
>
> Let's apply the above workaround and move on. Please send a patch.
sample_period = 1 is intended to make sure we get at least one sample,
since many samples will be generated. But too many samples may
cause *rare* issues in internal kernel logic in certain *corner*
cases. I agree with Alexei; let us just use a reasonable sample_freq
to fix the issue. Thanks!
Thread overview: 19+ messages:
2024-11-01 0:00 [PATCH bpf-next v1 0/2] Handle possible NULL trusted raw_tp arguments Kumar Kartikeya Dwivedi
2024-11-01 0:00 ` [PATCH bpf-next v1 1/2] bpf: Mark raw_tp arguments with PTR_MAYBE_NULL Kumar Kartikeya Dwivedi
2024-11-01 19:16 ` Andrii Nakryiko
2024-11-01 22:55 ` Alexei Starovoitov
2024-11-03 16:16 ` Kumar Kartikeya Dwivedi
2024-11-03 16:40 ` Kumar Kartikeya Dwivedi
2024-11-03 17:00 ` Kumar Kartikeya Dwivedi
2024-11-03 17:37 ` Alexei Starovoitov
2024-11-06 20:17 ` Andrii Nakryiko
2024-11-01 0:00 ` [PATCH bpf-next v1 2/2] selftests/bpf: Add tests for raw_tp null handling Kumar Kartikeya Dwivedi
2024-11-01 19:19 ` Andrii Nakryiko
2024-11-03 15:58 ` Kumar Kartikeya Dwivedi
2024-11-01 13:18 ` [PATCH bpf-next v1 0/2] Handle possible NULL trusted raw_tp arguments Kumar Kartikeya Dwivedi
2024-11-02 0:21 ` Eduard Zingerman
2024-11-02 0:29 ` Alexei Starovoitov
2024-11-02 0:32 ` Eduard Zingerman
2024-11-08 5:08 ` Eduard Zingerman
2024-11-08 20:13 ` Alexei Starovoitov
2024-11-08 21:57 ` Yonghong Song