bpf.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH bpf-next v4 0/4] bpf: Add kfuncs for read-only string operations
@ 2025-05-07  6:40 Viktor Malik
  2025-05-07  6:40 ` [PATCH bpf-next v4 1/4] bpf: Teach vefier to handle const ptrs as args to kfuncs Viktor Malik
                   ` (3 more replies)
  0 siblings, 4 replies; 25+ messages in thread
From: Viktor Malik @ 2025-05-07  6:40 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Viktor Malik

String operations are commonly used in programming and BPF programs are
no exception. Since it is cumbersome to reimplement them over and over,
this series introduce kfuncs which provide the most common operations.
For now, we only limit ourselves to functions which do not copy memory
since these usually introduce undefined behaviour in case the
source/destination buffers overlap which would have to be prevented by
the verifier.

The kernel already contains implementations for all of these, however,
it is not possible to use them from BPF context. The main reason is that
the verifier is not able to check that it is safe to access the entire
string and that the string is null-terminated and the function won't
loop forever. Therefore, the operations are open-coded using
__get_kernel_nofault instead of plain dereference to make them safe. 

The first patch of the series teaches the verifier to allow read-only
memory to be passed to const arguments of kfuncs. With that change,
the added kfuncs take strings in three forms:
- string literals (i.e. pointers to read-only maps)
- global variables (i.e. pointers to read-write maps)
- stack-allocated buffers

Note that currently, it is not possible to pass strings from the BPF
program context (like function args) as the verifier doesn't treat them
as neither PTR_TO_MEM nor PTR_TO_BTF_ID.

Changes in v4 (all suggested by Andrii):
- Open-code all the kfuncs, not just the unbounded variants.
- Introduce `pagefault` lock guard to simplify the implementation
- Return appropriate error codes (-E2BIG and -EFAULT) on failures
- Const-ify all arguments and return values
- Add negative test-cases

Viktor Malik (4):
  bpf: Teach vefier to handle const ptrs as args to kfuncs
  uaccess: Define pagefault lock guard
  bpf: Add kfuncs for read-only string operations
  selftests/bpf: Add tests for string kfuncs

 include/linux/btf.h                           |   5 +
 include/linux/uaccess.h                       |   2 +
 kernel/bpf/helpers.c                          | 440 ++++++++++++++++++
 kernel/bpf/verifier.c                         |  10 +-
 .../selftests/bpf/prog_tests/string_kfuncs.c  |  65 +++
 .../bpf/progs/string_kfuncs_failure1.c        |  51 ++
 .../bpf/progs/string_kfuncs_failure2.c        |  24 +
 .../bpf/progs/string_kfuncs_success.c         |  87 ++++
 8 files changed, 680 insertions(+), 4 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/string_kfuncs.c
 create mode 100644 tools/testing/selftests/bpf/progs/string_kfuncs_failure1.c
 create mode 100644 tools/testing/selftests/bpf/progs/string_kfuncs_failure2.c
 create mode 100644 tools/testing/selftests/bpf/progs/string_kfuncs_success.c

-- 
2.49.0


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH bpf-next v4 1/4] bpf: Teach vefier to handle const ptrs as args to kfuncs
  2025-05-07  6:40 [PATCH bpf-next v4 0/4] bpf: Add kfuncs for read-only string operations Viktor Malik
@ 2025-05-07  6:40 ` Viktor Malik
  2025-05-08  9:08   ` Matt Bobrowski
  2025-05-07  6:40 ` [PATCH bpf-next v4 2/4] uaccess: Define pagefault lock guard Viktor Malik
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 25+ messages in thread
From: Viktor Malik @ 2025-05-07  6:40 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Viktor Malik

When a kfunc takes a const pointer as an argument, the verifier should
not check that the memory can be accessed for writing as that may lead
to rejecting safe programs. Extend the verifier to detect such arguments
and skip the write access check for them.

The use-case for this change is passing string literals (i.e. read-only
maps) to read-only string kfuncs.

Signed-off-by: Viktor Malik <vmalik@redhat.com>
---
 include/linux/btf.h   |  5 +++++
 kernel/bpf/verifier.c | 10 ++++++----
 2 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/include/linux/btf.h b/include/linux/btf.h
index ebc0c0c9b944..5cb06c65d91f 100644
--- a/include/linux/btf.h
+++ b/include/linux/btf.h
@@ -391,6 +391,11 @@ static inline bool btf_type_is_type_tag(const struct btf_type *t)
 	return BTF_INFO_KIND(t->info) == BTF_KIND_TYPE_TAG;
 }
 
+static inline bool btf_type_is_const(const struct btf_type *t)
+{
+	return BTF_INFO_KIND(t->info) == BTF_KIND_CONST;
+}
+
 /* union is only a special case of struct:
  * all its offsetof(member) == 0
  */
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 54c6953a8b84..e2d74c4d44c1 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -8186,7 +8186,7 @@ static int check_mem_size_reg(struct bpf_verifier_env *env,
 }
 
 static int check_mem_reg(struct bpf_verifier_env *env, struct bpf_reg_state *reg,
-			 u32 regno, u32 mem_size)
+			 u32 regno, u32 mem_size, bool read_only)
 {
 	bool may_be_null = type_may_be_null(reg->type);
 	struct bpf_reg_state saved_reg;
@@ -8205,7 +8205,8 @@ static int check_mem_reg(struct bpf_verifier_env *env, struct bpf_reg_state *reg
 	}
 
 	err = check_helper_mem_access(env, regno, mem_size, BPF_READ, true, NULL);
-	err = err ?: check_helper_mem_access(env, regno, mem_size, BPF_WRITE, true, NULL);
+	if (!read_only)
+		err = err ?: check_helper_mem_access(env, regno, mem_size, BPF_WRITE, true, NULL);
 
 	if (may_be_null)
 		*reg = saved_reg;
@@ -10361,7 +10362,7 @@ static int btf_check_func_arg_match(struct bpf_verifier_env *env, int subprog,
 			ret = check_func_arg_reg_off(env, reg, regno, ARG_DONTCARE);
 			if (ret < 0)
 				return ret;
-			if (check_mem_reg(env, reg, regno, arg->mem_size))
+			if (check_mem_reg(env, reg, regno, arg->mem_size, false))
 				return -EINVAL;
 			if (!(arg->arg_type & PTR_MAYBE_NULL) && (reg->type & PTR_MAYBE_NULL)) {
 				bpf_log(log, "arg#%d is expected to be non-NULL\n", i);
@@ -13252,7 +13253,8 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
 					i, btf_type_str(ref_t), ref_tname, PTR_ERR(resolve_ret));
 				return -EINVAL;
 			}
-			ret = check_mem_reg(env, reg, regno, type_size);
+			ret = check_mem_reg(env, reg, regno, type_size,
+					    btf_type_is_const(btf_type_by_id(btf, t->type)));
 			if (ret < 0)
 				return ret;
 			break;
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH bpf-next v4 2/4] uaccess: Define pagefault lock guard
  2025-05-07  6:40 [PATCH bpf-next v4 0/4] bpf: Add kfuncs for read-only string operations Viktor Malik
  2025-05-07  6:40 ` [PATCH bpf-next v4 1/4] bpf: Teach vefier to handle const ptrs as args to kfuncs Viktor Malik
@ 2025-05-07  6:40 ` Viktor Malik
  2025-05-08 10:00   ` Matt Bobrowski
  2025-05-07  6:40 ` [PATCH bpf-next v4 3/4] bpf: Add kfuncs for read-only string operations Viktor Malik
  2025-05-07  6:40 ` [PATCH bpf-next v4 4/4] selftests/bpf: Add tests for string kfuncs Viktor Malik
  3 siblings, 1 reply; 25+ messages in thread
From: Viktor Malik @ 2025-05-07  6:40 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	linux-kernel, Viktor Malik

Define a pagefault lock guard which allows to simplify functions that
need to disable page faults.

Signed-off-by: Viktor Malik <vmalik@redhat.com>
---
 include/linux/uaccess.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h
index 7c06f4795670..1beb5b395d81 100644
--- a/include/linux/uaccess.h
+++ b/include/linux/uaccess.h
@@ -296,6 +296,8 @@ static inline bool pagefault_disabled(void)
  */
 #define faulthandler_disabled() (pagefault_disabled() || in_atomic())
 
+DEFINE_LOCK_GUARD_0(pagefault, pagefault_disable(), pagefault_enable())
+
 #ifndef CONFIG_ARCH_HAS_SUBPAGE_FAULTS
 
 /**
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH bpf-next v4 3/4] bpf: Add kfuncs for read-only string operations
  2025-05-07  6:40 [PATCH bpf-next v4 0/4] bpf: Add kfuncs for read-only string operations Viktor Malik
  2025-05-07  6:40 ` [PATCH bpf-next v4 1/4] bpf: Teach vefier to handle const ptrs as args to kfuncs Viktor Malik
  2025-05-07  6:40 ` [PATCH bpf-next v4 2/4] uaccess: Define pagefault lock guard Viktor Malik
@ 2025-05-07  6:40 ` Viktor Malik
  2025-05-08  9:41   ` Matt Bobrowski
  2025-05-09 18:20   ` Andrii Nakryiko
  2025-05-07  6:40 ` [PATCH bpf-next v4 4/4] selftests/bpf: Add tests for string kfuncs Viktor Malik
  3 siblings, 2 replies; 25+ messages in thread
From: Viktor Malik @ 2025-05-07  6:40 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Viktor Malik

String operations are commonly used so this exposes the most common ones
to BPF programs. For now, we limit ourselves to operations which do not
copy memory around.

Unfortunately, most in-kernel implementations assume that strings are
%NUL-terminated, which is not necessarily true, and therefore we cannot
use them directly in the BPF context. Instead, we open-code them using
__get_kernel_nofault instead of plain dereference to make them safe and
limit the strings length to XATTR_SIZE_MAX to make sure the functions
terminate. When __get_kernel_nofault fails, functions return -EFAULT.
Similarly, when the size bound is reached, the functions return -E2BIG.

At the moment, strings can be passed to the kfuncs in three forms:
- string literals (i.e. pointers to read-only maps)
- global variables (i.e. pointers to read-write maps)
- stack-allocated buffers

Note that currently, it is not possible to pass strings from the BPF
program context (like function args) as the verifier doesn't treat them
as neither PTR_TO_MEM nor PTR_TO_BTF_ID.

Suggested-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Viktor Malik <vmalik@redhat.com>
---
 kernel/bpf/helpers.c | 440 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 440 insertions(+)

diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index e3a2662f4e33..8fb7c2ca7ac0 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -23,6 +23,7 @@
 #include <linux/btf_ids.h>
 #include <linux/bpf_mem_alloc.h>
 #include <linux/kasan.h>
+#include <linux/uaccess.h>
 
 #include "../../lib/kstrtox.h"
 
@@ -3194,6 +3195,433 @@ __bpf_kfunc void bpf_local_irq_restore(unsigned long *flags__irq_flag)
 	local_irq_restore(*flags__irq_flag);
 }
 
+/* Kfuncs for string operations.
+ *
+ * Since strings are not necessarily %NUL-terminated, we cannot directly call
+ * in-kernel implementations. Instead, we open-code the implementations using
+ * __get_kernel_nofault instead of plain dereference to make them safe.
+ */
+
+/**
+ * bpf_strcmp - Compare two strings
+ * @s1: One string
+ * @s2: Another string
+ *
+ * Return:
+ * * %0       - Strings are equal
+ * * %-1      - @s1 is smaller
+ * * %1       - @s2 is smaller
+ * * %-EFAULT - Cannot read one of the strings
+ * * %-E2BIG  - One of strings is too large
+ */
+__bpf_kfunc int bpf_strcmp(const char *s1, const char *s2)
+{
+	char c1, c2;
+	int i;
+
+	if (!s1 || !s2)
+		return -EFAULT;
+
+	guard(pagefault)();
+	for (i = 0; i < XATTR_SIZE_MAX; i++) {
+		__get_kernel_nofault(&c1, s1, char, err_out);
+		__get_kernel_nofault(&c2, s2, char, err_out);
+		if (c1 != c2)
+			return c1 < c2 ? -1 : 1;
+		if (c1 == '\0')
+			return 0;
+		s1++;
+		s2++;
+	}
+	return -E2BIG;
+err_out:
+	return -EFAULT;
+}
+
+/**
+ * bpf_strchr - Find the first occurrence of a character in a string
+ * @s: The string to be searched
+ * @c: The character to search for
+ *
+ * Note that the %NUL-terminator is considered part of the string, and can
+ * be searched for.
+ *
+ * Return:
+ * * const char * - Pointer to the first occurrence of @c within @s
+ * * %NULL        - @c not found in @s
+ * * %-EFAULT     - Cannot read @s
+ * * %-E2BIG      - @s too large
+ */
+__bpf_kfunc const char *bpf_strchr(const char *s, char c)
+{
+	char sc;
+	int i;
+
+	if (!s)
+		return ERR_PTR(-EFAULT);
+
+	guard(pagefault)();
+	for (i = 0; i < XATTR_SIZE_MAX; i++) {
+		__get_kernel_nofault(&sc, s, char, err_out);
+		if (sc == c)
+			return s;
+		if (sc == '\0')
+			return NULL;
+		s++;
+	}
+	return ERR_PTR(-E2BIG);
+err_out:
+	return ERR_PTR(-EFAULT);
+}
+
+/**
+ * bpf_strnchr - Find a character in a length limited string
+ * @s: The string to be searched
+ * @count: The number of characters to be searched
+ * @c: The character to search for
+ *
+ * Note that the %NUL-terminator is considered part of the string, and can
+ * be searched for.
+ *
+ * Return:
+ * * const char * - Pointer to the first occurrence of @c within @s
+ * * %NULL        - @c not found in the first @count characters of @s
+ * * %-EFAULT     - Cannot read @s
+ * * %-E2BIG      - @s too large
+ */
+__bpf_kfunc const char *bpf_strnchr(const char *s, size_t count, char c)
+{
+	char sc;
+	int i;
+
+	if (!s)
+		return ERR_PTR(-EFAULT);
+
+	guard(pagefault)();
+	for (i = 0; i < count && i < XATTR_SIZE_MAX; i++) {
+		__get_kernel_nofault(&sc, s, char, err_out);
+		if (sc == c)
+			return s;
+		if (sc == '\0')
+			return NULL;
+		s++;
+	}
+	return i == XATTR_SIZE_MAX ? ERR_PTR(-E2BIG) : NULL;
+err_out:
+	return ERR_PTR(-EFAULT);
+}
+
+/**
+ * bpf_strchrnul - Find and return a character in a string, or end of string
+ * @s: The string to be searched
+ * @c: The character to search for
+ *
+ * Return:
+ * * const char * - Pointer to the first occurrence of @c within @s or pointer
+ *                  to the null byte at the end of @s when @c is not found
+ * * %-EFAULT     - Cannot read @s
+ * * %-E2BIG      - @s too large
+ */
+__bpf_kfunc const char *bpf_strchrnul(const char *s, char c)
+{
+	char sc;
+	int i;
+
+	if (!s)
+		return ERR_PTR(-EFAULT);
+
+	guard(pagefault)();
+	for (i = 0; i < XATTR_SIZE_MAX; i++) {
+		__get_kernel_nofault(&sc, s, char, err_out);
+		if (sc == '\0' || sc == c)
+			return s;
+		s++;
+	}
+	return ERR_PTR(-E2BIG);
+err_out:
+	return ERR_PTR(-EFAULT);
+}
+
+/**
+ * bpf_strrchr - Find the last occurrence of a character in a string
+ * @s: The string to be searched
+ * @c: The character to search for
+ *
+ * Return:
+ * * const char * - Pointer to the last occurrence of @c within @s
+ * * %NULL        - @c not found in @s
+ * * %-EFAULT     - Cannot read @s
+ * * %-E2BIG      - @s too large
+ */
+__bpf_kfunc const char *bpf_strrchr(const char *s, int c)
+{
+	const char *last = NULL;
+	char sc;
+	int i;
+
+	if (!s)
+		return ERR_PTR(-EFAULT);
+
+	guard(pagefault)();
+	for (i = 0; i < XATTR_SIZE_MAX; i++) {
+		__get_kernel_nofault(&sc, s, char, err_out);
+		if (sc == '\0')
+			return last;
+		if (sc == c)
+			last = s;
+		s++;
+	}
+	return ERR_PTR(-E2BIG);
+err_out:
+	return ERR_PTR(-EFAULT);
+}
+
+/**
+ * bpf_strlen - Calculate the length of a string
+ * @s: The string
+ *
+ * Return:
+ * * >=0      - The length of @s
+ * * %-EFAULT - Cannot read @s
+ * * %-E2BIG  - @s too large
+ */
+__bpf_kfunc int bpf_strlen(const char *s)
+{
+	char c;
+	int i;
+
+	if (!s)
+		return -EFAULT;
+
+	guard(pagefault)();
+	for (i = 0; i < XATTR_SIZE_MAX; i++) {
+		__get_kernel_nofault(&c, s, char, err_out);
+		if (c == '\0')
+			return i;
+		s++;
+	}
+	return -E2BIG;
+err_out:
+	return -EFAULT;
+}
+
+/**
+ * bpf_strlen - Calculate the length of a length-limited string
+ * @s: The string
+ * @count: The maximum number of characters to count
+ *
+ * Return:
+ * * >=0      - The length of @s
+ * * %-EFAULT - Cannot read @s
+ * * %-E2BIG  - @s too large
+ */
+__bpf_kfunc int bpf_strnlen(const char *s, size_t count)
+{
+	char c;
+	int i;
+
+	if (!s)
+		return -EFAULT;
+
+	guard(pagefault)();
+	for (i = 0; i < count && i < XATTR_SIZE_MAX; i++) {
+		__get_kernel_nofault(&c, s, char, err_out);
+		if (c == '\0')
+			return i;
+		s++;
+	}
+	return i == XATTR_SIZE_MAX ? -E2BIG : i;
+err_out:
+	return -EFAULT;
+}
+
+/**
+ * bpf_strspn - Calculate the length of the initial substring of @s which only
+ *              contains letters in @accept
+ * @s: The string to be searched
+ * @accept: The string to search for
+ *
+ * Return:
+ * * >=0      - The length of the initial substring of @s which only contains
+ *              letter in @accept
+ * * %-EFAULT - Cannot read @s
+ * * %-E2BIG  - @s too large
+ */
+__bpf_kfunc int bpf_strspn(const char *s, const char *accept)
+{
+	const char *p;
+	char c;
+	int i;
+
+	if (!s || !accept)
+		return -EFAULT;
+
+	guard(pagefault)();
+	for (i = 0; i < XATTR_SIZE_MAX; i++) {
+		__get_kernel_nofault(&c, s, char, err_out);
+		p = bpf_strchr(accept, c);
+		if (IS_ERR(p))
+			return PTR_ERR(p);
+		if (c == '\0' || !p)
+			return i;
+		s++;
+	}
+	return -E2BIG;
+err_out:
+	return -EFAULT;
+}
+
+/**
+ * strcspn - Calculate the length of the initial substring of @s which does not
+ *           contain letters in @reject
+ * @s: The string to be searched
+ * @reject: The string to avoid
+ *
+ * Return:
+ * * >=0      - The length of the initial substring of @s which does not contain
+ *              letters from @reject
+ * * %-EFAULT - Cannot read @s
+ * * %-E2BIG  - @s too large
+ */
+__bpf_kfunc int bpf_strcspn(const char *s, const char *reject)
+{
+	const char *p;
+	char c;
+	int i;
+
+	if (!s || !reject)
+		return -EFAULT;
+
+	guard(pagefault)();
+	for (i = 0; i < XATTR_SIZE_MAX; i++) {
+		__get_kernel_nofault(&c, s, char, err_out);
+		p = bpf_strchr(reject, c);
+		if (IS_ERR(p))
+			return PTR_ERR(p);
+		if (c == '\0' || p)
+			return i;
+		s++;
+	}
+	return -E2BIG;
+err_out:
+	return -EFAULT;
+}
+
+/**
+ * bpf_strpbrk - Find the first occurrence of a set of characters
+ * @s: The string to be searched
+ * @accept: The characters to search for
+ *
+ * Return:
+ * * const char * - Pointer to the first occurrence of a character from @accept
+ *                  within @s
+ * * %NULL        - No character from @accept found in @s
+ * * %-EFAULT     - Cannot read one of the strings
+ * * %-E2BIG      - One of the strings is too large
+ */
+__bpf_kfunc const char *bpf_strpbrk(const char *s, const char *accept)
+{
+	const char *p;
+	char c;
+	int i;
+
+	if (!s || !accept)
+		return ERR_PTR(-EFAULT);
+
+	guard(pagefault)();
+	for (i = 0; i < XATTR_SIZE_MAX; i++) {
+		__get_kernel_nofault(&c, s, char, err_out);
+		if (c == '\0')
+			return NULL;
+		p = bpf_strchr(accept, c);
+		if (IS_ERR(p))
+			return p;
+		if (p)
+			return s;
+		s++;
+	}
+	return ERR_PTR(-E2BIG);
+err_out:
+	return ERR_PTR(-EFAULT);
+}
+
+/**
+ * bpf_strstr - Find the first substring in a string
+ * @s1: The string to be searched
+ * @s2: The string to search for
+ *
+ * Return:
+ * * const char * - Pointer to the first occurrence of @s2 within @s1
+ * * %NULL        - @s2 is not a substring of @s1
+ * * %-EFAULT     - Cannot read one of the strings
+ * * %-E2BIG      - One of the strings is too large
+ */
+__bpf_kfunc const char *bpf_strstr(const char *s1, const char *s2)
+{
+	char c1, c2;
+	int i, j;
+
+	if (!s1 || !s2)
+		return ERR_PTR(-EFAULT);
+
+	guard(pagefault)();
+	for (i = 0; i < XATTR_SIZE_MAX; i++) {
+		for (j = 0; j < XATTR_SIZE_MAX; j++) {
+			__get_kernel_nofault(&c1, s1 + j, char, err_out);
+			__get_kernel_nofault(&c2, s2 + j, char, err_out);
+			if (c2 == '\0')
+				return s1;
+			if (c1 == '\0')
+				return NULL;
+			if (c1 != c2)
+				break;
+		}
+		if (j == XATTR_SIZE_MAX)
+			return ERR_PTR(-E2BIG);
+		s1++;
+	}
+	return ERR_PTR(-E2BIG);
+err_out:
+	return ERR_PTR(-EFAULT);
+}
+
+/**
+ * bpf_strnstr - Find the first substring in a length-limited string
+ * @s1: The string to be searched
+ * @s2: The string to search for
+ * @len: the maximum number of characters to search
+ */
+__bpf_kfunc const char *bpf_strnstr(const char *s1, const char *s2, size_t len)
+{
+	char c1, c2;
+	int i, j;
+
+	if (!s1 || !s2)
+		return ERR_PTR(-EFAULT);
+
+	guard(pagefault)();
+	for (i = 0; i < XATTR_SIZE_MAX; i++) {
+		for (j = 0; i + j < len && j < XATTR_SIZE_MAX; j++) {
+			__get_kernel_nofault(&c1, s1 + j, char, err_out);
+			__get_kernel_nofault(&c2, s2 + j, char, err_out);
+			if (c2 == '\0')
+				return s1;
+			if (c1 == '\0')
+				return NULL;
+			if (c1 != c2)
+				break;
+		}
+		if (j == XATTR_SIZE_MAX)
+			return ERR_PTR(-E2BIG);
+		if (i + j == len)
+			return NULL;
+		s1++;
+	}
+	return ERR_PTR(-E2BIG);
+err_out:
+	return ERR_PTR(-EFAULT);
+}
+
 __bpf_kfunc_end_defs();
 
 BTF_KFUNCS_START(generic_btf_ids)
@@ -3294,6 +3722,18 @@ BTF_ID_FLAGS(func, bpf_iter_kmem_cache_next, KF_ITER_NEXT | KF_RET_NULL | KF_SLE
 BTF_ID_FLAGS(func, bpf_iter_kmem_cache_destroy, KF_ITER_DESTROY | KF_SLEEPABLE)
 BTF_ID_FLAGS(func, bpf_local_irq_save)
 BTF_ID_FLAGS(func, bpf_local_irq_restore)
+BTF_ID_FLAGS(func, bpf_strcmp);
+BTF_ID_FLAGS(func, bpf_strchr, KF_RET_NULL);
+BTF_ID_FLAGS(func, bpf_strchrnul);
+BTF_ID_FLAGS(func, bpf_strnchr, KF_RET_NULL);
+BTF_ID_FLAGS(func, bpf_strrchr, KF_RET_NULL);
+BTF_ID_FLAGS(func, bpf_strlen);
+BTF_ID_FLAGS(func, bpf_strnlen);
+BTF_ID_FLAGS(func, bpf_strspn);
+BTF_ID_FLAGS(func, bpf_strcspn);
+BTF_ID_FLAGS(func, bpf_strpbrk, KF_RET_NULL);
+BTF_ID_FLAGS(func, bpf_strstr, KF_RET_NULL);
+BTF_ID_FLAGS(func, bpf_strnstr, KF_RET_NULL);
 BTF_KFUNCS_END(common_btf_ids)
 
 static const struct btf_kfunc_id_set common_kfunc_set = {
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH bpf-next v4 4/4] selftests/bpf: Add tests for string kfuncs
  2025-05-07  6:40 [PATCH bpf-next v4 0/4] bpf: Add kfuncs for read-only string operations Viktor Malik
                   ` (2 preceding siblings ...)
  2025-05-07  6:40 ` [PATCH bpf-next v4 3/4] bpf: Add kfuncs for read-only string operations Viktor Malik
@ 2025-05-07  6:40 ` Viktor Malik
  3 siblings, 0 replies; 25+ messages in thread
From: Viktor Malik @ 2025-05-07  6:40 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Viktor Malik

Add both positive and negative tests cases using string kfuncs added in
the previous patch.

Positive tests check that the functions work as expected on various
inputs and that they accept strings of various forms.

Negative tests are of two kinds. First, we check that passing invalid
pointers is rejected by the verifier. Second, we check that even though
some arguments are allowed by the verifier, they make the string kfuncs
fail during runtime and return an appropriate error code. Such arguments
include the NULL literal (kfuncs return -EFAULT) and strings longer than
XATTR_SIZE_MAX (kfuncs return -E2BIG).

A majority of the tests use the RUN_TESTS helper which executes BPF
programs with BPF_PROG_TEST_RUN and check for the expected return value.
An exception to this are tests for long strings as we need to set the
strings from userspace and that cannot be done using the RUN_TESTS
infrastructure.

Suggested-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Viktor Malik <vmalik@redhat.com>
---
 .../selftests/bpf/prog_tests/string_kfuncs.c  | 65 ++++++++++++++
 .../bpf/progs/string_kfuncs_failure1.c        | 51 +++++++++++
 .../bpf/progs/string_kfuncs_failure2.c        | 24 +++++
 .../bpf/progs/string_kfuncs_success.c         | 87 +++++++++++++++++++
 4 files changed, 227 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/string_kfuncs.c
 create mode 100644 tools/testing/selftests/bpf/progs/string_kfuncs_failure1.c
 create mode 100644 tools/testing/selftests/bpf/progs/string_kfuncs_failure2.c
 create mode 100644 tools/testing/selftests/bpf/progs/string_kfuncs_success.c

diff --git a/tools/testing/selftests/bpf/prog_tests/string_kfuncs.c b/tools/testing/selftests/bpf/prog_tests/string_kfuncs.c
new file mode 100644
index 000000000000..931f499343aa
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/string_kfuncs.c
@@ -0,0 +1,65 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (C) 2025 Red Hat, Inc.*/
+#include <test_progs.h>
+#include "string_kfuncs_success.skel.h"
+#include "string_kfuncs_failure1.skel.h"
+#include "string_kfuncs_failure2.skel.h"
+#include <sys/mman.h>
+
+static const char * const string_kfuncs[] = {
+	"strcmp",
+	"strchr",
+	"strchrnul",
+	"strnchr",
+	"strrchr",
+	"strlen",
+	"strnlen",
+	"strspn",
+	"strcspn",
+	"strpbrk",
+	"strstr",
+	"strnstr",
+};
+
+void run_too_long_tests(void)
+{
+	struct string_kfuncs_failure2 *skel;
+	struct bpf_program *prog;
+	char test_name[256];
+	int err, i;
+
+	skel = string_kfuncs_failure2__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "string_kfuncs_failure2__open_and_load"))
+		return;
+
+	memset(skel->bss->long_str, 'a', sizeof(skel->bss->long_str));
+
+	for (i = 0; i < ARRAY_SIZE(string_kfuncs); i++) {
+		sprintf(test_name, "test_%s_too_long", string_kfuncs[i]);
+		if (!test__start_subtest(test_name))
+			continue;
+
+		prog = bpf_object__find_program_by_name(skel->obj, test_name);
+		if (!ASSERT_OK_PTR(prog, "bpf_object__find_program_by_name"))
+			goto cleanup;
+
+		LIBBPF_OPTS(bpf_test_run_opts, topts);
+		err = bpf_prog_test_run_opts(bpf_program__fd(prog), &topts);
+		if (!ASSERT_OK(err, "bpf_prog_test_run"))
+			goto cleanup;
+
+		ASSERT_EQ(topts.retval, -E2BIG, "reading too long string fails with -E2BIG");
+	}
+
+cleanup:
+	string_kfuncs_failure2__destroy(skel);
+}
+
+void test_string_kfuncs(void)
+{
+	RUN_TESTS(string_kfuncs_success);
+	RUN_TESTS(string_kfuncs_failure1);
+
+	run_too_long_tests();
+}
+
diff --git a/tools/testing/selftests/bpf/progs/string_kfuncs_failure1.c b/tools/testing/selftests/bpf/progs/string_kfuncs_failure1.c
new file mode 100644
index 000000000000..e7ec2a8b06cb
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/string_kfuncs_failure1.c
@@ -0,0 +1,51 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (C) 2025 Red Hat, Inc.*/
+#include "vmlinux.h"
+#include <bpf/bpf_helpers.h>
+#include "bpf_misc.h"
+
+char *nullptr = NULL;
+char *invalid_ptr = (char *)0x12345678;
+
+/* Passing NULL by value to string kfuncs is allowed by the verifier but the kfunc should return -EINVAL */
+SEC("syscall") __retval(-14) int test_strcmp_null(void *ctx) { return bpf_strcmp(NULL, NULL); }
+SEC("syscall") __retval(-14) int test_strchr_null(void *ctx) { return (u64)bpf_strchr(NULL, 'a'); }
+SEC("syscall") __retval(-14) int test_strchrnul_null(void *ctx) { return (u64)bpf_strchrnul(NULL, 'a'); }
+SEC("syscall") __retval(-14) int test_strnchr_null(void *ctx) { return (u64)bpf_strnchr(NULL, 1, 'a'); }
+SEC("syscall") __retval(-14) int test_strrchr_null(void *ctx) { return (u64)bpf_strrchr(NULL, 'a'); }
+SEC("syscall") __retval(-14) int test_strlen_null(void *ctx) { return bpf_strlen(NULL); }
+SEC("syscall") __retval(-14) int test_strnlen_null(void *ctx) { return bpf_strnlen(NULL, 1); }
+SEC("syscall") __retval(-14) int test_strspn_null(void *ctx) { return bpf_strspn(NULL, NULL); }
+SEC("syscall") __retval(-14) int test_strcspn_null(void *ctx) { return bpf_strcspn(NULL, NULL); }
+SEC("syscall") __retval(-14) int test_strpbrk_null(void *ctx) { return (u64)bpf_strpbrk(NULL, NULL); }
+SEC("syscall") __retval(-14) int test_strstr_null(void *ctx) { return (u64)bpf_strstr(NULL, NULL); }
+SEC("syscall") __retval(-14) int test_strnstr_null(void *ctx) { return (u64)bpf_strnstr(NULL, NULL, 1); }
+
+/* Passing a NULL or an invalid pointer to string kfuncs should be rejected by the verifier*/
+SEC("syscall") __failure int test_strcmp_nullptr(void *ctx) { return bpf_strcmp(nullptr, nullptr); }
+SEC("syscall") __failure int test_strchr_nullptr(void *ctx) { return (u64)bpf_strchr(nullptr, 'a'); }
+SEC("syscall") __failure int test_strchrnul_nullptr(void *ctx) { return (u64)bpf_strchrnul(nullptr, 'a'); }
+SEC("syscall") __failure int test_strnchr_nullptr(void *ctx) { return (u64)bpf_strnchr(nullptr, 1, 'a'); }
+SEC("syscall") __failure int test_strrchr_nullptr(void *ctx) { return (u64)bpf_strrchr(nullptr, 'a'); }
+SEC("syscall") __failure int test_strlen_nullptr(void *ctx) { return bpf_strlen(nullptr); }
+SEC("syscall") __failure int test_strnlen_nullptr(void *ctx) { return bpf_strnlen(nullptr, 1); }
+SEC("syscall") __failure int test_strspn_nullptr(void *ctx) { return bpf_strspn(nullptr, nullptr); }
+SEC("syscall") __failure int test_strcspn_nullptr(void *ctx) { return bpf_strcspn(nullptr, nullptr); }
+SEC("syscall") __failure int test_strpbrk_nullptr(void *ctx) { return (u64)bpf_strpbrk(nullptr, nullptr); }
+SEC("syscall") __failure int test_strstr_nullptr(void *ctx) { return (u64)bpf_strstr(nullptr, nullptr); }
+SEC("syscall") __failure int test_strnstr_nullptr(void *ctx) { return (u64)bpf_strnstr(nullptr, nullptr, 1); }
+
+SEC("syscall") __failure int test_strcmp_invalid_ptr(void *ctx) { return bpf_strcmp(invalid_ptr, invalid_ptr); }
+SEC("syscall") __failure int test_strchr_invalid_ptr(void *ctx) { return (u64)bpf_strchr(invalid_ptr, 'a'); }
+SEC("syscall") __failure int test_strchrnul_invalid_ptr(void *ctx) { return (u64)bpf_strchrnul(invalid_ptr, 'a'); }
+SEC("syscall") __failure int test_strnchr_invalid_ptr(void *ctx) { return (u64)bpf_strnchr(invalid_ptr, 1, 'a'); }
+SEC("syscall") __failure int test_strrchr_invalid_ptr(void *ctx) { return (u64)bpf_strrchr(invalid_ptr, 'a'); }
+SEC("syscall") __failure int test_strlen_invalid_ptr(void *ctx) { return bpf_strlen(invalid_ptr); }
+SEC("syscall") __failure int test_strnlen_invalid_ptr(void *ctx) { return bpf_strnlen(invalid_ptr, 1); }
+SEC("syscall") __failure int test_strspn_invalid_ptr(void *ctx) { return bpf_strspn(invalid_ptr, invalid_ptr); }
+SEC("syscall") __failure int test_strcspn_invalid_ptr(void *ctx) { return bpf_strcspn(invalid_ptr, invalid_ptr); }
+SEC("syscall") __failure int test_strpbrk_invalid_ptr(void *ctx) { return (u64)bpf_strpbrk(invalid_ptr, invalid_ptr); }
+SEC("syscall") __failure int test_strstr_invalid_ptr(void *ctx) { return (u64)bpf_strstr(invalid_ptr, invalid_ptr); }
+SEC("syscall") __failure int test_strnstr_invalid_ptr(void *ctx) { return (u64)bpf_strnstr(invalid_ptr, invalid_ptr, 1); }
+
+char _license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/progs/string_kfuncs_failure2.c b/tools/testing/selftests/bpf/progs/string_kfuncs_failure2.c
new file mode 100644
index 000000000000..7f0f9b24890e
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/string_kfuncs_failure2.c
@@ -0,0 +1,24 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (C) 2025 Red Hat, Inc.*/
+#include "vmlinux.h"
+#include <bpf/bpf_helpers.h>
+#include <linux/limits.h>
+
+char long_str[XATTR_SIZE_MAX + 1];
+char a[] = "a";
+char b[] = "b";
+
+SEC("syscall") int test_strcmp_too_long(void *ctx) { return bpf_strcmp(long_str, long_str); }
+SEC("syscall") const char *test_strchr_too_long(void *ctx) { return bpf_strchr(long_str, 'b'); }
+SEC("syscall") const char *test_strchrnul_too_long(void *ctx) { return bpf_strchrnul(long_str, 'b'); }
+SEC("syscall") const char *test_strnchr_too_long(void *ctx) { return bpf_strnchr(long_str, sizeof(long_str), 'b'); }
+SEC("syscall") const char *test_strrchr_too_long(void *ctx) { return bpf_strrchr(long_str, 'b'); }
+SEC("syscall") int test_strlen_too_long(void *ctx) { return bpf_strlen(long_str); }
+SEC("syscall") int test_strnlen_too_long(void *ctx) { return bpf_strnlen(long_str, sizeof(long_str)); }
+SEC("syscall") int test_strspn_too_long(void *ctx) { return bpf_strspn(long_str, a); }
+SEC("syscall") int test_strcspn_too_long(void *ctx) { return bpf_strcspn(long_str, b); }
+SEC("syscall") const char *test_strpbrk_too_long(void *ctx) { return bpf_strpbrk(long_str, b); }
+SEC("syscall") const char *test_strstr_too_long(void *ctx) { return bpf_strstr(long_str, long_str); }
+SEC("syscall") const char *test_strnstr_too_long(void *ctx) { return bpf_strnstr(long_str, long_str, sizeof(long_str)); }
+
+char _license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/progs/string_kfuncs_success.c b/tools/testing/selftests/bpf/progs/string_kfuncs_success.c
new file mode 100644
index 000000000000..df8d6599b7e5
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/string_kfuncs_success.c
@@ -0,0 +1,87 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (C) 2025 Red Hat, Inc.*/
+#include "vmlinux.h"
+#include <bpf/bpf_helpers.h>
+#include "bpf_misc.h"
+
+char str[] = "hello world";
+
+#define __test(retval) SEC("syscall") __success __retval(retval)
+
+/* Functional tests */
+__test(0) int test_strcmp_eq(void *ctx) { return bpf_strcmp(str, "hello world"); }
+__test(1) int test_strcmp_neq(void *ctx) { return bpf_strcmp(str, "hello"); }
+__test(1) int test_strchr_found(void *ctx) { return bpf_strchr(str, 'e') - str; }
+__test(11) int test_strchr_null(void *ctx) { return bpf_strchr(str, '\0') - str; }
+__test(0) u64 test_strchr_notfound(void *ctx) { return (u64)bpf_strchr(str, 'x'); }
+__test(1) int test_strchrnul_found(void *ctx) { return bpf_strchrnul(str, 'e') - str; }
+__test(11) int test_strchrnul_notfound(void *ctx) { return bpf_strchrnul(str, 'x') - str; }
+__test(1) int test_strnchr_found(void *ctx) { return bpf_strnchr(str, 5, 'e') - str; }
+__test(11) int test_strnchr_null(void *ctx) { return bpf_strnchr(str, 12, '\0') - str; }
+__test(0) u64 test_strnchr_notfound(void *ctx) { return (u64)bpf_strnchr(str, 5, 'w'); }
+__test(9) int test_strrchr_found(void *ctx) { return bpf_strrchr(str, 'l') - str; }
+__test(0) u64 test_strrchr_notfound(void *ctx) { return (u64)bpf_strrchr(str, 'x'); }
+__test(11) size_t test_strlen(void *ctx) { return bpf_strlen(str); }
+__test(11) size_t test_strnlen(void *ctx) { return bpf_strnlen(str, 12); }
+__test(5) size_t test_strspn(void *ctx) { return bpf_strspn(str, "ehlo"); }
+__test(2) size_t test_strcspn(void *ctx) { return bpf_strcspn(str, "lo"); }
+__test(2) int test_strpbrk_found(void *ctx) { return bpf_strpbrk(str, "lo") - str; }
+__test(0) u64 test_strpbrk_notfound(void *ctx) { return (u64)bpf_strpbrk(str, "abc"); }
+__test(6) int test_strstr_found(void *ctx) { return bpf_strstr(str, "world") - str; }
+__test(0) u64 test_strstr_notfound(void *ctx) { return (u64)bpf_strstr(str, "hi"); }
+__test(0) int test_strstr_empty(void *ctx) { return bpf_strstr(str, "") - str; }
+__test(0) int test_strnstr_found(void *ctx) { return bpf_strnstr(str, "hello", 6) - str; }
+__test(0) u64 test_strnstr_notfound(void *ctx) { return (u64)bpf_strnstr(str, "hi", 10); }
+__test(0) int test_strnstr_empty(void *ctx) { return bpf_strnstr(str, "", 1) - str; }
+
+/* The above functional tests pass a global variable (i.e. a map) to the kfuncs.
+ * Now check that the kfuncs accept strings in other forms:
+ * - string literals (i.e. read-only maps)
+ * - stack-allocated buffers
+ */
+SEC("syscall")
+__success __retval(0)
+int test_string_kfuncs_literal(void *ctx)
+{
+	if (bpf_strcmp("abc", "abc") != 0) return -1;
+	if (bpf_strchr("abc", 'x') != NULL) return -1;
+	if (bpf_strchrnul("abc", 'x') == NULL) return -1;
+	if (bpf_strnchr("abc", 3, 'x') != NULL) return -1;
+	if (bpf_strrchr("abc", 'x') != NULL) return -1;
+	if (bpf_strlen("abc") != 3) return -1;
+	if (bpf_strnlen("abc", 3) != 3) return -1;
+	if (bpf_strspn("abc", "abc") != 3) return -1;
+	if (bpf_strcspn("abc", "abc") != 0) return -1;
+	if (bpf_strpbrk("abc", "def") != NULL) return -1;
+	if (bpf_strstr("abc", "def") != NULL) return -1;
+	if (bpf_strnstr("abc", "def", 3) != NULL) return -1;
+
+	return 0;
+}
+
+SEC("syscall")
+__success __retval(0)
+int test_string_kfuncs_buffer(void *ctx)
+{
+	char buffer[16];
+
+	__builtin_memset(buffer, 'a', sizeof(buffer));
+	buffer[sizeof(buffer) - 1] = '\0';
+
+	if (bpf_strcmp(buffer, buffer) != 0) return -1;
+	if (bpf_strchr(buffer, 'a') != buffer) return -1;
+	if (bpf_strchrnul(buffer, 'a') != buffer) return -1;
+	if (bpf_strnchr(buffer, sizeof(buffer), 'a') != buffer) return -1;
+	if (bpf_strrchr(buffer, 'b') != NULL) return -1;
+	if (bpf_strlen(buffer) != sizeof(buffer) - 1) return -1;
+	if (bpf_strnlen(buffer, sizeof(buffer)) != sizeof(buffer) - 1) return -1;
+	if (bpf_strspn(buffer, buffer) != sizeof(buffer) - 1) return -1;
+	if (bpf_strcspn(buffer, buffer) != 0) return -1;
+	if (bpf_strpbrk(buffer, buffer) != buffer) return -1;
+	if (bpf_strstr(buffer, buffer) != buffer) return -1;
+	if (bpf_strnstr(buffer, buffer, sizeof(buffer)) != buffer) return -1;
+
+	return 0;
+}
+
+char _license[] SEC("license") = "GPL";
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH bpf-next v4 1/4] bpf: Teach vefier to handle const ptrs as args to kfuncs
  2025-05-07  6:40 ` [PATCH bpf-next v4 1/4] bpf: Teach vefier to handle const ptrs as args to kfuncs Viktor Malik
@ 2025-05-08  9:08   ` Matt Bobrowski
  2025-05-09 16:20     ` Alexei Starovoitov
  2025-05-13  7:58     ` Viktor Malik
  0 siblings, 2 replies; 25+ messages in thread
From: Matt Bobrowski @ 2025-05-08  9:08 UTC (permalink / raw)
  To: Viktor Malik
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa

On Wed, May 07, 2025 at 08:40:36AM +0200, Viktor Malik wrote:
> When a kfunc takes a const pointer as an argument, the verifier should
> not check that the memory can be accessed for writing as that may lead
> to rejecting safe programs. Extend the verifier to detect such arguments
> and skip the write access check for them.
> 
> The use-case for this change is passing string literals (i.e. read-only
> maps) to read-only string kfuncs.
> 
> Signed-off-by: Viktor Malik <vmalik@redhat.com>
> ---
>  include/linux/btf.h   |  5 +++++
>  kernel/bpf/verifier.c | 10 ++++++----
>  2 files changed, 11 insertions(+), 4 deletions(-)
> 
> diff --git a/include/linux/btf.h b/include/linux/btf.h
> index ebc0c0c9b944..5cb06c65d91f 100644
> --- a/include/linux/btf.h
> +++ b/include/linux/btf.h
> @@ -391,6 +391,11 @@ static inline bool btf_type_is_type_tag(const struct btf_type *t)
>  	return BTF_INFO_KIND(t->info) == BTF_KIND_TYPE_TAG;
>  }
>  
> +static inline bool btf_type_is_const(const struct btf_type *t)
> +{
> +	return BTF_INFO_KIND(t->info) == BTF_KIND_CONST;
> +}

I've seen btf_type_is_* related helpers lean on btf_kind() instead
here, which ultimately just wraps BTF_INFO_KIND() macro.

>  /* union is only a special case of struct:
>   * all its offsetof(member) == 0
>   */
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 54c6953a8b84..e2d74c4d44c1 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -8186,7 +8186,7 @@ static int check_mem_size_reg(struct bpf_verifier_env *env,
>  }
>  
>  static int check_mem_reg(struct bpf_verifier_env *env, struct bpf_reg_state *reg,
> -			 u32 regno, u32 mem_size)
> +			 u32 regno, u32 mem_size, bool read_only)

Maybe s/read_only/write_mem_access?

>  {
>  	bool may_be_null = type_may_be_null(reg->type);
>  	struct bpf_reg_state saved_reg;
> @@ -8205,7 +8205,8 @@ static int check_mem_reg(struct bpf_verifier_env *env, struct bpf_reg_state *reg
>  	}
>  
>  	err = check_helper_mem_access(env, regno, mem_size, BPF_READ, true, NULL);
> -	err = err ?: check_helper_mem_access(env, regno, mem_size, BPF_WRITE, true, NULL);
> +	if (!read_only)
> +		err = err ?: check_helper_mem_access(env, regno, mem_size, BPF_WRITE, true, NULL);

For clarity, I'd completely get rid of the ternary operator usage
here. You can short circuit the call to check_helper_mem_access() w/
BPF_WRITE by simply checking the error code value from the preceding
call to check_helper_mem_access() w/ BPF_READ in the branch condition
i.e.

```
err = check_helper_mem_access(..., BPF_READ, ...);
if (!err && write_mem_access)
   err = check_helper_mem_acces(..., BPF_WRITE, ...);
```

>  	if (may_be_null)
>  		*reg = saved_reg;
> @@ -10361,7 +10362,7 @@ static int btf_check_func_arg_match(struct bpf_verifier_env *env, int subprog,
>  			ret = check_func_arg_reg_off(env, reg, regno, ARG_DONTCARE);
>  			if (ret < 0)
>  				return ret;
> -			if (check_mem_reg(env, reg, regno, arg->mem_size))
> +			if (check_mem_reg(env, reg, regno, arg->mem_size, false))

For clarity, I'd add: /*write_mem_access=*/false). Same with the below
call to check_mem_reg().

>  				return -EINVAL;
>  			if (!(arg->arg_type & PTR_MAYBE_NULL) && (reg->type & PTR_MAYBE_NULL)) {
>  				bpf_log(log, "arg#%d is expected to be non-NULL\n", i);
> @@ -13252,7 +13253,8 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
>  					i, btf_type_str(ref_t), ref_tname, PTR_ERR(resolve_ret));
>  				return -EINVAL;
>  			}
> -			ret = check_mem_reg(env, reg, regno, type_size);
> +			ret = check_mem_reg(env, reg, regno, type_size,
> +					    btf_type_is_const(btf_type_by_id(btf, t->type)));
>  			if (ret < 0)
>  				return ret;
>  			break;
> -- 
> 2.49.0
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH bpf-next v4 3/4] bpf: Add kfuncs for read-only string operations
  2025-05-07  6:40 ` [PATCH bpf-next v4 3/4] bpf: Add kfuncs for read-only string operations Viktor Malik
@ 2025-05-08  9:41   ` Matt Bobrowski
  2025-05-09 16:39     ` Alexei Starovoitov
  2025-05-15 12:32     ` Viktor Malik
  2025-05-09 18:20   ` Andrii Nakryiko
  1 sibling, 2 replies; 25+ messages in thread
From: Matt Bobrowski @ 2025-05-08  9:41 UTC (permalink / raw)
  To: Viktor Malik
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa

On Wed, May 07, 2025 at 08:40:38AM +0200, Viktor Malik wrote:
> String operations are commonly used so this exposes the most common ones
> to BPF programs. For now, we limit ourselves to operations which do not
> copy memory around.
> 
> Unfortunately, most in-kernel implementations assume that strings are
> %NUL-terminated, which is not necessarily true, and therefore we cannot
> use them directly in the BPF context. Instead, we open-code them using
> __get_kernel_nofault instead of plain dereference to make them safe and
> limit the strings length to XATTR_SIZE_MAX to make sure the functions
> terminate. When __get_kernel_nofault fails, functions return -EFAULT.
> Similarly, when the size bound is reached, the functions return -E2BIG.

Curious, why was XATTR_SIZE_MAX chosen as the upper bounds here? Just
an arbitrary value that felt large enough?

> At the moment, strings can be passed to the kfuncs in three forms:
> - string literals (i.e. pointers to read-only maps)
> - global variables (i.e. pointers to read-write maps)
> - stack-allocated buffers
> 
> Note that currently, it is not possible to pass strings from the BPF
> program context (like function args) as the verifier doesn't treat them
> as neither PTR_TO_MEM nor PTR_TO_BTF_ID.
> 
> Suggested-by: Alexei Starovoitov <ast@kernel.org>
> Signed-off-by: Viktor Malik <vmalik@redhat.com>
> ---
>  kernel/bpf/helpers.c | 440 +++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 440 insertions(+)
> 
> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> index e3a2662f4e33..8fb7c2ca7ac0 100644
> --- a/kernel/bpf/helpers.c
> +++ b/kernel/bpf/helpers.c
> @@ -23,6 +23,7 @@
>  #include <linux/btf_ids.h>
>  #include <linux/bpf_mem_alloc.h>
>  #include <linux/kasan.h>
> +#include <linux/uaccess.h>
>  
>  #include "../../lib/kstrtox.h"
>  
> @@ -3194,6 +3195,433 @@ __bpf_kfunc void bpf_local_irq_restore(unsigned long *flags__irq_flag)
>  	local_irq_restore(*flags__irq_flag);
>  }
>  
> +/* Kfuncs for string operations.
> + *
> + * Since strings are not necessarily %NUL-terminated, we cannot directly call
> + * in-kernel implementations. Instead, we open-code the implementations using
> + * __get_kernel_nofault instead of plain dereference to make them safe.
> + */

Returning an -EFAULT error code for supplied arguments which are
deemed to be invalid is just a very weird semantic IMO. As a BPF
program author, I totally wouldn't expect a BPF kfunc to return
-EINVAL if I had supplied it something that it couldn't quite possibly
handle to begin with. Look at the prior art, being pre-existing BPF
kfuncs, and you'll see how they handle invalidly supplied arguments. I
totally understand that attempting to dereference a NULL-pointer would
ultimately result in a fault being raised, so it may feel rather
natural to also report an -EFAULT error code upon doing some
NULL-pointer checks that hold true, but from an API usability POV it
just seems awkward and wrong.

Another thing that I noticed was that none of these BPF kfunc
arguments make use of the parameter suffixes i.e. __str, __sz. Why is
that exactly? Will leaning on those break you in some way?

> +/**
> + * bpf_strcmp - Compare two strings
> + * @s1: One string
> + * @s2: Another string
> + *
> + * Return:
> + * * %0       - Strings are equal
> + * * %-1      - @s1 is smaller
> + * * %1       - @s2 is smaller
> + * * %-EFAULT - Cannot read one of the strings
> + * * %-E2BIG  - One of strings is too large
> + */
> +__bpf_kfunc int bpf_strcmp(const char *s1, const char *s2)
> +{
> +	char c1, c2;
> +	int i;
> +
> +	if (!s1 || !s2)
> +		return -EFAULT;
> +
> +	guard(pagefault)();
> +	for (i = 0; i < XATTR_SIZE_MAX; i++) {
> +		__get_kernel_nofault(&c1, s1, char, err_out);
> +		__get_kernel_nofault(&c2, s2, char, err_out);
> +		if (c1 != c2)
> +			return c1 < c2 ? -1 : 1;
> +		if (c1 == '\0')
> +			return 0;
> +		s1++;
> +		s2++;
> +	}
> +	return -E2BIG;
> +err_out:
> +	return -EFAULT;
> +}
> +
> +/**
> + * bpf_strchr - Find the first occurrence of a character in a string
> + * @s: The string to be searched
> + * @c: The character to search for
> + *
> + * Note that the %NUL-terminator is considered part of the string, and can
> + * be searched for.
> + *
> + * Return:
> + * * const char * - Pointer to the first occurrence of @c within @s
> + * * %NULL        - @c not found in @s
> + * * %-EFAULT     - Cannot read @s
> + * * %-E2BIG      - @s too large
> + */
> +__bpf_kfunc const char *bpf_strchr(const char *s, char c)
> +{
> +	char sc;
> +	int i;
> +
> +	if (!s)
> +		return ERR_PTR(-EFAULT);
> +
> +	guard(pagefault)();
> +	for (i = 0; i < XATTR_SIZE_MAX; i++) {
> +		__get_kernel_nofault(&sc, s, char, err_out);
> +		if (sc == c)
> +			return s;
> +		if (sc == '\0')
> +			return NULL;
> +		s++;
> +	}
> +	return ERR_PTR(-E2BIG);
> +err_out:
> +	return ERR_PTR(-EFAULT);
> +}
> +
> +/**
> + * bpf_strnchr - Find a character in a length limited string
> + * @s: The string to be searched
> + * @count: The number of characters to be searched
> + * @c: The character to search for
> + *
> + * Note that the %NUL-terminator is considered part of the string, and can
> + * be searched for.
> + *
> + * Return:
> + * * const char * - Pointer to the first occurrence of @c within @s
> + * * %NULL        - @c not found in the first @count characters of @s
> + * * %-EFAULT     - Cannot read @s
> + * * %-E2BIG      - @s too large
> + */
> +__bpf_kfunc const char *bpf_strnchr(const char *s, size_t count, char c)
> +{
> +	char sc;
> +	int i;
> +
> +	if (!s)
> +		return ERR_PTR(-EFAULT);
> +
> +	guard(pagefault)();
> +	for (i = 0; i < count && i < XATTR_SIZE_MAX; i++) {
> +		__get_kernel_nofault(&sc, s, char, err_out);
> +		if (sc == c)
> +			return s;
> +		if (sc == '\0')
> +			return NULL;
> +		s++;
> +	}
> +	return i == XATTR_SIZE_MAX ? ERR_PTR(-E2BIG) : NULL;
> +err_out:
> +	return ERR_PTR(-EFAULT);
> +}
> +
> +/**
> + * bpf_strchrnul - Find and return a character in a string, or end of string
> + * @s: The string to be searched
> + * @c: The character to search for
> + *
> + * Return:
> + * * const char * - Pointer to the first occurrence of @c within @s or pointer
> + *                  to the null byte at the end of @s when @c is not found
> + * * %-EFAULT     - Cannot read @s
> + * * %-E2BIG      - @s too large
> + */
> +__bpf_kfunc const char *bpf_strchrnul(const char *s, char c)
> +{
> +	char sc;
> +	int i;
> +
> +	if (!s)
> +		return ERR_PTR(-EFAULT);
> +
> +	guard(pagefault)();
> +	for (i = 0; i < XATTR_SIZE_MAX; i++) {
> +		__get_kernel_nofault(&sc, s, char, err_out);
> +		if (sc == '\0' || sc == c)
> +			return s;
> +		s++;
> +	}
> +	return ERR_PTR(-E2BIG);
> +err_out:
> +	return ERR_PTR(-EFAULT);
> +}
> +
> +/**
> + * bpf_strrchr - Find the last occurrence of a character in a string
> + * @s: The string to be searched
> + * @c: The character to search for
> + *
> + * Return:
> + * * const char * - Pointer to the last occurrence of @c within @s
> + * * %NULL        - @c not found in @s
> + * * %-EFAULT     - Cannot read @s
> + * * %-E2BIG      - @s too large
> + */
> +__bpf_kfunc const char *bpf_strrchr(const char *s, int c)
> +{
> +	const char *last = NULL;
> +	char sc;
> +	int i;
> +
> +	if (!s)
> +		return ERR_PTR(-EFAULT);
> +
> +	guard(pagefault)();
> +	for (i = 0; i < XATTR_SIZE_MAX; i++) {
> +		__get_kernel_nofault(&sc, s, char, err_out);
> +		if (sc == '\0')
> +			return last;
> +		if (sc == c)
> +			last = s;
> +		s++;
> +	}
> +	return ERR_PTR(-E2BIG);
> +err_out:
> +	return ERR_PTR(-EFAULT);
> +}
> +
> +/**
> + * bpf_strlen - Calculate the length of a string
> + * @s: The string
> + *
> + * Return:
> + * * >=0      - The length of @s
> + * * %-EFAULT - Cannot read @s
> + * * %-E2BIG  - @s too large
> + */
> +__bpf_kfunc int bpf_strlen(const char *s)
> +{
> +	char c;
> +	int i;
> +
> +	if (!s)
> +		return -EFAULT;
> +
> +	guard(pagefault)();
> +	for (i = 0; i < XATTR_SIZE_MAX; i++) {
> +		__get_kernel_nofault(&c, s, char, err_out);
> +		if (c == '\0')
> +			return i;
> +		s++;
> +	}
> +	return -E2BIG;
> +err_out:
> +	return -EFAULT;
> +}
> +
> +/**
> + * bpf_strlen - Calculate the length of a length-limited string
> + * @s: The string
> + * @count: The maximum number of characters to count
> + *
> + * Return:
> + * * >=0      - The length of @s
> + * * %-EFAULT - Cannot read @s
> + * * %-E2BIG  - @s too large
> + */
> +__bpf_kfunc int bpf_strnlen(const char *s, size_t count)
> +{
> +	char c;
> +	int i;
> +
> +	if (!s)
> +		return -EFAULT;
> +
> +	guard(pagefault)();
> +	for (i = 0; i < count && i < XATTR_SIZE_MAX; i++) {
> +		__get_kernel_nofault(&c, s, char, err_out);
> +		if (c == '\0')
> +			return i;
> +		s++;
> +	}
> +	return i == XATTR_SIZE_MAX ? -E2BIG : i;
> +err_out:
> +	return -EFAULT;
> +}
> +
> +/**
> + * bpf_strspn - Calculate the length of the initial substring of @s which only
> + *              contains letters in @accept
> + * @s: The string to be searched
> + * @accept: The string to search for
> + *
> + * Return:
> + * * >=0      - The length of the initial substring of @s which only contains
> + *              letter in @accept
> + * * %-EFAULT - Cannot read @s
> + * * %-E2BIG  - @s too large
v> + */
> +__bpf_kfunc int bpf_strspn(const char *s, const char *accept)
> +{
> +	const char *p;
> +	char c;
> +	int i;
> +
> +	if (!s || !accept)
> +		return -EFAULT;
> +
> +	guard(pagefault)();
> +	for (i = 0; i < XATTR_SIZE_MAX; i++) {
> +		__get_kernel_nofault(&c, s, char, err_out);
> +		p = bpf_strchr(accept, c);
> +		if (IS_ERR(p))
> +			return PTR_ERR(p);
> +		if (c == '\0' || !p)
> +			return i;
> +		s++;
> +	}
> +	return -E2BIG;
> +err_out:
> +	return -EFAULT;
> +}
> +
> +/**
> + * strcspn - Calculate the length of the initial substring of @s which does not
> + *           contain letters in @reject
> + * @s: The string to be searched
> + * @reject: The string to avoid
> + *
> + * Return:
> + * * >=0      - The length of the initial substring of @s which does not contain
> + *              letters from @reject
> + * * %-EFAULT - Cannot read @s
> + * * %-E2BIG  - @s too large
> + */
> +__bpf_kfunc int bpf_strcspn(const char *s, const char *reject)
> +{
> +	const char *p;
> +	char c;
> +	int i;
> +
> +	if (!s || !reject)
> +		return -EFAULT;
> +
> +	guard(pagefault)();
> +	for (i = 0; i < XATTR_SIZE_MAX; i++) {
> +		__get_kernel_nofault(&c, s, char, err_out);
> +		p = bpf_strchr(reject, c);
> +		if (IS_ERR(p))
> +			return PTR_ERR(p);
> +		if (c == '\0' || p)
> +			return i;
> +		s++;
> +	}
> +	return -E2BIG;
> +err_out:
> +	return -EFAULT;
> +}
> +
> +/**
> + * bpf_strpbrk - Find the first occurrence of a set of characters
> + * @s: The string to be searched
> + * @accept: The characters to search for
> + *
> + * Return:
> + * * const char * - Pointer to the first occurrence of a character from @accept
> + *                  within @s
> + * * %NULL        - No character from @accept found in @s
> + * * %-EFAULT     - Cannot read one of the strings
> + * * %-E2BIG      - One of the strings is too large
> + */
> +__bpf_kfunc const char *bpf_strpbrk(const char *s, const char *accept)
> +{
> +	const char *p;
> +	char c;
> +	int i;
> +
> +	if (!s || !accept)
> +		return ERR_PTR(-EFAULT);
> +
> +	guard(pagefault)();
> +	for (i = 0; i < XATTR_SIZE_MAX; i++) {
> +		__get_kernel_nofault(&c, s, char, err_out);
> +		if (c == '\0')
> +			return NULL;
> +		p = bpf_strchr(accept, c);
> +		if (IS_ERR(p))
> +			return p;
> +		if (p)
> +			return s;
> +		s++;
> +	}
> +	return ERR_PTR(-E2BIG);
> +err_out:
> +	return ERR_PTR(-EFAULT);
> +}
> +
> +/**
> + * bpf_strstr - Find the first substring in a string
> + * @s1: The string to be searched
> + * @s2: The string to search for
> + *
> + * Return:
> + * * const char * - Pointer to the first occurrence of @s2 within @s1
> + * * %NULL        - @s2 is not a substring of @s1
> + * * %-EFAULT     - Cannot read one of the strings
> + * * %-E2BIG      - One of the strings is too large
> + */
> +__bpf_kfunc const char *bpf_strstr(const char *s1, const char *s2)
> +{
> +	char c1, c2;
> +	int i, j;
> +
> +	if (!s1 || !s2)
> +		return ERR_PTR(-EFAULT);
> +
> +	guard(pagefault)();
> +	for (i = 0; i < XATTR_SIZE_MAX; i++) {
> +		for (j = 0; j < XATTR_SIZE_MAX; j++) {
> +			__get_kernel_nofault(&c1, s1 + j, char, err_out);
> +			__get_kernel_nofault(&c2, s2 + j, char, err_out);
> +			if (c2 == '\0')
> +				return s1;
> +			if (c1 == '\0')
> +				return NULL;
> +			if (c1 != c2)
> +				break;
> +		}
> +		if (j == XATTR_SIZE_MAX)
> +			return ERR_PTR(-E2BIG);
> +		s1++;
> +	}
> +	return ERR_PTR(-E2BIG);
> +err_out:
> +	return ERR_PTR(-EFAULT);
> +}
> +
> +/**
> + * bpf_strnstr - Find the first substring in a length-limited string
> + * @s1: The string to be searched
> + * @s2: The string to search for
> + * @len: the maximum number of characters to search

Return: ...?

> + */
> +__bpf_kfunc const char *bpf_strnstr(const char *s1, const char *s2, size_t len)
> +{
> +	char c1, c2;
> +	int i, j;
> +
> +	if (!s1 || !s2)
> +		return ERR_PTR(-EFAULT);
> +
> +	guard(pagefault)();
> +	for (i = 0; i < XATTR_SIZE_MAX; i++) {
> +		for (j = 0; i + j < len && j < XATTR_SIZE_MAX; j++) {
> +			__get_kernel_nofault(&c1, s1 + j, char, err_out);
> +			__get_kernel_nofault(&c2, s2 + j, char, err_out);
> +			if (c2 == '\0')
> +				return s1;
> +			if (c1 == '\0')
> +				return NULL;
> +			if (c1 != c2)
> +				break;
> +		}
> +		if (j == XATTR_SIZE_MAX)
> +			return ERR_PTR(-E2BIG);
> +		if (i + j == len)
> +			return NULL;
> +		s1++;
> +	}
> +	return ERR_PTR(-E2BIG);
> +err_out:
> +	return ERR_PTR(-EFAULT);
> +}
> +
>  __bpf_kfunc_end_defs();
>  
>  BTF_KFUNCS_START(generic_btf_ids)
> @@ -3294,6 +3722,18 @@ BTF_ID_FLAGS(func, bpf_iter_kmem_cache_next, KF_ITER_NEXT | KF_RET_NULL | KF_SLE
>  BTF_ID_FLAGS(func, bpf_iter_kmem_cache_destroy, KF_ITER_DESTROY | KF_SLEEPABLE)
>  BTF_ID_FLAGS(func, bpf_local_irq_save)
>  BTF_ID_FLAGS(func, bpf_local_irq_restore)
> +BTF_ID_FLAGS(func, bpf_strcmp);
> +BTF_ID_FLAGS(func, bpf_strchr, KF_RET_NULL);
> +BTF_ID_FLAGS(func, bpf_strchrnul);
> +BTF_ID_FLAGS(func, bpf_strnchr, KF_RET_NULL);
> +BTF_ID_FLAGS(func, bpf_strrchr, KF_RET_NULL);
> +BTF_ID_FLAGS(func, bpf_strlen);
> +BTF_ID_FLAGS(func, bpf_strnlen);
> +BTF_ID_FLAGS(func, bpf_strspn);
> +BTF_ID_FLAGS(func, bpf_strcspn);
> +BTF_ID_FLAGS(func, bpf_strpbrk, KF_RET_NULL);
> +BTF_ID_FLAGS(func, bpf_strstr, KF_RET_NULL);
> +BTF_ID_FLAGS(func, bpf_strnstr, KF_RET_NULL);
>  BTF_KFUNCS_END(common_btf_ids)
>  
>  static const struct btf_kfunc_id_set common_kfunc_set = {
> -- 
> 2.49.0
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH bpf-next v4 2/4] uaccess: Define pagefault lock guard
  2025-05-07  6:40 ` [PATCH bpf-next v4 2/4] uaccess: Define pagefault lock guard Viktor Malik
@ 2025-05-08 10:00   ` Matt Bobrowski
  2025-05-09 18:20     ` Andrii Nakryiko
  0 siblings, 1 reply; 25+ messages in thread
From: Matt Bobrowski @ 2025-05-08 10:00 UTC (permalink / raw)
  To: Viktor Malik
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	linux-kernel

On Wed, May 07, 2025 at 08:40:37AM +0200, Viktor Malik wrote:
> Define a pagefault lock guard which allows to simplify functions that
> need to disable page faults.
> 
> Signed-off-by: Viktor Malik <vmalik@redhat.com>
> ---
>  include/linux/uaccess.h | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h
> index 7c06f4795670..1beb5b395d81 100644
> --- a/include/linux/uaccess.h
> +++ b/include/linux/uaccess.h
> @@ -296,6 +296,8 @@ static inline bool pagefault_disabled(void)
>   */
>  #define faulthandler_disabled() (pagefault_disabled() || in_atomic())
>  
> +DEFINE_LOCK_GUARD_0(pagefault, pagefault_disable(), pagefault_enable())

I can't help but mention that naming this scope-based cleanup helper
`pagefault` just seems overly ambiguous. That's just me though...

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH bpf-next v4 1/4] bpf: Teach vefier to handle const ptrs as args to kfuncs
  2025-05-08  9:08   ` Matt Bobrowski
@ 2025-05-09 16:20     ` Alexei Starovoitov
  2025-05-13  6:48       ` Matt Bobrowski
  2025-05-13  7:58     ` Viktor Malik
  1 sibling, 1 reply; 25+ messages in thread
From: Alexei Starovoitov @ 2025-05-09 16:20 UTC (permalink / raw)
  To: Matt Bobrowski
  Cc: Viktor Malik, bpf, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev,
	Hao Luo, Jiri Olsa

On Thu, May 8, 2025 at 2:09 AM Matt Bobrowski <mattbobrowski@google.com> wrote:
>
> >
> >  static int check_mem_reg(struct bpf_verifier_env *env, struct bpf_reg_state *reg,
> > -                      u32 regno, u32 mem_size)
> > +                      u32 regno, u32 mem_size, bool read_only)
>
> Maybe s/read_only/write_mem_access?

'bool' arguments are not readable at the callsite.
Let's use enum bpf_access_type BPF_READ|WRITE here
or introduce another enum ?

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH bpf-next v4 3/4] bpf: Add kfuncs for read-only string operations
  2025-05-08  9:41   ` Matt Bobrowski
@ 2025-05-09 16:39     ` Alexei Starovoitov
  2025-05-28  9:05       ` Viktor Malik
  2025-05-15 12:32     ` Viktor Malik
  1 sibling, 1 reply; 25+ messages in thread
From: Alexei Starovoitov @ 2025-05-09 16:39 UTC (permalink / raw)
  To: Matt Bobrowski
  Cc: Viktor Malik, bpf, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev,
	Hao Luo, Jiri Olsa

On Thu, May 8, 2025 at 2:42 AM Matt Bobrowski <mattbobrowski@google.com> wrote:
>
> On Wed, May 07, 2025 at 08:40:38AM +0200, Viktor Malik wrote:
> > String operations are commonly used so this exposes the most common ones
> > to BPF programs. For now, we limit ourselves to operations which do not
> > copy memory around.
> >
> > Unfortunately, most in-kernel implementations assume that strings are
> > %NUL-terminated, which is not necessarily true, and therefore we cannot
> > use them directly in the BPF context. Instead, we open-code them using
> > __get_kernel_nofault instead of plain dereference to make them safe and
> > limit the strings length to XATTR_SIZE_MAX to make sure the functions
> > terminate. When __get_kernel_nofault fails, functions return -EFAULT.
> > Similarly, when the size bound is reached, the functions return -E2BIG.
>
> Curious, why was XATTR_SIZE_MAX chosen as the upper bounds here? Just
> an arbitrary value that felt large enough?
>
> > At the moment, strings can be passed to the kfuncs in three forms:
> > - string literals (i.e. pointers to read-only maps)
> > - global variables (i.e. pointers to read-write maps)
> > - stack-allocated buffers
> >
> > Note that currently, it is not possible to pass strings from the BPF
> > program context (like function args) as the verifier doesn't treat them
> > as neither PTR_TO_MEM nor PTR_TO_BTF_ID.
> >
> > Suggested-by: Alexei Starovoitov <ast@kernel.org>
> > Signed-off-by: Viktor Malik <vmalik@redhat.com>
> > ---
> >  kernel/bpf/helpers.c | 440 +++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 440 insertions(+)
> >
> > diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> > index e3a2662f4e33..8fb7c2ca7ac0 100644
> > --- a/kernel/bpf/helpers.c
> > +++ b/kernel/bpf/helpers.c
> > @@ -23,6 +23,7 @@
> >  #include <linux/btf_ids.h>
> >  #include <linux/bpf_mem_alloc.h>
> >  #include <linux/kasan.h>
> > +#include <linux/uaccess.h>
> >
> >  #include "../../lib/kstrtox.h"
> >
> > @@ -3194,6 +3195,433 @@ __bpf_kfunc void bpf_local_irq_restore(unsigned long *flags__irq_flag)
> >       local_irq_restore(*flags__irq_flag);
> >  }
> >
> > +/* Kfuncs for string operations.
> > + *
> > + * Since strings are not necessarily %NUL-terminated, we cannot directly call
> > + * in-kernel implementations. Instead, we open-code the implementations using
> > + * __get_kernel_nofault instead of plain dereference to make them safe.
> > + */
>
> Returning an -EFAULT error code for supplied arguments which are
> deemed to be invalid is just a very weird semantic IMO. As a BPF
> program author, I totally wouldn't expect a BPF kfunc to return
> -EINVAL if I had supplied it something that it couldn't quite possibly
> handle to begin with. Look at the prior art, being pre-existing BPF
> kfuncs, and you'll see how they handle invalidly supplied arguments. I
> totally understand that attempting to dereference a NULL-pointer would
> ultimately result in a fault being raised, so it may feel rather
> natural to also report an -EFAULT error code upon doing some
> NULL-pointer checks that hold true, but from an API usability POV it
> just seems awkward and wrong.

I mostly agree with the above.

On the first look, all the checks like:
+       if (!s || !accept)
+               return ERR_PTR(-EFAULT);

looks like a premature optimization, since
__get_kernel_nofault() will handle it just fine.

But there is a different reason to do it and the error code
should probably be different.

Consider that copy_from_kernel_nofault() has the following check
that is meaningful on several architectures including x86:
        if (!copy_from_kernel_nofault_allowed(src, size))
                return -ERANGE;

It's there to avoid accidentally reading user addresses and
NULL is one such user address.

Doing it for every pointer inside the loop will hurt performance,
but doing it once in the beginning maybe ok?
If we want to optimize it we can introduce the helper:

static bool copy_maybe_allowed(const void *ptr)
{
#ifdef CONFIG_ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE
  if ((unsigned long)ptr < TASK_SIZE)
     return false;
#else
  if (!ptr)
     return false;
#endif
  return true;
}

and modify above as:
  if (!copy_maybe_allowed(s) || !copy_maybe_allowed(accept))
    return ERR_PTR(-ERANGE);

bikeshed: shorter/better name for helper..

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH bpf-next v4 2/4] uaccess: Define pagefault lock guard
  2025-05-08 10:00   ` Matt Bobrowski
@ 2025-05-09 18:20     ` Andrii Nakryiko
  2025-05-13  6:55       ` Matt Bobrowski
  0 siblings, 1 reply; 25+ messages in thread
From: Andrii Nakryiko @ 2025-05-09 18:20 UTC (permalink / raw)
  To: Matt Bobrowski
  Cc: Viktor Malik, bpf, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev,
	Hao Luo, Jiri Olsa, linux-kernel

On Thu, May 8, 2025 at 3:01 AM Matt Bobrowski <mattbobrowski@google.com> wrote:
>
> On Wed, May 07, 2025 at 08:40:37AM +0200, Viktor Malik wrote:
> > Define a pagefault lock guard which allows to simplify functions that
> > need to disable page faults.
> >
> > Signed-off-by: Viktor Malik <vmalik@redhat.com>
> > ---
> >  include/linux/uaccess.h | 2 ++
> >  1 file changed, 2 insertions(+)
> >
> > diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h
> > index 7c06f4795670..1beb5b395d81 100644
> > --- a/include/linux/uaccess.h
> > +++ b/include/linux/uaccess.h
> > @@ -296,6 +296,8 @@ static inline bool pagefault_disabled(void)
> >   */
> >  #define faulthandler_disabled() (pagefault_disabled() || in_atomic())
> >
> > +DEFINE_LOCK_GUARD_0(pagefault, pagefault_disable(), pagefault_enable())
>
> I can't help but mention that naming this scope-based cleanup helper
> `pagefault` just seems overly ambiguous. That's just me though...

I do see the concern, but

DEFINE_LOCK_GUARD_0(preempt, preempt_disable(), preempt_enable())
DEFINE_LOCK_GUARD_0(irq, local_irq_disable(), local_irq_enable())

so we are just staying consistent here? But also "guard (against) the
pagefault" does (internally) read somewhat meaningfully, no?

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH bpf-next v4 3/4] bpf: Add kfuncs for read-only string operations
  2025-05-07  6:40 ` [PATCH bpf-next v4 3/4] bpf: Add kfuncs for read-only string operations Viktor Malik
  2025-05-08  9:41   ` Matt Bobrowski
@ 2025-05-09 18:20   ` Andrii Nakryiko
  2025-05-09 21:37     ` Eduard Zingerman
  1 sibling, 1 reply; 25+ messages in thread
From: Andrii Nakryiko @ 2025-05-09 18:20 UTC (permalink / raw)
  To: Viktor Malik
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa

On Tue, May 6, 2025 at 11:41 PM Viktor Malik <vmalik@redhat.com> wrote:
>
> String operations are commonly used so this exposes the most common ones
> to BPF programs. For now, we limit ourselves to operations which do not
> copy memory around.
>
> Unfortunately, most in-kernel implementations assume that strings are
> %NUL-terminated, which is not necessarily true, and therefore we cannot
> use them directly in the BPF context. Instead, we open-code them using
> __get_kernel_nofault instead of plain dereference to make them safe and
> limit the strings length to XATTR_SIZE_MAX to make sure the functions
> terminate. When __get_kernel_nofault fails, functions return -EFAULT.
> Similarly, when the size bound is reached, the functions return -E2BIG.
>
> At the moment, strings can be passed to the kfuncs in three forms:
> - string literals (i.e. pointers to read-only maps)
> - global variables (i.e. pointers to read-write maps)
> - stack-allocated buffers
>
> Note that currently, it is not possible to pass strings from the BPF
> program context (like function args) as the verifier doesn't treat them
> as neither PTR_TO_MEM nor PTR_TO_BTF_ID.
>
> Suggested-by: Alexei Starovoitov <ast@kernel.org>
> Signed-off-by: Viktor Malik <vmalik@redhat.com>
> ---
>  kernel/bpf/helpers.c | 440 +++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 440 insertions(+)
>
> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> index e3a2662f4e33..8fb7c2ca7ac0 100644
> --- a/kernel/bpf/helpers.c
> +++ b/kernel/bpf/helpers.c
> @@ -23,6 +23,7 @@
>  #include <linux/btf_ids.h>
>  #include <linux/bpf_mem_alloc.h>
>  #include <linux/kasan.h>
> +#include <linux/uaccess.h>
>
>  #include "../../lib/kstrtox.h"
>
> @@ -3194,6 +3195,433 @@ __bpf_kfunc void bpf_local_irq_restore(unsigned long *flags__irq_flag)
>         local_irq_restore(*flags__irq_flag);
>  }
>
> +/* Kfuncs for string operations.
> + *
> + * Since strings are not necessarily %NUL-terminated, we cannot directly call
> + * in-kernel implementations. Instead, we open-code the implementations using
> + * __get_kernel_nofault instead of plain dereference to make them safe.
> + */
> +
> +/**
> + * bpf_strcmp - Compare two strings
> + * @s1: One string
> + * @s2: Another string
> + *
> + * Return:
> + * * %0       - Strings are equal
> + * * %-1      - @s1 is smaller
> + * * %1       - @s2 is smaller
> + * * %-EFAULT - Cannot read one of the strings
> + * * %-E2BIG  - One of strings is too large
> + */
> +__bpf_kfunc int bpf_strcmp(const char *s1, const char *s2)
> +{
> +       char c1, c2;
> +       int i;
> +
> +       if (!s1 || !s2)
> +               return -EFAULT;
> +
> +       guard(pagefault)();
> +       for (i = 0; i < XATTR_SIZE_MAX; i++) {
> +               __get_kernel_nofault(&c1, s1, char, err_out);
> +               __get_kernel_nofault(&c2, s2, char, err_out);
> +               if (c1 != c2)
> +                       return c1 < c2 ? -1 : 1;
> +               if (c1 == '\0')
> +                       return 0;
> +               s1++;
> +               s2++;
> +       }
> +       return -E2BIG;
> +err_out:
> +       return -EFAULT;
> +}
> +
> +/**
> + * bpf_strchr - Find the first occurrence of a character in a string
> + * @s: The string to be searched
> + * @c: The character to search for
> + *
> + * Note that the %NUL-terminator is considered part of the string, and can
> + * be searched for.
> + *
> + * Return:
> + * * const char * - Pointer to the first occurrence of @c within @s
> + * * %NULL        - @c not found in @s
> + * * %-EFAULT     - Cannot read @s
> + * * %-E2BIG      - @s too large
> + */
> +__bpf_kfunc const char *bpf_strchr(const char *s, char c)

so let's say we found the character, we return a pointer to it, and
that memory goes away (because we never owned it, so we don't really
know what and when will happen with it). Question, will verifier allow
BPF program to dereference this pointer? If yes, that's a problem. But
if not, then I'm not sure there is much point in returning a pointer.


I'm just trying to imply that in BPF world integer-based APIs work
better/safer, overall? For strings, we can switch any
pointer-returning API to position-returning (or negative error) API
and it would more or less naturally fit into BPF API surface, no?

> +{
> +       char sc;
> +       int i;
> +
> +       if (!s)
> +               return ERR_PTR(-EFAULT);
> +
> +       guard(pagefault)();
> +       for (i = 0; i < XATTR_SIZE_MAX; i++) {
> +               __get_kernel_nofault(&sc, s, char, err_out);
> +               if (sc == c)
> +                       return s;
> +               if (sc == '\0')
> +                       return NULL;
> +               s++;
> +       }
> +       return ERR_PTR(-E2BIG);
> +err_out:
> +       return ERR_PTR(-EFAULT);

this implementation can be replaced with just `return bpf_strnchr(s,
XATTR_SIZE_MAX, c);`, no?

> +}
> +
> +/**
> + * bpf_strnchr - Find a character in a length limited string
> + * @s: The string to be searched
> + * @count: The number of characters to be searched
> + * @c: The character to search for
> + *
> + * Note that the %NUL-terminator is considered part of the string, and can
> + * be searched for.
> + *
> + * Return:
> + * * const char * - Pointer to the first occurrence of @c within @s
> + * * %NULL        - @c not found in the first @count characters of @s
> + * * %-EFAULT     - Cannot read @s
> + * * %-E2BIG      - @s too large
> + */
> +__bpf_kfunc const char *bpf_strnchr(const char *s, size_t count, char c)
> +{
> +       char sc;
> +       int i;
> +
> +       if (!s)
> +               return ERR_PTR(-EFAULT);
> +
> +       guard(pagefault)();
> +       for (i = 0; i < count && i < XATTR_SIZE_MAX; i++) {
> +               __get_kernel_nofault(&sc, s, char, err_out);
> +               if (sc == c)
> +                       return s;
> +               if (sc == '\0')
> +                       return NULL;
> +               s++;
> +       }
> +       return i == XATTR_SIZE_MAX ? ERR_PTR(-E2BIG) : NULL;
> +err_out:
> +       return ERR_PTR(-EFAULT);
> +}
> +

[...]

> +/**
> + * bpf_strlen - Calculate the length of a string
> + * @s: The string
> + *
> + * Return:
> + * * >=0      - The length of @s
> + * * %-EFAULT - Cannot read @s
> + * * %-E2BIG  - @s too large
> + */
> +__bpf_kfunc int bpf_strlen(const char *s)
> +{
> +       char c;
> +       int i;
> +
> +       if (!s)
> +               return -EFAULT;
> +
> +       guard(pagefault)();
> +       for (i = 0; i < XATTR_SIZE_MAX; i++) {
> +               __get_kernel_nofault(&c, s, char, err_out);
> +               if (c == '\0')
> +                       return i;
> +               s++;
> +       }
> +       return -E2BIG;
> +err_out:
> +       return -EFAULT;


return bpf_strnlen(s, XATTR_SIZE_MAX)?

You get the idea.

[...]

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH bpf-next v4 3/4] bpf: Add kfuncs for read-only string operations
  2025-05-09 18:20   ` Andrii Nakryiko
@ 2025-05-09 21:37     ` Eduard Zingerman
  2025-05-09 22:03       ` Andrii Nakryiko
  0 siblings, 1 reply; 25+ messages in thread
From: Eduard Zingerman @ 2025-05-09 21:37 UTC (permalink / raw)
  To: Andrii Nakryiko, Viktor Malik
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa

On Fri, 2025-05-09 at 11:20 -0700, Andrii Nakryiko wrote:

[...]

> > +/**
> > + * bpf_strchr - Find the first occurrence of a character in a string
> > + * @s: The string to be searched
> > + * @c: The character to search for
> > + *
> > + * Note that the %NUL-terminator is considered part of the string, and can
> > + * be searched for.
> > + *
> > + * Return:
> > + * * const char * - Pointer to the first occurrence of @c within @s
> > + * * %NULL        - @c not found in @s
> > + * * %-EFAULT     - Cannot read @s
> > + * * %-E2BIG      - @s too large
> > + */
> > +__bpf_kfunc const char *bpf_strchr(const char *s, char c)
> 
> so let's say we found the character, we return a pointer to it, and
> that memory goes away (because we never owned it, so we don't really
> know what and when will happen with it). Question, will verifier allow
> BPF program to dereference this pointer? If yes, that's a problem. But
> if not, then I'm not sure there is much point in returning a pointer.
> 
> 
> I'm just trying to imply that in BPF world integer-based APIs work
> better/safer, overall? For strings, we can switch any
> pointer-returning API to position-returning (or negative error) API
> and it would more or less naturally fit into BPF API surface, no?

Integer based API solves the problem with memory access but is not
really ergonomic. W/o special logic in verifier the returned int would
be unbounded, hence the user would have to compare it with string
length before using.

It looks like some verifier logic is necessary regardless of API being
integer or pointer based. In any case verifier needs additional rules
for each pointer type to adjust bounds on the return value or its refobj_id.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH bpf-next v4 3/4] bpf: Add kfuncs for read-only string operations
  2025-05-09 21:37     ` Eduard Zingerman
@ 2025-05-09 22:03       ` Andrii Nakryiko
  2025-05-15 12:27         ` Viktor Malik
  0 siblings, 1 reply; 25+ messages in thread
From: Andrii Nakryiko @ 2025-05-09 22:03 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: Viktor Malik, bpf, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa

On Fri, May 9, 2025 at 2:37 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Fri, 2025-05-09 at 11:20 -0700, Andrii Nakryiko wrote:
>
> [...]
>
> > > +/**
> > > + * bpf_strchr - Find the first occurrence of a character in a string
> > > + * @s: The string to be searched
> > > + * @c: The character to search for
> > > + *
> > > + * Note that the %NUL-terminator is considered part of the string, and can
> > > + * be searched for.
> > > + *
> > > + * Return:
> > > + * * const char * - Pointer to the first occurrence of @c within @s
> > > + * * %NULL        - @c not found in @s
> > > + * * %-EFAULT     - Cannot read @s
> > > + * * %-E2BIG      - @s too large
> > > + */
> > > +__bpf_kfunc const char *bpf_strchr(const char *s, char c)
> >
> > so let's say we found the character, we return a pointer to it, and
> > that memory goes away (because we never owned it, so we don't really
> > know what and when will happen with it). Question, will verifier allow
> > BPF program to dereference this pointer? If yes, that's a problem. But
> > if not, then I'm not sure there is much point in returning a pointer.
> >
> >
> > I'm just trying to imply that in BPF world integer-based APIs work
> > better/safer, overall? For strings, we can switch any
> > pointer-returning API to position-returning (or negative error) API
> > and it would more or less naturally fit into BPF API surface, no?
>
> Integer based API solves the problem with memory access but is not
> really ergonomic. W/o special logic in verifier the returned int would
> be unbounded, hence the user would have to compare it with string
> length before using.
>
> It looks like some verifier logic is necessary regardless of API being
> integer or pointer based. In any case verifier needs additional rules
> for each pointer type to adjust bounds on the return value or its refobj_id.
>

You can't safely dereference any pointer returned from these APIs,
because the memory might not be there anymore.

For integers, same idea. If you use bpf_probe_read_{kernel,user} to
read data, then verifier doesn't care about the value of integer.

But that's not ergonomic, so in some other thread few days ago I was
proposing that we should add an untyped counterpart to bpf_core_cast()
that would just make any memory accesses performed using
__get_kernel_nofault() semantics. And so then:


const char *str = <random value or we got it from somewhere untrusted>;
int space_idx = bpf_strchr(str, ' ');
if (space_idx < 0)
  return -1; /* bad luck */

const char *s = bpf_mem_cast(str);
char buf[64] = {};

bpf_for(i, 0, space_idx)
    buf[i] = s[i]; /* MAGIC */

bpf_printk("STUFF BEFORE SPACE: %s", buf);


Tbh, when dealing with libc string APIs, I still very frequently
convert resulting pointers into indices, so I don't think it's
actually an API regression to have index-based string APIs

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH bpf-next v4 1/4] bpf: Teach vefier to handle const ptrs as args to kfuncs
  2025-05-09 16:20     ` Alexei Starovoitov
@ 2025-05-13  6:48       ` Matt Bobrowski
  2025-05-13  7:54         ` Viktor Malik
  0 siblings, 1 reply; 25+ messages in thread
From: Matt Bobrowski @ 2025-05-13  6:48 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Viktor Malik, bpf, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev,
	Hao Luo, Jiri Olsa

On Fri, May 09, 2025 at 09:20:53AM -0700, Alexei Starovoitov wrote:
> On Thu, May 8, 2025 at 2:09 AM Matt Bobrowski <mattbobrowski@google.com> wrote:
> >
> > >
> > >  static int check_mem_reg(struct bpf_verifier_env *env, struct bpf_reg_state *reg,
> > > -                      u32 regno, u32 mem_size)
> > > +                      u32 regno, u32 mem_size, bool read_only)
> >
> > Maybe s/read_only/write_mem_access?
> 
> 'bool' arguments are not readable at the callsite.
> Let's use enum bpf_access_type BPF_READ|WRITE here
> or introduce another enum ?

Yes, I agree, and using enum bpf_access_type is also something that
had crossed my mind. I think that's what should be used here in favour
of the boolean.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH bpf-next v4 2/4] uaccess: Define pagefault lock guard
  2025-05-09 18:20     ` Andrii Nakryiko
@ 2025-05-13  6:55       ` Matt Bobrowski
  0 siblings, 0 replies; 25+ messages in thread
From: Matt Bobrowski @ 2025-05-13  6:55 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Viktor Malik, bpf, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev,
	Hao Luo, Jiri Olsa, linux-kernel

On Fri, May 09, 2025 at 11:20:48AM -0700, Andrii Nakryiko wrote:
> On Thu, May 8, 2025 at 3:01 AM Matt Bobrowski <mattbobrowski@google.com> wrote:
> >
> > On Wed, May 07, 2025 at 08:40:37AM +0200, Viktor Malik wrote:
> > > Define a pagefault lock guard which allows to simplify functions that
> > > need to disable page faults.
> > >
> > > Signed-off-by: Viktor Malik <vmalik@redhat.com>
> > > ---
> > >  include/linux/uaccess.h | 2 ++
> > >  1 file changed, 2 insertions(+)
> > >
> > > diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h
> > > index 7c06f4795670..1beb5b395d81 100644
> > > --- a/include/linux/uaccess.h
> > > +++ b/include/linux/uaccess.h
> > > @@ -296,6 +296,8 @@ static inline bool pagefault_disabled(void)
> > >   */
> > >  #define faulthandler_disabled() (pagefault_disabled() || in_atomic())
> > >
> > > +DEFINE_LOCK_GUARD_0(pagefault, pagefault_disable(), pagefault_enable())
> >
> > I can't help but mention that naming this scope-based cleanup helper
> > `pagefault` just seems overly ambiguous. That's just me though...
> 
> I do see the concern, but
> 
> DEFINE_LOCK_GUARD_0(preempt, preempt_disable(), preempt_enable())
> DEFINE_LOCK_GUARD_0(irq, local_irq_disable(), local_irq_enable())
> 
> so we are just staying consistent here? But also "guard (against) the
> pagefault" does (internally) read somewhat meaningfully, no?

Now that you've written it out like that, yes I do agree, that does
read somewhat meaningfully. I also don't have any better suggestions
at this point, so I think leaving it as it is now is also totally
fine.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH bpf-next v4 1/4] bpf: Teach vefier to handle const ptrs as args to kfuncs
  2025-05-13  6:48       ` Matt Bobrowski
@ 2025-05-13  7:54         ` Viktor Malik
  2025-05-13 14:58           ` Alexei Starovoitov
  0 siblings, 1 reply; 25+ messages in thread
From: Viktor Malik @ 2025-05-13  7:54 UTC (permalink / raw)
  To: Matt Bobrowski, Alexei Starovoitov
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa

On 5/13/25 08:48, Matt Bobrowski wrote:
> On Fri, May 09, 2025 at 09:20:53AM -0700, Alexei Starovoitov wrote:
>> On Thu, May 8, 2025 at 2:09 AM Matt Bobrowski <mattbobrowski@google.com> wrote:
>>>
>>>>
>>>>  static int check_mem_reg(struct bpf_verifier_env *env, struct bpf_reg_state *reg,
>>>> -                      u32 regno, u32 mem_size)
>>>> +                      u32 regno, u32 mem_size, bool read_only)
>>>
>>> Maybe s/read_only/write_mem_access?
>>
>> 'bool' arguments are not readable at the callsite.
>> Let's use enum bpf_access_type BPF_READ|WRITE here
>> or introduce another enum ?
> 
> Yes, I agree, and using enum bpf_access_type is also something that
> had crossed my mind. I think that's what should be used here in favour
> of the boolean.

Reusing bpf_access_type feels like the right thing here, however, it is
missing an option for read/write access. Should we introduce a new
BPF_READ_WRITE enum value? Or assume that BPF_WRITE should also perform
the BPF_READ check? Or make this argument an int and pass
`BPF_READ | BPF_WRITE`?


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH bpf-next v4 1/4] bpf: Teach vefier to handle const ptrs as args to kfuncs
  2025-05-08  9:08   ` Matt Bobrowski
  2025-05-09 16:20     ` Alexei Starovoitov
@ 2025-05-13  7:58     ` Viktor Malik
  1 sibling, 0 replies; 25+ messages in thread
From: Viktor Malik @ 2025-05-13  7:58 UTC (permalink / raw)
  To: Matt Bobrowski
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa

On 5/8/25 11:08, Matt Bobrowski wrote:
> On Wed, May 07, 2025 at 08:40:36AM +0200, Viktor Malik wrote:
>> When a kfunc takes a const pointer as an argument, the verifier should
>> not check that the memory can be accessed for writing as that may lead
>> to rejecting safe programs. Extend the verifier to detect such arguments
>> and skip the write access check for them.
>>
>> The use-case for this change is passing string literals (i.e. read-only
>> maps) to read-only string kfuncs.
>>
>> Signed-off-by: Viktor Malik <vmalik@redhat.com>
>> ---
>>  include/linux/btf.h   |  5 +++++
>>  kernel/bpf/verifier.c | 10 ++++++----
>>  2 files changed, 11 insertions(+), 4 deletions(-)
>>
>> diff --git a/include/linux/btf.h b/include/linux/btf.h
>> index ebc0c0c9b944..5cb06c65d91f 100644
>> --- a/include/linux/btf.h
>> +++ b/include/linux/btf.h
>> @@ -391,6 +391,11 @@ static inline bool btf_type_is_type_tag(const struct btf_type *t)
>>  	return BTF_INFO_KIND(t->info) == BTF_KIND_TYPE_TAG;
>>  }
>>  
>> +static inline bool btf_type_is_const(const struct btf_type *t)
>> +{
>> +	return BTF_INFO_KIND(t->info) == BTF_KIND_CONST;
>> +}
> 
> I've seen btf_type_is_* related helpers lean on btf_kind() instead
> here, which ultimately just wraps BTF_INFO_KIND() macro.

This function is using the same style as 7 btf_type_is_* helpers
directly above it so I don't think that it'd be doing something
non-standard here.

>>  /* union is only a special case of struct:
>>   * all its offsetof(member) == 0
>>   */
>> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
>> index 54c6953a8b84..e2d74c4d44c1 100644
>> --- a/kernel/bpf/verifier.c
>> +++ b/kernel/bpf/verifier.c
>> @@ -8186,7 +8186,7 @@ static int check_mem_size_reg(struct bpf_verifier_env *env,
>>  }
>>  
>>  static int check_mem_reg(struct bpf_verifier_env *env, struct bpf_reg_state *reg,
>> -			 u32 regno, u32 mem_size)
>> +			 u32 regno, u32 mem_size, bool read_only)
> 
> Maybe s/read_only/write_mem_access?
> 
>>  {
>>  	bool may_be_null = type_may_be_null(reg->type);
>>  	struct bpf_reg_state saved_reg;
>> @@ -8205,7 +8205,8 @@ static int check_mem_reg(struct bpf_verifier_env *env, struct bpf_reg_state *reg
>>  	}
>>  
>>  	err = check_helper_mem_access(env, regno, mem_size, BPF_READ, true, NULL);
>> -	err = err ?: check_helper_mem_access(env, regno, mem_size, BPF_WRITE, true, NULL);
>> +	if (!read_only)
>> +		err = err ?: check_helper_mem_access(env, regno, mem_size, BPF_WRITE, true, NULL);
> 
> For clarity, I'd completely get rid of the ternary operator usage
> here. You can short circuit the call to check_helper_mem_access() w/
> BPF_WRITE by simply checking the error code value from the preceding
> call to check_helper_mem_access() w/ BPF_READ in the branch condition
> i.e.
> 
> ```
> err = check_helper_mem_access(..., BPF_READ, ...);
> if (!err && write_mem_access)
>    err = check_helper_mem_acces(..., BPF_WRITE, ...);
> ```

That's a nice idea, thanks! I'll definitely use it in some form
(depending on how we end up representing the access type param).

> 
>>  	if (may_be_null)
>>  		*reg = saved_reg;
>> @@ -10361,7 +10362,7 @@ static int btf_check_func_arg_match(struct bpf_verifier_env *env, int subprog,
>>  			ret = check_func_arg_reg_off(env, reg, regno, ARG_DONTCARE);
>>  			if (ret < 0)
>>  				return ret;
>> -			if (check_mem_reg(env, reg, regno, arg->mem_size))
>> +			if (check_mem_reg(env, reg, regno, arg->mem_size, false))
> 
> For clarity, I'd add: /*write_mem_access=*/false). Same with the below
> call to check_mem_reg().
> 
>>  				return -EINVAL;
>>  			if (!(arg->arg_type & PTR_MAYBE_NULL) && (reg->type & PTR_MAYBE_NULL)) {
>>  				bpf_log(log, "arg#%d is expected to be non-NULL\n", i);
>> @@ -13252,7 +13253,8 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
>>  					i, btf_type_str(ref_t), ref_tname, PTR_ERR(resolve_ret));
>>  				return -EINVAL;
>>  			}
>> -			ret = check_mem_reg(env, reg, regno, type_size);
>> +			ret = check_mem_reg(env, reg, regno, type_size,
>> +					    btf_type_is_const(btf_type_by_id(btf, t->type)));
>>  			if (ret < 0)
>>  				return ret;
>>  			break;
>> -- 
>> 2.49.0
>>
> 


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH bpf-next v4 1/4] bpf: Teach vefier to handle const ptrs as args to kfuncs
  2025-05-13  7:54         ` Viktor Malik
@ 2025-05-13 14:58           ` Alexei Starovoitov
  0 siblings, 0 replies; 25+ messages in thread
From: Alexei Starovoitov @ 2025-05-13 14:58 UTC (permalink / raw)
  To: Viktor Malik
  Cc: Matt Bobrowski, bpf, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev,
	Hao Luo, Jiri Olsa

On Tue, May 13, 2025 at 12:54 AM Viktor Malik <vmalik@redhat.com> wrote:
>
> On 5/13/25 08:48, Matt Bobrowski wrote:
> > On Fri, May 09, 2025 at 09:20:53AM -0700, Alexei Starovoitov wrote:
> >> On Thu, May 8, 2025 at 2:09 AM Matt Bobrowski <mattbobrowski@google.com> wrote:
> >>>
> >>>>
> >>>>  static int check_mem_reg(struct bpf_verifier_env *env, struct bpf_reg_state *reg,
> >>>> -                      u32 regno, u32 mem_size)
> >>>> +                      u32 regno, u32 mem_size, bool read_only)
> >>>
> >>> Maybe s/read_only/write_mem_access?
> >>
> >> 'bool' arguments are not readable at the callsite.
> >> Let's use enum bpf_access_type BPF_READ|WRITE here
> >> or introduce another enum ?
> >
> > Yes, I agree, and using enum bpf_access_type is also something that
> > had crossed my mind. I think that's what should be used here in favour
> > of the boolean.
>
> Reusing bpf_access_type feels like the right thing here, however, it is
> missing an option for read/write access. Should we introduce a new
> BPF_READ_WRITE enum value? Or assume that BPF_WRITE should also perform
> the BPF_READ check? Or make this argument an int and pass
> `BPF_READ | BPF_WRITE`?

I think most, if not all, other places in the verifier
assume that BPF_WRITE also means that read has to work.
So I don't see a need for new enum values.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH bpf-next v4 3/4] bpf: Add kfuncs for read-only string operations
  2025-05-09 22:03       ` Andrii Nakryiko
@ 2025-05-15 12:27         ` Viktor Malik
  2025-05-15 17:17           ` Andrii Nakryiko
  0 siblings, 1 reply; 25+ messages in thread
From: Viktor Malik @ 2025-05-15 12:27 UTC (permalink / raw)
  To: Andrii Nakryiko, Eduard Zingerman
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa

On 5/10/25 00:03, Andrii Nakryiko wrote:
> On Fri, May 9, 2025 at 2:37 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
>>
>> On Fri, 2025-05-09 at 11:20 -0700, Andrii Nakryiko wrote:
>>
>> [...]
>>
>>>> +/**
>>>> + * bpf_strchr - Find the first occurrence of a character in a string
>>>> + * @s: The string to be searched
>>>> + * @c: The character to search for
>>>> + *
>>>> + * Note that the %NUL-terminator is considered part of the string, and can
>>>> + * be searched for.
>>>> + *
>>>> + * Return:
>>>> + * * const char * - Pointer to the first occurrence of @c within @s
>>>> + * * %NULL        - @c not found in @s
>>>> + * * %-EFAULT     - Cannot read @s
>>>> + * * %-E2BIG      - @s too large
>>>> + */
>>>> +__bpf_kfunc const char *bpf_strchr(const char *s, char c)
>>>
>>> so let's say we found the character, we return a pointer to it, and
>>> that memory goes away (because we never owned it, so we don't really
>>> know what and when will happen with it). Question, will verifier allow
>>> BPF program to dereference this pointer? If yes, that's a problem. But
>>> if not, then I'm not sure there is much point in returning a pointer.

You are right, at the moment, the verifier marks the returned pointers
as `rdonly_mem_or_null` so an attempt to dereference them will result
into a verifier error. Which is clearly not very useful.

I'd say that, theoretically, the pointers returned from these kfuncs
should be treated by the verifier in the same way as the passed
pointers. That is, if PTR_TO_MAP_VALUE is passed,
PTR_TO_MAP_VALUE_OR_NULL should be returned, and so on.

>>> I'm just trying to imply that in BPF world integer-based APIs work
>>> better/safer, overall? For strings, we can switch any
>>> pointer-returning API to position-returning (or negative error) API
>>> and it would more or less naturally fit into BPF API surface, no?
>>
>> Integer based API solves the problem with memory access but is not
>> really ergonomic. W/o special logic in verifier the returned int would
>> be unbounded, hence the user would have to compare it with string
>> length before using.
>>
>> It looks like some verifier logic is necessary regardless of API being
>> integer or pointer based. In any case verifier needs additional rules
>> for each pointer type to adjust bounds on the return value or its refobj_id.

IMO the problem here is that we can't just say anything about the
returned pointer (or index) rather than it is within the bounds of the
original string (or within the passed size for bounded kfuncs). So, any
access to that pointer with an offset other than 0 will still need an
explicit bounds check.

> You can't safely dereference any pointer returned from these APIs,
> because the memory might not be there anymore.

You can't if the memory comes from an untrusted source. But what if the
memory is owned by the BPF program (e.g. on stack or in a map)? Then, it
should be possible to dereference it safely, shouldn't it? IMHO, this
would be quite a common use-case: read string into BPF memory using
bpf_probe_read_str -> use string kfunc to search it -> do something with
the returned pointer (dereference it). From ergonomics perspective, it
shouldn't be necessary to use bpf_probe_read or __get_kernel_nofault
again.

> For integers, same idea. If you use bpf_probe_read_{kernel,user} to
> read data, then verifier doesn't care about the value of integer.
> 
> But that's not ergonomic, so in some other thread few days ago I was
> proposing that we should add an untyped counterpart to bpf_core_cast()
> that would just make any memory accesses performed using
> __get_kernel_nofault() semantics. And so then:
> 
> 
> const char *str = <random value or we got it from somewhere untrusted>;
> int space_idx = bpf_strchr(str, ' ');
> if (space_idx < 0)
>   return -1; /* bad luck */
> 
> const char *s = bpf_mem_cast(str);
> char buf[64] = {};
> 
> bpf_for(i, 0, space_idx)
>     buf[i] = s[i]; /* MAGIC */
> 
> bpf_printk("STUFF BEFORE SPACE: %s", buf);

I can imagine that this would be a nice helper for accessing untrusted
memory in general but I don't think it's specific to string kfuncs. At
the moment, reading an untrusted string requires bpf_probe_read_str so
calling it after processing the string via a kfunc doesn't introduce any
extra call.

BTW note that at the moment, the kfuncs do not accept strings from
untrusted sources as the verifier doesn't know how to treat `char *`
kfunc args and treats them as scalars (which are incompatible with other
pointers). Here, changing also the kfunc args to ints would probably
help, although I think that it would not be ergonomic at all. So, we
still need some verifier work to handle `char *` args to kfuncs.

> Tbh, when dealing with libc string APIs, I still very frequently
> convert resulting pointers into indices, so I don't think it's
> actually an API regression to have index-based string APIs

I agree here. In addition, it looks to me that returning pointers would
require extra verifier work while returning indices would not. And we
still need to apply extra bounds checks or access helpers in a majority
of use-cases so we don't loose much ergonomics with integer-based APIs.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH bpf-next v4 3/4] bpf: Add kfuncs for read-only string operations
  2025-05-08  9:41   ` Matt Bobrowski
  2025-05-09 16:39     ` Alexei Starovoitov
@ 2025-05-15 12:32     ` Viktor Malik
  1 sibling, 0 replies; 25+ messages in thread
From: Viktor Malik @ 2025-05-15 12:32 UTC (permalink / raw)
  To: Matt Bobrowski
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa

On 5/8/25 11:41, Matt Bobrowski wrote:
> On Wed, May 07, 2025 at 08:40:38AM +0200, Viktor Malik wrote:
>> String operations are commonly used so this exposes the most common ones
>> to BPF programs. For now, we limit ourselves to operations which do not
>> copy memory around.
>>
>> Unfortunately, most in-kernel implementations assume that strings are
>> %NUL-terminated, which is not necessarily true, and therefore we cannot
>> use them directly in the BPF context. Instead, we open-code them using
>> __get_kernel_nofault instead of plain dereference to make them safe and
>> limit the strings length to XATTR_SIZE_MAX to make sure the functions
>> terminate. When __get_kernel_nofault fails, functions return -EFAULT.
>> Similarly, when the size bound is reached, the functions return -E2BIG.
> 
> Curious, why was XATTR_SIZE_MAX chosen as the upper bounds here? Just
> an arbitrary value that felt large enough?

Yes, exactly that.

> 
>> At the moment, strings can be passed to the kfuncs in three forms:
>> - string literals (i.e. pointers to read-only maps)
>> - global variables (i.e. pointers to read-write maps)
>> - stack-allocated buffers
>>
>> Note that currently, it is not possible to pass strings from the BPF
>> program context (like function args) as the verifier doesn't treat them
>> as neither PTR_TO_MEM nor PTR_TO_BTF_ID.
>>
>> Suggested-by: Alexei Starovoitov <ast@kernel.org>
>> Signed-off-by: Viktor Malik <vmalik@redhat.com>
>> ---
>>  kernel/bpf/helpers.c | 440 +++++++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 440 insertions(+)
>>
>> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
>> index e3a2662f4e33..8fb7c2ca7ac0 100644
>> --- a/kernel/bpf/helpers.c
>> +++ b/kernel/bpf/helpers.c
>> @@ -23,6 +23,7 @@
>>  #include <linux/btf_ids.h>
>>  #include <linux/bpf_mem_alloc.h>
>>  #include <linux/kasan.h>
>> +#include <linux/uaccess.h>
>>  
>>  #include "../../lib/kstrtox.h"
>>  
>> @@ -3194,6 +3195,433 @@ __bpf_kfunc void bpf_local_irq_restore(unsigned long *flags__irq_flag)
>>  	local_irq_restore(*flags__irq_flag);
>>  }
>>  
>> +/* Kfuncs for string operations.
>> + *
>> + * Since strings are not necessarily %NUL-terminated, we cannot directly call
>> + * in-kernel implementations. Instead, we open-code the implementations using
>> + * __get_kernel_nofault instead of plain dereference to make them safe.
>> + */
> 
> Returning an -EFAULT error code for supplied arguments which are
> deemed to be invalid is just a very weird semantic IMO. As a BPF
> program author, I totally wouldn't expect a BPF kfunc to return
> -EINVAL if I had supplied it something that it couldn't quite possibly
> handle to begin with. Look at the prior art, being pre-existing BPF
> kfuncs, and you'll see how they handle invalidly supplied arguments. I
> totally understand that attempting to dereference a NULL-pointer would
> ultimately result in a fault being raised, so it may feel rather
> natural to also report an -EFAULT error code upon doing some
> NULL-pointer checks that hold true, but from an API usability POV it
> just seems awkward and wrong.
> 
> Another thing that I noticed was that none of these BPF kfunc
> arguments make use of the parameter suffixes i.e. __str, __sz. Why is
> that exactly? Will leaning on those break you in some way?

The reason is that they both require literal values to be passed which
is a limitation that we don't want in these APIs.

Viktor


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH bpf-next v4 3/4] bpf: Add kfuncs for read-only string operations
  2025-05-15 12:27         ` Viktor Malik
@ 2025-05-15 17:17           ` Andrii Nakryiko
  2025-05-16  5:59             ` Viktor Malik
  0 siblings, 1 reply; 25+ messages in thread
From: Andrii Nakryiko @ 2025-05-15 17:17 UTC (permalink / raw)
  To: Viktor Malik
  Cc: Eduard Zingerman, bpf, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa

On Thu, May 15, 2025 at 5:27 AM Viktor Malik <vmalik@redhat.com> wrote:
>
> On 5/10/25 00:03, Andrii Nakryiko wrote:
> > On Fri, May 9, 2025 at 2:37 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
> >>
> >> On Fri, 2025-05-09 at 11:20 -0700, Andrii Nakryiko wrote:
> >>
> >> [...]
> >>
> >>>> +/**
> >>>> + * bpf_strchr - Find the first occurrence of a character in a string
> >>>> + * @s: The string to be searched
> >>>> + * @c: The character to search for
> >>>> + *
> >>>> + * Note that the %NUL-terminator is considered part of the string, and can
> >>>> + * be searched for.
> >>>> + *
> >>>> + * Return:
> >>>> + * * const char * - Pointer to the first occurrence of @c within @s
> >>>> + * * %NULL        - @c not found in @s
> >>>> + * * %-EFAULT     - Cannot read @s
> >>>> + * * %-E2BIG      - @s too large
> >>>> + */
> >>>> +__bpf_kfunc const char *bpf_strchr(const char *s, char c)
> >>>
> >>> so let's say we found the character, we return a pointer to it, and
> >>> that memory goes away (because we never owned it, so we don't really
> >>> know what and when will happen with it). Question, will verifier allow
> >>> BPF program to dereference this pointer? If yes, that's a problem. But
> >>> if not, then I'm not sure there is much point in returning a pointer.
>
> You are right, at the moment, the verifier marks the returned pointers
> as `rdonly_mem_or_null` so an attempt to dereference them will result
> into a verifier error. Which is clearly not very useful.
>
> I'd say that, theoretically, the pointers returned from these kfuncs
> should be treated by the verifier in the same way as the passed
> pointers. That is, if PTR_TO_MAP_VALUE is passed,
> PTR_TO_MAP_VALUE_OR_NULL should be returned, and so on.
>
> >>> I'm just trying to imply that in BPF world integer-based APIs work
> >>> better/safer, overall? For strings, we can switch any
> >>> pointer-returning API to position-returning (or negative error) API
> >>> and it would more or less naturally fit into BPF API surface, no?
> >>
> >> Integer based API solves the problem with memory access but is not
> >> really ergonomic. W/o special logic in verifier the returned int would
> >> be unbounded, hence the user would have to compare it with string
> >> length before using.
> >>
> >> It looks like some verifier logic is necessary regardless of API being
> >> integer or pointer based. In any case verifier needs additional rules
> >> for each pointer type to adjust bounds on the return value or its refobj_id.
>
> IMO the problem here is that we can't just say anything about the
> returned pointer (or index) rather than it is within the bounds of the
> original string (or within the passed size for bounded kfuncs). So, any
> access to that pointer with an offset other than 0 will still need an
> explicit bounds check.

Exactly.

>
> > You can't safely dereference any pointer returned from these APIs,
> > because the memory might not be there anymore.
>
> You can't if the memory comes from an untrusted source. But what if the
> memory is owned by the BPF program (e.g. on stack or in a map)? Then, it
> should be possible to dereference it safely, shouldn't it? IMHO, this
> would be quite a common use-case: read string into BPF memory using
> bpf_probe_read_str -> use string kfunc to search it -> do something with
> the returned pointer (dereference it). From ergonomics perspective, it
> shouldn't be necessary to use bpf_probe_read or __get_kernel_nofault
> again.

For bpf_probe_read_str -> use kfunc scenario, I thought the main
*goal* is to avoid the bpf_probe_read_str operation altogether. That's
why we allow unsafe pointers passed into those kfuncs you are adding
and why we use __get_kernel_nofault internally.

So with that, you'd actually just use, say, bpf_strchr(), get back
some pointer or index, calculate substring (prefix) length, and *then*
maybe bpf_probe_read_str into ringbuf or local buffer, if you'd like
to capture the data and do some post processing.

>
> > For integers, same idea. If you use bpf_probe_read_{kernel,user} to
> > read data, then verifier doesn't care about the value of integer.
> >
> > But that's not ergonomic, so in some other thread few days ago I was
> > proposing that we should add an untyped counterpart to bpf_core_cast()
> > that would just make any memory accesses performed using
> > __get_kernel_nofault() semantics. And so then:
> >
> >
> > const char *str = <random value or we got it from somewhere untrusted>;
> > int space_idx = bpf_strchr(str, ' ');
> > if (space_idx < 0)
> >   return -1; /* bad luck */
> >
> > const char *s = bpf_mem_cast(str);
> > char buf[64] = {};
> >
> > bpf_for(i, 0, space_idx)
> >     buf[i] = s[i]; /* MAGIC */
> >
> > bpf_printk("STUFF BEFORE SPACE: %s", buf);
>
> I can imagine that this would be a nice helper for accessing untrusted
> memory in general but I don't think it's specific to string kfuncs. At
> the moment, reading an untrusted string requires bpf_probe_read_str so
> calling it after processing the string via a kfunc doesn't introduce any
> extra call.

I might be confused, see above. My impression was that with your
functions we won't need bpf_probe_read_str() and that was the whole
point of your __get_kernel_nofault-based reimplementation. If not for
that, we'd use trusted memory pointers and just used internal kernel
strings, knowing that we are working with MAP_VALUE or PTR_TO_STACK of
well-bounded size.

Then again, see what I wrote above. I imagine that the user would not
do bpf_probe_read_str() at all. I'll do bpf_strchr(), followed by
bpf_memcast(), followed by iterating from 0 to the index returned from
bpf_strchr(), if I need to process the substring.

>
> BTW note that at the moment, the kfuncs do not accept strings from
> untrusted sources as the verifier doesn't know how to treat `char *`
> kfunc args and treats them as scalars (which are incompatible with other
> pointers). Here, changing also the kfunc args to ints would probably
> help, although I think that it would not be ergonomic at all. So, we
> still need some verifier work to handle `char *` args to kfuncs.

Ok, so that's probably the missing piece. We need to teach verifiers
to allow such untrusted pointers. I thought that was the whole idea
behind adding these APIs: to allow such usage.

>
> > Tbh, when dealing with libc string APIs, I still very frequently
> > convert resulting pointers into indices, so I don't think it's
> > actually an API regression to have index-based string APIs
>
> I agree here. In addition, it looks to me that returning pointers would
> require extra verifier work while returning indices would not. And we
> still need to apply extra bounds checks or access helpers in a majority
> of use-cases so we don't loose much ergonomics with integer-based APIs.
>

+1

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH bpf-next v4 3/4] bpf: Add kfuncs for read-only string operations
  2025-05-15 17:17           ` Andrii Nakryiko
@ 2025-05-16  5:59             ` Viktor Malik
  2025-05-16 15:50               ` Andrii Nakryiko
  0 siblings, 1 reply; 25+ messages in thread
From: Viktor Malik @ 2025-05-16  5:59 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Eduard Zingerman, bpf, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa

On 5/15/25 19:17, Andrii Nakryiko wrote:
> On Thu, May 15, 2025 at 5:27 AM Viktor Malik <vmalik@redhat.com> wrote:
>>
>> On 5/10/25 00:03, Andrii Nakryiko wrote:
>>> On Fri, May 9, 2025 at 2:37 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
>>>>
>>>> On Fri, 2025-05-09 at 11:20 -0700, Andrii Nakryiko wrote:
>>>>
>>>> [...]
>>>>
>>>>>> +/**
>>>>>> + * bpf_strchr - Find the first occurrence of a character in a string
>>>>>> + * @s: The string to be searched
>>>>>> + * @c: The character to search for
>>>>>> + *
>>>>>> + * Note that the %NUL-terminator is considered part of the string, and can
>>>>>> + * be searched for.
>>>>>> + *
>>>>>> + * Return:
>>>>>> + * * const char * - Pointer to the first occurrence of @c within @s
>>>>>> + * * %NULL        - @c not found in @s
>>>>>> + * * %-EFAULT     - Cannot read @s
>>>>>> + * * %-E2BIG      - @s too large
>>>>>> + */
>>>>>> +__bpf_kfunc const char *bpf_strchr(const char *s, char c)
>>>>>
>>>>> so let's say we found the character, we return a pointer to it, and
>>>>> that memory goes away (because we never owned it, so we don't really
>>>>> know what and when will happen with it). Question, will verifier allow
>>>>> BPF program to dereference this pointer? If yes, that's a problem. But
>>>>> if not, then I'm not sure there is much point in returning a pointer.
>>
>> You are right, at the moment, the verifier marks the returned pointers
>> as `rdonly_mem_or_null` so an attempt to dereference them will result
>> into a verifier error. Which is clearly not very useful.
>>
>> I'd say that, theoretically, the pointers returned from these kfuncs
>> should be treated by the verifier in the same way as the passed
>> pointers. That is, if PTR_TO_MAP_VALUE is passed,
>> PTR_TO_MAP_VALUE_OR_NULL should be returned, and so on.
>>
>>>>> I'm just trying to imply that in BPF world integer-based APIs work
>>>>> better/safer, overall? For strings, we can switch any
>>>>> pointer-returning API to position-returning (or negative error) API
>>>>> and it would more or less naturally fit into BPF API surface, no?
>>>>
>>>> Integer based API solves the problem with memory access but is not
>>>> really ergonomic. W/o special logic in verifier the returned int would
>>>> be unbounded, hence the user would have to compare it with string
>>>> length before using.
>>>>
>>>> It looks like some verifier logic is necessary regardless of API being
>>>> integer or pointer based. In any case verifier needs additional rules
>>>> for each pointer type to adjust bounds on the return value or its refobj_id.
>>
>> IMO the problem here is that we can't just say anything about the
>> returned pointer (or index) rather than it is within the bounds of the
>> original string (or within the passed size for bounded kfuncs). So, any
>> access to that pointer with an offset other than 0 will still need an
>> explicit bounds check.
> 
> Exactly.
> 
>>
>>> You can't safely dereference any pointer returned from these APIs,
>>> because the memory might not be there anymore.
>>
>> You can't if the memory comes from an untrusted source. But what if the
>> memory is owned by the BPF program (e.g. on stack or in a map)? Then, it
>> should be possible to dereference it safely, shouldn't it? IMHO, this
>> would be quite a common use-case: read string into BPF memory using
>> bpf_probe_read_str -> use string kfunc to search it -> do something with
>> the returned pointer (dereference it). From ergonomics perspective, it
>> shouldn't be necessary to use bpf_probe_read or __get_kernel_nofault
>> again.
> 
> For bpf_probe_read_str -> use kfunc scenario, I thought the main
> *goal* is to avoid the bpf_probe_read_str operation altogether. That's
> why we allow unsafe pointers passed into those kfuncs you are adding
> and why we use __get_kernel_nofault internally.

My original use-case (for bpftrace) was pure ergonimics - we typically
have a string on stack or in a map and instead of writing the string
operation by hand, we could use a pre-defined kfunc. But your suggested
use-case is probably even more valuable.

> So with that, you'd actually just use, say, bpf_strchr(), get back
> some pointer or index, calculate substring (prefix) length, and *then*
> maybe bpf_probe_read_str into ringbuf or local buffer, if you'd like
> to capture the data and do some post processing.

Agreed. The great strength I can see in this is that in many cases, you
don't need the follow-up bpf_probe_read_str at all - getting the length
of the (sub)string, testing for substring or character inclusion, etc.

>>
>>> For integers, same idea. If you use bpf_probe_read_{kernel,user} to
>>> read data, then verifier doesn't care about the value of integer.
>>>
>>> But that's not ergonomic, so in some other thread few days ago I was
>>> proposing that we should add an untyped counterpart to bpf_core_cast()
>>> that would just make any memory accesses performed using
>>> __get_kernel_nofault() semantics. And so then:
>>>
>>>
>>> const char *str = <random value or we got it from somewhere untrusted>;
>>> int space_idx = bpf_strchr(str, ' ');
>>> if (space_idx < 0)
>>>   return -1; /* bad luck */
>>>
>>> const char *s = bpf_mem_cast(str);
>>> char buf[64] = {};
>>>
>>> bpf_for(i, 0, space_idx)
>>>     buf[i] = s[i]; /* MAGIC */
>>>
>>> bpf_printk("STUFF BEFORE SPACE: %s", buf);
>>
>> I can imagine that this would be a nice helper for accessing untrusted
>> memory in general but I don't think it's specific to string kfuncs. At
>> the moment, reading an untrusted string requires bpf_probe_read_str so
>> calling it after processing the string via a kfunc doesn't introduce any
>> extra call.
> 
> I might be confused, see above. My impression was that with your
> functions we won't need bpf_probe_read_str() and that was the whole
> point of your __get_kernel_nofault-based reimplementation. If not for
> that, we'd use trusted memory pointers and just used internal kernel
> strings, knowing that we are working with MAP_VALUE or PTR_TO_STACK of
> well-bounded size.
> 
> Then again, see what I wrote above. I imagine that the user would not
> do bpf_probe_read_str() at all. I'll do bpf_strchr(), followed by
> bpf_memcast(), followed by iterating from 0 to the index returned from
> bpf_strchr(), if I need to process the substring.

Yeah, I can see value in that although I'm not sure what's the
difference between bpf_mem_cast and bpf_probe_read_str since they both
use __get_kernel_nofault under the hood. Just ergonomics and the fact
that bpf_mem_cast doesn't need a buffer? Also, shouldn't the
dereferences after bpf_mem_cast be called under pagefault_disable?

>> BTW note that at the moment, the kfuncs do not accept strings from
>> untrusted sources as the verifier doesn't know how to treat `char *`
>> kfunc args and treats them as scalars (which are incompatible with other
>> pointers). Here, changing also the kfunc args to ints would probably
>> help, although I think that it would not be ergonomic at all. So, we
>> still need some verifier work to handle `char *` args to kfuncs.
> 
> Ok, so that's probably the missing piece. We need to teach verifiers
> to allow such untrusted pointers. I thought that was the whole idea
> behind adding these APIs: to allow such usage.

Yes, I'll look into that. Do you want to merge everything together or
should I post the updated (likely int-based) kfuncs in the meantime as
they are useful for trusted pointers as well?

Viktor

>>
>>> Tbh, when dealing with libc string APIs, I still very frequently
>>> convert resulting pointers into indices, so I don't think it's
>>> actually an API regression to have index-based string APIs
>>
>> I agree here. In addition, it looks to me that returning pointers would
>> require extra verifier work while returning indices would not. And we
>> still need to apply extra bounds checks or access helpers in a majority
>> of use-cases so we don't loose much ergonomics with integer-based APIs.
>>
> 
> +1
> 


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH bpf-next v4 3/4] bpf: Add kfuncs for read-only string operations
  2025-05-16  5:59             ` Viktor Malik
@ 2025-05-16 15:50               ` Andrii Nakryiko
  0 siblings, 0 replies; 25+ messages in thread
From: Andrii Nakryiko @ 2025-05-16 15:50 UTC (permalink / raw)
  To: Viktor Malik
  Cc: Eduard Zingerman, bpf, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa

On Thu, May 15, 2025 at 10:59 PM Viktor Malik <vmalik@redhat.com> wrote:
>
> On 5/15/25 19:17, Andrii Nakryiko wrote:
> > On Thu, May 15, 2025 at 5:27 AM Viktor Malik <vmalik@redhat.com> wrote:
> >>
> >> On 5/10/25 00:03, Andrii Nakryiko wrote:
> >>> On Fri, May 9, 2025 at 2:37 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
> >>>>
> >>>> On Fri, 2025-05-09 at 11:20 -0700, Andrii Nakryiko wrote:
> >>>>
> >>>> [...]
> >>>>
> >>>>>> +/**
> >>>>>> + * bpf_strchr - Find the first occurrence of a character in a string
> >>>>>> + * @s: The string to be searched
> >>>>>> + * @c: The character to search for
> >>>>>> + *
> >>>>>> + * Note that the %NUL-terminator is considered part of the string, and can
> >>>>>> + * be searched for.
> >>>>>> + *
> >>>>>> + * Return:
> >>>>>> + * * const char * - Pointer to the first occurrence of @c within @s
> >>>>>> + * * %NULL        - @c not found in @s
> >>>>>> + * * %-EFAULT     - Cannot read @s
> >>>>>> + * * %-E2BIG      - @s too large
> >>>>>> + */
> >>>>>> +__bpf_kfunc const char *bpf_strchr(const char *s, char c)
> >>>>>
> >>>>> so let's say we found the character, we return a pointer to it, and
> >>>>> that memory goes away (because we never owned it, so we don't really
> >>>>> know what and when will happen with it). Question, will verifier allow
> >>>>> BPF program to dereference this pointer? If yes, that's a problem. But
> >>>>> if not, then I'm not sure there is much point in returning a pointer.
> >>
> >> You are right, at the moment, the verifier marks the returned pointers
> >> as `rdonly_mem_or_null` so an attempt to dereference them will result
> >> into a verifier error. Which is clearly not very useful.
> >>
> >> I'd say that, theoretically, the pointers returned from these kfuncs
> >> should be treated by the verifier in the same way as the passed
> >> pointers. That is, if PTR_TO_MAP_VALUE is passed,
> >> PTR_TO_MAP_VALUE_OR_NULL should be returned, and so on.
> >>
> >>>>> I'm just trying to imply that in BPF world integer-based APIs work
> >>>>> better/safer, overall? For strings, we can switch any
> >>>>> pointer-returning API to position-returning (or negative error) API
> >>>>> and it would more or less naturally fit into BPF API surface, no?
> >>>>
> >>>> Integer based API solves the problem with memory access but is not
> >>>> really ergonomic. W/o special logic in verifier the returned int would
> >>>> be unbounded, hence the user would have to compare it with string
> >>>> length before using.
> >>>>
> >>>> It looks like some verifier logic is necessary regardless of API being
> >>>> integer or pointer based. In any case verifier needs additional rules
> >>>> for each pointer type to adjust bounds on the return value or its refobj_id.
> >>
> >> IMO the problem here is that we can't just say anything about the
> >> returned pointer (or index) rather than it is within the bounds of the
> >> original string (or within the passed size for bounded kfuncs). So, any
> >> access to that pointer with an offset other than 0 will still need an
> >> explicit bounds check.
> >
> > Exactly.
> >
> >>
> >>> You can't safely dereference any pointer returned from these APIs,
> >>> because the memory might not be there anymore.
> >>
> >> You can't if the memory comes from an untrusted source. But what if the
> >> memory is owned by the BPF program (e.g. on stack or in a map)? Then, it
> >> should be possible to dereference it safely, shouldn't it? IMHO, this
> >> would be quite a common use-case: read string into BPF memory using
> >> bpf_probe_read_str -> use string kfunc to search it -> do something with
> >> the returned pointer (dereference it). From ergonomics perspective, it
> >> shouldn't be necessary to use bpf_probe_read or __get_kernel_nofault
> >> again.
> >
> > For bpf_probe_read_str -> use kfunc scenario, I thought the main
> > *goal* is to avoid the bpf_probe_read_str operation altogether. That's
> > why we allow unsafe pointers passed into those kfuncs you are adding
> > and why we use __get_kernel_nofault internally.
>
> My original use-case (for bpftrace) was pure ergonimics - we typically
> have a string on stack or in a map and instead of writing the string
> operation by hand, we could use a pre-defined kfunc. But your suggested
> use-case is probably even more valuable.
>
> > So with that, you'd actually just use, say, bpf_strchr(), get back
> > some pointer or index, calculate substring (prefix) length, and *then*
> > maybe bpf_probe_read_str into ringbuf or local buffer, if you'd like
> > to capture the data and do some post processing.
>
> Agreed. The great strength I can see in this is that in many cases, you
> don't need the follow-up bpf_probe_read_str at all - getting the length
> of the (sub)string, testing for substring or character inclusion, etc.
>
> >>
> >>> For integers, same idea. If you use bpf_probe_read_{kernel,user} to
> >>> read data, then verifier doesn't care about the value of integer.
> >>>
> >>> But that's not ergonomic, so in some other thread few days ago I was
> >>> proposing that we should add an untyped counterpart to bpf_core_cast()
> >>> that would just make any memory accesses performed using
> >>> __get_kernel_nofault() semantics. And so then:
> >>>
> >>>
> >>> const char *str = <random value or we got it from somewhere untrusted>;
> >>> int space_idx = bpf_strchr(str, ' ');
> >>> if (space_idx < 0)
> >>>   return -1; /* bad luck */
> >>>
> >>> const char *s = bpf_mem_cast(str);
> >>> char buf[64] = {};
> >>>
> >>> bpf_for(i, 0, space_idx)
> >>>     buf[i] = s[i]; /* MAGIC */
> >>>
> >>> bpf_printk("STUFF BEFORE SPACE: %s", buf);
> >>
> >> I can imagine that this would be a nice helper for accessing untrusted
> >> memory in general but I don't think it's specific to string kfuncs. At
> >> the moment, reading an untrusted string requires bpf_probe_read_str so
> >> calling it after processing the string via a kfunc doesn't introduce any
> >> extra call.
> >
> > I might be confused, see above. My impression was that with your
> > functions we won't need bpf_probe_read_str() and that was the whole
> > point of your __get_kernel_nofault-based reimplementation. If not for
> > that, we'd use trusted memory pointers and just used internal kernel
> > strings, knowing that we are working with MAP_VALUE or PTR_TO_STACK of
> > well-bounded size.
> >
> > Then again, see what I wrote above. I imagine that the user would not
> > do bpf_probe_read_str() at all. I'll do bpf_strchr(), followed by
> > bpf_memcast(), followed by iterating from 0 to the index returned from
> > bpf_strchr(), if I need to process the substring.
>
> Yeah, I can see value in that although I'm not sure what's the
> difference between bpf_mem_cast and bpf_probe_read_str since they both
> use __get_kernel_nofault under the hood. Just ergonomics and the fact
> that bpf_mem_cast doesn't need a buffer? Also, shouldn't the

It's not "just doesn't need a buffer", that's a major difference (and
that buffer is sometimes a major limitation and blocker) Strings can
be short *most of the time*, but occasionally could be very long. In
both cases you want to process them in their entirety, if possible,
but you can't afford pre-allocating buffer for the worst case just to
be able to make a local copy.

> dereferences after bpf_mem_cast be called under pagefault_disable?
>
> >> BTW note that at the moment, the kfuncs do not accept strings from
> >> untrusted sources as the verifier doesn't know how to treat `char *`
> >> kfunc args and treats them as scalars (which are incompatible with other
> >> pointers). Here, changing also the kfunc args to ints would probably
> >> help, although I think that it would not be ergonomic at all. So, we
> >> still need some verifier work to handle `char *` args to kfuncs.
> >
> > Ok, so that's probably the missing piece. We need to teach verifiers
> > to allow such untrusted pointers. I thought that was the whole idea
> > behind adding these APIs: to allow such usage.
>
> Yes, I'll look into that. Do you want to merge everything together or
> should I post the updated (likely int-based) kfuncs in the meantime as
> they are useful for trusted pointers as well?

I'd switch all APIs to integers. And to allow passing any untrusted
pointer, I believe we already have __ign suffix for kfunc argument. So
please add __ign where appropriate, and let's write a few more tests
validating that all this works?

bpf_mem_cast() is probably a separate feature, so I wouldn't block on that.

>
> Viktor
>
> >>
> >>> Tbh, when dealing with libc string APIs, I still very frequently
> >>> convert resulting pointers into indices, so I don't think it's
> >>> actually an API regression to have index-based string APIs
> >>
> >> I agree here. In addition, it looks to me that returning pointers would
> >> require extra verifier work while returning indices would not. And we
> >> still need to apply extra bounds checks or access helpers in a majority
> >> of use-cases so we don't loose much ergonomics with integer-based APIs.
> >>
> >
> > +1
> >
>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH bpf-next v4 3/4] bpf: Add kfuncs for read-only string operations
  2025-05-09 16:39     ` Alexei Starovoitov
@ 2025-05-28  9:05       ` Viktor Malik
  0 siblings, 0 replies; 25+ messages in thread
From: Viktor Malik @ 2025-05-28  9:05 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Matt Bobrowski

On 5/9/25 18:39, Alexei Starovoitov wrote:
> On Thu, May 8, 2025 at 2:42 AM Matt Bobrowski <mattbobrowski@google.com> wrote:
>>
>> On Wed, May 07, 2025 at 08:40:38AM +0200, Viktor Malik wrote:
>>> String operations are commonly used so this exposes the most common ones
>>> to BPF programs. For now, we limit ourselves to operations which do not
>>> copy memory around.
>>>
>>> Unfortunately, most in-kernel implementations assume that strings are
>>> %NUL-terminated, which is not necessarily true, and therefore we cannot
>>> use them directly in the BPF context. Instead, we open-code them using
>>> __get_kernel_nofault instead of plain dereference to make them safe and
>>> limit the strings length to XATTR_SIZE_MAX to make sure the functions
>>> terminate. When __get_kernel_nofault fails, functions return -EFAULT.
>>> Similarly, when the size bound is reached, the functions return -E2BIG.
>>
>> Curious, why was XATTR_SIZE_MAX chosen as the upper bounds here? Just
>> an arbitrary value that felt large enough?
>>
>>> At the moment, strings can be passed to the kfuncs in three forms:
>>> - string literals (i.e. pointers to read-only maps)
>>> - global variables (i.e. pointers to read-write maps)
>>> - stack-allocated buffers
>>>
>>> Note that currently, it is not possible to pass strings from the BPF
>>> program context (like function args) as the verifier doesn't treat them
>>> as neither PTR_TO_MEM nor PTR_TO_BTF_ID.
>>>
>>> Suggested-by: Alexei Starovoitov <ast@kernel.org>
>>> Signed-off-by: Viktor Malik <vmalik@redhat.com>
>>> ---
>>>  kernel/bpf/helpers.c | 440 +++++++++++++++++++++++++++++++++++++++++++
>>>  1 file changed, 440 insertions(+)
>>>
>>> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
>>> index e3a2662f4e33..8fb7c2ca7ac0 100644
>>> --- a/kernel/bpf/helpers.c
>>> +++ b/kernel/bpf/helpers.c
>>> @@ -23,6 +23,7 @@
>>>  #include <linux/btf_ids.h>
>>>  #include <linux/bpf_mem_alloc.h>
>>>  #include <linux/kasan.h>
>>> +#include <linux/uaccess.h>
>>>
>>>  #include "../../lib/kstrtox.h"
>>>
>>> @@ -3194,6 +3195,433 @@ __bpf_kfunc void bpf_local_irq_restore(unsigned long *flags__irq_flag)
>>>       local_irq_restore(*flags__irq_flag);
>>>  }
>>>
>>> +/* Kfuncs for string operations.
>>> + *
>>> + * Since strings are not necessarily %NUL-terminated, we cannot directly call
>>> + * in-kernel implementations. Instead, we open-code the implementations using
>>> + * __get_kernel_nofault instead of plain dereference to make them safe.
>>> + */
>>
>> Returning an -EFAULT error code for supplied arguments which are
>> deemed to be invalid is just a very weird semantic IMO. As a BPF
>> program author, I totally wouldn't expect a BPF kfunc to return
>> -EINVAL if I had supplied it something that it couldn't quite possibly
>> handle to begin with. Look at the prior art, being pre-existing BPF
>> kfuncs, and you'll see how they handle invalidly supplied arguments. I
>> totally understand that attempting to dereference a NULL-pointer would
>> ultimately result in a fault being raised, so it may feel rather
>> natural to also report an -EFAULT error code upon doing some
>> NULL-pointer checks that hold true, but from an API usability POV it
>> just seems awkward and wrong.
> 
> I mostly agree with the above.
> 
> On the first look, all the checks like:
> +       if (!s || !accept)
> +               return ERR_PTR(-EFAULT);
> 
> looks like a premature optimization, since
> __get_kernel_nofault() will handle it just fine.
> 
> But there is a different reason to do it and the error code
> should probably be different.
> 
> Consider that copy_from_kernel_nofault() has the following check
> that is meaningful on several architectures including x86:
>         if (!copy_from_kernel_nofault_allowed(src, size))
>                 return -ERANGE;
> 
> It's there to avoid accidentally reading user addresses and
> NULL is one such user address.

Ah, that explains why I was getting pagefaults when passing NULL,
despite using `guard(pagefault)()`. Thanks for the context.

> 
> Doing it for every pointer inside the loop will hurt performance,
> but doing it once in the beginning maybe ok?
> If we want to optimize it we can introduce the helper:
> 
> static bool copy_maybe_allowed(const void *ptr)
> {
> #ifdef CONFIG_ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE
>   if ((unsigned long)ptr < TASK_SIZE)
>      return false;
> #else
>   if (!ptr)
>      return false;
> #endif
>   return true;
> }
> 
> and modify above as:
>   if (!copy_maybe_allowed(s) || !copy_maybe_allowed(accept))
>     return ERR_PTR(-ERANGE);
> 
> bikeshed: shorter/better name for helper..

Is there any reason why `copy_from_kernel_nofault_allowed(ptr, 1)` will
not do here?

Thanks!
Viktor


^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2025-05-28  9:06 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-05-07  6:40 [PATCH bpf-next v4 0/4] bpf: Add kfuncs for read-only string operations Viktor Malik
2025-05-07  6:40 ` [PATCH bpf-next v4 1/4] bpf: Teach vefier to handle const ptrs as args to kfuncs Viktor Malik
2025-05-08  9:08   ` Matt Bobrowski
2025-05-09 16:20     ` Alexei Starovoitov
2025-05-13  6:48       ` Matt Bobrowski
2025-05-13  7:54         ` Viktor Malik
2025-05-13 14:58           ` Alexei Starovoitov
2025-05-13  7:58     ` Viktor Malik
2025-05-07  6:40 ` [PATCH bpf-next v4 2/4] uaccess: Define pagefault lock guard Viktor Malik
2025-05-08 10:00   ` Matt Bobrowski
2025-05-09 18:20     ` Andrii Nakryiko
2025-05-13  6:55       ` Matt Bobrowski
2025-05-07  6:40 ` [PATCH bpf-next v4 3/4] bpf: Add kfuncs for read-only string operations Viktor Malik
2025-05-08  9:41   ` Matt Bobrowski
2025-05-09 16:39     ` Alexei Starovoitov
2025-05-28  9:05       ` Viktor Malik
2025-05-15 12:32     ` Viktor Malik
2025-05-09 18:20   ` Andrii Nakryiko
2025-05-09 21:37     ` Eduard Zingerman
2025-05-09 22:03       ` Andrii Nakryiko
2025-05-15 12:27         ` Viktor Malik
2025-05-15 17:17           ` Andrii Nakryiko
2025-05-16  5:59             ` Viktor Malik
2025-05-16 15:50               ` Andrii Nakryiko
2025-05-07  6:40 ` [PATCH bpf-next v4 4/4] selftests/bpf: Add tests for string kfuncs Viktor Malik

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).