* [PATCH bpf-next v6 1/5] bpf: Introduce task_file open-coded iterator kfuncs
2024-12-17 23:34 [PATCH bpf-next v6 0/5] bpf: Add open-coded style process file iterator and bpf_fget_task() kfunc Juntong Deng
@ 2024-12-17 23:37 ` Juntong Deng
2024-12-19 16:19 ` Yonghong Song
2024-12-17 23:37 ` [PATCH bpf-next v6 2/5] selftests/bpf: Add tests for open-coded style process file iterator Juntong Deng
` (4 subsequent siblings)
5 siblings, 1 reply; 21+ messages in thread
From: Juntong Deng @ 2024-12-17 23:37 UTC (permalink / raw)
To: ast, daniel, john.fastabend, andrii, martin.lau, eddyz87, song,
yonghong.song, kpsingh, sdf, haoluo, jolsa, memxor, snorcht,
brauner
Cc: bpf, linux-kernel, linux-fsdevel
This patch adds the open-coded iterator style process file iterator
kfuncs bpf_iter_task_file_{new,next,destroy}, which iterate over all
files opened by the specified process.
bpf_iter_task_file_next returns a pointer to bpf_iter_task_file_item,
which currently contains *task, *file, and fd. The structure is
extensible: fields can be added in later versions while CO-RE keeps
existing BPF programs compatible.
The reference to struct file acquired by the previous
bpf_iter_task_file_next() is released in the next
bpf_iter_task_file_next(), and the last reference is released in the
last bpf_iter_task_file_next() that returns NULL.
In bpf_iter_task_file_destroy(), if the iterator has not been advanced
to the end, the outstanding struct file reference is released there.
Signed-off-by: Juntong Deng <juntong.deng@outlook.com>
---
kernel/bpf/helpers.c | 3 ++
kernel/bpf/task_iter.c | 91 ++++++++++++++++++++++++++++++++++++++++++
2 files changed, 94 insertions(+)
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index cd5f9884d85b..61a652bea0ba 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -3147,6 +3147,9 @@ BTF_ID_FLAGS(func, bpf_iter_css_destroy, KF_ITER_DESTROY)
BTF_ID_FLAGS(func, bpf_iter_task_new, KF_ITER_NEW | KF_TRUSTED_ARGS | KF_RCU_PROTECTED)
BTF_ID_FLAGS(func, bpf_iter_task_next, KF_ITER_NEXT | KF_RET_NULL)
BTF_ID_FLAGS(func, bpf_iter_task_destroy, KF_ITER_DESTROY)
+BTF_ID_FLAGS(func, bpf_iter_task_file_new, KF_ITER_NEW | KF_TRUSTED_ARGS)
+BTF_ID_FLAGS(func, bpf_iter_task_file_next, KF_ITER_NEXT | KF_RET_NULL)
+BTF_ID_FLAGS(func, bpf_iter_task_file_destroy, KF_ITER_DESTROY)
BTF_ID_FLAGS(func, bpf_dynptr_adjust)
BTF_ID_FLAGS(func, bpf_dynptr_is_null)
BTF_ID_FLAGS(func, bpf_dynptr_is_rdonly)
diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c
index 98d9b4c0daff..149a95762f68 100644
--- a/kernel/bpf/task_iter.c
+++ b/kernel/bpf/task_iter.c
@@ -1027,6 +1027,97 @@ __bpf_kfunc void bpf_iter_task_destroy(struct bpf_iter_task *it)
{
}
+struct bpf_iter_task_file_item {
+ struct task_struct *task;
+ struct file *file;
+ unsigned int fd;
+} __aligned(8);
+
+struct bpf_iter_task_file {
+ __u64 __opaque[4];
+} __aligned(8);
+
+struct bpf_iter_task_file_kern {
+ struct bpf_iter_task_file_item item;
+ unsigned int next_fd;
+} __aligned(8);
+
+/**
+ * bpf_iter_task_file_new() - Initialize a new task file iterator for a task,
+ * used to iterate over all files opened by a specified task
+ *
+ * @it: the new bpf_iter_task_file to be created
+ * @task: a pointer pointing to a task to be iterated over
+ */
+__bpf_kfunc int bpf_iter_task_file_new(struct bpf_iter_task_file *it, struct task_struct *task)
+{
+ struct bpf_iter_task_file_kern *kit = (void *)it;
+ struct bpf_iter_task_file_item *item = &kit->item;
+
+ BUILD_BUG_ON(sizeof(struct bpf_iter_task_file_kern) > sizeof(struct bpf_iter_task_file));
+ BUILD_BUG_ON(__alignof__(struct bpf_iter_task_file_kern) !=
+ __alignof__(struct bpf_iter_task_file));
+
+ item->task = get_task_struct(task);
+ item->file = NULL;
+ item->fd = 0;
+ kit->next_fd = 0;
+
+ return 0;
+}
+
+/**
+ * bpf_iter_task_file_next() - Get the next file in bpf_iter_task_file
+ *
+ * bpf_iter_task_file_next acquires a reference to the struct file.
+ *
+ * The reference to struct file acquired by the previous
+ * bpf_iter_task_file_next() is released in the next bpf_iter_task_file_next(),
+ * and the last reference is released in the last bpf_iter_task_file_next()
+ * that returns NULL.
+ *
+ * @it: the bpf_iter_task_file to be checked
+ *
+ * @returns a pointer to bpf_iter_task_file_item
+ */
+__bpf_kfunc struct bpf_iter_task_file_item *bpf_iter_task_file_next(struct bpf_iter_task_file *it)
+{
+ struct bpf_iter_task_file_kern *kit = (void *)it;
+ struct bpf_iter_task_file_item *item = &kit->item;
+
+ if (item->file)
+ fput(item->file);
+
+ item->file = fget_task_next(item->task, &kit->next_fd);
+ item->fd = kit->next_fd;
+
+ kit->next_fd++;
+
+ if (!item->file)
+ return NULL;
+
+ return item;
+}
+
+/**
+ * bpf_iter_task_file_destroy() - Destroy a bpf_iter_task_file
+ *
+ * If the iterator does not iterate to the end, then the last
+ * struct file reference is released at this time.
+ *
+ * @it: the bpf_iter_task_file to be destroyed
+ */
+__bpf_kfunc void bpf_iter_task_file_destroy(struct bpf_iter_task_file *it)
+{
+ struct bpf_iter_task_file_kern *kit = (void *)it;
+ struct bpf_iter_task_file_item *item = &kit->item;
+
+ if (item->file)
+ fput(item->file);
+
+ put_task_struct(item->task);
+}
+
__bpf_kfunc_end_defs();
DEFINE_PER_CPU(struct mmap_unlock_irq_work, mmap_unlock_work);
--
2.39.5
* Re: [PATCH bpf-next v6 1/5] bpf: Introduce task_file open-coded iterator kfuncs
2024-12-17 23:37 ` [PATCH bpf-next v6 1/5] bpf: Introduce task_file open-coded iterator kfuncs Juntong Deng
@ 2024-12-19 16:19 ` Yonghong Song
0 siblings, 0 replies; 21+ messages in thread
From: Yonghong Song @ 2024-12-19 16:19 UTC (permalink / raw)
To: Juntong Deng, ast, daniel, john.fastabend, andrii, martin.lau,
eddyz87, song, kpsingh, sdf, haoluo, jolsa, memxor, snorcht,
brauner
Cc: bpf, linux-kernel, linux-fsdevel
On 12/17/24 3:37 PM, Juntong Deng wrote:
> This patch adds the open-coded iterator style process file iterator
> kfuncs bpf_iter_task_file_{new,next,destroy} that iterates over all
> files opened by the specified process.
>
> bpf_iter_task_file_next returns a pointer to bpf_iter_task_file_item,
> which currently contains *task, *file, fd. This is an extensible
> structure that enables compatibility with different versions
> through CO-RE.
>
> The reference to struct file acquired by the previous
> bpf_iter_task_file_next() is released in the next
> bpf_iter_task_file_next(), and the last reference is released in the
> last bpf_iter_task_file_next() that returns NULL.
>
> In the bpf_iter_task_file_destroy(), if the iterator does not iterate to
> the end, then the last struct file reference is released at this time.
>
> Signed-off-by: Juntong Deng <juntong.deng@outlook.com>
> ---
> kernel/bpf/helpers.c | 3 ++
> kernel/bpf/task_iter.c | 91 ++++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 94 insertions(+)
>
> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> index cd5f9884d85b..61a652bea0ba 100644
> --- a/kernel/bpf/helpers.c
> +++ b/kernel/bpf/helpers.c
> @@ -3147,6 +3147,9 @@ BTF_ID_FLAGS(func, bpf_iter_css_destroy, KF_ITER_DESTROY)
> BTF_ID_FLAGS(func, bpf_iter_task_new, KF_ITER_NEW | KF_TRUSTED_ARGS | KF_RCU_PROTECTED)
> BTF_ID_FLAGS(func, bpf_iter_task_next, KF_ITER_NEXT | KF_RET_NULL)
> BTF_ID_FLAGS(func, bpf_iter_task_destroy, KF_ITER_DESTROY)
> +BTF_ID_FLAGS(func, bpf_iter_task_file_new, KF_ITER_NEW | KF_TRUSTED_ARGS)
> +BTF_ID_FLAGS(func, bpf_iter_task_file_next, KF_ITER_NEXT | KF_RET_NULL)
> +BTF_ID_FLAGS(func, bpf_iter_task_file_destroy, KF_ITER_DESTROY)
> BTF_ID_FLAGS(func, bpf_dynptr_adjust)
> BTF_ID_FLAGS(func, bpf_dynptr_is_null)
> BTF_ID_FLAGS(func, bpf_dynptr_is_rdonly)
> diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c
> index 98d9b4c0daff..149a95762f68 100644
> --- a/kernel/bpf/task_iter.c
> +++ b/kernel/bpf/task_iter.c
> @@ -1027,6 +1027,97 @@ __bpf_kfunc void bpf_iter_task_destroy(struct bpf_iter_task *it)
> {
> }
>
> +struct bpf_iter_task_file_item {
> + struct task_struct *task;
> + struct file *file;
> + unsigned int fd;
> +} __aligned(8);
We probably do not need __aligned(8) here, as the alignment is already
guaranteed by struct bpf_iter_task_file_kern.
> +
> +struct bpf_iter_task_file {
> + __u64 __opaque[4];
> +} __aligned(8);
> +
> +struct bpf_iter_task_file_kern {
> + struct bpf_iter_task_file_item item;
> + unsigned int next_fd;
> +} __aligned(8);
> +
> +/**
> + * bpf_iter_task_file_new() - Initialize a new task file iterator for a task,
> + * used to iterate over all files opened by a specified task
> + *
> + * @it: the new bpf_iter_task_file to be created
> + * @task: a pointer pointing to a task to be iterated over
> + */
> +__bpf_kfunc int bpf_iter_task_file_new(struct bpf_iter_task_file *it, struct task_struct *task)
> +{
> + struct bpf_iter_task_file_kern *kit = (void *)it;
> + struct bpf_iter_task_file_item *item = &kit->item;
> +
> + BUILD_BUG_ON(sizeof(struct bpf_iter_task_file_kern) > sizeof(struct bpf_iter_task_file));
> + BUILD_BUG_ON(__alignof__(struct bpf_iter_task_file_kern) !=
> + __alignof__(struct bpf_iter_task_file));
> +
> + item->task = get_task_struct(task);
> + item->file = NULL;
> + item->fd = 0;
> + kit->next_fd = 0;
> +
> + return 0;
> +}
> +
> +/**
> + * bpf_iter_task_file_next() - Get the next file in bpf_iter_task_file
> + *
> + * bpf_iter_task_file_next acquires a reference to the struct file.
> + *
> + * The reference to struct file acquired by the previous
> + * bpf_iter_task_file_next() is released in the next bpf_iter_task_file_next(),
> + * and the last reference is released in the last bpf_iter_task_file_next()
> + * that returns NULL.
> + *
> + * @it: the bpf_iter_task_file to be checked
> + *
> + * @returns a pointer to bpf_iter_task_file_item
> + */
> +__bpf_kfunc struct bpf_iter_task_file_item *bpf_iter_task_file_next(struct bpf_iter_task_file *it)
> +{
> + struct bpf_iter_task_file_kern *kit = (void *)it;
> + struct bpf_iter_task_file_item *item = &kit->item;
> +
> + if (item->file)
> + fput(item->file);
> +
> + item->file = fget_task_next(item->task, &kit->next_fd);
> + item->fd = kit->next_fd;
> +
> + kit->next_fd++;
> +
> + if (!item->file)
> + return NULL;
Maybe move the above if statement right after
item->file = fget_task_next(item->task, &kit->next_fd);
to make the code more coherent?
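Concretely, the suggested reordering would make the NULL check follow
the lookup directly (a sketch of the review suggestion, not the code as
merged):

```c
item->file = fget_task_next(item->task, &kit->next_fd);
if (!item->file)
	return NULL;

item->fd = kit->next_fd;
kit->next_fd++;

return item;
```

Note this also stops bumping next_fd once the table is exhausted, which
does not change observable behavior since subsequent calls keep
returning NULL.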
> +
> + return item;
> +}
> +
> +/**
> + * bpf_iter_task_file_destroy() - Destroy a bpf_iter_task_file
> + *
> + * If the iterator does not iterate to the end, then the last
> + * struct file reference is released at this time.
> + *
> + * @it: the bpf_iter_task_file to be destroyed
> + */
> +__bpf_kfunc void bpf_iter_task_file_destroy(struct bpf_iter_task_file *it)
> +{
> + struct bpf_iter_task_file_kern *kit = (void *)it;
> + struct bpf_iter_task_file_item *item = &kit->item;
> +
> + if (item->file)
> + fput(item->file);
> +
> + put_task_struct(item->task);
> +}
> +
> __bpf_kfunc_end_defs();
>
> DEFINE_PER_CPU(struct mmap_unlock_irq_work, mmap_unlock_work);
* [PATCH bpf-next v6 2/5] selftests/bpf: Add tests for open-coded style process file iterator
2024-12-17 23:34 [PATCH bpf-next v6 0/5] bpf: Add open-coded style process file iterator and bpf_fget_task() kfunc Juntong Deng
2024-12-17 23:37 ` [PATCH bpf-next v6 1/5] bpf: Introduce task_file open-coded iterator kfuncs Juntong Deng
@ 2024-12-17 23:37 ` Juntong Deng
2024-12-19 16:35 ` Yonghong Song
2024-12-17 23:37 ` [PATCH bpf-next v6 3/5] bpf: Add bpf_fget_task() kfunc Juntong Deng
` (3 subsequent siblings)
5 siblings, 1 reply; 21+ messages in thread
From: Juntong Deng @ 2024-12-17 23:37 UTC (permalink / raw)
To: ast, daniel, john.fastabend, andrii, martin.lau, eddyz87, song,
yonghong.song, kpsingh, sdf, haoluo, jolsa, memxor, snorcht,
brauner
Cc: bpf, linux-kernel, linux-fsdevel
This patch adds test cases for the open-coded style process file
iterator.
Test cases related to process files run in a newly created child
process. The child process closes all files inherited from the parent
so that files opened by the parent process do not affect the test
results.
In addition, this patch adds failure test cases where bpf programs
cannot pass the verifier due to uninitialized or untrusted
arguments, etc.
Signed-off-by: Juntong Deng <juntong.deng@outlook.com>
---
.../testing/selftests/bpf/bpf_experimental.h | 7 ++
.../testing/selftests/bpf/prog_tests/iters.c | 79 ++++++++++++++++
.../selftests/bpf/progs/iters_task_file.c | 86 ++++++++++++++++++
.../bpf/progs/iters_task_file_failure.c | 91 +++++++++++++++++++
4 files changed, 263 insertions(+)
create mode 100644 tools/testing/selftests/bpf/progs/iters_task_file.c
create mode 100644 tools/testing/selftests/bpf/progs/iters_task_file_failure.c
diff --git a/tools/testing/selftests/bpf/bpf_experimental.h b/tools/testing/selftests/bpf/bpf_experimental.h
index cd8ecd39c3f3..ce1520c56b55 100644
--- a/tools/testing/selftests/bpf/bpf_experimental.h
+++ b/tools/testing/selftests/bpf/bpf_experimental.h
@@ -588,4 +588,11 @@ extern int bpf_iter_kmem_cache_new(struct bpf_iter_kmem_cache *it) __weak __ksym
extern struct kmem_cache *bpf_iter_kmem_cache_next(struct bpf_iter_kmem_cache *it) __weak __ksym;
extern void bpf_iter_kmem_cache_destroy(struct bpf_iter_kmem_cache *it) __weak __ksym;
+struct bpf_iter_task_file;
+struct bpf_iter_task_file_item;
+extern int bpf_iter_task_file_new(struct bpf_iter_task_file *it, struct task_struct *task) __ksym;
+extern struct bpf_iter_task_file_item *
+bpf_iter_task_file_next(struct bpf_iter_task_file *it) __ksym;
+extern void bpf_iter_task_file_destroy(struct bpf_iter_task_file *it) __ksym;
+
#endif
diff --git a/tools/testing/selftests/bpf/prog_tests/iters.c b/tools/testing/selftests/bpf/prog_tests/iters.c
index 3cea71f9c500..cfe5b56cc027 100644
--- a/tools/testing/selftests/bpf/prog_tests/iters.c
+++ b/tools/testing/selftests/bpf/prog_tests/iters.c
@@ -1,6 +1,8 @@
// SPDX-License-Identifier: GPL-2.0
/* Copyright (c) 2023 Meta Platforms, Inc. and affiliates. */
+#define _GNU_SOURCE
+#include <sys/socket.h>
#include <sys/syscall.h>
#include <sys/mman.h>
#include <sys/wait.h>
@@ -16,11 +18,13 @@
#include "iters_num.skel.h"
#include "iters_testmod.skel.h"
#include "iters_testmod_seq.skel.h"
+#include "iters_task_file.skel.h"
#include "iters_task_vma.skel.h"
#include "iters_task.skel.h"
#include "iters_css_task.skel.h"
#include "iters_css.skel.h"
#include "iters_task_failure.skel.h"
+#include "iters_task_file_failure.skel.h"
static void subtest_num_iters(void)
{
@@ -291,6 +295,78 @@ static void subtest_css_iters(void)
iters_css__destroy(skel);
}
+static int task_file_test_process(void *args)
+{
+ int pipefd[2], sockfd, err = 0;
+
+ /* Create a clean file descriptor table for the test process */
+ close_range(0, ~0U, 0);
+
+ if (pipe(pipefd) < 0)
+ return 1;
+
+ sockfd = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
+ if (sockfd < 0) {
+ err = 2;
+ goto cleanup_pipe;
+ }
+
+ usleep(1);
+
+ close(sockfd);
+cleanup_pipe:
+ close(pipefd[0]);
+ close(pipefd[1]);
+ return err;
+}
+
+static void subtest_task_file_iters(void)
+{
+ const int stack_size = 1024 * 1024;
+ struct iters_task_file *skel;
+ int child_pid, wstatus, err;
+ char *stack;
+
+ skel = iters_task_file__open_and_load();
+ if (!ASSERT_OK_PTR(skel, "open_and_load"))
+ return;
+
+ if (!ASSERT_OK(skel->bss->err, "pre_test_err"))
+ goto cleanup_skel;
+
+ skel->bss->parent_pid = getpid();
+ skel->bss->count = 0;
+
+ err = iters_task_file__attach(skel);
+ if (!ASSERT_OK(err, "skel_attach"))
+ goto cleanup_skel;
+
+ stack = (char *)malloc(stack_size);
+ if (!ASSERT_OK_PTR(stack, "clone_stack"))
+ goto cleanup_attach;
+
+ /* Note that there is no CLONE_FILES */
+ child_pid = clone(task_file_test_process, stack + stack_size, CLONE_VM | SIGCHLD, NULL);
+ if (!ASSERT_GT(child_pid, -1, "child_pid"))
+ goto cleanup_stack;
+
+ if (!ASSERT_GT(waitpid(child_pid, &wstatus, 0), -1, "waitpid"))
+ goto cleanup_stack;
+
+ if (!ASSERT_OK(WEXITSTATUS(wstatus), "run_task_file_iters_test_err"))
+ goto cleanup_stack;
+
+ ASSERT_EQ(skel->bss->count, 1, "run_task_file_iters_test_count_err");
+ ASSERT_OK(skel->bss->err, "run_task_file_iters_test_failure");
+
+cleanup_stack:
+ free(stack);
+cleanup_attach:
+ iters_task_file__detach(skel);
+cleanup_skel:
+ iters_task_file__destroy(skel);
+}
+
void test_iters(void)
{
RUN_TESTS(iters_state_safety);
@@ -315,5 +391,8 @@ void test_iters(void)
subtest_css_task_iters();
if (test__start_subtest("css"))
subtest_css_iters();
+ if (test__start_subtest("task_file"))
+ subtest_task_file_iters();
RUN_TESTS(iters_task_failure);
+ RUN_TESTS(iters_task_file_failure);
}
diff --git a/tools/testing/selftests/bpf/progs/iters_task_file.c b/tools/testing/selftests/bpf/progs/iters_task_file.c
new file mode 100644
index 000000000000..47941530e51b
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/iters_task_file.c
@@ -0,0 +1,86 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include "vmlinux.h"
+#include <bpf/bpf_tracing.h>
+#include <bpf/bpf_helpers.h>
+#include "bpf_misc.h"
+#include "bpf_experimental.h"
+#include "task_kfunc_common.h"
+
+char _license[] SEC("license") = "GPL";
+
+int err, parent_pid, count;
+
+extern const void pipefifo_fops __ksym;
+extern const void socket_file_ops __ksym;
+
+SEC("fentry/" SYS_PREFIX "sys_nanosleep")
+int test_bpf_iter_task_file(void *ctx)
+{
+ struct bpf_iter_task_file task_file_it;
+ struct bpf_iter_task_file_item *item;
+ struct task_struct *task;
+
+ task = bpf_get_current_task_btf();
+ if (task->parent->pid != parent_pid)
+ return 0;
+
+ count++;
+
+ bpf_iter_task_file_new(&task_file_it, task);
+
+ item = bpf_iter_task_file_next(&task_file_it);
+ if (item == NULL) {
+ err = 1;
+ goto cleanup;
+ }
+
+ if (item->fd != 0) {
+ err = 2;
+ goto cleanup;
+ }
+
+ if (item->file->f_op != &pipefifo_fops) {
+ err = 3;
+ goto cleanup;
+ }
+
+ item = bpf_iter_task_file_next(&task_file_it);
+ if (item == NULL) {
+ err = 4;
+ goto cleanup;
+ }
+
+ if (item->fd != 1) {
+ err = 5;
+ goto cleanup;
+ }
+
+ if (item->file->f_op != &pipefifo_fops) {
+ err = 6;
+ goto cleanup;
+ }
+
+ item = bpf_iter_task_file_next(&task_file_it);
+ if (item == NULL) {
+ err = 7;
+ goto cleanup;
+ }
+
+ if (item->fd != 2) {
+ err = 8;
+ goto cleanup;
+ }
+
+ if (item->file->f_op != &socket_file_ops) {
+ err = 9;
+ goto cleanup;
+ }
+
+ item = bpf_iter_task_file_next(&task_file_it);
+ if (item != NULL)
+ err = 10;
+cleanup:
+ bpf_iter_task_file_destroy(&task_file_it);
+ return 0;
+}
diff --git a/tools/testing/selftests/bpf/progs/iters_task_file_failure.c b/tools/testing/selftests/bpf/progs/iters_task_file_failure.c
new file mode 100644
index 000000000000..66e522583850
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/iters_task_file_failure.c
@@ -0,0 +1,91 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include "vmlinux.h"
+#include <bpf/bpf_tracing.h>
+#include <bpf/bpf_helpers.h>
+#include "bpf_misc.h"
+#include "bpf_experimental.h"
+#include "task_kfunc_common.h"
+
+char _license[] SEC("license") = "GPL";
+
+SEC("syscall")
+__failure __msg("expected uninitialized iter_task_file as arg #1")
+int bpf_iter_task_file_new_inited_iter(void *ctx)
+{
+ struct bpf_iter_task_file task_file_it;
+ struct task_struct *task;
+
+ task = bpf_get_current_task_btf();
+
+ bpf_iter_task_file_new(&task_file_it, task);
+
+ bpf_iter_task_file_new(&task_file_it, task);
+
+ bpf_iter_task_file_destroy(&task_file_it);
+ return 0;
+}
+
+SEC("syscall")
+__failure __msg("Possibly NULL pointer passed to trusted arg1")
+int bpf_iter_task_file_new_null_task(void *ctx)
+{
+ struct bpf_iter_task_file task_file_it;
+ struct task_struct *task = NULL;
+
+ bpf_iter_task_file_new(&task_file_it, task);
+
+ bpf_iter_task_file_destroy(&task_file_it);
+ return 0;
+}
+
+SEC("syscall")
+__failure __msg("R2 must be referenced or trusted")
+int bpf_iter_task_file_new_untrusted_task(void *ctx)
+{
+ struct bpf_iter_task_file task_file_it;
+ struct task_struct *task;
+
+ task = bpf_get_current_task_btf()->parent;
+
+ bpf_iter_task_file_new(&task_file_it, task);
+
+ bpf_iter_task_file_destroy(&task_file_it);
+ return 0;
+}
+
+SEC("syscall")
+__failure __msg("Unreleased reference")
+int bpf_iter_task_file_no_destroy(void *ctx)
+{
+ struct bpf_iter_task_file task_file_it;
+ struct task_struct *task;
+
+ task = bpf_get_current_task_btf();
+
+ bpf_iter_task_file_new(&task_file_it, task);
+
+ return 0;
+}
+
+SEC("syscall")
+__failure __msg("expected an initialized iter_task_file as arg #1")
+int bpf_iter_task_file_next_uninit_iter(void *ctx)
+{
+ struct bpf_iter_task_file task_file_it;
+
+ bpf_iter_task_file_next(&task_file_it);
+
+ return 0;
+}
+
+SEC("syscall")
+__failure __msg("expected an initialized iter_task_file as arg #1")
+int bpf_iter_task_file_destroy_uninit_iter(void *ctx)
+{
+ struct bpf_iter_task_file task_file_it;
+
+ bpf_iter_task_file_destroy(&task_file_it);
+
+ return 0;
+}
--
2.39.5
* Re: [PATCH bpf-next v6 2/5] selftests/bpf: Add tests for open-coded style process file iterator
2024-12-17 23:37 ` [PATCH bpf-next v6 2/5] selftests/bpf: Add tests for open-coded style process file iterator Juntong Deng
@ 2024-12-19 16:35 ` Yonghong Song
2024-12-24 0:43 ` Juntong Deng
0 siblings, 1 reply; 21+ messages in thread
From: Yonghong Song @ 2024-12-19 16:35 UTC (permalink / raw)
To: Juntong Deng, ast, daniel, john.fastabend, andrii, martin.lau,
eddyz87, song, kpsingh, sdf, haoluo, jolsa, memxor, snorcht,
brauner
Cc: bpf, linux-kernel, linux-fsdevel
On 12/17/24 3:37 PM, Juntong Deng wrote:
> This patch adds test cases for open-coded style process file iterator.
>
> Test cases related to process files are run in the newly created child
> process. Close all opened files inherited from the parent process in
> the child process to avoid the files opened by the parent process
> affecting the test results.
>
> In addition, this patch adds failure test cases where bpf programs
> cannot pass the verifier due to uninitialized or untrusted
> arguments, etc.
>
> Signed-off-by: Juntong Deng <juntong.deng@outlook.com>
> ---
> .../testing/selftests/bpf/bpf_experimental.h | 7 ++
> .../testing/selftests/bpf/prog_tests/iters.c | 79 ++++++++++++++++
> .../selftests/bpf/progs/iters_task_file.c | 86 ++++++++++++++++++
> .../bpf/progs/iters_task_file_failure.c | 91 +++++++++++++++++++
> 4 files changed, 263 insertions(+)
> create mode 100644 tools/testing/selftests/bpf/progs/iters_task_file.c
> create mode 100644 tools/testing/selftests/bpf/progs/iters_task_file_failure.c
>
> diff --git a/tools/testing/selftests/bpf/bpf_experimental.h b/tools/testing/selftests/bpf/bpf_experimental.h
> index cd8ecd39c3f3..ce1520c56b55 100644
> --- a/tools/testing/selftests/bpf/bpf_experimental.h
> +++ b/tools/testing/selftests/bpf/bpf_experimental.h
> @@ -588,4 +588,11 @@ extern int bpf_iter_kmem_cache_new(struct bpf_iter_kmem_cache *it) __weak __ksym
> extern struct kmem_cache *bpf_iter_kmem_cache_next(struct bpf_iter_kmem_cache *it) __weak __ksym;
> extern void bpf_iter_kmem_cache_destroy(struct bpf_iter_kmem_cache *it) __weak __ksym;
>
> +struct bpf_iter_task_file;
> +struct bpf_iter_task_file_item;
> +extern int bpf_iter_task_file_new(struct bpf_iter_task_file *it, struct task_struct *task) __ksym;
> +extern struct bpf_iter_task_file_item *
> +bpf_iter_task_file_next(struct bpf_iter_task_file *it) __ksym;
> +extern void bpf_iter_task_file_destroy(struct bpf_iter_task_file *it) __ksym;
All the above declarations should already be in vmlinux.h, and your
bpf prog below already includes vmlinux.h, so there is no need to
duplicate them here.
> +
> #endif
> diff --git a/tools/testing/selftests/bpf/prog_tests/iters.c b/tools/testing/selftests/bpf/prog_tests/iters.c
> index 3cea71f9c500..cfe5b56cc027 100644
> --- a/tools/testing/selftests/bpf/prog_tests/iters.c
> +++ b/tools/testing/selftests/bpf/prog_tests/iters.c
> @@ -1,6 +1,8 @@
> // SPDX-License-Identifier: GPL-2.0
> /* Copyright (c) 2023 Meta Platforms, Inc. and affiliates. */
>
> +#define _GNU_SOURCE
> +#include <sys/socket.h>
> #include <sys/syscall.h>
> #include <sys/mman.h>
> #include <sys/wait.h>
> @@ -16,11 +18,13 @@
> #include "iters_num.skel.h"
> #include "iters_testmod.skel.h"
> #include "iters_testmod_seq.skel.h"
> +#include "iters_task_file.skel.h"
> #include "iters_task_vma.skel.h"
> #include "iters_task.skel.h"
> #include "iters_css_task.skel.h"
> #include "iters_css.skel.h"
> #include "iters_task_failure.skel.h"
> +#include "iters_task_file_failure.skel.h"
>
> static void subtest_num_iters(void)
> {
> @@ -291,6 +295,78 @@ static void subtest_css_iters(void)
> iters_css__destroy(skel);
> }
>
> +static int task_file_test_process(void *args)
> +{
> + int pipefd[2], sockfd, err = 0;
> +
> + /* Create a clean file descriptor table for the test process */
> + close_range(0, ~0U, 0);
> +
> + if (pipe(pipefd) < 0)
> + return 1;
> +
> + sockfd = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
> + if (sockfd < 0) {
> + err = 2;
> + goto cleanup_pipe;
> + }
> +
> + usleep(1);
> +
> + close(sockfd);
> +cleanup_pipe:
> + close(pipefd[0]);
> + close(pipefd[1]);
> + return err;
> +}
> +
> +static void subtest_task_file_iters(void)
> +{
> + const int stack_size = 1024 * 1024;
> + struct iters_task_file *skel;
> + int child_pid, wstatus, err;
> + char *stack;
> +
> + skel = iters_task_file__open_and_load();
> + if (!ASSERT_OK_PTR(skel, "open_and_load"))
> + return;
> +
> + if (!ASSERT_OK(skel->bss->err, "pre_test_err"))
> + goto cleanup_skel;
> +
> + skel->bss->parent_pid = getpid();
> + skel->bss->count = 0;
> +
> + err = iters_task_file__attach(skel);
> + if (!ASSERT_OK(err, "skel_attach"))
> + goto cleanup_skel;
> +
> + stack = (char *)malloc(stack_size);
> + if (!ASSERT_OK_PTR(stack, "clone_stack"))
> + goto cleanup_attach;
> +
> + /* Note that there is no CLONE_FILES */
> + child_pid = clone(task_file_test_process, stack + stack_size, CLONE_VM | SIGCHLD, NULL);
> + if (!ASSERT_GT(child_pid, -1, "child_pid"))
> + goto cleanup_stack;
> +
> + if (!ASSERT_GT(waitpid(child_pid, &wstatus, 0), -1, "waitpid"))
> + goto cleanup_stack;
> +
> + if (!ASSERT_OK(WEXITSTATUS(wstatus), "run_task_file_iters_test_err"))
> + goto cleanup_stack;
> +
> + ASSERT_EQ(skel->bss->count, 1, "run_task_file_iters_test_count_err");
> + ASSERT_OK(skel->bss->err, "run_task_file_iters_test_failure");
> +
> +cleanup_stack:
> + free(stack);
> +cleanup_attach:
> + iters_task_file__detach(skel);
> +cleanup_skel:
> + iters_task_file__destroy(skel);
> +}
> +
> void test_iters(void)
> {
> RUN_TESTS(iters_state_safety);
> @@ -315,5 +391,8 @@ void test_iters(void)
> subtest_css_task_iters();
> if (test__start_subtest("css"))
> subtest_css_iters();
> + if (test__start_subtest("task_file"))
> + subtest_task_file_iters();
> RUN_TESTS(iters_task_failure);
> + RUN_TESTS(iters_task_file_failure);
> }
> diff --git a/tools/testing/selftests/bpf/progs/iters_task_file.c b/tools/testing/selftests/bpf/progs/iters_task_file.c
> new file mode 100644
> index 000000000000..47941530e51b
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/progs/iters_task_file.c
> @@ -0,0 +1,86 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include "vmlinux.h"
> +#include <bpf/bpf_tracing.h>
> +#include <bpf/bpf_helpers.h>
> +#include "bpf_misc.h"
> +#include "bpf_experimental.h"
> +#include "task_kfunc_common.h"
> +
> +char _license[] SEC("license") = "GPL";
> +
> +int err, parent_pid, count;
> +
> +extern const void pipefifo_fops __ksym;
> +extern const void socket_file_ops __ksym;
There is no need to have 'const' in the above two extern declarations.
> +
> +SEC("fentry/" SYS_PREFIX "sys_nanosleep")
> +int test_bpf_iter_task_file(void *ctx)
> +{
> + struct bpf_iter_task_file task_file_it;
> + struct bpf_iter_task_file_item *item;
> + struct task_struct *task;
> +
> + task = bpf_get_current_task_btf();
> + if (task->parent->pid != parent_pid)
> + return 0;
> +
> + count++;
> +
> + bpf_iter_task_file_new(&task_file_it, task);
> +
> + item = bpf_iter_task_file_next(&task_file_it);
> + if (item == NULL) {
> + err = 1;
> + goto cleanup;
> + }
> +
> + if (item->fd != 0) {
> + err = 2;
> + goto cleanup;
> + }
> +
> + if (item->file->f_op != &pipefifo_fops) {
> + err = 3;
> + goto cleanup;
> + }
> +
> + item = bpf_iter_task_file_next(&task_file_it);
> + if (item == NULL) {
> + err = 4;
> + goto cleanup;
> + }
> +
> + if (item->fd != 1) {
> + err = 5;
> + goto cleanup;
> + }
> +
> + if (item->file->f_op != &pipefifo_fops) {
> + err = 6;
> + goto cleanup;
> + }
> +
> + item = bpf_iter_task_file_next(&task_file_it);
> + if (item == NULL) {
> + err = 7;
> + goto cleanup;
> + }
> +
> + if (item->fd != 2) {
> + err = 8;
> + goto cleanup;
> + }
> +
> + if (item->file->f_op != &socket_file_ops) {
> + err = 9;
> + goto cleanup;
> + }
> +
> + item = bpf_iter_task_file_next(&task_file_it);
> + if (item != NULL)
> + err = 10;
> +cleanup:
> + bpf_iter_task_file_destroy(&task_file_it);
> + return 0;
> +}
[...]
* Re: [PATCH bpf-next v6 2/5] selftests/bpf: Add tests for open-coded style process file iterator
2024-12-19 16:35 ` Yonghong Song
@ 2024-12-24 0:43 ` Juntong Deng
0 siblings, 0 replies; 21+ messages in thread
From: Juntong Deng @ 2024-12-24 0:43 UTC (permalink / raw)
To: Yonghong Song, ast, daniel, john.fastabend, andrii, martin.lau,
eddyz87, song, kpsingh, sdf, haoluo, jolsa, memxor, snorcht,
brauner
Cc: bpf, linux-kernel, linux-fsdevel
On 2024/12/19 16:35, Yonghong Song wrote:
>
>
>
> On 12/17/24 3:37 PM, Juntong Deng wrote:
>> This patch adds test cases for open-coded style process file iterator.
>>
>> Test cases related to process files are run in the newly created child
>> process. Close all opened files inherited from the parent process in
>> the child process to avoid the files opened by the parent process
>> affecting the test results.
>>
>> In addition, this patch adds failure test cases where bpf programs
>> cannot pass the verifier due to uninitialized or untrusted
>> arguments, etc.
>>
>> Signed-off-by: Juntong Deng <juntong.deng@outlook.com>
>> ---
>> .../testing/selftests/bpf/bpf_experimental.h | 7 ++
>> .../testing/selftests/bpf/prog_tests/iters.c | 79 ++++++++++++++++
>> .../selftests/bpf/progs/iters_task_file.c | 86 ++++++++++++++++++
>> .../bpf/progs/iters_task_file_failure.c | 91 +++++++++++++++++++
>> 4 files changed, 263 insertions(+)
>> create mode 100644 tools/testing/selftests/bpf/progs/iters_task_file.c
>> create mode 100644 tools/testing/selftests/bpf/progs/
>> iters_task_file_failure.c
>>
>> diff --git a/tools/testing/selftests/bpf/bpf_experimental.h b/tools/
>> testing/selftests/bpf/bpf_experimental.h
>> index cd8ecd39c3f3..ce1520c56b55 100644
>> --- a/tools/testing/selftests/bpf/bpf_experimental.h
>> +++ b/tools/testing/selftests/bpf/bpf_experimental.h
>> @@ -588,4 +588,11 @@ extern int bpf_iter_kmem_cache_new(struct
>> bpf_iter_kmem_cache *it) __weak __ksym
>> extern struct kmem_cache *bpf_iter_kmem_cache_next(struct
>> bpf_iter_kmem_cache *it) __weak __ksym;
>> extern void bpf_iter_kmem_cache_destroy(struct bpf_iter_kmem_cache
>> *it) __weak __ksym;
>> +struct bpf_iter_task_file;
>> +struct bpf_iter_task_file_item;
>> +extern int bpf_iter_task_file_new(struct bpf_iter_task_file *it,
>> struct task_struct *task) __ksym;
>> +extern struct bpf_iter_task_file_item *
>> +bpf_iter_task_file_next(struct bpf_iter_task_file *it) __ksym;
>> +extern void bpf_iter_task_file_destroy(struct bpf_iter_task_file *it)
>> __ksym;
>
> All the above declarations should be in vmlinux.h already and I see your
> below bpf prog already
> included vmlinux.h, there is no need to put them here.
>
>> +
>> #endif
>> diff --git a/tools/testing/selftests/bpf/prog_tests/iters.c b/tools/
>> testing/selftests/bpf/prog_tests/iters.c
>> index 3cea71f9c500..cfe5b56cc027 100644
>> --- a/tools/testing/selftests/bpf/prog_tests/iters.c
>> +++ b/tools/testing/selftests/bpf/prog_tests/iters.c
>> @@ -1,6 +1,8 @@
>> // SPDX-License-Identifier: GPL-2.0
>> /* Copyright (c) 2023 Meta Platforms, Inc. and affiliates. */
>> +#define _GNU_SOURCE
>> +#include <sys/socket.h>
>> #include <sys/syscall.h>
>> #include <sys/mman.h>
>> #include <sys/wait.h>
>> @@ -16,11 +18,13 @@
>> #include "iters_num.skel.h"
>> #include "iters_testmod.skel.h"
>> #include "iters_testmod_seq.skel.h"
>> +#include "iters_task_file.skel.h"
>> #include "iters_task_vma.skel.h"
>> #include "iters_task.skel.h"
>> #include "iters_css_task.skel.h"
>> #include "iters_css.skel.h"
>> #include "iters_task_failure.skel.h"
>> +#include "iters_task_file_failure.skel.h"
>> static void subtest_num_iters(void)
>> {
>> @@ -291,6 +295,78 @@ static void subtest_css_iters(void)
>> iters_css__destroy(skel);
>> }
>> +static int task_file_test_process(void *args)
>> +{
>> + int pipefd[2], sockfd, err = 0;
>> +
>> + /* Create a clean file descriptor table for the test process */
>> + close_range(0, ~0U, 0);
>> +
>> + if (pipe(pipefd) < 0)
>> + return 1;
>> +
>> + sockfd = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
>> + if (sockfd < 0) {
>> + err = 2;
>> + goto cleanup_pipe;
>> + }
>> +
>> + usleep(1);
>> +
>> + close(sockfd);
>> +cleanup_pipe:
>> + close(pipefd[0]);
>> + close(pipefd[1]);
>> + return err;
>> +}
>> +
>> +static void subtest_task_file_iters(void)
>> +{
>> + const int stack_size = 1024 * 1024;
>> + struct iters_task_file *skel;
>> + int child_pid, wstatus, err;
>> + char *stack;
>> +
>> + skel = iters_task_file__open_and_load();
>> + if (!ASSERT_OK_PTR(skel, "open_and_load"))
>> + return;
>> +
>> + if (!ASSERT_OK(skel->bss->err, "pre_test_err"))
>> + goto cleanup_skel;
>> +
>> + skel->bss->parent_pid = getpid();
>> + skel->bss->count = 0;
>> +
>> + err = iters_task_file__attach(skel);
>> + if (!ASSERT_OK(err, "skel_attach"))
>> + goto cleanup_skel;
>> +
>> + stack = (char *)malloc(stack_size);
>> + if (!ASSERT_OK_PTR(stack, "clone_stack"))
>> + goto cleanup_attach;
>> +
>> + /* Note that there is no CLONE_FILES */
>> + child_pid = clone(task_file_test_process, stack + stack_size, CLONE_VM | SIGCHLD, NULL);
>> + if (!ASSERT_GT(child_pid, -1, "child_pid"))
>> + goto cleanup_stack;
>> +
>> + if (!ASSERT_GT(waitpid(child_pid, &wstatus, 0), -1, "waitpid"))
>> + goto cleanup_stack;
>> +
>> + if (!ASSERT_OK(WEXITSTATUS(wstatus), "run_task_file_iters_test_err"))
>> + goto cleanup_stack;
>> +
>> + ASSERT_EQ(skel->bss->count, 1, "run_task_file_iters_test_count_err");
>> + ASSERT_OK(skel->bss->err, "run_task_file_iters_test_failure");
>> +
>> +cleanup_stack:
>> + free(stack);
>> +cleanup_attach:
>> + iters_task_file__detach(skel);
>> +cleanup_skel:
>> + iters_task_file__destroy(skel);
>> +}
>> +
>> void test_iters(void)
>> {
>> RUN_TESTS(iters_state_safety);
>> @@ -315,5 +391,8 @@ void test_iters(void)
>> subtest_css_task_iters();
>> if (test__start_subtest("css"))
>> subtest_css_iters();
>> + if (test__start_subtest("task_file"))
>> + subtest_task_file_iters();
>> RUN_TESTS(iters_task_failure);
>> + RUN_TESTS(iters_task_file_failure);
>> }
>> diff --git a/tools/testing/selftests/bpf/progs/iters_task_file.c b/tools/testing/selftests/bpf/progs/iters_task_file.c
>> new file mode 100644
>> index 000000000000..47941530e51b
>> --- /dev/null
>> +++ b/tools/testing/selftests/bpf/progs/iters_task_file.c
>> @@ -0,0 +1,86 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +
>> +#include "vmlinux.h"
>> +#include <bpf/bpf_tracing.h>
>> +#include <bpf/bpf_helpers.h>
>> +#include "bpf_misc.h"
>> +#include "bpf_experimental.h"
>> +#include "task_kfunc_common.h"
>> +
>> +char _license[] SEC("license") = "GPL";
>> +
>> +int err, parent_pid, count;
>> +
>> +extern const void pipefifo_fops __ksym;
>> +extern const void socket_file_ops __ksym;
>
> There is no need to have 'const' in the above two extern declarations.
>
Thanks for all your feedback.
I will fix all the problems you pointed out in the next version.
>> +
>> +SEC("fentry/" SYS_PREFIX "sys_nanosleep")
>> +int test_bpf_iter_task_file(void *ctx)
>> +{
>> + struct bpf_iter_task_file task_file_it;
>> + struct bpf_iter_task_file_item *item;
>> + struct task_struct *task;
>> +
>> + task = bpf_get_current_task_btf();
>> + if (task->parent->pid != parent_pid)
>> + return 0;
>> +
>> + count++;
>> +
>> + bpf_iter_task_file_new(&task_file_it, task);
>> +
>> + item = bpf_iter_task_file_next(&task_file_it);
>> + if (item == NULL) {
>> + err = 1;
>> + goto cleanup;
>> + }
>> +
>> + if (item->fd != 0) {
>> + err = 2;
>> + goto cleanup;
>> + }
>> +
>> + if (item->file->f_op != &pipefifo_fops) {
>> + err = 3;
>> + goto cleanup;
>> + }
>> +
>> + item = bpf_iter_task_file_next(&task_file_it);
>> + if (item == NULL) {
>> + err = 4;
>> + goto cleanup;
>> + }
>> +
>> + if (item->fd != 1) {
>> + err = 5;
>> + goto cleanup;
>> + }
>> +
>> + if (item->file->f_op != &pipefifo_fops) {
>> + err = 6;
>> + goto cleanup;
>> + }
>> +
>> + item = bpf_iter_task_file_next(&task_file_it);
>> + if (item == NULL) {
>> + err = 7;
>> + goto cleanup;
>> + }
>> +
>> + if (item->fd != 2) {
>> + err = 8;
>> + goto cleanup;
>> + }
>> +
>> + if (item->file->f_op != &socket_file_ops) {
>> + err = 9;
>> + goto cleanup;
>> + }
>> +
>> + item = bpf_iter_task_file_next(&task_file_it);
>> + if (item != NULL)
>> + err = 10;
>> +cleanup:
>> + bpf_iter_task_file_destroy(&task_file_it);
>> + return 0;
>> +}
>
> [...]
>
^ permalink raw reply [flat|nested] 21+ messages in thread
* [PATCH bpf-next v6 3/5] bpf: Add bpf_fget_task() kfunc
2024-12-17 23:34 [PATCH bpf-next v6 0/5] bpf: Add open-coded style process file iterator and bpf_fget_task() kfunc Juntong Deng
2024-12-17 23:37 ` [PATCH bpf-next v6 1/5] bpf: Introduce task_file open-coded iterator kfuncs Juntong Deng
2024-12-17 23:37 ` [PATCH bpf-next v6 2/5] selftests/bpf: Add tests for open-coded style process file iterator Juntong Deng
@ 2024-12-17 23:37 ` Juntong Deng
2024-12-17 23:37 ` [PATCH bpf-next v6 4/5] bpf: Make fs kfuncs available for SYSCALL program type Juntong Deng
` (2 subsequent siblings)
5 siblings, 0 replies; 21+ messages in thread
From: Juntong Deng @ 2024-12-17 23:37 UTC (permalink / raw)
To: ast, daniel, john.fastabend, andrii, martin.lau, eddyz87, song,
yonghong.song, kpsingh, sdf, haoluo, jolsa, memxor, snorcht,
brauner
Cc: bpf, linux-kernel, linux-fsdevel
This patch adds the bpf_fget_task() kfunc.
bpf_fget_task() returns a pointer to the struct file corresponding to
a file descriptor in the specified task. Note that this function
acquires a reference to the struct file.
Signed-off-by: Juntong Deng <juntong.deng@outlook.com>
---
fs/bpf_fs_kfuncs.c | 18 ++++++++++++++++++
1 file changed, 18 insertions(+)
diff --git a/fs/bpf_fs_kfuncs.c b/fs/bpf_fs_kfuncs.c
index 3fe9f59ef867..4a810046dcf3 100644
--- a/fs/bpf_fs_kfuncs.c
+++ b/fs/bpf_fs_kfuncs.c
@@ -152,6 +152,23 @@ __bpf_kfunc int bpf_get_file_xattr(struct file *file, const char *name__str,
return bpf_get_dentry_xattr(dentry, name__str, value_p);
}
+/**
+ * bpf_fget_task() - Get a pointer to the struct file corresponding to
+ * the task file descriptor
+ *
+ * Note that this function acquires a reference to struct file.
+ *
+ * @task: the specified struct task_struct
+ * @fd: the file descriptor
+ *
+ * @returns the corresponding struct file pointer if found,
+ * otherwise returns NULL
+ */
+__bpf_kfunc struct file *bpf_fget_task(struct task_struct *task, unsigned int fd)
+{
+ return fget_task(task, fd);
+}
+
__bpf_kfunc_end_defs();
BTF_KFUNCS_START(bpf_fs_kfunc_set_ids)
@@ -161,6 +178,7 @@ BTF_ID_FLAGS(func, bpf_put_file, KF_RELEASE)
BTF_ID_FLAGS(func, bpf_path_d_path, KF_TRUSTED_ARGS)
BTF_ID_FLAGS(func, bpf_get_dentry_xattr, KF_SLEEPABLE | KF_TRUSTED_ARGS)
BTF_ID_FLAGS(func, bpf_get_file_xattr, KF_SLEEPABLE | KF_TRUSTED_ARGS)
+BTF_ID_FLAGS(func, bpf_fget_task, KF_ACQUIRE | KF_TRUSTED_ARGS | KF_RET_NULL)
BTF_KFUNCS_END(bpf_fs_kfunc_set_ids)
static int bpf_fs_kfuncs_filter(const struct bpf_prog *prog, u32 kfunc_id)
--
2.39.5
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [PATCH bpf-next v6 4/5] bpf: Make fs kfuncs available for SYSCALL program type
2024-12-17 23:34 [PATCH bpf-next v6 0/5] bpf: Add open-coded style process file iterator and bpf_fget_task() kfunc Juntong Deng
` (2 preceding siblings ...)
2024-12-17 23:37 ` [PATCH bpf-next v6 3/5] bpf: Add bpf_fget_task() kfunc Juntong Deng
@ 2024-12-17 23:37 ` Juntong Deng
2024-12-19 16:41 ` Alexei Starovoitov
2024-12-17 23:37 ` [PATCH bpf-next v6 5/5] selftests/bpf: Add tests for bpf_fget_task() kfunc Juntong Deng
2024-12-19 16:11 ` [PATCH bpf-next v6 0/5] bpf: Add open-coded style process file iterator and " Yonghong Song
5 siblings, 1 reply; 21+ messages in thread
From: Juntong Deng @ 2024-12-17 23:37 UTC (permalink / raw)
To: ast, daniel, john.fastabend, andrii, martin.lau, eddyz87, song,
yonghong.song, kpsingh, sdf, haoluo, jolsa, memxor, snorcht,
brauner
Cc: bpf, linux-kernel, linux-fsdevel
Currently the fs kfuncs are only available to the LSM program type, but
they are generic and useful in scenarios other than LSM.
This patch makes the fs kfuncs available to the SYSCALL program type.
Signed-off-by: Juntong Deng <juntong.deng@outlook.com>
---
fs/bpf_fs_kfuncs.c | 20 ++++---------------
.../selftests/bpf/progs/verifier_vfs_reject.c | 10 ----------
2 files changed, 4 insertions(+), 26 deletions(-)
diff --git a/fs/bpf_fs_kfuncs.c b/fs/bpf_fs_kfuncs.c
index 4a810046dcf3..6010fccc9db8 100644
--- a/fs/bpf_fs_kfuncs.c
+++ b/fs/bpf_fs_kfuncs.c
@@ -26,8 +26,6 @@ __bpf_kfunc_start_defs();
* acquired by this BPF kfunc will result in the BPF program being rejected by
* the BPF verifier.
*
- * This BPF kfunc may only be called from BPF LSM programs.
- *
* Internally, this BPF kfunc leans on get_task_exe_file(), such that calling
* bpf_get_task_exe_file() would be analogous to calling get_task_exe_file()
* directly in kernel context.
@@ -49,8 +47,6 @@ __bpf_kfunc struct file *bpf_get_task_exe_file(struct task_struct *task)
* passed to this BPF kfunc. Attempting to pass an unreferenced file pointer, or
* any other arbitrary pointer for that matter, will result in the BPF program
* being rejected by the BPF verifier.
- *
- * This BPF kfunc may only be called from BPF LSM programs.
*/
__bpf_kfunc void bpf_put_file(struct file *file)
{
@@ -70,8 +66,6 @@ __bpf_kfunc void bpf_put_file(struct file *file)
* reference, or else the BPF program will be outright rejected by the BPF
* verifier.
*
- * This BPF kfunc may only be called from BPF LSM programs.
- *
* Return: A positive integer corresponding to the length of the resolved
* pathname in *buf*, including the NUL termination character. On error, a
* negative integer is returned.
@@ -181,23 +175,17 @@ BTF_ID_FLAGS(func, bpf_get_file_xattr, KF_SLEEPABLE | KF_TRUSTED_ARGS)
BTF_ID_FLAGS(func, bpf_fget_task, KF_ACQUIRE | KF_TRUSTED_ARGS | KF_RET_NULL)
BTF_KFUNCS_END(bpf_fs_kfunc_set_ids)
-static int bpf_fs_kfuncs_filter(const struct bpf_prog *prog, u32 kfunc_id)
-{
- if (!btf_id_set8_contains(&bpf_fs_kfunc_set_ids, kfunc_id) ||
- prog->type == BPF_PROG_TYPE_LSM)
- return 0;
- return -EACCES;
-}
-
static const struct btf_kfunc_id_set bpf_fs_kfunc_set = {
.owner = THIS_MODULE,
.set = &bpf_fs_kfunc_set_ids,
- .filter = bpf_fs_kfuncs_filter,
};
static int __init bpf_fs_kfuncs_init(void)
{
- return register_btf_kfunc_id_set(BPF_PROG_TYPE_LSM, &bpf_fs_kfunc_set);
+ int ret;
+
+ ret = register_btf_kfunc_id_set(BPF_PROG_TYPE_LSM, &bpf_fs_kfunc_set);
+ return ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_SYSCALL, &bpf_fs_kfunc_set);
}
late_initcall(bpf_fs_kfuncs_init);
diff --git a/tools/testing/selftests/bpf/progs/verifier_vfs_reject.c b/tools/testing/selftests/bpf/progs/verifier_vfs_reject.c
index d6d3f4fcb24c..5aab75fd2fa5 100644
--- a/tools/testing/selftests/bpf/progs/verifier_vfs_reject.c
+++ b/tools/testing/selftests/bpf/progs/verifier_vfs_reject.c
@@ -148,14 +148,4 @@ int BPF_PROG(path_d_path_kfunc_invalid_buf_sz, struct file *file)
return 0;
}
-SEC("fentry/vfs_open")
-__failure __msg("calling kernel function bpf_path_d_path is not allowed")
-int BPF_PROG(path_d_path_kfunc_non_lsm, struct path *path, struct file *f)
-{
- /* Calling bpf_path_d_path() from a non-LSM BPF program isn't permitted.
- */
- bpf_path_d_path(path, buf, sizeof(buf));
- return 0;
-}
-
char _license[] SEC("license") = "GPL";
--
2.39.5
^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: [PATCH bpf-next v6 4/5] bpf: Make fs kfuncs available for SYSCALL program type
2024-12-17 23:37 ` [PATCH bpf-next v6 4/5] bpf: Make fs kfuncs available for SYSCALL program type Juntong Deng
@ 2024-12-19 16:41 ` Alexei Starovoitov
2024-12-24 0:51 ` Juntong Deng
0 siblings, 1 reply; 21+ messages in thread
From: Alexei Starovoitov @ 2024-12-19 16:41 UTC (permalink / raw)
To: Juntong Deng
Cc: Alexei Starovoitov, Daniel Borkmann, John Fastabend,
Andrii Nakryiko, Martin KaFai Lau, Eddy Z, Song Liu,
Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
Kumar Kartikeya Dwivedi, snorcht, Christian Brauner, bpf, LKML,
Linux-Fsdevel
On Tue, Dec 17, 2024 at 3:45 PM Juntong Deng <juntong.deng@outlook.com> wrote:
>
> -static int bpf_fs_kfuncs_filter(const struct bpf_prog *prog, u32 kfunc_id)
> -{
> - if (!btf_id_set8_contains(&bpf_fs_kfunc_set_ids, kfunc_id) ||
> - prog->type == BPF_PROG_TYPE_LSM)
> - return 0;
> - return -EACCES;
> -}
> -
> static const struct btf_kfunc_id_set bpf_fs_kfunc_set = {
> .owner = THIS_MODULE,
> .set = &bpf_fs_kfunc_set_ids,
> - .filter = bpf_fs_kfuncs_filter,
> };
>
> static int __init bpf_fs_kfuncs_init(void)
> {
> - return register_btf_kfunc_id_set(BPF_PROG_TYPE_LSM, &bpf_fs_kfunc_set);
> + int ret;
> +
> + ret = register_btf_kfunc_id_set(BPF_PROG_TYPE_LSM, &bpf_fs_kfunc_set);
> + return ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_SYSCALL, &bpf_fs_kfunc_set);
> }
>
> late_initcall(bpf_fs_kfuncs_init);
> diff --git a/tools/testing/selftests/bpf/progs/verifier_vfs_reject.c b/tools/testing/selftests/bpf/progs/verifier_vfs_reject.c
> index d6d3f4fcb24c..5aab75fd2fa5 100644
> --- a/tools/testing/selftests/bpf/progs/verifier_vfs_reject.c
> +++ b/tools/testing/selftests/bpf/progs/verifier_vfs_reject.c
> @@ -148,14 +148,4 @@ int BPF_PROG(path_d_path_kfunc_invalid_buf_sz, struct file *file)
> return 0;
> }
>
> -SEC("fentry/vfs_open")
> -__failure __msg("calling kernel function bpf_path_d_path is not allowed")
This is incorrect.
You have to keep bpf_fs_kfuncs_filter() and prog->type == BPF_PROG_TYPE_LSM
check because bpf_prog_type_to_kfunc_hook() aliases LSM and fentry
into BTF_KFUNC_HOOK_TRACING category. It's been an annoying quirk.
We're figuring out details for significant refactoring of
register_btf_kfunc_id_set() and the whole registration process.
Maybe you would be interested in working on it?
The main goal is to get rid of run-time mask check in SCX_CALL_OP() and
make it static by the verifier. To make that happen scx_kf_mask flags
would need to become KF_* flags while each struct-ops callback will
specify the expected mask.
Then at struct-ops prog attach time the verifier will see the expected mask
and can check that all kfuncs calls of this particular program
satisfy the mask. Then all of the runtime overhead of
current->scx.kf_mask and scx_kf_allowed() will go away.
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH bpf-next v6 4/5] bpf: Make fs kfuncs available for SYSCALL program type
2024-12-19 16:41 ` Alexei Starovoitov
@ 2024-12-24 0:51 ` Juntong Deng
2024-12-24 12:10 ` Juntong Deng
2025-01-09 19:23 ` per st_ops kfunc allow/deny mask. Was: " Alexei Starovoitov
0 siblings, 2 replies; 21+ messages in thread
From: Juntong Deng @ 2024-12-24 0:51 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: Alexei Starovoitov, Daniel Borkmann, John Fastabend,
Andrii Nakryiko, Martin KaFai Lau, Eddy Z, Song Liu,
Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
Kumar Kartikeya Dwivedi, snorcht, Christian Brauner, bpf, LKML,
Linux-Fsdevel
On 2024/12/19 16:41, Alexei Starovoitov wrote:
> On Tue, Dec 17, 2024 at 3:45 PM Juntong Deng <juntong.deng@outlook.com> wrote:
>>
>> -static int bpf_fs_kfuncs_filter(const struct bpf_prog *prog, u32 kfunc_id)
>> -{
>> - if (!btf_id_set8_contains(&bpf_fs_kfunc_set_ids, kfunc_id) ||
>> - prog->type == BPF_PROG_TYPE_LSM)
>> - return 0;
>> - return -EACCES;
>> -}
>> -
>> static const struct btf_kfunc_id_set bpf_fs_kfunc_set = {
>> .owner = THIS_MODULE,
>> .set = &bpf_fs_kfunc_set_ids,
>> - .filter = bpf_fs_kfuncs_filter,
>> };
>>
>> static int __init bpf_fs_kfuncs_init(void)
>> {
>> - return register_btf_kfunc_id_set(BPF_PROG_TYPE_LSM, &bpf_fs_kfunc_set);
>> + int ret;
>> +
>> + ret = register_btf_kfunc_id_set(BPF_PROG_TYPE_LSM, &bpf_fs_kfunc_set);
>> + return ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_SYSCALL, &bpf_fs_kfunc_set);
>> }
>>
>> late_initcall(bpf_fs_kfuncs_init);
>> diff --git a/tools/testing/selftests/bpf/progs/verifier_vfs_reject.c b/tools/testing/selftests/bpf/progs/verifier_vfs_reject.c
>> index d6d3f4fcb24c..5aab75fd2fa5 100644
>> --- a/tools/testing/selftests/bpf/progs/verifier_vfs_reject.c
>> +++ b/tools/testing/selftests/bpf/progs/verifier_vfs_reject.c
>> @@ -148,14 +148,4 @@ int BPF_PROG(path_d_path_kfunc_invalid_buf_sz, struct file *file)
>> return 0;
>> }
>>
>> -SEC("fentry/vfs_open")
>> -__failure __msg("calling kernel function bpf_path_d_path is not allowed")
>
> This is incorrect.
> You have to keep bpf_fs_kfuncs_filter() and prog->type == BPF_PROG_TYPE_LSM
> check because bpf_prog_type_to_kfunc_hook() aliases LSM and fentry
> into BTF_KFUNC_HOOK_TRACING category. It's been an annoying quirk.
> We're figuring out details for significant refactoring of
> register_btf_kfunc_id_set() and the whole registration process.
>
> Maybe you would be interested in working on it?
>
> The main goal is to get rid of run-time mask check in SCX_CALL_OP() and
> make it static by the verifier. To make that happen scx_kf_mask flags
> would need to become KF_* flags while each struct-ops callback will
> specify the expected mask.
> Then at struct-ops prog attach time the verifier will see the expected mask
> and can check that all kfuncs calls of this particular program
> satisfy the mask. Then all of the runtime overhead of
> current->scx.kf_mask and scx_kf_allowed() will go away.
Thanks for pointing this out.
Yes, I am interested in working on it.
I will try to solve this problem in a separate patch series.
The following are my thoughts:
Should we really use KF_* to do this? I think KF_* is currently more
like declaring that a kfunc has some kind of attribute, e.g.
KF_TRUSTED_ARGS means that the kfunc only accepts trusted arguments,
rather than being used to categorise kfuncs.
It is not sustainable to restrict the kfuncs that can be used based on
program types, which are coarse-grained. This problem will get worse
as kfuncs increase.
In my opinion, managing the kfuncs available to bpf programs should be
implemented as capabilities. Capabilities are a mature permission model.
We can treat a set of kfuncs as a capability (like the various current
kfunc_sets, but the current kfunc_sets did not carefully divide
permissions).
We should use separate BPF_CAP_XXX flags to manage these capabilities.
For example, SCX may define BPF_CAP_SCX_DISPATCH.
For program types, we should divide them into two levels, types and
subtypes. Types are used to register common capabilities and subtypes
are used to register specific capabilities. The verifier can check if
the used kfuncs are allowed based on the type and subtype of the bpf
program.
I understand that we need to maintain backward compatibility with
userspace, but capabilities are internal changes in the kernel.
Perhaps we can make the current program types as subtypes and
add 'types' that are only used internally, and more subtypes
(program types) can be added in the future.
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH bpf-next v6 4/5] bpf: Make fs kfuncs available for SYSCALL program type
2024-12-24 0:51 ` Juntong Deng
@ 2024-12-24 12:10 ` Juntong Deng
2025-01-09 19:23 ` per st_ops kfunc allow/deny mask. Was: " Alexei Starovoitov
1 sibling, 0 replies; 21+ messages in thread
From: Juntong Deng @ 2024-12-24 12:10 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: Alexei Starovoitov, Daniel Borkmann, John Fastabend,
Andrii Nakryiko, Martin KaFai Lau, Eddy Z, Song Liu,
Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
Kumar Kartikeya Dwivedi, snorcht, Christian Brauner, bpf, LKML,
Linux-Fsdevel
On 2024/12/24 00:51, Juntong Deng wrote:
> On 2024/12/19 16:41, Alexei Starovoitov wrote:
>> On Tue, Dec 17, 2024 at 3:45 PM Juntong Deng
>> <juntong.deng@outlook.com> wrote:
>>>
>>> -static int bpf_fs_kfuncs_filter(const struct bpf_prog *prog, u32 kfunc_id)
>>> -{
>>> - if (!btf_id_set8_contains(&bpf_fs_kfunc_set_ids, kfunc_id) ||
>>> - prog->type == BPF_PROG_TYPE_LSM)
>>> - return 0;
>>> - return -EACCES;
>>> -}
>>> -
>>> static const struct btf_kfunc_id_set bpf_fs_kfunc_set = {
>>> .owner = THIS_MODULE,
>>> .set = &bpf_fs_kfunc_set_ids,
>>> - .filter = bpf_fs_kfuncs_filter,
>>> };
>>>
>>> static int __init bpf_fs_kfuncs_init(void)
>>> {
>>> - return register_btf_kfunc_id_set(BPF_PROG_TYPE_LSM, &bpf_fs_kfunc_set);
>>> + int ret;
>>> +
>>> + ret = register_btf_kfunc_id_set(BPF_PROG_TYPE_LSM, &bpf_fs_kfunc_set);
>>> + return ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_SYSCALL, &bpf_fs_kfunc_set);
>>> }
>>>
>>> late_initcall(bpf_fs_kfuncs_init);
>>> diff --git a/tools/testing/selftests/bpf/progs/verifier_vfs_reject.c b/tools/testing/selftests/bpf/progs/verifier_vfs_reject.c
>>> index d6d3f4fcb24c..5aab75fd2fa5 100644
>>> --- a/tools/testing/selftests/bpf/progs/verifier_vfs_reject.c
>>> +++ b/tools/testing/selftests/bpf/progs/verifier_vfs_reject.c
>>> @@ -148,14 +148,4 @@ int BPF_PROG(path_d_path_kfunc_invalid_buf_sz, struct file *file)
>>> return 0;
>>> }
>>>
>>> -SEC("fentry/vfs_open")
>>> -__failure __msg("calling kernel function bpf_path_d_path is not allowed")
>>
>> This is incorrect.
>> You have to keep bpf_fs_kfuncs_filter() and prog->type == BPF_PROG_TYPE_LSM
>> check because bpf_prog_type_to_kfunc_hook() aliases LSM and fentry
>> into BTF_KFUNC_HOOK_TRACING category. It's been an annoying quirk.
>> We're figuring out details for significant refactoring of
>> register_btf_kfunc_id_set() and the whole registration process.
>>
>> Maybe you would be interested in working on it?
>>
>> The main goal is to get rid of run-time mask check in SCX_CALL_OP() and
>> make it static by the verifier. To make that happen scx_kf_mask flags
>> would need to become KF_* flags while each struct-ops callback will
>> specify the expected mask.
>> Then at struct-ops prog attach time the verifier will see the expected
>> mask
>> and can check that all kfuncs calls of this particular program
>> satisfy the mask. Then all of the runtime overhead of
>> current->scx.kf_mask and scx_kf_allowed() will go away.
>
> Thanks for pointing this out.
>
> Yes, I am interested in working on it.
>
> I will try to solve this problem in a separate patch series.
>
>
> The following are my thoughts:
>
> Should we really use KF_* to do this? I think KF_* is currently more
> like declaring that a kfunc has some kind of attribute, e.g.
> KF_TRUSTED_ARGS means that the kfunc only accepts trusted arguments,
> rather than being used to categorise kfuncs.
>
> It is not sustainable to restrict the kfuncs that can be used based on
> program types, which are coarse-grained. This problem will get worse
> as kfuncs increase.
>
> In my opinion, managing the kfuncs available to bpf programs should be
> implemented as capabilities. Capabilities are a mature permission model.
> We can treat a set of kfuncs as a capability (like the various current
> kfunc_sets, but the current kfunc_sets did not carefully divide
> permissions).
>
> We should use separate BPF_CAP_XXX flags to manage these capabilities.
> For example, SCX may define BPF_CAP_SCX_DISPATCH.
>
> For program types, we should divide them into two levels, types and
> subtypes. Types are used to register common capabilities and subtypes
> are used to register specific capabilities. The verifier can check if
> the used kfuncs are allowed based on the type and subtype of the bpf
> program.
>
> I understand that we need to maintain backward compatibility to
> userspace, but capabilities are internal changes in the kernel.
> Perhaps we can make the current program types as subtypes and
> add 'types' that are only used internally, and more subtypes
> (program types) can be added in the future.
Sorry, this email was sent at midnight before I went to bed, and yes,
it looks a bit radical.
But in the long run, as eBPF is used in more and more scenarios
(a better kernel module) and we gain more and more kfuncs
(a better EXPORT_SYMBOL_GPL), managing (restricting) the kfuncs that
can be used in different contexts will become more and more complex.
Therefore it might be better for ebpf to transition to fine-grained
permission management (capabilities).
Maybe we can have more discussion.
Many thanks.
^ permalink raw reply [flat|nested] 21+ messages in thread
* per st_ops kfunc allow/deny mask. Was: [PATCH bpf-next v6 4/5] bpf: Make fs kfuncs available for SYSCALL program type
2024-12-24 0:51 ` Juntong Deng
2024-12-24 12:10 ` Juntong Deng
@ 2025-01-09 19:23 ` Alexei Starovoitov
2025-01-09 20:49 ` Song Liu
2025-01-10 20:42 ` Juntong Deng
1 sibling, 2 replies; 21+ messages in thread
From: Alexei Starovoitov @ 2025-01-09 19:23 UTC (permalink / raw)
To: Juntong Deng, Tejun Heo
Cc: Alexei Starovoitov, Daniel Borkmann, John Fastabend,
Andrii Nakryiko, Martin KaFai Lau, Eddy Z, Song Liu,
Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
Kumar Kartikeya Dwivedi, snorcht, Christian Brauner, bpf, LKML,
Linux-Fsdevel
On Mon, Dec 23, 2024 at 4:51 PM Juntong Deng <juntong.deng@outlook.com> wrote:
>
> >
> > The main goal is to get rid of run-time mask check in SCX_CALL_OP() and
> > make it static by the verifier. To make that happen scx_kf_mask flags
> > would need to become KF_* flags while each struct-ops callback will
> > specify the expected mask.
> > Then at struct-ops prog attach time the verifier will see the expected mask
> > and can check that all kfuncs calls of this particular program
> > satisfy the mask. Then all of the runtime overhead of
> > current->scx.kf_mask and scx_kf_allowed() will go away.
>
> Thanks for pointing this out.
>
> Yes, I am interested in working on it.
>
> I will try to solve this problem in a separate patch series.
>
>
> The following are my thoughts:
>
> Should we really use KF_* to do this? I think KF_* is currently more
> like declaring that a kfunc has some kind of attribute, e.g.
> KF_TRUSTED_ARGS means that the kfunc only accepts trusted arguments,
> rather than being used to categorise kfuncs.
>
> It is not sustainable to restrict the kfuncs that can be used based on
> program types, which are coarse-grained. This problem will get worse
> as kfuncs increase.
>
> In my opinion, managing the kfuncs available to bpf programs should be
> implemented as capabilities. Capabilities are a mature permission model.
> We can treat a set of kfuncs as a capability (like the various current
> kfunc_sets, but the current kfunc_sets did not carefully divide
> permissions).
>
> We should use separate BPF_CAP_XXX flags to manage these capabilities.
> For example, SCX may define BPF_CAP_SCX_DISPATCH.
>
> For program types, we should divide them into two levels, types and
> subtypes. Types are used to register common capabilities and subtypes
> are used to register specific capabilities. The verifier can check if
> the used kfuncs are allowed based on the type and subtype of the bpf
> program.
>
> I understand that we need to maintain backward compatibility to
> userspace, but capabilities are internal changes in the kernel.
> Perhaps we can make the current program types as subtypes and
> add 'types' that are only used internally, and more subtypes
> (program types) can be added in the future.
Sorry for the delay.
imo CAP* approach doesn't fit.
caps are security bits exposed to user space.
Here there is no need to expose anything to user space.
But you're also correct that we cannot extend kfunc KF_* flags
that easily. KF_* flags are limited to 32-bit and we're already
using 12 bits.
enum scx_kf_mask needs 5 bits, so we can squeeze them into
the current 32-bit field _for now_,
but eventually we'd need to refactor kfunc definition into a wider set:
BTF_ID_FLAGS(func, .. KF_*)
so that different struct_ops consumers can define their own bits.
Right now SCX is the only st_ops consumer who needs this feature,
so let's squeeze into the existing KF facility.
First step is to remap scx_kf_mask bits into unused bits in KF_
and annotate corresponding sched-ext kfuncs with it.
For example:
SCX_KF_DISPATCH will become
KF_DISPATCH (1 << 13)
and all kfuncs that are allowed to be called from ->dispatch() callback
will be annotated like:
- BTF_KFUNCS_START(scx_kfunc_ids_dispatch)
- BTF_ID_FLAGS(func, scx_bpf_dispatch_nr_slots)
- BTF_ID_FLAGS(func, scx_bpf_dispatch_cancel)
+ BTF_KFUNCS_START(scx_kfunc_ids_dispatch)
+ BTF_ID_FLAGS(func, scx_bpf_dispatch_nr_slots, KF_DISPATCH)
+ BTF_ID_FLAGS(func, scx_bpf_dispatch_cancel, KF_DISPATCH)
For sched_ext_ops callback annotations, I think,
the simplest approach is to add special
BTF_SET8_START(st_ops_flags)
BTF_ID_FLAGS(func, sched_ext_ops__dispatch, KF_DISPATCH)
and so on for other ops stubs.
sched_ext_ops__dispatch() is an empty function that
exists in the vmlinux, and though it's not a kfunc
we can use it to annotate
(struct sched_ext_ops *)->dispatch() callback
with a particular KF_ flag
(or a set of flags for SCX_KF_RQ_LOCKED case).
Then the verifier (while analyzing the program that is targeted
to be attach to this ->dispatch() hook)
will check this extra KF flag in st_ops
and will only allow to call kfuncs with matching flags:
if (st_ops->kf_mask & kfunc->kf_mask) // ok to call kfunc from this callback
The end result current->scx.kf_mask will be removed
and instead of run-time check it will become static verifier check.
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: per st_ops kfunc allow/deny mask. Was: [PATCH bpf-next v6 4/5] bpf: Make fs kfuncs available for SYSCALL program type
2025-01-09 19:23 ` per st_ops kfunc allow/deny mask. Was: " Alexei Starovoitov
@ 2025-01-09 20:49 ` Song Liu
2025-01-09 21:24 ` Alexei Starovoitov
2025-01-10 20:19 ` Tejun Heo
2025-01-10 20:42 ` Juntong Deng
1 sibling, 2 replies; 21+ messages in thread
From: Song Liu @ 2025-01-09 20:49 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: Juntong Deng, Tejun Heo, Alexei Starovoitov, Daniel Borkmann,
John Fastabend, Andrii Nakryiko, Martin KaFai Lau, Eddy Z,
Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
Kumar Kartikeya Dwivedi, snorcht, Christian Brauner, bpf, LKML,
Linux-Fsdevel
On Thu, Jan 9, 2025 at 11:24 AM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Mon, Dec 23, 2024 at 4:51 PM Juntong Deng <juntong.deng@outlook.com> wrote:
> >
> > >
> > > The main goal is to get rid of run-time mask check in SCX_CALL_OP() and
> > > make it static by the verifier. To make that happen scx_kf_mask flags
> > > would need to become KF_* flags while each struct-ops callback will
> > > specify the expected mask.
> > > Then at struct-ops prog attach time the verifier will see the expected mask
> > > and can check that all kfuncs calls of this particular program
> > > satisfy the mask. Then all of the runtime overhead of
> > > current->scx.kf_mask and scx_kf_allowed() will go away.
> >
> > Thanks for pointing this out.
> >
> > Yes, I am interested in working on it.
> >
> > I will try to solve this problem in a separate patch series.
> >
> >
> > The following are my thoughts:
> >
> > Should we really use KF_* to do this? I think KF_* is currently more
> > like declaring that a kfunc has some kind of attribute, e.g.
> > KF_TRUSTED_ARGS means that the kfunc only accepts trusted arguments,
> > rather than being used to categorise kfuncs.
> >
> > It is not sustainable to restrict the kfuncs that can be used based on
> > program types, which are coarse-grained. This problem will get worse
> > as kfuncs increase.
> >
> > In my opinion, managing the kfuncs available to bpf programs should be
> > implemented as capabilities. Capabilities are a mature permission model.
> > We can treat a set of kfuncs as a capability (like the various current
> > kfunc_sets, but the current kfunc_sets did not carefully divide
> > permissions).
> >
> > We should use separate BPF_CAP_XXX flags to manage these capabilities.
> > For example, SCX may define BPF_CAP_SCX_DISPATCH.
> >
> > For program types, we should divide them into two levels, types and
> > subtypes. Types are used to register common capabilities and subtypes
> > are used to register specific capabilities. The verifier can check if
> > the used kfuncs are allowed based on the type and subtype of the bpf
> > program.
> >
> > I understand that we need to maintain backward compatibility to
> > userspace, but capabilities are internal changes in the kernel.
> > Perhaps we can make the current program types as subtypes and
> > add 'types' that are only used internally, and more subtypes
> > (program types) can be added in the future.
>
> Sorry for the delay.
> imo CAP* approach doesn't fit.
> caps are security bits exposed to user space.
> Here there is no need to expose anything to user space.
>
> But you're also correct that we cannot extend kfunc KF_* flags
> that easily. KF_* flags are limited to 32-bit and we're already
> using 12 bits.
> enum scx_kf_mask needs 5 bits, so we can squeeze them into
> the current 32-bit field _for now_,
> but eventually we'd need to refactor kfunc definition into a wider set:
> BTF_ID_FLAGS(func, .. KF_*)
> so that different struct_ops consumers can define their own bits.
>
> Right now SCX is the only st_ops consumer who needs this feature,
> so let's squeeze into the existing KF facility.
>
> First step is to remap scx_kf_mask bits into unused bits in KF_
> and annotate corresponding sched-ext kfuncs with it.
> For example:
> SCX_KF_DISPATCH will become
> KF_DISPATCH (1 << 13)
>
> and all kfuncs that are allowed to be called from ->dispatch() callback
> will be annotated like:
> - BTF_KFUNCS_START(scx_kfunc_ids_dispatch)
> - BTF_ID_FLAGS(func, scx_bpf_dispatch_nr_slots)
> - BTF_ID_FLAGS(func, scx_bpf_dispatch_cancel)
> + BTF_KFUNCS_START(scx_kfunc_ids_dispatch)
> + BTF_ID_FLAGS(func, scx_bpf_dispatch_nr_slots, KF_DISPATCH)
> + BTF_ID_FLAGS(func, scx_bpf_dispatch_cancel, KF_DISPATCH)
>
>
> For sched_ext_ops callback annotations, I think,
> the simplest approach is to add special
> BTF_SET8_START(st_ops_flags)
> BTF_ID_FLAGS(func, sched_ext_ops__dispatch, KF_DISPATCH)
> and so on for other ops stubs.
>
> sched_ext_ops__dispatch() is an empty function that
> exists in the vmlinux, and though it's not a kfunc
> we can use it to annotate
> (struct sched_ext_ops *)->dispatch() callback
> with a particular KF_ flag
> (or a set of flags for SCX_KF_RQ_LOCKED case).
>
> Then the verifier (while analyzing the program that is targeted
> to be attach to this ->dispatch() hook)
> will check this extra KF flag in st_ops
> and will only allow to call kfuncs with matching flags:
>
> if (st_ops->kf_mask & kfunc->kf_mask) // ok to call kfunc from this callback
>
> The end result current->scx.kf_mask will be removed
> and instead of run-time check it will become static verifier check.
Shall we move some of this logic from the verifier core to
btf_kfunc_id_set.filter()? IIUC, this would avoid using extra
KF_* bits. To make the filter functions more capable, we
probably need to pass bpf_verifier_env into the filter() function.
Does this make sense?
Thanks,
Song
* Re: per st_ops kfunc allow/deny mask. Was: [PATCH bpf-next v6 4/5] bpf: Make fs kfuncs available for SYSCALL program type
2025-01-09 20:49 ` Song Liu
@ 2025-01-09 21:24 ` Alexei Starovoitov
2025-01-10 20:19 ` Tejun Heo
1 sibling, 0 replies; 21+ messages in thread
From: Alexei Starovoitov @ 2025-01-09 21:24 UTC (permalink / raw)
To: Song Liu
Cc: Juntong Deng, Tejun Heo, Alexei Starovoitov, Daniel Borkmann,
John Fastabend, Andrii Nakryiko, Martin KaFai Lau, Eddy Z,
Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
Kumar Kartikeya Dwivedi, snorcht, Christian Brauner, bpf, LKML,
Linux-Fsdevel
On Thu, Jan 9, 2025 at 12:49 PM Song Liu <song@kernel.org> wrote:
>
> On Thu, Jan 9, 2025 at 11:24 AM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Mon, Dec 23, 2024 at 4:51 PM Juntong Deng <juntong.deng@outlook.com> wrote:
> > >
> > > >
> > > > The main goal is to get rid of run-time mask check in SCX_CALL_OP() and
> > > > make it static by the verifier. To make that happen scx_kf_mask flags
> > > > would need to become KF_* flags while each struct-ops callback will
> > > > specify the expected mask.
> > > > Then at struct-ops prog attach time the verifier will see the expected mask
> > > > and can check that all kfuncs calls of this particular program
> > > > satisfy the mask. Then all of the runtime overhead of
> > > > current->scx.kf_mask and scx_kf_allowed() will go away.
> > >
> > > Thanks for pointing this out.
> > >
> > > Yes, I am interested in working on it.
> > >
> > > I will try to solve this problem in a separate patch series.
> > >
> > >
> > > The following are my thoughts:
> > >
> > > Should we really use KF_* to do this? I think KF_* is currently more
> > > like declaring that a kfunc has some kind of attribute, e.g.
> > > KF_TRUSTED_ARGS means that the kfunc only accepts trusted arguments,
> > > rather than being used to categorise kfuncs.
> > >
> > > It is not sustainable to restrict the kfuncs that can be used based on
> > > program types, which are coarse-grained. This problem will get worse
> > > as kfuncs increase.
> > >
> > > In my opinion, managing the kfuncs available to bpf programs should be
> > > implemented as capabilities. Capabilities are a mature permission model.
> > > We can treat a set of kfuncs as a capability (like the various current
> > > kfunc_sets, but the current kfunc_sets did not carefully divide
> > > permissions).
> > >
> > > We should use separate BPF_CAP_XXX flags to manage these capabilities.
> > > For example, SCX may define BPF_CAP_SCX_DISPATCH.
> > >
> > > For program types, we should divide them into two levels, types and
> > > subtypes. Types are used to register common capabilities and subtypes
> > > are used to register specific capabilities. The verifier can check if
> > > the used kfuncs are allowed based on the type and subtype of the bpf
> > > program.
> > >
> > > I understand that we need to maintain backward compatibility to
> > > userspace, but capabilities are internal changes in the kernel.
> > > Perhaps we can make the current program types as subtypes and
> > > add 'types' that are only used internally, and more subtypes
> > > (program types) can be added in the future.
> >
> > Sorry for the delay.
> > imo CAP* approach doesn't fit.
> > caps are security bits exposed to user space.
> > Here there is no need to expose anything to user space.
> >
> > But you're also correct that we cannot extend kfunc KF_* flags
> > that easily. KF_* flags are limited to 32-bit and we're already
> > using 12 bits.
> > enum scx_kf_mask needs 5 bits, so we can squeeze them into
> > the current 32-bit field _for now_,
> > but eventually we'd need to refactor kfunc definition into a wider set:
> > BTF_ID_FLAGS(func, .. KF_*)
> > so that different struct_ops consumers can define their own bits.
> >
> > Right now SCX is the only st_ops consumer who needs this feature,
> > so let's squeeze into the existing KF facility.
> >
> > First step is to remap scx_kf_mask bits into unused bits in KF_
> > and annotate corresponding sched-ext kfuncs with it.
> > For example:
> > SCX_KF_DISPATCH will become
> > KF_DISPATCH (1 << 13)
> >
> > and all kfuncs that are allowed to be called from ->dispatch() callback
> > will be annotated like:
> > - BTF_KFUNCS_START(scx_kfunc_ids_dispatch)
> > - BTF_ID_FLAGS(func, scx_bpf_dispatch_nr_slots)
> > - BTF_ID_FLAGS(func, scx_bpf_dispatch_cancel)
> > + BTF_KFUNCS_START(scx_kfunc_ids_dispatch)
> > + BTF_ID_FLAGS(func, scx_bpf_dispatch_nr_slots, KF_DISPATCH)
> > + BTF_ID_FLAGS(func, scx_bpf_dispatch_cancel, KF_DISPATCH)
> >
> >
> > For sched_ext_ops callback annotations, I think,
> > the simplest approach is to add special
> > BTF_SET8_START(st_ops_flags)
> > BTF_ID_FLAGS(func, sched_ext_ops__dispatch, KF_DISPATCH)
> > and so on for other ops stubs.
> >
> > sched_ext_ops__dispatch() is an empty function that
> > exists in the vmlinux, and though it's not a kfunc
> > we can use it to annotate
> > (struct sched_ext_ops *)->dispatch() callback
> > with a particular KF_ flag
> > (or a set of flags for SCX_KF_RQ_LOCKED case).
> >
> > Then the verifier (while analyzing the program that is targeted
> > to be attach to this ->dispatch() hook)
> > will check this extra KF flag in st_ops
> > and will only allow to call kfuncs with matching flags:
> >
> > if (st_ops->kf_mask & kfunc->kf_mask) // ok to call kfunc from this callback
> >
> > The end result current->scx.kf_mask will be removed
> > and instead of run-time check it will become static verifier check.
>
> Shall we move some of this logic from the verifier core to
> btf_kfunc_id_set.filter()? IIUC, this would avoid using extra
> KF_* bits. To make the filter functions more capable, we
> probably need to pass bpf_verifier_env into the filter() function.
Passing env is probably unnecessary,
but if we save 'moff':
const struct btf_type *t,
const struct btf_member *member,
u32 moff = __btf_member_bit_offset(t, member) / 8;
after a successful check_struct_ops_btf_id() somewhere in prog->aux,
then btf_kfunc_id_set.filter() can indeed do
moff == offsetof(struct sched_ext_ops, dispatch)
and allow kfuncs suitable for dispatch.
* Re: per st_ops kfunc allow/deny mask. Was: [PATCH bpf-next v6 4/5] bpf: Make fs kfuncs available for SYSCALL program type
2025-01-09 20:49 ` Song Liu
2025-01-09 21:24 ` Alexei Starovoitov
@ 2025-01-10 20:19 ` Tejun Heo
2025-01-10 20:50 ` Juntong Deng
1 sibling, 1 reply; 21+ messages in thread
From: Tejun Heo @ 2025-01-10 20:19 UTC (permalink / raw)
To: Song Liu
Cc: Alexei Starovoitov, Juntong Deng, Alexei Starovoitov,
Daniel Borkmann, John Fastabend, Andrii Nakryiko,
Martin KaFai Lau, Eddy Z, Yonghong Song, KP Singh,
Stanislav Fomichev, Hao Luo, Jiri Olsa, Kumar Kartikeya Dwivedi,
snorcht, Christian Brauner, bpf, LKML, Linux-Fsdevel
Hello,
On Thu, Jan 09, 2025 at 12:49:39PM -0800, Song Liu wrote:
...
> Shall we move some of this logic from the verifier core to
> btf_kfunc_id_set.filter()? IIUC, this would avoid using extra
> KF_* bits. To make the filter functions more capable, we
> probably need to pass bpf_verifier_env into the filter() function.
FWIW, doing this through callbacks (maybe with predefined helpers and
conventions) seems like the better approach to me, given that this policy is
closely tied to a specific subsystem (sched_ext here). E.g. if sched_ext wants
to introduce new kfunc groups or rules, it would be nicer for the changes
to be contained within the sched_ext implementation.
Thanks.
--
tejun
* Re: per st_ops kfunc allow/deny mask. Was: [PATCH bpf-next v6 4/5] bpf: Make fs kfuncs available for SYSCALL program type
2025-01-10 20:19 ` Tejun Heo
@ 2025-01-10 20:50 ` Juntong Deng
0 siblings, 0 replies; 21+ messages in thread
From: Juntong Deng @ 2025-01-10 20:50 UTC (permalink / raw)
To: Tejun Heo, Song Liu
Cc: Alexei Starovoitov, Alexei Starovoitov, Daniel Borkmann,
John Fastabend, Andrii Nakryiko, Martin KaFai Lau, Eddy Z,
Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
Kumar Kartikeya Dwivedi, snorcht, Christian Brauner, bpf, LKML,
Linux-Fsdevel
On 2025/1/10 20:19, Tejun Heo wrote:
> Hello,
>
> On Thu, Jan 09, 2025 at 12:49:39PM -0800, Song Liu wrote:
> ...
>> Shall we move some of this logic from the verifier core to
>> btf_kfunc_id_set.filter()? IIUC, this would avoid using extra
>> KF_* bits. To make the filter functions more capable, we
>> probably need to pass bpf_verifier_env into the filter() function.
>
> FWIW, doing this through callbacks (maybe with predefined helpers and
> conventions) seems like the better approach to me, given that this policy is
> closely tied to a specific subsystem (sched_ext here). E.g. if sched_ext wants
> to introduce new kfunc groups or rules, it would be nicer for the changes
> to be contained within the sched_ext implementation.
>
> Thanks.
>
I think so too; it would be better to use callback functions and keep
this part decoupled from the bpf core.
* Re: per st_ops kfunc allow/deny mask. Was: [PATCH bpf-next v6 4/5] bpf: Make fs kfuncs available for SYSCALL program type
2025-01-09 19:23 ` per st_ops kfunc allow/deny mask. Was: " Alexei Starovoitov
2025-01-09 20:49 ` Song Liu
@ 2025-01-10 20:42 ` Juntong Deng
2025-01-16 19:52 ` Juntong Deng
1 sibling, 1 reply; 21+ messages in thread
From: Juntong Deng @ 2025-01-10 20:42 UTC (permalink / raw)
To: Alexei Starovoitov, Tejun Heo
Cc: Alexei Starovoitov, Daniel Borkmann, John Fastabend,
Andrii Nakryiko, Martin KaFai Lau, Eddy Z, Song Liu,
Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
Kumar Kartikeya Dwivedi, snorcht, Christian Brauner, bpf, LKML,
Linux-Fsdevel
On 2025/1/9 19:23, Alexei Starovoitov wrote:
> On Mon, Dec 23, 2024 at 4:51 PM Juntong Deng <juntong.deng@outlook.com> wrote:
>>
>>>
>>> The main goal is to get rid of run-time mask check in SCX_CALL_OP() and
>>> make it static by the verifier. To make that happen scx_kf_mask flags
>>> would need to become KF_* flags while each struct-ops callback will
>>> specify the expected mask.
>>> Then at struct-ops prog attach time the verifier will see the expected mask
>>> and can check that all kfuncs calls of this particular program
>>> satisfy the mask. Then all of the runtime overhead of
>>> current->scx.kf_mask and scx_kf_allowed() will go away.
>>
>> Thanks for pointing this out.
>>
>> Yes, I am interested in working on it.
>>
>> I will try to solve this problem in a separate patch series.
>>
>>
>> The following are my thoughts:
>>
>> Should we really use KF_* to do this? I think KF_* is currently more
>> like declaring that a kfunc has some kind of attribute, e.g.
>> KF_TRUSTED_ARGS means that the kfunc only accepts trusted arguments,
>> rather than being used to categorise kfuncs.
>>
>> It is not sustainable to restrict the kfuncs that can be used based on
>> program types, which are coarse-grained. This problem will get worse
>> as kfuncs increase.
>>
>> In my opinion, managing the kfuncs available to bpf programs should be
>> implemented as capabilities. Capabilities are a mature permission model.
>> We can treat a set of kfuncs as a capability (like the various current
>> kfunc_sets, but the current kfunc_sets did not carefully divide
>> permissions).
>>
>> We should use separate BPF_CAP_XXX flags to manage these capabilities.
>> For example, SCX may define BPF_CAP_SCX_DISPATCH.
>>
>> For program types, we should divide them into two levels, types and
>> subtypes. Types are used to register common capabilities and subtypes
>> are used to register specific capabilities. The verifier can check if
>> the used kfuncs are allowed based on the type and subtype of the bpf
>> program.
>>
>> I understand that we need to maintain backward compatibility to
>> userspace, but capabilities are internal changes in the kernel.
>> Perhaps we can make the current program types as subtypes and
>> add 'types' that are only used internally, and more subtypes
>> (program types) can be added in the future.
>
> Sorry for the delay.
> imo CAP* approach doesn't fit.
> caps are security bits exposed to user space.
> Here there is no need to expose anything to user space.
>
> But you're also correct that we cannot extend kfunc KF_* flags
> that easily. KF_* flags are limited to 32-bit and we're already
> using 12 bits.
> enum scx_kf_mask needs 5 bits, so we can squeeze them into
> the current 32-bit field _for now_,
> but eventually we'd need to refactor kfunc definition into a wider set:
> BTF_ID_FLAGS(func, .. KF_*)
> so that different struct_ops consumers can define their own bits.
>
> Right now SCX is the only st_ops consumer who needs this feature,
> so let's squeeze into the existing KF facility.
>
> First step is to remap scx_kf_mask bits into unused bits in KF_
> and annotate corresponding sched-ext kfuncs with it.
> For example:
> SCX_KF_DISPATCH will become
> KF_DISPATCH (1 << 13)
>
> and all kfuncs that are allowed to be called from ->dispatch() callback
> will be annotated like:
> - BTF_KFUNCS_START(scx_kfunc_ids_dispatch)
> - BTF_ID_FLAGS(func, scx_bpf_dispatch_nr_slots)
> - BTF_ID_FLAGS(func, scx_bpf_dispatch_cancel)
> + BTF_KFUNCS_START(scx_kfunc_ids_dispatch)
> + BTF_ID_FLAGS(func, scx_bpf_dispatch_nr_slots, KF_DISPATCH)
> + BTF_ID_FLAGS(func, scx_bpf_dispatch_cancel, KF_DISPATCH)
>
>
> For sched_ext_ops callback annotations, I think,
> the simplest approach is to add special
> BTF_SET8_START(st_ops_flags)
> BTF_ID_FLAGS(func, sched_ext_ops__dispatch, KF_DISPATCH)
> and so on for other ops stubs.
>
> sched_ext_ops__dispatch() is an empty function that
> exists in the vmlinux, and though it's not a kfunc
> we can use it to annotate
> (struct sched_ext_ops *)->dispatch() callback
> with a particular KF_ flag
> (or a set of flags for SCX_KF_RQ_LOCKED case).
>
> Then the verifier (while analyzing the program that is targeted
> to be attach to this ->dispatch() hook)
> will check this extra KF flag in st_ops
> and will only allow to call kfuncs with matching flags:
>
> if (st_ops->kf_mask & kfunc->kf_mask) // ok to call kfunc from this callback
>
> The end result current->scx.kf_mask will be removed
> and instead of run-time check it will become static verifier check.
Sorry, I may not have explained my idea clearly.
The "capabilities" I mentioned have nothing to do with userspace.
They are conceptual and do not refer to Linux capabilities.
My idea is that a similar "capabilities" mechanism should be used
inside the BPF subsystem (separate).
I think the essence of the problem is that ONE bpf program type can
be used in MANY different contexts, but different contexts can have
different restrictions.
It is reasonable for one bpf program type to be used in different
contexts. There is no need for one bpf program type to be used
in only one context.
But currently the "permission" management of the BPF subsystem is
completely based on the bpf program type, which is a coarse-grained
model (for example, what kfuncs are allowed to be used, which can
be considered as permissions).
As BPF is used in more and more scenarios, and as one bpf program type
is used in more and more different scenarios, the coarse-grained problem
starts to emerge. It is difficult to divide permissions in different
contexts based on a coarse-grained permission model.
This is why I said that the BPF subsystem should have its own
"capabilities" (again, not part of Linux capabilities, and nothing
to do with userspace).
In my opinion, we should separate permission management from bpf program
types. We need an extra layer of abstraction so that we can achieve
fine-grained permission management.
The reason I have the idea of capabilities is that, in my opinion,
bpf programs in a sense need application-like permission management,
because BPF is generic.
When BPF is applied to other subsystems (e.g. scheduling, security,
accessing information from other subsystems), we need something like
capabilities. Each subsystem can define its own set of bpf capabilities
to restrict the features that can be used by bpf programs in different
contexts, so that bpf programs can only use a subset of features.
Another advantage of this approach is that bpf capabilities do not need
to be tightly placed inside the bpf core; people in other subsystems can
define them externally and add the bpf capabilities they need (adding
KF_* flags can be considered modifying the bpf core, right?).
Of course, maybe one day in the future, we may be able to associate bpf
capabilities with Linux capabilities, maybe system administrators can
choose to open only some of bpf features to certain users, and maybe all
of this can be configured through /sys/bpf.
So, how do we implement this in the verifier? I think registering the
bpf capabilities is not a problem; it is consistent with the current
registration of kfuncs for a bpf program type, and we can still use
struct btf_kfunc_id_set.
The really interesting part is how we allow people from different
subsystems to change the capabilities of the bpf program in different
contexts under the same bpf program type. My idea is to add a new
callback function in struct bpf_verifier_ops, say bpf_ctx_allowed_cap.
We can pass context information to this callback function, and the
person who implements this callback function can decide to adjust the
current capabilities (add or delete) in different contexts. In the
case of bpf_struct_ops, the context information may be "moff".
In my opinion, capabilities are a flexible and extensible approach.
All it takes is adding a layer of abstraction to decouple permissions
from program types.
Of course, there are more technical details that need to be
figured out, and if you think bpf capabilities are an interesting
idea worth trying, I will try to write a minimal POC and send an
RFC PATCH.
(Actually I have always wanted to write a POC but in the last two weeks
I have been busy with my university stuff and really didn't have the
time, but now I finally have some time to try it)
Yes, maybe I am being a bit too forward-thinking; at the moment only SCX
has hit this "wall", and we can indeed solve it directly with
KF_* flags or filters.
But I think the problem will persist and come up again in other
scenarios, and maybe we can try something new and maybe not just
solve the SCX problem.
Does this make sense?
Maybe we can have more discussion.
Many thanks.
* Re: per st_ops kfunc allow/deny mask. Was: [PATCH bpf-next v6 4/5] bpf: Make fs kfuncs available for SYSCALL program type
2025-01-10 20:42 ` Juntong Deng
@ 2025-01-16 19:52 ` Juntong Deng
0 siblings, 0 replies; 21+ messages in thread
From: Juntong Deng @ 2025-01-16 19:52 UTC (permalink / raw)
To: Alexei Starovoitov, Tejun Heo
Cc: Alexei Starovoitov, Daniel Borkmann, John Fastabend,
Andrii Nakryiko, Martin KaFai Lau, Eddy Z, Song Liu,
Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
Kumar Kartikeya Dwivedi, snorcht, Christian Brauner, bpf, LKML,
Linux-Fsdevel
On 2025/1/10 20:42, Juntong Deng wrote:
> On 2025/1/9 19:23, Alexei Starovoitov wrote:
>> On Mon, Dec 23, 2024 at 4:51 PM Juntong Deng
>> <juntong.deng@outlook.com> wrote:
>>>
>>>>
>>>> The main goal is to get rid of run-time mask check in SCX_CALL_OP() and
>>>> make it static by the verifier. To make that happen scx_kf_mask flags
>>>> would need to become KF_* flags while each struct-ops callback will
>>>> specify the expected mask.
>>>> Then at struct-ops prog attach time the verifier will see the
>>>> expected mask
>>>> and can check that all kfuncs calls of this particular program
>>>> satisfy the mask. Then all of the runtime overhead of
>>>> current->scx.kf_mask and scx_kf_allowed() will go away.
>>>
>>> Thanks for pointing this out.
>>>
>>> Yes, I am interested in working on it.
>>>
>>> I will try to solve this problem in a separate patch series.
>>>
>>>
>>> The following are my thoughts:
>>>
>>> Should we really use KF_* to do this? I think KF_* is currently more
>>> like declaring that a kfunc has some kind of attribute, e.g.
>>> KF_TRUSTED_ARGS means that the kfunc only accepts trusted arguments,
>>> rather than being used to categorise kfuncs.
>>>
>>> It is not sustainable to restrict the kfuncs that can be used based on
>>> program types, which are coarse-grained. This problem will get worse
>>> as kfuncs increase.
>>>
>>> In my opinion, managing the kfuncs available to bpf programs should be
>>> implemented as capabilities. Capabilities are a mature permission model.
>>> We can treat a set of kfuncs as a capability (like the various current
>>> kfunc_sets, but the current kfunc_sets did not carefully divide
>>> permissions).
>>>
>>> We should use separate BPF_CAP_XXX flags to manage these capabilities.
>>> For example, SCX may define BPF_CAP_SCX_DISPATCH.
>>>
>>> For program types, we should divide them into two levels, types and
>>> subtypes. Types are used to register common capabilities and subtypes
>>> are used to register specific capabilities. The verifier can check if
>>> the used kfuncs are allowed based on the type and subtype of the bpf
>>> program.
>>>
>>> I understand that we need to maintain backward compatibility to
>>> userspace, but capabilities are internal changes in the kernel.
>>> Perhaps we can make the current program types as subtypes and
>>> add 'types' that are only used internally, and more subtypes
>>> (program types) can be added in the future.
>>
>> Sorry for the delay.
>> imo CAP* approach doesn't fit.
>> caps are security bits exposed to user space.
>> Here there is no need to expose anything to user space.
>>
>> But you're also correct that we cannot extend kfunc KF_* flags
>> that easily. KF_* flags are limited to 32-bit and we're already
>> using 12 bits.
>> enum scx_kf_mask needs 5 bits, so we can squeeze them into
>> the current 32-bit field _for now_,
>> but eventually we'd need to refactor kfunc definition into a wider set:
>> BTF_ID_FLAGS(func, .. KF_*)
>> so that different struct_ops consumers can define their own bits.
>>
>> Right now SCX is the only st_ops consumer who needs this feature,
>> so let's squeeze into the existing KF facility.
>>
>> First step is to remap scx_kf_mask bits into unused bits in KF_
>> and annotate corresponding sched-ext kfuncs with it.
>> For example:
>> SCX_KF_DISPATCH will become
>> KF_DISPATCH (1 << 13)
>>
>> and all kfuncs that are allowed to be called from ->dispatch() callback
>> will be annotated like:
>> - BTF_KFUNCS_START(scx_kfunc_ids_dispatch)
>> - BTF_ID_FLAGS(func, scx_bpf_dispatch_nr_slots)
>> - BTF_ID_FLAGS(func, scx_bpf_dispatch_cancel)
>> + BTF_KFUNCS_START(scx_kfunc_ids_dispatch)
>> + BTF_ID_FLAGS(func, scx_bpf_dispatch_nr_slots, KF_DISPATCH)
>> + BTF_ID_FLAGS(func, scx_bpf_dispatch_cancel, KF_DISPATCH)
>>
>>
>> For sched_ext_ops callback annotations, I think,
>> the simplest approach is to add special
>> BTF_SET8_START(st_ops_flags)
>> BTF_ID_FLAGS(func, sched_ext_ops__dispatch, KF_DISPATCH)
>> and so on for other ops stubs.
>>
>> sched_ext_ops__dispatch() is an empty function that
>> exists in the vmlinux, and though it's not a kfunc
>> we can use it to annotate
>> (struct sched_ext_ops *)->dispatch() callback
>> with a particular KF_ flag
>> (or a set of flags for SCX_KF_RQ_LOCKED case).
>>
>> Then the verifier (while analyzing the program that is targeted
>> to be attach to this ->dispatch() hook)
>> will check this extra KF flag in st_ops
>> and will only allow to call kfuncs with matching flags:
>>
>> if (st_ops->kf_mask & kfunc->kf_mask) // ok to call kfunc from this
>> callback
>>
>> The end result current->scx.kf_mask will be removed
>> and instead of run-time check it will become static verifier check.
>
> Sorry, I may not have explained my idea clearly.
>
> The "capabilities" I mentioned have nothing to do with userspace.
> They are conceptual and do not refer to Linux capabilities.
>
> My idea is that a similar "capabilities" mechanism should be used
> inside the BPF subsystem (separate).
>
> I think the essence of the problem is that ONE bpf program type can
> be used in MANY different contexts, but different contexts can have
> different restrictions.
>
> It is reasonable for one bpf program type to be used in different
> contexts. There is no need for one bpf program type to be used
> in only one context.
>
> But currently the "permission" management of the BPF subsystem is
> completely based on the bpf program type, which is a coarse-grained
> model (for example, what kfuncs are allowed to be used, which can
> be considered as permissions).
>
> As BPF is used in more and more scenarios, and as one bpf program type
> is used in more and more different scenarios, the coarse-grained problem
> starts to emerge. It is difficult to divide permissions in different
> contexts based on a coarse-grained permission model.
>
> This is why I said that the BPF subsystem should have its own
> "capabilities" (again, not part of Linux capabilities, and nothing
> to do with userspace).
>
> In my opinion, we should separate permission management from bpf program
> types. We need an extra layer of abstraction so that we can achieve
> fine-grained permission management.
>
> The reason I have the idea of capabilities is that, in my opinion,
> bpf programs in a sense need application-like permission management,
> because BPF is generic.
>
> When BPF is applied to other subsystems (e.g. scheduling, security,
> accessing information from other subsystems), we need something like
> capabilities. Each subsystem can define its own set of bpf capabilities
> to restrict the features that can be used by bpf programs in different
> contexts, so that bpf programs can only use a subset of features.
>
> Another advantage of this approach is that bpf capabilities do not need
> to be tightly placed inside the bpf core; people in other subsystems can
> define them externally and add the bpf capabilities they need (adding
> KF_* flags can be considered modifying the bpf core, right?).
>
> Of course, maybe one day in the future, we may be able to associate bpf
> capabilities with Linux capabilities, maybe system administrators can
> choose to open only some of bpf features to certain users, and maybe all
> of this can be configured through /sys/bpf.
>
> So, how do we implement this in the verifier? I think registering the
> bpf capabilities is not a problem; it is consistent with the current
> registration of kfuncs for a bpf program type, and we can still use
> struct btf_kfunc_id_set.
>
> The really interesting part is how we allow people from different
> subsystems to change the capabilities of the bpf program in different
> contexts under the same bpf program type. My idea is to add a new
> callback function in struct bpf_verifier_ops, say bpf_ctx_allowed_cap.
> We can pass context information to this callback function, and the
> person who implements this callback function can decide to adjust the
> current capabilities (add or delete) in different contexts. In the
> case of bpf_struct_ops, the context information may be "moff".
>
> In my opinion, capabilities are a flexible and extensible approach.
> All it takes is adding a layer of abstraction to decouple permissions
> from program types.
>
> Of course, there are more technical details that need to be
> figured out, and if you think bpf capabilities are an interesting
> idea worth trying, I will try to write a minimal POC and send an
> RFC PATCH.
>
> (Actually I have always wanted to write a POC but in the last two weeks
> I have been busy with my university stuff and really didn't have the
> time, but now I finally have some time to try it)
>
> Yes, maybe I am being a bit too forward-thinking; at the moment only SCX
> has hit this "wall", and we can indeed solve it directly with
> KF_* flags or filters.
>
> But I think the problem will persist and come up again in other
> scenarios, and maybe we can try something new and maybe not just
> solve the SCX problem.
>
> Does this make sense?
>
> Maybe we can have more discussion.
>
> Many thanks.
Hello, I sent a proof-of-concept patch series for BPF capabilities [0].
[0]:
https://lore.kernel.org/bpf/AM6PR03MB5080C05323552276324C4B4C991A2@AM6PR03MB5080.eurprd03.prod.outlook.com/T/#t
^ permalink raw reply [flat|nested] 21+ messages in thread
* [PATCH bpf-next v6 5/5] selftests/bpf: Add tests for bpf_fget_task() kfunc
2024-12-17 23:34 [PATCH bpf-next v6 0/5] bpf: Add open-coded style process file iterator and bpf_fget_task() kfunc Juntong Deng
` (3 preceding siblings ...)
2024-12-17 23:37 ` [PATCH bpf-next v6 4/5] bpf: Make fs kfuncs available for SYSCALL program type Juntong Deng
@ 2024-12-17 23:37 ` Juntong Deng
2024-12-19 16:11 ` [PATCH bpf-next v6 0/5] bpf: Add open-coded style process file iterator and " Yonghong Song
5 siblings, 0 replies; 21+ messages in thread
From: Juntong Deng @ 2024-12-17 23:37 UTC (permalink / raw)
To: ast, daniel, john.fastabend, andrii, martin.lau, eddyz87, song,
yonghong.song, kpsingh, sdf, haoluo, jolsa, memxor, snorcht,
brauner
Cc: bpf, linux-kernel, linux-fsdevel
This patch adds test cases for the bpf_fget_task() kfunc.
test_bpf_fget_task tests obtaining a struct file from a file descriptor
in the current process.
bpf_fget_task_null_task and bpf_fget_task_untrusted_task test the
failure cases of passing a NULL or untrusted task pointer as the
argument.
Signed-off-by: Juntong Deng <juntong.deng@outlook.com>
---
.../testing/selftests/bpf/bpf_experimental.h | 8 +++
.../selftests/bpf/prog_tests/fs_kfuncs.c | 46 ++++++++++++++
.../selftests/bpf/progs/fs_kfuncs_failure.c | 33 ++++++++++
.../selftests/bpf/progs/test_fget_task.c | 63 +++++++++++++++++++
4 files changed, 150 insertions(+)
create mode 100644 tools/testing/selftests/bpf/progs/fs_kfuncs_failure.c
create mode 100644 tools/testing/selftests/bpf/progs/test_fget_task.c
diff --git a/tools/testing/selftests/bpf/bpf_experimental.h b/tools/testing/selftests/bpf/bpf_experimental.h
index ce1520c56b55..e0c9e7d9ba0a 100644
--- a/tools/testing/selftests/bpf/bpf_experimental.h
+++ b/tools/testing/selftests/bpf/bpf_experimental.h
@@ -221,6 +221,14 @@ extern void bpf_put_file(struct file *file) __ksym;
*/
extern int bpf_path_d_path(struct path *path, char *buf, size_t buf__sz) __ksym;
+/* Description
+ * Get a pointer to the struct file corresponding to the task file descriptor
+ * Note that this function acquires a reference to struct file.
+ * Returns
+ * The corresponding struct file pointer if found, otherwise returns NULL
+ */
+extern struct file *bpf_fget_task(struct task_struct *task, unsigned int fd) __ksym;
+
/* This macro must be used to mark the exception callback corresponding to the
* main program. For example:
*
diff --git a/tools/testing/selftests/bpf/prog_tests/fs_kfuncs.c b/tools/testing/selftests/bpf/prog_tests/fs_kfuncs.c
index 5a0b51157451..89f5e09672b3 100644
--- a/tools/testing/selftests/bpf/prog_tests/fs_kfuncs.c
+++ b/tools/testing/selftests/bpf/prog_tests/fs_kfuncs.c
@@ -9,6 +9,8 @@
#include <test_progs.h>
#include "test_get_xattr.skel.h"
#include "test_fsverity.skel.h"
+#include "test_fget_task.skel.h"
+#include "fs_kfuncs_failure.skel.h"
static const char testfile[] = "/tmp/test_progs_fs_kfuncs";
@@ -139,6 +141,45 @@ static void test_fsverity(void)
remove(testfile);
}
+static void test_fget_task(void)
+{
+ int pipefd[2], prog_fd, err;
+ struct test_fget_task *skel;
+ struct bpf_program *prog;
+
+ skel = test_fget_task__open_and_load();
+ if (!ASSERT_OK_PTR(skel, "open_and_load"))
+ return;
+
+ if (!ASSERT_OK(skel->bss->err, "pre_test_err"))
+ goto cleanup_skel;
+
+ prog = bpf_object__find_program_by_name(skel->obj, "test_bpf_fget_task");
+ if (!ASSERT_OK_PTR(prog, "find_program_by_name"))
+ goto cleanup_skel;
+
+ prog_fd = bpf_program__fd(prog);
+ if (!ASSERT_GT(prog_fd, -1, "bpf_program__fd"))
+ goto cleanup_skel;
+
+ if (pipe(pipefd) < 0)
+ goto cleanup_skel;
+
+ skel->bss->test_fd1 = pipefd[0];
+ skel->bss->test_fd2 = pipefd[1];
+
+ err = bpf_prog_test_run_opts(prog_fd, NULL);
+ if (!ASSERT_OK(err, "prog_test_run"))
+ goto cleanup_pipe;
+
+ ASSERT_OK(skel->bss->err, "run_bpf_fget_task_test_failure");
+cleanup_pipe:
+ close(pipefd[0]);
+ close(pipefd[1]);
+cleanup_skel:
+ test_fget_task__destroy(skel);
+}
+
void test_fs_kfuncs(void)
{
if (test__start_subtest("xattr"))
@@ -146,4 +187,9 @@ void test_fs_kfuncs(void)
if (test__start_subtest("fsverity"))
test_fsverity();
+
+ if (test__start_subtest("fget_task"))
+ test_fget_task();
+
+ RUN_TESTS(fs_kfuncs_failure);
}
diff --git a/tools/testing/selftests/bpf/progs/fs_kfuncs_failure.c b/tools/testing/selftests/bpf/progs/fs_kfuncs_failure.c
new file mode 100644
index 000000000000..57aa6d2787ac
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/fs_kfuncs_failure.c
@@ -0,0 +1,33 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include "vmlinux.h"
+#include <bpf/bpf_tracing.h>
+#include <bpf/bpf_helpers.h>
+#include "bpf_misc.h"
+#include "bpf_experimental.h"
+
+char _license[] SEC("license") = "GPL";
+
+SEC("syscall")
+__failure __msg("Possibly NULL pointer passed to trusted arg0")
+int bpf_fget_task_null_task(void *ctx)
+{
+ struct task_struct *task = NULL;
+
+ bpf_fget_task(task, 1);
+
+ return 0;
+}
+
+SEC("syscall")
+__failure __msg("R1 must be referenced or trusted")
+int bpf_fget_task_untrusted_task(void *ctx)
+{
+ struct task_struct *task;
+
+ task = bpf_get_current_task_btf()->parent;
+
+ bpf_fget_task(task, 1);
+
+ return 0;
+}
diff --git a/tools/testing/selftests/bpf/progs/test_fget_task.c b/tools/testing/selftests/bpf/progs/test_fget_task.c
new file mode 100644
index 000000000000..fee5d5e1244a
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_fget_task.c
@@ -0,0 +1,63 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include "vmlinux.h"
+#include <bpf/bpf_tracing.h>
+#include <bpf/bpf_helpers.h>
+#include "bpf_misc.h"
+#include "bpf_experimental.h"
+#include "task_kfunc_common.h"
+
+char _license[] SEC("license") = "GPL";
+
+int err, test_fd1, test_fd2;
+
+extern const void pipefifo_fops __ksym;
+
+SEC("syscall")
+int test_bpf_fget_task(void *ctx)
+{
+ struct task_struct *task;
+ struct file *file;
+
+ task = bpf_get_current_task_btf();
+ if (task == NULL) {
+ err = 1;
+ return 0;
+ }
+
+ file = bpf_fget_task(task, test_fd1);
+ if (file == NULL) {
+ err = 2;
+ return 0;
+ }
+
+ if (file->f_op != &pipefifo_fops) {
+ err = 3;
+ bpf_put_file(file);
+ return 0;
+ }
+
+ bpf_put_file(file);
+
+ file = bpf_fget_task(task, test_fd2);
+ if (file == NULL) {
+ err = 4;
+ return 0;
+ }
+
+ if (file->f_op != &pipefifo_fops) {
+ err = 5;
+ bpf_put_file(file);
+ return 0;
+ }
+
+ bpf_put_file(file);
+
+ file = bpf_fget_task(task, 9999);
+ if (file != NULL) {
+ err = 6;
+ bpf_put_file(file);
+ }
+
+ return 0;
+}
--
2.39.5
* Re: [PATCH bpf-next v6 0/5] bpf: Add open-coded style process file iterator and bpf_fget_task() kfunc
2024-12-17 23:34 [PATCH bpf-next v6 0/5] bpf: Add open-coded style process file iterator and bpf_fget_task() kfunc Juntong Deng
` (4 preceding siblings ...)
2024-12-17 23:37 ` [PATCH bpf-next v6 5/5] selftests/bpf: Add tests for bpf_fget_task() kfunc Juntong Deng
@ 2024-12-19 16:11 ` Yonghong Song
2024-12-24 0:42 ` Juntong Deng
5 siblings, 1 reply; 21+ messages in thread
From: Yonghong Song @ 2024-12-19 16:11 UTC (permalink / raw)
To: Juntong Deng, ast, daniel, john.fastabend, andrii, martin.lau,
eddyz87, song, kpsingh, sdf, haoluo, jolsa, memxor, snorcht,
brauner
Cc: bpf, linux-kernel, linux-fsdevel
On 12/17/24 3:34 PM, Juntong Deng wrote:
> This patch series adds open-coded style process file iterator
> bpf_iter_task_file and bpf_fget_task() kfunc, and corresponding
> selftests test cases.
>
> In addition, since the fs kfuncs are generic and useful in scenarios
> other than LSM, this series also makes them available to the SYSCALL
> program type.
>
> Although iter/task_file already exists, for CRIB we still need the
> open-coded iterator style process file iterator, and the same is true
> for other bpf iterators such as iter/tcp, iter/udp, etc.
>
> The traditional bpf iterator is more like a bpf version of procfs, but
> similar to procfs, it is not suitable for CRIB scenarios that need to
> obtain large amounts of complex, multi-level in-kernel information.
>
> The following is from previous discussions [1].
>
> [1]: https://lore.kernel.org/bpf/AM6PR03MB5848CA34B5B68C90F210285E99B12@AM6PR03MB5848.eurprd03.prod.outlook.com/
>
> This is because the context of bpf iterators is fixed and bpf iterators
> cannot be nested. This means that a bpf iterator program can only
> complete a specific small iterative dump task, and cannot dump
> multi-level data.
>
> For example, to dump all the sockets of a process, we need to iterate
> over all the files (sockets) of the process, then over all the packets
> in the queue of each socket, and then over all the data in each packet.
>
> If we use bpf iterators, since iterators cannot be nested, we need a
> socket iterator program to get the basic information of all sockets
> (passing pid as a filter), then a packet iterator program to get the
> basic information of all packets of a specific socket (passing pid and
> fd as filters), and then a packet data iterator program to get all the
> data of a specific packet (passing pid, fd and packet index as
> filters).
>
> This would be complicated and require a bpf program startup and exit
> for each iteration, leading to poor performance.
>
> By comparison, open-coded iterators are much more flexible: we can
> iterate in any context, at any time, and iterations can be nested, so
> we can achieve more flexible and more elegant dumping.
>
> With open coded iterators, all of the above can be done in a single
> bpf program, and with nested iterators, everything becomes compact
> and simple.
>
> Also, bpf iterators transmit data to user space through seq_file,
> which involves many open (bpf_iter_create), read and close syscalls,
> context switches and memory copies, and cannot match the performance
> of a ringbuf.
>
> Signed-off-by: Juntong Deng <juntong.deng@outlook.com>
> ---
> v5 -> v6:
> * Remove local variable in bpf_fget_task.
>
> * Remove KF_RCU_PROTECTED from bpf_iter_task_file_new.
>
> * Remove bpf_fs_kfunc_set from being available for TRACING.
>
> * Use get_task_struct in bpf_iter_task_file_new.
>
> * Use put_task_struct in bpf_iter_task_file_destroy.
>
> v4 -> v5:
> * Add file type checks in test cases for process file iterator
> and bpf_fget_task().
>
> * Use fentry to synchronize tests instead of waiting in a loop.
>
> * Remove path_d_path_kfunc_non_lsm test case.
>
> * Replace task_lookup_next_fdget_rcu() with fget_task_next().
>
> * Remove future merge conflict section in cover letter (resolved).
>
> v3 -> v4:
> * Make all kfuncs generic, not CRIB specific.
>
> * Move bpf_fget_task to fs/bpf_fs_kfuncs.c.
>
> * Remove bpf_iter_task_file_get_fd and bpf_get_file_ops_type.
>
> * Use struct bpf_iter_task_file_item * as the return value of
> bpf_iter_task_file_next.
>
> * Change fd to unsigned int type and add next_fd.
>
> * Add KF_RCU_PROTECTED to bpf_iter_task_file_new.
>
> * Make fs kfuncs available to SYSCALL and TRACING program types.
>
> * Update all relevant test cases.
>
> * Remove the discussion section from cover letter.
>
> v2 -> v3:
> * Move task_file open-coded iterator to kernel/bpf/helpers.c.
>
> * Fix duplicate error code 7 in test_bpf_iter_task_file().
>
> * Add comment for case when bpf_iter_task_file_get_fd() returns -1.
>
> * Add future plans in commit message of "Add struct file related
> CRIB kfuncs".
>
> * Add Discussion section to cover letter.
>
> v1 -> v2:
> * Fix a type definition error in the fd parameter of
> bpf_fget_task() at crib_common.h.
>
> Juntong Deng (5):
> bpf: Introduce task_file open-coded iterator kfuncs
> selftests/bpf: Add tests for open-coded style process file iterator
> bpf: Add bpf_fget_task() kfunc
> bpf: Make fs kfuncs available for SYSCALL program type
> selftests/bpf: Add tests for bpf_fget_task() kfunc
>
> fs/bpf_fs_kfuncs.c | 38 ++++----
> kernel/bpf/helpers.c | 3 +
> kernel/bpf/task_iter.c | 91 +++++++++++++++++++
> .../testing/selftests/bpf/bpf_experimental.h | 15 +++
> .../selftests/bpf/prog_tests/fs_kfuncs.c | 46 ++++++++++
> .../testing/selftests/bpf/prog_tests/iters.c | 79 ++++++++++++++++
> .../selftests/bpf/progs/fs_kfuncs_failure.c | 33 +++++++
> .../selftests/bpf/progs/iters_task_file.c | 86 ++++++++++++++++++
> .../bpf/progs/iters_task_file_failure.c | 91 +++++++++++++++++++
> .../selftests/bpf/progs/test_fget_task.c | 63 +++++++++++++
> .../selftests/bpf/progs/verifier_vfs_reject.c | 10 --
> 11 files changed, 529 insertions(+), 26 deletions(-)
> create mode 100644 tools/testing/selftests/bpf/progs/fs_kfuncs_failure.c
> create mode 100644 tools/testing/selftests/bpf/progs/iters_task_file.c
> create mode 100644 tools/testing/selftests/bpf/progs/iters_task_file_failure.c
> create mode 100644 tools/testing/selftests/bpf/progs/test_fget_task.c
There are quite some CI failures.
https://github.com/kernel-patches/bpf/actions/runs/12403224240/job/34626610882?pr=8266
Please investigate.
* Re: [PATCH bpf-next v6 0/5] bpf: Add open-coded style process file iterator and bpf_fget_task() kfunc
2024-12-19 16:11 ` [PATCH bpf-next v6 0/5] bpf: Add open-coded style process file iterator and " Yonghong Song
@ 2024-12-24 0:42 ` Juntong Deng
0 siblings, 0 replies; 21+ messages in thread
From: Juntong Deng @ 2024-12-24 0:42 UTC (permalink / raw)
To: Yonghong Song, ast, daniel, john.fastabend, andrii, martin.lau,
eddyz87, song, kpsingh, sdf, haoluo, jolsa, memxor, snorcht,
brauner
Cc: bpf, linux-kernel, linux-fsdevel
On 2024/12/19 16:11, Yonghong Song wrote:
>
>
>
> On 12/17/24 3:34 PM, Juntong Deng wrote:
>> [...]
>
> There are quite some CI failures.
>
> https://github.com/kernel-patches/bpf/actions/runs/12403224240/
> job/34626610882?pr=8266
>
> Please investigate.
>
>
Thanks for your reply.
I noticed it; I will fix it in the next version.