public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH bpf-next v5 0/5] bpf: Add open-coded style process file iterator and bpf_fget_task() kfunc
@ 2024-12-10 14:01 Juntong Deng
  2024-12-10 14:03 ` [PATCH bpf-next v5 1/5] bpf: Introduce task_file open-coded iterator kfuncs Juntong Deng
                   ` (5 more replies)
  0 siblings, 6 replies; 25+ messages in thread
From: Juntong Deng @ 2024-12-10 14:01 UTC (permalink / raw)
  To: ast, daniel, john.fastabend, andrii, martin.lau, eddyz87, song,
	yonghong.song, kpsingh, sdf, haoluo, jolsa, memxor, snorcht,
	brauner
  Cc: bpf, linux-kernel, linux-fsdevel

This patch series adds open-coded style process file iterator
bpf_iter_task_file and bpf_fget_task() kfunc, and corresponding
selftests test cases.

In addition, since fs kfuncs is generic and useful for scenarios
other than LSM, this patch makes fs kfuncs available for SYSCALL
and TRACING program types [0].

[0]: https://lore.kernel.org/bpf/CAPhsuW6ud21v2xz8iSXf=CiDL+R_zpQ+p8isSTMTw=EiJQtRSw@mail.gmail.com/

Although iter/task_file already exists, for CRIB we still need the
open-coded iterator style process file iterator, and the same is true
for other bpf iterators such as iter/tcp, iter/udp, etc.

The traditional bpf iterator is more like a bpf version of procfs, but
similar to procfs, it is not suitable for CRIB scenarios that need to
obtain large amounts of complex, multi-level in-kernel information.

The following is from previous discussions [2].

[2]: https://lore.kernel.org/bpf/AM6PR03MB5848CA34B5B68C90F210285E99B12@AM6PR03MB5848.eurprd03.prod.outlook.com/

This is because the context of bpf iterators is fixed and bpf iterators
cannot be nested. This means that a bpf iterator program can only
complete a specific small iterative dump task, and cannot dump
multi-level data.

An example, when we need to dump all the sockets of a process, we need
to iterate over all the files (sockets) of the process, and iterate over
the all packets in the queue of each socket, and iterate over all data
in each packet.

If we use bpf iterator, since the iterator can not be nested, we need to
use socket iterator program to get all the basic information of all
sockets (pass pid as filter), and then use packet iterator program to
get the basic information of all packets of a specific socket (pass pid,
fd as filter), and then use packet data iterator program to get all the
data of a specific packet (pass pid, fd, packet index as filter).

This would be complicated and require a lot of (each iteration)
bpf program startup and exit (leading to poor performance).

By comparison, open coded iterator is much more flexible, we can iterate
in any context, at any time, and iteration can be nested, so we can
achieve more flexible and more elegant dumping through open coded
iterators.

With open coded iterators, all of the above can be done in a single
bpf program, and with nested iterators, everything becomes compact
and simple.

Also, bpf iterators transmit data to user space through seq_file,
which involves a lot of open (bpf_iter_create), read, close syscalls,
context switching, memory copying, and cannot achieve the performance
of using ringbuf.

Signed-off-by: Juntong Deng <juntong.deng@outlook.com>
---
v4 -> v5:
* Add file type checks in test cases for process file iterator
  and bpf_fget_task().

* Use fentry to synchronize tests instead of waiting in a loop.

* Remove path_d_path_kfunc_non_lsm test case.

* Replace task_lookup_next_fdget_rcu() with fget_task_next().

* Remove future merge conflict section in cover letter (resolved).

v3 -> v4:
* Make all kfuncs generic, not CRIB specific.

* Move bpf_fget_task to fs/bpf_fs_kfuncs.c.

* Remove bpf_iter_task_file_get_fd and bpf_get_file_ops_type.

* Use struct bpf_iter_task_file_item * as the return value of
  bpf_iter_task_file_next.

* Change fd to unsigned int type and add next_fd.

* Add KF_RCU_PROTECTED to bpf_iter_task_file_new.

* Make fs kfuncs available to SYSCALL and TRACING program types.

* Update all relevant test cases.

* Remove the discussion section from cover letter.

v2 -> v3:
* Move task_file open-coded iterator to kernel/bpf/helpers.c.

* Fix duplicate error code 7 in test_bpf_iter_task_file().

* Add comment for case when bpf_iter_task_file_get_fd() returns -1.

* Add future plans in commit message of "Add struct file related
  CRIB kfuncs".

* Add Discussion section to cover letter.

v1 -> v2:
* Fix a type definition error in the fd parameter of
  bpf_fget_task() at crib_common.h.

Juntong Deng (5):
  bpf: Introduce task_file open-coded iterator kfuncs
  selftests/bpf: Add tests for open-coded style process file iterator
  bpf: Add bpf_fget_task() kfunc
  bpf: Make fs kfuncs available for SYSCALL and TRACING program types
  selftests/bpf: Add tests for bpf_fget_task() kfunc

 fs/bpf_fs_kfuncs.c                            |  42 ++++---
 kernel/bpf/helpers.c                          |   3 +
 kernel/bpf/task_iter.c                        |  92 ++++++++++++++
 .../testing/selftests/bpf/bpf_experimental.h  |  15 +++
 .../selftests/bpf/prog_tests/fs_kfuncs.c      |  46 +++++++
 .../testing/selftests/bpf/prog_tests/iters.c  |  79 ++++++++++++
 .../selftests/bpf/progs/fs_kfuncs_failure.c   |  33 +++++
 .../selftests/bpf/progs/iters_task_file.c     |  88 ++++++++++++++
 .../bpf/progs/iters_task_file_failure.c       | 114 ++++++++++++++++++
 .../selftests/bpf/progs/test_fget_task.c      |  63 ++++++++++
 .../selftests/bpf/progs/verifier_vfs_reject.c |  10 --
 11 files changed, 559 insertions(+), 26 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/progs/fs_kfuncs_failure.c
 create mode 100644 tools/testing/selftests/bpf/progs/iters_task_file.c
 create mode 100644 tools/testing/selftests/bpf/progs/iters_task_file_failure.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_fget_task.c

-- 
2.39.5


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH bpf-next v5 1/5] bpf: Introduce task_file open-coded iterator kfuncs
  2024-12-10 14:01 [PATCH bpf-next v5 0/5] bpf: Add open-coded style process file iterator and bpf_fget_task() kfunc Juntong Deng
@ 2024-12-10 14:03 ` Juntong Deng
  2024-12-10 14:03 ` [PATCH bpf-next v5 2/5] selftests/bpf: Add tests for open-coded style process file iterator Juntong Deng
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 25+ messages in thread
From: Juntong Deng @ 2024-12-10 14:03 UTC (permalink / raw)
  To: ast, daniel, john.fastabend, andrii, martin.lau, eddyz87, song,
	yonghong.song, kpsingh, sdf, haoluo, jolsa, memxor, snorcht,
	brauner
  Cc: bpf, linux-kernel, linux-fsdevel

This patch adds the open-coded iterator style process file iterator
kfuncs bpf_iter_task_file_{new,next,destroy} that iterates over all
files opened by the specified process.

bpf_iter_task_file_next returns a pointer to bpf_iter_task_file_item,
which currently contains *task, *file, fd. This is an extensible
structure that enables compatibility with different versions
through CO-RE.

The reference to struct file acquired by the previous
bpf_iter_task_file_next() is released in the next
bpf_iter_task_file_next(), and the last reference is released in the
last bpf_iter_task_file_next() that returns NULL.

In the bpf_iter_task_file_destroy(), if the iterator does not iterate to
the end, then the last struct file reference is released at this time.

Signed-off-by: Juntong Deng <juntong.deng@outlook.com>
---
 kernel/bpf/helpers.c   |  3 ++
 kernel/bpf/task_iter.c | 92 ++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 95 insertions(+)

diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 532ea74d4850..b2a73b49d1a9 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -3145,6 +3145,9 @@ BTF_ID_FLAGS(func, bpf_iter_css_destroy, KF_ITER_DESTROY)
 BTF_ID_FLAGS(func, bpf_iter_task_new, KF_ITER_NEW | KF_TRUSTED_ARGS | KF_RCU_PROTECTED)
 BTF_ID_FLAGS(func, bpf_iter_task_next, KF_ITER_NEXT | KF_RET_NULL)
 BTF_ID_FLAGS(func, bpf_iter_task_destroy, KF_ITER_DESTROY)
+BTF_ID_FLAGS(func, bpf_iter_task_file_new, KF_ITER_NEW | KF_TRUSTED_ARGS | KF_RCU_PROTECTED)
+BTF_ID_FLAGS(func, bpf_iter_task_file_next, KF_ITER_NEXT | KF_RET_NULL)
+BTF_ID_FLAGS(func, bpf_iter_task_file_destroy, KF_ITER_DESTROY)
 BTF_ID_FLAGS(func, bpf_dynptr_adjust)
 BTF_ID_FLAGS(func, bpf_dynptr_is_null)
 BTF_ID_FLAGS(func, bpf_dynptr_is_rdonly)
diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c
index 98d9b4c0daff..133eacbff92a 100644
--- a/kernel/bpf/task_iter.c
+++ b/kernel/bpf/task_iter.c
@@ -1027,6 +1027,98 @@ __bpf_kfunc void bpf_iter_task_destroy(struct bpf_iter_task *it)
 {
 }
 
+struct bpf_iter_task_file_item {
+	struct task_struct *task;
+	struct file *file;
+	unsigned int fd;
+} __aligned(8);
+
+struct bpf_iter_task_file {
+	__u64 __opaque[4];
+} __aligned(8);
+
+struct bpf_iter_task_file_kern {
+	struct bpf_iter_task_file_item item;
+	unsigned int next_fd;
+} __aligned(8);
+
+/**
+ * bpf_iter_task_file_new() - Initialize a new task file iterator for a task,
+ * used to iterate over all files opened by a specified task
+ *
+ * Note that task file iterator requires to be used within RCU CS.
+ *
+ * @it: the new bpf_iter_task_file to be created
+ * @task: a pointer pointing to a task to be iterated over
+ */
+__bpf_kfunc int bpf_iter_task_file_new(struct bpf_iter_task_file *it,
+		struct task_struct *task)
+{
+	struct bpf_iter_task_file_kern *kit = (void *)it;
+	struct bpf_iter_task_file_item *item = &kit->item;
+
+	BUILD_BUG_ON(sizeof(struct bpf_iter_task_file_kern) > sizeof(struct bpf_iter_task_file));
+	BUILD_BUG_ON(__alignof__(struct bpf_iter_task_file_kern) !=
+		     __alignof__(struct bpf_iter_task_file));
+
+	item->task = task;
+	item->file = NULL;
+	item->fd = 0;
+	kit->next_fd = 0;
+
+	return 0;
+}
+
+/**
+ * bpf_iter_task_file_next() - Get the next file in bpf_iter_task_file
+ *
+ * bpf_iter_task_file_next acquires a reference to the struct file.
+ *
+ * The reference to struct file acquired by the previous
+ * bpf_iter_task_file_next() is released in the next bpf_iter_task_file_next(),
+ * and the last reference is released in the last bpf_iter_task_file_next()
+ * that returns NULL.
+ *
+ * @it: the bpf_iter_task_file to be checked
+ *
+ * @returns a pointer to bpf_iter_task_file_item
+ */
+__bpf_kfunc struct bpf_iter_task_file_item *bpf_iter_task_file_next(struct bpf_iter_task_file *it)
+{
+	struct bpf_iter_task_file_kern *kit = (void *)it;
+	struct bpf_iter_task_file_item *item = &kit->item;
+
+	if (item->file)
+		fput(item->file);
+
+	item->file = fget_task_next(item->task, &kit->next_fd);
+	item->fd = kit->next_fd;
+
+	kit->next_fd++;
+
+	if (!item->file)
+		return NULL;
+
+	return item;
+}
+
+/**
+ * bpf_iter_task_file_destroy() - Destroy a bpf_iter_task_file
+ *
+ * If the iterator does not iterate to the end, then the last
+ * struct file reference is released at this time.
+ *
+ * @it: the bpf_iter_task_file to be destroyed
+ */
+__bpf_kfunc void bpf_iter_task_file_destroy(struct bpf_iter_task_file *it)
+{
+	struct bpf_iter_task_file_kern *kit = (void *)it;
+	struct bpf_iter_task_file_item *item = &kit->item;
+
+	if (item->file)
+		fput(item->file);
+}
+
 __bpf_kfunc_end_defs();
 
 DEFINE_PER_CPU(struct mmap_unlock_irq_work, mmap_unlock_work);
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH bpf-next v5 2/5] selftests/bpf: Add tests for open-coded style process file iterator
  2024-12-10 14:01 [PATCH bpf-next v5 0/5] bpf: Add open-coded style process file iterator and bpf_fget_task() kfunc Juntong Deng
  2024-12-10 14:03 ` [PATCH bpf-next v5 1/5] bpf: Introduce task_file open-coded iterator kfuncs Juntong Deng
@ 2024-12-10 14:03 ` Juntong Deng
  2024-12-10 14:37   ` Christian Brauner
  2024-12-10 14:03 ` [PATCH bpf-next v5 3/5] bpf: Add bpf_fget_task() kfunc Juntong Deng
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 25+ messages in thread
From: Juntong Deng @ 2024-12-10 14:03 UTC (permalink / raw)
  To: ast, daniel, john.fastabend, andrii, martin.lau, eddyz87, song,
	yonghong.song, kpsingh, sdf, haoluo, jolsa, memxor, snorcht,
	brauner
  Cc: bpf, linux-kernel, linux-fsdevel

This patch adds test cases for open-coded style process file iterator.

Test cases related to process files are run in the newly created child
process. Close all opened files inherited from the parent process in
the child process to avoid the files opened by the parent process
affecting the test results.

In addition, this patch adds failure test cases where bpf programs
cannot pass the verifier due to uninitialized or untrusted
arguments, or not in RCU CS, etc.

Signed-off-by: Juntong Deng <juntong.deng@outlook.com>
---
 .../testing/selftests/bpf/bpf_experimental.h  |   7 ++
 .../testing/selftests/bpf/prog_tests/iters.c  |  79 ++++++++++++
 .../selftests/bpf/progs/iters_task_file.c     |  88 ++++++++++++++
 .../bpf/progs/iters_task_file_failure.c       | 114 ++++++++++++++++++
 4 files changed, 288 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/progs/iters_task_file.c
 create mode 100644 tools/testing/selftests/bpf/progs/iters_task_file_failure.c

diff --git a/tools/testing/selftests/bpf/bpf_experimental.h b/tools/testing/selftests/bpf/bpf_experimental.h
index cd8ecd39c3f3..ce1520c56b55 100644
--- a/tools/testing/selftests/bpf/bpf_experimental.h
+++ b/tools/testing/selftests/bpf/bpf_experimental.h
@@ -588,4 +588,11 @@ extern int bpf_iter_kmem_cache_new(struct bpf_iter_kmem_cache *it) __weak __ksym
 extern struct kmem_cache *bpf_iter_kmem_cache_next(struct bpf_iter_kmem_cache *it) __weak __ksym;
 extern void bpf_iter_kmem_cache_destroy(struct bpf_iter_kmem_cache *it) __weak __ksym;
 
+struct bpf_iter_task_file;
+struct bpf_iter_task_file_item;
+extern int bpf_iter_task_file_new(struct bpf_iter_task_file *it, struct task_struct *task) __ksym;
+extern struct bpf_iter_task_file_item *
+bpf_iter_task_file_next(struct bpf_iter_task_file *it) __ksym;
+extern void bpf_iter_task_file_destroy(struct bpf_iter_task_file *it) __ksym;
+
 #endif
diff --git a/tools/testing/selftests/bpf/prog_tests/iters.c b/tools/testing/selftests/bpf/prog_tests/iters.c
index 3cea71f9c500..cfe5b56cc027 100644
--- a/tools/testing/selftests/bpf/prog_tests/iters.c
+++ b/tools/testing/selftests/bpf/prog_tests/iters.c
@@ -1,6 +1,8 @@
 // SPDX-License-Identifier: GPL-2.0
 /* Copyright (c) 2023 Meta Platforms, Inc. and affiliates. */
 
+#define _GNU_SOURCE
+#include <sys/socket.h>
 #include <sys/syscall.h>
 #include <sys/mman.h>
 #include <sys/wait.h>
@@ -16,11 +18,13 @@
 #include "iters_num.skel.h"
 #include "iters_testmod.skel.h"
 #include "iters_testmod_seq.skel.h"
+#include "iters_task_file.skel.h"
 #include "iters_task_vma.skel.h"
 #include "iters_task.skel.h"
 #include "iters_css_task.skel.h"
 #include "iters_css.skel.h"
 #include "iters_task_failure.skel.h"
+#include "iters_task_file_failure.skel.h"
 
 static void subtest_num_iters(void)
 {
@@ -291,6 +295,78 @@ static void subtest_css_iters(void)
 	iters_css__destroy(skel);
 }
 
+static int task_file_test_process(void *args)
+{
+	int pipefd[2], sockfd, err = 0;
+
+	/* Create a clean file descriptor table for the test process */
+	close_range(0, ~0U, 0);
+
+	if (pipe(pipefd) < 0)
+		return 1;
+
+	sockfd = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
+	if (sockfd < 0) {
+		err = 2;
+		goto cleanup_pipe;
+	}
+
+	usleep(1);
+
+	close(sockfd);
+cleanup_pipe:
+	close(pipefd[0]);
+	close(pipefd[1]);
+	return err;
+}
+
+static void subtest_task_file_iters(void)
+{
+	const int stack_size = 1024 * 1024;
+	struct iters_task_file *skel;
+	int child_pid, wstatus, err;
+	char *stack;
+
+	skel = iters_task_file__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "open_and_load"))
+		return;
+
+	if (!ASSERT_OK(skel->bss->err, "pre_test_err"))
+		goto cleanup_skel;
+
+	skel->bss->parent_pid = getpid();
+	skel->bss->count = 0;
+
+	err = iters_task_file__attach(skel);
+	if (!ASSERT_OK(err, "skel_attach"))
+		goto cleanup_skel;
+
+	stack = (char *)malloc(stack_size);
+	if (!ASSERT_OK_PTR(stack, "clone_stack"))
+		goto cleanup_attach;
+
+	/* Note that there is no CLONE_FILES */
+	child_pid = clone(task_file_test_process, stack + stack_size, CLONE_VM | SIGCHLD, NULL);
+	if (!ASSERT_GT(child_pid, -1, "child_pid"))
+		goto cleanup_stack;
+
+	if (!ASSERT_GT(waitpid(child_pid, &wstatus, 0), -1, "waitpid"))
+		goto cleanup_stack;
+
+	if (!ASSERT_OK(WEXITSTATUS(wstatus), "run_task_file_iters_test_err"))
+		goto cleanup_stack;
+
+	ASSERT_EQ(skel->bss->count, 1, "run_task_file_iters_test_count_err");
+	ASSERT_OK(skel->bss->err, "run_task_file_iters_test_failure");
+
+cleanup_stack:
+	free(stack);
+cleanup_attach:
+	iters_task_file__detach(skel);
+cleanup_skel:
+	iters_task_file__destroy(skel);
+}
+
 void test_iters(void)
 {
 	RUN_TESTS(iters_state_safety);
@@ -315,5 +391,8 @@ void test_iters(void)
 		subtest_css_task_iters();
 	if (test__start_subtest("css"))
 		subtest_css_iters();
+	if (test__start_subtest("task_file"))
+		subtest_task_file_iters();
 	RUN_TESTS(iters_task_failure);
+	RUN_TESTS(iters_task_file_failure);
 }
diff --git a/tools/testing/selftests/bpf/progs/iters_task_file.c b/tools/testing/selftests/bpf/progs/iters_task_file.c
new file mode 100644
index 000000000000..81bcd20041d8
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/iters_task_file.c
@@ -0,0 +1,88 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include "vmlinux.h"
+#include <bpf/bpf_tracing.h>
+#include <bpf/bpf_helpers.h>
+#include "bpf_misc.h"
+#include "bpf_experimental.h"
+#include "task_kfunc_common.h"
+
+char _license[] SEC("license") = "GPL";
+
+int err, parent_pid, count;
+
+extern const void pipefifo_fops __ksym;
+extern const void socket_file_ops __ksym;
+
+SEC("fentry/" SYS_PREFIX "sys_nanosleep")
+int test_bpf_iter_task_file(void *ctx)
+{
+	struct bpf_iter_task_file task_file_it;
+	struct bpf_iter_task_file_item *item;
+	struct task_struct *task;
+
+	task = bpf_get_current_task_btf();
+	if (task->parent->pid != parent_pid)
+		return 0;
+
+	count++;
+
+	bpf_rcu_read_lock();
+	bpf_iter_task_file_new(&task_file_it, task);
+
+	item = bpf_iter_task_file_next(&task_file_it);
+	if (item == NULL) {
+		err = 1;
+		goto cleanup;
+	}
+
+	if (item->fd != 0) {
+		err = 2;
+		goto cleanup;
+	}
+
+	if (item->file->f_op != &pipefifo_fops) {
+		err = 3;
+		goto cleanup;
+	}
+
+	item = bpf_iter_task_file_next(&task_file_it);
+	if (item == NULL) {
+		err = 4;
+		goto cleanup;
+	}
+
+	if (item->fd != 1) {
+		err = 5;
+		goto cleanup;
+	}
+
+	if (item->file->f_op != &pipefifo_fops) {
+		err = 6;
+		goto cleanup;
+	}
+
+	item = bpf_iter_task_file_next(&task_file_it);
+	if (item == NULL) {
+		err = 7;
+		goto cleanup;
+	}
+
+	if (item->fd != 2) {
+		err = 8;
+		goto cleanup;
+	}
+
+	if (item->file->f_op != &socket_file_ops) {
+		err = 9;
+		goto cleanup;
+	}
+
+	item = bpf_iter_task_file_next(&task_file_it);
+	if (item != NULL)
+		err = 10;
+cleanup:
+	bpf_iter_task_file_destroy(&task_file_it);
+	bpf_rcu_read_unlock();
+	return 0;
+}
diff --git a/tools/testing/selftests/bpf/progs/iters_task_file_failure.c b/tools/testing/selftests/bpf/progs/iters_task_file_failure.c
new file mode 100644
index 000000000000..c3de9235b888
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/iters_task_file_failure.c
@@ -0,0 +1,114 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include "vmlinux.h"
+#include <bpf/bpf_tracing.h>
+#include <bpf/bpf_helpers.h>
+#include "bpf_misc.h"
+#include "bpf_experimental.h"
+#include "task_kfunc_common.h"
+
+char _license[] SEC("license") = "GPL";
+
+SEC("syscall")
+__failure __msg("expected an RCU CS when using bpf_iter_task_file")
+int bpf_iter_task_file_new_without_rcu_lock(void *ctx)
+{
+	struct bpf_iter_task_file task_file_it;
+	struct task_struct *task;
+
+	task = bpf_get_current_task_btf();
+
+	bpf_iter_task_file_new(&task_file_it, task);
+
+	bpf_iter_task_file_destroy(&task_file_it);
+	return 0;
+}
+
+SEC("syscall")
+__failure __msg("expected uninitialized iter_task_file as arg #1")
+int bpf_iter_task_file_new_inited_iter(void *ctx)
+{
+	struct bpf_iter_task_file task_file_it;
+	struct task_struct *task;
+
+	task = bpf_get_current_task_btf();
+
+	bpf_rcu_read_lock();
+	bpf_iter_task_file_new(&task_file_it, task);
+
+	bpf_iter_task_file_new(&task_file_it, task);
+
+	bpf_iter_task_file_destroy(&task_file_it);
+	bpf_rcu_read_unlock();
+	return 0;
+}
+
+SEC("syscall")
+__failure __msg("Possibly NULL pointer passed to trusted arg1")
+int bpf_iter_task_file_new_null_task(void *ctx)
+{
+	struct bpf_iter_task_file task_file_it;
+	struct task_struct *task = NULL;
+
+	bpf_rcu_read_lock();
+	bpf_iter_task_file_new(&task_file_it, task);
+
+	bpf_iter_task_file_destroy(&task_file_it);
+	bpf_rcu_read_unlock();
+	return 0;
+}
+
+SEC("syscall")
+__failure __msg("R2 must be referenced or trusted")
+int bpf_iter_task_file_new_untrusted_task(void *ctx)
+{
+	struct bpf_iter_task_file task_file_it;
+	struct task_struct *task;
+
+	task = bpf_get_current_task_btf()->parent;
+
+	bpf_rcu_read_lock();
+	bpf_iter_task_file_new(&task_file_it, task);
+
+	bpf_iter_task_file_destroy(&task_file_it);
+	bpf_rcu_read_unlock();
+	return 0;
+}
+
+SEC("syscall")
+__failure __msg("Unreleased reference")
+int bpf_iter_task_file_no_destory(void *ctx)
+{
+	struct bpf_iter_task_file task_file_it;
+	struct task_struct *task;
+
+	task = bpf_get_current_task_btf();
+
+	bpf_rcu_read_lock();
+	bpf_iter_task_file_new(&task_file_it, task);
+
+	bpf_rcu_read_unlock();
+	return 0;
+}
+
+SEC("syscall")
+__failure __msg("expected an initialized iter_task_file as arg #1")
+int bpf_iter_task_file_next_uninit_iter(void *ctx)
+{
+	struct bpf_iter_task_file task_file_it;
+
+	bpf_iter_task_file_next(&task_file_it);
+
+	return 0;
+}
+
+SEC("syscall")
+__failure __msg("expected an initialized iter_task_file as arg #1")
+int bpf_iter_task_file_destroy_uninit_iter(void *ctx)
+{
+	struct bpf_iter_task_file task_file_it;
+
+	bpf_iter_task_file_destroy(&task_file_it);
+
+	return 0;
+}
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH bpf-next v5 3/5] bpf: Add bpf_fget_task() kfunc
  2024-12-10 14:01 [PATCH bpf-next v5 0/5] bpf: Add open-coded style process file iterator and bpf_fget_task() kfunc Juntong Deng
  2024-12-10 14:03 ` [PATCH bpf-next v5 1/5] bpf: Introduce task_file open-coded iterator kfuncs Juntong Deng
  2024-12-10 14:03 ` [PATCH bpf-next v5 2/5] selftests/bpf: Add tests for open-coded style process file iterator Juntong Deng
@ 2024-12-10 14:03 ` Juntong Deng
  2024-12-10 14:28   ` Christian Brauner
  2024-12-10 14:03 ` [PATCH bpf-next v5 4/5] bpf: Make fs kfuncs available for SYSCALL and TRACING program types Juntong Deng
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 25+ messages in thread
From: Juntong Deng @ 2024-12-10 14:03 UTC (permalink / raw)
  To: ast, daniel, john.fastabend, andrii, martin.lau, eddyz87, song,
	yonghong.song, kpsingh, sdf, haoluo, jolsa, memxor, snorcht,
	brauner
  Cc: bpf, linux-kernel, linux-fsdevel

This patch adds bpf_fget_task() kfunc.

bpf_fget_task() is used to get a pointer to the struct file
corresponding to the task file descriptor. Note that this function
acquires a reference to struct file.

Signed-off-by: Juntong Deng <juntong.deng@outlook.com>
---
 fs/bpf_fs_kfuncs.c | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/fs/bpf_fs_kfuncs.c b/fs/bpf_fs_kfuncs.c
index 3fe9f59ef867..19a9d45c47f9 100644
--- a/fs/bpf_fs_kfuncs.c
+++ b/fs/bpf_fs_kfuncs.c
@@ -152,6 +152,26 @@ __bpf_kfunc int bpf_get_file_xattr(struct file *file, const char *name__str,
 	return bpf_get_dentry_xattr(dentry, name__str, value_p);
 }
 
+/**
+ * bpf_fget_task() - Get a pointer to the struct file corresponding to
+ * the task file descriptor
+ *
+ * Note that this function acquires a reference to struct file.
+ *
+ * @task: the specified struct task_struct
+ * @fd: the file descriptor
+ *
+ * @returns the corresponding struct file pointer if found,
+ * otherwise returns NULL
+ */
+__bpf_kfunc struct file *bpf_fget_task(struct task_struct *task, unsigned int fd)
+{
+	struct file *file;
+
+	file = fget_task(task, fd);
+	return file;
+}
+
 __bpf_kfunc_end_defs();
 
 BTF_KFUNCS_START(bpf_fs_kfunc_set_ids)
@@ -161,6 +181,7 @@ BTF_ID_FLAGS(func, bpf_put_file, KF_RELEASE)
 BTF_ID_FLAGS(func, bpf_path_d_path, KF_TRUSTED_ARGS)
 BTF_ID_FLAGS(func, bpf_get_dentry_xattr, KF_SLEEPABLE | KF_TRUSTED_ARGS)
 BTF_ID_FLAGS(func, bpf_get_file_xattr, KF_SLEEPABLE | KF_TRUSTED_ARGS)
+BTF_ID_FLAGS(func, bpf_fget_task, KF_ACQUIRE | KF_TRUSTED_ARGS | KF_RET_NULL)
 BTF_KFUNCS_END(bpf_fs_kfunc_set_ids)
 
 static int bpf_fs_kfuncs_filter(const struct bpf_prog *prog, u32 kfunc_id)
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH bpf-next v5 4/5] bpf: Make fs kfuncs available for SYSCALL and TRACING program types
  2024-12-10 14:01 [PATCH bpf-next v5 0/5] bpf: Add open-coded style process file iterator and bpf_fget_task() kfunc Juntong Deng
                   ` (2 preceding siblings ...)
  2024-12-10 14:03 ` [PATCH bpf-next v5 3/5] bpf: Add bpf_fget_task() kfunc Juntong Deng
@ 2024-12-10 14:03 ` Juntong Deng
  2024-12-10 14:43   ` Christian Brauner
  2024-12-10 14:03 ` [PATCH bpf-next v5 5/5] selftests/bpf: Add tests for bpf_fget_task() kfunc Juntong Deng
  2024-12-10 14:44 ` [PATCH bpf-next v5 0/5] bpf: Add open-coded style process file iterator and " Christian Brauner
  5 siblings, 1 reply; 25+ messages in thread
From: Juntong Deng @ 2024-12-10 14:03 UTC (permalink / raw)
  To: ast, daniel, john.fastabend, andrii, martin.lau, eddyz87, song,
	yonghong.song, kpsingh, sdf, haoluo, jolsa, memxor, snorcht,
	brauner
  Cc: bpf, linux-kernel, linux-fsdevel

Currently fs kfuncs are only available for LSM program type, but fs
kfuncs are generic and useful for scenarios other than LSM.

This patch makes fs kfuncs available for SYSCALL and TRACING
program types.

Signed-off-by: Juntong Deng <juntong.deng@outlook.com>
---
 fs/bpf_fs_kfuncs.c                            | 21 +++++--------------
 .../selftests/bpf/progs/verifier_vfs_reject.c | 10 ---------
 2 files changed, 5 insertions(+), 26 deletions(-)

diff --git a/fs/bpf_fs_kfuncs.c b/fs/bpf_fs_kfuncs.c
index 19a9d45c47f9..609d6b2af1db 100644
--- a/fs/bpf_fs_kfuncs.c
+++ b/fs/bpf_fs_kfuncs.c
@@ -26,8 +26,6 @@ __bpf_kfunc_start_defs();
  * acquired by this BPF kfunc will result in the BPF program being rejected by
  * the BPF verifier.
  *
- * This BPF kfunc may only be called from BPF LSM programs.
- *
  * Internally, this BPF kfunc leans on get_task_exe_file(), such that calling
  * bpf_get_task_exe_file() would be analogous to calling get_task_exe_file()
  * directly in kernel context.
@@ -49,8 +47,6 @@ __bpf_kfunc struct file *bpf_get_task_exe_file(struct task_struct *task)
  * passed to this BPF kfunc. Attempting to pass an unreferenced file pointer, or
  * any other arbitrary pointer for that matter, will result in the BPF program
  * being rejected by the BPF verifier.
- *
- * This BPF kfunc may only be called from BPF LSM programs.
  */
 __bpf_kfunc void bpf_put_file(struct file *file)
 {
@@ -70,8 +66,6 @@ __bpf_kfunc void bpf_put_file(struct file *file)
  * reference, or else the BPF program will be outright rejected by the BPF
  * verifier.
  *
- * This BPF kfunc may only be called from BPF LSM programs.
- *
  * Return: A positive integer corresponding to the length of the resolved
  * pathname in *buf*, including the NUL termination character. On error, a
  * negative integer is returned.
@@ -184,23 +178,18 @@ BTF_ID_FLAGS(func, bpf_get_file_xattr, KF_SLEEPABLE | KF_TRUSTED_ARGS)
 BTF_ID_FLAGS(func, bpf_fget_task, KF_ACQUIRE | KF_TRUSTED_ARGS | KF_RET_NULL)
 BTF_KFUNCS_END(bpf_fs_kfunc_set_ids)
 
-static int bpf_fs_kfuncs_filter(const struct bpf_prog *prog, u32 kfunc_id)
-{
-	if (!btf_id_set8_contains(&bpf_fs_kfunc_set_ids, kfunc_id) ||
-	    prog->type == BPF_PROG_TYPE_LSM)
-		return 0;
-	return -EACCES;
-}
-
 static const struct btf_kfunc_id_set bpf_fs_kfunc_set = {
 	.owner = THIS_MODULE,
 	.set = &bpf_fs_kfunc_set_ids,
-	.filter = bpf_fs_kfuncs_filter,
 };
 
 static int __init bpf_fs_kfuncs_init(void)
 {
-	return register_btf_kfunc_id_set(BPF_PROG_TYPE_LSM, &bpf_fs_kfunc_set);
+	int ret;
+
+	ret = register_btf_kfunc_id_set(BPF_PROG_TYPE_LSM, &bpf_fs_kfunc_set);
+	ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_SYSCALL, &bpf_fs_kfunc_set);
+	return ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_TRACING, &bpf_fs_kfunc_set);
 }
 
 late_initcall(bpf_fs_kfuncs_init);
diff --git a/tools/testing/selftests/bpf/progs/verifier_vfs_reject.c b/tools/testing/selftests/bpf/progs/verifier_vfs_reject.c
index d6d3f4fcb24c..5aab75fd2fa5 100644
--- a/tools/testing/selftests/bpf/progs/verifier_vfs_reject.c
+++ b/tools/testing/selftests/bpf/progs/verifier_vfs_reject.c
@@ -148,14 +148,4 @@ int BPF_PROG(path_d_path_kfunc_invalid_buf_sz, struct file *file)
 	return 0;
 }
 
-SEC("fentry/vfs_open")
-__failure __msg("calling kernel function bpf_path_d_path is not allowed")
-int BPF_PROG(path_d_path_kfunc_non_lsm, struct path *path, struct file *f)
-{
-	/* Calling bpf_path_d_path() from a non-LSM BPF program isn't permitted.
-	 */
-	bpf_path_d_path(path, buf, sizeof(buf));
-	return 0;
-}
-
 char _license[] SEC("license") = "GPL";
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH bpf-next v5 5/5] selftests/bpf: Add tests for bpf_fget_task() kfunc
  2024-12-10 14:01 [PATCH bpf-next v5 0/5] bpf: Add open-coded style process file iterator and bpf_fget_task() kfunc Juntong Deng
                   ` (3 preceding siblings ...)
  2024-12-10 14:03 ` [PATCH bpf-next v5 4/5] bpf: Make fs kfuncs available for SYSCALL and TRACING program types Juntong Deng
@ 2024-12-10 14:03 ` Juntong Deng
  2024-12-10 14:44 ` [PATCH bpf-next v5 0/5] bpf: Add open-coded style process file iterator and " Christian Brauner
  5 siblings, 0 replies; 25+ messages in thread
From: Juntong Deng @ 2024-12-10 14:03 UTC (permalink / raw)
  To: ast, daniel, john.fastabend, andrii, martin.lau, eddyz87, song,
	yonghong.song, kpsingh, sdf, haoluo, jolsa, memxor, snorcht,
	brauner
  Cc: bpf, linux-kernel, linux-fsdevel

This patch adds test cases for bpf_fget_task() kfunc.

test_bpf_fget_task is used to test obtaining struct file based on
the file descriptor in the current process.

bpf_fget_task_null_task and bpf_fget_task_untrusted_task are used to
test the failure cases of passing NULL or untrusted pointer as argument.

Signed-off-by: Juntong Deng <juntong.deng@outlook.com>
---
 .../testing/selftests/bpf/bpf_experimental.h  |  8 +++
 .../selftests/bpf/prog_tests/fs_kfuncs.c      | 46 ++++++++++++++
 .../selftests/bpf/progs/fs_kfuncs_failure.c   | 33 ++++++++++
 .../selftests/bpf/progs/test_fget_task.c      | 63 +++++++++++++++++++
 4 files changed, 150 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/progs/fs_kfuncs_failure.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_fget_task.c

diff --git a/tools/testing/selftests/bpf/bpf_experimental.h b/tools/testing/selftests/bpf/bpf_experimental.h
index ce1520c56b55..e0c9e7d9ba0a 100644
--- a/tools/testing/selftests/bpf/bpf_experimental.h
+++ b/tools/testing/selftests/bpf/bpf_experimental.h
@@ -221,6 +221,14 @@ extern void bpf_put_file(struct file *file) __ksym;
  */
 extern int bpf_path_d_path(struct path *path, char *buf, size_t buf__sz) __ksym;
 
+/* Description
+ *	Get a pointer to the struct file corresponding to the task file descriptor
+ *	Note that this function acquires a reference to struct file.
+ * Returns
+ *	The corresponding struct file pointer if found, otherwise returns NULL
+ */
+extern struct file *bpf_fget_task(struct task_struct *task, unsigned int fd) __ksym;
+
 /* This macro must be used to mark the exception callback corresponding to the
  * main program. For example:
  *
diff --git a/tools/testing/selftests/bpf/prog_tests/fs_kfuncs.c b/tools/testing/selftests/bpf/prog_tests/fs_kfuncs.c
index 5a0b51157451..89f5e09672b3 100644
--- a/tools/testing/selftests/bpf/prog_tests/fs_kfuncs.c
+++ b/tools/testing/selftests/bpf/prog_tests/fs_kfuncs.c
@@ -9,6 +9,8 @@
 #include <test_progs.h>
 #include "test_get_xattr.skel.h"
 #include "test_fsverity.skel.h"
+#include "test_fget_task.skel.h"
+#include "fs_kfuncs_failure.skel.h"
 
 static const char testfile[] = "/tmp/test_progs_fs_kfuncs";
 
@@ -139,6 +141,45 @@ static void test_fsverity(void)
 	remove(testfile);
 }
 
+static void test_fget_task(void)
+{
+	int pipefd[2], prog_fd, err;
+	struct test_fget_task *skel;
+	struct bpf_program *prog;
+
+	skel = test_fget_task__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "open_and_load"))
+		return;
+
+	if (!ASSERT_OK(skel->bss->err, "pre_test_err"))
+		goto cleanup_skel;
+
+	prog = bpf_object__find_program_by_name(skel->obj, "test_bpf_fget_task");
+	if (!ASSERT_OK_PTR(prog, "find_program_by_name"))
+		goto cleanup_skel;
+
+	prog_fd = bpf_program__fd(prog);
+	if (!ASSERT_GT(prog_fd, -1, "bpf_program__fd"))
+		goto cleanup_skel;
+
+	if (pipe(pipefd) < 0)
+		goto cleanup_skel;
+
+	skel->bss->test_fd1 = pipefd[0];
+	skel->bss->test_fd2 = pipefd[1];
+
+	err = bpf_prog_test_run_opts(prog_fd, NULL);
+	if (!ASSERT_OK(err, "prog_test_run"))
+		goto cleanup_pipe;
+
+	ASSERT_OK(skel->bss->err, "run_bpf_fget_task_test_failure");
+cleanup_pipe:
+	close(pipefd[0]);
+	close(pipefd[1]);
+cleanup_skel:
+	test_fget_task__destroy(skel);
+}
+
 void test_fs_kfuncs(void)
 {
 	if (test__start_subtest("xattr"))
@@ -146,4 +187,9 @@ void test_fs_kfuncs(void)
 
 	if (test__start_subtest("fsverity"))
 		test_fsverity();
+
+	if (test__start_subtest("fget_task"))
+		test_fget_task();
+
+	RUN_TESTS(fs_kfuncs_failure);
 }
diff --git a/tools/testing/selftests/bpf/progs/fs_kfuncs_failure.c b/tools/testing/selftests/bpf/progs/fs_kfuncs_failure.c
new file mode 100644
index 000000000000..57aa6d2787ac
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/fs_kfuncs_failure.c
@@ -0,0 +1,33 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include "vmlinux.h"
+#include <bpf/bpf_tracing.h>
+#include <bpf/bpf_helpers.h>
+#include "bpf_misc.h"
+#include "bpf_experimental.h"
+
+char _license[] SEC("license") = "GPL";
+
+SEC("syscall")
+__failure __msg("Possibly NULL pointer passed to trusted arg0")
+int bpf_fget_task_null_task(void *ctx)
+{
+	struct task_struct *task = NULL;
+
+	bpf_fget_task(task, 1);
+
+	return 0;
+}
+
+SEC("syscall")
+__failure __msg("R1 must be referenced or trusted")
+int bpf_fget_task_untrusted_task(void *ctx)
+{
+	struct task_struct *task;
+
+	task = bpf_get_current_task_btf()->parent;
+
+	bpf_fget_task(task, 1);
+
+	return 0;
+}
diff --git a/tools/testing/selftests/bpf/progs/test_fget_task.c b/tools/testing/selftests/bpf/progs/test_fget_task.c
new file mode 100644
index 000000000000..fee5d5e1244a
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_fget_task.c
@@ -0,0 +1,63 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include "vmlinux.h"
+#include <bpf/bpf_tracing.h>
+#include <bpf/bpf_helpers.h>
+#include "bpf_misc.h"
+#include "bpf_experimental.h"
+#include "task_kfunc_common.h"
+
+char _license[] SEC("license") = "GPL";
+
+int err, test_fd1, test_fd2;
+
+extern const void pipefifo_fops __ksym;
+
+SEC("syscall")
+int test_bpf_fget_task(void *ctx)
+{
+	struct task_struct *task;
+	struct file *file;
+
+	task = bpf_get_current_task_btf();
+	if (task == NULL) {
+		err = 1;
+		return 0;
+	}
+
+	file = bpf_fget_task(task, test_fd1);
+	if (file == NULL) {
+		err = 2;
+		return 0;
+	}
+
+	if (file->f_op != &pipefifo_fops) {
+		err = 3;
+		bpf_put_file(file);
+		return 0;
+	}
+
+	bpf_put_file(file);
+
+	file = bpf_fget_task(task, test_fd2);
+	if (file == NULL) {
+		err = 4;
+		return 0;
+	}
+
+	if (file->f_op != &pipefifo_fops) {
+		err = 5;
+		bpf_put_file(file);
+		return 0;
+	}
+
+	bpf_put_file(file);
+
+	file = bpf_fget_task(task, 9999);
+	if (file != NULL) {
+		err = 6;
+		bpf_put_file(file);
+	}
+
+	return 0;
+}
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH bpf-next v5 3/5] bpf: Add bpf_fget_task() kfunc
  2024-12-10 14:03 ` [PATCH bpf-next v5 3/5] bpf: Add bpf_fget_task() kfunc Juntong Deng
@ 2024-12-10 14:28   ` Christian Brauner
  2024-12-10 16:21     ` Juntong Deng
  0 siblings, 1 reply; 25+ messages in thread
From: Christian Brauner @ 2024-12-10 14:28 UTC (permalink / raw)
  To: Juntong Deng
  Cc: ast, daniel, john.fastabend, andrii, martin.lau, eddyz87, song,
	yonghong.song, kpsingh, sdf, haoluo, jolsa, memxor, snorcht, bpf,
	linux-kernel, linux-fsdevel

On Tue, Dec 10, 2024 at 02:03:52PM +0000, Juntong Deng wrote:
> This patch adds bpf_fget_task() kfunc.
> 
> bpf_fget_task() is used to get a pointer to the struct file
> corresponding to the task file descriptor. Note that this function
> acquires a reference to struct file.
> 
> Signed-off-by: Juntong Deng <juntong.deng@outlook.com>
> ---
>  fs/bpf_fs_kfuncs.c | 21 +++++++++++++++++++++
>  1 file changed, 21 insertions(+)
> 
> diff --git a/fs/bpf_fs_kfuncs.c b/fs/bpf_fs_kfuncs.c
> index 3fe9f59ef867..19a9d45c47f9 100644
> --- a/fs/bpf_fs_kfuncs.c
> +++ b/fs/bpf_fs_kfuncs.c
> @@ -152,6 +152,26 @@ __bpf_kfunc int bpf_get_file_xattr(struct file *file, const char *name__str,
>  	return bpf_get_dentry_xattr(dentry, name__str, value_p);
>  }
>  
> +/**
> + * bpf_fget_task() - Get a pointer to the struct file corresponding to
> + * the task file descriptor
> + *
> + * Note that this function acquires a reference to struct file.
> + *
> + * @task: the specified struct task_struct
> + * @fd: the file descriptor
> + *
> + * @returns the corresponding struct file pointer if found,
> + * otherwise returns NULL
> + */
> +__bpf_kfunc struct file *bpf_fget_task(struct task_struct *task, unsigned int fd)
> +{
> +	struct file *file;
> +
> +	file = fget_task(task, fd);
> +	return file;

Why the local variable?

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH bpf-next v5 2/5] selftests/bpf: Add tests for open-coded style process file iterator
  2024-12-10 14:03 ` [PATCH bpf-next v5 2/5] selftests/bpf: Add tests for open-coded style process file iterator Juntong Deng
@ 2024-12-10 14:37   ` Christian Brauner
  2024-12-10 16:23     ` Juntong Deng
  0 siblings, 1 reply; 25+ messages in thread
From: Christian Brauner @ 2024-12-10 14:37 UTC (permalink / raw)
  To: Juntong Deng
  Cc: ast, daniel, john.fastabend, andrii, martin.lau, eddyz87, song,
	yonghong.song, kpsingh, sdf, haoluo, jolsa, memxor, snorcht, bpf,
	linux-kernel, linux-fsdevel

On Tue, Dec 10, 2024 at 02:03:51PM +0000, Juntong Deng wrote:
> This patch adds test cases for open-coded style process file iterator.
> 
> Test cases related to process files are run in the newly created child
> process. Close all opened files inherited from the parent process in
> the child process to avoid the files opened by the parent process
> affecting the test results.
> 
> In addition, this patch adds failure test cases where bpf programs
> cannot pass the verifier due to uninitialized or untrusted
> arguments, or not in RCU CS, etc.
> 
> Signed-off-by: Juntong Deng <juntong.deng@outlook.com>
> ---
>  .../testing/selftests/bpf/bpf_experimental.h  |   7 ++
>  .../testing/selftests/bpf/prog_tests/iters.c  |  79 ++++++++++++
>  .../selftests/bpf/progs/iters_task_file.c     |  88 ++++++++++++++
>  .../bpf/progs/iters_task_file_failure.c       | 114 ++++++++++++++++++
>  4 files changed, 288 insertions(+)
>  create mode 100644 tools/testing/selftests/bpf/progs/iters_task_file.c
>  create mode 100644 tools/testing/selftests/bpf/progs/iters_task_file_failure.c
> 
> diff --git a/tools/testing/selftests/bpf/bpf_experimental.h b/tools/testing/selftests/bpf/bpf_experimental.h
> index cd8ecd39c3f3..ce1520c56b55 100644
> --- a/tools/testing/selftests/bpf/bpf_experimental.h
> +++ b/tools/testing/selftests/bpf/bpf_experimental.h
> @@ -588,4 +588,11 @@ extern int bpf_iter_kmem_cache_new(struct bpf_iter_kmem_cache *it) __weak __ksym
>  extern struct kmem_cache *bpf_iter_kmem_cache_next(struct bpf_iter_kmem_cache *it) __weak __ksym;
>  extern void bpf_iter_kmem_cache_destroy(struct bpf_iter_kmem_cache *it) __weak __ksym;
>  
> +struct bpf_iter_task_file;
> +struct bpf_iter_task_file_item;
> +extern int bpf_iter_task_file_new(struct bpf_iter_task_file *it, struct task_struct *task) __ksym;
> +extern struct bpf_iter_task_file_item *
> +bpf_iter_task_file_next(struct bpf_iter_task_file *it) __ksym;
> +extern void bpf_iter_task_file_destroy(struct bpf_iter_task_file *it) __ksym;
> +
>  #endif
> diff --git a/tools/testing/selftests/bpf/prog_tests/iters.c b/tools/testing/selftests/bpf/prog_tests/iters.c
> index 3cea71f9c500..cfe5b56cc027 100644
> --- a/tools/testing/selftests/bpf/prog_tests/iters.c
> +++ b/tools/testing/selftests/bpf/prog_tests/iters.c
> @@ -1,6 +1,8 @@
>  // SPDX-License-Identifier: GPL-2.0
>  /* Copyright (c) 2023 Meta Platforms, Inc. and affiliates. */
>  
> +#define _GNU_SOURCE
> +#include <sys/socket.h>
>  #include <sys/syscall.h>
>  #include <sys/mman.h>
>  #include <sys/wait.h>
> @@ -16,11 +18,13 @@
>  #include "iters_num.skel.h"
>  #include "iters_testmod.skel.h"
>  #include "iters_testmod_seq.skel.h"
> +#include "iters_task_file.skel.h"
>  #include "iters_task_vma.skel.h"
>  #include "iters_task.skel.h"
>  #include "iters_css_task.skel.h"
>  #include "iters_css.skel.h"
>  #include "iters_task_failure.skel.h"
> +#include "iters_task_file_failure.skel.h"
>  
>  static void subtest_num_iters(void)
>  {
> @@ -291,6 +295,78 @@ static void subtest_css_iters(void)
>  	iters_css__destroy(skel);
>  }
>  
> +static int task_file_test_process(void *args)
> +{
> +	int pipefd[2], sockfd, err = 0;
> +
> +	/* Create a clean file descriptor table for the test process */
> +	close_range(0, ~0U, 0);
> +
> +	if (pipe(pipefd) < 0)
> +		return 1;
> +
> +	sockfd = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
> +	if (sockfd < 0) {
> +		err = 2;
> +		goto cleanup_pipe;
> +	}
> +
> +	usleep(1);
> +
> +	close(sockfd);
> +cleanup_pipe:
> +	close(pipefd[0]);
> +	close(pipefd[1]);
> +	return err;
> +}
> +
> +static void subtest_task_file_iters(void)
> +{
> +	const int stack_size = 1024 * 1024;
> +	struct iters_task_file *skel;
> +	int child_pid, wstatus, err;
> +	char *stack;
> +
> +	skel = iters_task_file__open_and_load();
> +	if (!ASSERT_OK_PTR(skel, "open_and_load"))
> +		return;
> +
> +	if (!ASSERT_OK(skel->bss->err, "pre_test_err"))
> +		goto cleanup_skel;
> +
> +	skel->bss->parent_pid = getpid();
> +	skel->bss->count = 0;
> +
> +	err = iters_task_file__attach(skel);
> +	if (!ASSERT_OK(err, "skel_attach"))
> +		goto cleanup_skel;
> +
> +	stack = (char *)malloc(stack_size);
> +	if (!ASSERT_OK_PTR(stack, "clone_stack"))
> +		goto cleanup_attach;
> +
> +	/* Note that there is no CLONE_FILES */
> +	child_pid = clone(task_file_test_process, stack + stack_size, CLONE_VM | SIGCHLD, NULL);
> +	if (!ASSERT_GT(child_pid, -1, "child_pid"))
> +		goto cleanup_stack;
> +
> +	if (!ASSERT_GT(waitpid(child_pid, &wstatus, 0), -1, "waitpid"))
> +		goto cleanup_stack;
> +
> +	if (!ASSERT_OK(WEXITSTATUS(wstatus), "run_task_file_iters_test_err"))
> +		goto cleanup_stack;
> +
> +	ASSERT_EQ(skel->bss->count, 1, "run_task_file_iters_test_count_err");
> +	ASSERT_OK(skel->bss->err, "run_task_file_iters_test_failure");
> +
> +cleanup_stack:
> +	free(stack);
> +cleanup_attach:
> +	iters_task_file__detach(skel);
> +cleanup_skel:
> +	iters_task_file__destroy(skel);
> +}
> +
>  void test_iters(void)
>  {
>  	RUN_TESTS(iters_state_safety);
> @@ -315,5 +391,8 @@ void test_iters(void)
>  		subtest_css_task_iters();
>  	if (test__start_subtest("css"))
>  		subtest_css_iters();
> +	if (test__start_subtest("task_file"))
> +		subtest_task_file_iters();
>  	RUN_TESTS(iters_task_failure);
> +	RUN_TESTS(iters_task_file_failure);
>  }
> diff --git a/tools/testing/selftests/bpf/progs/iters_task_file.c b/tools/testing/selftests/bpf/progs/iters_task_file.c
> new file mode 100644
> index 000000000000..81bcd20041d8
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/progs/iters_task_file.c
> @@ -0,0 +1,88 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include "vmlinux.h"
> +#include <bpf/bpf_tracing.h>
> +#include <bpf/bpf_helpers.h>
> +#include "bpf_misc.h"
> +#include "bpf_experimental.h"
> +#include "task_kfunc_common.h"
> +
> +char _license[] SEC("license") = "GPL";
> +
> +int err, parent_pid, count;
> +
> +extern const void pipefifo_fops __ksym;
> +extern const void socket_file_ops __ksym;
> +
> +SEC("fentry/" SYS_PREFIX "sys_nanosleep")
> +int test_bpf_iter_task_file(void *ctx)
> +{
> +	struct bpf_iter_task_file task_file_it;
> +	struct bpf_iter_task_file_item *item;
> +	struct task_struct *task;
> +
> +	task = bpf_get_current_task_btf();
> +	if (task->parent->pid != parent_pid)
> +		return 0;
> +
> +	count++;
> +
> +	bpf_rcu_read_lock();

What does the RCU read lock do here exactly?

> +	bpf_iter_task_file_new(&task_file_it, task);
> +
> +	item = bpf_iter_task_file_next(&task_file_it);
> +	if (item == NULL) {
> +		err = 1;
> +		goto cleanup;
> +	}
> +
> +	if (item->fd != 0) {
> +		err = 2;
> +		goto cleanup;
> +	}
> +
> +	if (item->file->f_op != &pipefifo_fops) {
> +		err = 3;
> +		goto cleanup;
> +	}
> +
> +	item = bpf_iter_task_file_next(&task_file_it);
> +	if (item == NULL) {
> +		err = 4;
> +		goto cleanup;
> +	}
> +
> +	if (item->fd != 1) {
> +		err = 5;
> +		goto cleanup;
> +	}
> +
> +	if (item->file->f_op != &pipefifo_fops) {
> +		err = 6;
> +		goto cleanup;
> +	}
> +
> +	item = bpf_iter_task_file_next(&task_file_it);
> +	if (item == NULL) {
> +		err = 7;
> +		goto cleanup;
> +	}
> +
> +	if (item->fd != 2) {
> +		err = 8;
> +		goto cleanup;
> +	}
> +
> +	if (item->file->f_op != &socket_file_ops) {
> +		err = 9;
> +		goto cleanup;
> +	}
> +
> +	item = bpf_iter_task_file_next(&task_file_it);
> +	if (item != NULL)
> +		err = 10;
> +cleanup:
> +	bpf_iter_task_file_destroy(&task_file_it);
> +	bpf_rcu_read_unlock();
> +	return 0;
> +}
> diff --git a/tools/testing/selftests/bpf/progs/iters_task_file_failure.c b/tools/testing/selftests/bpf/progs/iters_task_file_failure.c
> new file mode 100644
> index 000000000000..c3de9235b888
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/progs/iters_task_file_failure.c
> @@ -0,0 +1,114 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include "vmlinux.h"
> +#include <bpf/bpf_tracing.h>
> +#include <bpf/bpf_helpers.h>
> +#include "bpf_misc.h"
> +#include "bpf_experimental.h"
> +#include "task_kfunc_common.h"
> +
> +char _license[] SEC("license") = "GPL";
> +
> +SEC("syscall")
> +__failure __msg("expected an RCU CS when using bpf_iter_task_file")
> +int bpf_iter_task_file_new_without_rcu_lock(void *ctx)
> +{
> +	struct bpf_iter_task_file task_file_it;
> +	struct task_struct *task;
> +
> +	task = bpf_get_current_task_btf();
> +
> +	bpf_iter_task_file_new(&task_file_it, task);
> +
> +	bpf_iter_task_file_destroy(&task_file_it);
> +	return 0;
> +}
> +
> +SEC("syscall")
> +__failure __msg("expected uninitialized iter_task_file as arg #1")
> +int bpf_iter_task_file_new_inited_iter(void *ctx)
> +{
> +	struct bpf_iter_task_file task_file_it;
> +	struct task_struct *task;
> +
> +	task = bpf_get_current_task_btf();
> +
> +	bpf_rcu_read_lock();
> +	bpf_iter_task_file_new(&task_file_it, task);
> +
> +	bpf_iter_task_file_new(&task_file_it, task);
> +
> +	bpf_iter_task_file_destroy(&task_file_it);
> +	bpf_rcu_read_unlock();
> +	return 0;
> +}
> +
> +SEC("syscall")
> +__failure __msg("Possibly NULL pointer passed to trusted arg1")
> +int bpf_iter_task_file_new_null_task(void *ctx)
> +{
> +	struct bpf_iter_task_file task_file_it;
> +	struct task_struct *task = NULL;
> +
> +	bpf_rcu_read_lock();
> +	bpf_iter_task_file_new(&task_file_it, task);
> +
> +	bpf_iter_task_file_destroy(&task_file_it);
> +	bpf_rcu_read_unlock();
> +	return 0;
> +}
> +
> +SEC("syscall")
> +__failure __msg("R2 must be referenced or trusted")
> +int bpf_iter_task_file_new_untrusted_task(void *ctx)
> +{
> +	struct bpf_iter_task_file task_file_it;
> +	struct task_struct *task;
> +
> +	task = bpf_get_current_task_btf()->parent;
> +
> +	bpf_rcu_read_lock();
> +	bpf_iter_task_file_new(&task_file_it, task);
> +
> +	bpf_iter_task_file_destroy(&task_file_it);
> +	bpf_rcu_read_unlock();
> +	return 0;
> +}
> +
> +SEC("syscall")
> +__failure __msg("Unreleased reference")
> +int bpf_iter_task_file_no_destory(void *ctx)
> +{
> +	struct bpf_iter_task_file task_file_it;
> +	struct task_struct *task;
> +
> +	task = bpf_get_current_task_btf();
> +
> +	bpf_rcu_read_lock();
> +	bpf_iter_task_file_new(&task_file_it, task);
> +
> +	bpf_rcu_read_unlock();
> +	return 0;
> +}
> +
> +SEC("syscall")
> +__failure __msg("expected an initialized iter_task_file as arg #1")
> +int bpf_iter_task_file_next_uninit_iter(void *ctx)
> +{
> +	struct bpf_iter_task_file task_file_it;
> +
> +	bpf_iter_task_file_next(&task_file_it);
> +
> +	return 0;
> +}
> +
> +SEC("syscall")
> +__failure __msg("expected an initialized iter_task_file as arg #1")
> +int bpf_iter_task_file_destroy_uninit_iter(void *ctx)
> +{
> +	struct bpf_iter_task_file task_file_it;
> +
> +	bpf_iter_task_file_destroy(&task_file_it);
> +
> +	return 0;
> +}
> -- 
> 2.39.5
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH bpf-next v5 4/5] bpf: Make fs kfuncs available for SYSCALL and TRACING program types
  2024-12-10 14:03 ` [PATCH bpf-next v5 4/5] bpf: Make fs kfuncs available for SYSCALL and TRACING program types Juntong Deng
@ 2024-12-10 14:43   ` Christian Brauner
  2024-12-10 18:58     ` Alexei Starovoitov
  0 siblings, 1 reply; 25+ messages in thread
From: Christian Brauner @ 2024-12-10 14:43 UTC (permalink / raw)
  To: Juntong Deng
  Cc: ast, daniel, john.fastabend, andrii, martin.lau, eddyz87, song,
	yonghong.song, kpsingh, sdf, haoluo, jolsa, memxor, snorcht, bpf,
	linux-kernel, linux-fsdevel

On Tue, Dec 10, 2024 at 02:03:53PM +0000, Juntong Deng wrote:
> Currently fs kfuncs are only available for LSM program type, but fs
> kfuncs are generic and useful for scenarios other than LSM.
> 
> This patch makes fs kfuncs available for SYSCALL and TRACING
> program types.

I would like a detailed explanation from the maintainers what it means
to make this available to SYSCALL program types, please.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH bpf-next v5 0/5] bpf: Add open-coded style process file iterator and bpf_fget_task() kfunc
  2024-12-10 14:01 [PATCH bpf-next v5 0/5] bpf: Add open-coded style process file iterator and bpf_fget_task() kfunc Juntong Deng
                   ` (4 preceding siblings ...)
  2024-12-10 14:03 ` [PATCH bpf-next v5 5/5] selftests/bpf: Add tests for bpf_fget_task() kfunc Juntong Deng
@ 2024-12-10 14:44 ` Christian Brauner
  2024-12-10 16:25   ` Juntong Deng
  5 siblings, 1 reply; 25+ messages in thread
From: Christian Brauner @ 2024-12-10 14:44 UTC (permalink / raw)
  To: Juntong Deng
  Cc: ast, daniel, john.fastabend, andrii, martin.lau, eddyz87, song,
	yonghong.song, kpsingh, sdf, haoluo, jolsa, memxor, snorcht, bpf,
	linux-kernel, linux-fsdevel

On Tue, Dec 10, 2024 at 02:01:53PM +0000, Juntong Deng wrote:
> This patch series adds open-coded style process file iterator
> bpf_iter_task_file and bpf_fget_task() kfunc, and corresponding
> selftests test cases.
> 
> In addition, since fs kfuncs is generic and useful for scenarios
> other than LSM, this patch makes fs kfuncs available for SYSCALL
> and TRACING program types [0].
> 
> [0]: https://lore.kernel.org/bpf/CAPhsuW6ud21v2xz8iSXf=CiDL+R_zpQ+p8isSTMTw=EiJQtRSw@mail.gmail.com/
> 
> Although iter/task_file already exists, for CRIB we still need the

What is CRIB?

> open-coded iterator style process file iterator, and the same is true
> for other bpf iterators such as iter/tcp, iter/udp, etc.
> 
> The traditional bpf iterator is more like a bpf version of procfs, but
> similar to procfs, it is not suitable for CRIB scenarios that need to
> obtain large amounts of complex, multi-level in-kernel information.
> 
> The following is from previous discussions [2].
> 
> [2]: https://lore.kernel.org/bpf/AM6PR03MB5848CA34B5B68C90F210285E99B12@AM6PR03MB5848.eurprd03.prod.outlook.com/
> 
> This is because the context of bpf iterators is fixed and bpf iterators
> cannot be nested. This means that a bpf iterator program can only
> complete a specific small iterative dump task, and cannot dump
> multi-level data.
> 
> An example, when we need to dump all the sockets of a process, we need
> to iterate over all the files (sockets) of the process, and iterate over
> the all packets in the queue of each socket, and iterate over all data
> in each packet.
> 
> If we use bpf iterator, since the iterator can not be nested, we need to
> use socket iterator program to get all the basic information of all
> sockets (pass pid as filter), and then use packet iterator program to
> get the basic information of all packets of a specific socket (pass pid,
> fd as filter), and then use packet data iterator program to get all the
> data of a specific packet (pass pid, fd, packet index as filter).
> 
> This would be complicated and require a lot of (each iteration)
> bpf program startup and exit (leading to poor performance).
> 
> By comparison, open coded iterator is much more flexible, we can iterate
> in any context, at any time, and iteration can be nested, so we can
> achieve more flexible and more elegant dumping through open coded
> iterators.
> 
> With open coded iterators, all of the above can be done in a single
> bpf program, and with nested iterators, everything becomes compact
> and simple.
> 
> Also, bpf iterators transmit data to user space through seq_file,
> which involves a lot of open (bpf_iter_create), read, close syscalls,
> context switching, memory copying, and cannot achieve the performance
> of using ringbuf.
> 
> Signed-off-by: Juntong Deng <juntong.deng@outlook.com>
> ---
> v4 -> v5:
> * Add file type checks in test cases for process file iterator
>   and bpf_fget_task().
> 
> * Use fentry to synchronize tests instead of waiting in a loop.
> 
> * Remove path_d_path_kfunc_non_lsm test case.
> 
> * Replace task_lookup_next_fdget_rcu() with fget_task_next().
> 
> * Remove future merge conflict section in cover letter (resolved).
> 
> v3 -> v4:
> * Make all kfuncs generic, not CRIB specific.
> 
> * Move bpf_fget_task to fs/bpf_fs_kfuncs.c.
> 
> * Remove bpf_iter_task_file_get_fd and bpf_get_file_ops_type.
> 
> * Use struct bpf_iter_task_file_item * as the return value of
>   bpf_iter_task_file_next.
> 
> * Change fd to unsigned int type and add next_fd.
> 
> * Add KF_RCU_PROTECTED to bpf_iter_task_file_new.
> 
> * Make fs kfuncs available to SYSCALL and TRACING program types.
> 
> * Update all relevant test cases.
> 
> * Remove the discussion section from cover letter.
> 
> v2 -> v3:
> * Move task_file open-coded iterator to kernel/bpf/helpers.c.
> 
> * Fix duplicate error code 7 in test_bpf_iter_task_file().
> 
> * Add comment for case when bpf_iter_task_file_get_fd() returns -1.
> 
> * Add future plans in commit message of "Add struct file related
>   CRIB kfuncs".
> 
> * Add Discussion section to cover letter.
> 
> v1 -> v2:
> * Fix a type definition error in the fd parameter of
>   bpf_fget_task() at crib_common.h.
> 
> Juntong Deng (5):
>   bpf: Introduce task_file open-coded iterator kfuncs
>   selftests/bpf: Add tests for open-coded style process file iterator
>   bpf: Add bpf_fget_task() kfunc
>   bpf: Make fs kfuncs available for SYSCALL and TRACING program types
>   selftests/bpf: Add tests for bpf_fget_task() kfunc
> 
>  fs/bpf_fs_kfuncs.c                            |  42 ++++---
>  kernel/bpf/helpers.c                          |   3 +
>  kernel/bpf/task_iter.c                        |  92 ++++++++++++++
>  .../testing/selftests/bpf/bpf_experimental.h  |  15 +++
>  .../selftests/bpf/prog_tests/fs_kfuncs.c      |  46 +++++++
>  .../testing/selftests/bpf/prog_tests/iters.c  |  79 ++++++++++++
>  .../selftests/bpf/progs/fs_kfuncs_failure.c   |  33 +++++
>  .../selftests/bpf/progs/iters_task_file.c     |  88 ++++++++++++++
>  .../bpf/progs/iters_task_file_failure.c       | 114 ++++++++++++++++++
>  .../selftests/bpf/progs/test_fget_task.c      |  63 ++++++++++
>  .../selftests/bpf/progs/verifier_vfs_reject.c |  10 --
>  11 files changed, 559 insertions(+), 26 deletions(-)
>  create mode 100644 tools/testing/selftests/bpf/progs/fs_kfuncs_failure.c
>  create mode 100644 tools/testing/selftests/bpf/progs/iters_task_file.c
>  create mode 100644 tools/testing/selftests/bpf/progs/iters_task_file_failure.c
>  create mode 100644 tools/testing/selftests/bpf/progs/test_fget_task.c
> 
> -- 
> 2.39.5
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH bpf-next v5 3/5] bpf: Add bpf_fget_task() kfunc
  2024-12-10 14:28   ` Christian Brauner
@ 2024-12-10 16:21     ` Juntong Deng
  0 siblings, 0 replies; 25+ messages in thread
From: Juntong Deng @ 2024-12-10 16:21 UTC (permalink / raw)
  To: Christian Brauner
  Cc: ast, daniel, john.fastabend, andrii, martin.lau, eddyz87, song,
	yonghong.song, kpsingh, sdf, haoluo, jolsa, memxor, snorcht, bpf,
	linux-kernel, linux-fsdevel

On 2024/12/10 14:28, Christian Brauner wrote:
> On Tue, Dec 10, 2024 at 02:03:52PM +0000, Juntong Deng wrote:
>> This patch adds bpf_fget_task() kfunc.
>>
>> bpf_fget_task() is used to get a pointer to the struct file
>> corresponding to the task file descriptor. Note that this function
>> acquires a reference to struct file.
>>
>> Signed-off-by: Juntong Deng <juntong.deng@outlook.com>
>> ---
>>   fs/bpf_fs_kfuncs.c | 21 +++++++++++++++++++++
>>   1 file changed, 21 insertions(+)
>>
>> diff --git a/fs/bpf_fs_kfuncs.c b/fs/bpf_fs_kfuncs.c
>> index 3fe9f59ef867..19a9d45c47f9 100644
>> --- a/fs/bpf_fs_kfuncs.c
>> +++ b/fs/bpf_fs_kfuncs.c
>> @@ -152,6 +152,26 @@ __bpf_kfunc int bpf_get_file_xattr(struct file *file, const char *name__str,
>>   	return bpf_get_dentry_xattr(dentry, name__str, value_p);
>>   }
>>   
>> +/**
>> + * bpf_fget_task() - Get a pointer to the struct file corresponding to
>> + * the task file descriptor
>> + *
>> + * Note that this function acquires a reference to struct file.
>> + *
>> + * @task: the specified struct task_struct
>> + * @fd: the file descriptor
>> + *
>> + * @returns the corresponding struct file pointer if found,
>> + * otherwise returns NULL
>> + */
>> +__bpf_kfunc struct file *bpf_fget_task(struct task_struct *task, unsigned int fd)
>> +{
>> +	struct file *file;
>> +
>> +	file = fget_task(task, fd);
>> +	return file;
> 
> Why the local variable?

Thanks for your reply.

Uh, I forgot why I wrote it like that.

Yes, I agree that it is redundant.

I will remove it in the next version.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH bpf-next v5 2/5] selftests/bpf: Add tests for open-coded style process file iterator
  2024-12-10 14:37   ` Christian Brauner
@ 2024-12-10 16:23     ` Juntong Deng
  2024-12-10 18:51       ` Alexei Starovoitov
  0 siblings, 1 reply; 25+ messages in thread
From: Juntong Deng @ 2024-12-10 16:23 UTC (permalink / raw)
  To: Christian Brauner
  Cc: ast, daniel, john.fastabend, andrii, martin.lau, eddyz87, song,
	yonghong.song, kpsingh, sdf, haoluo, jolsa, memxor, snorcht, bpf,
	linux-kernel, linux-fsdevel

On 2024/12/10 14:37, Christian Brauner wrote:
> On Tue, Dec 10, 2024 at 02:03:51PM +0000, Juntong Deng wrote:
>> This patch adds test cases for open-coded style process file iterator.
>>
>> Test cases related to process files are run in the newly created child
>> process. Close all opened files inherited from the parent process in
>> the child process to avoid the files opened by the parent process
>> affecting the test results.
>>
>> In addition, this patch adds failure test cases where bpf programs
>> cannot pass the verifier due to uninitialized or untrusted
>> arguments, or not in RCU CS, etc.
>>
>> Signed-off-by: Juntong Deng <juntong.deng@outlook.com>
>> ---
>>   .../testing/selftests/bpf/bpf_experimental.h  |   7 ++
>>   .../testing/selftests/bpf/prog_tests/iters.c  |  79 ++++++++++++
>>   .../selftests/bpf/progs/iters_task_file.c     |  88 ++++++++++++++
>>   .../bpf/progs/iters_task_file_failure.c       | 114 ++++++++++++++++++
>>   4 files changed, 288 insertions(+)
>>   create mode 100644 tools/testing/selftests/bpf/progs/iters_task_file.c
>>   create mode 100644 tools/testing/selftests/bpf/progs/iters_task_file_failure.c
>>
>> diff --git a/tools/testing/selftests/bpf/bpf_experimental.h b/tools/testing/selftests/bpf/bpf_experimental.h
>> index cd8ecd39c3f3..ce1520c56b55 100644
>> --- a/tools/testing/selftests/bpf/bpf_experimental.h
>> +++ b/tools/testing/selftests/bpf/bpf_experimental.h
>> @@ -588,4 +588,11 @@ extern int bpf_iter_kmem_cache_new(struct bpf_iter_kmem_cache *it) __weak __ksym
>>   extern struct kmem_cache *bpf_iter_kmem_cache_next(struct bpf_iter_kmem_cache *it) __weak __ksym;
>>   extern void bpf_iter_kmem_cache_destroy(struct bpf_iter_kmem_cache *it) __weak __ksym;
>>   
>> +struct bpf_iter_task_file;
>> +struct bpf_iter_task_file_item;
>> +extern int bpf_iter_task_file_new(struct bpf_iter_task_file *it, struct task_struct *task) __ksym;
>> +extern struct bpf_iter_task_file_item *
>> +bpf_iter_task_file_next(struct bpf_iter_task_file *it) __ksym;
>> +extern void bpf_iter_task_file_destroy(struct bpf_iter_task_file *it) __ksym;
>> +
>>   #endif
>> diff --git a/tools/testing/selftests/bpf/prog_tests/iters.c b/tools/testing/selftests/bpf/prog_tests/iters.c
>> index 3cea71f9c500..cfe5b56cc027 100644
>> --- a/tools/testing/selftests/bpf/prog_tests/iters.c
>> +++ b/tools/testing/selftests/bpf/prog_tests/iters.c
>> @@ -1,6 +1,8 @@
>>   // SPDX-License-Identifier: GPL-2.0
>>   /* Copyright (c) 2023 Meta Platforms, Inc. and affiliates. */
>>   
>> +#define _GNU_SOURCE
>> +#include <sys/socket.h>
>>   #include <sys/syscall.h>
>>   #include <sys/mman.h>
>>   #include <sys/wait.h>
>> @@ -16,11 +18,13 @@
>>   #include "iters_num.skel.h"
>>   #include "iters_testmod.skel.h"
>>   #include "iters_testmod_seq.skel.h"
>> +#include "iters_task_file.skel.h"
>>   #include "iters_task_vma.skel.h"
>>   #include "iters_task.skel.h"
>>   #include "iters_css_task.skel.h"
>>   #include "iters_css.skel.h"
>>   #include "iters_task_failure.skel.h"
>> +#include "iters_task_file_failure.skel.h"
>>   
>>   static void subtest_num_iters(void)
>>   {
>> @@ -291,6 +295,78 @@ static void subtest_css_iters(void)
>>   	iters_css__destroy(skel);
>>   }
>>   
>> +static int task_file_test_process(void *args)
>> +{
>> +	int pipefd[2], sockfd, err = 0;
>> +
>> +	/* Create a clean file descriptor table for the test process */
>> +	close_range(0, ~0U, 0);
>> +
>> +	if (pipe(pipefd) < 0)
>> +		return 1;
>> +
>> +	sockfd = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
>> +	if (sockfd < 0) {
>> +		err = 2;
>> +		goto cleanup_pipe;
>> +	}
>> +
>> +	usleep(1);
>> +
>> +	close(sockfd);
>> +cleanup_pipe:
>> +	close(pipefd[0]);
>> +	close(pipefd[1]);
>> +	return err;
>> +}
>> +
>> +static void subtest_task_file_iters(void)
>> +{
>> +	const int stack_size = 1024 * 1024;
>> +	struct iters_task_file *skel;
>> +	int child_pid, wstatus, err;
>> +	char *stack;
>> +
>> +	skel = iters_task_file__open_and_load();
>> +	if (!ASSERT_OK_PTR(skel, "open_and_load"))
>> +		return;
>> +
>> +	if (!ASSERT_OK(skel->bss->err, "pre_test_err"))
>> +		goto cleanup_skel;
>> +
>> +	skel->bss->parent_pid = getpid();
>> +	skel->bss->count = 0;
>> +
>> +	err = iters_task_file__attach(skel);
>> +	if (!ASSERT_OK(err, "skel_attach"))
>> +		goto cleanup_skel;
>> +
>> +	stack = (char *)malloc(stack_size);
>> +	if (!ASSERT_OK_PTR(stack, "clone_stack"))
>> +		goto cleanup_attach;
>> +
>> +	/* Note that there is no CLONE_FILES */
>> +	child_pid = clone(task_file_test_process, stack + stack_size, CLONE_VM | SIGCHLD, NULL);
>> +	if (!ASSERT_GT(child_pid, -1, "child_pid"))
>> +		goto cleanup_stack;
>> +
>> +	if (!ASSERT_GT(waitpid(child_pid, &wstatus, 0), -1, "waitpid"))
>> +		goto cleanup_stack;
>> +
>> +	if (!ASSERT_OK(WEXITSTATUS(wstatus), "run_task_file_iters_test_err"))
>> +		goto cleanup_stack;
>> +
>> +	ASSERT_EQ(skel->bss->count, 1, "run_task_file_iters_test_count_err");
>> +	ASSERT_OK(skel->bss->err, "run_task_file_iters_test_failure");
>> +
>> +cleanup_stack:
>> +	free(stack);
>> +cleanup_attach:
>> +	iters_task_file__detach(skel);
>> +cleanup_skel:
>> +	iters_task_file__destroy(skel);
>> +}
>> +
>>   void test_iters(void)
>>   {
>>   	RUN_TESTS(iters_state_safety);
>> @@ -315,5 +391,8 @@ void test_iters(void)
>>   		subtest_css_task_iters();
>>   	if (test__start_subtest("css"))
>>   		subtest_css_iters();
>> +	if (test__start_subtest("task_file"))
>> +		subtest_task_file_iters();
>>   	RUN_TESTS(iters_task_failure);
>> +	RUN_TESTS(iters_task_file_failure);
>>   }
>> diff --git a/tools/testing/selftests/bpf/progs/iters_task_file.c b/tools/testing/selftests/bpf/progs/iters_task_file.c
>> new file mode 100644
>> index 000000000000..81bcd20041d8
>> --- /dev/null
>> +++ b/tools/testing/selftests/bpf/progs/iters_task_file.c
>> @@ -0,0 +1,88 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +
>> +#include "vmlinux.h"
>> +#include <bpf/bpf_tracing.h>
>> +#include <bpf/bpf_helpers.h>
>> +#include "bpf_misc.h"
>> +#include "bpf_experimental.h"
>> +#include "task_kfunc_common.h"
>> +
>> +char _license[] SEC("license") = "GPL";
>> +
>> +int err, parent_pid, count;
>> +
>> +extern const void pipefifo_fops __ksym;
>> +extern const void socket_file_ops __ksym;
>> +
>> +SEC("fentry/" SYS_PREFIX "sys_nanosleep")
>> +int test_bpf_iter_task_file(void *ctx)
>> +{
>> +	struct bpf_iter_task_file task_file_it;
>> +	struct bpf_iter_task_file_item *item;
>> +	struct task_struct *task;
>> +
>> +	task = bpf_get_current_task_btf();
>> +	if (task->parent->pid != parent_pid)
>> +		return 0;
>> +
>> +	count++;
>> +
>> +	bpf_rcu_read_lock();
> 
> What does the RCU read lock do here exactly?
> 

Thanks for your reply.

This is used to solve the problem previously discussed in v3 [0].

Task ref may be released during iteration.

[0]: 
https://lore.kernel.org/bpf/CAADnVQ+0LUXxmfm1YgyGDz=cciy3+dGGM-Zysq84fpAdaB74Qw@mail.gmail.com/

>> +	bpf_iter_task_file_new(&task_file_it, task);
>> +
>> +	item = bpf_iter_task_file_next(&task_file_it);
>> +	if (item == NULL) {
>> +		err = 1;
>> +		goto cleanup;
>> +	}
>> +
>> +	if (item->fd != 0) {
>> +		err = 2;
>> +		goto cleanup;
>> +	}
>> +
>> +	if (item->file->f_op != &pipefifo_fops) {
>> +		err = 3;
>> +		goto cleanup;
>> +	}
>> +
>> +	item = bpf_iter_task_file_next(&task_file_it);
>> +	if (item == NULL) {
>> +		err = 4;
>> +		goto cleanup;
>> +	}
>> +
>> +	if (item->fd != 1) {
>> +		err = 5;
>> +		goto cleanup;
>> +	}
>> +
>> +	if (item->file->f_op != &pipefifo_fops) {
>> +		err = 6;
>> +		goto cleanup;
>> +	}
>> +
>> +	item = bpf_iter_task_file_next(&task_file_it);
>> +	if (item == NULL) {
>> +		err = 7;
>> +		goto cleanup;
>> +	}
>> +
>> +	if (item->fd != 2) {
>> +		err = 8;
>> +		goto cleanup;
>> +	}
>> +
>> +	if (item->file->f_op != &socket_file_ops) {
>> +		err = 9;
>> +		goto cleanup;
>> +	}
>> +
>> +	item = bpf_iter_task_file_next(&task_file_it);
>> +	if (item != NULL)
>> +		err = 10;
>> +cleanup:
>> +	bpf_iter_task_file_destroy(&task_file_it);
>> +	bpf_rcu_read_unlock();
>> +	return 0;
>> +}
>> diff --git a/tools/testing/selftests/bpf/progs/iters_task_file_failure.c b/tools/testing/selftests/bpf/progs/iters_task_file_failure.c
>> new file mode 100644
>> index 000000000000..c3de9235b888
>> --- /dev/null
>> +++ b/tools/testing/selftests/bpf/progs/iters_task_file_failure.c
>> @@ -0,0 +1,114 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +
>> +#include "vmlinux.h"
>> +#include <bpf/bpf_tracing.h>
>> +#include <bpf/bpf_helpers.h>
>> +#include "bpf_misc.h"
>> +#include "bpf_experimental.h"
>> +#include "task_kfunc_common.h"
>> +
>> +char _license[] SEC("license") = "GPL";
>> +
>> +SEC("syscall")
>> +__failure __msg("expected an RCU CS when using bpf_iter_task_file")
>> +int bpf_iter_task_file_new_without_rcu_lock(void *ctx)
>> +{
>> +	struct bpf_iter_task_file task_file_it;
>> +	struct task_struct *task;
>> +
>> +	task = bpf_get_current_task_btf();
>> +
>> +	bpf_iter_task_file_new(&task_file_it, task);
>> +
>> +	bpf_iter_task_file_destroy(&task_file_it);
>> +	return 0;
>> +}
>> +
>> +SEC("syscall")
>> +__failure __msg("expected uninitialized iter_task_file as arg #1")
>> +int bpf_iter_task_file_new_inited_iter(void *ctx)
>> +{
>> +	struct bpf_iter_task_file task_file_it;
>> +	struct task_struct *task;
>> +
>> +	task = bpf_get_current_task_btf();
>> +
>> +	bpf_rcu_read_lock();
>> +	bpf_iter_task_file_new(&task_file_it, task);
>> +
>> +	bpf_iter_task_file_new(&task_file_it, task);
>> +
>> +	bpf_iter_task_file_destroy(&task_file_it);
>> +	bpf_rcu_read_unlock();
>> +	return 0;
>> +}
>> +
>> +SEC("syscall")
>> +__failure __msg("Possibly NULL pointer passed to trusted arg1")
>> +int bpf_iter_task_file_new_null_task(void *ctx)
>> +{
>> +	struct bpf_iter_task_file task_file_it;
>> +	struct task_struct *task = NULL;
>> +
>> +	bpf_rcu_read_lock();
>> +	bpf_iter_task_file_new(&task_file_it, task);
>> +
>> +	bpf_iter_task_file_destroy(&task_file_it);
>> +	bpf_rcu_read_unlock();
>> +	return 0;
>> +}
>> +
>> +SEC("syscall")
>> +__failure __msg("R2 must be referenced or trusted")
>> +int bpf_iter_task_file_new_untrusted_task(void *ctx)
>> +{
>> +	struct bpf_iter_task_file task_file_it;
>> +	struct task_struct *task;
>> +
>> +	task = bpf_get_current_task_btf()->parent;
>> +
>> +	bpf_rcu_read_lock();
>> +	bpf_iter_task_file_new(&task_file_it, task);
>> +
>> +	bpf_iter_task_file_destroy(&task_file_it);
>> +	bpf_rcu_read_unlock();
>> +	return 0;
>> +}
>> +
>> +SEC("syscall")
>> +__failure __msg("Unreleased reference")
>> +int bpf_iter_task_file_no_destory(void *ctx)
>> +{
>> +	struct bpf_iter_task_file task_file_it;
>> +	struct task_struct *task;
>> +
>> +	task = bpf_get_current_task_btf();
>> +
>> +	bpf_rcu_read_lock();
>> +	bpf_iter_task_file_new(&task_file_it, task);
>> +
>> +	bpf_rcu_read_unlock();
>> +	return 0;
>> +}
>> +
>> +SEC("syscall")
>> +__failure __msg("expected an initialized iter_task_file as arg #1")
>> +int bpf_iter_task_file_next_uninit_iter(void *ctx)
>> +{
>> +	struct bpf_iter_task_file task_file_it;
>> +
>> +	bpf_iter_task_file_next(&task_file_it);
>> +
>> +	return 0;
>> +}
>> +
>> +SEC("syscall")
>> +__failure __msg("expected an initialized iter_task_file as arg #1")
>> +int bpf_iter_task_file_destroy_uninit_iter(void *ctx)
>> +{
>> +	struct bpf_iter_task_file task_file_it;
>> +
>> +	bpf_iter_task_file_destroy(&task_file_it);
>> +
>> +	return 0;
>> +}
>> -- 
>> 2.39.5
>>


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH bpf-next v5 0/5] bpf: Add open-coded style process file iterator and bpf_fget_task() kfunc
  2024-12-10 14:44 ` [PATCH bpf-next v5 0/5] bpf: Add open-coded style process file iterator and " Christian Brauner
@ 2024-12-10 16:25   ` Juntong Deng
  0 siblings, 0 replies; 25+ messages in thread
From: Juntong Deng @ 2024-12-10 16:25 UTC (permalink / raw)
  To: Christian Brauner
  Cc: ast, daniel, john.fastabend, andrii, martin.lau, eddyz87, song,
	yonghong.song, kpsingh, sdf, haoluo, jolsa, memxor, snorcht, bpf,
	linux-kernel, linux-fsdevel

On 2024/12/10 14:44, Christian Brauner wrote:
> On Tue, Dec 10, 2024 at 02:01:53PM +0000, Juntong Deng wrote:
>> This patch series adds open-coded style process file iterator
>> bpf_iter_task_file and bpf_fget_task() kfunc, and corresponding
>> selftests test cases.
>>
>> In addition, since fs kfuncs is generic and useful for scenarios
>> other than LSM, this patch makes fs kfuncs available for SYSCALL
>> and TRACING program types [0].
>>
>> [0]: https://lore.kernel.org/bpf/CAPhsuW6ud21v2xz8iSXf=CiDL+R_zpQ+p8isSTMTw=EiJQtRSw@mail.gmail.com/
>>
>> Although iter/task_file already exists, for CRIB we still need the
> 
> What is CRIB?
> 

CRIB is Checkpoint/Restore In eBPF.

Introductions to CRIB can be found at the following links:

https://lpc.events/event/18/contributions/1812/
https://lore.kernel.org/bpf/AM6PR03MB58488045E4D0FA6AEDC8BDE099A52@AM6PR03MB5848.eurprd03.prod.outlook.com/T/#u
https://lwn.net/Articles/984313/

>> open-coded iterator style process file iterator, and the same is true
>> for other bpf iterators such as iter/tcp, iter/udp, etc.
>>
>> The traditional bpf iterator is more like a bpf version of procfs, but
>> similar to procfs, it is not suitable for CRIB scenarios that need to
>> obtain large amounts of complex, multi-level in-kernel information.
>>
>> The following is from previous discussions [2].
>>
>> [2]: https://lore.kernel.org/bpf/AM6PR03MB5848CA34B5B68C90F210285E99B12@AM6PR03MB5848.eurprd03.prod.outlook.com/
>>
>> This is because the context of bpf iterators is fixed and bpf iterators
>> cannot be nested. This means that a bpf iterator program can only
>> complete a specific small iterative dump task, and cannot dump
>> multi-level data.
>>
>> An example, when we need to dump all the sockets of a process, we need
>> to iterate over all the files (sockets) of the process, and iterate over
>> the all packets in the queue of each socket, and iterate over all data
>> in each packet.
>>
>> If we use bpf iterator, since the iterator can not be nested, we need to
>> use socket iterator program to get all the basic information of all
>> sockets (pass pid as filter), and then use packet iterator program to
>> get the basic information of all packets of a specific socket (pass pid,
>> fd as filter), and then use packet data iterator program to get all the
>> data of a specific packet (pass pid, fd, packet index as filter).
>>
>> This would be complicated and require a lot of (each iteration)
>> bpf program startup and exit (leading to poor performance).
>>
>> By comparison, open coded iterator is much more flexible, we can iterate
>> in any context, at any time, and iteration can be nested, so we can
>> achieve more flexible and more elegant dumping through open coded
>> iterators.
>>
>> With open coded iterators, all of the above can be done in a single
>> bpf program, and with nested iterators, everything becomes compact
>> and simple.
>>
>> Also, bpf iterators transmit data to user space through seq_file,
>> which involves a lot of open (bpf_iter_create), read, close syscalls,
>> context switching, memory copying, and cannot achieve the performance
>> of using ringbuf.
>>
>> Signed-off-by: Juntong Deng <juntong.deng@outlook.com>
>> ---
>> v4 -> v5:
>> * Add file type checks in test cases for process file iterator
>>    and bpf_fget_task().
>>
>> * Use fentry to synchronize tests instead of waiting in a loop.
>>
>> * Remove path_d_path_kfunc_non_lsm test case.
>>
>> * Replace task_lookup_next_fdget_rcu() with fget_task_next().
>>
>> * Remove future merge conflict section in cover letter (resolved).
>>
>> v3 -> v4:
>> * Make all kfuncs generic, not CRIB specific.
>>
>> * Move bpf_fget_task to fs/bpf_fs_kfuncs.c.
>>
>> * Remove bpf_iter_task_file_get_fd and bpf_get_file_ops_type.
>>
>> * Use struct bpf_iter_task_file_item * as the return value of
>>    bpf_iter_task_file_next.
>>
>> * Change fd to unsigned int type and add next_fd.
>>
>> * Add KF_RCU_PROTECTED to bpf_iter_task_file_new.
>>
>> * Make fs kfuncs available to SYSCALL and TRACING program types.
>>
>> * Update all relevant test cases.
>>
>> * Remove the discussion section from cover letter.
>>
>> v2 -> v3:
>> * Move task_file open-coded iterator to kernel/bpf/helpers.c.
>>
>> * Fix duplicate error code 7 in test_bpf_iter_task_file().
>>
>> * Add comment for case when bpf_iter_task_file_get_fd() returns -1.
>>
>> * Add future plans in commit message of "Add struct file related
>>    CRIB kfuncs".
>>
>> * Add Discussion section to cover letter.
>>
>> v1 -> v2:
>> * Fix a type definition error in the fd parameter of
>>    bpf_fget_task() at crib_common.h.
>>
>> Juntong Deng (5):
>>    bpf: Introduce task_file open-coded iterator kfuncs
>>    selftests/bpf: Add tests for open-coded style process file iterator
>>    bpf: Add bpf_fget_task() kfunc
>>    bpf: Make fs kfuncs available for SYSCALL and TRACING program types
>>    selftests/bpf: Add tests for bpf_fget_task() kfunc
>>
>>   fs/bpf_fs_kfuncs.c                            |  42 ++++---
>>   kernel/bpf/helpers.c                          |   3 +
>>   kernel/bpf/task_iter.c                        |  92 ++++++++++++++
>>   .../testing/selftests/bpf/bpf_experimental.h  |  15 +++
>>   .../selftests/bpf/prog_tests/fs_kfuncs.c      |  46 +++++++
>>   .../testing/selftests/bpf/prog_tests/iters.c  |  79 ++++++++++++
>>   .../selftests/bpf/progs/fs_kfuncs_failure.c   |  33 +++++
>>   .../selftests/bpf/progs/iters_task_file.c     |  88 ++++++++++++++
>>   .../bpf/progs/iters_task_file_failure.c       | 114 ++++++++++++++++++
>>   .../selftests/bpf/progs/test_fget_task.c      |  63 ++++++++++
>>   .../selftests/bpf/progs/verifier_vfs_reject.c |  10 --
>>   11 files changed, 559 insertions(+), 26 deletions(-)
>>   create mode 100644 tools/testing/selftests/bpf/progs/fs_kfuncs_failure.c
>>   create mode 100644 tools/testing/selftests/bpf/progs/iters_task_file.c
>>   create mode 100644 tools/testing/selftests/bpf/progs/iters_task_file_failure.c
>>   create mode 100644 tools/testing/selftests/bpf/progs/test_fget_task.c
>>
>> -- 
>> 2.39.5
>>


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH bpf-next v5 2/5] selftests/bpf: Add tests for open-coded style process file iterator
  2024-12-10 16:23     ` Juntong Deng
@ 2024-12-10 18:51       ` Alexei Starovoitov
  2024-12-11 21:27         ` Juntong Deng
  0 siblings, 1 reply; 25+ messages in thread
From: Alexei Starovoitov @ 2024-12-10 18:51 UTC (permalink / raw)
  To: Juntong Deng
  Cc: Christian Brauner, Alexei Starovoitov, Daniel Borkmann,
	John Fastabend, Andrii Nakryiko, Martin KaFai Lau, Eddy Z,
	Song Liu, Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo,
	Jiri Olsa, Kumar Kartikeya Dwivedi, snorcht, bpf, LKML,
	Linux-Fsdevel

On Tue, Dec 10, 2024 at 8:23 AM Juntong Deng <juntong.deng@outlook.com> wrote:
>
> >> +SEC("fentry/" SYS_PREFIX "sys_nanosleep")
> >> +int test_bpf_iter_task_file(void *ctx)
> >> +{
> >> +    struct bpf_iter_task_file task_file_it;
> >> +    struct bpf_iter_task_file_item *item;
> >> +    struct task_struct *task;
> >> +
> >> +    task = bpf_get_current_task_btf();
> >> +    if (task->parent->pid != parent_pid)
> >> +            return 0;
> >> +
> >> +    count++;
> >> +
> >> +    bpf_rcu_read_lock();
> >
> > What does the RCU read lock do here exactly?
> >
>
> Thanks for your reply.
>
> This is used to solve the problem previously discussed in v3 [0].
>
> Task ref may be released during iteration.
>
> [0]:
> https://lore.kernel.org/bpf/CAADnVQ+0LUXxmfm1YgyGDz=cciy3+dGGM-Zysq84fpAdaB74Qw@mail.gmail.com/

I think you misunderstood my comment.

"If this object _was_ RCU protected ..."

Adding rcu_read_lock doesn't make 'task' pointer RCU protected.
That's not how RCU works.

So patch 1 doing:

item->task = task;

is not correct.

See bpf_iter_task_vma_new(). It's doing:
kit->data->task = get_task_struct(task);
to make sure task stays valid while iterating.

pw-bot: cr

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH bpf-next v5 4/5] bpf: Make fs kfuncs available for SYSCALL and TRACING program types
  2024-12-10 14:43   ` Christian Brauner
@ 2024-12-10 18:58     ` Alexei Starovoitov
  2024-12-11 21:29       ` Juntong Deng
  2024-12-17 12:30       ` Christian Brauner
  0 siblings, 2 replies; 25+ messages in thread
From: Alexei Starovoitov @ 2024-12-10 18:58 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Juntong Deng, Alexei Starovoitov, Daniel Borkmann, John Fastabend,
	Andrii Nakryiko, Martin KaFai Lau, Eddy Z, Song Liu,
	Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Kumar Kartikeya Dwivedi, snorcht, bpf, LKML, Linux-Fsdevel

On Tue, Dec 10, 2024 at 6:43 AM Christian Brauner <brauner@kernel.org> wrote:
>
> On Tue, Dec 10, 2024 at 02:03:53PM +0000, Juntong Deng wrote:
> > Currently fs kfuncs are only available for LSM program type, but fs
> > kfuncs are generic and useful for scenarios other than LSM.
> >
> > This patch makes fs kfuncs available for SYSCALL and TRACING
> > program types.
>
> I would like a detailed explanation from the maintainers what it means
> to make this available to SYSCALL program types, please.

Sigh.
This is obviously not safe from tracing progs.

From BPF_PROG_TYPE_SYSCALL these kfuncs should be safe to use,
since those progs are not attached to anything.
Such progs can only be executed via sys_bpf syscall prog_run command.
They're sleepable, preemptable, faultable, in task ctx.

But I'm not sure what's the value of enabling these kfuncs for
BPF_PROG_TYPE_SYSCALL.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH bpf-next v5 2/5] selftests/bpf: Add tests for open-coded style process file iterator
  2024-12-10 18:51       ` Alexei Starovoitov
@ 2024-12-11 21:27         ` Juntong Deng
  0 siblings, 0 replies; 25+ messages in thread
From: Juntong Deng @ 2024-12-11 21:27 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Christian Brauner, Alexei Starovoitov, Daniel Borkmann,
	John Fastabend, Andrii Nakryiko, Martin KaFai Lau, Eddy Z,
	Song Liu, Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo,
	Jiri Olsa, Kumar Kartikeya Dwivedi, snorcht, bpf, LKML,
	Linux-Fsdevel

On 2024/12/10 18:51, Alexei Starovoitov wrote:
> On Tue, Dec 10, 2024 at 8:23 AM Juntong Deng <juntong.deng@outlook.com> wrote:
>>
>>>> +SEC("fentry/" SYS_PREFIX "sys_nanosleep")
>>>> +int test_bpf_iter_task_file(void *ctx)
>>>> +{
>>>> +    struct bpf_iter_task_file task_file_it;
>>>> +    struct bpf_iter_task_file_item *item;
>>>> +    struct task_struct *task;
>>>> +
>>>> +    task = bpf_get_current_task_btf();
>>>> +    if (task->parent->pid != parent_pid)
>>>> +            return 0;
>>>> +
>>>> +    count++;
>>>> +
>>>> +    bpf_rcu_read_lock();
>>>
>>> What does the RCU read lock do here exactly?
>>>
>>
>> Thanks for your reply.
>>
>> This is used to solve the problem previously discussed in v3 [0].
>>
>> Task ref may be released during iteration.
>>
>> [0]:
>> https://lore.kernel.org/bpf/CAADnVQ+0LUXxmfm1YgyGDz=cciy3+dGGM-Zysq84fpAdaB74Qw@mail.gmail.com/
> 
> I think you misunderstood my comment.
> 
> "If this object _was_ RCU protected ..."
> 
> Adding rcu_read_lock doesn't make 'task' pointer RCU protected.
> That's not how RCU works.
> 
> So patch 1 doing:
> 
> item->task = task;
> 
> is not correct.
> 
> See bpf_iter_task_vma_new(). It's doing:
> kit->data->task = get_task_struct(task);
> to make sure task stays valid while iterating.
> 
> pw-bot: cr

Thanks for your reply.

Sorry for the misunderstanding.

I will fix it in the next version.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH bpf-next v5 4/5] bpf: Make fs kfuncs available for SYSCALL and TRACING program types
  2024-12-10 18:58     ` Alexei Starovoitov
@ 2024-12-11 21:29       ` Juntong Deng
  2024-12-11 22:06         ` Song Liu
  2024-12-12  0:53         ` Alexei Starovoitov
  2024-12-17 12:30       ` Christian Brauner
  1 sibling, 2 replies; 25+ messages in thread
From: Juntong Deng @ 2024-12-11 21:29 UTC (permalink / raw)
  To: Alexei Starovoitov, Christian Brauner
  Cc: Alexei Starovoitov, Daniel Borkmann, John Fastabend,
	Andrii Nakryiko, Martin KaFai Lau, Eddy Z, Song Liu,
	Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Kumar Kartikeya Dwivedi, snorcht, bpf, LKML, Linux-Fsdevel

On 2024/12/10 18:58, Alexei Starovoitov wrote:
> On Tue, Dec 10, 2024 at 6:43 AM Christian Brauner <brauner@kernel.org> wrote:
>>
>> On Tue, Dec 10, 2024 at 02:03:53PM +0000, Juntong Deng wrote:
>>> Currently fs kfuncs are only available for LSM program type, but fs
>>> kfuncs are generic and useful for scenarios other than LSM.
>>>
>>> This patch makes fs kfuncs available for SYSCALL and TRACING
>>> program types.
>>
>> I would like a detailed explanation from the maintainers what it means
>> to make this available to SYSCALL program types, please.
> 
> Sigh.
> This is obviously not safe from tracing progs.
> 
>  From BPF_PROG_TYPE_SYSCALL these kfuncs should be safe to use,
> since those progs are not attached to anything.
> Such progs can only be executed via sys_bpf syscall prog_run command.
> They're sleepable, preemptable, faultable, in task ctx.
> 
> But I'm not sure what's the value of enabling these kfuncs for
> BPF_PROG_TYPE_SYSCALL.

Thanks for your reply.

Song said here that we need some of these kfuncs to be available for
tracing functions [0].

If Song saw this email, could you please join the discussion?

[0]: 
https://lore.kernel.org/bpf/CAPhsuW6ud21v2xz8iSXf=CiDL+R_zpQ+p8isSTMTw=EiJQtRSw@mail.gmail.com/

For BPF_PROG_TYPE_SYSCALL, I think BPF_PROG_TYPE_SYSCALL has now
exceeded its original designed purpose and has become a more general
program type.

Currently BPF_PROG_TYPE_SYSCALL is widely used in HID drivers, and there
are some use cases in sched-ext (CRIB is also a use case, although still
in infancy).

As BPF_PROG_TYPE_SYSCALL becomes more general, it would be valuable to
make more kfuncs available for BPF_PROG_TYPE_SYSCALL.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH bpf-next v5 4/5] bpf: Make fs kfuncs available for SYSCALL and TRACING program types
  2024-12-11 21:29       ` Juntong Deng
@ 2024-12-11 22:06         ` Song Liu
  2024-12-13 18:44           ` Juntong Deng
  2024-12-12  0:53         ` Alexei Starovoitov
  1 sibling, 1 reply; 25+ messages in thread
From: Song Liu @ 2024-12-11 22:06 UTC (permalink / raw)
  To: Juntong Deng
  Cc: Alexei Starovoitov, Christian Brauner, Alexei Starovoitov,
	Daniel Borkmann, John Fastabend, Andrii Nakryiko,
	Martin KaFai Lau, Eddy Z, Yonghong Song, KP Singh,
	Stanislav Fomichev, Hao Luo, Jiri Olsa, Kumar Kartikeya Dwivedi,
	snorcht, bpf, LKML, Linux-Fsdevel

On Wed, Dec 11, 2024 at 1:29 PM Juntong Deng <juntong.deng@outlook.com> wrote:
>
> On 2024/12/10 18:58, Alexei Starovoitov wrote:
> > On Tue, Dec 10, 2024 at 6:43 AM Christian Brauner <brauner@kernel.org> wrote:
> >>
> >> On Tue, Dec 10, 2024 at 02:03:53PM +0000, Juntong Deng wrote:
> >>> Currently fs kfuncs are only available for LSM program type, but fs
> >>> kfuncs are generic and useful for scenarios other than LSM.
> >>>
> >>> This patch makes fs kfuncs available for SYSCALL and TRACING
> >>> program types.
> >>
> >> I would like a detailed explanation from the maintainers what it means
> >> to make this available to SYSCALL program types, please.
> >
> > Sigh.
> > This is obviously not safe from tracing progs.
> >
> >  From BPF_PROG_TYPE_SYSCALL these kfuncs should be safe to use,
> > since those progs are not attached to anything.
> > Such progs can only be executed via sys_bpf syscall prog_run command.
> > They're sleepable, preemptable, faultable, in task ctx.
> >
> > But I'm not sure what's the value of enabling these kfuncs for
> > BPF_PROG_TYPE_SYSCALL.
>
> Thanks for your reply.
>
> Song said here that we need some of these kfuncs to be available for
> tracing functions [0].

I meant we can put the new kfuncs, such as bpf_get_file_ops_type, in
bpf_fs_kfuncs.c, and make it available to tracing programs. But we
cannot blindly make all of these kfuncs available to tracing programs.
Instead, we need to review each kfunc and check whether it is safe
for tracing programs.

Thanks,
Song

> If Song saw this email, could you please join the discussion?
>
> [0]:
> https://lore.kernel.org/bpf/CAPhsuW6ud21v2xz8iSXf=CiDL+R_zpQ+p8isSTMTw=EiJQtRSw@mail.gmail.com/

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH bpf-next v5 4/5] bpf: Make fs kfuncs available for SYSCALL and TRACING program types
  2024-12-11 21:29       ` Juntong Deng
  2024-12-11 22:06         ` Song Liu
@ 2024-12-12  0:53         ` Alexei Starovoitov
  2024-12-13 18:51           ` Juntong Deng
  1 sibling, 1 reply; 25+ messages in thread
From: Alexei Starovoitov @ 2024-12-12  0:53 UTC (permalink / raw)
  To: Juntong Deng
  Cc: Christian Brauner, Alexei Starovoitov, Daniel Borkmann,
	John Fastabend, Andrii Nakryiko, Martin KaFai Lau, Eddy Z,
	Song Liu, Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo,
	Jiri Olsa, Kumar Kartikeya Dwivedi, snorcht, bpf, LKML,
	Linux-Fsdevel

On Wed, Dec 11, 2024 at 1:29 PM Juntong Deng <juntong.deng@outlook.com> wrote:
>
> On 2024/12/10 18:58, Alexei Starovoitov wrote:
> > On Tue, Dec 10, 2024 at 6:43 AM Christian Brauner <brauner@kernel.org> wrote:
> >>
> >> On Tue, Dec 10, 2024 at 02:03:53PM +0000, Juntong Deng wrote:
> >>> Currently fs kfuncs are only available for LSM program type, but fs
> >>> kfuncs are generic and useful for scenarios other than LSM.
> >>>
> >>> This patch makes fs kfuncs available for SYSCALL and TRACING
> >>> program types.
> >>
> >> I would like a detailed explanation from the maintainers what it means
> >> to make this available to SYSCALL program types, please.
> >
> > Sigh.
> > This is obviously not safe from tracing progs.
> >
> >  From BPF_PROG_TYPE_SYSCALL these kfuncs should be safe to use,
> > since those progs are not attached to anything.
> > Such progs can only be executed via sys_bpf syscall prog_run command.
> > They're sleepable, preemptable, faultable, in task ctx.
> >
> > But I'm not sure what's the value of enabling these kfuncs for
> > BPF_PROG_TYPE_SYSCALL.
>
> Thanks for your reply.
>
> Song said here that we need some of these kfuncs to be available for
> tracing functions [0].
>
> If Song saw this email, could you please join the discussion?
>
> [0]:
> https://lore.kernel.org/bpf/CAPhsuW6ud21v2xz8iSXf=CiDL+R_zpQ+p8isSTMTw=EiJQtRSw@mail.gmail.com/
>
> For BPF_PROG_TYPE_SYSCALL, I think BPF_PROG_TYPE_SYSCALL has now
> exceeded its original designed purpose and has become a more general
> program type.
>
> Currently BPF_PROG_TYPE_SYSCALL is widely used in HID drivers, and there
> are some use cases in sched-ext (CRIB is also a use case, although still
> in infancy).

hid switched to use struct_ops prog type.
I believe syscall prog type in hid is a legacy code.
Those still present might be leftovers for older kernels.

sched-ext is struct_ops only. No syscall progs there.

> As BPF_PROG_TYPE_SYSCALL becomes more general, it would be valuable to
> make more kfuncs available for BPF_PROG_TYPE_SYSCALL.

Maybe. I still don't understand how it helps CRIB goal.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH bpf-next v5 4/5] bpf: Make fs kfuncs available for SYSCALL and TRACING program types
  2024-12-11 22:06         ` Song Liu
@ 2024-12-13 18:44           ` Juntong Deng
  0 siblings, 0 replies; 25+ messages in thread
From: Juntong Deng @ 2024-12-13 18:44 UTC (permalink / raw)
  To: Song Liu
  Cc: Alexei Starovoitov, Christian Brauner, Alexei Starovoitov,
	Daniel Borkmann, John Fastabend, Andrii Nakryiko,
	Martin KaFai Lau, Eddy Z, Yonghong Song, KP Singh,
	Stanislav Fomichev, Hao Luo, Jiri Olsa, Kumar Kartikeya Dwivedi,
	snorcht, bpf, LKML, Linux-Fsdevel

On 2024/12/11 22:06, Song Liu wrote:
> On Wed, Dec 11, 2024 at 1:29 PM Juntong Deng <juntong.deng@outlook.com> wrote:
>>
>> On 2024/12/10 18:58, Alexei Starovoitov wrote:
>>> On Tue, Dec 10, 2024 at 6:43 AM Christian Brauner <brauner@kernel.org> wrote:
>>>>
>>>> On Tue, Dec 10, 2024 at 02:03:53PM +0000, Juntong Deng wrote:
>>>>> Currently fs kfuncs are only available for LSM program type, but fs
>>>>> kfuncs are generic and useful for scenarios other than LSM.
>>>>>
>>>>> This patch makes fs kfuncs available for SYSCALL and TRACING
>>>>> program types.
>>>>
>>>> I would like a detailed explanation from the maintainers what it means
>>>> to make this available to SYSCALL program types, please.
>>>
>>> Sigh.
>>> This is obviously not safe from tracing progs.
>>>
>>>   From BPF_PROG_TYPE_SYSCALL these kfuncs should be safe to use,
>>> since those progs are not attached to anything.
>>> Such progs can only be executed via sys_bpf syscall prog_run command.
>>> They're sleepable, preemptable, faultable, in task ctx.
>>>
>>> But I'm not sure what's the value of enabling these kfuncs for
>>> BPF_PROG_TYPE_SYSCALL.
>>
>> Thanks for your reply.
>>
>> Song said here that we need some of these kfuncs to be available for
>> tracing functions [0].
> 
> I meant we can put the new kfuncs, such as bpf_get_file_ops_type, in
> bpf_fs_kfuncs.c, and make it available to tracing programs. But we
> cannot blindly make all of these kfuncs available to tracing programs.
> Instead, we need to review each kfunc and check whether it is safe
> for tracing programs.
> 
> Thanks,
> Song
> 

Thanks for joining the discussion.

Yes, we should do that.

Sorry for the misunderstanding.

>> If Song saw this email, could you please join the discussion?
>>
>> [0]:
>> https://lore.kernel.org/bpf/CAPhsuW6ud21v2xz8iSXf=CiDL+R_zpQ+p8isSTMTw=EiJQtRSw@mail.gmail.com/


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH bpf-next v5 4/5] bpf: Make fs kfuncs available for SYSCALL and TRACING program types
  2024-12-12  0:53         ` Alexei Starovoitov
@ 2024-12-13 18:51           ` Juntong Deng
  2024-12-14  0:41             ` Alexei Starovoitov
  0 siblings, 1 reply; 25+ messages in thread
From: Juntong Deng @ 2024-12-13 18:51 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Christian Brauner, Alexei Starovoitov, Daniel Borkmann,
	John Fastabend, Andrii Nakryiko, Martin KaFai Lau, Eddy Z,
	Song Liu, Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo,
	Jiri Olsa, Kumar Kartikeya Dwivedi, snorcht, bpf, LKML,
	Linux-Fsdevel

On 2024/12/12 00:53, Alexei Starovoitov wrote:
> On Wed, Dec 11, 2024 at 1:29 PM Juntong Deng <juntong.deng@outlook.com> wrote:
>>
>> On 2024/12/10 18:58, Alexei Starovoitov wrote:
>>> On Tue, Dec 10, 2024 at 6:43 AM Christian Brauner <brauner@kernel.org> wrote:
>>>>
>>>> On Tue, Dec 10, 2024 at 02:03:53PM +0000, Juntong Deng wrote:
>>>>> Currently fs kfuncs are only available for LSM program type, but fs
>>>>> kfuncs are generic and useful for scenarios other than LSM.
>>>>>
>>>>> This patch makes fs kfuncs available for SYSCALL and TRACING
>>>>> program types.
>>>>
>>>> I would like a detailed explanation from the maintainers what it means
>>>> to make this available to SYSCALL program types, please.
>>>
>>> Sigh.
>>> This is obviously not safe from tracing progs.
>>>
>>>   From BPF_PROG_TYPE_SYSCALL these kfuncs should be safe to use,
>>> since those progs are not attached to anything.
>>> Such progs can only be executed via sys_bpf syscall prog_run command.
>>> They're sleepable, preemptable, faultable, in task ctx.
>>>
>>> But I'm not sure what's the value of enabling these kfuncs for
>>> BPF_PROG_TYPE_SYSCALL.
>>
>> Thanks for your reply.
>>
>> Song said here that we need some of these kfuncs to be available for
>> tracing functions [0].
>>
>> If Song saw this email, could you please join the discussion?
>>
>> [0]:
>> https://lore.kernel.org/bpf/CAPhsuW6ud21v2xz8iSXf=CiDL+R_zpQ+p8isSTMTw=EiJQtRSw@mail.gmail.com/
>>
>> For BPF_PROG_TYPE_SYSCALL, I think BPF_PROG_TYPE_SYSCALL has now
>> exceeded its original designed purpose and has become a more general
>> program type.
>>
>> Currently BPF_PROG_TYPE_SYSCALL is widely used in HID drivers, and there
>> are some use cases in sched-ext (CRIB is also a use case, although still
>> in infancy).
> 
> hid switched to use struct_ops prog type.
> I believe syscall prog type in hid is a legacy code.
> Those still present might be leftovers for older kernels.
> 
> sched-ext is struct_ops only. No syscall progs there.
> 

I saw some on Github [0], sorry, yes they are not in the Linux tree.

[0]: 
https://github.com/search?q=repo%3Asched-ext%2Fscx%20SEC(%22syscall%22)&type=code

>> As BPF_PROG_TYPE_SYSCALL becomes more general, it would be valuable to
>> make more kfuncs available for BPF_PROG_TYPE_SYSCALL.
> 
> Maybe. I still don't understand how it helps CRIB goal.

For CRIB goals, the program type is not important. What is important is
that CRIB bpf programs are able to call the required kfuncs, and that
CRIB ebpf programs can be executed from userspace.

In our previous discussion, the conclusion was that we do not need a
separate CRIB program type [1].

BPF_PROG_TYPE_SYSCALL can be executed from userspace via prog_run, which
fits the CRIB use case of calling the ebpf program from userspace to get
process information.

So BPF_PROG_TYPE_SYSCALL becomes an option.

[1]: 
https://lore.kernel.org/bpf/etzm4h5qm2jhgi6d4pevooy2sebrvgb3lsa67ym4x7zbh5bgnj@feoli4hj22so/

In fs/bpf_fs_kfuncs.c, CRIB currently needs bpf_fget_task (dump files
opened by the process), bpf_put_file, and bpf_get_task_exe_file.

So I would like these kfuncs to be available for BPF_PROG_TYPE_SYSCALL.

bpf_get_dentry_xattr, bpf_get_file_xattr, and bpf_path_d_path have
nothing to do with CRIB, but they are all in bpf_fs_kfunc_set_ids.

Should we make bpf_fs_kfunc_set_ids available to BPF_PROG_TYPE_SYSCALL
as a whole? Or create a separate set? Maybe we can discuss.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH bpf-next v5 4/5] bpf: Make fs kfuncs available for SYSCALL and TRACING program types
  2024-12-13 18:51           ` Juntong Deng
@ 2024-12-14  0:41             ` Alexei Starovoitov
  2024-12-16 23:08               ` Juntong Deng
  0 siblings, 1 reply; 25+ messages in thread
From: Alexei Starovoitov @ 2024-12-14  0:41 UTC (permalink / raw)
  To: Juntong Deng
  Cc: Christian Brauner, Alexei Starovoitov, Daniel Borkmann,
	John Fastabend, Andrii Nakryiko, Martin KaFai Lau, Eddy Z,
	Song Liu, Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo,
	Jiri Olsa, Kumar Kartikeya Dwivedi, snorcht, bpf, LKML,
	Linux-Fsdevel

On Fri, Dec 13, 2024 at 10:51 AM Juntong Deng <juntong.deng@outlook.com> wrote:
>
> >
> > sched-ext is struct_ops only. No syscall progs there.
> >
>
> I saw some on Github [0], sorry, yes they are not in the Linux tree.
>
> [0]:
> https://github.com/search?q=repo%3Asched-ext%2Fscx%20SEC(%22syscall%22)&type=code

Ahh. I see. Those are executed from user space via prog_run.
https://github.com/sched-ext/scx/blob/e8e68e8ee80f65f62a6e900d457306217b764e58/scheds/rust/scx_lavd/src/main.rs#L794

These progs are not executed by sched-ext core,
so not really sched-ext progs.
They're auxiliary progs that populate configs and knobs in bpf maps
that sched-ext progs use later.

>
> >> As BPF_PROG_TYPE_SYSCALL becomes more general, it would be valuable to
> >> make more kfuncs available for BPF_PROG_TYPE_SYSCALL.
> >
> > Maybe. I still don't understand how it helps CRIB goal.
>
> For CRIB goals, the program type is not important. What is important is
> that CRIB bpf programs are able to call the required kfuncs, and that
> CRIB ebpf programs can be executed from userspace.
>
> In our previous discussion, the conclusion was that we do not need a
> separate CRIB program type [1].
>
> BPF_PROG_TYPE_SYSCALL can be executed from userspace via prog_run, which
> fits the CRIB use case of calling the ebpf program from userspace to get
> process information.
>
> So BPF_PROG_TYPE_SYSCALL becomes an option.
>
> [1]:
> https://lore.kernel.org/bpf/etzm4h5qm2jhgi6d4pevooy2sebrvgb3lsa67ym4x7zbh5bgnj@feoli4hj22so/
>
> In fs/bpf_fs_kfuncs.c, CRIB currently needs bpf_fget_task (dump files
> opened by the process), bpf_put_file, and bpf_get_task_exe_file.
>
> So I would like these kfuncs to be available for BPF_PROG_TYPE_SYSCALL.
>
> bpf_get_dentry_xattr, bpf_get_file_xattr, and bpf_path_d_path have
> nothing to do with CRIB, but they are all in bpf_fs_kfunc_set_ids.
>
> Should we make bpf_fs_kfunc_set_ids available to BPF_PROG_TYPE_SYSCALL
> as a whole? Or create a separate set? Maybe we can discuss.

I don't think it's necessary to slide and dice that match.
Since they're all safe from syscall prog it's cleaner to enable them all.

When I said:

> I still don't understand how it helps CRIB goal.

I meant how are you going to use them from CRIB ?

Patch 5 selftest does:

+ file = bpf_fget_task(task, test_fd1);
+ if (file == NULL) {
+ err = 2;
+ return 0;
+ }
+
+ if (file->f_op != &pipefifo_fops) {
+ err = 3;
+ bpf_put_file(file);
+ return 0;
+ }
+
+ bpf_put_file(file);


It's ok for selftest, but not enough to explain the motivation and
end-to-end operation of CRIB.

Patch 2 selftest is also weak.
It's not using bpf_iter_task_file_next() like iterators are
normally used.

When selftests are basic sanity tests, it begs the question: what's next?
How are they going to be used for real?

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH bpf-next v5 4/5] bpf: Make fs kfuncs available for SYSCALL and TRACING program types
  2024-12-14  0:41             ` Alexei Starovoitov
@ 2024-12-16 23:08               ` Juntong Deng
  0 siblings, 0 replies; 25+ messages in thread
From: Juntong Deng @ 2024-12-16 23:08 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Christian Brauner, Alexei Starovoitov, Daniel Borkmann,
	John Fastabend, Andrii Nakryiko, Martin KaFai Lau, Eddy Z,
	Song Liu, Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo,
	Jiri Olsa, Kumar Kartikeya Dwivedi, snorcht, bpf, LKML,
	Linux-Fsdevel

On 2024/12/14 00:41, Alexei Starovoitov wrote:
> On Fri, Dec 13, 2024 at 10:51 AM Juntong Deng <juntong.deng@outlook.com> wrote:
>>
>>>
>>> sched-ext is struct_ops only. No syscall progs there.
>>>
>>
>> I saw some on Github [0], sorry, yes they are not in the Linux tree.
>>
>> [0]:
>> https://github.com/search?q=repo%3Asched-ext%2Fscx%20SEC(%22syscall%22)&type=code
> 
> Ahh. I see. Those are executed from user space via prog_run.
> https://github.com/sched-ext/scx/blob/e8e68e8ee80f65f62a6e900d457306217b764e58/scheds/rust/scx_lavd/src/main.rs#L794
> 
> These progs are not executed by sched-ext core,
> so not really sched-ext progs.
> They're auxiliary progs that populate configs and knobs in bpf maps
> that sched-ext progs use later.
> 
>>
>>>> As BPF_PROG_TYPE_SYSCALL becomes more general, it would be valuable to
>>>> make more kfuncs available for BPF_PROG_TYPE_SYSCALL.
>>>
>>> Maybe. I still don't understand how it helps CRIB goal.
>>
>> For CRIB goals, the program type is not important. What is important is
>> that CRIB bpf programs are able to call the required kfuncs, and that
>> CRIB ebpf programs can be executed from userspace.
>>
>> In our previous discussion, the conclusion was that we do not need a
>> separate CRIB program type [1].
>>
>> BPF_PROG_TYPE_SYSCALL can be executed from userspace via prog_run, which
>> fits the CRIB use case of calling the ebpf program from userspace to get
>> process information.
>>
>> So BPF_PROG_TYPE_SYSCALL becomes an option.
>>
>> [1]:
>> https://lore.kernel.org/bpf/etzm4h5qm2jhgi6d4pevooy2sebrvgb3lsa67ym4x7zbh5bgnj@feoli4hj22so/
>>
>> In fs/bpf_fs_kfuncs.c, CRIB currently needs bpf_fget_task (dump files
>> opened by the process), bpf_put_file, and bpf_get_task_exe_file.
>>
>> So I would like these kfuncs to be available for BPF_PROG_TYPE_SYSCALL.
>>
>> bpf_get_dentry_xattr, bpf_get_file_xattr, and bpf_path_d_path have
>> nothing to do with CRIB, but they are all in bpf_fs_kfunc_set_ids.
>>
>> Should we make bpf_fs_kfunc_set_ids available to BPF_PROG_TYPE_SYSCALL
>> as a whole? Or create a separate set? Maybe we can discuss.
> 
> I don't think it's necessary to slide and dice that match.
> Since they're all safe from syscall prog it's cleaner to enable them all.
> 
> When I said:
> 
>> I still don't understand how it helps CRIB goal.
> 
> I meant how are you going to use them from CRIB ?
> 
> Patch 5 selftest does:
> 
> + file = bpf_fget_task(task, test_fd1);
> + if (file == NULL) {
> + err = 2;
> + return 0;
> + }
> +
> + if (file->f_op != &pipefifo_fops) {
> + err = 3;
> + bpf_put_file(file);
> + return 0;
> + }
> +
> + bpf_put_file(file);
> 
> 
> It's ok for selftest, but not enough to explain the motivation and
> end-to-end operation of CRIB.
> 
> Patch 2 selftest is also weak.
> It's not using bpf_iter_task_file_next() like iterators are
> normally used.
> 
> When selftests are basic sanity tests, it begs the question: what's next?
> How are they going to be used for real?

In my plan CRIB ebpf program will look like this

SEC("syscall")
int dump_task_files(struct task_arg *arg)
...

SEC("syscall")
int dump_task_socket(struct socket_arg *arg)
...

SEC("syscall")
int dump_task_pipe(struct pipe_arg *arg)
...

Since the complexity of an ebpf program is limited, I am unable to
implement dumping all types of files within one ebpf program, so I need
to provide separate bpf programs for different file types (restoring
part is similar).

In dump_task_files will use bpf_iter_task_file to obtain the file
descriptor, the file type (socket, pipe, ...) and other necessary
information of all files opened by the process.

And then the userspace program will pass the file descriptors to
different dump_task_xxx ebpf programs based on the different file
types. bpf_fget_task will be used in the dump_task_xxx ebpf programs.

In restoring part, ebpf program is used only as minimal necessary help,
and most of the restoring part continue to use the original kernel
interface.

For example, when restoring sockets, we still use the original kernel
interface "socket" to create sockets, and only use the ebpf program
when necessary (e.g. to set TCP internal states).

Thanks to BPF CO-RE, it would be more convenient to extend the data
structure of kfuncs than the data structure of the system call
interface, so this still has value.

Once I have finished a good enough solution, I will send out this part
of the patch series (as soon as I can, even though I cannot do it
full time).

All other patch series related to opened 'files' (socket, pipe, ...)
depend on bpf_iter_task_file and bpf_fget_task.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH bpf-next v5 4/5] bpf: Make fs kfuncs available for SYSCALL and TRACING program types
  2024-12-10 18:58     ` Alexei Starovoitov
  2024-12-11 21:29       ` Juntong Deng
@ 2024-12-17 12:30       ` Christian Brauner
  2024-12-17 14:11         ` Juntong Deng
  1 sibling, 1 reply; 25+ messages in thread
From: Christian Brauner @ 2024-12-17 12:30 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Juntong Deng, Alexei Starovoitov, Daniel Borkmann, John Fastabend,
	Andrii Nakryiko, Martin KaFai Lau, Eddy Z, Song Liu,
	Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Kumar Kartikeya Dwivedi, snorcht, bpf, LKML, Linux-Fsdevel

On Tue, Dec 10, 2024 at 10:58:52AM -0800, Alexei Starovoitov wrote:
> On Tue, Dec 10, 2024 at 6:43 AM Christian Brauner <brauner@kernel.org> wrote:
> >
> > On Tue, Dec 10, 2024 at 02:03:53PM +0000, Juntong Deng wrote:
> > > Currently fs kfuncs are only available for LSM program type, but fs
> > > kfuncs are generic and useful for scenarios other than LSM.
> > >
> > > This patch makes fs kfuncs available for SYSCALL and TRACING
> > > program types.
> >
> > I would like a detailed explanation from the maintainers what it means
> > to make this available to SYSCALL program types, please.
> 
> Sigh.

Hm? Was that directed at my question? I don't have the background to
judge this and this whole api looks like a giant footgun so far for
questionable purposes.

I have a hard time seeing parts of CRIU moved into bpf especially
because all of the userspace stuff exists.

> This is obviously not safe from tracing progs.
> 
> From BPF_PROG_TYPE_SYSCALL these kfuncs should be safe to use,
> since those progs are not attached to anything.
> Such progs can only be executed via sys_bpf syscall prog_run command.
> They're sleepable, preemptable, faultable, in task ctx.
> 
> But I'm not sure what's the value of enabling these kfuncs for
> BPF_PROG_TYPE_SYSCALL.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH bpf-next v5 4/5] bpf: Make fs kfuncs available for SYSCALL and TRACING program types
  2024-12-17 12:30       ` Christian Brauner
@ 2024-12-17 14:11         ` Juntong Deng
  0 siblings, 0 replies; 25+ messages in thread
From: Juntong Deng @ 2024-12-17 14:11 UTC (permalink / raw)
  To: Christian Brauner, Alexei Starovoitov
  Cc: Alexei Starovoitov, Daniel Borkmann, John Fastabend,
	Andrii Nakryiko, Martin KaFai Lau, Eddy Z, Song Liu,
	Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Kumar Kartikeya Dwivedi, snorcht, bpf, LKML, Linux-Fsdevel

On 2024/12/17 12:30, Christian Brauner wrote:
> On Tue, Dec 10, 2024 at 10:58:52AM -0800, Alexei Starovoitov wrote:
>> On Tue, Dec 10, 2024 at 6:43 AM Christian Brauner <brauner@kernel.org> wrote:
>>>
>>> On Tue, Dec 10, 2024 at 02:03:53PM +0000, Juntong Deng wrote:
>>>> Currently fs kfuncs are only available for LSM program type, but fs
>>>> kfuncs are generic and useful for scenarios other than LSM.
>>>>
>>>> This patch makes fs kfuncs available for SYSCALL and TRACING
>>>> program types.
>>>
>>> I would like a detailed explanation from the maintainers what it means
>>> to make this available to SYSCALL program types, please.
>>
>> Sigh.
> 
> Hm? Was that directed at my question? I don't have the background to
> judge this and this whole api looks like a giant footgun so far for
> questionable purposes.
> 
> I have a hard time seeing parts of CRIU moved into bpf especially
> because all of the userspace stuff exists.
> 

All these kfuncs I want to add are not CRIB specific but generic.

Although these kfuncs are to be added because of CRIB, these kfuncs
are essentially to give BPF the ability to access process-related
information.

Obtaining process-related information is not a requirement specific to
checkpoint/restore scenarios, but is also required in other scenarios.

Access to process-related information is a generic ability that would
be useful for scenarios other than checkpoint/restore.

Therefore these generic kfuncs will be valuable to all BPF users
in the long run.

>> This is obviously not safe from tracing progs.
>>
>>  From BPF_PROG_TYPE_SYSCALL these kfuncs should be safe to use,
>> since those progs are not attached to anything.
>> Such progs can only be executed via sys_bpf syscall prog_run command.
>> They're sleepable, preemptable, faultable, in task ctx.
>>
>> But I'm not sure what's the value of enabling these kfuncs for
>> BPF_PROG_TYPE_SYSCALL.


^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2024-12-17 14:11 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-12-10 14:01 [PATCH bpf-next v5 0/5] bpf: Add open-coded style process file iterator and bpf_fget_task() kfunc Juntong Deng
2024-12-10 14:03 ` [PATCH bpf-next v5 1/5] bpf: Introduce task_file open-coded iterator kfuncs Juntong Deng
2024-12-10 14:03 ` [PATCH bpf-next v5 2/5] selftests/bpf: Add tests for open-coded style process file iterator Juntong Deng
2024-12-10 14:37   ` Christian Brauner
2024-12-10 16:23     ` Juntong Deng
2024-12-10 18:51       ` Alexei Starovoitov
2024-12-11 21:27         ` Juntong Deng
2024-12-10 14:03 ` [PATCH bpf-next v5 3/5] bpf: Add bpf_fget_task() kfunc Juntong Deng
2024-12-10 14:28   ` Christian Brauner
2024-12-10 16:21     ` Juntong Deng
2024-12-10 14:03 ` [PATCH bpf-next v5 4/5] bpf: Make fs kfuncs available for SYSCALL and TRACING program types Juntong Deng
2024-12-10 14:43   ` Christian Brauner
2024-12-10 18:58     ` Alexei Starovoitov
2024-12-11 21:29       ` Juntong Deng
2024-12-11 22:06         ` Song Liu
2024-12-13 18:44           ` Juntong Deng
2024-12-12  0:53         ` Alexei Starovoitov
2024-12-13 18:51           ` Juntong Deng
2024-12-14  0:41             ` Alexei Starovoitov
2024-12-16 23:08               ` Juntong Deng
2024-12-17 12:30       ` Christian Brauner
2024-12-17 14:11         ` Juntong Deng
2024-12-10 14:03 ` [PATCH bpf-next v5 5/5] selftests/bpf: Add tests for bpf_fget_task() kfunc Juntong Deng
2024-12-10 14:44 ` [PATCH bpf-next v5 0/5] bpf: Add open-coded style process file iterator and " Christian Brauner
2024-12-10 16:25   ` Juntong Deng

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox