linux-security-module.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3 bpf-next 0/5] bpf path iterator
@ 2025-06-06 21:30 Song Liu
  2025-06-06 21:30 ` [PATCH v3 bpf-next 1/5] namei: Introduce new helper function path_walk_parent() Song Liu
                   ` (6 more replies)
  0 siblings, 7 replies; 30+ messages in thread
From: Song Liu @ 2025-06-06 21:30 UTC (permalink / raw)
  To: bpf, linux-fsdevel, linux-kernel, linux-security-module
  Cc: kernel-team, andrii, eddyz87, ast, daniel, martin.lau, viro,
	brauner, jack, kpsingh, mattbobrowski, amir73il, repnop, jlayton,
	josef, mic, gnoack, m, Song Liu

In security use cases, it is common to apply rules to VFS subtrees.
However, filtering files in a subtree is not straightforward [1].

One solution to this problem is to start from a path and walk up the VFS
tree (towards the root). Among in-tree LSMs, Landlock uses this solution.

BPF LSM solutions, such like Tetragon [2], also use similar approaches.
However, due to lack of proper helper/kfunc support, BPF LSM solutions
usually do the path walk with probe read, which is racy.

This patchset introduces a new helper path_walk_parent, which walks
path to its VFS parent. The helper is used in Landlock.

A new BPF iterator, path iterator, is introduced to do the path walking.
The BPF path iterator uses the new path_walk_parent help to walk the VFS
tree.

Changes v2 => v3:
1. Fix an issue with path_walk_parent.
2. Move bpf path iterator to fs/bpf_fs_kfuncs.c
3. Optimize bpf path iterator (less memory).
4. Add more selftests.
5. Add more comments.

v2: https://lore.kernel.org/bpf/20250603065920.3404510-1-song@kernel.org/

Changes v1 => v2:
1. Rename path_parent => path_walk_parent.
2. Remove path_connected check in path_walk_parent.
3. Fix is_access_to_paths_allowed().
4. Remove mode for path iterator, add a flag instead.

v1: https://lore.kernel.org/bpf/20250528222623.1373000-1-song@kernel.org/

[1] https://lpc.events/event/18/contributions/1940/
[2] https://github.com/cilium/tetragon/

Song Liu (5):
  namei: Introduce new helper function path_walk_parent()
  landlock: Use path_walk_parent()
  bpf: Introduce path iterator
  selftests/bpf: Add tests for bpf path iterator
  selftests/bpf: Path walk test

 fs/bpf_fs_kfuncs.c                            |  73 +++++++++
 fs/namei.c                                    |  51 ++++++
 include/linux/namei.h                         |   2 +
 kernel/bpf/verifier.c                         |   5 +
 security/landlock/fs.c                        |  31 ++--
 .../testing/selftests/bpf/bpf_experimental.h  |   6 +
 .../selftests/bpf/prog_tests/path_iter.c      | 111 ++++++++++++++
 tools/testing/selftests/bpf/progs/path_iter.c | 145 ++++++++++++++++++
 tools/testing/selftests/bpf/progs/path_walk.c |  59 +++++++
 9 files changed, 462 insertions(+), 21 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/path_iter.c
 create mode 100644 tools/testing/selftests/bpf/progs/path_iter.c
 create mode 100644 tools/testing/selftests/bpf/progs/path_walk.c

--
2.47.1

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH v3 bpf-next 1/5] namei: Introduce new helper function path_walk_parent()
  2025-06-06 21:30 [PATCH v3 bpf-next 0/5] bpf path iterator Song Liu
@ 2025-06-06 21:30 ` Song Liu
  2025-06-10 17:18   ` Mickaël Salaün
  2025-06-10 23:34   ` NeilBrown
  2025-06-06 21:30 ` [PATCH v3 bpf-next 2/5] landlock: Use path_walk_parent() Song Liu
                   ` (5 subsequent siblings)
  6 siblings, 2 replies; 30+ messages in thread
From: Song Liu @ 2025-06-06 21:30 UTC (permalink / raw)
  To: bpf, linux-fsdevel, linux-kernel, linux-security-module
  Cc: kernel-team, andrii, eddyz87, ast, daniel, martin.lau, viro,
	brauner, jack, kpsingh, mattbobrowski, amir73il, repnop, jlayton,
	josef, mic, gnoack, m, Song Liu

This helper walks an input path to its parent. Logic are added to handle
walking across mount tree.

This will be used by landlock, and BPF LSM.

Signed-off-by: Song Liu <song@kernel.org>
---
 fs/namei.c            | 51 +++++++++++++++++++++++++++++++++++++++++++
 include/linux/namei.h |  2 ++
 2 files changed, 53 insertions(+)

diff --git a/fs/namei.c b/fs/namei.c
index 4bb889fc980b..f02183e9c073 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1424,6 +1424,57 @@ static bool choose_mountpoint(struct mount *m, const struct path *root,
 	return found;
 }
 
+/**
+ * path_walk_parent - Walk to the parent of path
+ * @path: input and output path.
+ * @root: root of the path walk, do not go beyond this root. If @root is
+ *        zero'ed, walk all the way to real root.
+ *
+ * Given a path, find the parent path. Replace @path with the parent path.
+ * If we were already at the real root or a disconnected root, @path is
+ * not changed.
+ *
+ * The logic of path_walk_parent() is similar to follow_dotdot(), except
+ * that path_walk_parent() will continue walking for !path_connected case.
+ * This effectively means we are walking from disconnected bind mount to
+ * the original mount. If this behavior is not desired, the caller can add
+ * a check like:
+ *
+ *   if (path_walk_parent(&path) && !path_connected(path.mnt, path.dentry)
+ *           // continue walking
+ *   else
+ *           // stop walking
+ *
+ * Returns:
+ *  true  - if @path is updated to its parent.
+ *  false - if @path is already the root (real root or @root).
+ */
+bool path_walk_parent(struct path *path, const struct path *root)
+{
+	struct dentry *parent;
+
+	if (path_equal(path, root))
+		return false;
+
+	if (unlikely(path->dentry == path->mnt->mnt_root)) {
+		struct path p;
+
+		if (!choose_mountpoint(real_mount(path->mnt), root, &p))
+			return false;
+		path_put(path);
+		*path = p;
+	}
+
+	if (unlikely(IS_ROOT(path->dentry)))
+		return false;
+
+	parent = dget_parent(path->dentry);
+	dput(path->dentry);
+	path->dentry = parent;
+	return true;
+}
+EXPORT_SYMBOL_GPL(path_walk_parent);
+
 /*
  * Perform an automount
  * - return -EISDIR to tell follow_managed() to stop and return the path we
diff --git a/include/linux/namei.h b/include/linux/namei.h
index 5d085428e471..cba5373ecf86 100644
--- a/include/linux/namei.h
+++ b/include/linux/namei.h
@@ -85,6 +85,8 @@ extern int follow_down_one(struct path *);
 extern int follow_down(struct path *path, unsigned int flags);
 extern int follow_up(struct path *);
 
+bool path_walk_parent(struct path *path, const struct path *root);
+
 extern struct dentry *lock_rename(struct dentry *, struct dentry *);
 extern struct dentry *lock_rename_child(struct dentry *, struct dentry *);
 extern void unlock_rename(struct dentry *, struct dentry *);
-- 
2.47.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v3 bpf-next 2/5] landlock: Use path_walk_parent()
  2025-06-06 21:30 [PATCH v3 bpf-next 0/5] bpf path iterator Song Liu
  2025-06-06 21:30 ` [PATCH v3 bpf-next 1/5] namei: Introduce new helper function path_walk_parent() Song Liu
@ 2025-06-06 21:30 ` Song Liu
  2025-06-08 18:45   ` Tingmao Wang
  2025-06-06 21:30 ` [PATCH v3 bpf-next 3/5] bpf: Introduce path iterator Song Liu
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 30+ messages in thread
From: Song Liu @ 2025-06-06 21:30 UTC (permalink / raw)
  To: bpf, linux-fsdevel, linux-kernel, linux-security-module
  Cc: kernel-team, andrii, eddyz87, ast, daniel, martin.lau, viro,
	brauner, jack, kpsingh, mattbobrowski, amir73il, repnop, jlayton,
	josef, mic, gnoack, m, Song Liu

Use path_walk_parent() to walk a path up to its parent.

No functional changes intended.

Signed-off-by: Song Liu <song@kernel.org>
---
 security/landlock/fs.c | 31 ++++++++++---------------------
 1 file changed, 10 insertions(+), 21 deletions(-)

diff --git a/security/landlock/fs.c b/security/landlock/fs.c
index 6fee7c20f64d..3adac544dc9e 100644
--- a/security/landlock/fs.c
+++ b/security/landlock/fs.c
@@ -837,8 +837,8 @@ static bool is_access_to_paths_allowed(
 	 * restriction.
 	 */
 	while (true) {
-		struct dentry *parent_dentry;
 		const struct landlock_rule *rule;
+		struct path root = {};
 
 		/*
 		 * If at least all accesses allowed on the destination are
@@ -895,34 +895,23 @@ static bool is_access_to_paths_allowed(
 		/* Stops when a rule from each layer grants access. */
 		if (allowed_parent1 && allowed_parent2)
 			break;
-jump_up:
-		if (walker_path.dentry == walker_path.mnt->mnt_root) {
-			if (follow_up(&walker_path)) {
-				/* Ignores hidden mount points. */
-				goto jump_up;
-			} else {
-				/*
-				 * Stops at the real root.  Denies access
-				 * because not all layers have granted access.
-				 */
-				break;
-			}
-		}
+
+		if (path_walk_parent(&walker_path, &root))
+			continue;
+
 		if (unlikely(IS_ROOT(walker_path.dentry))) {
 			/*
-			 * Stops at disconnected root directories.  Only allows
-			 * access to internal filesystems (e.g. nsfs, which is
-			 * reachable through /proc/<pid>/ns/<namespace>).
+			 * Stops at disconnected or real root directories.
+			 * Only allows access to internal filesystems
+			 * (e.g. nsfs, which is reachable through
+			 * /proc/<pid>/ns/<namespace>).
 			 */
 			if (walker_path.mnt->mnt_flags & MNT_INTERNAL) {
 				allowed_parent1 = true;
 				allowed_parent2 = true;
 			}
-			break;
 		}
-		parent_dentry = dget_parent(walker_path.dentry);
-		dput(walker_path.dentry);
-		walker_path.dentry = parent_dentry;
+		break;
 	}
 	path_put(&walker_path);
 
-- 
2.47.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v3 bpf-next 3/5] bpf: Introduce path iterator
  2025-06-06 21:30 [PATCH v3 bpf-next 0/5] bpf path iterator Song Liu
  2025-06-06 21:30 ` [PATCH v3 bpf-next 1/5] namei: Introduce new helper function path_walk_parent() Song Liu
  2025-06-06 21:30 ` [PATCH v3 bpf-next 2/5] landlock: Use path_walk_parent() Song Liu
@ 2025-06-06 21:30 ` Song Liu
  2025-06-06 21:30 ` [PATCH v3 bpf-next 4/5] selftests/bpf: Add tests for bpf " Song Liu
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 30+ messages in thread
From: Song Liu @ 2025-06-06 21:30 UTC (permalink / raw)
  To: bpf, linux-fsdevel, linux-kernel, linux-security-module
  Cc: kernel-team, andrii, eddyz87, ast, daniel, martin.lau, viro,
	brauner, jack, kpsingh, mattbobrowski, amir73il, repnop, jlayton,
	josef, mic, gnoack, m, Song Liu

Introduce a path iterator, which walks a struct path toward the root.
This path iterator is based on path_walk_parent. A fixed zero'ed root
is passed to path_walk_parent(). Therefore, unless the user terminates
it earlier, the iterator will terminate at the real root.

Signed-off-by: Song Liu <song@kernel.org>
---
 fs/bpf_fs_kfuncs.c    | 73 +++++++++++++++++++++++++++++++++++++++++++
 kernel/bpf/verifier.c |  5 +++
 2 files changed, 78 insertions(+)

diff --git a/fs/bpf_fs_kfuncs.c b/fs/bpf_fs_kfuncs.c
index 08412532db1b..8c618154df0a 100644
--- a/fs/bpf_fs_kfuncs.c
+++ b/fs/bpf_fs_kfuncs.c
@@ -10,6 +10,7 @@
 #include <linux/fsnotify.h>
 #include <linux/file.h>
 #include <linux/mm.h>
+#include <linux/namei.h>
 #include <linux/xattr.h>
 
 __bpf_kfunc_start_defs();
@@ -324,6 +325,75 @@ __bpf_kfunc int bpf_remove_dentry_xattr(struct dentry *dentry, const char *name_
 
 __bpf_kfunc_end_defs();
 
+/* open-coded path iterator */
+struct bpf_iter_path {
+	__u64 __opaque[2];
+} __aligned(8);
+
+struct bpf_iter_path_kern {
+	struct path path;
+} __aligned(8);
+
+__bpf_kfunc_start_defs();
+
+__bpf_kfunc int bpf_iter_path_new(struct bpf_iter_path *it,
+				  struct path *start,
+				  __u64 flags)
+{
+	struct bpf_iter_path_kern *kit = (void *)it;
+
+	BUILD_BUG_ON(sizeof(*kit) > sizeof(*it));
+	BUILD_BUG_ON(__alignof__(*kit) != __alignof__(*it));
+
+	if (flags) {
+		/*
+		 * _destroy() is still called when _new() fails. Zero
+		 * kit->path so that it be passed to path_put() safely.
+		 * Note: path_put() is no-op for zero'ed path.
+		 */
+		memset(&kit->path, 0, sizeof(struct path));
+		return -EINVAL;
+	}
+
+	kit->path = *start;
+	path_get(&kit->path);
+
+	return 0;
+}
+
+__bpf_kfunc struct path *bpf_iter_path_next(struct bpf_iter_path *it)
+{
+	struct bpf_iter_path_kern *kit = (void *)it;
+	struct path root = {};
+
+	/*
+	 * "root" is zero'ed. Therefore, unless the loop is explicitly
+	 * terminated, bpf_iter_path_next() will continue looping until
+	 * we've reached the global root of the VFS.
+	 *
+	 * If a root of walk is needed, the user can check "path" against
+	 * that root on each iteration.
+	 */
+	if (!path_walk_parent(&kit->path, &root)) {
+		/*
+		 * Return NULL, but keep valid kit->path. _destroy() will
+		 * always path_put(&kit->path).
+		 */
+		return NULL;
+	}
+
+	return &kit->path;
+}
+
+__bpf_kfunc void bpf_iter_path_destroy(struct bpf_iter_path *it)
+{
+	struct bpf_iter_path_kern *kit = (void *)it;
+
+	path_put(&kit->path);
+}
+
+__bpf_kfunc_end_defs();
+
 BTF_KFUNCS_START(bpf_fs_kfunc_set_ids)
 BTF_ID_FLAGS(func, bpf_get_task_exe_file,
 	     KF_ACQUIRE | KF_TRUSTED_ARGS | KF_RET_NULL)
@@ -333,6 +403,9 @@ BTF_ID_FLAGS(func, bpf_get_dentry_xattr, KF_SLEEPABLE | KF_TRUSTED_ARGS)
 BTF_ID_FLAGS(func, bpf_get_file_xattr, KF_SLEEPABLE | KF_TRUSTED_ARGS)
 BTF_ID_FLAGS(func, bpf_set_dentry_xattr, KF_SLEEPABLE | KF_TRUSTED_ARGS)
 BTF_ID_FLAGS(func, bpf_remove_dentry_xattr, KF_SLEEPABLE | KF_TRUSTED_ARGS)
+BTF_ID_FLAGS(func, bpf_iter_path_new, KF_ITER_NEW | KF_TRUSTED_ARGS | KF_SLEEPABLE)
+BTF_ID_FLAGS(func, bpf_iter_path_next, KF_ITER_NEXT | KF_RET_NULL | KF_SLEEPABLE)
+BTF_ID_FLAGS(func, bpf_iter_path_destroy, KF_ITER_DESTROY | KF_SLEEPABLE)
 BTF_KFUNCS_END(bpf_fs_kfunc_set_ids)
 
 static int bpf_fs_kfuncs_filter(const struct bpf_prog *prog, u32 kfunc_id)
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index e31f6b0ccb30..2c187c05c06f 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -7036,6 +7036,10 @@ BTF_TYPE_SAFE_TRUSTED_OR_NULL(struct socket) {
 	struct sock *sk;
 };
 
+BTF_TYPE_SAFE_TRUSTED_OR_NULL(struct path) {
+	struct dentry *dentry;
+};
+
 static bool type_is_rcu(struct bpf_verifier_env *env,
 			struct bpf_reg_state *reg,
 			const char *field_name, u32 btf_id)
@@ -7076,6 +7080,7 @@ static bool type_is_trusted_or_null(struct bpf_verifier_env *env,
 				    const char *field_name, u32 btf_id)
 {
 	BTF_TYPE_EMIT(BTF_TYPE_SAFE_TRUSTED_OR_NULL(struct socket));
+	BTF_TYPE_EMIT(BTF_TYPE_SAFE_TRUSTED_OR_NULL(struct path));
 
 	return btf_nested_type_is_trusted(&env->log, reg, field_name, btf_id,
 					  "__safe_trusted_or_null");
-- 
2.47.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v3 bpf-next 4/5] selftests/bpf: Add tests for bpf path iterator
  2025-06-06 21:30 [PATCH v3 bpf-next 0/5] bpf path iterator Song Liu
                   ` (2 preceding siblings ...)
  2025-06-06 21:30 ` [PATCH v3 bpf-next 3/5] bpf: Introduce path iterator Song Liu
@ 2025-06-06 21:30 ` Song Liu
  2025-06-06 21:30 ` [PATCH v3 bpf-next 5/5] selftests/bpf: Path walk test Song Liu
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 30+ messages in thread
From: Song Liu @ 2025-06-06 21:30 UTC (permalink / raw)
  To: bpf, linux-fsdevel, linux-kernel, linux-security-module
  Cc: kernel-team, andrii, eddyz87, ast, daniel, martin.lau, viro,
	brauner, jack, kpsingh, mattbobrowski, amir73il, repnop, jlayton,
	josef, mic, gnoack, m, Song Liu

Add tests for bpf path iterator, including test cases similar to real
workload (call bpf_path_d_path and bpf_get_dentry_xattr), and test cases
where the verifier rejects invalid use of the iterator.

Signed-off-by: Song Liu <song@kernel.org>
---
 .../testing/selftests/bpf/bpf_experimental.h  |   6 +
 .../selftests/bpf/prog_tests/path_iter.c      |  12 ++
 tools/testing/selftests/bpf/progs/path_iter.c | 145 ++++++++++++++++++
 3 files changed, 163 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/path_iter.c
 create mode 100644 tools/testing/selftests/bpf/progs/path_iter.c

diff --git a/tools/testing/selftests/bpf/bpf_experimental.h b/tools/testing/selftests/bpf/bpf_experimental.h
index 5e512a1d09d1..cbb759b473df 100644
--- a/tools/testing/selftests/bpf/bpf_experimental.h
+++ b/tools/testing/selftests/bpf/bpf_experimental.h
@@ -596,4 +596,10 @@ extern int bpf_iter_dmabuf_new(struct bpf_iter_dmabuf *it) __weak __ksym;
 extern struct dma_buf *bpf_iter_dmabuf_next(struct bpf_iter_dmabuf *it) __weak __ksym;
 extern void bpf_iter_dmabuf_destroy(struct bpf_iter_dmabuf *it) __weak __ksym;
 
+struct bpf_iter_path;
+extern int bpf_iter_path_new(struct bpf_iter_path *it, struct path *start,
+			     __u64 flags) __weak __ksym;
+extern struct path *bpf_iter_path_next(struct bpf_iter_path *it) __weak __ksym;
+extern void bpf_iter_path_destroy(struct bpf_iter_path *it) __weak __ksym;
+
 #endif
diff --git a/tools/testing/selftests/bpf/prog_tests/path_iter.c b/tools/testing/selftests/bpf/prog_tests/path_iter.c
new file mode 100644
index 000000000000..3c99c24fbd96
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/path_iter.c
@@ -0,0 +1,12 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2025 Meta Platforms, Inc. and affiliates. */
+
+#include <test_progs.h>
+#include <bpf/libbpf.h>
+#include <bpf/btf.h>
+#include "path_iter.skel.h"
+
+void test_path_iter(void)
+{
+	RUN_TESTS(path_iter);
+}
diff --git a/tools/testing/selftests/bpf/progs/path_iter.c b/tools/testing/selftests/bpf/progs/path_iter.c
new file mode 100644
index 000000000000..74d0f4e19ffa
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/path_iter.c
@@ -0,0 +1,145 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2025 Meta Platforms, Inc. and affiliates. */
+
+#include "vmlinux.h"
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+#include "bpf_misc.h"
+#include "bpf_experimental.h"
+
+char _license[] SEC("license") = "GPL";
+
+char path_name[256];
+char xattr_val[64];
+
+static __always_inline void access_path_dentry(struct path *p)
+{
+	struct bpf_dynptr ptr;
+	struct dentry *dentry;
+
+	if (!p)
+		return;
+
+	bpf_dynptr_from_mem(xattr_val, sizeof(xattr_val), 0, &ptr);
+	bpf_path_d_path(p, path_name, sizeof(path_name));
+
+	dentry = p->dentry;
+	if (dentry)
+		bpf_get_dentry_xattr(dentry, "user.xattr", &ptr);
+}
+
+SEC("lsm.s/file_open")
+__success
+int BPF_PROG(open_code, struct file *f)
+{
+	struct bpf_iter_path path_it;
+	struct path *p;
+	int ret;
+
+	ret = bpf_iter_path_new(&path_it, &f->f_path, 0);
+	if (ret) {
+		bpf_iter_path_destroy(&path_it);
+		return 0;
+	}
+
+	p = bpf_iter_path_next(&path_it);
+	access_path_dentry(p);
+	bpf_iter_path_destroy(&path_it);
+
+	return 0;
+}
+
+SEC("lsm.s/file_open")
+__success
+int BPF_PROG(for_each, struct file *f)
+{
+	struct path *p;
+
+	bpf_for_each(path, p, &f->f_path, 0)
+		access_path_dentry(p);
+
+	return 0;
+}
+
+SEC("lsm.s/file_open")
+__failure __msg("Unreleased reference")
+int BPF_PROG(missing_destroy, struct file *f)
+{
+	struct bpf_iter_path path_it;
+
+	bpf_iter_path_new(&path_it, &f->f_path, 0);
+
+	return 0;
+}
+
+SEC("lsm.s/file_open")
+__failure __msg("expected an initialized iter_path")
+int BPF_PROG(missing_new, struct file *f)
+{
+	struct bpf_iter_path path_it;
+
+	bpf_iter_path_destroy(&path_it);
+	return 0;
+}
+
+SEC("lsm.s/file_open")
+__failure __msg("expected uninitialized iter_path")
+int BPF_PROG(new_twice, struct file *f)
+{
+	struct bpf_iter_path path_it;
+
+	bpf_iter_path_new(&path_it, &f->f_path, 0);
+	bpf_iter_path_new(&path_it, &f->f_path, 0);
+	bpf_iter_path_destroy(&path_it);
+	return 0;
+}
+
+SEC("lsm.s/file_open")
+__failure __msg("expected an initialized iter_path")
+int BPF_PROG(destroy_twice, struct file *f)
+{
+	struct bpf_iter_path path_it;
+
+	bpf_iter_path_new(&path_it, &f->f_path, 0);
+	bpf_iter_path_destroy(&path_it);
+	bpf_iter_path_destroy(&path_it);
+	return 0;
+}
+
+SEC("lsm.s/file_open")
+__success
+int BPF_PROG(reuse_path_iter, struct file *f)
+{
+	struct bpf_iter_path path_it;
+
+	bpf_iter_path_new(&path_it, &f->f_path, 0);
+	bpf_iter_path_destroy(&path_it);
+	bpf_iter_path_new(&path_it, &f->f_path, 0);
+	bpf_iter_path_destroy(&path_it);
+	return 0;
+}
+
+SEC("lsm.s/file_open")
+__failure __msg("invalid read from stack off")
+int BPF_PROG(invalid_read_path_iter, struct file *f)
+{
+	struct bpf_iter_path path_it;
+	struct bpf_iter_path path_it_2;
+
+
+	bpf_iter_path_new(&path_it, &f->f_path, 0);
+	path_it_2 = path_it;
+	bpf_iter_path_destroy(&path_it_2);
+	return 0;
+}
+
+SEC("lsm.s/sb_alloc_security")
+__failure __msg("must be referenced or trusted")
+int BPF_PROG(untrusted_path, struct super_block *sb)
+{
+	struct bpf_iter_path path_it;
+
+	bpf_iter_path_new(&path_it, &sb->s_bdev_file->f_path, 0);
+	bpf_iter_path_destroy(&path_it);
+	return 0;
+}
-- 
2.47.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v3 bpf-next 5/5] selftests/bpf: Path walk test
  2025-06-06 21:30 [PATCH v3 bpf-next 0/5] bpf path iterator Song Liu
                   ` (3 preceding siblings ...)
  2025-06-06 21:30 ` [PATCH v3 bpf-next 4/5] selftests/bpf: Add tests for bpf " Song Liu
@ 2025-06-06 21:30 ` Song Liu
  2025-06-08 17:32 ` [PATCH v3 bpf-next 0/5] bpf path iterator Tingmao Wang
  2025-06-08 17:32 ` Tingmao Wang
  6 siblings, 0 replies; 30+ messages in thread
From: Song Liu @ 2025-06-06 21:30 UTC (permalink / raw)
  To: bpf, linux-fsdevel, linux-kernel, linux-security-module
  Cc: kernel-team, andrii, eddyz87, ast, daniel, martin.lau, viro,
	brauner, jack, kpsingh, mattbobrowski, amir73il, repnop, jlayton,
	josef, mic, gnoack, m, Song Liu

Add an end-to-end test with path_iter on security hook file_open.

A test file is created in folder /tmp/test_progs_path_iter/folder. On
file_open, walk file->f_path up to its parent and grand parent, and test
bpf_get_dentry_xattr and bpf_path_d_path on the folders.

Signed-off-by: Song Liu <song@kernel.org>
---
 .../selftests/bpf/prog_tests/path_iter.c      | 99 +++++++++++++++++++
 tools/testing/selftests/bpf/progs/path_walk.c | 59 +++++++++++
 2 files changed, 158 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/progs/path_walk.c

diff --git a/tools/testing/selftests/bpf/prog_tests/path_iter.c b/tools/testing/selftests/bpf/prog_tests/path_iter.c
index 3c99c24fbd96..b9772026fbf7 100644
--- a/tools/testing/selftests/bpf/prog_tests/path_iter.c
+++ b/tools/testing/selftests/bpf/prog_tests/path_iter.c
@@ -2,11 +2,110 @@
 /* Copyright (c) 2025 Meta Platforms, Inc. and affiliates. */
 
 #include <test_progs.h>
+#include <fcntl.h>
 #include <bpf/libbpf.h>
 #include <bpf/btf.h>
+#include <sys/stat.h>
+#include <sys/xattr.h>
+
 #include "path_iter.skel.h"
+#include "path_walk.skel.h"
+
+static const char grand_parent_path[] = "/tmp/test_progs_path_iter";
+static const char parent_path[] = "/tmp/test_progs_path_iter/folder";
+static const char file_path[] = "/tmp/test_progs_path_iter/folder/file";
+static const char xattr_name[] = "user.bpf.selftests";
+static const char xattr_value[] = "selftest_path_iter";
+
+static void cleanup_files(void)
+{
+	remove(file_path);
+	rmdir(parent_path);
+	rmdir(grand_parent_path);
+}
+
+static int setup_files_and_xattrs(void)
+{
+	int ret = -1;
+
+	/* create test folders */
+	if (mkdir(grand_parent_path, 0755))
+		goto error;
+	if (mkdir(parent_path, 0755))
+		goto error;
+
+	/* setxattr for test folders */
+	ret = setxattr(grand_parent_path, xattr_name,
+		       xattr_value, sizeof(xattr_value), 0);
+	if (ret < 0) {
+		/* return errno, so that we can handle EOPNOTSUPP in the caller */
+		ret = errno;
+		goto error;
+	}
+	ret = setxattr(parent_path, xattr_name,
+		       xattr_value, sizeof(xattr_value), 0);
+	if (ret < 0) {
+		/* return errno, so that we can handle EOPNOTSUPP in the caller */
+		ret = errno;
+		goto error;
+	}
+
+	return 0;
+error:
+	cleanup_files();
+	return ret;
+}
+
+static void test_path_walk(void)
+{
+	struct path_walk *skel = NULL;
+	int file_fd;
+	int err;
+
+	err = setup_files_and_xattrs();
+	if (err == EOPNOTSUPP) {
+		printf("%s:SKIP:local fs doesn't support xattr (%d)\n"
+		       "To run this test, make sure /tmp filesystem supports xattr.\n",
+		       __func__, errno);
+		test__skip();
+		return;
+	}
+
+	if (!ASSERT_OK(err, "setup_file"))
+		return;
+
+	skel = path_walk__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "path_walk__open_and_load"))
+		goto cleanup;
+
+	skel->bss->monitored_pid = getpid();
+	if (!ASSERT_OK(path_walk__attach(skel), "path_walk__attach"))
+		goto cleanup;
+
+	file_fd = open(file_path, O_CREAT);
+	if (!ASSERT_OK_FD(file_fd, "open_file"))
+		goto cleanup;
+	close(file_fd);
+
+	ASSERT_OK(strncmp(skel->bss->parent_xattr_buf, xattr_value, strlen(xattr_value)),
+		  "parent_xattr");
+	ASSERT_OK(strncmp(skel->bss->grand_parent_xattr_buf, xattr_value, strlen(xattr_value)),
+		  "grand_parent_xattr");
+
+	ASSERT_OK(strncmp(skel->bss->parent_path_buf, parent_path, strlen(parent_path)),
+		  "parent_d_path");
+	ASSERT_OK(strncmp(skel->bss->grand_parent_path_buf, grand_parent_path,
+			  strlen(grand_parent_path)),
+		  "grand_parent_d_path");
+
+cleanup:
+	path_walk__destroy(skel);
+	cleanup_files();
+}
 
 void test_path_iter(void)
 {
 	RUN_TESTS(path_iter);
+	if (test__start_subtest("path_walk_example"))
+		test_path_walk();
 }
diff --git a/tools/testing/selftests/bpf/progs/path_walk.c b/tools/testing/selftests/bpf/progs/path_walk.c
new file mode 100644
index 000000000000..1e1ae82b47a2
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/path_walk.c
@@ -0,0 +1,59 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2025 Meta Platforms, Inc. and affiliates. */
+
+#include "vmlinux.h"
+#include <errno.h>
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+#include <bpf/bpf_core_read.h>
+#include "bpf_kfuncs.h"
+#include "bpf_misc.h"
+
+char _license[] SEC("license") = "GPL";
+
+__u32 monitored_pid;
+
+#define BUF_SIZE 1024
+char parent_path_buf[BUF_SIZE] = {};
+char parent_xattr_buf[BUF_SIZE] = {};
+char grand_parent_path_buf[BUF_SIZE] = {};
+char grand_parent_xattr_buf[BUF_SIZE] = {};
+
+static __always_inline void d_path_and_read_xattr(struct path *p, char *path, char *xattr)
+{
+	struct bpf_dynptr ptr;
+	struct dentry *dentry;
+
+	if (!p)
+		return;
+	bpf_path_d_path(p, path, BUF_SIZE);
+	bpf_dynptr_from_mem(xattr, BUF_SIZE, 0, &ptr);
+	dentry = p->dentry;
+	if (dentry)
+		bpf_get_dentry_xattr(dentry, "user.bpf.selftests", &ptr);
+}
+
+SEC("lsm.s/file_open")
+int BPF_PROG(test_file_open, struct file *f)
+{
+	__u32 pid = bpf_get_current_pid_tgid() >> 32;
+	struct bpf_iter_path path_it;
+	struct path *p;
+
+	if (pid != monitored_pid)
+		return 0;
+
+	bpf_iter_path_new(&path_it, &f->f_path, 0);
+
+	/* Get d_path and xattr for the parent directory */
+	p = bpf_iter_path_next(&path_it);
+	d_path_and_read_xattr(p, parent_path_buf, parent_xattr_buf);
+
+	/* Get d_path and xattr for the grand parent directory */
+	p = bpf_iter_path_next(&path_it);
+	d_path_and_read_xattr(p, grand_parent_path_buf, grand_parent_xattr_buf);
+
+	bpf_iter_path_destroy(&path_it);
+
+	return 0;
+}
-- 
2.47.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH v3 bpf-next 0/5] bpf path iterator
  2025-06-06 21:30 [PATCH v3 bpf-next 0/5] bpf path iterator Song Liu
                   ` (4 preceding siblings ...)
  2025-06-06 21:30 ` [PATCH v3 bpf-next 5/5] selftests/bpf: Path walk test Song Liu
@ 2025-06-08 17:32 ` Tingmao Wang
  2025-06-09  6:23   ` Song Liu
  2025-06-08 17:32 ` Tingmao Wang
  6 siblings, 1 reply; 30+ messages in thread
From: Tingmao Wang @ 2025-06-08 17:32 UTC (permalink / raw)
  To: Song Liu, Mickaël Salaün, Al Viro, Christian Brauner
  Cc: Tingmao Wang, amir73il, andrii, ast, bpf, daniel, eddyz87, gnoack,
	jack, jlayton, josef, kernel-team, kpsingh, linux-fsdevel,
	linux-kernel, linux-security-module, martin.lau, mattbobrowski,
	repnop

On 6/6/25 22:30, Song Liu wrote:
> In security use cases, it is common to apply rules to VFS subtrees.
> However, filtering files in a subtree is not straightforward [1].
>
> One solution to this problem is to start from a path and walk up the VFS
> tree (towards the root). Among in-tree LSMs, Landlock uses this solution.
>
> BPF LSM solutions, such like Tetragon [2], also use similar approaches.
> However, due to lack of proper helper/kfunc support, BPF LSM solutions
> usually do the path walk with probe read, which is racy.
>
> This patchset introduces a new helper path_walk_parent, which walks
> path to its VFS parent. The helper is used in Landlock.
>
> A new BPF iterator, path iterator, is introduced to do the path walking.
> The BPF path iterator uses the new path_walk_parent help to walk the VFS
> tree.

Hi Song, Christian, Al and others,

Previously I proposed in [1] to add ability to do a reference-less parent
walk for Landlock.  However, as Christian pointed out and I do agree in
hindsight, it is not a good idea to do things like this in non-VFS code.

However, I still think this is valuable to consider given the performance
improvement, and after some discussion with Mickaël, I would like to
propose extending Song's helper to support such usage.  While I recognize
that this patch series is already in its v3, and I do not want to delay it
by too much, putting this proposal out now is still better than after this
has merged, so that we may consider signature changes.

I've created a proof-of-concept and did some brief testing.  The
performance improvement attained here is the same as in [1] (with a "git
status" workload, median landlock overhead 35% -> 28%, median time in
landlock decreases by 26.6%).

If this idea is accepted, I'm happy to work on it further, split out this
patch, update the comments and do more testing etc, potentially in
collaboration with Song.

An alternative to this is perhaps to add a new helper
path_walk_parent_rcu, also living in namei.c, that will be used directly
by Landlock.  I'm happy to do it either way, but with some experimentation
I personally think that the code in this patch is still clean enough, and
can avoid some duplication.

Patch title: path_walk_parent: support reference-less walk

A later commit will update the BPF path iterator to use this.

Signed-off-by: Tingmao Wang <m@maowtm.org>
---
 fs/namei.c             | 107 ++++++++++++++++++++++++++++++++++++-----
 include/linux/namei.h  |  19 +++++++-
 security/landlock/fs.c |  49 +++++++++++++++++--
 3 files changed, 157 insertions(+), 18 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 732b8fd02451..351ebe957db8 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1424,6 +1424,30 @@ static bool choose_mountpoint(struct mount *m, const struct path *root,
 	return found;
 }
 
+/**
+ * acquires rcu read lock if rcu == true.
+ */
+void path_walk_parent_start(struct parent_iterator *pit, const struct path *path,
+			   const struct path *root, bool ref_less)
+{
+	pit->path = *path;
+
+	pit->root.mnt = NULL;
+	pit->root.dentry = NULL;
+	if (root)
+		pit->root = *root;
+
+	pit->rcu = ref_less;
+	if (ref_less) {
+		pit->m_seq = read_seqbegin(&mount_lock);
+		pit->r_seq = read_seqbegin(&rename_lock);
+		pit->next_seq = read_seqcount_begin(&pit->path.dentry->d_seq);
+		rcu_read_lock();
+	} else {
+		path_get(&pit->path);
+	}
+}
+
 /**
  * path_walk_parent - Walk to the parent of path
  * @path: input and output path.
@@ -1446,35 +1470,92 @@ static bool choose_mountpoint(struct mount *m, const struct path *root,
  *           // stop walking
  *
  * Returns:
- *  true  - if @path is updated to its parent.
- *  false - if @path is already the root (real root or @root).
+ *  PATH_WALK_PARENT_UPDATED      - if @path is updated to its parent.
+ *  PATH_WALK_PARENT_ALREADY_ROOT - if @path is already the root (real root or @root).
+ *  PATH_WALK_PARENT_RETRY        - reference-less path walk failed. Caller should restart with rcu == false.
  */
-bool path_walk_parent(struct path *path, const struct path *root)
+int path_walk_parent(struct parent_iterator *pit, struct path *next_parent)
 {
 	struct dentry *parent;
+	struct path *path = &pit->path;
+	struct path *root = &pit->root;
+	unsigned mountpoint_d_seq;
 
 	if (path_equal(path, root))
 		return false;
 
 	if (unlikely(path->dentry == path->mnt->mnt_root)) {
-		struct path p;
+		struct path upper_mountpoint;
 
-		if (!choose_mountpoint(real_mount(path->mnt), root, &p))
-			return false;
-		path_put(path);
-		*path = p;
+		if (pit->rcu) {
+			if (!choose_mountpoint_rcu(real_mount(path->mnt), root,
+						   &upper_mountpoint,
+						   &mountpoint_d_seq)) {
+				return PATH_WALK_PARENT_ALREADY_ROOT;
+			}
+			if (read_seqcount_retry(&path->dentry->d_seq,
+						pit->next_seq)) {
+				return PATH_WALK_PARENT_RETRY;
+			}
+			*path = upper_mountpoint;
+			pit->next_seq = mountpoint_d_seq;
+		} else {
+			if (!choose_mountpoint(real_mount(path->mnt), root,
+					       &upper_mountpoint))
+				return PATH_WALK_PARENT_ALREADY_ROOT;
+			path_put(path);
+			*path = upper_mountpoint;
+		}
 	}
 
 	if (unlikely(IS_ROOT(path->dentry)))
-		return false;
+		return PATH_WALK_PARENT_ALREADY_ROOT;
 
-	parent = dget_parent(path->dentry);
-	dput(path->dentry);
-	path->dentry = parent;
-	return true;
+	if (pit->rcu) {
+		parent = READ_ONCE(path->dentry->d_parent);
+		if (read_seqcount_retry(&path->dentry->d_seq, pit->next_seq)) {
+			return PATH_WALK_PARENT_RETRY;
+		}
+		path->dentry = parent;
+		pit->next_seq = read_seqcount_begin(&parent->d_seq);
+	} else {
+		parent = dget_parent(path->dentry);
+		dput(path->dentry);
+		path->dentry = parent;
+	}
+
+	if (next_parent)
+		*next_parent = *path;
+
+	return PATH_WALK_PARENT_UPDATED;
 }
 EXPORT_SYMBOL_GPL(path_walk_parent);
 
+/**
+ * releases rcu read lock if rcu == true.
+ * Returns -EAGAIN if rcu path walk failed.
+ */
+int path_walk_parent_end(struct parent_iterator *pit)
+{
+	bool need_restart = false;
+
+	if (pit->rcu) {
+		rcu_read_unlock();
+		/* do we need these if we're checking d_seq throughout? */
+		if (read_seqretry(&mount_lock, pit->m_seq) ||
+		    read_seqretry(&rename_lock, pit->r_seq)) {
+			need_restart = true;
+		}
+	} else {
+		path_put(&pit->path);
+	}
+
+	if (need_restart)
+		return -EAGAIN;
+
+	return 0;
+}
+
 /*
  * Perform an automount
  * - return -EISDIR to tell follow_managed() to stop and return the path we
diff --git a/include/linux/namei.h b/include/linux/namei.h
index 9d220b1e823c..e7fdfae12bd5 100644
--- a/include/linux/namei.h
+++ b/include/linux/namei.h
@@ -86,7 +86,24 @@ extern int follow_down_one(struct path *);
 extern int follow_down(struct path *path, unsigned int flags);
 extern int follow_up(struct path *);
 
-bool path_walk_parent(struct path *path, const struct path *root);
+struct parent_iterator {
+	struct path path;
+	struct path root;
+	bool rcu;
+	/* expected seq of path->dentry */
+	unsigned next_seq;
+	unsigned m_seq, r_seq;
+};
+
+#define PATH_WALK_PARENT_UPDATED		0
+#define PATH_WALK_PARENT_ALREADY_ROOT	-1
+#define PATH_WALK_PARENT_RETRY			-2
+
+void path_walk_parent_start(struct parent_iterator *pit,
+			    const struct path *path, const struct path *root,
+			    bool ref_less);
+int path_walk_parent(struct parent_iterator *pit, struct path *next_parent);
+int path_walk_parent_end(struct parent_iterator *pit);
 
 extern struct dentry *lock_rename(struct dentry *, struct dentry *);
 extern struct dentry *lock_rename_child(struct dentry *, struct dentry *);
diff --git a/security/landlock/fs.c b/security/landlock/fs.c
index 3adac544dc9e..522ac617d192 100644
--- a/security/landlock/fs.c
+++ b/security/landlock/fs.c
@@ -375,6 +375,9 @@ find_rule(const struct landlock_ruleset *const domain,
 		return NULL;
 
 	inode = d_backing_inode(dentry);
+	if (unlikely(!inode))
+		/* this can happen in reference-less path walk. Let outside retry. */
+		return NULL;
 	rcu_read_lock();
 	id.key.object = rcu_dereference(landlock_inode(inode)->object);
 	rule = landlock_find_rule(domain, id);
@@ -767,10 +770,15 @@ static bool is_access_to_paths_allowed(
 	     child1_is_directory = true, child2_is_directory = true;
 	struct path walker_path;
 	access_mask_t access_masked_parent1, access_masked_parent2;
+	layer_mask_t _layer_mask_parent_1_init[LANDLOCK_NUM_ACCESS_FS],
+		_layer_mask_parent_2_init[LANDLOCK_NUM_ACCESS_FS];
 	layer_mask_t _layer_masks_child1[LANDLOCK_NUM_ACCESS_FS],
 		_layer_masks_child2[LANDLOCK_NUM_ACCESS_FS];
 	layer_mask_t(*layer_masks_child1)[LANDLOCK_NUM_ACCESS_FS] = NULL,
 	(*layer_masks_child2)[LANDLOCK_NUM_ACCESS_FS] = NULL;
+	struct parent_iterator iter;
+	bool restart_pathwalk = false;
+	int err;
 
 	if (!access_request_parent1 && !access_request_parent2)
 		return true;
@@ -784,6 +792,15 @@ static bool is_access_to_paths_allowed(
 	if (WARN_ON_ONCE(!layer_masks_parent1))
 		return false;
 
+	memcpy(_layer_mask_parent_1_init, layer_masks_parent1,
+	       sizeof(*layer_masks_parent1));
+	if (unlikely(layer_masks_parent2)) {
+		memcpy(_layer_mask_parent_2_init, layer_masks_parent2,
+		       sizeof(*layer_masks_parent2));
+	}
+
+restart_pathwalk:
+
 	allowed_parent1 = is_layer_masks_allowed(layer_masks_parent1);
 
 	if (unlikely(layer_masks_parent2)) {
@@ -830,15 +847,15 @@ static bool is_access_to_paths_allowed(
 		child2_is_directory = d_is_dir(dentry_child2);
 	}
 
+	path_walk_parent_start(&iter, path, NULL, !restart_pathwalk);
 	walker_path = *path;
-	path_get(&walker_path);
+
 	/*
 	 * We need to walk through all the hierarchy to not miss any relevant
 	 * restriction.
 	 */
 	while (true) {
 		const struct landlock_rule *rule;
-		struct path root = {};
 
 		/*
 		 * If at least all accesses allowed on the destination are
@@ -896,8 +913,22 @@ static bool is_access_to_paths_allowed(
 		if (allowed_parent1 && allowed_parent2)
 			break;
 
-		if (path_walk_parent(&walker_path, &root))
+		switch (path_walk_parent(&iter, &walker_path)) {
+		case PATH_WALK_PARENT_UPDATED:
 			continue;
+		case PATH_WALK_PARENT_RETRY:
+			path_walk_parent_end(&iter);
+			memcpy(layer_masks_parent1, _layer_mask_parent_1_init,
+			       sizeof(*layer_masks_parent1));
+			if (layer_masks_parent2)
+				memcpy(layer_masks_parent2,
+				       _layer_mask_parent_2_init,
+				       sizeof(*layer_masks_parent2));
+			restart_pathwalk = true;
+			goto restart_pathwalk;
+		case PATH_WALK_PARENT_ALREADY_ROOT:
+			break;
+		}
 
 		if (unlikely(IS_ROOT(walker_path.dentry))) {
 			/*
@@ -913,7 +944,17 @@ static bool is_access_to_paths_allowed(
 		}
 		break;
 	}
-	path_put(&walker_path);
+
+	err = path_walk_parent_end(&iter);
+	if (err == -EAGAIN) {
+		memcpy(layer_masks_parent1, _layer_mask_parent_1_init,
+		       sizeof(*layer_masks_parent1));
+		if (layer_masks_parent2)
+			memcpy(layer_masks_parent2, _layer_mask_parent_2_init,
+			       sizeof(*layer_masks_parent2));
+		restart_pathwalk = true;
+		goto restart_pathwalk;
+	}
 
 	if (!allowed_parent1) {
 		log_request_parent1->type = LANDLOCK_REQUEST_FS_ACCESS;

base-commit: 0ff41df1cb268fc69e703a08a57ee14ae967d0ca
prerequisite-patch-id: 3a08c744682d5b01f98d196b3d3320b862d189c8
prerequisite-patch-id: 37586287398318c9896395b186f0809da1b0b81d
prerequisite-patch-id: 990fee8b55dae8d26bcf05a953e37988dd83d563
prerequisite-patch-id: 7f95cfaeaed0b5b30206b81691367fa520244526
prerequisite-patch-id: e10506d21bc71ff99db81bf5ab46ddfec98d1fca
prerequisite-patch-id: 8785acb37f7cf6fb62dcbdbf0a14338c04fe342f
-- 
2.49.0

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH v3 bpf-next 0/5] bpf path iterator
  2025-06-06 21:30 [PATCH v3 bpf-next 0/5] bpf path iterator Song Liu
                   ` (5 preceding siblings ...)
  2025-06-08 17:32 ` [PATCH v3 bpf-next 0/5] bpf path iterator Tingmao Wang
@ 2025-06-08 17:32 ` Tingmao Wang
  6 siblings, 0 replies; 30+ messages in thread
From: Tingmao Wang @ 2025-06-08 17:32 UTC (permalink / raw)
  To: Song Liu, Mickaël Salaün, Al Viro, Christian Brauner
  Cc: Tingmao Wang, amir73il, andrii, ast, bpf, daniel, eddyz87, gnoack,
	jack, jlayton, josef, kernel-team, kpsingh, linux-fsdevel,
	linux-kernel, linux-security-module, martin.lau, mattbobrowski,
	repnop

Update bpf_fs_kfuncs to match path_walk_parent changes.

It compiles, but I've not tested this yet.

Signed-off-by: Tingmao Wang <m@maowtm.org>
---
 fs/bpf_fs_kfuncs.c | 55 +++++++++++++++++++++++-----------------------
 1 file changed, 28 insertions(+), 27 deletions(-)

diff --git a/fs/bpf_fs_kfuncs.c b/fs/bpf_fs_kfuncs.c
index 8c618154df0a..6599342dd0de 100644
--- a/fs/bpf_fs_kfuncs.c
+++ b/fs/bpf_fs_kfuncs.c
@@ -327,23 +327,18 @@ __bpf_kfunc_end_defs();
 
 /* open-coded path iterator */
 struct bpf_iter_path {
-	__u64 __opaque[2];
-} __aligned(8);
-
-struct bpf_iter_path_kern {
-	struct path path;
+	__u64 __opaque[sizeof(struct parent_iterator) / sizeof(__u64)];
 } __aligned(8);
 
 __bpf_kfunc_start_defs();
 
-__bpf_kfunc int bpf_iter_path_new(struct bpf_iter_path *it,
-				  struct path *start,
+__bpf_kfunc int bpf_iter_path_new(struct bpf_iter_path *it, struct path *start,
 				  __u64 flags)
 {
-	struct bpf_iter_path_kern *kit = (void *)it;
+	struct parent_iterator *pit = (void *)it;
 
-	BUILD_BUG_ON(sizeof(*kit) > sizeof(*it));
-	BUILD_BUG_ON(__alignof__(*kit) != __alignof__(*it));
+	BUILD_BUG_ON(sizeof(*pit) > sizeof(*it));
+	BUILD_BUG_ON(__alignof__(*pit) != __alignof__(*it));
 
 	if (flags) {
 		/*
@@ -351,45 +346,51 @@ __bpf_kfunc int bpf_iter_path_new(struct bpf_iter_path *it,
 		 * kit->path so that it be passed to path_put() safely.
 		 * Note: path_put() is no-op for zero'ed path.
 		 */
-		memset(&kit->path, 0, sizeof(struct path));
+		memset(pit, 0, sizeof(*pit));
 		return -EINVAL;
 	}
 
-	kit->path = *start;
-	path_get(&kit->path);
-
-	return 0;
-}
-
-__bpf_kfunc struct path *bpf_iter_path_next(struct bpf_iter_path *it)
-{
-	struct bpf_iter_path_kern *kit = (void *)it;
-	struct path root = {};
-
 	/*
-	 * "root" is zero'ed. Therefore, unless the loop is explicitly
+	 * "root" is NULL. Therefore, unless the loop is explicitly
 	 * terminated, bpf_iter_path_next() will continue looping until
 	 * we've reached the global root of the VFS.
 	 *
 	 * If a root of walk is needed, the user can check "path" against
 	 * that root on each iteration.
 	 */
-	if (!path_walk_parent(&kit->path, &root)) {
+	path_walk_parent_start(pit, start, NULL, false);
+
+	return 0;
+}
+
+__bpf_kfunc struct path *bpf_iter_path_next(struct bpf_iter_path *it)
+{
+	struct parent_iterator *pit = (void *)it;
+	struct path p;
+
+	switch (path_walk_parent(pit, &p)) {
+	case PATH_WALK_PARENT_UPDATED:
+		return &pit->path;
+	case PATH_WALK_PARENT_ALREADY_ROOT:
 		/*
 		 * Return NULL, but keep valid kit->path. _destroy() will
 		 * always path_put(&kit->path).
 		 */
 		return NULL;
+	default:
+		WARN_ONCE(
+			1,
+			"did not expect any other return from path_walk_parent");
 	}
 
-	return &kit->path;
+	return &pit->path;
 }
 
 __bpf_kfunc void bpf_iter_path_destroy(struct bpf_iter_path *it)
 {
-	struct bpf_iter_path_kern *kit = (void *)it;
+	struct parent_iterator *pit = (void *)it;
 
-	path_put(&kit->path);
+	path_walk_parent_end(pit);
 }
 
 __bpf_kfunc_end_defs();
-- 
2.49.0

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH v3 bpf-next 2/5] landlock: Use path_walk_parent()
  2025-06-06 21:30 ` [PATCH v3 bpf-next 2/5] landlock: Use path_walk_parent() Song Liu
@ 2025-06-08 18:45   ` Tingmao Wang
  0 siblings, 0 replies; 30+ messages in thread
From: Tingmao Wang @ 2025-06-08 18:45 UTC (permalink / raw)
  To: Song Liu, Mickaël Salaün
  Cc: bpf, linux-fsdevel, linux-kernel, linux-security-module,
	kernel-team, andrii, eddyz87, ast, daniel, martin.lau, viro,
	brauner, jack, kpsingh, mattbobrowski, amir73il, repnop, jlayton,
	josef, gnoack

On 6/6/25 22:30, Song Liu wrote:
> Use path_walk_parent() to walk a path up to its parent.
> 
> No functional changes intended.
> 
> Signed-off-by: Song Liu <song@kernel.org>

There is also path walk code in collect_domain_accesses even though that
one doesn't walk pass mount points.  Not sure if that one should be
updated to use this helper as well, or maybe fine to keep using
dget_parent.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v3 bpf-next 0/5] bpf path iterator
  2025-06-08 17:32 ` [PATCH v3 bpf-next 0/5] bpf path iterator Tingmao Wang
@ 2025-06-09  6:23   ` Song Liu
  2025-06-09  8:08     ` Tingmao Wang
  0 siblings, 1 reply; 30+ messages in thread
From: Song Liu @ 2025-06-09  6:23 UTC (permalink / raw)
  To: Tingmao Wang
  Cc: Mickaël Salaün, Al Viro, Christian Brauner, amir73il,
	andrii, ast, bpf, daniel, eddyz87, gnoack, jack, jlayton, josef,
	kernel-team, kpsingh, linux-fsdevel, linux-kernel,
	linux-security-module, martin.lau, mattbobrowski, repnop

On Sun, Jun 8, 2025 at 10:34 AM Tingmao Wang <m@maowtm.org> wrote:
[...]
> Hi Song, Christian, Al and others,
>
> Previously I proposed in [1] to add ability to do a reference-less parent
> walk for Landlock.  However, as Christian pointed out and I do agree in
> hindsight, it is not a good idea to do things like this in non-VFS code.
>
> However, I still think this is valuable to consider given the performance
> improvement, and after some discussion with Mickaël, I would like to
> propose extending Song's helper to support such usage.  While I recognize
> that this patch series is already in its v3, and I do not want to delay it
> by too much, putting this proposal out now is still better than after this
> has merged, so that we may consider signature changes.
>
> I've created a proof-of-concept and did some brief testing.  The
> performance improvement attained here is the same as in [1] (with a "git
> status" workload, median landlock overhead 35% -> 28%, median time in
> landlock decreases by 26.6%).
>
> If this idea is accepted, I'm happy to work on it further, split out this
> patch, update the comments and do more testing etc, potentially in
> collaboration with Song.
>
> An alternative to this is perhaps to add a new helper
> path_walk_parent_rcu, also living in namei.c, that will be used directly
> by Landlock.  I'm happy to do it either way, but with some experimentation
> I personally think that the code in this patch is still clean enough, and
> can avoid some duplication.
>
> Patch title: path_walk_parent: support reference-less walk
>
> A later commit will update the BPF path iterator to use this.
>
> Signed-off-by: Tingmao Wang <m@maowtm.org>
[...]
>
> -bool path_walk_parent(struct path *path, const struct path *root);
> +struct parent_iterator {
> +       struct path path;
> +       struct path root;
> +       bool rcu;
> +       /* expected seq of path->dentry */
> +       unsigned next_seq;
> +       unsigned m_seq, r_seq;

Most of parent_iterator is not really used by reference walk.
So it is probably just separate the two APIs?

Also, is it ok to make m_seq and r_seq available out of fs/?

> +};
> +
> +#define PATH_WALK_PARENT_UPDATED               0
> +#define PATH_WALK_PARENT_ALREADY_ROOT  -1
> +#define PATH_WALK_PARENT_RETRY                 -2
> +
> +void path_walk_parent_start(struct parent_iterator *pit,
> +                           const struct path *path, const struct path *root,
> +                           bool ref_less);
> +int path_walk_parent(struct parent_iterator *pit, struct path *next_parent);
> +int path_walk_parent_end(struct parent_iterator *pit);

I think it is better to make this rcu walk a separate set of APIs.
IOW, we will have:

int path_walk_parent(struct path *path, struct path *root);

and

void path_walk_parent_rcu_start(struct parent_iterator *pit,
                           const struct path *path, const struct path *root);
int path_walk_parent_rcu_next(struct parent_iterator *pit, struct path
*next_parent);
int path_walk_parent_rcu_end(struct parent_iterator *pit);

Thanks,
Song

[...]

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v3 bpf-next 0/5] bpf path iterator
  2025-06-09  6:23   ` Song Liu
@ 2025-06-09  8:08     ` Tingmao Wang
  2025-06-11 11:36       ` Christian Brauner
  0 siblings, 1 reply; 30+ messages in thread
From: Tingmao Wang @ 2025-06-09  8:08 UTC (permalink / raw)
  To: Song Liu
  Cc: Mickaël Salaün, Al Viro, Christian Brauner, amir73il,
	andrii, ast, bpf, daniel, eddyz87, gnoack, jack, jlayton, josef,
	kernel-team, kpsingh, linux-fsdevel, linux-kernel,
	linux-security-module, martin.lau, mattbobrowski, repnop

On 6/9/25 07:23, Song Liu wrote:
> On Sun, Jun 8, 2025 at 10:34 AM Tingmao Wang <m@maowtm.org> wrote:
> [...]
>> Hi Song, Christian, Al and others,
>>
>> Previously I proposed in [1] to add ability to do a reference-less parent
>> walk for Landlock.  However, as Christian pointed out and I do agree in
>> hindsight, it is not a good idea to do things like this in non-VFS code.
>>
>> However, I still think this is valuable to consider given the performance
>> improvement, and after some discussion with Mickaël, I would like to
>> propose extending Song's helper to support such usage.  While I recognize
>> that this patch series is already in its v3, and I do not want to delay it
>> by too much, putting this proposal out now is still better than after this
>> has merged, so that we may consider signature changes.
>>
>> I've created a proof-of-concept and did some brief testing.  The
>> performance improvement attained here is the same as in [1] (with a "git
>> status" workload, median landlock overhead 35% -> 28%, median time in
>> landlock decreases by 26.6%).
>>
>> If this idea is accepted, I'm happy to work on it further, split out this
>> patch, update the comments and do more testing etc, potentially in
>> collaboration with Song.
>>
>> An alternative to this is perhaps to add a new helper
>> path_walk_parent_rcu, also living in namei.c, that will be used directly
>> by Landlock.  I'm happy to do it either way, but with some experimentation
>> I personally think that the code in this patch is still clean enough, and
>> can avoid some duplication.
>>
>> Patch title: path_walk_parent: support reference-less walk
>>
>> A later commit will update the BPF path iterator to use this.
>>
>> Signed-off-by: Tingmao Wang <m@maowtm.org>
> [...]
>>
>> -bool path_walk_parent(struct path *path, const struct path *root);
>> +struct parent_iterator {
>> +       struct path path;
>> +       struct path root;
>> +       bool rcu;
>> +       /* expected seq of path->dentry */
>> +       unsigned next_seq;
>> +       unsigned m_seq, r_seq;
> 
> Most of parent_iterator is not really used by reference walk.
> So it is probably just separate the two APIs?

I don't mind either way, but I feel like it might be nice to just have one
style of APIs (i.e. an iterator with start / end / next vs just one
function), even though this is not totally necessary for the ref-taking
walk.  After all, the BPF use case is iterator-based.  This also means
that the code at the user's side (mostly thinking of Landlock here) is
slightly simpler.

But I've not experimented with the other way.  I'm open to both, and I'm
happy to send a patch later for a separate API (in that case that would
not depend on this and I might just start a new series).

Would like to hear what VFS folks thinks of this first tho, and whether
there's any preference in one or two APIs.

> 
> Also, is it ok to make m_seq and r_seq available out of fs/?

The struct is not intended to be used directly by code outside.  Not sure
what is the standard way to do this but we can make it private by e.g.
putting the seq values in another struct, if needed.  Alternatively I
think we can hide the entire struct behind an opaque pointer by doing the
allocation ourselves.

> 
>> +};
>> +
>> +#define PATH_WALK_PARENT_UPDATED               0
>> +#define PATH_WALK_PARENT_ALREADY_ROOT  -1
>> +#define PATH_WALK_PARENT_RETRY                 -2
>> +
>> +void path_walk_parent_start(struct parent_iterator *pit,
>> +                           const struct path *path, const struct path *root,
>> +                           bool ref_less);
>> +int path_walk_parent(struct parent_iterator *pit, struct path *next_parent);
>> +int path_walk_parent_end(struct parent_iterator *pit);
> 
> I think it is better to make this rcu walk a separate set of APIs.
> IOW, we will have:
> 
> int path_walk_parent(struct path *path, struct path *root);
> 
> and
> 
> void path_walk_parent_rcu_start(struct parent_iterator *pit,
>                            const struct path *path, const struct path *root);
> int path_walk_parent_rcu_next(struct parent_iterator *pit, struct path
> *next_parent);
> int path_walk_parent_rcu_end(struct parent_iterator *pit);

(replied above)

> 
> Thanks,
> Song
> 
> [...]


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v3 bpf-next 1/5] namei: Introduce new helper function path_walk_parent()
  2025-06-06 21:30 ` [PATCH v3 bpf-next 1/5] namei: Introduce new helper function path_walk_parent() Song Liu
@ 2025-06-10 17:18   ` Mickaël Salaün
  2025-06-10 17:26     ` Song Liu
  2025-06-10 23:34   ` NeilBrown
  1 sibling, 1 reply; 30+ messages in thread
From: Mickaël Salaün @ 2025-06-10 17:18 UTC (permalink / raw)
  To: Song Liu
  Cc: bpf, linux-fsdevel, linux-kernel, linux-security-module,
	kernel-team, andrii, eddyz87, ast, daniel, martin.lau, viro,
	brauner, jack, kpsingh, mattbobrowski, amir73il, repnop, jlayton,
	josef, gnoack, m

On Fri, Jun 06, 2025 at 02:30:11PM -0700, Song Liu wrote:
> This helper walks an input path to its parent. Logic are added to handle
> walking across mount tree.
> 
> This will be used by landlock, and BPF LSM.
> 
> Signed-off-by: Song Liu <song@kernel.org>
> ---
>  fs/namei.c            | 51 +++++++++++++++++++++++++++++++++++++++++++
>  include/linux/namei.h |  2 ++
>  2 files changed, 53 insertions(+)
> 
> diff --git a/fs/namei.c b/fs/namei.c
> index 4bb889fc980b..f02183e9c073 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -1424,6 +1424,57 @@ static bool choose_mountpoint(struct mount *m, const struct path *root,
>  	return found;
>  }
>  
> +/**
> + * path_walk_parent - Walk to the parent of path
> + * @path: input and output path.
> + * @root: root of the path walk, do not go beyond this root. If @root is
> + *        zero'ed, walk all the way to real root.
> + *
> + * Given a path, find the parent path. Replace @path with the parent path.
> + * If we were already at the real root or a disconnected root, @path is
> + * not changed.
> + *
> + * The logic of path_walk_parent() is similar to follow_dotdot(), except
> + * that path_walk_parent() will continue walking for !path_connected case.
> + * This effectively means we are walking from disconnected bind mount to
> + * the original mount. If this behavior is not desired, the caller can add
> + * a check like:
> + *
> + *   if (path_walk_parent(&path) && !path_connected(path.mnt, path.dentry)
> + *           // continue walking
> + *   else
> + *           // stop walking
> + *
> + * Returns:
> + *  true  - if @path is updated to its parent.
> + *  false - if @path is already the root (real root or @root).
> + */
> +bool path_walk_parent(struct path *path, const struct path *root)
> +{
> +	struct dentry *parent;
> +
> +	if (path_equal(path, root))
> +		return false;
> +
> +	if (unlikely(path->dentry == path->mnt->mnt_root)) {
> +		struct path p;
> +
> +		if (!choose_mountpoint(real_mount(path->mnt), root, &p))
> +			return false;
> +		path_put(path);
> +		*path = p;
> +	}
> +
> +	if (unlikely(IS_ROOT(path->dentry)))

path would be updated while false is returned, which is not correct.

> +		return false;
> +
> +	parent = dget_parent(path->dentry);
> +	dput(path->dentry);
> +	path->dentry = parent;
> +	return true;
> +}
> +EXPORT_SYMBOL_GPL(path_walk_parent);
> +
>  /*
>   * Perform an automount
>   * - return -EISDIR to tell follow_managed() to stop and return the path we
> diff --git a/include/linux/namei.h b/include/linux/namei.h
> index 5d085428e471..cba5373ecf86 100644
> --- a/include/linux/namei.h
> +++ b/include/linux/namei.h
> @@ -85,6 +85,8 @@ extern int follow_down_one(struct path *);
>  extern int follow_down(struct path *path, unsigned int flags);
>  extern int follow_up(struct path *);
>  
> +bool path_walk_parent(struct path *path, const struct path *root);
> +
>  extern struct dentry *lock_rename(struct dentry *, struct dentry *);
>  extern struct dentry *lock_rename_child(struct dentry *, struct dentry *);
>  extern void unlock_rename(struct dentry *, struct dentry *);
> -- 
> 2.47.1
> 
> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v3 bpf-next 1/5] namei: Introduce new helper function path_walk_parent()
  2025-06-10 17:18   ` Mickaël Salaün
@ 2025-06-10 17:26     ` Song Liu
  2025-06-10 22:26       ` Tingmao Wang
  0 siblings, 1 reply; 30+ messages in thread
From: Song Liu @ 2025-06-10 17:26 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: bpf, linux-fsdevel, linux-kernel, linux-security-module,
	kernel-team, andrii, eddyz87, ast, daniel, martin.lau, viro,
	brauner, jack, kpsingh, mattbobrowski, amir73il, repnop, jlayton,
	josef, gnoack, m

On Tue, Jun 10, 2025 at 10:19 AM Mickaël Salaün <mic@digikod.net> wrote:
>
> On Fri, Jun 06, 2025 at 02:30:11PM -0700, Song Liu wrote:
> > This helper walks an input path to its parent. Logic are added to handle
> > walking across mount tree.
> >
> > This will be used by landlock, and BPF LSM.
> >
> > Signed-off-by: Song Liu <song@kernel.org>
> > ---
> >  fs/namei.c            | 51 +++++++++++++++++++++++++++++++++++++++++++
> >  include/linux/namei.h |  2 ++
> >  2 files changed, 53 insertions(+)
> >
> > diff --git a/fs/namei.c b/fs/namei.c
> > index 4bb889fc980b..f02183e9c073 100644
> > --- a/fs/namei.c
> > +++ b/fs/namei.c
> > @@ -1424,6 +1424,57 @@ static bool choose_mountpoint(struct mount *m, const struct path *root,
> >       return found;
> >  }
> >
> > +/**
> > + * path_walk_parent - Walk to the parent of path
> > + * @path: input and output path.
> > + * @root: root of the path walk, do not go beyond this root. If @root is
> > + *        zero'ed, walk all the way to real root.
> > + *
> > + * Given a path, find the parent path. Replace @path with the parent path.
> > + * If we were already at the real root or a disconnected root, @path is
> > + * not changed.
> > + *
> > + * The logic of path_walk_parent() is similar to follow_dotdot(), except
> > + * that path_walk_parent() will continue walking for !path_connected case.
> > + * This effectively means we are walking from disconnected bind mount to
> > + * the original mount. If this behavior is not desired, the caller can add
> > + * a check like:
> > + *
> > + *   if (path_walk_parent(&path) && !path_connected(path.mnt, path.dentry)
> > + *           // continue walking
> > + *   else
> > + *           // stop walking
> > + *
> > + * Returns:
> > + *  true  - if @path is updated to its parent.
> > + *  false - if @path is already the root (real root or @root).
> > + */
> > +bool path_walk_parent(struct path *path, const struct path *root)
> > +{
> > +     struct dentry *parent;
> > +
> > +     if (path_equal(path, root))
> > +             return false;
> > +
> > +     if (unlikely(path->dentry == path->mnt->mnt_root)) {
> > +             struct path p;
> > +
> > +             if (!choose_mountpoint(real_mount(path->mnt), root, &p))
> > +                     return false;
> > +             path_put(path);
> > +             *path = p;
> > +     }
> > +
> > +     if (unlikely(IS_ROOT(path->dentry)))
>
> path would be updated while false is returned, which is not correct.

Good catch.. How about the following:

bool path_walk_parent(struct path *path, const struct path *root)
{
        struct dentry *parent;
        bool ret = false;

        if (path_equal(path, root))
                return false;

        if (unlikely(path->dentry == path->mnt->mnt_root)) {
                struct path p;

                if (!choose_mountpoint(real_mount(path->mnt), root, &p))
                        return false;
                path_put(path);
                *path = p;
                ret = true;
        }

        if (unlikely(IS_ROOT(path->dentry)))
                return ret;

        parent = dget_parent(path->dentry);
        dput(path->dentry);
        path->dentry = parent;
        return true;
}

Thanks,
Song

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v3 bpf-next 1/5] namei: Introduce new helper function path_walk_parent()
  2025-06-10 17:26     ` Song Liu
@ 2025-06-10 22:26       ` Tingmao Wang
  2025-06-10 22:34         ` Tingmao Wang
  2025-06-10 23:08         ` Song Liu
  0 siblings, 2 replies; 30+ messages in thread
From: Tingmao Wang @ 2025-06-10 22:26 UTC (permalink / raw)
  To: Song Liu, Mickaël Salaün
  Cc: bpf, linux-fsdevel, linux-kernel, linux-security-module,
	kernel-team, andrii, eddyz87, ast, daniel, martin.lau, viro,
	brauner, jack, kpsingh, mattbobrowski, amir73il, repnop, jlayton,
	josef, gnoack

On 6/10/25 18:26, Song Liu wrote:
> On Tue, Jun 10, 2025 at 10:19 AM Mickaël Salaün <mic@digikod.net> wrote:
>>
>> On Fri, Jun 06, 2025 at 02:30:11PM -0700, Song Liu wrote:
>>> [...]
>>> + * Returns:
>>> + *  true  - if @path is updated to its parent.
>>> + *  false - if @path is already the root (real root or @root).
>>> + */
>>> +bool path_walk_parent(struct path *path, const struct path *root)
>>> +{
>>> +     struct dentry *parent;
>>> +
>>> +     if (path_equal(path, root))
>>> +             return false;
>>> +
>>> +     if (unlikely(path->dentry == path->mnt->mnt_root)) {
>>> +             struct path p;
>>> +
>>> +             if (!choose_mountpoint(real_mount(path->mnt), root, &p))
>>> +                     return false;
>>> +             path_put(path);
>>> +             *path = p;
>>> +     }
>>> +
>>> +     if (unlikely(IS_ROOT(path->dentry)))
>>
>> path would be updated while false is returned, which is not correct.
> 
> Good catch.. How about the following:
> 
> bool path_walk_parent(struct path *path, const struct path *root)
> {
>         struct dentry *parent;
>         bool ret = false;
> 
>         if (path_equal(path, root))
>                 return false;
> 
>         if (unlikely(path->dentry == path->mnt->mnt_root)) {
>                 struct path p;
> 
>                 if (!choose_mountpoint(real_mount(path->mnt), root, &p))
>                         return false;
>                 path_put(path);
>                 *path = p;
>                 ret = true;
>         }
> 
>         if (unlikely(IS_ROOT(path->dentry)))
>                 return ret;

Returning true here would be the wrong semantic right?  This whole thing
is only possible when some mount shadows "/".  Say if you have a landlock
rule on the old "/", but then we mount a new "/" and chroot into it (via
"/.."), the landlock rule on the old "/" should not apply, but if we
change *path and return true here then this will "expose" that old "/" to
landlock.

A quick suggestion although I haven't tested anything - maybe we should do
a special case check for IS_ROOT inside the
    if (unlikely(path->dentry == path->mnt->mnt_root))
? Before "path_put(path);", if IS_ROOT(p.dentry) then we just path_get(p)
and return false.

> 
>         parent = dget_parent(path->dentry);
>         dput(path->dentry);
>         path->dentry = parent;
>         return true;
> }
> 
> Thanks,
> Song


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v3 bpf-next 1/5] namei: Introduce new helper function path_walk_parent()
  2025-06-10 22:26       ` Tingmao Wang
@ 2025-06-10 22:34         ` Tingmao Wang
  2025-06-10 23:08         ` Song Liu
  1 sibling, 0 replies; 30+ messages in thread
From: Tingmao Wang @ 2025-06-10 22:34 UTC (permalink / raw)
  To: Song Liu, Mickaël Salaün
  Cc: bpf, linux-fsdevel, linux-kernel, linux-security-module,
	kernel-team, andrii, eddyz87, ast, daniel, martin.lau, viro,
	brauner, jack, kpsingh, mattbobrowski, amir73il, repnop, jlayton,
	josef, gnoack

On 6/10/25 23:26, Tingmao Wang wrote:
> [...]
> 
> A quick suggestion although I haven't tested anything - maybe we should do
> a special case check for IS_ROOT inside the
>     if (unlikely(path->dentry == path->mnt->mnt_root))
> ? Before "path_put(path);", if IS_ROOT(p.dentry) then we just path_get(p)
                                                                     ^^^ sorry I meant put
> and return false.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v3 bpf-next 1/5] namei: Introduce new helper function path_walk_parent()
  2025-06-10 22:26       ` Tingmao Wang
  2025-06-10 22:34         ` Tingmao Wang
@ 2025-06-10 23:08         ` Song Liu
  2025-06-11  0:23           ` Tingmao Wang
  1 sibling, 1 reply; 30+ messages in thread
From: Song Liu @ 2025-06-10 23:08 UTC (permalink / raw)
  To: Tingmao Wang
  Cc: Mickaël Salaün, bpf, linux-fsdevel, linux-kernel,
	linux-security-module, kernel-team, andrii, eddyz87, ast, daniel,
	martin.lau, viro, brauner, jack, kpsingh, mattbobrowski, amir73il,
	repnop, jlayton, josef, gnoack

On Tue, Jun 10, 2025 at 3:26 PM Tingmao Wang <m@maowtm.org> wrote:
[..]
> >
> >                 if (!choose_mountpoint(real_mount(path->mnt), root, &p))
> >                         return false;
> >                 path_put(path);
> >                 *path = p;
> >                 ret = true;
> >         }
> >
> >         if (unlikely(IS_ROOT(path->dentry)))
> >                 return ret;
>
> Returning true here would be the wrong semantic right?  This whole thing
> is only possible when some mount shadows "/".  Say if you have a landlock
> rule on the old "/", but then we mount a new "/" and chroot into it (via
> "/.."), the landlock rule on the old "/" should not apply, but if we
> change *path and return true here then this will "expose" that old "/" to
> landlock.

Could you please provide more specific information about this case?

Thanks,
Song

> A quick suggestion although I haven't tested anything - maybe we should do
> a special case check for IS_ROOT inside the
>     if (unlikely(path->dentry == path->mnt->mnt_root))
> ? Before "path_put(path);", if IS_ROOT(p.dentry) then we just path_get(p)
> and return false.
>
> >
> >         parent = dget_parent(path->dentry);
> >         dput(path->dentry);
> >         path->dentry = parent;
> >         return true;
> > }
> >
> > Thanks,
> > Song
>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v3 bpf-next 1/5] namei: Introduce new helper function path_walk_parent()
  2025-06-06 21:30 ` [PATCH v3 bpf-next 1/5] namei: Introduce new helper function path_walk_parent() Song Liu
  2025-06-10 17:18   ` Mickaël Salaün
@ 2025-06-10 23:34   ` NeilBrown
  2025-06-11  0:56     ` Song Liu
  1 sibling, 1 reply; 30+ messages in thread
From: NeilBrown @ 2025-06-10 23:34 UTC (permalink / raw)
  To: Song Liu, Jan Kara
  Cc: bpf, linux-fsdevel, linux-kernel, linux-security-module,
	kernel-team, andrii, eddyz87, ast, daniel, martin.lau, viro,
	brauner, jack, kpsingh, mattbobrowski, amir73il, repnop, jlayton,
	josef, mic, gnoack, m, Song Liu

On Sat, 07 Jun 2025, Song Liu wrote:
> This helper walks an input path to its parent. Logic are added to handle
> walking across mount tree.
> 
> This will be used by landlock, and BPF LSM.
> 
> Signed-off-by: Song Liu <song@kernel.org>
> ---
>  fs/namei.c            | 51 +++++++++++++++++++++++++++++++++++++++++++
>  include/linux/namei.h |  2 ++
>  2 files changed, 53 insertions(+)
> 
> diff --git a/fs/namei.c b/fs/namei.c
> index 4bb889fc980b..f02183e9c073 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -1424,6 +1424,57 @@ static bool choose_mountpoint(struct mount *m, const struct path *root,
>  	return found;
>  }
>  
> +/**
> + * path_walk_parent - Walk to the parent of path
> + * @path: input and output path.
> + * @root: root of the path walk, do not go beyond this root. If @root is
> + *        zero'ed, walk all the way to real root.
> + *
> + * Given a path, find the parent path. Replace @path with the parent path.
> + * If we were already at the real root or a disconnected root, @path is
> + * not changed.
> + *
> + * The logic of path_walk_parent() is similar to follow_dotdot(), except
> + * that path_walk_parent() will continue walking for !path_connected case.
> + * This effectively means we are walking from disconnected bind mount to
> + * the original mount. If this behavior is not desired, the caller can add
> + * a check like:
> + *
> + *   if (path_walk_parent(&path) && !path_connected(path.mnt, path.dentry)
> + *           // continue walking
> + *   else
> + *           // stop walking
> + *
> + * Returns:
> + *  true  - if @path is updated to its parent.
> + *  false - if @path is already the root (real root or @root).
> + */
> +bool path_walk_parent(struct path *path, const struct path *root)
> +{
> +	struct dentry *parent;
> +
> +	if (path_equal(path, root))
> +		return false;
> +
> +	if (unlikely(path->dentry == path->mnt->mnt_root)) {
> +		struct path p;
> +
> +		if (!choose_mountpoint(real_mount(path->mnt), root, &p))
> +			return false;
> +		path_put(path);
> +		*path = p;
> +	}
> +
> +	if (unlikely(IS_ROOT(path->dentry)))
> +		return false;
> +
> +	parent = dget_parent(path->dentry);
> +	dput(path->dentry);
> +	path->dentry = parent;
> +	return true;
> +}
> +EXPORT_SYMBOL_GPL(path_walk_parent);

The above looks a lot like follow_dotdot().  This is good because it
means that it is likely correct.  But it is bad because it means there
are two copies of essentially the same code - making maintenance harder.

I think it would be good to split the part that you want out of
follow_dotdot() and use that.  Something like the following.

You might need a small wrapper in landlock which would, for example,
pass LOOKUP_BENEATH and replace path->dentry with the parent on success.

NeilBrown

diff --git a/fs/namei.c b/fs/namei.c
index 4bb889fc980b..b81d07b4417b 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2048,36 +2048,65 @@ static struct dentry *follow_dotdot_rcu(struct nameidata *nd)
 	return nd->path.dentry;
 }
 
-static struct dentry *follow_dotdot(struct nameidata *nd)
+/**
+ * path_walk_parent - Find the parent of the given struct path
+ * @path  - The struct path to start from
+ * @root  - A struct path which serves as a boundary not to be crosses
+ * @flags - Some LOOKUP_ flags
+ *
+ * Find and return the dentry for the parent of the given path (mount/dentry).
+ * If the given path is the root of a mounted tree, it is first updated to
+ * the mount point on which that tree is mounted.
+ *
+ * If %LOOKUP_NO_XDEV is given, then *after* the path is updated to a new mount,
+ * the error EXDEV is returned.
+ * If no parent can be found, either because the tree is not mounted or because
+ * the @path matches the @root, then @path->dentry is returned unless @flags
+ * contains %LOOKUP_BENEATH, in which case -EXDEV is returned.
+ *
+ * Returns: either an ERR_PTR() or the chosen parent which will have had the
+ * refcount incremented.
+ */
+struct dentry *path_walk_parent(struct path *path, struct path *root, int flags)
 {
 	struct dentry *parent;
 
-	if (path_equal(&nd->path, &nd->root))
+	if (path_equal(path, root))
 		goto in_root;
-	if (unlikely(nd->path.dentry == nd->path.mnt->mnt_root)) {
-		struct path path;
+	if (unlikely(path->dentry == path->mnt->mnt_root)) {
+		struct path new_path;
 
-		if (!choose_mountpoint(real_mount(nd->path.mnt),
-				       &nd->root, &path))
+		if (!choose_mountpoint(real_mount(path->mnt),
+				       root, &new_path))
 			goto in_root;
-		path_put(&nd->path);
-		nd->path = path;
-		nd->inode = path.dentry->d_inode;
-		if (unlikely(nd->flags & LOOKUP_NO_XDEV))
+		path_put(path);
+		*path = new_path;
+		if (unlikely(flags & LOOKUP_NO_XDEV))
 			return ERR_PTR(-EXDEV);
 	}
 	/* rare case of legitimate dget_parent()... */
-	parent = dget_parent(nd->path.dentry);
+	parent = dget_parent(path->dentry);
+	return parent;
+
+in_root:
+	if (unlikely(flags & LOOKUP_BENEATH))
+		return ERR_PTR(-EXDEV);
+	return dget(path->dentry);
+}
+EXPORT_SYMBOL(path_walk_parent);
+
+static struct dentry *follow_dotdot(struct nameidata *nd)
+{
+	struct dentry *parent = path_walk_parent(&nd->path, &nd->root, nd->flags);
+
+	if (IS_ERR(parent))
+		return parent;
 	if (unlikely(!path_connected(nd->path.mnt, parent))) {
 		dput(parent);
 		return ERR_PTR(-ENOENT);
 	}
+	nd->inode = nd->path.dentry->d_inode;
 	return parent;
-
-in_root:
-	if (unlikely(nd->flags & LOOKUP_BENEATH))
-		return ERR_PTR(-EXDEV);
-	return dget(nd->path.dentry);
 }
 
 static const char *handle_dots(struct nameidata *nd, int type)
diff --git a/include/linux/namei.h b/include/linux/namei.h
index 5d085428e471..4cc15a58d900 100644
--- a/include/linux/namei.h
+++ b/include/linux/namei.h
@@ -80,6 +80,7 @@ struct dentry *lookup_one_unlocked(struct mnt_idmap *idmap,
 struct dentry *lookup_one_positive_unlocked(struct mnt_idmap *idmap,
 					    struct qstr *name,
 					    struct dentry *base);
+struct dentry *path_walk_parent(struct path *path, struct path *root, int flags);
 
 extern int follow_down_one(struct path *);
 extern int follow_down(struct path *path, unsigned int flags);

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH v3 bpf-next 1/5] namei: Introduce new helper function path_walk_parent()
  2025-06-10 23:08         ` Song Liu
@ 2025-06-11  0:23           ` Tingmao Wang
  0 siblings, 0 replies; 30+ messages in thread
From: Tingmao Wang @ 2025-06-11  0:23 UTC (permalink / raw)
  To: Song Liu
  Cc: Mickaël Salaün, bpf, linux-fsdevel, linux-kernel,
	linux-security-module, kernel-team, andrii, eddyz87, ast, daniel,
	martin.lau, viro, brauner, jack, kpsingh, mattbobrowski, amir73il,
	repnop, jlayton, josef, gnoack

On 6/11/25 00:08, Song Liu wrote:
> On Tue, Jun 10, 2025 at 3:26 PM Tingmao Wang <m@maowtm.org> wrote:
> [..]
>>>
>>>                 if (!choose_mountpoint(real_mount(path->mnt), root, &p))
>>>                         return false;
>>>                 path_put(path);
>>>                 *path = p;
>>>                 ret = true;
>>>         }
>>>
>>>         if (unlikely(IS_ROOT(path->dentry)))
>>>                 return ret;
>>
>> Returning true here would be the wrong semantic right?  This whole thing
>> is only possible when some mount shadows "/".  Say if you have a landlock
>> rule on the old "/", but then we mount a new "/" and chroot into it (via
>> "/.."), the landlock rule on the old "/" should not apply, but if we
>> change *path and return true here then this will "expose" that old "/" to
>> landlock.
> 
> Could you please provide more specific information about this case?

Apologies, it looks like I was mistaken in the above statement.

I was thinking of something like

# mount --mkdir -t tmpfs none tmproot
# cp busybox tmproot/ && chmod +x tmproot/busybox
# mount --move tmproot /
# env LL_FS_RW=/ LL_FS_RO=/.. ./sandboxer chroot /.. /busybox sh
  # echo can write to root > /a
  sh: can't create /a: Permission denied
  ^^^^ this does not work, but I was mistakenly thinking it would

I think because choose_mountpoint_rcu only returns true if
    if (mountpoint != m->mnt.mnt_root)
passes, this situation won't cause ret to be true in your code.

But then I can't think of when
      if (unlikely(IS_ROOT(path->dentry)))
          return ret;
would ever return true, unless somehow d_parent is corrupted?  Maybe I'm
just missing something obvious here.

Anyway, since there's a suggestion from Neil to refactor this, this might
not be too important, so feel free to ignore for now.

> 
> Thanks,
> Song
> 
>> A quick suggestion although I haven't tested anything - maybe we should do
>> a special case check for IS_ROOT inside the
>>     if (unlikely(path->dentry == path->mnt->mnt_root))
>> ? Before "path_put(path);", if IS_ROOT(p.dentry) then we just path_get(p)
>> and return false.
>>
>>>
>>>         parent = dget_parent(path->dentry);
>>>         dput(path->dentry);
>>>         path->dentry = parent;
>>>         return true;
>>> }
>>>
>>> Thanks,
>>> Song
>>


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v3 bpf-next 1/5] namei: Introduce new helper function path_walk_parent()
  2025-06-10 23:34   ` NeilBrown
@ 2025-06-11  0:56     ` Song Liu
  2025-06-11 15:42       ` Mickaël Salaün
  0 siblings, 1 reply; 30+ messages in thread
From: Song Liu @ 2025-06-11  0:56 UTC (permalink / raw)
  To: NeilBrown
  Cc: Jan Kara, bpf, linux-fsdevel, linux-kernel, linux-security-module,
	kernel-team, andrii, eddyz87, ast, daniel, martin.lau, viro,
	brauner, kpsingh, mattbobrowski, amir73il, repnop, jlayton, josef,
	mic, gnoack, m

Hi Neil,

Thanks for your suggestion! It does look like a good solution.

On Tue, Jun 10, 2025 at 4:34 PM NeilBrown <neil@brown.name> wrote:

> The above looks a lot like follow_dotdot().  This is good because it
> means that it is likely correct.  But it is bad because it means there
> are two copies of essentially the same code - making maintenance harder.
>
> I think it would be good to split the part that you want out of
> follow_dotdot() and use that.  Something like the following.
>
> You might need a small wrapper in landlock which would, for example,
> pass LOOKUP_BENEATH and replace path->dentry with the parent on success.
>
> NeilBrown
>
> diff --git a/fs/namei.c b/fs/namei.c
> index 4bb889fc980b..b81d07b4417b 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -2048,36 +2048,65 @@ static struct dentry *follow_dotdot_rcu(struct nameidata *nd)
>         return nd->path.dentry;
>  }
>
> -static struct dentry *follow_dotdot(struct nameidata *nd)
> +/**
> + * path_walk_parent - Find the parent of the given struct path
> + * @path  - The struct path to start from
> + * @root  - A struct path which serves as a boundary not to be crosses
> + * @flags - Some LOOKUP_ flags
> + *
> + * Find and return the dentry for the parent of the given path (mount/dentry).
> + * If the given path is the root of a mounted tree, it is first updated to
> + * the mount point on which that tree is mounted.
> + *
> + * If %LOOKUP_NO_XDEV is given, then *after* the path is updated to a new mount,
> + * the error EXDEV is returned.
> + * If no parent can be found, either because the tree is not mounted or because
> + * the @path matches the @root, then @path->dentry is returned unless @flags
> + * contains %LOOKUP_BENEATH, in which case -EXDEV is returned.
> + *
> + * Returns: either an ERR_PTR() or the chosen parent which will have had the
> + * refcount incremented.
> + */
> +struct dentry *path_walk_parent(struct path *path, struct path *root, int flags)

We can probably call this __path_walk_parent() and make it static.

Then we can add an exported path_walk_parent() that calls
__path_walk_parent() and adds extra logic.

If this looks good to folks, I can draft v4 based on this idea.

Thanks,
Song

[...]

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v3 bpf-next 0/5] bpf path iterator
  2025-06-09  8:08     ` Tingmao Wang
@ 2025-06-11 11:36       ` Christian Brauner
  2025-06-11 15:39         ` Mickaël Salaün
  0 siblings, 1 reply; 30+ messages in thread
From: Christian Brauner @ 2025-06-11 11:36 UTC (permalink / raw)
  To: Tingmao Wang
  Cc: Song Liu, Mickaël Salaün, Al Viro, amir73il, andrii,
	ast, bpf, daniel, eddyz87, gnoack, jack, jlayton, josef,
	kernel-team, kpsingh, linux-fsdevel, linux-kernel,
	linux-security-module, martin.lau, mattbobrowski, repnop

On Mon, Jun 09, 2025 at 09:08:34AM +0100, Tingmao Wang wrote:
> On 6/9/25 07:23, Song Liu wrote:
> > On Sun, Jun 8, 2025 at 10:34 AM Tingmao Wang <m@maowtm.org> wrote:
> > [...]
> >> Hi Song, Christian, Al and others,
> >>
> >> Previously I proposed in [1] to add ability to do a reference-less parent
> >> walk for Landlock.  However, as Christian pointed out and I do agree in
> >> hindsight, it is not a good idea to do things like this in non-VFS code.
> >>
> >> However, I still think this is valuable to consider given the performance
> >> improvement, and after some discussion with Mickaël, I would like to
> >> propose extending Song's helper to support such usage.  While I recognize
> >> that this patch series is already in its v3, and I do not want to delay it
> >> by too much, putting this proposal out now is still better than after this
> >> has merged, so that we may consider signature changes.
> >>
> >> I've created a proof-of-concept and did some brief testing.  The
> >> performance improvement attained here is the same as in [1] (with a "git
> >> status" workload, median landlock overhead 35% -> 28%, median time in
> >> landlock decreases by 26.6%).
> >>
> >> If this idea is accepted, I'm happy to work on it further, split out this
> >> patch, update the comments and do more testing etc, potentially in
> >> collaboration with Song.
> >>
> >> An alternative to this is perhaps to add a new helper
> >> path_walk_parent_rcu, also living in namei.c, that will be used directly
> >> by Landlock.  I'm happy to do it either way, but with some experimentation
> >> I personally think that the code in this patch is still clean enough, and
> >> can avoid some duplication.
> >>
> >> Patch title: path_walk_parent: support reference-less walk
> >>
> >> A later commit will update the BPF path iterator to use this.
> >>
> >> Signed-off-by: Tingmao Wang <m@maowtm.org>
> > [...]
> >>
> >> -bool path_walk_parent(struct path *path, const struct path *root);
> >> +struct parent_iterator {
> >> +       struct path path;
> >> +       struct path root;
> >> +       bool rcu;
> >> +       /* expected seq of path->dentry */
> >> +       unsigned next_seq;
> >> +       unsigned m_seq, r_seq;
> > 
> > Most of parent_iterator is not really used by reference walk.
> > So it is probably just separate the two APIs?
> 
> I don't mind either way, but I feel like it might be nice to just have one
> style of APIs (i.e. an iterator with start / end / next vs just one
> function), even though this is not totally necessary for the ref-taking
> walk.  After all, the BPF use case is iterator-based.  This also means
> that the code at the user's side (mostly thinking of Landlock here) is
> slightly simpler.
> 
> But I've not experimented with the other way.  I'm open to both, and I'm
> happy to send a patch later for a separate API (in that case that would
> not depend on this and I might just start a new series).
> 
> Would like to hear what VFS folks thinks of this first tho, and whether
> there's any preference in one or two APIs.

I really dislike exposing the sequence number for mounts an for
dentries. That's just nonsense and a non-VFS low-level consumer of this
API has zero business caring about any of that. It's easy to
misunderstand, it's easy to abuse so that's not a good way of doing
this. It's the wrong API.

> 
> > 
> > Also, is it ok to make m_seq and r_seq available out of fs/?

No, it's not.

> 
> The struct is not intended to be used directly by code outside.  Not sure

That doesn't mean anything. It's simply the wrong API if it has to spill
so much of its bowels.

> what is the standard way to do this but we can make it private by e.g.
> putting the seq values in another struct, if needed.  Alternatively I
> think we can hide the entire struct behind an opaque pointer by doing the
> allocation ourselves.
> 
> > 
> >> +};
> >> +
> >> +#define PATH_WALK_PARENT_UPDATED               0
> >> +#define PATH_WALK_PARENT_ALREADY_ROOT  -1
> >> +#define PATH_WALK_PARENT_RETRY                 -2
> >> +
> >> +void path_walk_parent_start(struct parent_iterator *pit,
> >> +                           const struct path *path, const struct path *root,
> >> +                           bool ref_less);
> >> +int path_walk_parent(struct parent_iterator *pit, struct path *next_parent);
> >> +int path_walk_parent_end(struct parent_iterator *pit);
> > 
> > I think it is better to make this rcu walk a separate set of APIs.
> > IOW, we will have:
> > 
> > int path_walk_parent(struct path *path, struct path *root);
> > 
> > and
> > 
> > void path_walk_parent_rcu_start(struct parent_iterator *pit,
> >                            const struct path *path, const struct path *root);
> > int path_walk_parent_rcu_next(struct parent_iterator *pit, struct path
> > *next_parent);
> > int path_walk_parent_rcu_end(struct parent_iterator *pit);
> 
> (replied above)

Exposing two sets of different APIs for essentially the same things is
not going to happen.

The VFS doesn't expose a rcu variant and a non-rcu variant for itself so
we are absolutely not going to do that for outside stuff.

It always does the try RCU first, then try to continue the walk by
falling back to REF walk (e.g., via try_to_unlazy()). If that doesn't
work then let the caller know and require them to decide whether to
abort or redo everything in ref-walk.

There's zero need in that scheme for the caller to see any of the
internals of the VFS and that's what you should aim for.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v3 bpf-next 0/5] bpf path iterator
  2025-06-11 11:36       ` Christian Brauner
@ 2025-06-11 15:39         ` Mickaël Salaün
  0 siblings, 0 replies; 30+ messages in thread
From: Mickaël Salaün @ 2025-06-11 15:39 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Tingmao Wang, Song Liu, Al Viro, amir73il, andrii, ast, bpf,
	daniel, eddyz87, gnoack, jack, jlayton, josef, kernel-team,
	kpsingh, linux-fsdevel, linux-kernel, linux-security-module,
	martin.lau, mattbobrowski, repnop

On Wed, Jun 11, 2025 at 01:36:46PM +0200, Christian Brauner wrote:
> On Mon, Jun 09, 2025 at 09:08:34AM +0100, Tingmao Wang wrote:
> > On 6/9/25 07:23, Song Liu wrote:
> > > On Sun, Jun 8, 2025 at 10:34 AM Tingmao Wang <m@maowtm.org> wrote:
> > > [...]
> > >> Hi Song, Christian, Al and others,
> > >>
> > >> Previously I proposed in [1] to add ability to do a reference-less parent
> > >> walk for Landlock.  However, as Christian pointed out and I do agree in
> > >> hindsight, it is not a good idea to do things like this in non-VFS code.
> > >>
> > >> However, I still think this is valuable to consider given the performance
> > >> improvement, and after some discussion with Mickaël, I would like to
> > >> propose extending Song's helper to support such usage.  While I recognize
> > >> that this patch series is already in its v3, and I do not want to delay it
> > >> by too much, putting this proposal out now is still better than after this
> > >> has merged, so that we may consider signature changes.
> > >>
> > >> I've created a proof-of-concept and did some brief testing.  The
> > >> performance improvement attained here is the same as in [1] (with a "git
> > >> status" workload, median landlock overhead 35% -> 28%, median time in
> > >> landlock decreases by 26.6%).
> > >>
> > >> If this idea is accepted, I'm happy to work on it further, split out this
> > >> patch, update the comments and do more testing etc, potentially in
> > >> collaboration with Song.
> > >>
> > >> An alternative to this is perhaps to add a new helper
> > >> path_walk_parent_rcu, also living in namei.c, that will be used directly
> > >> by Landlock.  I'm happy to do it either way, but with some experimentation
> > >> I personally think that the code in this patch is still clean enough, and
> > >> can avoid some duplication.
> > >>
> > >> Patch title: path_walk_parent: support reference-less walk
> > >>
> > >> A later commit will update the BPF path iterator to use this.
> > >>
> > >> Signed-off-by: Tingmao Wang <m@maowtm.org>
> > > [...]
> > >>
> > >> -bool path_walk_parent(struct path *path, const struct path *root);
> > >> +struct parent_iterator {
> > >> +       struct path path;
> > >> +       struct path root;
> > >> +       bool rcu;
> > >> +       /* expected seq of path->dentry */
> > >> +       unsigned next_seq;
> > >> +       unsigned m_seq, r_seq;
> > > 
> > > Most of parent_iterator is not really used by reference walk.
> > > So it is probably just separate the two APIs?
> > 
> > I don't mind either way, but I feel like it might be nice to just have one
> > style of APIs (i.e. an iterator with start / end / next vs just one
> > function), even though this is not totally necessary for the ref-taking
> > walk.  After all, the BPF use case is iterator-based.  This also means
> > that the code at the user's side (mostly thinking of Landlock here) is
> > slightly simpler.
> > 
> > But I've not experimented with the other way.  I'm open to both, and I'm
> > happy to send a patch later for a separate API (in that case that would
> > not depend on this and I might just start a new series).
> > 
> > Would like to hear what VFS folks thinks of this first tho, and whether
> > there's any preference in one or two APIs.
> 
> I really dislike exposing the sequence number for mounts an for
> dentries. That's just nonsense and a non-VFS low-level consumer of this
> API has zero business caring about any of that. It's easy to
> misunderstand, it's easy to abuse so that's not a good way of doing
> this. It's the wrong API.
> 
> > 
> > > 
> > > Also, is it ok to make m_seq and r_seq available out of fs/?
> 
> No, it's not.
> 
> > 
> > The struct is not intended to be used directly by code outside.  Not sure
> 
> That doesn't mean anything. It's simply the wrong API if it has to spill
> so much of its bowels.
> 
> > what is the standard way to do this but we can make it private by e.g.
> > putting the seq values in another struct, if needed.  Alternatively I
> > think we can hide the entire struct behind an opaque pointer by doing the
> > allocation ourselves.
> > 
> > > 
> > >> +};
> > >> +
> > >> +#define PATH_WALK_PARENT_UPDATED               0
> > >> +#define PATH_WALK_PARENT_ALREADY_ROOT  -1
> > >> +#define PATH_WALK_PARENT_RETRY                 -2
> > >> +
> > >> +void path_walk_parent_start(struct parent_iterator *pit,
> > >> +                           const struct path *path, const struct path *root,
> > >> +                           bool ref_less);
> > >> +int path_walk_parent(struct parent_iterator *pit, struct path *next_parent);
> > >> +int path_walk_parent_end(struct parent_iterator *pit);
> > > 
> > > I think it is better to make this rcu walk a separate set of APIs.
> > > IOW, we will have:
> > > 
> > > int path_walk_parent(struct path *path, struct path *root);
> > > 
> > > and
> > > 
> > > void path_walk_parent_rcu_start(struct parent_iterator *pit,
> > >                            const struct path *path, const struct path *root);
> > > int path_walk_parent_rcu_next(struct parent_iterator *pit, struct path
> > > *next_parent);
> > > int path_walk_parent_rcu_end(struct parent_iterator *pit);
> > 
> > (replied above)
> 
> Exposing two sets of different APIs for essentially the same things is
> not going to happen.
> 
> The VFS doesn't expose a rcu variant and a non-rcu variant for itself so
> we are absolutely not going to do that for outside stuff.
> 
> It always does the try RCU first, then try to continue the walk by
> falling back to REF walk (e.g., via try_to_unlazy()). If that doesn't
> work then let the caller know and require them to decide whether to
> abort or redo everything in ref-walk.

That's indeed what is done by choose_mountpoint() (relying on
choose_mountpoint_rcu() when possible), but this proposal is about doing
a full path walk (i.e. multiple calls to path_walk_parent) within RCU.

> 
> There's zero need in that scheme for the caller to see any of the
> internals of the VFS and that's what you should aim for.

Yes, but how could we detect if a full path walk is invalid (because of
a rename or mount change)?

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v3 bpf-next 1/5] namei: Introduce new helper function path_walk_parent()
  2025-06-11  0:56     ` Song Liu
@ 2025-06-11 15:42       ` Mickaël Salaün
  2025-06-11 16:31         ` Song Liu
  0 siblings, 1 reply; 30+ messages in thread
From: Mickaël Salaün @ 2025-06-11 15:42 UTC (permalink / raw)
  To: Song Liu
  Cc: NeilBrown, Jan Kara, bpf, linux-fsdevel, linux-kernel,
	linux-security-module, kernel-team, andrii, eddyz87, ast, daniel,
	martin.lau, viro, brauner, kpsingh, mattbobrowski, amir73il,
	repnop, jlayton, josef, gnoack, m

On Tue, Jun 10, 2025 at 05:56:01PM -0700, Song Liu wrote:
> Hi Neil,
> 
> Thanks for your suggestion! It does look like a good solution.
> 
> On Tue, Jun 10, 2025 at 4:34 PM NeilBrown <neil@brown.name> wrote:
> 
> > The above looks a lot like follow_dotdot().  This is good because it
> > means that it is likely correct.  But it is bad because it means there
> > are two copies of essentially the same code - making maintenance harder.
> >
> > I think it would be good to split the part that you want out of
> > follow_dotdot() and use that.  Something like the following.
> >
> > You might need a small wrapper in landlock which would, for example,
> > pass LOOKUP_BENEATH and replace path->dentry with the parent on success.
> >
> > NeilBrown
> >
> > diff --git a/fs/namei.c b/fs/namei.c
> > index 4bb889fc980b..b81d07b4417b 100644
> > --- a/fs/namei.c
> > +++ b/fs/namei.c
> > @@ -2048,36 +2048,65 @@ static struct dentry *follow_dotdot_rcu(struct nameidata *nd)
> >         return nd->path.dentry;
> >  }
> >
> > -static struct dentry *follow_dotdot(struct nameidata *nd)
> > +/**
> > + * path_walk_parent - Find the parent of the given struct path
> > + * @path  - The struct path to start from
> > + * @root  - A struct path which serves as a boundary not to be crosses
> > + * @flags - Some LOOKUP_ flags
> > + *
> > + * Find and return the dentry for the parent of the given path (mount/dentry).
> > + * If the given path is the root of a mounted tree, it is first updated to
> > + * the mount point on which that tree is mounted.
> > + *
> > + * If %LOOKUP_NO_XDEV is given, then *after* the path is updated to a new mount,
> > + * the error EXDEV is returned.
> > + * If no parent can be found, either because the tree is not mounted or because
> > + * the @path matches the @root, then @path->dentry is returned unless @flags
> > + * contains %LOOKUP_BENEATH, in which case -EXDEV is returned.
> > + *
> > + * Returns: either an ERR_PTR() or the chosen parent which will have had the
> > + * refcount incremented.
> > + */
> > +struct dentry *path_walk_parent(struct path *path, struct path *root, int flags)
> 
> We can probably call this __path_walk_parent() and make it static.
> 
> Then we can add an exported path_walk_parent() that calls
> __path_walk_parent() and adds extra logic.
> 
> If this looks good to folks, I can draft v4 based on this idea.

This looks good but it would be better if we could also do a full path
walk within RCU when possible.

> 
> Thanks,
> Song
> 
> [...]
> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v3 bpf-next 1/5] namei: Introduce new helper function path_walk_parent()
  2025-06-11 15:42       ` Mickaël Salaün
@ 2025-06-11 16:31         ` Song Liu
  2025-06-11 17:50           ` Tingmao Wang
  0 siblings, 1 reply; 30+ messages in thread
From: Song Liu @ 2025-06-11 16:31 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: NeilBrown, Jan Kara, bpf, linux-fsdevel, linux-kernel,
	linux-security-module, kernel-team, andrii, eddyz87, ast, daniel,
	martin.lau, viro, brauner, kpsingh, mattbobrowski, amir73il,
	repnop, jlayton, josef, gnoack, m

On Wed, Jun 11, 2025 at 8:42 AM Mickaël Salaün <mic@digikod.net> wrote:
[...]
> > We can probably call this __path_walk_parent() and make it static.
> >
> > Then we can add an exported path_walk_parent() that calls
> > __path_walk_parent() and adds extra logic.
> >
> > If this looks good to folks, I can draft v4 based on this idea.
>
> This looks good but it would be better if we could also do a full path
> walk within RCU when possible.

I think we will need some callback mechanism for this. Something like:

for_each_parents(starting_path, root, callback_fn, cb_data, bool try_rcu) {
   if (!try_rcu)
      goto ref_walk;

   __read_seqcount_begin();
    /* rcu walk parents, from starting_path until root */
   walk_rcu(starting_path, root, path) {
    callback_fn(path, cb_data);
  }
  if (!read_seqcount_retry())
    return xxx;  /* successful rcu walk */

ref_walk:
  /* ref walk parents, from starting_path until root */
   walk(starting_path, root, path) {
    callback_fn(path, cb_data);
  }
  return xxx;
}

Personally, I don't like this version very much, because the callback
mechanism is not very flexible, and it is tricky to use it in BPF LSM.

Thanks,
Song

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v3 bpf-next 1/5] namei: Introduce new helper function path_walk_parent()
  2025-06-11 16:31         ` Song Liu
@ 2025-06-11 17:50           ` Tingmao Wang
  2025-06-11 18:08             ` Song Liu
  0 siblings, 1 reply; 30+ messages in thread
From: Tingmao Wang @ 2025-06-11 17:50 UTC (permalink / raw)
  To: Song Liu, Mickaël Salaün
  Cc: NeilBrown, Jan Kara, bpf, linux-fsdevel, linux-kernel,
	linux-security-module, kernel-team, andrii, eddyz87, ast, daniel,
	martin.lau, viro, brauner, kpsingh, mattbobrowski, amir73il,
	repnop, jlayton, josef, gnoack

On 6/11/25 17:31, Song Liu wrote:
> On Wed, Jun 11, 2025 at 8:42 AM Mickaël Salaün <mic@digikod.net> wrote:
> [...]
>>> We can probably call this __path_walk_parent() and make it static.
>>>
>>> Then we can add an exported path_walk_parent() that calls
>>> __path_walk_parent() and adds extra logic.
>>>
>>> If this looks good to folks, I can draft v4 based on this idea.
>>
>> This looks good but it would be better if we could also do a full path
>> walk within RCU when possible.
> 
> I think we will need some callback mechanism for this. Something like:
> 
> for_each_parents(starting_path, root, callback_fn, cb_data, bool try_rcu) {
>    if (!try_rcu)
>       goto ref_walk;
> 
>    __read_seqcount_begin();
>     /* rcu walk parents, from starting_path until root */
>    walk_rcu(starting_path, root, path) {
>     callback_fn(path, cb_data);
>   }
>   if (!read_seqcount_retry())
>     return xxx;  /* successful rcu walk */
> 
> ref_walk:
>   /* ref walk parents, from starting_path until root */
>    walk(starting_path, root, path) {
>     callback_fn(path, cb_data);
>   }
>   return xxx;
> }
> 
> Personally, I don't like this version very much, because the callback
> mechanism is not very flexible, and it is tricky to use it in BPF LSM.

Aside from the "exposing mount seqcounts" problem, what do you think about
the parent_iterator approach I suggested earlier?  I feel that it is
better than such a callback - more flexible, and also fits in right with
the BPF API you already designed (i.e. with a callback you might then have
to allow BPF to pass a callback?).  There are some specifics that I can
improve - Mickaël suggested some in our discussion:

- Letting the caller take rcu_read_lock outside rather than doing it in
path_walk_parent_start

- Instead of always requiring a struct parent_iterator, allow passing in
NULL for the iterator to path_walk_parent to do a reference walk without
needing to call path_walk_parent_start - this way might be simpler and
path_walk_parent_start/end can just be for rcu case.

but what do you think about the overall shape of it?

And while it is technically doing two separate things (rcu walk and
reference walk), so is this callback to some extent.  The pro of callback
however is that the retry on ref walk failure is automatic, but the user
still has to be aware and detect such cases.  For example, landlock needs
to re-initialize the layer masks previously collected if parent walk is
restarting.

(and of course, this also hides the seqcounts from non VFS code, but I'm
wondering if there are other ways to make the seqcounts in the
parent_iterator struct private, if that is the only issue with it?)

Also, if the common logic with follow_dotdot is extracted out to
__path_walk_parent, path_walk_parent might just defer to that for the
non-rcu case, and so the complexity of that function is further reduced.

> 
> Thanks,
> Song


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v3 bpf-next 1/5] namei: Introduce new helper function path_walk_parent()
  2025-06-11 17:50           ` Tingmao Wang
@ 2025-06-11 18:08             ` Song Liu
  2025-06-12  9:01               ` Jan Kara
  0 siblings, 1 reply; 30+ messages in thread
From: Song Liu @ 2025-06-11 18:08 UTC (permalink / raw)
  To: Tingmao Wang
  Cc: Mickaël Salaün, NeilBrown, Jan Kara, bpf, linux-fsdevel,
	linux-kernel, linux-security-module, kernel-team, andrii, eddyz87,
	ast, daniel, martin.lau, viro, brauner, kpsingh, mattbobrowski,
	amir73il, repnop, jlayton, josef, gnoack

On Wed, Jun 11, 2025 at 10:50 AM Tingmao Wang <m@maowtm.org> wrote:
[...]
> > I think we will need some callback mechanism for this. Something like:
> >
> > for_each_parents(starting_path, root, callback_fn, cb_data, bool try_rcu) {
> >    if (!try_rcu)
> >       goto ref_walk;
> >
> >    __read_seqcount_begin();
> >     /* rcu walk parents, from starting_path until root */
> >    walk_rcu(starting_path, root, path) {
> >     callback_fn(path, cb_data);
> >   }
> >   if (!read_seqcount_retry())
> >     return xxx;  /* successful rcu walk */
> >
> > ref_walk:
> >   /* ref walk parents, from starting_path until root */
> >    walk(starting_path, root, path) {
> >     callback_fn(path, cb_data);
> >   }
> >   return xxx;
> > }
> >
> > Personally, I don't like this version very much, because the callback
> > mechanism is not very flexible, and it is tricky to use it in BPF LSM.
>
> Aside from the "exposing mount seqcounts" problem, what do you think about
> the parent_iterator approach I suggested earlier?  I feel that it is
> better than such a callback - more flexible, and also fits in right with
> the BPF API you already designed (i.e. with a callback you might then have
> to allow BPF to pass a callback?).  There are some specifics that I can
> improve - Mickaël suggested some in our discussion:
>
> - Letting the caller take rcu_read_lock outside rather than doing it in
> path_walk_parent_start
>
> - Instead of always requiring a struct parent_iterator, allow passing in
> NULL for the iterator to path_walk_parent to do a reference walk without
> needing to call path_walk_parent_start - this way might be simpler and
> path_walk_parent_start/end can just be for rcu case.
>
> but what do you think about the overall shape of it?

Personally, I don't have strong objections to this design. But VFS
folks may have other concerns with it.

Thanks,
Song

[...]

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v3 bpf-next 1/5] namei: Introduce new helper function path_walk_parent()
  2025-06-11 18:08             ` Song Liu
@ 2025-06-12  9:01               ` Jan Kara
  2025-06-12  9:49                 ` Jan Kara
  0 siblings, 1 reply; 30+ messages in thread
From: Jan Kara @ 2025-06-12  9:01 UTC (permalink / raw)
  To: Song Liu
  Cc: Tingmao Wang, Mickaël Salaün, NeilBrown, Jan Kara, bpf,
	linux-fsdevel, linux-kernel, linux-security-module, kernel-team,
	andrii, eddyz87, ast, daniel, martin.lau, viro, brauner, kpsingh,
	mattbobrowski, amir73il, repnop, jlayton, josef, gnoack

On Wed 11-06-25 11:08:30, Song Liu wrote:
> On Wed, Jun 11, 2025 at 10:50 AM Tingmao Wang <m@maowtm.org> wrote:
> [...]
> > > I think we will need some callback mechanism for this. Something like:
> > >
> > > for_each_parents(starting_path, root, callback_fn, cb_data, bool try_rcu) {
> > >    if (!try_rcu)
> > >       goto ref_walk;
> > >
> > >    __read_seqcount_begin();
> > >     /* rcu walk parents, from starting_path until root */
> > >    walk_rcu(starting_path, root, path) {
> > >     callback_fn(path, cb_data);
> > >   }
> > >   if (!read_seqcount_retry())
> > >     return xxx;  /* successful rcu walk */
> > >
> > > ref_walk:
> > >   /* ref walk parents, from starting_path until root */
> > >    walk(starting_path, root, path) {
> > >     callback_fn(path, cb_data);
> > >   }
> > >   return xxx;
> > > }
> > >
> > > Personally, I don't like this version very much, because the callback
> > > mechanism is not very flexible, and it is tricky to use it in BPF LSM.
> >
> > Aside from the "exposing mount seqcounts" problem, what do you think about
> > the parent_iterator approach I suggested earlier?  I feel that it is
> > better than such a callback - more flexible, and also fits in right with
> > the BPF API you already designed (i.e. with a callback you might then have
> > to allow BPF to pass a callback?).  There are some specifics that I can
> > improve - Mickaël suggested some in our discussion:
> >
> > - Letting the caller take rcu_read_lock outside rather than doing it in
> > path_walk_parent_start
> >
> > - Instead of always requiring a struct parent_iterator, allow passing in
> > NULL for the iterator to path_walk_parent to do a reference walk without
> > needing to call path_walk_parent_start - this way might be simpler and
> > path_walk_parent_start/end can just be for rcu case.
> >
> > but what do you think about the overall shape of it?
> 
> Personally, I don't have strong objections to this design. But VFS
> folks may have other concerns with it.

From what I've read above I'm not sure about details of the proposal but I
don't think mixing of RCU & non-RCU walk in a single function / iterator is
a good idea. IMHO the code would be quite messy. After all we have
follow_dotdot_rcu() and follow_dotdot() as separate functions for a reason.
Also given this series went through several iterations and we don't yet
have an acceptable / correct solution suggests getting even the standard
walk correct is hard enough. RCU walk is going to be only worse. So I'd
suggest to get the standard walk finished and agreed on first and
investigate feasibility of RCU variant later.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v3 bpf-next 1/5] namei: Introduce new helper function path_walk_parent()
  2025-06-12  9:01               ` Jan Kara
@ 2025-06-12  9:49                 ` Jan Kara
  2025-06-12 12:31                   ` Christian Brauner
  0 siblings, 1 reply; 30+ messages in thread
From: Jan Kara @ 2025-06-12  9:49 UTC (permalink / raw)
  To: Song Liu
  Cc: Tingmao Wang, Mickaël Salaün, NeilBrown, Jan Kara, bpf,
	linux-fsdevel, linux-kernel, linux-security-module, kernel-team,
	andrii, eddyz87, ast, daniel, martin.lau, viro, brauner, kpsingh,
	mattbobrowski, amir73il, repnop, jlayton, josef, gnoack

On Thu 12-06-25 11:01:16, Jan Kara wrote:
> On Wed 11-06-25 11:08:30, Song Liu wrote:
> > On Wed, Jun 11, 2025 at 10:50 AM Tingmao Wang <m@maowtm.org> wrote:
> > [...]
> > > > I think we will need some callback mechanism for this. Something like:
> > > >
> > > > for_each_parents(starting_path, root, callback_fn, cb_data, bool try_rcu) {
> > > >    if (!try_rcu)
> > > >       goto ref_walk;
> > > >
> > > >    __read_seqcount_begin();
> > > >     /* rcu walk parents, from starting_path until root */
> > > >    walk_rcu(starting_path, root, path) {
> > > >     callback_fn(path, cb_data);
> > > >   }
> > > >   if (!read_seqcount_retry())
> > > >     return xxx;  /* successful rcu walk */
> > > >
> > > > ref_walk:
> > > >   /* ref walk parents, from starting_path until root */
> > > >    walk(starting_path, root, path) {
> > > >     callback_fn(path, cb_data);
> > > >   }
> > > >   return xxx;
> > > > }
> > > >
> > > > Personally, I don't like this version very much, because the callback
> > > > mechanism is not very flexible, and it is tricky to use it in BPF LSM.
> > >
> > > Aside from the "exposing mount seqcounts" problem, what do you think about
> > > the parent_iterator approach I suggested earlier?  I feel that it is
> > > better than such a callback - more flexible, and also fits in right with
> > > the BPF API you already designed (i.e. with a callback you might then have
> > > to allow BPF to pass a callback?).  There are some specifics that I can
> > > improve - Mickaël suggested some in our discussion:
> > >
> > > - Letting the caller take rcu_read_lock outside rather than doing it in
> > > path_walk_parent_start
> > >
> > > - Instead of always requiring a struct parent_iterator, allow passing in
> > > NULL for the iterator to path_walk_parent to do a reference walk without
> > > needing to call path_walk_parent_start - this way might be simpler and
> > > path_walk_parent_start/end can just be for rcu case.
> > >
> > > but what do you think about the overall shape of it?
> > 
> > Personally, I don't have strong objections to this design. But VFS
> > folks may have other concerns with it.
> 
> From what I've read above I'm not sure about details of the proposal but I
> don't think mixing of RCU & non-RCU walk in a single function / iterator is
> a good idea. IMHO the code would be quite messy. After all we have
> follow_dotdot_rcu() and follow_dotdot() as separate functions for a reason.
> Also given this series went through several iterations and we don't yet
> have an acceptable / correct solution suggests getting even the standard
> walk correct is hard enough. RCU walk is going to be only worse. So I'd
> suggest to get the standard walk finished and agreed on first and
> investigate feasibility of RCU variant later.

OK, I've now read some of Tingmaon's and Christian's replies which I've
missed previously so I guess I now better understand why you complicate
things with RCU walking but still I'm of the opinion that we should start
with getting the standard walk working. IMHO pulling in RCU walk into the
iterator will bring it to a completely new complexity level...

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v3 bpf-next 1/5] namei: Introduce new helper function path_walk_parent()
  2025-06-12  9:49                 ` Jan Kara
@ 2025-06-12 12:31                   ` Christian Brauner
  2025-06-16  0:24                     ` Ref-less parent walk from Landlock (was: Re: [PATCH v3 bpf-next 1/5] namei: Introduce new helper function path_walk_parent()) Tingmao Wang
  0 siblings, 1 reply; 30+ messages in thread
From: Christian Brauner @ 2025-06-12 12:31 UTC (permalink / raw)
  To: Jan Kara
  Cc: Song Liu, Tingmao Wang, Mickaël Salaün, NeilBrown, bpf,
	linux-fsdevel, linux-kernel, linux-security-module, kernel-team,
	andrii, eddyz87, ast, daniel, martin.lau, viro, kpsingh,
	mattbobrowski, amir73il, repnop, jlayton, josef, gnoack

On Thu, Jun 12, 2025 at 11:49:08AM +0200, Jan Kara wrote:
> On Thu 12-06-25 11:01:16, Jan Kara wrote:
> > On Wed 11-06-25 11:08:30, Song Liu wrote:
> > > On Wed, Jun 11, 2025 at 10:50 AM Tingmao Wang <m@maowtm.org> wrote:
> > > [...]
> > > > > I think we will need some callback mechanism for this. Something like:
> > > > >
> > > > > for_each_parents(starting_path, root, callback_fn, cb_data, bool try_rcu) {
> > > > >    if (!try_rcu)
> > > > >       goto ref_walk;
> > > > >
> > > > >    __read_seqcount_begin();
> > > > >     /* rcu walk parents, from starting_path until root */
> > > > >    walk_rcu(starting_path, root, path) {
> > > > >     callback_fn(path, cb_data);
> > > > >   }
> > > > >   if (!read_seqcount_retry())
> > > > >     return xxx;  /* successful rcu walk */
> > > > >
> > > > > ref_walk:
> > > > >   /* ref walk parents, from starting_path until root */
> > > > >    walk(starting_path, root, path) {
> > > > >     callback_fn(path, cb_data);
> > > > >   }
> > > > >   return xxx;
> > > > > }
> > > > >
> > > > > Personally, I don't like this version very much, because the callback
> > > > > mechanism is not very flexible, and it is tricky to use it in BPF LSM.
> > > >
> > > > Aside from the "exposing mount seqcounts" problem, what do you think about
> > > > the parent_iterator approach I suggested earlier?  I feel that it is
> > > > better than such a callback - more flexible, and also fits in right with
> > > > the BPF API you already designed (i.e. with a callback you might then have
> > > > to allow BPF to pass a callback?).  There are some specifics that I can
> > > > improve - Mickaël suggested some in our discussion:
> > > >
> > > > - Letting the caller take rcu_read_lock outside rather than doing it in
> > > > path_walk_parent_start
> > > >
> > > > - Instead of always requiring a struct parent_iterator, allow passing in
> > > > NULL for the iterator to path_walk_parent to do a reference walk without
> > > > needing to call path_walk_parent_start - this way might be simpler and
> > > > path_walk_parent_start/end can just be for rcu case.
> > > >
> > > > but what do you think about the overall shape of it?
> > > 
> > > Personally, I don't have strong objections to this design. But VFS
> > > folks may have other concerns with it.
> > 
> > From what I've read above I'm not sure about details of the proposal but I
> > don't think mixing of RCU & non-RCU walk in a single function / iterator is
> > a good idea. IMHO the code would be quite messy. After all we have
> > follow_dotdot_rcu() and follow_dotdot() as separate functions for a reason.
> > Also given this series went through several iterations and we don't yet
> > have an acceptable / correct solution suggests getting even the standard
> > walk correct is hard enough. RCU walk is going to be only worse. So I'd
> > suggest to get the standard walk finished and agreed on first and
> > investigate feasibility of RCU variant later.
> 
> OK, I've now read some of Tingmaon's and Christian's replies which I've
> missed previously so I guess I now better understand why you complicate
> things with RCU walking but still I'm of the opinion that we should start
> with getting the standard walk working. IMHO pulling in RCU walk into the
> iterator will bring it to a completely new complexity level...

I would not want it in the first place. But I have a deep seated
aversion to exposing two different variants. Especially if the second
variant wants or needs access to internal details such as mount or
dentry sequence counts. I'm not at all in favor of that.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Ref-less parent walk from Landlock (was: Re: [PATCH v3 bpf-next 1/5] namei: Introduce new helper function path_walk_parent())
  2025-06-12 12:31                   ` Christian Brauner
@ 2025-06-16  0:24                     ` Tingmao Wang
  2025-06-17  6:20                       ` Song Liu
  0 siblings, 1 reply; 30+ messages in thread
From: Tingmao Wang @ 2025-06-16  0:24 UTC (permalink / raw)
  To: Christian Brauner, Jan Kara, Song Liu
  Cc: Mickaël Salaün, NeilBrown, bpf, linux-fsdevel,
	linux-kernel, linux-security-module, kernel-team, andrii, eddyz87,
	ast, daniel, martin.lau, viro, kpsingh, mattbobrowski, amir73il,
	repnop, jlayton, josef, gnoack

On 6/12/25 13:31, Christian Brauner wrote:
> On Thu, Jun 12, 2025 at 11:49:08AM +0200, Jan Kara wrote:
>> On Thu 12-06-25 11:01:16, Jan Kara wrote:
>>> On Wed 11-06-25 11:08:30, Song Liu wrote:
>>>> On Wed, Jun 11, 2025 at 10:50 AM Tingmao Wang <m@maowtm.org> wrote:
>>>>> [...]
>>>>> Aside from the "exposing mount seqcounts" problem, what do you think about
>>>>> the parent_iterator approach I suggested earlier?  I feel that it is
>>>>> better than such a callback - more flexible, and also fits in right with
>>>>> the BPF API you already designed (i.e. with a callback you might then have
>>>>> to allow BPF to pass a callback?).  There are some specifics that I can
>>>>> improve - Mickaël suggested some in our discussion:
>>>>>
>>>>> - Letting the caller take rcu_read_lock outside rather than doing it in
>>>>> path_walk_parent_start
>>>>>
>>>>> - Instead of always requiring a struct parent_iterator, allow passing in
>>>>> NULL for the iterator to path_walk_parent to do a reference walk without
>>>>> needing to call path_walk_parent_start - this way might be simpler and
>>>>> path_walk_parent_start/end can just be for rcu case.
>>>>>
>>>>> but what do you think about the overall shape of it?
>>>>
>>>> Personally, I don't have strong objections to this design. But VFS
>>>> folks may have other concerns with it.
>>>
>>> From what I've read above I'm not sure about details of the proposal but I
>>> don't think mixing of RCU & non-RCU walk in a single function / iterator is
>>> a good idea. IMHO the code would be quite messy. After all we have
>>> follow_dotdot_rcu() and follow_dotdot() as separate functions for a reason.
>>> Also given this series went through several iterations and we don't yet
>>> have an acceptable / correct solution suggests getting even the standard
>>> walk correct is hard enough. RCU walk is going to be only worse. So I'd
>>> suggest to get the standard walk finished and agreed on first and
>>> investigate feasibility of RCU variant later.
>>
>> OK, I've now read some of Tingmaon's and Christian's replies which I've
>> missed previously so I guess I now better understand why you complicate
>> things with RCU walking but still I'm of the opinion that we should start
>> with getting the standard walk working. IMHO pulling in RCU walk into the
>> iterator will bring it to a completely new complexity level...
> 
> I would not want it in the first place. But I have a deep seated
> aversion to exposing two different variants.

Hi Christian, Jan, Song,

I do appreciate your thoughts here and thanks for taking the time to
explain.  I just have some specific points which I would like you to
consider:

Taking a step back, maybe the specific designs need a bit more thought,
but are you at all open to the idea of letting other subsystems take
advantage of a rcu-based parent walk?  Testing shows that for specific
cases of a deep directory hierarchy the speedup (for time in Landlock) can
be almost 60%, and still very significant for the average case. [1]

I think what I'm proposing here is basically what follow_dotdot_rcu
already does (aside from checking rename_seq, but actually that was mostly
a conservative check. I think we're good even if we just check dentry seq
across the path walk calls), and in fact given the latest suggestion to
base the path walk helper on a modified version of follow_dotdot (that
takes path argument instead of using nameidata), I can see one approach
here being to do the same for follow_dotdot_rcu (i.e. extracting the logic
from start to before "nd->next_seq = read_seqcount_begin(&parent->d_seq);").
That way, we will not be "inventing" any new code that messes with VFS
internals.

In respect to the comment from Jan, I'm putting the suggestion out right
now to avoid only surfacing this ask after Song's path iterator API has
just been merged.  I'm not saying we have to do it here and now, but if
there is at all a possibility of incorporating rcu-based walk in this
helper (or a separate helper - I personally don't mind either way), I
would like to make sure that possibility stays open.

I'm happy to wait till Song's current patch is finished before continuing
this, but if there is strong objection to two separate APIs, I would
really appreciate if we can end up in a state where further change to
implement this is possible.

> Especially if the second variant wants or needs access to internal details
> such as mount or dentry sequence counts. I'm not at all in favor of that.

I don't want to expose VFS internals, but are you worried about even
making use of them?  (well, rename_lock and d_seq is already accessible
from outside since they are defined in include/linux/dcache.h, and so it's
just the (readout of the) mount seqcount here that will gain additional
exposure, but maybe we can mark it with __private.)

Would it be less worrying if any checks against those seqcount values are
kept within follow_dotdot_rcu, but just that it is stored in an iterator
that outside code are supposed to treat as opaque?  (We can maybe define
the semantic here as basically "this iterator makes sure your rcu walk
can't result in states where a reference-taking walk won't get to, as long
as you retry when the function returns -EAGAIN (or maybe -ECHILD)").

[1]: https://lore.kernel.org/all/cover.1748997840.git.m@maowtm.org/#t

Thanks a lot,
Tingmao

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Ref-less parent walk from Landlock (was: Re: [PATCH v3 bpf-next 1/5] namei: Introduce new helper function path_walk_parent())
  2025-06-16  0:24                     ` Ref-less parent walk from Landlock (was: Re: [PATCH v3 bpf-next 1/5] namei: Introduce new helper function path_walk_parent()) Tingmao Wang
@ 2025-06-17  6:20                       ` Song Liu
  0 siblings, 0 replies; 30+ messages in thread
From: Song Liu @ 2025-06-17  6:20 UTC (permalink / raw)
  To: Tingmao Wang
  Cc: Christian Brauner, Jan Kara, Mickaël Salaün, NeilBrown,
	bpf, linux-fsdevel, linux-kernel, linux-security-module,
	kernel-team, andrii, eddyz87, ast, daniel, martin.lau, viro,
	kpsingh, mattbobrowski, amir73il, repnop, jlayton, josef, gnoack

On Sun, Jun 15, 2025 at 5:24 PM Tingmao Wang <m@maowtm.org> wrote:
[...]
> >
> > I would not want it in the first place. But I have a deep seated
> > aversion to exposing two different variants.
>
> Hi Christian, Jan, Song,
>
> I do appreciate your thoughts here and thanks for taking the time to
> explain.  I just have some specific points which I would like you to
> consider:
>
> Taking a step back, maybe the specific designs need a bit more thought,
> but are you at all open to the idea of letting other subsystems take
> advantage of a rcu-based parent walk?

I cannot really speak for VFS folks, but I guess rcu-based parent walk
out of fs/ is not preferred.

> Testing shows that for specific
> cases of a deep directory hierarchy the speedup (for time in Landlock) can
> be almost 60%, and still very significant for the average case. [1]
[...]
> I'm happy to wait till Song's current patch is finished before continuing
> this, but if there is strong objection to two separate APIs, I would
> really appreciate if we can end up in a state where further change to
> implement this is possible.

In v5, path_walk_parent API is not exported. We can easily change it
in the future. Therefore, I don't think we need to rush into a rcu-walk
design before landing path_walk_parent.

Thanks,
Song

[...]

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2025-06-17  6:20 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-06-06 21:30 [PATCH v3 bpf-next 0/5] bpf path iterator Song Liu
2025-06-06 21:30 ` [PATCH v3 bpf-next 1/5] namei: Introduce new helper function path_walk_parent() Song Liu
2025-06-10 17:18   ` Mickaël Salaün
2025-06-10 17:26     ` Song Liu
2025-06-10 22:26       ` Tingmao Wang
2025-06-10 22:34         ` Tingmao Wang
2025-06-10 23:08         ` Song Liu
2025-06-11  0:23           ` Tingmao Wang
2025-06-10 23:34   ` NeilBrown
2025-06-11  0:56     ` Song Liu
2025-06-11 15:42       ` Mickaël Salaün
2025-06-11 16:31         ` Song Liu
2025-06-11 17:50           ` Tingmao Wang
2025-06-11 18:08             ` Song Liu
2025-06-12  9:01               ` Jan Kara
2025-06-12  9:49                 ` Jan Kara
2025-06-12 12:31                   ` Christian Brauner
2025-06-16  0:24                     ` Ref-less parent walk from Landlock (was: Re: [PATCH v3 bpf-next 1/5] namei: Introduce new helper function path_walk_parent()) Tingmao Wang
2025-06-17  6:20                       ` Song Liu
2025-06-06 21:30 ` [PATCH v3 bpf-next 2/5] landlock: Use path_walk_parent() Song Liu
2025-06-08 18:45   ` Tingmao Wang
2025-06-06 21:30 ` [PATCH v3 bpf-next 3/5] bpf: Introduce path iterator Song Liu
2025-06-06 21:30 ` [PATCH v3 bpf-next 4/5] selftests/bpf: Add tests for bpf " Song Liu
2025-06-06 21:30 ` [PATCH v3 bpf-next 5/5] selftests/bpf: Path walk test Song Liu
2025-06-08 17:32 ` [PATCH v3 bpf-next 0/5] bpf path iterator Tingmao Wang
2025-06-09  6:23   ` Song Liu
2025-06-09  8:08     ` Tingmao Wang
2025-06-11 11:36       ` Christian Brauner
2025-06-11 15:39         ` Mickaël Salaün
2025-06-08 17:32 ` Tingmao Wang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).