* [PATCH bpf-next 0/4] Introduce bpf_kernfs_read_xattr
@ 2025-06-18 23:37 Song Liu
2025-06-18 23:37 ` [PATCH bpf-next 1/4] kernfs: Add __kernfs_xattr_get for RCU protected access Song Liu
` (4 more replies)
0 siblings, 5 replies; 14+ messages in thread
From: Song Liu @ 2025-06-18 23:37 UTC (permalink / raw)
To: bpf, linux-fsdevel, linux-kernel, linux-security-module
Cc: kernel-team, andrii, eddyz87, ast, daniel, martin.lau, viro,
brauner, jack, kpsingh, mattbobrowski, amir73il, gregkh, tj,
daan.j.demeyer, Song Liu
Introduce a new kfunc bpf_kernfs_read_xattr, which can read xattr from
kernfs nodes (cgroupfs, for example). The primary users are LSMs, for
example, from systemd. sched_ext could also use xattrs on cgroupfs nodes.
However, this is not allowed yet, because bpf_kernfs_read_xattr is only
allowed from LSM hooks. The plan is to address sched_ext later (or in a
later revision of this set).
Song Liu (4):
kernfs: Add __kernfs_xattr_get for RCU protected access
bpf: Introduce bpf_kernfs_read_xattr to read xattr of kernfs nodes
bpf: Mark cgroup_subsys_state->cgroup RCU safe
selftests/bpf: Add tests for bpf_kernfs_read_xattr
fs/bpf_fs_kfuncs.c | 33 ++++
fs/kernfs/inode.c | 14 ++
include/linux/kernfs.h | 2 +
kernel/bpf/verifier.c | 5 +
.../selftests/bpf/prog_tests/kernfs_xattr.c | 145 ++++++++++++++++++
.../selftests/bpf/progs/kernfs_read_xattr.c | 117 ++++++++++++++
.../selftests/bpf/progs/read_cgroupfs_xattr.c | 60 ++++++++
7 files changed, 376 insertions(+)
create mode 100644 tools/testing/selftests/bpf/prog_tests/kernfs_xattr.c
create mode 100644 tools/testing/selftests/bpf/progs/kernfs_read_xattr.c
create mode 100644 tools/testing/selftests/bpf/progs/read_cgroupfs_xattr.c
--
2.47.1
^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH bpf-next 1/4] kernfs: Add __kernfs_xattr_get for RCU protected access
2025-06-18 23:37 [PATCH bpf-next 0/4] Introduce bpf_kernfs_read_xattr Song Liu
@ 2025-06-18 23:37 ` Song Liu
2025-06-19 10:01 ` Christian Brauner
2025-06-19 13:57 ` kernel test robot
2025-06-18 23:37 ` [PATCH bpf-next 2/4] bpf: Introduce bpf_kernfs_read_xattr to read xattr of kernfs nodes Song Liu
` (3 subsequent siblings)
4 siblings, 2 replies; 14+ messages in thread
From: Song Liu @ 2025-06-18 23:37 UTC (permalink / raw)
To: bpf, linux-fsdevel, linux-kernel, linux-security-module
Cc: kernel-team, andrii, eddyz87, ast, daniel, martin.lau, viro,
brauner, jack, kpsingh, mattbobrowski, amir73il, gregkh, tj,
daan.j.demeyer, Song Liu
Existing kernfs_xattr_get() locks iattr_mutex, so it cannot be used in
RCU critical sections. Introduce __kernfs_xattr_get(), which reads xattr
under RCU read lock. This can be used by BPF programs to access cgroupfs
xattrs.
Signed-off-by: Song Liu <song@kernel.org>
---
fs/kernfs/inode.c | 14 ++++++++++++++
include/linux/kernfs.h | 2 ++
2 files changed, 16 insertions(+)
diff --git a/fs/kernfs/inode.c b/fs/kernfs/inode.c
index b83054da68b3..0ca231d2012c 100644
--- a/fs/kernfs/inode.c
+++ b/fs/kernfs/inode.c
@@ -302,6 +302,20 @@ int kernfs_xattr_get(struct kernfs_node *kn, const char *name,
return simple_xattr_get(&attrs->xattrs, name, value, size);
}
+int __kernfs_xattr_get(struct kernfs_node *kn, const char *name,
+ void *value, size_t size)
+{
+ struct kernfs_iattrs *attrs;
+
+ WARN_ON_ONCE(!rcu_read_lock_held());
+
+ attrs = rcu_dereference(kn->iattr);
+ if (!attrs)
+ return -ENODATA;
+
+ return simple_xattr_get(&attrs->xattrs, name, value, size);
+}
+
int kernfs_xattr_set(struct kernfs_node *kn, const char *name,
const void *value, size_t size, int flags)
{
diff --git a/include/linux/kernfs.h b/include/linux/kernfs.h
index b5a5f32fdfd1..8536ffc5c9f1 100644
--- a/include/linux/kernfs.h
+++ b/include/linux/kernfs.h
@@ -456,6 +456,8 @@ void kernfs_notify(struct kernfs_node *kn);
int kernfs_xattr_get(struct kernfs_node *kn, const char *name,
void *value, size_t size);
+int __kernfs_xattr_get(struct kernfs_node *kn, const char *name,
+ void *value, size_t size);
int kernfs_xattr_set(struct kernfs_node *kn, const char *name,
const void *value, size_t size, int flags);
--
2.47.1
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH bpf-next 2/4] bpf: Introduce bpf_kernfs_read_xattr to read xattr of kernfs nodes
2025-06-18 23:37 [PATCH bpf-next 0/4] Introduce bpf_kernfs_read_xattr Song Liu
2025-06-18 23:37 ` [PATCH bpf-next 1/4] kernfs: Add __kernfs_xattr_get for RCU protected access Song Liu
@ 2025-06-18 23:37 ` Song Liu
2025-06-19 8:49 ` Christian Brauner
2025-06-18 23:37 ` [PATCH bpf-next 3/4] bpf: Mark cgroup_subsys_state->cgroup RCU safe Song Liu
` (2 subsequent siblings)
4 siblings, 1 reply; 14+ messages in thread
From: Song Liu @ 2025-06-18 23:37 UTC (permalink / raw)
To: bpf, linux-fsdevel, linux-kernel, linux-security-module
Cc: kernel-team, andrii, eddyz87, ast, daniel, martin.lau, viro,
brauner, jack, kpsingh, mattbobrowski, amir73il, gregkh, tj,
daan.j.demeyer, Song Liu
BPF programs, such as LSM and sched_ext, would benefit from tags on
cgroups. One common practice to apply such tags is to set xattrs on
cgroupfs files and folders.
Introduce kfunc bpf_kernfs_read_xattr, which allows reading kernfs
xattr under RCU read lock.
Note that, we already have bpf_get_[file|dentry]_xattr. However, these
two APIs are not ideal for reading cgroupfs xattrs, because:
1) These two APIs only works in sleepable contexts;
2) There is no kfunc that matches current cgroup to cgroupfs dentry.
Signed-off-by: Song Liu <song@kernel.org>
---
fs/bpf_fs_kfuncs.c | 33 +++++++++++++++++++++++++++++++++
1 file changed, 33 insertions(+)
diff --git a/fs/bpf_fs_kfuncs.c b/fs/bpf_fs_kfuncs.c
index 08412532db1b..7576dbc9b340 100644
--- a/fs/bpf_fs_kfuncs.c
+++ b/fs/bpf_fs_kfuncs.c
@@ -9,6 +9,7 @@
#include <linux/fs.h>
#include <linux/fsnotify.h>
#include <linux/file.h>
+#include <linux/kernfs.h>
#include <linux/mm.h>
#include <linux/xattr.h>
@@ -322,6 +323,37 @@ __bpf_kfunc int bpf_remove_dentry_xattr(struct dentry *dentry, const char *name_
return ret;
}
+/**
+ * bpf_kernfs_read_xattr - get xattr of a kernfs node
+ * @kn: kernfs_node to get xattr from
+ * @name__str: name of the xattr
+ * @value_p: output buffer of the xattr value
+ *
+ * Get xattr *name__str* of *kn* and store the output in *value_ptr*.
+ *
+ * For security reasons, only *name__str* with prefix "user." is allowed.
+ *
+ * Return: length of the xattr value on success, a negative value on error.
+ */
+__bpf_kfunc int bpf_kernfs_read_xattr(struct kernfs_node *kn, const char *name__str,
+ struct bpf_dynptr *value_p)
+{
+ struct bpf_dynptr_kern *value_ptr = (struct bpf_dynptr_kern *)value_p;
+ u32 value_len;
+ void *value;
+
+ /* Only allow reading "user.*" xattrs */
+ if (strncmp(name__str, XATTR_USER_PREFIX, XATTR_USER_PREFIX_LEN))
+ return -EPERM;
+
+ value_len = __bpf_dynptr_size(value_ptr);
+ value = __bpf_dynptr_data_rw(value_ptr, value_len);
+ if (!value)
+ return -EINVAL;
+
+ return __kernfs_xattr_get(kn, name__str, value, value_len);
+}
+
__bpf_kfunc_end_defs();
BTF_KFUNCS_START(bpf_fs_kfunc_set_ids)
@@ -333,6 +365,7 @@ BTF_ID_FLAGS(func, bpf_get_dentry_xattr, KF_SLEEPABLE | KF_TRUSTED_ARGS)
BTF_ID_FLAGS(func, bpf_get_file_xattr, KF_SLEEPABLE | KF_TRUSTED_ARGS)
BTF_ID_FLAGS(func, bpf_set_dentry_xattr, KF_SLEEPABLE | KF_TRUSTED_ARGS)
BTF_ID_FLAGS(func, bpf_remove_dentry_xattr, KF_SLEEPABLE | KF_TRUSTED_ARGS)
+BTF_ID_FLAGS(func, bpf_kernfs_read_xattr, KF_RCU | KF_RCU_PROTECTED)
BTF_KFUNCS_END(bpf_fs_kfunc_set_ids)
static int bpf_fs_kfuncs_filter(const struct bpf_prog *prog, u32 kfunc_id)
--
2.47.1
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH bpf-next 3/4] bpf: Mark cgroup_subsys_state->cgroup RCU safe
2025-06-18 23:37 [PATCH bpf-next 0/4] Introduce bpf_kernfs_read_xattr Song Liu
2025-06-18 23:37 ` [PATCH bpf-next 1/4] kernfs: Add __kernfs_xattr_get for RCU protected access Song Liu
2025-06-18 23:37 ` [PATCH bpf-next 2/4] bpf: Introduce bpf_kernfs_read_xattr to read xattr of kernfs nodes Song Liu
@ 2025-06-18 23:37 ` Song Liu
2025-06-18 23:37 ` [PATCH bpf-next 4/4] selftests/bpf: Add tests for bpf_kernfs_read_xattr Song Liu
2025-06-19 0:43 ` [PATCH bpf-next 0/4] Introduce bpf_kernfs_read_xattr Tejun Heo
4 siblings, 0 replies; 14+ messages in thread
From: Song Liu @ 2025-06-18 23:37 UTC (permalink / raw)
To: bpf, linux-fsdevel, linux-kernel, linux-security-module
Cc: kernel-team, andrii, eddyz87, ast, daniel, martin.lau, viro,
brauner, jack, kpsingh, mattbobrowski, amir73il, gregkh, tj,
daan.j.demeyer, Song Liu
Mark struct cgroup_subsys_state->cgroup as safe under RCU read lock. This
will enable accessing css->cgroup from a bpf css iterator.
Signed-off-by: Song Liu <song@kernel.org>
---
kernel/bpf/verifier.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 279a64933262..e2f53dc8766a 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -7058,6 +7058,10 @@ BTF_TYPE_SAFE_RCU(struct css_set) {
struct cgroup *dfl_cgrp;
};
+BTF_TYPE_SAFE_RCU(struct cgroup_subsys_state) {
+ struct cgroup *cgroup;
+};
+
/* RCU trusted: these fields are trusted in RCU CS and can be NULL */
BTF_TYPE_SAFE_RCU_OR_NULL(struct mm_struct) {
struct file __rcu *exe_file;
@@ -7108,6 +7112,7 @@ static bool type_is_rcu(struct bpf_verifier_env *env,
BTF_TYPE_EMIT(BTF_TYPE_SAFE_RCU(struct task_struct));
BTF_TYPE_EMIT(BTF_TYPE_SAFE_RCU(struct cgroup));
BTF_TYPE_EMIT(BTF_TYPE_SAFE_RCU(struct css_set));
+ BTF_TYPE_EMIT(BTF_TYPE_SAFE_RCU(struct cgroup_subsys_state));
return btf_nested_type_is_trusted(&env->log, reg, field_name, btf_id, "__safe_rcu");
}
--
2.47.1
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH bpf-next 4/4] selftests/bpf: Add tests for bpf_kernfs_read_xattr
2025-06-18 23:37 [PATCH bpf-next 0/4] Introduce bpf_kernfs_read_xattr Song Liu
` (2 preceding siblings ...)
2025-06-18 23:37 ` [PATCH bpf-next 3/4] bpf: Mark cgroup_subsys_state->cgroup RCU safe Song Liu
@ 2025-06-18 23:37 ` Song Liu
2025-06-19 0:43 ` [PATCH bpf-next 0/4] Introduce bpf_kernfs_read_xattr Tejun Heo
4 siblings, 0 replies; 14+ messages in thread
From: Song Liu @ 2025-06-18 23:37 UTC (permalink / raw)
To: bpf, linux-fsdevel, linux-kernel, linux-security-module
Cc: kernel-team, andrii, eddyz87, ast, daniel, martin.lau, viro,
brauner, jack, kpsingh, mattbobrowski, amir73il, gregkh, tj,
daan.j.demeyer, Song Liu
Add tests for different scenarios with bpf_kernfs_read_xattr:
1. Read cgroup xattr from bpf_cgroup_from_id;
2. Read cgroup xattr from bpf_cgroup_ancestor;
3. Read cgroup xattr from css_iter;
4. Verifier reject using bpf_kernfs_read_xattr in sleepable contexts.
5. Use bpf_kernfs_read_xattr in LSM hook security_socket_connect.
Signed-off-by: Song Liu <song@kernel.org>
---
.../selftests/bpf/prog_tests/kernfs_xattr.c | 145 ++++++++++++++++++
.../selftests/bpf/progs/kernfs_read_xattr.c | 117 ++++++++++++++
.../selftests/bpf/progs/read_cgroupfs_xattr.c | 60 ++++++++
3 files changed, 322 insertions(+)
create mode 100644 tools/testing/selftests/bpf/prog_tests/kernfs_xattr.c
create mode 100644 tools/testing/selftests/bpf/progs/kernfs_read_xattr.c
create mode 100644 tools/testing/selftests/bpf/progs/read_cgroupfs_xattr.c
diff --git a/tools/testing/selftests/bpf/prog_tests/kernfs_xattr.c b/tools/testing/selftests/bpf/prog_tests/kernfs_xattr.c
new file mode 100644
index 000000000000..b60ccfdc4c4d
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/kernfs_xattr.c
@@ -0,0 +1,145 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright (c) 2025 Meta Platforms, Inc. and affiliates. */
+
+#include <errno.h>
+#include <fcntl.h>
+#include <sys/stat.h>
+#include <string.h>
+#include <unistd.h>
+#include <sys/socket.h>
+#include <sys/xattr.h>
+
+#include <test_progs.h>
+
+#include "read_cgroupfs_xattr.skel.h"
+#include "kernfs_read_xattr.skel.h"
+
+#define CGROUP_FS_ROOT "/sys/fs/cgroup/"
+#define CGROUP_FS_PARENT CGROUP_FS_ROOT "foo/"
+#define CGROUP_FS_CHILD CGROUP_FS_PARENT "bar/"
+
+static int move_pid_to_cgroup(const char *cgroup_folder, pid_t pid)
+{
+ char filename[128];
+ char pid_str[64];
+ int procs_fd;
+ int ret;
+
+ snprintf(filename, sizeof(filename), "%scgroup.procs", cgroup_folder);
+ snprintf(pid_str, sizeof(pid_str), "%d", pid);
+
+ procs_fd = open(filename, O_WRONLY | O_APPEND);
+ if (!ASSERT_OK_FD(procs_fd, "open"))
+ return -1;
+
+ ret = write(procs_fd, pid_str, strlen(pid_str));
+ close(procs_fd);
+ if (!ASSERT_GT(ret, 0, "write cgroup.procs"))
+ return -1;
+ return 0;
+}
+
+static void reset_cgroups_and_lo(void)
+{
+ rmdir(CGROUP_FS_CHILD);
+ rmdir(CGROUP_FS_PARENT);
+ system("ip addr del 1.1.1.1/32 dev lo");
+ system("ip link set dev lo down");
+}
+
+static const char xattr_value_a[] = "bpf_selftest_value_a";
+static const char xattr_value_b[] = "bpf_selftest_value_b";
+static const char xattr_name[] = "user.bpf_test";
+
+static int setup_cgroups_and_lo(void)
+{
+ int err;
+
+ err = mkdir(CGROUP_FS_PARENT, 0755);
+ if (!ASSERT_OK(err, "mkdir 1"))
+ goto error;
+ err = mkdir(CGROUP_FS_CHILD, 0755);
+ if (!ASSERT_OK(err, "mkdir 2"))
+ goto error;
+
+ err = setxattr(CGROUP_FS_PARENT, xattr_name, xattr_value_a,
+ strlen(xattr_value_a) + 1, 0);
+ if (!ASSERT_OK(err, "setxattr 1"))
+ goto error;
+
+ err = setxattr(CGROUP_FS_CHILD, xattr_name, xattr_value_b,
+ strlen(xattr_value_b) + 1, 0);
+ if (!ASSERT_OK(err, "setxattr 2"))
+ goto error;
+
+ err = system("ip link set dev lo up");
+ if (!ASSERT_OK(err, "lo up"))
+ goto error;
+
+ err = system("ip addr add 1.1.1.1 dev lo");
+ if (!ASSERT_OK(err, "lo addr v4"))
+ goto error;
+
+ err = write_sysctl("/proc/sys/net/ipv4/ping_group_range", "0 0");
+ if (!ASSERT_OK(err, "write_sysctl"))
+ goto error;
+
+ return 0;
+error:
+ reset_cgroups_and_lo();
+ return err;
+}
+
+static void test_read_cgroup_xattr(void)
+{
+ struct sockaddr_in sa4 = {
+ .sin_family = AF_INET,
+ .sin_addr.s_addr = htonl(INADDR_LOOPBACK),
+ };
+ struct read_cgroupfs_xattr *skel = NULL;
+ pid_t pid = gettid();
+ int sock_fd = -1;
+ int connect_fd = -1;
+
+ if (!ASSERT_OK(setup_cgroups_and_lo(), "setup_cgroups_and_lo"))
+ return;
+ if (!ASSERT_OK(move_pid_to_cgroup(CGROUP_FS_CHILD, pid),
+ "move_pid_to_cgroup"))
+ goto out;
+
+ skel = read_cgroupfs_xattr__open_and_load();
+ if (!ASSERT_OK_PTR(skel, "read_cgroupfs_xattr__open_and_load"))
+ goto out;
+
+ skel->bss->target_pid = pid;
+
+ if (!ASSERT_OK(read_cgroupfs_xattr__attach(skel), "read_cgroupfs_xattr__attach"))
+ goto out;
+
+ sock_fd = socket(AF_INET, SOCK_DGRAM, IPPROTO_ICMP);
+ if (!ASSERT_OK_FD(sock_fd, "sock create"))
+ goto out;
+
+ connect_fd = connect(sock_fd, &sa4, sizeof(sa4));
+ if (!ASSERT_OK_FD(connect_fd, "connect 1"))
+ goto out;
+ close(connect_fd);
+
+ ASSERT_TRUE(skel->bss->found_value_a, "found_value_a");
+ ASSERT_TRUE(skel->bss->found_value_b, "found_value_b");
+
+out:
+ close(connect_fd);
+ close(sock_fd);
+ read_cgroupfs_xattr__destroy(skel);
+ move_pid_to_cgroup(CGROUP_FS_ROOT, pid);
+ reset_cgroups_and_lo();
+}
+
+void test_kernfs_xattr(void)
+{
+ RUN_TESTS(kernfs_read_xattr);
+
+ if (test__start_subtest("read_cgroupfs_xattr"))
+ test_read_cgroup_xattr();
+}
diff --git a/tools/testing/selftests/bpf/progs/kernfs_read_xattr.c b/tools/testing/selftests/bpf/progs/kernfs_read_xattr.c
new file mode 100644
index 000000000000..851cdcec05a6
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/kernfs_read_xattr.c
@@ -0,0 +1,117 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2025 Meta Platforms, Inc. and affiliates. */
+
+#include <vmlinux.h>
+#include <bpf/bpf_tracing.h>
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_core_read.h>
+#include "bpf_experimental.h"
+#include "bpf_misc.h"
+
+char _license[] SEC("license") = "GPL";
+
+char value[16];
+
+__always_inline void read_xattr(struct cgroup *cgroup)
+{
+ struct bpf_dynptr value_ptr;
+
+ bpf_dynptr_from_mem(value, sizeof(value), 0, &value_ptr);
+ bpf_kernfs_read_xattr(cgroup->kn, "user.bpf_test",
+ &value_ptr);
+}
+
+SEC("lsm.s/socket_connect")
+__failure __msg("R1 must be a rcu pointer")
+int BPF_PROG(sleepable_missing_rcu_lock, struct file *f)
+{
+ u64 cgrp_id = bpf_get_current_cgroup_id();
+ struct cgroup *cgrp;
+
+ cgrp = bpf_cgroup_from_id(cgrp_id);
+ if (!cgrp)
+ return 0;
+
+ /* Sleepable, so cgrp->kn is not trusted */
+ read_xattr(cgrp);
+ bpf_cgroup_release(cgrp);
+ return 0;
+}
+
+SEC("lsm.s/socket_connect")
+__success
+int BPF_PROG(sleepable_with_rcu_lock, struct file *f)
+{
+ u64 cgrp_id = bpf_get_current_cgroup_id();
+ struct cgroup *cgrp;
+
+ bpf_rcu_read_lock();
+ cgrp = bpf_cgroup_from_id(cgrp_id);
+ if (!cgrp)
+ goto out;
+
+ /* In rcu read lock, so cgrp->kn is trusted */
+ read_xattr(cgrp);
+ bpf_cgroup_release(cgrp);
+out:
+ bpf_rcu_read_unlock();
+ return 0;
+}
+
+SEC("lsm/socket_connect")
+__success
+int BPF_PROG(non_sleepable, struct file *f)
+{
+ u64 cgrp_id = bpf_get_current_cgroup_id();
+ struct cgroup *cgrp;
+
+ cgrp = bpf_cgroup_from_id(cgrp_id);
+ if (!cgrp)
+ return 0;
+
+ /* non-sleepable, so cgrp->kn is trusted */
+ read_xattr(cgrp);
+ bpf_cgroup_release(cgrp);
+ return 0;
+}
+
+SEC("lsm/socket_connect")
+__success
+int BPF_PROG(use_css_iter, struct file *f)
+{
+ u64 cgrp_id = bpf_get_current_cgroup_id();
+ struct cgroup_subsys_state *css;
+ struct cgroup *cgrp;
+
+ cgrp = bpf_cgroup_from_id(cgrp_id);
+ if (!cgrp)
+ return 0;
+
+ bpf_for_each(css, css, &cgrp->self, BPF_CGROUP_ITER_ANCESTORS_UP)
+ read_xattr(css->cgroup);
+
+ bpf_cgroup_release(cgrp);
+ return 0;
+}
+
+SEC("lsm/socket_connect")
+__success
+int BPF_PROG(use_bpf_cgroup_ancestor, struct file *f)
+{
+ u64 cgrp_id = bpf_get_current_cgroup_id();
+ struct cgroup *cgrp, *ancestor;
+
+ cgrp = bpf_cgroup_from_id(cgrp_id);
+ if (!cgrp)
+ return 0;
+
+ ancestor = bpf_cgroup_ancestor(cgrp, 1);
+ if (!ancestor)
+ goto out;
+ /* non-sleepable, so cgrp->kn is trusted */
+ read_xattr(cgrp);
+ bpf_cgroup_release(ancestor);
+out:
+ bpf_cgroup_release(cgrp);
+ return 0;
+}
diff --git a/tools/testing/selftests/bpf/progs/read_cgroupfs_xattr.c b/tools/testing/selftests/bpf/progs/read_cgroupfs_xattr.c
new file mode 100644
index 000000000000..5109e800a443
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/read_cgroupfs_xattr.c
@@ -0,0 +1,60 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2025 Meta Platforms, Inc. and affiliates. */
+
+#include <vmlinux.h>
+#include <bpf/bpf_tracing.h>
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_core_read.h>
+#include "bpf_experimental.h"
+
+char _license[] SEC("license") = "GPL";
+
+pid_t target_pid = 0;
+
+char xattr_value[64];
+const char expected_value_a[] = "bpf_selftest_value_a";
+const char expected_value_b[] = "bpf_selftest_value_b";
+bool found_value_a;
+bool found_value_b;
+
+SEC("lsm.s/socket_connect")
+int BPF_PROG(test_socket_connect, struct file *f)
+{
+ u64 cgrp_id = bpf_get_current_cgroup_id();
+ struct cgroup_subsys_state *css, *tmp;
+ struct bpf_dynptr value_ptr;
+ struct cgroup *cgrp;
+
+ if ((bpf_get_current_pid_tgid() >> 32) != target_pid)
+ return 0;
+
+ bpf_rcu_read_lock();
+ cgrp = bpf_cgroup_from_id(cgrp_id);
+ if (!cgrp) {
+ bpf_rcu_read_unlock();
+ return 0;
+ }
+
+ css = &cgrp->self;
+ bpf_dynptr_from_mem(xattr_value, sizeof(xattr_value), 0, &value_ptr);
+ bpf_for_each(css, tmp, css, BPF_CGROUP_ITER_ANCESTORS_UP) {
+ int ret;
+
+ ret = bpf_kernfs_read_xattr(tmp->cgroup->kn, "user.bpf_test",
+ &value_ptr);
+ if (ret < 0)
+ continue;
+
+ if (ret == sizeof(expected_value_a) &&
+ !bpf_strncmp(xattr_value, sizeof(expected_value_a), expected_value_a))
+ found_value_a = true;
+ if (ret == sizeof(expected_value_b) &&
+ !bpf_strncmp(xattr_value, sizeof(expected_value_b), expected_value_b))
+ found_value_b = true;
+ }
+
+ bpf_rcu_read_unlock();
+ bpf_cgroup_release(cgrp);
+
+ return 0;
+}
--
2.47.1
^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH bpf-next 0/4] Introduce bpf_kernfs_read_xattr
2025-06-18 23:37 [PATCH bpf-next 0/4] Introduce bpf_kernfs_read_xattr Song Liu
` (3 preceding siblings ...)
2025-06-18 23:37 ` [PATCH bpf-next 4/4] selftests/bpf: Add tests for bpf_kernfs_read_xattr Song Liu
@ 2025-06-19 0:43 ` Tejun Heo
2025-06-19 8:48 ` Christian Brauner
4 siblings, 1 reply; 14+ messages in thread
From: Tejun Heo @ 2025-06-19 0:43 UTC (permalink / raw)
To: Song Liu
Cc: bpf, linux-fsdevel, linux-kernel, linux-security-module,
kernel-team, andrii, eddyz87, ast, daniel, martin.lau, viro,
brauner, jack, kpsingh, mattbobrowski, amir73il, gregkh,
daan.j.demeyer
Hello,
On Wed, Jun 18, 2025 at 04:37:35PM -0700, Song Liu wrote:
> Introduce a new kfunc bpf_kernfs_read_xattr, which can read xattr from
> kernfs nodes (cgroupfs, for example). The primary users are LSMs, for
> example, from systemd. sched_ext could also use xattrs on cgroupfs nodes.
> However, this is not allowed yet, because bpf_kernfs_read_xattr is only
> allowed from LSM hooks. The plan is to address sched_ext later (or in a
> later revision of this set).
I don't think kernfs is the name we should be exposing to BPF users. This is
an implementation detail which may change in the future. I'd rather make it
a generic interface or a cgroup specific one. The name "kernfs" doesn't
really mean much outside kernel code that's using them.
Thanks.
--
tejun
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH bpf-next 0/4] Introduce bpf_kernfs_read_xattr
2025-06-19 0:43 ` [PATCH bpf-next 0/4] Introduce bpf_kernfs_read_xattr Tejun Heo
@ 2025-06-19 8:48 ` Christian Brauner
2025-06-19 15:31 ` Song Liu
0 siblings, 1 reply; 14+ messages in thread
From: Christian Brauner @ 2025-06-19 8:48 UTC (permalink / raw)
To: Tejun Heo
Cc: Song Liu, bpf, linux-fsdevel, linux-kernel, linux-security-module,
kernel-team, andrii, eddyz87, ast, daniel, martin.lau, viro, jack,
kpsingh, mattbobrowski, amir73il, gregkh, daan.j.demeyer
On Wed, Jun 18, 2025 at 02:43:34PM -1000, Tejun Heo wrote:
> Hello,
>
> On Wed, Jun 18, 2025 at 04:37:35PM -0700, Song Liu wrote:
> > Introduce a new kfunc bpf_kernfs_read_xattr, which can read xattr from
> > kernfs nodes (cgroupfs, for example). The primary users are LSMs, for
> > example, from systemd. sched_ext could also use xattrs on cgroupfs nodes.
> > However, this is not allowed yet, because bpf_kernfs_read_xattr is only
> > allowed from LSM hooks. The plan is to address sched_ext later (or in a
> > later revision of this set).
>
> I don't think kernfs is the name we should be exposing to BPF users. This is
> an implementation detail which may change in the future. I'd rather make it
> a generic interface or a cgroup specific one. The name "kernfs" doesn't
cgroup specific, please. That's what I suggested to Daan.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH bpf-next 2/4] bpf: Introduce bpf_kernfs_read_xattr to read xattr of kernfs nodes
2025-06-18 23:37 ` [PATCH bpf-next 2/4] bpf: Introduce bpf_kernfs_read_xattr to read xattr of kernfs nodes Song Liu
@ 2025-06-19 8:49 ` Christian Brauner
0 siblings, 0 replies; 14+ messages in thread
From: Christian Brauner @ 2025-06-19 8:49 UTC (permalink / raw)
To: Song Liu
Cc: bpf, linux-fsdevel, linux-kernel, linux-security-module,
kernel-team, andrii, eddyz87, ast, daniel, martin.lau, viro, jack,
kpsingh, mattbobrowski, amir73il, gregkh, tj, daan.j.demeyer
On Wed, Jun 18, 2025 at 04:37:37PM -0700, Song Liu wrote:
> BPF programs, such as LSM and sched_ext, would benefit from tags on
> cgroups. One common practice to apply such tags is to set xattrs on
> cgroupfs files and folders.
>
> Introduce kfunc bpf_kernfs_read_xattr, which allows reading kernfs
> xattr under RCU read lock.
>
> Note that, we already have bpf_get_[file|dentry]_xattr. However, these
> two APIs are not ideal for reading cgroupfs xattrs, because:
>
> 1) These two APIs only works in sleepable contexts;
> 2) There is no kfunc that matches current cgroup to cgroupfs dentry.
>
> Signed-off-by: Song Liu <song@kernel.org>
> ---
> fs/bpf_fs_kfuncs.c | 33 +++++++++++++++++++++++++++++++++
> 1 file changed, 33 insertions(+)
>
> diff --git a/fs/bpf_fs_kfuncs.c b/fs/bpf_fs_kfuncs.c
> index 08412532db1b..7576dbc9b340 100644
> --- a/fs/bpf_fs_kfuncs.c
> +++ b/fs/bpf_fs_kfuncs.c
> @@ -9,6 +9,7 @@
> #include <linux/fs.h>
> #include <linux/fsnotify.h>
> #include <linux/file.h>
> +#include <linux/kernfs.h>
> #include <linux/mm.h>
> #include <linux/xattr.h>
>
> @@ -322,6 +323,37 @@ __bpf_kfunc int bpf_remove_dentry_xattr(struct dentry *dentry, const char *name_
> return ret;
> }
>
> +/**
> + * bpf_kernfs_read_xattr - get xattr of a kernfs node
> + * @kn: kernfs_node to get xattr from
> + * @name__str: name of the xattr
> + * @value_p: output buffer of the xattr value
> + *
> + * Get xattr *name__str* of *kn* and store the output in *value_ptr*.
> + *
> + * For security reasons, only *name__str* with prefix "user." is allowed.
> + *
> + * Return: length of the xattr value on success, a negative value on error.
> + */
> +__bpf_kfunc int bpf_kernfs_read_xattr(struct kernfs_node *kn, const char *name__str,
> + struct bpf_dynptr *value_p)
Please pass struct cgroup, then go from struct cgroup to kernfs node.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH bpf-next 1/4] kernfs: Add __kernfs_xattr_get for RCU protected access
2025-06-18 23:37 ` [PATCH bpf-next 1/4] kernfs: Add __kernfs_xattr_get for RCU protected access Song Liu
@ 2025-06-19 10:01 ` Christian Brauner
2025-06-19 10:33 ` Greg Kroah-Hartman
` (2 more replies)
2025-06-19 13:57 ` kernel test robot
1 sibling, 3 replies; 14+ messages in thread
From: Christian Brauner @ 2025-06-19 10:01 UTC (permalink / raw)
To: Song Liu, Greg Kroah-Hartman, Tejun Heo
Cc: bpf, linux-fsdevel, linux-kernel, linux-security-module,
kernel-team, andrii, eddyz87, ast, daniel, martin.lau, viro, jack,
kpsingh, mattbobrowski, amir73il, gregkh, tj, daan.j.demeyer
[-- Attachment #1: Type: text/plain, Size: 1684 bytes --]
On Wed, Jun 18, 2025 at 04:37:36PM -0700, Song Liu wrote:
> Existing kernfs_xattr_get() locks iattr_mutex, so it cannot be used in
> RCU critical sections. Introduce __kernfs_xattr_get(), which reads xattr
> under RCU read lock. This can be used by BPF programs to access cgroupfs
> xattrs.
>
> Signed-off-by: Song Liu <song@kernel.org>
> ---
> fs/kernfs/inode.c | 14 ++++++++++++++
> include/linux/kernfs.h | 2 ++
> 2 files changed, 16 insertions(+)
>
> diff --git a/fs/kernfs/inode.c b/fs/kernfs/inode.c
> index b83054da68b3..0ca231d2012c 100644
> --- a/fs/kernfs/inode.c
> +++ b/fs/kernfs/inode.c
> @@ -302,6 +302,20 @@ int kernfs_xattr_get(struct kernfs_node *kn, const char *name,
> return simple_xattr_get(&attrs->xattrs, name, value, size);
> }
>
> +int __kernfs_xattr_get(struct kernfs_node *kn, const char *name,
> + void *value, size_t size)
> +{
> + struct kernfs_iattrs *attrs;
> +
> + WARN_ON_ONCE(!rcu_read_lock_held());
> +
> + attrs = rcu_dereference(kn->iattr);
> + if (!attrs)
> + return -ENODATA;
Hm, that looks a bit silly. Which isn't your fault. I'm looking at the
kernfs code that does the xattr allocations and I think that's the
origin of the silliness. It uses a single global mutex for all kernfs
users thus serializing all allocations for kernfs->iattr. That seems
crazy but maybe I'm missing a good reason.
I'm appending a patch to remove that mutex. @Greg, @Tejun, can you take
a look whether that makes sense to you. Then I can take that patch and
you can build yours on top of the series and I'll pick it all up in one
go.
You should then just use READ_ONCE(kn->iattr) or the
kernfs_iattrs_noalloc(kn) helper in your kfunc.
[-- Attachment #2: 0001-kernfs-remove-iattr_mutex.patch --]
[-- Type: text/x-diff, Size: 5027 bytes --]
From bdc53435a1cd5c456dc28d8239eff0e7fa4e8dda Mon Sep 17 00:00:00 2001
From: Christian Brauner <brauner@kernel.org>
Date: Thu, 19 Jun 2025 11:50:26 +0200
Subject: [PATCH] kernfs: remove iattr_mutex
All allocations of struct kernfs_iattrs are serialized through a global
mutex. Simply do a racy allocation and let the first one win. I bet most
callers are under inode->i_rwsem anyway and it wouldn't be needed but
let's not require that.
Signed-off-by: Christian Brauner <brauner@kernel.org>
---
Note, that this uses kfree() for the kmem cache allocation.
That's been possible for a while now but not everyone knows about it
yet so I'm pointing it out explicitly.
---
fs/kernfs/inode.c | 74 +++++++++++++++++++++++++----------------------
1 file changed, 40 insertions(+), 34 deletions(-)
diff --git a/fs/kernfs/inode.c b/fs/kernfs/inode.c
index b83054da68b3..f4b73b9482b7 100644
--- a/fs/kernfs/inode.c
+++ b/fs/kernfs/inode.c
@@ -24,45 +24,46 @@ static const struct inode_operations kernfs_iops = {
.listxattr = kernfs_iop_listxattr,
};
-static struct kernfs_iattrs *__kernfs_iattrs(struct kernfs_node *kn, int alloc)
+static struct kernfs_iattrs *__kernfs_iattrs(struct kernfs_node *kn, bool alloc)
{
- static DEFINE_MUTEX(iattr_mutex);
- struct kernfs_iattrs *ret;
+ struct kernfs_iattrs *ret __free(kfree) = NULL;
+ struct kernfs_iattrs *attr;
- mutex_lock(&iattr_mutex);
+ attr = READ_ONCE(kn->iattr);
+ if (attr || !alloc)
+ return attr;
- if (kn->iattr || !alloc)
- goto out_unlock;
-
- kn->iattr = kmem_cache_zalloc(kernfs_iattrs_cache, GFP_KERNEL);
- if (!kn->iattr)
- goto out_unlock;
+ ret = kmem_cache_zalloc(kernfs_iattrs_cache, GFP_KERNEL);
+ if (!ret)
+ return NULL;
/* assign default attributes */
- kn->iattr->ia_uid = GLOBAL_ROOT_UID;
- kn->iattr->ia_gid = GLOBAL_ROOT_GID;
-
- ktime_get_real_ts64(&kn->iattr->ia_atime);
- kn->iattr->ia_mtime = kn->iattr->ia_atime;
- kn->iattr->ia_ctime = kn->iattr->ia_atime;
-
- simple_xattrs_init(&kn->iattr->xattrs);
- atomic_set(&kn->iattr->nr_user_xattrs, 0);
- atomic_set(&kn->iattr->user_xattr_size, 0);
-out_unlock:
- ret = kn->iattr;
- mutex_unlock(&iattr_mutex);
- return ret;
+ ret->ia_uid = GLOBAL_ROOT_UID;
+ ret->ia_gid = GLOBAL_ROOT_GID;
+
+ ktime_get_real_ts64(&ret->ia_atime);
+ ret->ia_mtime = ret->ia_atime;
+ ret->ia_ctime = ret->ia_atime;
+
+ simple_xattrs_init(&ret->xattrs);
+ atomic_set(&ret->nr_user_xattrs, 0);
+ atomic_set(&ret->user_xattr_size, 0);
+
+ /* If someone raced us, recognize it. */
+ if (!try_cmpxchg(&kn->iattr, &attr, ret))
+ return READ_ONCE(kn->iattr);
+
+ return no_free_ptr(ret);
}
static struct kernfs_iattrs *kernfs_iattrs(struct kernfs_node *kn)
{
- return __kernfs_iattrs(kn, 1);
+ return __kernfs_iattrs(kn, true);
}
static struct kernfs_iattrs *kernfs_iattrs_noalloc(struct kernfs_node *kn)
{
- return __kernfs_iattrs(kn, 0);
+ return __kernfs_iattrs(kn, false);
}
int __kernfs_setattr(struct kernfs_node *kn, const struct iattr *iattr)
@@ -141,9 +142,9 @@ ssize_t kernfs_iop_listxattr(struct dentry *dentry, char *buf, size_t size)
struct kernfs_node *kn = kernfs_dentry_node(dentry);
struct kernfs_iattrs *attrs;
- attrs = kernfs_iattrs(kn);
+ attrs = kernfs_iattrs_noalloc(kn);
if (!attrs)
- return -ENOMEM;
+ return -ENODATA;
return simple_xattr_list(d_inode(dentry), &attrs->xattrs, buf, size);
}
@@ -166,9 +167,10 @@ static inline void set_inode_attr(struct inode *inode,
static void kernfs_refresh_inode(struct kernfs_node *kn, struct inode *inode)
{
- struct kernfs_iattrs *attrs = kn->iattr;
+ struct kernfs_iattrs *attrs;
inode->i_mode = kn->mode;
+ attrs = kernfs_iattrs_noalloc(kn);
if (attrs)
/*
* kernfs_node has non-default attributes get them from
@@ -306,7 +308,9 @@ int kernfs_xattr_set(struct kernfs_node *kn, const char *name,
const void *value, size_t size, int flags)
{
struct simple_xattr *old_xattr;
- struct kernfs_iattrs *attrs = kernfs_iattrs(kn);
+ struct kernfs_iattrs *attrs;
+
+ attrs = kernfs_iattrs(kn);
if (!attrs)
return -ENOMEM;
@@ -345,8 +349,9 @@ static int kernfs_vfs_user_xattr_add(struct kernfs_node *kn,
struct simple_xattrs *xattrs,
const void *value, size_t size, int flags)
{
- atomic_t *sz = &kn->iattr->user_xattr_size;
- atomic_t *nr = &kn->iattr->nr_user_xattrs;
+ struct kernfs_iattrs *attr = kernfs_iattrs_noalloc(kn);
+ atomic_t *sz = &attr->user_xattr_size;
+ atomic_t *nr = &attr->nr_user_xattrs;
struct simple_xattr *old_xattr;
int ret;
@@ -384,8 +389,9 @@ static int kernfs_vfs_user_xattr_rm(struct kernfs_node *kn,
struct simple_xattrs *xattrs,
const void *value, size_t size, int flags)
{
- atomic_t *sz = &kn->iattr->user_xattr_size;
- atomic_t *nr = &kn->iattr->nr_user_xattrs;
+ struct kernfs_iattrs *attr = kernfs_iattrs(kn);
+ atomic_t *sz = &attr->user_xattr_size;
+ atomic_t *nr = &attr->nr_user_xattrs;
struct simple_xattr *old_xattr;
old_xattr = simple_xattr_set(xattrs, full_name, value, size, flags);
--
2.47.2
^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH bpf-next 1/4] kernfs: Add __kernfs_xattr_get for RCU protected access
2025-06-19 10:01 ` Christian Brauner
@ 2025-06-19 10:33 ` Greg Kroah-Hartman
2025-06-19 15:33 ` Song Liu
2025-06-21 2:38 ` Tejun Heo
2 siblings, 0 replies; 14+ messages in thread
From: Greg Kroah-Hartman @ 2025-06-19 10:33 UTC (permalink / raw)
To: Christian Brauner
Cc: Song Liu, Tejun Heo, bpf, linux-fsdevel, linux-kernel,
linux-security-module, kernel-team, andrii, eddyz87, ast, daniel,
martin.lau, viro, jack, kpsingh, mattbobrowski, amir73il,
daan.j.demeyer
On Thu, Jun 19, 2025 at 12:01:19PM +0200, Christian Brauner wrote:
> On Wed, Jun 18, 2025 at 04:37:36PM -0700, Song Liu wrote:
> > Existing kernfs_xattr_get() locks iattr_mutex, so it cannot be used in
> > RCU critical sections. Introduce __kernfs_xattr_get(), which reads xattr
> > under RCU read lock. This can be used by BPF programs to access cgroupfs
> > xattrs.
> >
> > Signed-off-by: Song Liu <song@kernel.org>
> > ---
> > fs/kernfs/inode.c | 14 ++++++++++++++
> > include/linux/kernfs.h | 2 ++
> > 2 files changed, 16 insertions(+)
> >
> > diff --git a/fs/kernfs/inode.c b/fs/kernfs/inode.c
> > index b83054da68b3..0ca231d2012c 100644
> > --- a/fs/kernfs/inode.c
> > +++ b/fs/kernfs/inode.c
> > @@ -302,6 +302,20 @@ int kernfs_xattr_get(struct kernfs_node *kn, const char *name,
> > return simple_xattr_get(&attrs->xattrs, name, value, size);
> > }
> >
> > +int __kernfs_xattr_get(struct kernfs_node *kn, const char *name,
> > + void *value, size_t size)
> > +{
> > + struct kernfs_iattrs *attrs;
> > +
> > + WARN_ON_ONCE(!rcu_read_lock_held());
> > +
> > + attrs = rcu_dereference(kn->iattr);
> > + if (!attrs)
> > + return -ENODATA;
>
> Hm, that looks a bit silly. Which isn't your fault. I'm looking at the
> kernfs code that does the xattr allocations and I think that's the
> origin of the silliness. It uses a single global mutex for all kernfs
> users thus serializing all allocations for kernfs->iattr. That seems
> crazy but maybe I'm missing a good reason.
>
> I'm appending a patch to remove that mutex. @Greg, @Tejun, can you take
> a look whether that makes sense to you. Then I can take that patch and
> you can build yours on top of the series and I'll pick it all up in one
> go.
Looks sane to me, thanks!
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH bpf-next 1/4] kernfs: Add __kernfs_xattr_get for RCU protected access
2025-06-18 23:37 ` [PATCH bpf-next 1/4] kernfs: Add __kernfs_xattr_get for RCU protected access Song Liu
2025-06-19 10:01 ` Christian Brauner
@ 2025-06-19 13:57 ` kernel test robot
1 sibling, 0 replies; 14+ messages in thread
From: kernel test robot @ 2025-06-19 13:57 UTC (permalink / raw)
To: Song Liu, bpf, linux-fsdevel, linux-kernel, linux-security-module
Cc: oe-kbuild-all, kernel-team, andrii, eddyz87, ast, daniel,
martin.lau, viro, brauner, jack, kpsingh, mattbobrowski, amir73il,
gregkh, tj, daan.j.demeyer, Song Liu
Hi Song,
kernel test robot noticed the following build warnings:
[auto build test WARNING on bpf-next/master]
url: https://github.com/intel-lab-lkp/linux/commits/Song-Liu/kernfs-Add-__kernfs_xattr_get-for-RCU-protected-access/20250619-074026
base: https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
patch link: https://lore.kernel.org/r/20250618233739.189106-2-song%40kernel.org
patch subject: [PATCH bpf-next 1/4] kernfs: Add __kernfs_xattr_get for RCU protected access
config: m68k-randconfig-r122-20250619 (https://download.01.org/0day-ci/archive/20250619/202506192154.T111naKp-lkp@intel.com/config)
compiler: m68k-linux-gcc (GCC) 8.5.0
reproduce: (https://download.01.org/0day-ci/archive/20250619/202506192154.T111naKp-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202506192154.T111naKp-lkp@intel.com/
sparse warnings: (new ones prefixed by >>)
>> fs/kernfs/inode.c:312:17: sparse: sparse: incompatible types in comparison expression (different address spaces):
fs/kernfs/inode.c:312:17: sparse: struct kernfs_iattrs [noderef] __rcu *
fs/kernfs/inode.c:312:17: sparse: struct kernfs_iattrs *
vim +312 fs/kernfs/inode.c
304
305 int __kernfs_xattr_get(struct kernfs_node *kn, const char *name,
306 void *value, size_t size)
307 {
308 struct kernfs_iattrs *attrs;
309
310 WARN_ON_ONCE(!rcu_read_lock_held());
311
> 312 attrs = rcu_dereference(kn->iattr);
313 if (!attrs)
314 return -ENODATA;
315
316 return simple_xattr_get(&attrs->xattrs, name, value, size);
317 }
318
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH bpf-next 0/4] Introduce bpf_kernfs_read_xattr
2025-06-19 8:48 ` Christian Brauner
@ 2025-06-19 15:31 ` Song Liu
0 siblings, 0 replies; 14+ messages in thread
From: Song Liu @ 2025-06-19 15:31 UTC (permalink / raw)
To: Christian Brauner
Cc: Tejun Heo, Song Liu, bpf@vger.kernel.org,
linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-security-module@vger.kernel.org, Kernel Team,
andrii@kernel.org, eddyz87@gmail.com, ast@kernel.org,
daniel@iogearbox.net, martin.lau@linux.dev,
viro@zeniv.linux.org.uk, jack@suse.cz, kpsingh@kernel.org,
mattbobrowski@google.com, amir73il@gmail.com,
gregkh@linuxfoundation.org, daan.j.demeyer@gmail.com
> On Jun 19, 2025, at 1:48 AM, Christian Brauner <brauner@kernel.org> wrote:
>
> On Wed, Jun 18, 2025 at 02:43:34PM -1000, Tejun Heo wrote:
>> Hello,
>>
>> On Wed, Jun 18, 2025 at 04:37:35PM -0700, Song Liu wrote:
>>> Introduce a new kfunc bpf_kernfs_read_xattr, which can read xattr from
>>> kernfs nodes (cgroupfs, for example). The primary users are LSMs, for
>>> example, from systemd. sched_ext could also use xattrs on cgroupfs nodes.
>>> However, this is not allowed yet, because bpf_kernfs_read_xattr is only
>>> allowed from LSM hooks. The plan is to address sched_ext later (or in a
>>> later revision of this set).
>>
>> I don't think kernfs is the name we should be exposing to BPF users. This is
>> an implementation detail which may change in the future. I'd rather make it
>> a generic interface or a cgroup specific one. The name "kernfs" doesn't
>
> cgroup specific, please. That's what I suggested to Daan.
I guess there was some misunderstanding. I will make this cgroup specific
in v2.
Thanks,
Song
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH bpf-next 1/4] kernfs: Add __kernfs_xattr_get for RCU protected access
2025-06-19 10:01 ` Christian Brauner
2025-06-19 10:33 ` Greg Kroah-Hartman
@ 2025-06-19 15:33 ` Song Liu
2025-06-21 2:38 ` Tejun Heo
2 siblings, 0 replies; 14+ messages in thread
From: Song Liu @ 2025-06-19 15:33 UTC (permalink / raw)
To: Christian Brauner
Cc: Song Liu, Greg Kroah-Hartman, Tejun Heo, bpf@vger.kernel.org,
linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-security-module@vger.kernel.org, Kernel Team,
andrii@kernel.org, eddyz87@gmail.com, ast@kernel.org,
daniel@iogearbox.net, martin.lau@linux.dev,
viro@zeniv.linux.org.uk, jack@suse.cz, kpsingh@kernel.org,
mattbobrowski@google.com, amir73il@gmail.com,
daan.j.demeyer@gmail.com
> On Jun 19, 2025, at 3:01 AM, Christian Brauner <brauner@kernel.org> wrote:
>
> On Wed, Jun 18, 2025 at 04:37:36PM -0700, Song Liu wrote:
>> Existing kernfs_xattr_get() locks iattr_mutex, so it cannot be used in
>> RCU critical sections. Introduce __kernfs_xattr_get(), which reads xattr
>> under RCU read lock. This can be used by BPF programs to access cgroupfs
>> xattrs.
>>
>> Signed-off-by: Song Liu <song@kernel.org>
>> ---
>> fs/kernfs/inode.c | 14 ++++++++++++++
>> include/linux/kernfs.h | 2 ++
>> 2 files changed, 16 insertions(+)
>>
>> diff --git a/fs/kernfs/inode.c b/fs/kernfs/inode.c
>> index b83054da68b3..0ca231d2012c 100644
>> --- a/fs/kernfs/inode.c
>> +++ b/fs/kernfs/inode.c
>> @@ -302,6 +302,20 @@ int kernfs_xattr_get(struct kernfs_node *kn, const char *name,
>> return simple_xattr_get(&attrs->xattrs, name, value, size);
>> }
>>
>> +int __kernfs_xattr_get(struct kernfs_node *kn, const char *name,
>> + void *value, size_t size)
>> +{
>> + struct kernfs_iattrs *attrs;
>> +
>> + WARN_ON_ONCE(!rcu_read_lock_held());
>> +
>> + attrs = rcu_dereference(kn->iattr);
>> + if (!attrs)
>> + return -ENODATA;
>
> Hm, that looks a bit silly. Which isn't your fault. I'm looking at the
> kernfs code that does the xattr allocations and I think that's the
> origin of the silliness. It uses a single global mutex for all kernfs
> users thus serializing all allocations for kernfs->iattr. That seems
> crazy but maybe I'm missing a good reason.
>
> I'm appending a patch to remove that mutex. @Greg, @Tejun, can you take
> a look whether that makes sense to you. Then I can take that patch and
> you can build yours on top of the series and I'll pick it all up in one
> go.
>
> You should then just use READ_ONCE(kn->iattr) or the
> kernfs_iattrs_noalloc(kn) helper in your kfunc.
> <0001-kernfs-remove-iattr_mutex.patch>
This looks better indeed.
I will drop 1/4 and include this patch.
Thanks,
Song
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH bpf-next 1/4] kernfs: Add __kernfs_xattr_get for RCU protected access
2025-06-19 10:01 ` Christian Brauner
2025-06-19 10:33 ` Greg Kroah-Hartman
2025-06-19 15:33 ` Song Liu
@ 2025-06-21 2:38 ` Tejun Heo
2 siblings, 0 replies; 14+ messages in thread
From: Tejun Heo @ 2025-06-21 2:38 UTC (permalink / raw)
To: Christian Brauner
Cc: Song Liu, Greg Kroah-Hartman, bpf, linux-fsdevel, linux-kernel,
linux-security-module, kernel-team, andrii, eddyz87, ast, daniel,
martin.lau, viro, jack, kpsingh, mattbobrowski, amir73il,
daan.j.demeyer
On Thu, Jun 19, 2025 at 12:01:19PM +0200, Christian Brauner wrote:
> From bdc53435a1cd5c456dc28d8239eff0e7fa4e8dda Mon Sep 17 00:00:00 2001
> From: Christian Brauner <brauner@kernel.org>
> Date: Thu, 19 Jun 2025 11:50:26 +0200
> Subject: [PATCH] kernfs: remove iattr_mutex
>
> All allocations of struct kernfs_iattrs are serialized through a global
> mutex. Simply do a racy allocation and let the first one win. I bet most
> callers are under inode->i_rwsem anyway and it wouldn't be needed but
> let's not require that.
>
> Signed-off-by: Christian Brauner <brauner@kernel.org>
Acked-by: Tejun Heo <tj@kernel.org>
Thanks.
--
tejun
^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2025-06-21 2:38 UTC | newest]
Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-06-18 23:37 [PATCH bpf-next 0/4] Introduce bpf_kernfs_read_xattr Song Liu
2025-06-18 23:37 ` [PATCH bpf-next 1/4] kernfs: Add __kernfs_xattr_get for RCU protected access Song Liu
2025-06-19 10:01 ` Christian Brauner
2025-06-19 10:33 ` Greg Kroah-Hartman
2025-06-19 15:33 ` Song Liu
2025-06-21 2:38 ` Tejun Heo
2025-06-19 13:57 ` kernel test robot
2025-06-18 23:37 ` [PATCH bpf-next 2/4] bpf: Introduce bpf_kernfs_read_xattr to read xattr of kernfs nodes Song Liu
2025-06-19 8:49 ` Christian Brauner
2025-06-18 23:37 ` [PATCH bpf-next 3/4] bpf: Mark cgroup_subsys_state->cgroup RCU safe Song Liu
2025-06-18 23:37 ` [PATCH bpf-next 4/4] selftests/bpf: Add tests for bpf_kernfs_read_xattr Song Liu
2025-06-19 0:43 ` [PATCH bpf-next 0/4] Introduce bpf_kernfs_read_xattr Tejun Heo
2025-06-19 8:48 ` Christian Brauner
2025-06-19 15:31 ` Song Liu
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).