cgroups.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v5 0/6] kernfs: Use RCU to access kernfs_node::{parent|name}.
@ 2025-01-28  8:42 Sebastian Andrzej Siewior
  2025-01-28  8:42 ` [PATCH v5 1/6] kernfs: Acquire kernfs_rwsem in kernfs_notify_workfn() Sebastian Andrzej Siewior
                   ` (5 more replies)
  0 siblings, 6 replies; 14+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-01-28  8:42 UTC (permalink / raw)
  To: cgroups, linux-kernel
  Cc: Michal Koutný, Paul E. McKenney, Boqun Feng,
	Greg Kroah-Hartman, Hillf Danton, Johannes Weiner, Marco Elver,
	Tejun Heo, tglx, Sebastian Andrzej Siewior

Hi,

This started as a bug report by Hillf Danton and aims to access
kernfs_node::{name|parent} with RCU to avoid the lock during
kernfs_path_from_node().

I've split the individual fixes in separate patches (#1 to #4). I've
also split the ::parent and ::name RCU conversation into a single patch
(#5 and #6).

v4…v5 https://lore.kernel.org/all/20250124174614.866884-1-bigeasy@linutronix.de/
  - rdtgroup:
    - Add a comment to rdt_get_kn_parent_priv() regarding lifetime of
      parent.
    - Move individual rcu_dereference_check() invocations into
      rdt_kn_parent() with a comment on lifetime.
    - Use rcu_access_pointer() in kernfs_to_rdtgroup() instead
      rcu_dereference_check(, true)
  - s/kernfs_rcu_get_name/kernfs_rcu_name/
  - Move all rcu_dereference_check() within kernfs into kernfs_parent()
    and extend its checks to have all cases in one spot. Document why
    each case makes sense.
  - kernfs_notify_workfn(): Do unlocks in the reverse order of locks.
  - Add kernfs_root_flags() and use it in cgroup's kn_get_priv() to
    check the right KERNFS_ROOT_INVARIANT_PARENT flag.

v3: https://lore.kernel.org/all/20241121175250.EJbI7VMb@linutronix.de/
v2: https://lore.kernel.org/all/20241112155713.269214-1-bigeasy@linutronix.de/
v1: https://lore.kernel.org/all/20241108222406.n5azgO98@linutronix.de/

Sebastian

Sebastian Andrzej Siewior (6):
  kernfs: Acquire kernfs_rwsem in kernfs_notify_workfn().
  kernfs: Acquire kernfs_rwsem in kernfs_get_parent_dentry().
  kernfs: Acquire kernfs_rwsem in kernfs_node_dentry().
  kernfs: Don't re-lock kernfs_root::kernfs_rwsem in
    kernfs_fop_readdir().
  kernfs: Use RCU to access kernfs_node::parent.
  kernfs: Use RCU to access kernfs_node::name.

 arch/x86/kernel/cpu/resctrl/internal.h    |   5 +
 arch/x86/kernel/cpu/resctrl/pseudo_lock.c |  14 +-
 arch/x86/kernel/cpu/resctrl/rdtgroup.c    |  73 +++++---
 fs/kernfs/dir.c                           | 214 ++++++++++++----------
 fs/kernfs/file.c                          |   6 +-
 fs/kernfs/kernfs-internal.h               |  37 +++-
 fs/kernfs/mount.c                         |  21 ++-
 fs/kernfs/symlink.c                       |  30 +--
 fs/sysfs/dir.c                            |   2 +-
 fs/sysfs/file.c                           |  24 ++-
 include/linux/kernfs.h                    |  14 +-
 kernel/cgroup/cgroup-v1.c                 |   2 +-
 kernel/cgroup/cgroup.c                    |  24 ++-
 security/selinux/hooks.c                  |   7 +-
 14 files changed, 306 insertions(+), 167 deletions(-)

-- 
2.47.2


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH v5 1/6] kernfs: Acquire kernfs_rwsem in kernfs_notify_workfn().
  2025-01-28  8:42 [PATCH v5 0/6] kernfs: Use RCU to access kernfs_node::{parent|name} Sebastian Andrzej Siewior
@ 2025-01-28  8:42 ` Sebastian Andrzej Siewior
  2025-01-28  8:42 ` [PATCH v5 2/6] kernfs: Acquire kernfs_rwsem in kernfs_get_parent_dentry() Sebastian Andrzej Siewior
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 14+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-01-28  8:42 UTC (permalink / raw)
  To: cgroups, linux-kernel
  Cc: Michal Koutný, Paul E. McKenney, Boqun Feng,
	Greg Kroah-Hartman, Hillf Danton, Johannes Weiner, Marco Elver,
	Tejun Heo, tglx, Sebastian Andrzej Siewior

kernfs_notify_workfn() dereferences kernfs_node::name and passes it
later to fsnotify(). If the node is renamed then the previously observed
name pointer becomes invalid.

Acquire kernfs_root::kernfs_rwsem to block renames of the node.

Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 fs/kernfs/file.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/kernfs/file.c b/fs/kernfs/file.c
index 8502ef68459b9..165d8e37976ba 100644
--- a/fs/kernfs/file.c
+++ b/fs/kernfs/file.c
@@ -911,6 +911,7 @@ static void kernfs_notify_workfn(struct work_struct *work)
 	/* kick fsnotify */
 
 	down_read(&root->kernfs_supers_rwsem);
+	down_read(&root->kernfs_rwsem);
 	list_for_each_entry(info, &kernfs_root(kn)->supers, node) {
 		struct kernfs_node *parent;
 		struct inode *p_inode = NULL;
@@ -947,6 +948,7 @@ static void kernfs_notify_workfn(struct work_struct *work)
 		iput(inode);
 	}
 
+	up_read(&root->kernfs_rwsem);
 	up_read(&root->kernfs_supers_rwsem);
 	kernfs_put(kn);
 	goto repeat;
-- 
2.47.2


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v5 2/6] kernfs: Acquire kernfs_rwsem in kernfs_get_parent_dentry().
  2025-01-28  8:42 [PATCH v5 0/6] kernfs: Use RCU to access kernfs_node::{parent|name} Sebastian Andrzej Siewior
  2025-01-28  8:42 ` [PATCH v5 1/6] kernfs: Acquire kernfs_rwsem in kernfs_notify_workfn() Sebastian Andrzej Siewior
@ 2025-01-28  8:42 ` Sebastian Andrzej Siewior
  2025-01-28  8:42 ` [PATCH v5 3/6] kernfs: Acquire kernfs_rwsem in kernfs_node_dentry() Sebastian Andrzej Siewior
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 14+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-01-28  8:42 UTC (permalink / raw)
  To: cgroups, linux-kernel
  Cc: Michal Koutný, Paul E. McKenney, Boqun Feng,
	Greg Kroah-Hartman, Hillf Danton, Johannes Weiner, Marco Elver,
	Tejun Heo, tglx, Sebastian Andrzej Siewior

kernfs_get_parent_dentry() passes kernfs_node::parent to
kernfs_get_inode().

Acquire kernfs_root::kernfs_rwsem to ensure kernfs_node::parent isn't
replaced during the operation.

Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 fs/kernfs/mount.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/kernfs/mount.c b/fs/kernfs/mount.c
index 1358c21837f1a..b9b16e97bff18 100644
--- a/fs/kernfs/mount.c
+++ b/fs/kernfs/mount.c
@@ -145,7 +145,9 @@ static struct dentry *kernfs_fh_to_parent(struct super_block *sb,
 static struct dentry *kernfs_get_parent_dentry(struct dentry *child)
 {
 	struct kernfs_node *kn = kernfs_dentry_node(child);
+	struct kernfs_root *root = kernfs_root(kn);
 
+	guard(rwsem_read)(&root->kernfs_rwsem);
 	return d_obtain_alias(kernfs_get_inode(child->d_sb, kn->parent));
 }
 
-- 
2.47.2


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v5 3/6] kernfs: Acquire kernfs_rwsem in kernfs_node_dentry().
  2025-01-28  8:42 [PATCH v5 0/6] kernfs: Use RCU to access kernfs_node::{parent|name} Sebastian Andrzej Siewior
  2025-01-28  8:42 ` [PATCH v5 1/6] kernfs: Acquire kernfs_rwsem in kernfs_notify_workfn() Sebastian Andrzej Siewior
  2025-01-28  8:42 ` [PATCH v5 2/6] kernfs: Acquire kernfs_rwsem in kernfs_get_parent_dentry() Sebastian Andrzej Siewior
@ 2025-01-28  8:42 ` Sebastian Andrzej Siewior
  2025-01-28  8:42 ` [PATCH v5 4/6] kernfs: Don't re-lock kernfs_root::kernfs_rwsem in kernfs_fop_readdir() Sebastian Andrzej Siewior
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 14+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-01-28  8:42 UTC (permalink / raw)
  To: cgroups, linux-kernel
  Cc: Michal Koutný, Paul E. McKenney, Boqun Feng,
	Greg Kroah-Hartman, Hillf Danton, Johannes Weiner, Marco Elver,
	Tejun Heo, tglx, Sebastian Andrzej Siewior

kernfs_node_dentry() passes kernfs_node::name to
lookup_positive_unlocked().

Acquire kernfs_root::kernfs_rwsem to ensure the node is not renamed
during the operation.

Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 fs/kernfs/mount.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/fs/kernfs/mount.c b/fs/kernfs/mount.c
index b9b16e97bff18..4a0ff08d589ca 100644
--- a/fs/kernfs/mount.c
+++ b/fs/kernfs/mount.c
@@ -209,6 +209,7 @@ struct dentry *kernfs_node_dentry(struct kernfs_node *kn,
 {
 	struct dentry *dentry;
 	struct kernfs_node *knparent;
+	struct kernfs_root *root;
 
 	BUG_ON(sb->s_op != &kernfs_sops);
 
@@ -218,6 +219,9 @@ struct dentry *kernfs_node_dentry(struct kernfs_node *kn,
 	if (!kn->parent)
 		return dentry;
 
+	root = kernfs_root(kn);
+	guard(rwsem_read)(&root->kernfs_rwsem);
+
 	knparent = find_next_ancestor(kn, NULL);
 	if (WARN_ON(!knparent)) {
 		dput(dentry);
-- 
2.47.2


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v5 4/6] kernfs: Don't re-lock kernfs_root::kernfs_rwsem in kernfs_fop_readdir().
  2025-01-28  8:42 [PATCH v5 0/6] kernfs: Use RCU to access kernfs_node::{parent|name} Sebastian Andrzej Siewior
                   ` (2 preceding siblings ...)
  2025-01-28  8:42 ` [PATCH v5 3/6] kernfs: Acquire kernfs_rwsem in kernfs_node_dentry() Sebastian Andrzej Siewior
@ 2025-01-28  8:42 ` Sebastian Andrzej Siewior
  2025-01-28  8:42 ` [PATCH v5 5/6] kernfs: Use RCU to access kernfs_node::parent Sebastian Andrzej Siewior
  2025-01-28  8:42 ` [PATCH v5 6/6] kernfs: Use RCU to access kernfs_node::name Sebastian Andrzej Siewior
  5 siblings, 0 replies; 14+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-01-28  8:42 UTC (permalink / raw)
  To: cgroups, linux-kernel
  Cc: Michal Koutný, Paul E. McKenney, Boqun Feng,
	Greg Kroah-Hartman, Hillf Danton, Johannes Weiner, Marco Elver,
	Tejun Heo, tglx, Sebastian Andrzej Siewior

The readdir operation iterates over all entries and invokes dir_emit()
for every entry passing kernfs_node::name as argument.
Since the name argument can change, and become invalid, the
kernfs_root::kernfs_rwsem lock should not be dropped to prevent renames
during the operation.

The lock drop around dir_emit() has been initially introduced in commit
   1e5289c97bba2 ("sysfs: Cache the last sysfs_dirent to improve readdir scalability v2")

to avoid holding a global lock during a page fault. The lock drop is
wrong since the support of renames and not a big burden since the lock
is no longer global.

Don't re-acquire kernfs_root::kernfs_rwsem while copying the name to the
userpace buffer.

Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 fs/kernfs/dir.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
index 458519e416fe7..5a1fea414996e 100644
--- a/fs/kernfs/dir.c
+++ b/fs/kernfs/dir.c
@@ -1868,10 +1868,10 @@ static int kernfs_fop_readdir(struct file *file, struct dir_context *ctx)
 		file->private_data = pos;
 		kernfs_get(pos);
 
-		up_read(&root->kernfs_rwsem);
-		if (!dir_emit(ctx, name, len, ino, type))
+		if (!dir_emit(ctx, name, len, ino, type)) {
+			up_read(&root->kernfs_rwsem);
 			return 0;
-		down_read(&root->kernfs_rwsem);
+		}
 	}
 	up_read(&root->kernfs_rwsem);
 	file->private_data = NULL;
-- 
2.47.2


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v5 5/6] kernfs: Use RCU to access kernfs_node::parent.
  2025-01-28  8:42 [PATCH v5 0/6] kernfs: Use RCU to access kernfs_node::{parent|name} Sebastian Andrzej Siewior
                   ` (3 preceding siblings ...)
  2025-01-28  8:42 ` [PATCH v5 4/6] kernfs: Don't re-lock kernfs_root::kernfs_rwsem in kernfs_fop_readdir() Sebastian Andrzej Siewior
@ 2025-01-28  8:42 ` Sebastian Andrzej Siewior
  2025-01-28 20:31   ` Tejun Heo
  2025-01-28  8:42 ` [PATCH v5 6/6] kernfs: Use RCU to access kernfs_node::name Sebastian Andrzej Siewior
  5 siblings, 1 reply; 14+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-01-28  8:42 UTC (permalink / raw)
  To: cgroups, linux-kernel
  Cc: Michal Koutný, Paul E. McKenney, Boqun Feng,
	Greg Kroah-Hartman, Hillf Danton, Johannes Weiner, Marco Elver,
	Tejun Heo, tglx, Sebastian Andrzej Siewior

kernfs_rename_lock is used to obtain stable kernfs_node::{name|parent}
pointer. This is a preparation to access kernfs_node::parent under RCU
and ensure that the pointer remains stable under the RCU lifetime
guarantees.

For a complete path, as it is done in kernfs_path_from_node(), the
kernfs_rename_lock is still required in order to obtain a stable parent
relationship while computing the relevant node depth. This must not
change while the nodes are inspected in order to build the path.
If the kernfs user never moves the nodes (changes the parent) then the
kernfs_rename_lock is not required and the RCU guarantees are
sufficient. This "restriction" can be set with
KERNFS_ROOT_INVARIANT_PARENT. Otherwise the lock is required.

Rename kernfs_node::parent to kernfs_node::__parent to denote the RCU
access and use RCU accessor while accessing the node.
Make cgroup use KERNFS_ROOT_INVARIANT_PARENT since the parent here can
not change.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 arch/x86/kernel/cpu/resctrl/rdtgroup.c | 65 +++++++++++------
 fs/kernfs/dir.c                        | 99 +++++++++++++++-----------
 fs/kernfs/kernfs-internal.h            | 32 ++++++++-
 fs/kernfs/mount.c                      | 10 +--
 fs/kernfs/symlink.c                    | 23 +++---
 fs/sysfs/file.c                        | 24 ++++---
 include/linux/kernfs.h                 | 10 ++-
 kernel/cgroup/cgroup-v1.c              |  2 +-
 kernel/cgroup/cgroup.c                 | 24 +++++--
 9 files changed, 193 insertions(+), 96 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index d906a1cd84917..d9e24100fd8b0 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -947,10 +947,20 @@ static int rdt_last_cmd_status_show(struct kernfs_open_file *of,
 	return 0;
 }
 
+static void *rdt_get_kn_parent_priv(struct kernfs_node *kn)
+{
+	/*
+	 * The parent pointer is only valid within RCU section since it can be
+	 * replaced.
+	 */
+	guard(rcu)();
+	return rcu_dereference(kn->__parent)->priv;
+}
+
 static int rdt_num_closids_show(struct kernfs_open_file *of,
 				struct seq_file *seq, void *v)
 {
-	struct resctrl_schema *s = of->kn->parent->priv;
+	struct resctrl_schema *s = rdt_get_kn_parent_priv(of->kn);
 
 	seq_printf(seq, "%u\n", s->num_closid);
 	return 0;
@@ -959,7 +969,7 @@ static int rdt_num_closids_show(struct kernfs_open_file *of,
 static int rdt_default_ctrl_show(struct kernfs_open_file *of,
 			     struct seq_file *seq, void *v)
 {
-	struct resctrl_schema *s = of->kn->parent->priv;
+	struct resctrl_schema *s = rdt_get_kn_parent_priv(of->kn);
 	struct rdt_resource *r = s->res;
 
 	seq_printf(seq, "%x\n", r->default_ctrl);
@@ -969,7 +979,7 @@ static int rdt_default_ctrl_show(struct kernfs_open_file *of,
 static int rdt_min_cbm_bits_show(struct kernfs_open_file *of,
 			     struct seq_file *seq, void *v)
 {
-	struct resctrl_schema *s = of->kn->parent->priv;
+	struct resctrl_schema *s = rdt_get_kn_parent_priv(of->kn);
 	struct rdt_resource *r = s->res;
 
 	seq_printf(seq, "%u\n", r->cache.min_cbm_bits);
@@ -979,7 +989,7 @@ static int rdt_min_cbm_bits_show(struct kernfs_open_file *of,
 static int rdt_shareable_bits_show(struct kernfs_open_file *of,
 				   struct seq_file *seq, void *v)
 {
-	struct resctrl_schema *s = of->kn->parent->priv;
+	struct resctrl_schema *s = rdt_get_kn_parent_priv(of->kn);
 	struct rdt_resource *r = s->res;
 
 	seq_printf(seq, "%x\n", r->cache.shareable_bits);
@@ -1003,7 +1013,7 @@ static int rdt_shareable_bits_show(struct kernfs_open_file *of,
 static int rdt_bit_usage_show(struct kernfs_open_file *of,
 			      struct seq_file *seq, void *v)
 {
-	struct resctrl_schema *s = of->kn->parent->priv;
+	struct resctrl_schema *s = rdt_get_kn_parent_priv(of->kn);
 	/*
 	 * Use unsigned long even though only 32 bits are used to ensure
 	 * test_bit() is used safely.
@@ -1085,7 +1095,7 @@ static int rdt_bit_usage_show(struct kernfs_open_file *of,
 static int rdt_min_bw_show(struct kernfs_open_file *of,
 			     struct seq_file *seq, void *v)
 {
-	struct resctrl_schema *s = of->kn->parent->priv;
+	struct resctrl_schema *s = rdt_get_kn_parent_priv(of->kn);
 	struct rdt_resource *r = s->res;
 
 	seq_printf(seq, "%u\n", r->membw.min_bw);
@@ -1095,7 +1105,7 @@ static int rdt_min_bw_show(struct kernfs_open_file *of,
 static int rdt_num_rmids_show(struct kernfs_open_file *of,
 			      struct seq_file *seq, void *v)
 {
-	struct rdt_resource *r = of->kn->parent->priv;
+	struct rdt_resource *r = rdt_get_kn_parent_priv(of->kn);
 
 	seq_printf(seq, "%d\n", r->num_rmid);
 
@@ -1105,7 +1115,7 @@ static int rdt_num_rmids_show(struct kernfs_open_file *of,
 static int rdt_mon_features_show(struct kernfs_open_file *of,
 				 struct seq_file *seq, void *v)
 {
-	struct rdt_resource *r = of->kn->parent->priv;
+	struct rdt_resource *r = rdt_get_kn_parent_priv(of->kn);
 	struct mon_evt *mevt;
 
 	list_for_each_entry(mevt, &r->evt_list, list) {
@@ -1120,7 +1130,7 @@ static int rdt_mon_features_show(struct kernfs_open_file *of,
 static int rdt_bw_gran_show(struct kernfs_open_file *of,
 			     struct seq_file *seq, void *v)
 {
-	struct resctrl_schema *s = of->kn->parent->priv;
+	struct resctrl_schema *s = rdt_get_kn_parent_priv(of->kn);
 	struct rdt_resource *r = s->res;
 
 	seq_printf(seq, "%u\n", r->membw.bw_gran);
@@ -1130,7 +1140,7 @@ static int rdt_bw_gran_show(struct kernfs_open_file *of,
 static int rdt_delay_linear_show(struct kernfs_open_file *of,
 			     struct seq_file *seq, void *v)
 {
-	struct resctrl_schema *s = of->kn->parent->priv;
+	struct resctrl_schema *s = rdt_get_kn_parent_priv(of->kn);
 	struct rdt_resource *r = s->res;
 
 	seq_printf(seq, "%u\n", r->membw.delay_linear);
@@ -1148,7 +1158,7 @@ static int max_threshold_occ_show(struct kernfs_open_file *of,
 static int rdt_thread_throttle_mode_show(struct kernfs_open_file *of,
 					 struct seq_file *seq, void *v)
 {
-	struct resctrl_schema *s = of->kn->parent->priv;
+	struct resctrl_schema *s = rdt_get_kn_parent_priv(of->kn);
 	struct rdt_resource *r = s->res;
 
 	if (r->membw.throttle_mode == THREAD_THROTTLE_PER_THREAD)
@@ -1213,7 +1223,7 @@ static enum resctrl_conf_type resctrl_peer_type(enum resctrl_conf_type my_type)
 static int rdt_has_sparse_bitmasks_show(struct kernfs_open_file *of,
 					struct seq_file *seq, void *v)
 {
-	struct resctrl_schema *s = of->kn->parent->priv;
+	struct resctrl_schema *s = rdt_get_kn_parent_priv(of->kn);
 	struct rdt_resource *r = s->res;
 
 	seq_printf(seq, "%u\n", r->cache.arch_has_sparse_bitmasks);
@@ -1625,7 +1635,7 @@ static int mbm_config_show(struct seq_file *s, struct rdt_resource *r, u32 evtid
 static int mbm_total_bytes_config_show(struct kernfs_open_file *of,
 				       struct seq_file *seq, void *v)
 {
-	struct rdt_resource *r = of->kn->parent->priv;
+	struct rdt_resource *r = rdt_get_kn_parent_priv(of->kn);
 
 	mbm_config_show(seq, r, QOS_L3_MBM_TOTAL_EVENT_ID);
 
@@ -1635,7 +1645,7 @@ static int mbm_total_bytes_config_show(struct kernfs_open_file *of,
 static int mbm_local_bytes_config_show(struct kernfs_open_file *of,
 				       struct seq_file *seq, void *v)
 {
-	struct rdt_resource *r = of->kn->parent->priv;
+	struct rdt_resource *r = rdt_get_kn_parent_priv(of->kn);
 
 	mbm_config_show(seq, r, QOS_L3_MBM_LOCAL_EVENT_ID);
 
@@ -1741,7 +1751,7 @@ static ssize_t mbm_total_bytes_config_write(struct kernfs_open_file *of,
 					    char *buf, size_t nbytes,
 					    loff_t off)
 {
-	struct rdt_resource *r = of->kn->parent->priv;
+	struct rdt_resource *r = rdt_get_kn_parent_priv(of->kn);
 	int ret;
 
 	/* Valid input requires a trailing newline */
@@ -1767,7 +1777,7 @@ static ssize_t mbm_local_bytes_config_write(struct kernfs_open_file *of,
 					    char *buf, size_t nbytes,
 					    loff_t off)
 {
-	struct rdt_resource *r = of->kn->parent->priv;
+	struct rdt_resource *r = rdt_get_kn_parent_priv(of->kn);
 	int ret;
 
 	/* Valid input requires a trailing newline */
@@ -2429,12 +2439,13 @@ static struct rdtgroup *kernfs_to_rdtgroup(struct kernfs_node *kn)
 		 * resource. "info" and its subdirectories don't
 		 * have rdtgroup structures, so return NULL here.
 		 */
-		if (kn == kn_info || kn->parent == kn_info)
+		if (kn == kn_info ||
+		    rcu_access_pointer(kn->__parent) == kn_info)
 			return NULL;
 		else
 			return kn->priv;
 	} else {
-		return kn->parent->priv;
+		return rdt_get_kn_parent_priv(kn);
 	}
 }
 
@@ -3758,9 +3769,18 @@ static int rdtgroup_rmdir_ctrl(struct rdtgroup *rdtgrp, cpumask_var_t tmpmask)
 	return 0;
 }
 
+static struct kernfs_node *rdt_kn_parent(struct kernfs_node *kn)
+{
+	/*
+	 * Valid within the RCU section it was obtained or while rdtgroup_mutex
+	 * is held.
+	 */
+	return rcu_dereference_check(kn->__parent, lockdep_is_held(&rdtgroup_mutex));
+}
+
 static int rdtgroup_rmdir(struct kernfs_node *kn)
 {
-	struct kernfs_node *parent_kn = kn->parent;
+	struct kernfs_node *parent_kn;
 	struct rdtgroup *rdtgrp;
 	cpumask_var_t tmpmask;
 	int ret = 0;
@@ -3773,6 +3793,7 @@ static int rdtgroup_rmdir(struct kernfs_node *kn)
 		ret = -EPERM;
 		goto out;
 	}
+	parent_kn = rdt_kn_parent(kn);
 
 	/*
 	 * If the rdtgroup is a ctrl_mon group and parent directory
@@ -3841,6 +3862,7 @@ static void mongrp_reparent(struct rdtgroup *rdtgrp,
 static int rdtgroup_rename(struct kernfs_node *kn,
 			   struct kernfs_node *new_parent, const char *new_name)
 {
+	struct kernfs_node *kn_parent;
 	struct rdtgroup *new_prdtgrp;
 	struct rdtgroup *rdtgrp;
 	cpumask_var_t tmpmask;
@@ -3875,8 +3897,9 @@ static int rdtgroup_rename(struct kernfs_node *kn,
 		goto out;
 	}
 
-	if (rdtgrp->type != RDTMON_GROUP || !kn->parent ||
-	    !is_mon_groups(kn->parent, kn->name)) {
+	kn_parent = rdt_kn_parent(kn);
+	if (rdtgrp->type != RDTMON_GROUP || !kn_parent ||
+	    !is_mon_groups(kn_parent, kn->name)) {
 		rdt_last_cmd_puts("Source must be a MON group\n");
 		ret = -EPERM;
 		goto out;
diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
index 5a1fea414996e..16d268345e3b7 100644
--- a/fs/kernfs/dir.c
+++ b/fs/kernfs/dir.c
@@ -17,7 +17,7 @@
 
 #include "kernfs-internal.h"
 
-static DEFINE_RWLOCK(kernfs_rename_lock);	/* kn->parent and ->name */
+DEFINE_RWLOCK(kernfs_rename_lock);	/* kn->parent and ->name */
 /*
  * Don't use rename_lock to piggy back on pr_cont_buf. We don't want to
  * call pr_cont() while holding rename_lock. Because sometimes pr_cont()
@@ -56,7 +56,7 @@ static int kernfs_name_locked(struct kernfs_node *kn, char *buf, size_t buflen)
 	if (!kn)
 		return strscpy(buf, "(null)", buflen);
 
-	return strscpy(buf, kn->parent ? kn->name : "/", buflen);
+	return strscpy(buf, rcu_access_pointer(kn->__parent) ? kn->name : "/", buflen);
 }
 
 /* kernfs_node_depth - compute depth from @from to @to */
@@ -64,9 +64,9 @@ static size_t kernfs_depth(struct kernfs_node *from, struct kernfs_node *to)
 {
 	size_t depth = 0;
 
-	while (to->parent && to != from) {
+	while (rcu_dereference(to->__parent) && to != from) {
 		depth++;
-		to = to->parent;
+		to = rcu_dereference(to->__parent);
 	}
 	return depth;
 }
@@ -84,18 +84,18 @@ static struct kernfs_node *kernfs_common_ancestor(struct kernfs_node *a,
 	db = kernfs_depth(rb->kn, b);
 
 	while (da > db) {
-		a = a->parent;
+		a = rcu_dereference(a->__parent);
 		da--;
 	}
 	while (db > da) {
-		b = b->parent;
+		b = rcu_dereference(b->__parent);
 		db--;
 	}
 
 	/* worst case b and a will be the same at root */
 	while (b != a) {
-		b = b->parent;
-		a = a->parent;
+		b = rcu_dereference(b->__parent);
+		a = rcu_dereference(a->__parent);
 	}
 
 	return a;
@@ -168,8 +168,9 @@ static int kernfs_path_from_node_locked(struct kernfs_node *kn_to,
 
 	/* Calculate how many bytes we need for the rest */
 	for (i = depth_to - 1; i >= 0; i--) {
+
 		for (kn = kn_to, j = 0; j < i; j++)
-			kn = kn->parent;
+			kn = rcu_dereference(kn->__parent);
 
 		len += scnprintf(buf + len, buflen - len, "/%s", kn->name);
 	}
@@ -226,6 +227,7 @@ int kernfs_path_from_node(struct kernfs_node *to, struct kernfs_node *from,
 	unsigned long flags;
 	int ret;
 
+	guard(rcu)();
 	read_lock_irqsave(&kernfs_rename_lock, flags);
 	ret = kernfs_path_from_node_locked(to, from, buf, buflen);
 	read_unlock_irqrestore(&kernfs_rename_lock, flags);
@@ -295,7 +297,7 @@ struct kernfs_node *kernfs_get_parent(struct kernfs_node *kn)
 	unsigned long flags;
 
 	read_lock_irqsave(&kernfs_rename_lock, flags);
-	parent = kn->parent;
+	parent = kernfs_parent(kn);
 	kernfs_get(parent);
 	read_unlock_irqrestore(&kernfs_rename_lock, flags);
 
@@ -360,8 +362,12 @@ static int kernfs_sd_compare(const struct kernfs_node *left,
  */
 static int kernfs_link_sibling(struct kernfs_node *kn)
 {
-	struct rb_node **node = &kn->parent->dir.children.rb_node;
 	struct rb_node *parent = NULL;
+	struct kernfs_node *kn_parent;
+	struct rb_node **node;
+
+	kn_parent = kernfs_parent(kn);
+	node = &kn_parent->dir.children.rb_node;
 
 	while (*node) {
 		struct kernfs_node *pos;
@@ -380,13 +386,13 @@ static int kernfs_link_sibling(struct kernfs_node *kn)
 
 	/* add new node and rebalance the tree */
 	rb_link_node(&kn->rb, parent, node);
-	rb_insert_color(&kn->rb, &kn->parent->dir.children);
+	rb_insert_color(&kn->rb, &kn_parent->dir.children);
 
 	/* successfully added, account subdir number */
 	down_write(&kernfs_root(kn)->kernfs_iattr_rwsem);
 	if (kernfs_type(kn) == KERNFS_DIR)
-		kn->parent->dir.subdirs++;
-	kernfs_inc_rev(kn->parent);
+		kn_parent->dir.subdirs++;
+	kernfs_inc_rev(kn_parent);
 	up_write(&kernfs_root(kn)->kernfs_iattr_rwsem);
 
 	return 0;
@@ -407,16 +413,19 @@ static int kernfs_link_sibling(struct kernfs_node *kn)
  */
 static bool kernfs_unlink_sibling(struct kernfs_node *kn)
 {
+	struct kernfs_node *kn_parent;
+
 	if (RB_EMPTY_NODE(&kn->rb))
 		return false;
 
+	kn_parent = kernfs_parent(kn);
 	down_write(&kernfs_root(kn)->kernfs_iattr_rwsem);
 	if (kernfs_type(kn) == KERNFS_DIR)
-		kn->parent->dir.subdirs--;
-	kernfs_inc_rev(kn->parent);
+		kn_parent->dir.subdirs--;
+	kernfs_inc_rev(kn_parent);
 	up_write(&kernfs_root(kn)->kernfs_iattr_rwsem);
 
-	rb_erase(&kn->rb, &kn->parent->dir.children);
+	rb_erase(&kn->rb, &kn_parent->dir.children);
 	RB_CLEAR_NODE(&kn->rb);
 	return true;
 }
@@ -558,11 +567,7 @@ void kernfs_put(struct kernfs_node *kn)
 		return;
 	root = kernfs_root(kn);
  repeat:
-	/*
-	 * Moving/renaming is always done while holding reference.
-	 * kn->parent won't change beneath us.
-	 */
-	parent = kn->parent;
+	parent = kernfs_parent(kn);
 
 	WARN_ONCE(atomic_read(&kn->active) != KN_DEACTIVATED_BIAS,
 		  "kernfs_put: %s/%s: released with incorrect active_ref %d\n",
@@ -701,7 +706,7 @@ struct kernfs_node *kernfs_new_node(struct kernfs_node *parent,
 			       name, mode, uid, gid, flags);
 	if (kn) {
 		kernfs_get(parent);
-		kn->parent = parent;
+		rcu_assign_pointer(kn->__parent, parent);
 	}
 	return kn;
 }
@@ -769,13 +774,14 @@ struct kernfs_node *kernfs_find_and_get_node_by_id(struct kernfs_root *root,
  */
 int kernfs_add_one(struct kernfs_node *kn)
 {
-	struct kernfs_node *parent = kn->parent;
-	struct kernfs_root *root = kernfs_root(parent);
+	struct kernfs_root *root = kernfs_root(kn);
 	struct kernfs_iattrs *ps_iattr;
+	struct kernfs_node *parent;
 	bool has_ns;
 	int ret;
 
 	down_write(&root->kernfs_rwsem);
+	parent = kernfs_parent(kn);
 
 	ret = -EINVAL;
 	has_ns = kernfs_ns_enabled(parent);
@@ -949,6 +955,11 @@ struct kernfs_node *kernfs_walk_and_get_ns(struct kernfs_node *parent,
 	return kn;
 }
 
+unsigned int kernfs_root_flags(struct kernfs_node *kn)
+{
+	return kernfs_root(kn)->flags;
+}
+
 /**
  * kernfs_create_root - create a new kernfs hierarchy
  * @scops: optional syscall operations for the hierarchy
@@ -1111,7 +1122,7 @@ struct kernfs_node *kernfs_create_empty_dir(struct kernfs_node *parent,
 
 static int kernfs_dop_revalidate(struct dentry *dentry, unsigned int flags)
 {
-	struct kernfs_node *kn;
+	struct kernfs_node *kn, *parent;
 	struct kernfs_root *root;
 
 	if (flags & LOOKUP_RCU)
@@ -1162,8 +1173,9 @@ static int kernfs_dop_revalidate(struct dentry *dentry, unsigned int flags)
 	if (!kernfs_active(kn))
 		goto out_bad;
 
+	parent = kernfs_parent(kn);
 	/* The kernfs node has been moved? */
-	if (kernfs_dentry_node(dentry->d_parent) != kn->parent)
+	if (kernfs_dentry_node(dentry->d_parent) != parent)
 		goto out_bad;
 
 	/* The kernfs node has been renamed */
@@ -1171,7 +1183,7 @@ static int kernfs_dop_revalidate(struct dentry *dentry, unsigned int flags)
 		goto out_bad;
 
 	/* The kernfs node has been moved to a different namespace */
-	if (kn->parent && kernfs_ns_enabled(kn->parent) &&
+	if (parent && kernfs_ns_enabled(parent) &&
 	    kernfs_info(dentry->d_sb)->ns != kn->ns)
 		goto out_bad;
 
@@ -1364,7 +1376,7 @@ static struct kernfs_node *kernfs_next_descendant_post(struct kernfs_node *pos,
 		return kernfs_leftmost_descendant(rb_to_kn(rbn));
 
 	/* no sibling left, visit parent */
-	return pos->parent;
+	return kernfs_parent(pos);
 }
 
 static void kernfs_activate_one(struct kernfs_node *kn)
@@ -1376,7 +1388,7 @@ static void kernfs_activate_one(struct kernfs_node *kn)
 	if (kernfs_active(kn) || (kn->flags & (KERNFS_HIDDEN | KERNFS_REMOVING)))
 		return;
 
-	WARN_ON_ONCE(kn->parent && RB_EMPTY_NODE(&kn->rb));
+	WARN_ON_ONCE(kernfs_parent(kn) && RB_EMPTY_NODE(&kn->rb));
 	WARN_ON_ONCE(atomic_read(&kn->active) != KN_DEACTIVATED_BIAS);
 
 	atomic_sub(KN_DEACTIVATED_BIAS, &kn->active);
@@ -1446,7 +1458,7 @@ void kernfs_show(struct kernfs_node *kn, bool show)
 
 static void __kernfs_remove(struct kernfs_node *kn)
 {
-	struct kernfs_node *pos;
+	struct kernfs_node *pos, *parent;
 
 	/* Short-circuit if non-root @kn has already finished removal. */
 	if (!kn)
@@ -1458,7 +1470,7 @@ static void __kernfs_remove(struct kernfs_node *kn)
 	 * This is for kernfs_remove_self() which plays with active ref
 	 * after removal.
 	 */
-	if (kn->parent && RB_EMPTY_NODE(&kn->rb))
+	if (kernfs_parent(kn) && RB_EMPTY_NODE(&kn->rb))
 		return;
 
 	pr_debug("kernfs %s: removing\n", kn->name);
@@ -1484,14 +1496,14 @@ static void __kernfs_remove(struct kernfs_node *kn)
 		kernfs_get(pos);
 
 		kernfs_drain(pos);
-
+		parent = kernfs_parent(pos);
 		/*
 		 * kernfs_unlink_sibling() succeeds once per node.  Use it
 		 * to decide who's responsible for cleanups.
 		 */
-		if (!pos->parent || kernfs_unlink_sibling(pos)) {
+		if (!parent || kernfs_unlink_sibling(pos)) {
 			struct kernfs_iattrs *ps_iattr =
-				pos->parent ? pos->parent->iattr : NULL;
+				parent ? parent->iattr : NULL;
 
 			/* update timestamps on the parent */
 			down_write(&kernfs_root(kn)->kernfs_iattr_rwsem);
@@ -1721,7 +1733,7 @@ int kernfs_rename_ns(struct kernfs_node *kn, struct kernfs_node *new_parent,
 	int error;
 
 	/* can't move or rename root */
-	if (!kn->parent)
+	if (!rcu_access_pointer(kn->__parent))
 		return -EINVAL;
 
 	root = kernfs_root(kn);
@@ -1732,8 +1744,15 @@ int kernfs_rename_ns(struct kernfs_node *kn, struct kernfs_node *new_parent,
 	    (new_parent->flags & KERNFS_EMPTY_DIR))
 		goto out;
 
+	old_parent = kernfs_parent(kn);
+	if (root->flags & KERNFS_ROOT_INVARIANT_PARENT) {
+		error = -EINVAL;
+		if (WARN_ON_ONCE(old_parent != new_parent))
+			goto out;
+	}
+
 	error = 0;
-	if ((kn->parent == new_parent) && (kn->ns == new_ns) &&
+	if ((old_parent == new_parent) && (kn->ns == new_ns) &&
 	    (strcmp(kn->name, new_name) == 0))
 		goto out;	/* nothing to rename */
 
@@ -1760,8 +1779,8 @@ int kernfs_rename_ns(struct kernfs_node *kn, struct kernfs_node *new_parent,
 	/* rename_lock protects ->parent and ->name accessors */
 	write_lock_irq(&kernfs_rename_lock);
 
-	old_parent = kn->parent;
-	kn->parent = new_parent;
+	old_parent = kernfs_parent(kn);
+	rcu_assign_pointer(kn->__parent, new_parent);
 
 	kn->ns = new_ns;
 	if (new_name) {
@@ -1794,7 +1813,7 @@ static struct kernfs_node *kernfs_dir_pos(const void *ns,
 {
 	if (pos) {
 		int valid = kernfs_active(pos) &&
-			pos->parent == parent && hash == pos->hash;
+			kernfs_parent(pos) == parent && hash == pos->hash;
 		kernfs_put(pos);
 		if (!valid)
 			pos = NULL;
diff --git a/fs/kernfs/kernfs-internal.h b/fs/kernfs/kernfs-internal.h
index b42ee6547cdc1..c43bee18b79f7 100644
--- a/fs/kernfs/kernfs-internal.h
+++ b/fs/kernfs/kernfs-internal.h
@@ -19,6 +19,8 @@
 #include <linux/kernfs.h>
 #include <linux/fs_context.h>
 
+extern rwlock_t kernfs_rename_lock;
+
 struct kernfs_iattrs {
 	kuid_t			ia_uid;
 	kgid_t			ia_gid;
@@ -64,11 +66,14 @@ struct kernfs_root {
  *
  * Return: the kernfs_root @kn belongs to.
  */
-static inline struct kernfs_root *kernfs_root(struct kernfs_node *kn)
+static inline struct kernfs_root *kernfs_root(const struct kernfs_node *kn)
 {
+	const struct kernfs_node *knp;
 	/* if parent exists, it's always a dir; otherwise, @sd is a dir */
-	if (kn->parent)
-		kn = kn->parent;
+	guard(rcu)();
+	knp = rcu_dereference(kn->__parent);
+	if (knp)
+		kn = knp;
 	return kn->dir.root;
 }
 
@@ -97,6 +102,27 @@ struct kernfs_super_info {
 };
 #define kernfs_info(SB) ((struct kernfs_super_info *)(SB->s_fs_info))
 
+static inline bool kernfs_root_is_locked(const struct kernfs_node *kn)
+{
+	return lockdep_is_held(&kernfs_root(kn)->kernfs_rwsem);
+}
+
+static inline struct kernfs_node *kernfs_parent(const struct kernfs_node *kn)
+{
+	/*
+	 * The kernfs_node::__parent remains valid within a RCU section. The kn
+	 * can be reparented (and renamed) which changes the entry. This can be
+	 * avoided by locking kernfs_root::kernfs_rwsem or kernfs_rename_lock.
+	 * Both locks can be used to obtain a reference on __parent. Once the
+	 * reference count reaches 0 then the node is about to be freed
+	 * and can not be renamed (or become a different parent) anymore.
+	 */
+	return rcu_dereference_check(kn->__parent,
+				     kernfs_root_is_locked(kn) ||
+				     lockdep_is_held(&kernfs_rename_lock) ||
+				     !atomic_read(&kn->count));
+}
+
 static inline struct kernfs_node *kernfs_dentry_node(struct dentry *dentry)
 {
 	if (d_really_is_negative(dentry))
diff --git a/fs/kernfs/mount.c b/fs/kernfs/mount.c
index 4a0ff08d589ca..2252b16e6ef0b 100644
--- a/fs/kernfs/mount.c
+++ b/fs/kernfs/mount.c
@@ -148,7 +148,7 @@ static struct dentry *kernfs_get_parent_dentry(struct dentry *child)
 	struct kernfs_root *root = kernfs_root(kn);
 
 	guard(rwsem_read)(&root->kernfs_rwsem);
-	return d_obtain_alias(kernfs_get_inode(child->d_sb, kn->parent));
+	return d_obtain_alias(kernfs_get_inode(child->d_sb, kernfs_parent(kn)));
 }
 
 static const struct export_operations kernfs_export_ops = {
@@ -188,10 +188,10 @@ static struct kernfs_node *find_next_ancestor(struct kernfs_node *child,
 		return NULL;
 	}
 
-	while (child->parent != parent) {
-		if (!child->parent)
+	while (kernfs_parent(child) != parent) {
+		child = kernfs_parent(child);
+		if (!child)
 			return NULL;
-		child = child->parent;
 	}
 
 	return child;
@@ -216,7 +216,7 @@ struct dentry *kernfs_node_dentry(struct kernfs_node *kn,
 	dentry = dget(sb->s_root);
 
 	/* Check if this is the root kernfs_node */
-	if (!kn->parent)
+	if (!rcu_access_pointer(kn->__parent))
 		return dentry;
 
 	root = kernfs_root(kn);
diff --git a/fs/kernfs/symlink.c b/fs/kernfs/symlink.c
index 45371a70caa71..05c62ca93c53d 100644
--- a/fs/kernfs/symlink.c
+++ b/fs/kernfs/symlink.c
@@ -62,10 +62,10 @@ static int kernfs_get_target_path(struct kernfs_node *parent,
 
 	/* go up to the root, stop at the base */
 	base = parent;
-	while (base->parent) {
-		kn = target->parent;
-		while (kn->parent && base != kn)
-			kn = kn->parent;
+	while (kernfs_parent(base)) {
+		kn = kernfs_parent(target);
+		while (kernfs_parent(kn) && base != kn)
+			kn = kernfs_parent(kn);
 
 		if (base == kn)
 			break;
@@ -75,14 +75,14 @@ static int kernfs_get_target_path(struct kernfs_node *parent,
 
 		strcpy(s, "../");
 		s += 3;
-		base = base->parent;
+		base = kernfs_parent(base);
 	}
 
 	/* determine end of target string for reverse fillup */
 	kn = target;
-	while (kn->parent && kn != base) {
+	while (kernfs_parent(kn) && kn != base) {
 		len += strlen(kn->name) + 1;
-		kn = kn->parent;
+		kn = kernfs_parent(kn);
 	}
 
 	/* check limits */
@@ -94,7 +94,7 @@ static int kernfs_get_target_path(struct kernfs_node *parent,
 
 	/* reverse fillup of target string from target to base */
 	kn = target;
-	while (kn->parent && kn != base) {
+	while (kernfs_parent(kn) && kn != base) {
 		int slen = strlen(kn->name);
 
 		len -= slen;
@@ -102,7 +102,7 @@ static int kernfs_get_target_path(struct kernfs_node *parent,
 		if (len)
 			s[--len] = '/';
 
-		kn = kn->parent;
+		kn = kernfs_parent(kn);
 	}
 
 	return 0;
@@ -111,12 +111,13 @@ static int kernfs_get_target_path(struct kernfs_node *parent,
 static int kernfs_getlink(struct inode *inode, char *path)
 {
 	struct kernfs_node *kn = inode->i_private;
-	struct kernfs_node *parent = kn->parent;
+	struct kernfs_node *parent;
 	struct kernfs_node *target = kn->symlink.target_kn;
-	struct kernfs_root *root = kernfs_root(parent);
+	struct kernfs_root *root = kernfs_root(kn);
 	int error;
 
 	down_read(&root->kernfs_rwsem);
+	parent = kernfs_parent(kn);
 	error = kernfs_get_target_path(parent, target, path);
 	up_read(&root->kernfs_rwsem);
 
diff --git a/fs/sysfs/file.c b/fs/sysfs/file.c
index 785408861c01c..1d16b4a386aa9 100644
--- a/fs/sysfs/file.c
+++ b/fs/sysfs/file.c
@@ -19,13 +19,19 @@
 
 #include "sysfs.h"
 
+static struct kobject *sysfs_file_kobj(struct kernfs_node *kn)
+{
+	guard(rcu)();
+	return rcu_dereference(kn->__parent)->priv;
+}
+
 /*
  * Determine ktype->sysfs_ops for the given kernfs_node.  This function
  * must be called while holding an active reference.
  */
 static const struct sysfs_ops *sysfs_file_ops(struct kernfs_node *kn)
 {
-	struct kobject *kobj = kn->parent->priv;
+	struct kobject *kobj = sysfs_file_kobj(kn);
 
 	if (kn->flags & KERNFS_LOCKDEP)
 		lockdep_assert_held(kn);
@@ -40,7 +46,7 @@ static const struct sysfs_ops *sysfs_file_ops(struct kernfs_node *kn)
 static int sysfs_kf_seq_show(struct seq_file *sf, void *v)
 {
 	struct kernfs_open_file *of = sf->private;
-	struct kobject *kobj = of->kn->parent->priv;
+	struct kobject *kobj = sysfs_file_kobj(of->kn);
 	const struct sysfs_ops *ops = sysfs_file_ops(of->kn);
 	ssize_t count;
 	char *buf;
@@ -78,7 +84,7 @@ static ssize_t sysfs_kf_bin_read(struct kernfs_open_file *of, char *buf,
 				 size_t count, loff_t pos)
 {
 	struct bin_attribute *battr = of->kn->priv;
-	struct kobject *kobj = of->kn->parent->priv;
+	struct kobject *kobj = sysfs_file_kobj(of->kn);
 	loff_t size = file_inode(of->file)->i_size;
 
 	if (!count)
@@ -105,7 +111,7 @@ static ssize_t sysfs_kf_read(struct kernfs_open_file *of, char *buf,
 			     size_t count, loff_t pos)
 {
 	const struct sysfs_ops *ops = sysfs_file_ops(of->kn);
-	struct kobject *kobj = of->kn->parent->priv;
+	struct kobject *kobj = sysfs_file_kobj(of->kn);
 	ssize_t len;
 
 	/*
@@ -131,7 +137,7 @@ static ssize_t sysfs_kf_write(struct kernfs_open_file *of, char *buf,
 			      size_t count, loff_t pos)
 {
 	const struct sysfs_ops *ops = sysfs_file_ops(of->kn);
-	struct kobject *kobj = of->kn->parent->priv;
+	struct kobject *kobj = sysfs_file_kobj(of->kn);
 
 	if (!count)
 		return 0;
@@ -144,7 +150,7 @@ static ssize_t sysfs_kf_bin_write(struct kernfs_open_file *of, char *buf,
 				  size_t count, loff_t pos)
 {
 	struct bin_attribute *battr = of->kn->priv;
-	struct kobject *kobj = of->kn->parent->priv;
+	struct kobject *kobj = sysfs_file_kobj(of->kn);
 	loff_t size = file_inode(of->file)->i_size;
 
 	if (size) {
@@ -168,7 +174,7 @@ static int sysfs_kf_bin_mmap(struct kernfs_open_file *of,
 			     struct vm_area_struct *vma)
 {
 	struct bin_attribute *battr = of->kn->priv;
-	struct kobject *kobj = of->kn->parent->priv;
+	struct kobject *kobj = sysfs_file_kobj(of->kn);
 
 	return battr->mmap(of->file, kobj, battr, vma);
 }
@@ -177,7 +183,7 @@ static loff_t sysfs_kf_bin_llseek(struct kernfs_open_file *of, loff_t offset,
 				  int whence)
 {
 	struct bin_attribute *battr = of->kn->priv;
-	struct kobject *kobj = of->kn->parent->priv;
+	struct kobject *kobj = sysfs_file_kobj(of->kn);
 
 	if (battr->llseek)
 		return battr->llseek(of->file, kobj, battr, offset, whence);
@@ -494,7 +500,7 @@ EXPORT_SYMBOL_GPL(sysfs_break_active_protection);
  */
 void sysfs_unbreak_active_protection(struct kernfs_node *kn)
 {
-	struct kobject *kobj = kn->parent->priv;
+	struct kobject *kobj = sysfs_file_kobj(kn);
 
 	kernfs_unbreak_active_protection(kn);
 	kernfs_put(kn);
diff --git a/include/linux/kernfs.h b/include/linux/kernfs.h
index 87c79d076d6d7..5dda9a268e44c 100644
--- a/include/linux/kernfs.h
+++ b/include/linux/kernfs.h
@@ -147,6 +147,11 @@ enum kernfs_root_flag {
 	 * Support user xattrs to be written to nodes rooted at this root.
 	 */
 	KERNFS_ROOT_SUPPORT_USER_XATTR		= 0x0008,
+
+	/*
+	 * Renames must not change the parent node.
+	 */
+	KERNFS_ROOT_INVARIANT_PARENT		= 0x0010,
 };
 
 /* type-specific structures for kernfs_node union members */
@@ -199,8 +204,8 @@ struct kernfs_node {
 	 * never moved to a different parent, it is safe to access the
 	 * parent directly.
 	 */
-	struct kernfs_node	*parent;
 	const char		*name;
+	struct kernfs_node	__rcu *__parent;
 
 	struct rb_node		rb;
 
@@ -416,6 +421,7 @@ struct dentry *kernfs_node_dentry(struct kernfs_node *kn,
 struct kernfs_root *kernfs_create_root(struct kernfs_syscall_ops *scops,
 				       unsigned int flags, void *priv);
 void kernfs_destroy_root(struct kernfs_root *root);
+unsigned int kernfs_root_flags(struct kernfs_node *kn);
 
 struct kernfs_node *kernfs_create_dir_ns(struct kernfs_node *parent,
 					 const char *name, umode_t mode,
@@ -514,6 +520,8 @@ kernfs_create_root(struct kernfs_syscall_ops *scops, unsigned int flags,
 { return ERR_PTR(-ENOSYS); }
 
 static inline void kernfs_destroy_root(struct kernfs_root *root) { }
+static inline unsigned int kernfs_root_flags(struct kernfs_node *kn)
+{ return 0; }
 
 static inline struct kernfs_node *
 kernfs_create_dir_ns(struct kernfs_node *parent, const char *name,
diff --git a/kernel/cgroup/cgroup-v1.c b/kernel/cgroup/cgroup-v1.c
index e28d5f0d20ed0..c9752eb607ec9 100644
--- a/kernel/cgroup/cgroup-v1.c
+++ b/kernel/cgroup/cgroup-v1.c
@@ -844,7 +844,7 @@ static int cgroup1_rename(struct kernfs_node *kn, struct kernfs_node *new_parent
 
 	if (kernfs_type(kn) != KERNFS_DIR)
 		return -ENOTDIR;
-	if (kn->parent != new_parent)
+	if (rcu_access_pointer(kn->__parent) != new_parent)
 		return -EIO;
 
 	/*
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index d9061bd55436b..214aa378936cd 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -633,9 +633,22 @@ int cgroup_task_count(const struct cgroup *cgrp)
 	return count;
 }
 
+static struct cgroup *kn_get_priv(struct kernfs_node *kn)
+{
+	struct kernfs_node *parent;
+	/*
+	 * The parent can not be replaced due to KERNFS_ROOT_INVARIANT_PARENT.
+	 * Therefore it is always safe to dereference this pointer outside of a
+	 * RCU section.
+	 */
+	parent = rcu_dereference_check(kn->__parent,
+				       kernfs_root_flags(kn) & KERNFS_ROOT_INVARIANT_PARENT);
+	return parent->priv;
+}
+
 struct cgroup_subsys_state *of_css(struct kernfs_open_file *of)
 {
-	struct cgroup *cgrp = of->kn->parent->priv;
+	struct cgroup *cgrp = kn_get_priv(of->kn);
 	struct cftype *cft = of_cft(of);
 
 	/*
@@ -1612,7 +1625,7 @@ void cgroup_kn_unlock(struct kernfs_node *kn)
 	if (kernfs_type(kn) == KERNFS_DIR)
 		cgrp = kn->priv;
 	else
-		cgrp = kn->parent->priv;
+		cgrp = kn_get_priv(kn);
 
 	cgroup_unlock();
 
@@ -1644,7 +1657,7 @@ struct cgroup *cgroup_kn_lock_live(struct kernfs_node *kn, bool drain_offline)
 	if (kernfs_type(kn) == KERNFS_DIR)
 		cgrp = kn->priv;
 	else
-		cgrp = kn->parent->priv;
+		cgrp = kn_get_priv(kn);
 
 	/*
 	 * We're gonna grab cgroup_mutex which nests outside kernfs
@@ -2118,7 +2131,8 @@ int cgroup_setup_root(struct cgroup_root *root, u16 ss_mask)
 	root->kf_root = kernfs_create_root(kf_sops,
 					   KERNFS_ROOT_CREATE_DEACTIVATED |
 					   KERNFS_ROOT_SUPPORT_EXPORTOP |
-					   KERNFS_ROOT_SUPPORT_USER_XATTR,
+					   KERNFS_ROOT_SUPPORT_USER_XATTR |
+					   KERNFS_ROOT_INVARIANT_PARENT,
 					   root_cgrp);
 	if (IS_ERR(root->kf_root)) {
 		ret = PTR_ERR(root->kf_root);
@@ -4119,7 +4133,7 @@ static ssize_t cgroup_file_write(struct kernfs_open_file *of, char *buf,
 				 size_t nbytes, loff_t off)
 {
 	struct cgroup_file_ctx *ctx = of->priv;
-	struct cgroup *cgrp = of->kn->parent->priv;
+	struct cgroup *cgrp = kn_get_priv(of->kn);
 	struct cftype *cft = of_cft(of);
 	struct cgroup_subsys_state *css;
 	int ret;
-- 
2.47.2


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v5 6/6] kernfs: Use RCU to access kernfs_node::name.
  2025-01-28  8:42 [PATCH v5 0/6] kernfs: Use RCU to access kernfs_node::{parent|name} Sebastian Andrzej Siewior
                   ` (4 preceding siblings ...)
  2025-01-28  8:42 ` [PATCH v5 5/6] kernfs: Use RCU to access kernfs_node::parent Sebastian Andrzej Siewior
@ 2025-01-28  8:42 ` Sebastian Andrzej Siewior
  2025-01-28 20:40   ` Tejun Heo
  5 siblings, 1 reply; 14+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-01-28  8:42 UTC (permalink / raw)
  To: cgroups, linux-kernel
  Cc: Michal Koutný, Paul E. McKenney, Boqun Feng,
	Greg Kroah-Hartman, Hillf Danton, Johannes Weiner, Marco Elver,
	Tejun Heo, tglx, Sebastian Andrzej Siewior,
	syzbot+6ea37e2e6ffccf41a7e6

Using RCU lifetime rules to access kernfs_node::name can avoid the
trouble kernfs_rename_lock in kernfs_name() and kernfs_path_from_node()
if the fs was created with KERNFS_ROOT_INVARIANT_PARENT. This is useful
as it allows to implement kernfs_path_from_node() only with RCU
protection and avoiding kernfs_rename_lock. The lock is only required if
the __parent node can be changed and the function requires an unchanged
hierarchy while it iterates from the node to its parent.

I went through all ::name users and added the required access for the lookup
with a few extensions:
- rdtgroup_pseudo_lock_create() drops all locks and then uses the name
  later on. resctrl supports rename with different parents. Here I made
  a temporal copy of the name while it is used outside of the lock.

- kernfs_rename_ns() accepts NULL as new_parent. This simplifies
  sysfs_move_dir_ns() where it can set NULL in order to reuse the current
  name.

- kernfs_rename_ns() is only using kernfs_rename_lock if the parents are
  different. All users use either kernfs_rwsem (for stable path view) or
  just RCU for the lookup. The ::name uses always RCU free.

Use RCU lifetime guarantees to access kernfs_node::name.

Suggested-by: Tejun Heo <tj@kernel.org>
Reported-by: syzbot+6ea37e2e6ffccf41a7e6@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/lkml/67251dc6.050a0220.529b6.015e.GAE@google.com/
Reported-by: Hillf Danton <hdanton@sina.com>
Closes: https://lore.kernel.org/20241102001224.2789-1-hdanton@sina.com
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 arch/x86/kernel/cpu/resctrl/internal.h    |   5 +
 arch/x86/kernel/cpu/resctrl/pseudo_lock.c |  14 ++-
 arch/x86/kernel/cpu/resctrl/rdtgroup.c    |  10 +-
 fs/kernfs/dir.c                           | 113 ++++++++++++----------
 fs/kernfs/file.c                          |   4 +-
 fs/kernfs/kernfs-internal.h               |   5 +
 fs/kernfs/mount.c                         |   5 +-
 fs/kernfs/symlink.c                       |   7 +-
 fs/sysfs/dir.c                            |   2 +-
 include/linux/kernfs.h                    |   4 +-
 security/selinux/hooks.c                  |   7 +-
 11 files changed, 105 insertions(+), 71 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 955999aecfca9..cc83dbc70a8ef 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -505,6 +505,11 @@ int parse_bw(struct rdt_parse_data *data, struct resctrl_schema *s,
 
 extern struct mutex rdtgroup_mutex;
 
+static inline const char *rdt_kn_get_name(const struct kernfs_node *kn)
+{
+	return rcu_dereference_check(kn->name, lockdep_is_held(&rdtgroup_mutex));
+}
+
 extern struct rdt_hw_resource rdt_resources_all[];
 extern struct rdtgroup rdtgroup_default;
 extern struct dentry *debugfs_resctrl;
diff --git a/arch/x86/kernel/cpu/resctrl/pseudo_lock.c b/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
index 972e6b6b0481f..efe1fc5a6866f 100644
--- a/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
+++ b/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
@@ -52,7 +52,8 @@ static char *pseudo_lock_devnode(const struct device *dev, umode_t *mode)
 	rdtgrp = dev_get_drvdata(dev);
 	if (mode)
 		*mode = 0600;
-	return kasprintf(GFP_KERNEL, "pseudo_lock/%s", rdtgrp->kn->name);
+	guard(mutex)(&rdtgroup_mutex);
+	return kasprintf(GFP_KERNEL, "pseudo_lock/%s", rdt_kn_get_name(rdtgrp->kn));
 }
 
 static const struct class pseudo_lock_class = {
@@ -1301,6 +1302,7 @@ int rdtgroup_pseudo_lock_create(struct rdtgroup *rdtgrp)
 	struct task_struct *thread;
 	unsigned int new_minor;
 	struct device *dev;
+	char *kn_name __free(kfree) = NULL;
 	int ret;
 
 	ret = pseudo_lock_region_alloc(plr);
@@ -1312,6 +1314,11 @@ int rdtgroup_pseudo_lock_create(struct rdtgroup *rdtgrp)
 		ret = -EINVAL;
 		goto out_region;
 	}
+	kn_name = kstrdup(rdt_kn_get_name(rdtgrp->kn), GFP_KERNEL);
+	if (!kn_name) {
+		ret = -ENOMEM;
+		goto out_cstates;
+	}
 
 	plr->thread_done = 0;
 
@@ -1360,8 +1367,7 @@ int rdtgroup_pseudo_lock_create(struct rdtgroup *rdtgrp)
 	mutex_unlock(&rdtgroup_mutex);
 
 	if (!IS_ERR_OR_NULL(debugfs_resctrl)) {
-		plr->debugfs_dir = debugfs_create_dir(rdtgrp->kn->name,
-						      debugfs_resctrl);
+		plr->debugfs_dir = debugfs_create_dir(kn_name, debugfs_resctrl);
 		if (!IS_ERR_OR_NULL(plr->debugfs_dir))
 			debugfs_create_file("pseudo_lock_measure", 0200,
 					    plr->debugfs_dir, rdtgrp,
@@ -1370,7 +1376,7 @@ int rdtgroup_pseudo_lock_create(struct rdtgroup *rdtgrp)
 
 	dev = device_create(&pseudo_lock_class, NULL,
 			    MKDEV(pseudo_lock_major, new_minor),
-			    rdtgrp, "%s", rdtgrp->kn->name);
+			    rdtgrp, "%s", kn_name);
 
 	mutex_lock(&rdtgroup_mutex);
 
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index d9e24100fd8b0..bf06b34cd1e96 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -907,14 +907,14 @@ int proc_resctrl_show(struct seq_file *s, struct pid_namespace *ns,
 			continue;
 
 		seq_printf(s, "res:%s%s\n", (rdtg == &rdtgroup_default) ? "/" : "",
-			   rdtg->kn->name);
+			   rdt_kn_get_name(rdtg->kn));
 		seq_puts(s, "mon:");
 		list_for_each_entry(crg, &rdtg->mon.crdtgrp_list,
 				    mon.crdtgrp_list) {
 			if (!resctrl_arch_match_rmid(tsk, crg->mon.parent->closid,
 						     crg->mon.rmid))
 				continue;
-			seq_printf(s, "%s", crg->kn->name);
+			seq_printf(s, "%s", rdt_kn_get_name(crg->kn));
 			break;
 		}
 		seq_putc(s, '\n');
@@ -3662,7 +3662,7 @@ static int rdtgroup_mkdir_ctrl_mon(struct kernfs_node *parent_kn,
  */
 static bool is_mon_groups(struct kernfs_node *kn, const char *name)
 {
-	return (!strcmp(kn->name, "mon_groups") &&
+	return (!strcmp(rdt_kn_get_name(kn), "mon_groups") &&
 		strcmp(name, "mon_groups"));
 }
 
@@ -3811,7 +3811,7 @@ static int rdtgroup_rmdir(struct kernfs_node *kn)
 			ret = rdtgroup_rmdir_ctrl(rdtgrp, tmpmask);
 		}
 	} else if (rdtgrp->type == RDTMON_GROUP &&
-		 is_mon_groups(parent_kn, kn->name)) {
+		 is_mon_groups(parent_kn, rdt_kn_get_name(kn))) {
 		ret = rdtgroup_rmdir_mon(rdtgrp, tmpmask);
 	} else {
 		ret = -EPERM;
@@ -3899,7 +3899,7 @@ static int rdtgroup_rename(struct kernfs_node *kn,
 
 	kn_parent = rdt_kn_parent(kn);
 	if (rdtgrp->type != RDTMON_GROUP || !kn_parent ||
-	    !is_mon_groups(kn_parent, kn->name)) {
+	    !is_mon_groups(kn_parent, rdt_kn_get_name(kn))) {
 		rdt_last_cmd_puts("Source must be a MON group\n");
 		ret = -EPERM;
 		goto out;
diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
index 16d268345e3b7..42c451045d7c2 100644
--- a/fs/kernfs/dir.c
+++ b/fs/kernfs/dir.c
@@ -51,14 +51,6 @@ static bool kernfs_lockdep(struct kernfs_node *kn)
 #endif
 }
 
-static int kernfs_name_locked(struct kernfs_node *kn, char *buf, size_t buflen)
-{
-	if (!kn)
-		return strscpy(buf, "(null)", buflen);
-
-	return strscpy(buf, rcu_access_pointer(kn->__parent) ? kn->name : "/", buflen);
-}
-
 /* kernfs_node_depth - compute depth from @from to @to */
 static size_t kernfs_depth(struct kernfs_node *from, struct kernfs_node *to)
 {
@@ -168,11 +160,13 @@ static int kernfs_path_from_node_locked(struct kernfs_node *kn_to,
 
 	/* Calculate how many bytes we need for the rest */
 	for (i = depth_to - 1; i >= 0; i--) {
+		const char *name;
 
 		for (kn = kn_to, j = 0; j < i; j++)
 			kn = rcu_dereference(kn->__parent);
 
-		len += scnprintf(buf + len, buflen - len, "/%s", kn->name);
+		name = rcu_dereference(kn->name);
+		len += scnprintf(buf + len, buflen - len, "/%s", name);
 	}
 
 	return len;
@@ -196,13 +190,18 @@ static int kernfs_path_from_node_locked(struct kernfs_node *kn_to,
  */
 int kernfs_name(struct kernfs_node *kn, char *buf, size_t buflen)
 {
-	unsigned long flags;
-	int ret;
+	struct kernfs_node *kn_parent;
 
-	read_lock_irqsave(&kernfs_rename_lock, flags);
-	ret = kernfs_name_locked(kn, buf, buflen);
-	read_unlock_irqrestore(&kernfs_rename_lock, flags);
-	return ret;
+	if (!kn)
+		return strscpy(buf, "(null)", buflen);
+
+	guard(rcu)();
+	/*
+	 * KERNFS_ROOT_INVARIANT_PARENT is ignored here. The name is RCU freed and
+	 * the parent is either existing or not.
+	 */
+	kn_parent = rcu_dereference(kn->__parent);
+	return strscpy(buf, kn_parent ? rcu_dereference(kn->name) : "/", buflen);
 }
 
 /**
@@ -224,14 +223,17 @@ int kernfs_name(struct kernfs_node *kn, char *buf, size_t buflen)
 int kernfs_path_from_node(struct kernfs_node *to, struct kernfs_node *from,
 			  char *buf, size_t buflen)
 {
-	unsigned long flags;
-	int ret;
+	struct kernfs_root *root;
 
 	guard(rcu)();
-	read_lock_irqsave(&kernfs_rename_lock, flags);
-	ret = kernfs_path_from_node_locked(to, from, buf, buflen);
-	read_unlock_irqrestore(&kernfs_rename_lock, flags);
-	return ret;
+	if (to) {
+		root = kernfs_root(to);
+		if (!(root->flags & KERNFS_ROOT_INVARIANT_PARENT)) {
+			guard(read_lock_irqsave)(&kernfs_rename_lock);
+			return kernfs_path_from_node_locked(to, from, buf, buflen);
+		}
+	}
+	return kernfs_path_from_node_locked(to, from, buf, buflen);
 }
 EXPORT_SYMBOL_GPL(kernfs_path_from_node);
 
@@ -338,13 +340,13 @@ static int kernfs_name_compare(unsigned int hash, const char *name,
 		return -1;
 	if (ns > kn->ns)
 		return 1;
-	return strcmp(name, kn->name);
+	return strcmp(name, kernfs_rcu_name(kn));
 }
 
 static int kernfs_sd_compare(const struct kernfs_node *left,
 			     const struct kernfs_node *right)
 {
-	return kernfs_name_compare(left->hash, left->name, left->ns, right);
+	return kernfs_name_compare(left->hash, kernfs_rcu_name(left), left->ns, right);
 }
 
 /**
@@ -542,7 +544,8 @@ static void kernfs_free_rcu(struct rcu_head *rcu)
 {
 	struct kernfs_node *kn = container_of(rcu, struct kernfs_node, rcu);
 
-	kfree_const(kn->name);
+	/* If the whole node goes away, then name can't be used outside */
+	kfree_const(rcu_access_pointer(kn->name));
 
 	if (kn->iattr) {
 		simple_xattrs_free(&kn->iattr->xattrs, NULL);
@@ -571,7 +574,8 @@ void kernfs_put(struct kernfs_node *kn)
 
 	WARN_ONCE(atomic_read(&kn->active) != KN_DEACTIVATED_BIAS,
 		  "kernfs_put: %s/%s: released with incorrect active_ref %d\n",
-		  parent ? parent->name : "", kn->name, atomic_read(&kn->active));
+		  parent ? rcu_dereference(parent->name) : "",
+		  rcu_dereference(kn->name), atomic_read(&kn->active));
 
 	if (kernfs_type(kn) == KERNFS_LINK)
 		kernfs_put(kn->symlink.target_kn);
@@ -648,7 +652,7 @@ static struct kernfs_node *__kernfs_new_node(struct kernfs_root *root,
 	atomic_set(&kn->active, KN_DEACTIVATED_BIAS);
 	RB_CLEAR_NODE(&kn->rb);
 
-	kn->name = name;
+	rcu_assign_pointer(kn->name, name);
 	kn->mode = mode;
 	kn->flags = flags;
 
@@ -786,7 +790,8 @@ int kernfs_add_one(struct kernfs_node *kn)
 	ret = -EINVAL;
 	has_ns = kernfs_ns_enabled(parent);
 	if (WARN(has_ns != (bool)kn->ns, KERN_WARNING "kernfs: ns %s in '%s' for '%s'\n",
-		 has_ns ? "required" : "invalid", parent->name, kn->name))
+		 has_ns ? "required" : "invalid",
+		 kernfs_rcu_name(parent), kernfs_rcu_name(kn)))
 		goto out_unlock;
 
 	if (kernfs_type(parent) != KERNFS_DIR)
@@ -796,7 +801,7 @@ int kernfs_add_one(struct kernfs_node *kn)
 	if (parent->flags & (KERNFS_REMOVING | KERNFS_EMPTY_DIR))
 		goto out_unlock;
 
-	kn->hash = kernfs_name_hash(kn->name, kn->ns);
+	kn->hash = kernfs_name_hash(kernfs_rcu_name(kn), kn->ns);
 
 	ret = kernfs_link_sibling(kn);
 	if (ret)
@@ -852,7 +857,7 @@ static struct kernfs_node *kernfs_find_ns(struct kernfs_node *parent,
 
 	if (has_ns != (bool)ns) {
 		WARN(1, KERN_WARNING "kernfs: ns %s in '%s' for '%s'\n",
-		     has_ns ? "required" : "invalid", parent->name, name);
+		     has_ns ? "required" : "invalid", kernfs_rcu_name(parent), name);
 		return NULL;
 	}
 
@@ -1130,8 +1135,6 @@ static int kernfs_dop_revalidate(struct dentry *dentry, unsigned int flags)
 
 	/* Negative hashed dentry? */
 	if (d_really_is_negative(dentry)) {
-		struct kernfs_node *parent;
-
 		/* If the kernfs parent node has changed discard and
 		 * proceed to ->lookup.
 		 *
@@ -1179,7 +1182,7 @@ static int kernfs_dop_revalidate(struct dentry *dentry, unsigned int flags)
 		goto out_bad;
 
 	/* The kernfs node has been renamed */
-	if (strcmp(dentry->d_name.name, kn->name) != 0)
+	if (strcmp(dentry->d_name.name, kernfs_rcu_name(kn)) != 0)
 		goto out_bad;
 
 	/* The kernfs node has been moved to a different namespace */
@@ -1473,7 +1476,7 @@ static void __kernfs_remove(struct kernfs_node *kn)
 	if (kernfs_parent(kn) && RB_EMPTY_NODE(&kn->rb))
 		return;
 
-	pr_debug("kernfs %s: removing\n", kn->name);
+	pr_debug("kernfs %s: removing\n", kernfs_rcu_name(kn));
 
 	/* prevent new usage by marking all nodes removing and deactivating */
 	pos = NULL;
@@ -1729,7 +1732,7 @@ int kernfs_rename_ns(struct kernfs_node *kn, struct kernfs_node *new_parent,
 {
 	struct kernfs_node *old_parent;
 	struct kernfs_root *root;
-	const char *old_name = NULL;
+	const char *old_name;
 	int error;
 
 	/* can't move or rename root */
@@ -1752,8 +1755,11 @@ int kernfs_rename_ns(struct kernfs_node *kn, struct kernfs_node *new_parent,
 	}
 
 	error = 0;
+	old_name = kernfs_rcu_name(kn);
+	if (!new_name)
+		new_name = old_name;
 	if ((old_parent == new_parent) && (kn->ns == new_ns) &&
-	    (strcmp(kn->name, new_name) == 0))
+	    (strcmp(old_name, new_name) == 0))
 		goto out;	/* nothing to rename */
 
 	error = -EEXIST;
@@ -1761,7 +1767,7 @@ int kernfs_rename_ns(struct kernfs_node *kn, struct kernfs_node *new_parent,
 		goto out;
 
 	/* rename kernfs_node */
-	if (strcmp(kn->name, new_name) != 0) {
+	if (strcmp(old_name, new_name) != 0) {
 		error = -ENOMEM;
 		new_name = kstrdup_const(new_name, GFP_KERNEL);
 		if (!new_name)
@@ -1774,27 +1780,32 @@ int kernfs_rename_ns(struct kernfs_node *kn, struct kernfs_node *new_parent,
 	 * Move to the appropriate place in the appropriate directories rbtree.
 	 */
 	kernfs_unlink_sibling(kn);
-	kernfs_get(new_parent);
 
-	/* rename_lock protects ->parent and ->name accessors */
-	write_lock_irq(&kernfs_rename_lock);
+	/* rename_lock protects ->parent accessors */
+	if (old_parent != new_parent) {
+		kernfs_get(new_parent);
+		write_lock_irq(&kernfs_rename_lock);
 
-	old_parent = kernfs_parent(kn);
-	rcu_assign_pointer(kn->__parent, new_parent);
+		rcu_assign_pointer(kn->__parent, new_parent);
 
-	kn->ns = new_ns;
-	if (new_name) {
-		old_name = kn->name;
-		kn->name = new_name;
+		kn->ns = new_ns;
+		if (new_name)
+			rcu_assign_pointer(kn->name, new_name);
+
+		write_unlock_irq(&kernfs_rename_lock);
+		kernfs_put(old_parent);
+	} else {
+		/* name assignment is RCU protected, parent is the same */
+		kn->ns = new_ns;
+		if (new_name)
+			rcu_assign_pointer(kn->name, new_name);
 	}
 
-	write_unlock_irq(&kernfs_rename_lock);
-
-	kn->hash = kernfs_name_hash(kn->name, kn->ns);
+	kn->hash = kernfs_name_hash(new_name ?: old_name, kn->ns);
 	kernfs_link_sibling(kn);
 
-	kernfs_put(old_parent);
-	kfree_const(old_name);
+	if (new_name && !is_kernel_rodata((unsigned long)old_name))
+		kfree_rcu_mightsleep(old_name);
 
 	error = 0;
  out:
@@ -1878,7 +1889,7 @@ static int kernfs_fop_readdir(struct file *file, struct dir_context *ctx)
 	for (pos = kernfs_dir_pos(ns, parent, ctx->pos, pos);
 	     pos;
 	     pos = kernfs_dir_next_pos(ns, parent, ctx->pos, pos)) {
-		const char *name = pos->name;
+		const char *name = kernfs_rcu_name(pos);
 		unsigned int type = fs_umode_to_dtype(pos->mode);
 		int len = strlen(name);
 		ino_t ino = kernfs_ino(pos);
diff --git a/fs/kernfs/file.c b/fs/kernfs/file.c
index 165d8e37976ba..970bf37a37fb7 100644
--- a/fs/kernfs/file.c
+++ b/fs/kernfs/file.c
@@ -915,6 +915,7 @@ static void kernfs_notify_workfn(struct work_struct *work)
 	list_for_each_entry(info, &kernfs_root(kn)->supers, node) {
 		struct kernfs_node *parent;
 		struct inode *p_inode = NULL;
+		const char *kn_name;
 		struct inode *inode;
 		struct qstr name;
 
@@ -928,7 +929,8 @@ static void kernfs_notify_workfn(struct work_struct *work)
 		if (!inode)
 			continue;
 
-		name = (struct qstr)QSTR_INIT(kn->name, strlen(kn->name));
+		kn_name = kernfs_rcu_name(kn);
+		name = (struct qstr)QSTR_INIT(kn_name, strlen(kn_name));
 		parent = kernfs_get_parent(kn);
 		if (parent) {
 			p_inode = ilookup(info->sb, kernfs_ino(parent));
diff --git a/fs/kernfs/kernfs-internal.h b/fs/kernfs/kernfs-internal.h
index c43bee18b79f7..40a2a9cd819d0 100644
--- a/fs/kernfs/kernfs-internal.h
+++ b/fs/kernfs/kernfs-internal.h
@@ -107,6 +107,11 @@ static inline bool kernfs_root_is_locked(const struct kernfs_node *kn)
 	return lockdep_is_held(&kernfs_root(kn)->kernfs_rwsem);
 }
 
+static inline const char *kernfs_rcu_name(const struct kernfs_node *kn)
+{
+	return rcu_dereference_check(kn->name, kernfs_root_is_locked(kn));
+}
+
 static inline struct kernfs_node *kernfs_parent(const struct kernfs_node *kn)
 {
 	/*
diff --git a/fs/kernfs/mount.c b/fs/kernfs/mount.c
index 2252b16e6ef0b..d1f512b7bf867 100644
--- a/fs/kernfs/mount.c
+++ b/fs/kernfs/mount.c
@@ -231,6 +231,7 @@ struct dentry *kernfs_node_dentry(struct kernfs_node *kn,
 	do {
 		struct dentry *dtmp;
 		struct kernfs_node *kntmp;
+		const char *name;
 
 		if (kn == knparent)
 			return dentry;
@@ -239,8 +240,8 @@ struct dentry *kernfs_node_dentry(struct kernfs_node *kn,
 			dput(dentry);
 			return ERR_PTR(-EINVAL);
 		}
-		dtmp = lookup_positive_unlocked(kntmp->name, dentry,
-					       strlen(kntmp->name));
+		name = rcu_dereference(kntmp->name);
+		dtmp = lookup_positive_unlocked(name, dentry, strlen(name));
 		dput(dentry);
 		if (IS_ERR(dtmp))
 			return dtmp;
diff --git a/fs/kernfs/symlink.c b/fs/kernfs/symlink.c
index 05c62ca93c53d..0bd8a2143723d 100644
--- a/fs/kernfs/symlink.c
+++ b/fs/kernfs/symlink.c
@@ -81,7 +81,7 @@ static int kernfs_get_target_path(struct kernfs_node *parent,
 	/* determine end of target string for reverse fillup */
 	kn = target;
 	while (kernfs_parent(kn) && kn != base) {
-		len += strlen(kn->name) + 1;
+		len += strlen(kernfs_rcu_name(kn)) + 1;
 		kn = kernfs_parent(kn);
 	}
 
@@ -95,10 +95,11 @@ static int kernfs_get_target_path(struct kernfs_node *parent,
 	/* reverse fillup of target string from target to base */
 	kn = target;
 	while (kernfs_parent(kn) && kn != base) {
-		int slen = strlen(kn->name);
+		const char *name = kernfs_rcu_name(kn);
+		int slen = strlen(name);
 
 		len -= slen;
-		memcpy(s + len, kn->name, slen);
+		memcpy(s + len, name, slen);
 		if (len)
 			s[--len] = '/';
 
diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
index 4df2afa551dc6..94e12efd92f21 100644
--- a/fs/sysfs/dir.c
+++ b/fs/sysfs/dir.c
@@ -123,7 +123,7 @@ int sysfs_move_dir_ns(struct kobject *kobj, struct kobject *new_parent_kobj,
 	new_parent = new_parent_kobj && new_parent_kobj->sd ?
 		new_parent_kobj->sd : sysfs_root_kn;
 
-	return kernfs_rename_ns(kn, new_parent, kn->name, new_ns);
+	return kernfs_rename_ns(kn, new_parent, NULL, new_ns);
 }
 
 /**
diff --git a/include/linux/kernfs.h b/include/linux/kernfs.h
index 5dda9a268e44c..b5a5f32fdfd1a 100644
--- a/include/linux/kernfs.h
+++ b/include/linux/kernfs.h
@@ -204,8 +204,8 @@ struct kernfs_node {
 	 * never moved to a different parent, it is safe to access the
 	 * parent directly.
 	 */
-	const char		*name;
 	struct kernfs_node	__rcu *__parent;
+	const char		__rcu *name;
 
 	struct rb_node		rb;
 
@@ -400,7 +400,7 @@ static inline bool kernfs_ns_enabled(struct kernfs_node *kn)
 }
 
 int kernfs_name(struct kernfs_node *kn, char *buf, size_t buflen);
-int kernfs_path_from_node(struct kernfs_node *root_kn, struct kernfs_node *kn,
+int kernfs_path_from_node(struct kernfs_node *kn_to, struct kernfs_node *kn_from,
 			  char *buf, size_t buflen);
 void pr_cont_kernfs_name(struct kernfs_node *kn);
 void pr_cont_kernfs_path(struct kernfs_node *kn);
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 171dd7fceac54..9e92057d2caac 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -3583,10 +3583,13 @@ static int selinux_kernfs_init_security(struct kernfs_node *kn_dir,
 		newsid = tsec->create_sid;
 	} else {
 		u16 secclass = inode_mode_to_security_class(kn->mode);
+		const char *kn_name;
 		struct qstr q;
 
-		q.name = kn->name;
-		q.hash_len = hashlen_string(kn_dir, kn->name);
+		/* kn is fresh, can't be renamed, name goes not away */
+		kn_name = rcu_dereference_check(kn->name, true);
+		q.name = kn_name;
+		q.hash_len = hashlen_string(kn_dir, kn_name);
 
 		rc = security_transition_sid(tsec->sid,
 					     parent_sid, secclass, &q,
-- 
2.47.2


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH v5 5/6] kernfs: Use RCU to access kernfs_node::parent.
  2025-01-28  8:42 ` [PATCH v5 5/6] kernfs: Use RCU to access kernfs_node::parent Sebastian Andrzej Siewior
@ 2025-01-28 20:31   ` Tejun Heo
  2025-01-29 13:23     ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 14+ messages in thread
From: Tejun Heo @ 2025-01-28 20:31 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: cgroups, linux-kernel, Michal Koutný, Paul E. McKenney,
	Boqun Feng, Greg Kroah-Hartman, Hillf Danton, Johannes Weiner,
	Marco Elver, tglx

Hello,

Mostly look great to me. Left mostly minor comments.

On Tue, Jan 28, 2025 at 09:42:25AM +0100, Sebastian Andrzej Siewior wrote:
> @@ -947,10 +947,20 @@ static int rdt_last_cmd_status_show(struct kernfs_open_file *of,
>  	return 0;
>  }
>  
> +static void *rdt_get_kn_parent_priv(struct kernfs_node *kn)
> +{

nit: Rename rdt_kn_parent_priv() to be consistent with other accessors?

> diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
> index 5a1fea414996e..16d268345e3b7 100644
> --- a/fs/kernfs/dir.c
> +++ b/fs/kernfs/dir.c
> @@ -64,9 +64,9 @@ static size_t kernfs_depth(struct kernfs_node *from, struct kernfs_node *to)
>  {
>  	size_t depth = 0;
>  
> -	while (to->parent && to != from) {
> +	while (rcu_dereference(to->__parent) && to != from) {

Why not use kernfs_parent() here and other places?

> @@ -226,6 +227,7 @@ int kernfs_path_from_node(struct kernfs_node *to, struct kernfs_node *from,
>  	unsigned long flags;
>  	int ret;
>  
> +	guard(rcu)();

Doesn't irqsave imply rcu?

> @@ -558,11 +567,7 @@ void kernfs_put(struct kernfs_node *kn)
>  		return;
>  	root = kernfs_root(kn);
>   repeat:
> -	/*
> -	 * Moving/renaming is always done while holding reference.
> -	 * kn->parent won't change beneath us.
> -	 */
> -	parent = kn->parent;
> +	parent = kernfs_parent(kn);

Not a strong opinion but I'd keep the comment. Reader can go read the
definition of kernfs_parent() but no harm in explaining the subtlety where
it's used.

> @@ -1376,7 +1388,7 @@ static void kernfs_activate_one(struct kernfs_node *kn)
>  	if (kernfs_active(kn) || (kn->flags & (KERNFS_HIDDEN | KERNFS_REMOVING)))
>  		return;
>  
> -	WARN_ON_ONCE(kn->parent && RB_EMPTY_NODE(&kn->rb));
> +	WARN_ON_ONCE(kernfs_parent(kn) && RB_EMPTY_NODE(&kn->rb));

Minor but this one can be rcu_access_pointer() too.

> @@ -1794,7 +1813,7 @@ static struct kernfs_node *kernfs_dir_pos(const void *ns,
>  {
>  	if (pos) {
>  		int valid = kernfs_active(pos) &&
> -			pos->parent == parent && hash == pos->hash;
> +			kernfs_parent(pos) == parent && hash == pos->hash;

Ditto with rcu_access_pointer(). Using kernfs_parent() here is fine too but
it's a bit messy to mix the two for similar cases. Let's stick to either
rcu_access_pointer() or kernfs_parent().

> diff --git a/fs/kernfs/kernfs-internal.h b/fs/kernfs/kernfs-internal.h
> index b42ee6547cdc1..c43bee18b79f7 100644
> --- a/fs/kernfs/kernfs-internal.h
> +++ b/fs/kernfs/kernfs-internal.h
> @@ -64,11 +66,14 @@ struct kernfs_root {
>   *
>   * Return: the kernfs_root @kn belongs to.
>   */
> -static inline struct kernfs_root *kernfs_root(struct kernfs_node *kn)
> +static inline struct kernfs_root *kernfs_root(const struct kernfs_node *kn)
>  {
> +	const struct kernfs_node *knp;
>  	/* if parent exists, it's always a dir; otherwise, @sd is a dir */
> -	if (kn->parent)
> -		kn = kn->parent;
> +	guard(rcu)();
> +	knp = rcu_dereference(kn->__parent);
> +	if (knp)
> +		kn = knp;
>  	return kn->dir.root;
>  }

This isn't a new problem but the addition of the rcu guard makes it stick
out more: What keeps the returned root safe to dereference?

> diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
> index d9061bd55436b..214aa378936cd 100644
> --- a/kernel/cgroup/cgroup.c
> +++ b/kernel/cgroup/cgroup.c
> @@ -633,9 +633,22 @@ int cgroup_task_count(const struct cgroup *cgrp)
>  	return count;
>  }
>  
> +static struct cgroup *kn_get_priv(struct kernfs_node *kn)
> +{
> +	struct kernfs_node *parent;
> +	/*
> +	 * The parent can not be replaced due to KERNFS_ROOT_INVARIANT_PARENT.
> +	 * Therefore it is always safe to dereference this pointer outside of a
> +	 * RCU section.
> +	 */
> +	parent = rcu_dereference_check(kn->__parent,
> +				       kernfs_root_flags(kn) & KERNFS_ROOT_INVARIANT_PARENT);
> +	return parent->priv;
> +}

kn_priv()?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v5 6/6] kernfs: Use RCU to access kernfs_node::name.
  2025-01-28  8:42 ` [PATCH v5 6/6] kernfs: Use RCU to access kernfs_node::name Sebastian Andrzej Siewior
@ 2025-01-28 20:40   ` Tejun Heo
  2025-01-29 14:21     ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 14+ messages in thread
From: Tejun Heo @ 2025-01-28 20:40 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: cgroups, linux-kernel, Michal Koutný, Paul E. McKenney,
	Boqun Feng, Greg Kroah-Hartman, Hillf Danton, Johannes Weiner,
	Marco Elver, tglx, syzbot+6ea37e2e6ffccf41a7e6

On Tue, Jan 28, 2025 at 09:42:26AM +0100, Sebastian Andrzej Siewior wrote:
> Using RCU lifetime rules to access kernfs_node::name can avoid the
> trouble kernfs_rename_lock in kernfs_name() and kernfs_path_from_node()
> if the fs was created with KERNFS_ROOT_INVARIANT_PARENT. This is useful
> as it allows to implement kernfs_path_from_node() only with RCU
> protection and avoiding kernfs_rename_lock. The lock is only required if
> the __parent node can be changed and the function requires an unchanged
> hierarchy while it iterates from the node to its parent.

A short mention of how avoiding kernfs_rename_lock matters would be great -
ie. where did this show up?

> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
> index 955999aecfca9..cc83dbc70a8ef 100644
> --- a/arch/x86/kernel/cpu/resctrl/internal.h
> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
> @@ -505,6 +505,11 @@ int parse_bw(struct rdt_parse_data *data, struct resctrl_schema *s,
>  
>  extern struct mutex rdtgroup_mutex;
>  
> +static inline const char *rdt_kn_get_name(const struct kernfs_node *kn)
> +{
> +	return rcu_dereference_check(kn->name, lockdep_is_held(&rdtgroup_mutex));
> +}

Maybe rdt_kn_name()?

Other than those,

Acked-by: Tejun Heo <tj@kernel.org>

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v5 5/6] kernfs: Use RCU to access kernfs_node::parent.
  2025-01-28 20:31   ` Tejun Heo
@ 2025-01-29 13:23     ` Sebastian Andrzej Siewior
  2025-01-29 16:54       ` Tejun Heo
  0 siblings, 1 reply; 14+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-01-29 13:23 UTC (permalink / raw)
  To: Tejun Heo
  Cc: cgroups, linux-kernel, Michal Koutný, Paul E. McKenney,
	Boqun Feng, Greg Kroah-Hartman, Hillf Danton, Johannes Weiner,
	Marco Elver, tglx

On 2025-01-28 10:31:47 [-1000], Tejun Heo wrote:
> Hello,
Hi,

> Mostly look great to me. Left mostly minor comments.
> 
> On Tue, Jan 28, 2025 at 09:42:25AM +0100, Sebastian Andrzej Siewior wrote:
> > @@ -947,10 +947,20 @@ static int rdt_last_cmd_status_show(struct kernfs_open_file *of,
> >  	return 0;
> >  }
> >  
> > +static void *rdt_get_kn_parent_priv(struct kernfs_node *kn)
> > +{
> 
> nit: Rename rdt_kn_parent_priv() to be consistent with other accessors?

Oh, indeed.

> > diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
> > index 5a1fea414996e..16d268345e3b7 100644
> > --- a/fs/kernfs/dir.c
> > +++ b/fs/kernfs/dir.c
> > @@ -64,9 +64,9 @@ static size_t kernfs_depth(struct kernfs_node *from, struct kernfs_node *to)
> >  {
> >  	size_t depth = 0;
> >  
> > -	while (to->parent && to != from) {
> > +	while (rcu_dereference(to->__parent) && to != from) {
> 
> Why not use kernfs_parent() here and other places?

Because it is from within RCU section and the other checks are not
required. If you prefer this instead, I sure can update it.

> > @@ -226,6 +227,7 @@ int kernfs_path_from_node(struct kernfs_node *to, struct kernfs_node *from,
> >  	unsigned long flags;
> >  	int ret;
> >  
> > +	guard(rcu)();
> 
> Doesn't irqsave imply rcu?

hmm. It kind of does based on the current implementation but it is not
obvious. We had RCU-sched and RCU which got merged. From then on, the
(implied) preempt-off part of IRQSAVE should imply RCU (section).
It is good to be obvious about RCU.
Also, rcu_dereference() will complain about missing RCU annotation. On
PREEMPT_RT rcu_dereference_sched() will complain because irqsave (in
this case) will not disable interrupts.

> > @@ -558,11 +567,7 @@ void kernfs_put(struct kernfs_node *kn)
> >  		return;
> >  	root = kernfs_root(kn);
> >   repeat:
> > -	/*
> > -	 * Moving/renaming is always done while holding reference.
> > -	 * kn->parent won't change beneath us.
> > -	 */
> > -	parent = kn->parent;
> > +	parent = kernfs_parent(kn);
> 
> Not a strong opinion but I'd keep the comment. Reader can go read the
> definition of kernfs_parent() but no harm in explaining the subtlety where
> it's used.

Okay. will bring it back.

> > @@ -1376,7 +1388,7 @@ static void kernfs_activate_one(struct kernfs_node *kn)
> >  	if (kernfs_active(kn) || (kn->flags & (KERNFS_HIDDEN | KERNFS_REMOVING)))
> >  		return;
> >  
> > -	WARN_ON_ONCE(kn->parent && RB_EMPTY_NODE(&kn->rb));
> > +	WARN_ON_ONCE(kernfs_parent(kn) && RB_EMPTY_NODE(&kn->rb));
> 
> Minor but this one can be rcu_access_pointer() too.
ok.

> > @@ -1794,7 +1813,7 @@ static struct kernfs_node *kernfs_dir_pos(const void *ns,
> >  {
> >  	if (pos) {
> >  		int valid = kernfs_active(pos) &&
> > -			pos->parent == parent && hash == pos->hash;
> > +			kernfs_parent(pos) == parent && hash == pos->hash;
> 
> Ditto with rcu_access_pointer(). Using kernfs_parent() here is fine too but
> it's a bit messy to mix the two for similar cases. Let's stick to either
> rcu_access_pointer() or kernfs_parent().

I make both (kernfs_activate_one() and kernfs_dir_pos) use
rcu_access_pointer() then.

> > diff --git a/fs/kernfs/kernfs-internal.h b/fs/kernfs/kernfs-internal.h
> > index b42ee6547cdc1..c43bee18b79f7 100644
> > --- a/fs/kernfs/kernfs-internal.h
> > +++ b/fs/kernfs/kernfs-internal.h
> > @@ -64,11 +66,14 @@ struct kernfs_root {
> >   *
> >   * Return: the kernfs_root @kn belongs to.
> >   */
> > -static inline struct kernfs_root *kernfs_root(struct kernfs_node *kn)
> > +static inline struct kernfs_root *kernfs_root(const struct kernfs_node *kn)
> >  {
> > +	const struct kernfs_node *knp;
> >  	/* if parent exists, it's always a dir; otherwise, @sd is a dir */
> > -	if (kn->parent)
> > -		kn = kn->parent;
> > +	guard(rcu)();
> > +	knp = rcu_dereference(kn->__parent);
> > +	if (knp)
> > +		kn = knp;
> >  	return kn->dir.root;
> >  }
> 
> This isn't a new problem but the addition of the rcu guard makes it stick
> out more: What keeps the returned root safe to dereference?

As far as I understand it kernfs_root is around as long as the
filesystem itself is around which means at least one node needs to stay.
If you have a pointer to a kernfs_node you should own a reference.
The RCU section is only needed to ensure that the (current) __parent is
not replaced and then deallocated before the caller had a chance to
obtain the root pointer.

> > diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
> > index d9061bd55436b..214aa378936cd 100644
> > --- a/kernel/cgroup/cgroup.c
> > +++ b/kernel/cgroup/cgroup.c
> > @@ -633,9 +633,22 @@ int cgroup_task_count(const struct cgroup *cgrp)
> >  	return count;
> >  }
> >  
> > +static struct cgroup *kn_get_priv(struct kernfs_node *kn)
> > +{
> > +	struct kernfs_node *parent;
> > +	/*
> > +	 * The parent can not be replaced due to KERNFS_ROOT_INVARIANT_PARENT.
> > +	 * Therefore it is always safe to dereference this pointer outside of a
> > +	 * RCU section.
> > +	 */
> > +	parent = rcu_dereference_check(kn->__parent,
> > +				       kernfs_root_flags(kn) & KERNFS_ROOT_INVARIANT_PARENT);
> > +	return parent->priv;
> > +}
> 
> kn_priv()?

Oh, yes.

> Thanks.

Sebastian

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v5 6/6] kernfs: Use RCU to access kernfs_node::name.
  2025-01-28 20:40   ` Tejun Heo
@ 2025-01-29 14:21     ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 14+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-01-29 14:21 UTC (permalink / raw)
  To: Tejun Heo
  Cc: cgroups, linux-kernel, Michal Koutný, Paul E. McKenney,
	Boqun Feng, Greg Kroah-Hartman, Hillf Danton, Johannes Weiner,
	Marco Elver, tglx, syzbot+6ea37e2e6ffccf41a7e6

On 2025-01-28 10:40:09 [-1000], Tejun Heo wrote:
> On Tue, Jan 28, 2025 at 09:42:26AM +0100, Sebastian Andrzej Siewior wrote:
> > Using RCU lifetime rules to access kernfs_node::name can avoid the
> > trouble kernfs_rename_lock in kernfs_name() and kernfs_path_from_node()
> > if the fs was created with KERNFS_ROOT_INVARIANT_PARENT. This is useful
> > as it allows to implement kernfs_path_from_node() only with RCU
> > protection and avoiding kernfs_rename_lock. The lock is only required if
> > the __parent node can be changed and the function requires an unchanged
> > hierarchy while it iterates from the node to its parent.
> 
> A short mention of how avoiding kernfs_rename_lock matters would be great -
> ie. where did this show up?

I extended it:
| Using RCU lifetime rules to access kernfs_node::name can avoid the
| trouble with kernfs_rename_lock in kernfs_name() and kernfs_path_from_node()
| if the fs was created with KERNFS_ROOT_INVARIANT_PARENT. This is usefull
| as it allows to implement kernfs_path_from_node() only with RCU
| protection and avoiding kernfs_rename_lock. The lock is only required if
| the __parent node can be changed and the function requires an unchanged
| hierarchy while it iterates from the node to its parent.
starting here->
| The change is needed to allow the lookup of the node's path
| (kernfs_path_from_node()) from context which runs always with disabled
| preemption and or interrutps even on PREEMPT_RT. The problem is that
| kernfs_rename_lock becomes a sleeping lock on PREEMPT_RT.

Sebastian

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v5 5/6] kernfs: Use RCU to access kernfs_node::parent.
  2025-01-29 13:23     ` Sebastian Andrzej Siewior
@ 2025-01-29 16:54       ` Tejun Heo
  2025-01-29 20:26         ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 14+ messages in thread
From: Tejun Heo @ 2025-01-29 16:54 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: cgroups, linux-kernel, Michal Koutný, Paul E. McKenney,
	Boqun Feng, Greg Kroah-Hartman, Hillf Danton, Johannes Weiner,
	Marco Elver, tglx

Hello,

On Wed, Jan 29, 2025 at 02:23:11PM +0100, Sebastian Andrzej Siewior wrote:
> > > @@ -64,9 +64,9 @@ static size_t kernfs_depth(struct kernfs_node *from, struct kernfs_node *to)
> > >  {
> > >  	size_t depth = 0;
> > >  
> > > -	while (to->parent && to != from) {
> > > +	while (rcu_dereference(to->__parent) && to != from) {
> > 
> > Why not use kernfs_parent() here and other places?
> 
> Because it is from within RCU section and the other checks are not
> required. If you prefer this instead, I sure can update it.

Hmm... I would have gone with using the same accessor everywhere but am not
sure how strongly I feel about it. I don't think it's useful to worry about
the overhead of the extra lockdep annotations in debug builds. Ignoring that
and just considering code readability, what would you do?

> > > @@ -226,6 +227,7 @@ int kernfs_path_from_node(struct kernfs_node *to, struct kernfs_node *from,
> > >  	unsigned long flags;
> > >  	int ret;
> > >  
> > > +	guard(rcu)();
> > 
> > Doesn't irqsave imply rcu?
> 
> hmm. It kind of does based on the current implementation but it is not
> obvious. We had RCU-sched and RCU which got merged. From then on, the
> (implied) preempt-off part of IRQSAVE should imply RCU (section).
> It is good to be obvious about RCU.

There's that but it can also be confusing to have redundant annotations
especially because redundant things tend to become really inconsistent over
time. After the RCU type merges, ISTR removing annotations that became
redundant in several places.

> Also, rcu_dereference() will complain about missing RCU annotation. On
> PREEMPT_RT rcu_dereference_sched() will complain because irqsave (in
> this case) will not disable interrupts.

You know this a lot better than I do. If it's necessary for RT builds, it's
not redundant.

> > > diff --git a/fs/kernfs/kernfs-internal.h b/fs/kernfs/kernfs-internal.h
> > > index b42ee6547cdc1..c43bee18b79f7 100644
> > > --- a/fs/kernfs/kernfs-internal.h
> > > +++ b/fs/kernfs/kernfs-internal.h
> > > @@ -64,11 +66,14 @@ struct kernfs_root {
> > >   *
> > >   * Return: the kernfs_root @kn belongs to.
> > >   */
> > > -static inline struct kernfs_root *kernfs_root(struct kernfs_node *kn)
> > > +static inline struct kernfs_root *kernfs_root(const struct kernfs_node *kn)
> > >  {
> > > +	const struct kernfs_node *knp;
> > >  	/* if parent exists, it's always a dir; otherwise, @sd is a dir */
> > > -	if (kn->parent)
> > > -		kn = kn->parent;
> > > +	guard(rcu)();
> > > +	knp = rcu_dereference(kn->__parent);
> > > +	if (knp)
> > > +		kn = knp;
> > >  	return kn->dir.root;
> > >  }
> > 
> > This isn't a new problem but the addition of the rcu guard makes it stick
> > out more: What keeps the returned root safe to dereference?
> 
> As far as I understand it kernfs_root is around as long as the
> filesystem itself is around which means at least one node needs to stay.
> If you have a pointer to a kernfs_node you should own a reference.
> The RCU section is only needed to ensure that the (current) __parent is
> not replaced and then deallocated before the caller had a chance to
> obtain the root pointer.

That sounds reasonable to me.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v5 5/6] kernfs: Use RCU to access kernfs_node::parent.
  2025-01-29 16:54       ` Tejun Heo
@ 2025-01-29 20:26         ` Sebastian Andrzej Siewior
  2025-01-29 20:29           ` Tejun Heo
  0 siblings, 1 reply; 14+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-01-29 20:26 UTC (permalink / raw)
  To: Tejun Heo
  Cc: cgroups, linux-kernel, Michal Koutný, Paul E. McKenney,
	Boqun Feng, Greg Kroah-Hartman, Hillf Danton, Johannes Weiner,
	Marco Elver, tglx

On 2025-01-29 06:54:33 [-1000], Tejun Heo wrote:
> Hello,
> 
> On Wed, Jan 29, 2025 at 02:23:11PM +0100, Sebastian Andrzej Siewior wrote:
> > > > @@ -64,9 +64,9 @@ static size_t kernfs_depth(struct kernfs_node *from, struct kernfs_node *to)
> > > >  {
> > > >  	size_t depth = 0;
> > > >  
> > > > -	while (to->parent && to != from) {
> > > > +	while (rcu_dereference(to->__parent) && to != from) {
> > > 
> > > Why not use kernfs_parent() here and other places?
> > 
> > Because it is from within RCU section and the other checks are not
> > required. If you prefer this instead, I sure can update it.
> 
> Hmm... I would have gone with using the same accessor everywhere but am not
> sure how strongly I feel about it. I don't think it's useful to worry about
> the overhead of the extra lockdep annotations in debug builds. Ignoring that
> and just considering code readability, what would you do?

It is your call. I would prefer to open code that part that we do only
rely on RCU here but sure understand that you don't care about the
details and want to have only one accessor.

> > > > @@ -226,6 +227,7 @@ int kernfs_path_from_node(struct kernfs_node *to, struct kernfs_node *from,
> > > >  	unsigned long flags;
> > > >  	int ret;
> > > >  
> > > > +	guard(rcu)();
> > > 
> > > Doesn't irqsave imply rcu?
> > Also, rcu_dereference() will complain about missing RCU annotation. On
> > PREEMPT_RT rcu_dereference_sched() will complain because irqsave (in
> > this case) will not disable interrupts.
> 
> You know this a lot better than I do. If it's necessary for RT builds, it's
> not redundant.

Good.

> Thanks.

Sebastian

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v5 5/6] kernfs: Use RCU to access kernfs_node::parent.
  2025-01-29 20:26         ` Sebastian Andrzej Siewior
@ 2025-01-29 20:29           ` Tejun Heo
  0 siblings, 0 replies; 14+ messages in thread
From: Tejun Heo @ 2025-01-29 20:29 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: cgroups, linux-kernel, Michal Koutný, Paul E. McKenney,
	Boqun Feng, Greg Kroah-Hartman, Hillf Danton, Johannes Weiner,
	Marco Elver, tglx

On Wed, Jan 29, 2025 at 09:26:14PM +0100, Sebastian Andrzej Siewior wrote:
...
> > Hmm... I would have gone with using the same accessor everywhere but am not
> > sure how strongly I feel about it. I don't think it's useful to worry about
> > the overhead of the extra lockdep annotations in debug builds. Ignoring that
> > and just considering code readability, what would you do?
> 
> It is your call. I would prefer to open code that part that we do only
> rely on RCU here but sure understand that you don't care about the
> details and want to have only one accessor.

It's kinda nitpicky. Let's just keep what you did.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2025-01-29 20:29 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-01-28  8:42 [PATCH v5 0/6] kernfs: Use RCU to access kernfs_node::{parent|name} Sebastian Andrzej Siewior
2025-01-28  8:42 ` [PATCH v5 1/6] kernfs: Acquire kernfs_rwsem in kernfs_notify_workfn() Sebastian Andrzej Siewior
2025-01-28  8:42 ` [PATCH v5 2/6] kernfs: Acquire kernfs_rwsem in kernfs_get_parent_dentry() Sebastian Andrzej Siewior
2025-01-28  8:42 ` [PATCH v5 3/6] kernfs: Acquire kernfs_rwsem in kernfs_node_dentry() Sebastian Andrzej Siewior
2025-01-28  8:42 ` [PATCH v5 4/6] kernfs: Don't re-lock kernfs_root::kernfs_rwsem in kernfs_fop_readdir() Sebastian Andrzej Siewior
2025-01-28  8:42 ` [PATCH v5 5/6] kernfs: Use RCU to access kernfs_node::parent Sebastian Andrzej Siewior
2025-01-28 20:31   ` Tejun Heo
2025-01-29 13:23     ` Sebastian Andrzej Siewior
2025-01-29 16:54       ` Tejun Heo
2025-01-29 20:26         ` Sebastian Andrzej Siewior
2025-01-29 20:29           ` Tejun Heo
2025-01-28  8:42 ` [PATCH v5 6/6] kernfs: Use RCU to access kernfs_node::name Sebastian Andrzej Siewior
2025-01-28 20:40   ` Tejun Heo
2025-01-29 14:21     ` Sebastian Andrzej Siewior

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).