public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCHSET driver-core-next] kernfs: Protect kernfs_find_and_get_node_by_id() with RCU
@ 2024-01-09 21:48 Tejun Heo
  2024-01-09 21:48 ` [PATCH 1/3] Revert "kernfs: convert kernfs_idr_lock to an irq safe raw spinlock" Tejun Heo
                   ` (5 more replies)
  0 siblings, 6 replies; 10+ messages in thread
From: Tejun Heo @ 2024-01-09 21:48 UTC (permalink / raw)
  To: gregkh; +Cc: andrea.righi, ast, linux-kernel, geert

The BPF helper bpf_cgroup_from_id() calls kernfs_find_and_get_node_by_id()
which acquires kernfs_idr_lock, which is an non-raw non-IRQ-safe lock.
kernfs_idr_lock used to be a non-irq-safe lock which could lead to deadlocks
as bpf_cgroup_from_id() can be called from any BPF programs including e.g.
the ones that attach to functions which are holding the scheduler rq lock.

To resolve the situation dad3fb67ca1c ("kernfs: convert kernfs_idr_lock to
an irq safe raw spinlock") converted kernfs_idr_lock to an irq-safe raw
spinlock. However, this was also broken as we call idr_alloc*() while
holding the lock and idr itself uses an non-irq-safe lock and also calls
into memory allocator.

Let's instead RCU protect kernfs_node and kernfs_root so that
kernfs_find_and_get_node_by_id() can use rcu_read_lock() instead of
kernfs_idr_lock. While this unfortunately increases the size of kernfs_node,
it's the most straightforward thing to do and there likely are other places
that can take advantage of RCU protection and improve scalability too.

Please see the patch descriptions for more details.

This patchset is on top of the current driver-core-next - dad3fb67ca1c
("kernfs: convert kernfs_idr_lock to an irq safe raw spinlock"), and also
available in the following git branch.

 git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc.git kernfs-use-rcu

Thanks.

--
tejun

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 1/3] Revert "kernfs: convert kernfs_idr_lock to an irq safe raw spinlock"
  2024-01-09 21:48 [PATCHSET driver-core-next] kernfs: Protect kernfs_find_and_get_node_by_id() with RCU Tejun Heo
@ 2024-01-09 21:48 ` Tejun Heo
  2024-01-09 21:48 ` [PATCH 2/3] kernfs: Rearrange kernfs_node fields to reduce its size on 64bit Tejun Heo
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 10+ messages in thread
From: Tejun Heo @ 2024-01-09 21:48 UTC (permalink / raw)
  To: gregkh; +Cc: andrea.righi, ast, linux-kernel, geert, Tejun Heo

This reverts commit dad3fb67ca1cbef87ce700e83a55835e5921ce8a.

The commit converted kernfs_idr_lock to an IRQ-safe raw_spinlock because it
could be acquired while holding an rq lock through bpf_cgroup_from_id().
However, kernfs_idr_lock is held while doing GPF_NOWAIT allocations which
involves acquiring an non-IRQ-safe and non-raw lock leading to the following
lockdep warning:

  =============================
  [ BUG: Invalid wait context ]
  6.7.0-rc5-kzm9g-00251-g655022a45b1c #578 Not tainted
  -----------------------------
  swapper/0/0 is trying to lock:
  dfbcd488 (&c->lock){....}-{3:3}, at: local_lock_acquire+0x0/0xa4
  other info that might help us debug this:
  context-{5:5}
  2 locks held by swapper/0/0:
   #0: dfbc9c60 (lock){+.+.}-{3:3}, at: local_lock_acquire+0x0/0xa4
   #1: c0c012a8 (kernfs_idr_lock){....}-{2:2}, at: __kernfs_new_node.constprop.0+0x68/0x258
  stack backtrace:
  CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.7.0-rc5-kzm9g-00251-g655022a45b1c #578
  Hardware name: Generic SH73A0 (Flattened Device Tree)
   unwind_backtrace from show_stack+0x10/0x14
   show_stack from dump_stack_lvl+0x68/0x90
   dump_stack_lvl from __lock_acquire+0x3cc/0x168c
   __lock_acquire from lock_acquire+0x274/0x30c
   lock_acquire from local_lock_acquire+0x28/0xa4
   local_lock_acquire from ___slab_alloc+0x234/0x8a8
   ___slab_alloc from __slab_alloc.constprop.0+0x30/0x44
   __slab_alloc.constprop.0 from kmem_cache_alloc+0x7c/0x148
   kmem_cache_alloc from radix_tree_node_alloc.constprop.0+0x44/0xdc
   radix_tree_node_alloc.constprop.0 from idr_get_free+0x110/0x2b8
   idr_get_free from idr_alloc_u32+0x9c/0x108
   idr_alloc_u32 from idr_alloc_cyclic+0x50/0xb8
   idr_alloc_cyclic from __kernfs_new_node.constprop.0+0x88/0x258
   __kernfs_new_node.constprop.0 from kernfs_create_root+0xbc/0x154
   kernfs_create_root from sysfs_init+0x18/0x5c
   sysfs_init from mnt_init+0xc4/0x220
   mnt_init from vfs_caches_init+0x6c/0x88
   vfs_caches_init from start_kernel+0x474/0x528
   start_kernel from 0x0

Let's rever the commit. It's undesirable to spread out raw spinlock usage
anyway and the problem can be solved by protecting the lookup path with RCU
instead.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Andrea Righi <andrea.righi@canonical.com>
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Link: http://lkml.kernel.org/r/CAMuHMdV=AKt+mwY7svEq5gFPx41LoSQZ_USME5_MEdWQze13ww@mail.gmail.com
---
 fs/kernfs/dir.c | 23 ++++++++++-------------
 1 file changed, 10 insertions(+), 13 deletions(-)

diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
index a62960fdf73f..bce1d7ac95ca 100644
--- a/fs/kernfs/dir.c
+++ b/fs/kernfs/dir.c
@@ -27,7 +27,7 @@ static DEFINE_RWLOCK(kernfs_rename_lock);	/* kn->parent and ->name */
  */
 static DEFINE_SPINLOCK(kernfs_pr_cont_lock);
 static char kernfs_pr_cont_buf[PATH_MAX];	/* protected by pr_cont_lock */
-static DEFINE_RAW_SPINLOCK(kernfs_idr_lock);	/* root->ino_idr */
+static DEFINE_SPINLOCK(kernfs_idr_lock);	/* root->ino_idr */
 
 #define rb_to_kn(X) rb_entry((X), struct kernfs_node, rb)
 
@@ -539,7 +539,6 @@ void kernfs_put(struct kernfs_node *kn)
 {
 	struct kernfs_node *parent;
 	struct kernfs_root *root;
-	unsigned long flags;
 
 	if (!kn || !atomic_dec_and_test(&kn->count))
 		return;
@@ -564,9 +563,9 @@ void kernfs_put(struct kernfs_node *kn)
 		simple_xattrs_free(&kn->iattr->xattrs, NULL);
 		kmem_cache_free(kernfs_iattrs_cache, kn->iattr);
 	}
-	raw_spin_lock_irqsave(&kernfs_idr_lock, flags);
+	spin_lock(&kernfs_idr_lock);
 	idr_remove(&root->ino_idr, (u32)kernfs_ino(kn));
-	raw_spin_unlock_irqrestore(&kernfs_idr_lock, flags);
+	spin_unlock(&kernfs_idr_lock);
 	kmem_cache_free(kernfs_node_cache, kn);
 
 	kn = parent;
@@ -608,7 +607,6 @@ static struct kernfs_node *__kernfs_new_node(struct kernfs_root *root,
 	struct kernfs_node *kn;
 	u32 id_highbits;
 	int ret;
-	unsigned long irqflags;
 
 	name = kstrdup_const(name, GFP_KERNEL);
 	if (!name)
@@ -619,13 +617,13 @@ static struct kernfs_node *__kernfs_new_node(struct kernfs_root *root,
 		goto err_out1;
 
 	idr_preload(GFP_KERNEL);
-	raw_spin_lock_irqsave(&kernfs_idr_lock, irqflags);
+	spin_lock(&kernfs_idr_lock);
 	ret = idr_alloc_cyclic(&root->ino_idr, kn, 1, 0, GFP_ATOMIC);
 	if (ret >= 0 && ret < root->last_id_lowbits)
 		root->id_highbits++;
 	id_highbits = root->id_highbits;
 	root->last_id_lowbits = ret;
-	raw_spin_unlock_irqrestore(&kernfs_idr_lock, irqflags);
+	spin_unlock(&kernfs_idr_lock);
 	idr_preload_end();
 	if (ret < 0)
 		goto err_out2;
@@ -661,9 +659,9 @@ static struct kernfs_node *__kernfs_new_node(struct kernfs_root *root,
 	return kn;
 
  err_out3:
-	raw_spin_lock_irqsave(&kernfs_idr_lock, irqflags);
+	spin_lock(&kernfs_idr_lock);
 	idr_remove(&root->ino_idr, (u32)kernfs_ino(kn));
-	raw_spin_unlock_irqrestore(&kernfs_idr_lock, irqflags);
+	spin_unlock(&kernfs_idr_lock);
  err_out2:
 	kmem_cache_free(kernfs_node_cache, kn);
  err_out1:
@@ -716,9 +714,8 @@ struct kernfs_node *kernfs_find_and_get_node_by_id(struct kernfs_root *root,
 	struct kernfs_node *kn;
 	ino_t ino = kernfs_id_ino(id);
 	u32 gen = kernfs_id_gen(id);
-	unsigned long flags;
 
-	raw_spin_lock_irqsave(&kernfs_idr_lock, flags);
+	spin_lock(&kernfs_idr_lock);
 
 	kn = idr_find(&root->ino_idr, (u32)ino);
 	if (!kn)
@@ -742,10 +739,10 @@ struct kernfs_node *kernfs_find_and_get_node_by_id(struct kernfs_root *root,
 	if (unlikely(!__kernfs_active(kn) || !atomic_inc_not_zero(&kn->count)))
 		goto err_unlock;
 
-	raw_spin_unlock_irqrestore(&kernfs_idr_lock, flags);
+	spin_unlock(&kernfs_idr_lock);
 	return kn;
 err_unlock:
-	raw_spin_unlock_irqrestore(&kernfs_idr_lock, flags);
+	spin_unlock(&kernfs_idr_lock);
 	return NULL;
 }
 
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 2/3] kernfs: Rearrange kernfs_node fields to reduce its size on 64bit
  2024-01-09 21:48 [PATCHSET driver-core-next] kernfs: Protect kernfs_find_and_get_node_by_id() with RCU Tejun Heo
  2024-01-09 21:48 ` [PATCH 1/3] Revert "kernfs: convert kernfs_idr_lock to an irq safe raw spinlock" Tejun Heo
@ 2024-01-09 21:48 ` Tejun Heo
  2024-01-10 15:18   ` Geert Uytterhoeven
  2024-01-10 18:28   ` [PATCH v2 " Tejun Heo
  2024-01-09 21:48 ` [PATCH 3/3] kernfs: RCU protect kernfs_nodes and avoid kernfs_idr_lock in kernfs_find_and_get_node_by_id() Tejun Heo
                   ` (3 subsequent siblings)
  5 siblings, 2 replies; 10+ messages in thread
From: Tejun Heo @ 2024-01-09 21:48 UTC (permalink / raw)
  To: gregkh; +Cc: andrea.righi, ast, linux-kernel, geert, Tejun Heo

Moving .flags and .mode right below .hash makes kernfs_node smaller by 8
bytes on 64bit.

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 include/linux/kernfs.h | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/include/linux/kernfs.h b/include/linux/kernfs.h
index 99aaa050ccb7..03c3fb83ab9e 100644
--- a/include/linux/kernfs.h
+++ b/include/linux/kernfs.h
@@ -206,6 +206,9 @@ struct kernfs_node {
 
 	const void		*ns;	/* namespace tag */
 	unsigned int		hash;	/* ns + name hash */
+	unsigned short		flags;
+	umode_t			mode;
+
 	union {
 		struct kernfs_elem_dir		dir;
 		struct kernfs_elem_symlink	symlink;
@@ -220,8 +223,6 @@ struct kernfs_node {
 	 */
 	u64			id;
 
-	unsigned short		flags;
-	umode_t			mode;
 	struct kernfs_iattrs	*iattr;
 };
 
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 3/3] kernfs: RCU protect kernfs_nodes and avoid kernfs_idr_lock in kernfs_find_and_get_node_by_id()
  2024-01-09 21:48 [PATCHSET driver-core-next] kernfs: Protect kernfs_find_and_get_node_by_id() with RCU Tejun Heo
  2024-01-09 21:48 ` [PATCH 1/3] Revert "kernfs: convert kernfs_idr_lock to an irq safe raw spinlock" Tejun Heo
  2024-01-09 21:48 ` [PATCH 2/3] kernfs: Rearrange kernfs_node fields to reduce its size on 64bit Tejun Heo
@ 2024-01-09 21:48 ` Tejun Heo
  2024-01-10  7:40 ` [PATCHSET driver-core-next] kernfs: Protect kernfs_find_and_get_node_by_id() with RCU Andrea Righi
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 10+ messages in thread
From: Tejun Heo @ 2024-01-09 21:48 UTC (permalink / raw)
  To: gregkh; +Cc: andrea.righi, ast, linux-kernel, geert, Tejun Heo

The BPF helper bpf_cgroup_from_id() calls kernfs_find_and_get_node_by_id()
which acquires kernfs_idr_lock, which is an non-raw non-IRQ-safe lock. This
can lead to deadlocks as bpf_cgroup_from_id() can be called from any BPF
programs including e.g. the ones that attach to functions which are holding
the scheduler rq lock.

Consider the following BPF program:

  SEC("fentry/__set_cpus_allowed_ptr_locked")
  int BPF_PROG(__set_cpus_allowed_ptr_locked, struct task_struct *p,
	       struct affinity_context *affn_ctx, struct rq *rq, struct rq_flags *rf)
  {
	  struct cgroup *cgrp = bpf_cgroup_from_id(p->cgroups->dfl_cgrp->kn->id);

	  if (cgrp) {
		  bpf_printk("%d[%s] in %s", p->pid, p->comm, cgrp->kn->name);
		  bpf_cgroup_release(cgrp);
	  }
	  return 0;
  }

__set_cpus_allowed_ptr_locked() is called with rq lock held and the above
BPF program calls bpf_cgroup_from_id() within leading to the following
lockdep warning:

  =====================================================
  WARNING: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected
  6.7.0-rc3-work-00053-g07124366a1d7-dirty #147 Not tainted
  -----------------------------------------------------
  repro/1620 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
  ffffffff833b3688 (kernfs_idr_lock){+.+.}-{2:2}, at: kernfs_find_and_get_node_by_id+0x1e/0x70

		and this task is already holding:
  ffff888237ced698 (&rq->__lock){-.-.}-{2:2}, at: task_rq_lock+0x4e/0xf0
  which would create a new lock dependency:
   (&rq->__lock){-.-.}-{2:2} -> (kernfs_idr_lock){+.+.}-{2:2}
  ...
   Possible interrupt unsafe locking scenario:

	 CPU0                    CPU1
	 ----                    ----
    lock(kernfs_idr_lock);
				 local_irq_disable();
				 lock(&rq->__lock);
				 lock(kernfs_idr_lock);
    <Interrupt>
      lock(&rq->__lock);

		 *** DEADLOCK ***
  ...
  Call Trace:
   dump_stack_lvl+0x55/0x70
   dump_stack+0x10/0x20
   __lock_acquire+0x781/0x2a40
   lock_acquire+0xbf/0x1f0
   _raw_spin_lock+0x2f/0x40
   kernfs_find_and_get_node_by_id+0x1e/0x70
   cgroup_get_from_id+0x21/0x240
   bpf_cgroup_from_id+0xe/0x20
   bpf_prog_98652316e9337a5a___set_cpus_allowed_ptr_locked+0x96/0x11a
   bpf_trampoline_6442545632+0x4f/0x1000
   __set_cpus_allowed_ptr_locked+0x5/0x5a0
   sched_setaffinity+0x1b3/0x290
   __x64_sys_sched_setaffinity+0x4f/0x60
   do_syscall_64+0x40/0xe0
   entry_SYSCALL_64_after_hwframe+0x46/0x4e

Let's fix it by protecting kernfs_node and kernfs_root with RCU and making
kernfs_find_and_get_node_by_id() acquire rcu_read_lock() instead of
kernfs_idr_lock.

This adds an rcu_head to kernfs_node making it larger by 16 bytes on 64bit.
Combined with the preceding rearrange patch, the net increase is 8 bytes.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Andrea Righi <andrea.righi@canonical.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
---
 fs/kernfs/dir.c             | 31 ++++++++++++++++++++-----------
 fs/kernfs/kernfs-internal.h |  2 ++
 include/linux/kernfs.h      |  2 ++
 3 files changed, 24 insertions(+), 11 deletions(-)

diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
index bce1d7ac95ca..458519e416fe 100644
--- a/fs/kernfs/dir.c
+++ b/fs/kernfs/dir.c
@@ -529,6 +529,20 @@ void kernfs_get(struct kernfs_node *kn)
 }
 EXPORT_SYMBOL_GPL(kernfs_get);
 
+static void kernfs_free_rcu(struct rcu_head *rcu)
+{
+	struct kernfs_node *kn = container_of(rcu, struct kernfs_node, rcu);
+
+	kfree_const(kn->name);
+
+	if (kn->iattr) {
+		simple_xattrs_free(&kn->iattr->xattrs, NULL);
+		kmem_cache_free(kernfs_iattrs_cache, kn->iattr);
+	}
+
+	kmem_cache_free(kernfs_node_cache, kn);
+}
+
 /**
  * kernfs_put - put a reference count on a kernfs_node
  * @kn: the target kernfs_node
@@ -557,16 +571,11 @@ void kernfs_put(struct kernfs_node *kn)
 	if (kernfs_type(kn) == KERNFS_LINK)
 		kernfs_put(kn->symlink.target_kn);
 
-	kfree_const(kn->name);
-
-	if (kn->iattr) {
-		simple_xattrs_free(&kn->iattr->xattrs, NULL);
-		kmem_cache_free(kernfs_iattrs_cache, kn->iattr);
-	}
 	spin_lock(&kernfs_idr_lock);
 	idr_remove(&root->ino_idr, (u32)kernfs_ino(kn));
 	spin_unlock(&kernfs_idr_lock);
-	kmem_cache_free(kernfs_node_cache, kn);
+
+	call_rcu(&kn->rcu, kernfs_free_rcu);
 
 	kn = parent;
 	if (kn) {
@@ -575,7 +584,7 @@ void kernfs_put(struct kernfs_node *kn)
 	} else {
 		/* just released the root kn, free @root too */
 		idr_destroy(&root->ino_idr);
-		kfree(root);
+		kfree_rcu(root, rcu);
 	}
 }
 EXPORT_SYMBOL_GPL(kernfs_put);
@@ -715,7 +724,7 @@ struct kernfs_node *kernfs_find_and_get_node_by_id(struct kernfs_root *root,
 	ino_t ino = kernfs_id_ino(id);
 	u32 gen = kernfs_id_gen(id);
 
-	spin_lock(&kernfs_idr_lock);
+	rcu_read_lock();
 
 	kn = idr_find(&root->ino_idr, (u32)ino);
 	if (!kn)
@@ -739,10 +748,10 @@ struct kernfs_node *kernfs_find_and_get_node_by_id(struct kernfs_root *root,
 	if (unlikely(!__kernfs_active(kn) || !atomic_inc_not_zero(&kn->count)))
 		goto err_unlock;
 
-	spin_unlock(&kernfs_idr_lock);
+	rcu_read_unlock();
 	return kn;
 err_unlock:
-	spin_unlock(&kernfs_idr_lock);
+	rcu_read_unlock();
 	return NULL;
 }
 
diff --git a/fs/kernfs/kernfs-internal.h b/fs/kernfs/kernfs-internal.h
index 237f2764b941..b42ee6547cdc 100644
--- a/fs/kernfs/kernfs-internal.h
+++ b/fs/kernfs/kernfs-internal.h
@@ -49,6 +49,8 @@ struct kernfs_root {
 	struct rw_semaphore	kernfs_rwsem;
 	struct rw_semaphore	kernfs_iattr_rwsem;
 	struct rw_semaphore	kernfs_supers_rwsem;
+
+	struct rcu_head		rcu;
 };
 
 /* +1 to avoid triggering overflow warning when negating it */
diff --git a/include/linux/kernfs.h b/include/linux/kernfs.h
index 03c3fb83ab9e..05dcbae7ecbf 100644
--- a/include/linux/kernfs.h
+++ b/include/linux/kernfs.h
@@ -224,6 +224,8 @@ struct kernfs_node {
 	u64			id;
 
 	struct kernfs_iattrs	*iattr;
+
+	struct rcu_head		rcu;
 };
 
 /*
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCHSET driver-core-next] kernfs: Protect kernfs_find_and_get_node_by_id() with RCU
  2024-01-09 21:48 [PATCHSET driver-core-next] kernfs: Protect kernfs_find_and_get_node_by_id() with RCU Tejun Heo
                   ` (2 preceding siblings ...)
  2024-01-09 21:48 ` [PATCH 3/3] kernfs: RCU protect kernfs_nodes and avoid kernfs_idr_lock in kernfs_find_and_get_node_by_id() Tejun Heo
@ 2024-01-10  7:40 ` Andrea Righi
  2024-01-10 15:52 ` Greg KH
  2024-01-12 12:34 ` Geert Uytterhoeven
  5 siblings, 0 replies; 10+ messages in thread
From: Andrea Righi @ 2024-01-10  7:40 UTC (permalink / raw)
  To: Tejun Heo; +Cc: gregkh, ast, linux-kernel, geert

On Tue, Jan 09, 2024 at 11:48:01AM -1000, Tejun Heo wrote:
> The BPF helper bpf_cgroup_from_id() calls kernfs_find_and_get_node_by_id()
> which acquires kernfs_idr_lock, which is an non-raw non-IRQ-safe lock.
> kernfs_idr_lock used to be a non-irq-safe lock which could lead to deadlocks
> as bpf_cgroup_from_id() can be called from any BPF programs including e.g.
> the ones that attach to functions which are holding the scheduler rq lock.
> 
> To resolve the situation dad3fb67ca1c ("kernfs: convert kernfs_idr_lock to
> an irq safe raw spinlock") converted kernfs_idr_lock to an irq-safe raw
> spinlock. However, this was also broken as we call idr_alloc*() while
> holding the lock and idr itself uses an non-irq-safe lock and also calls
> into memory allocator.
> 
> Let's instead RCU protect kernfs_node and kernfs_root so that
> kernfs_find_and_get_node_by_id() can use rcu_read_lock() instead of
> kernfs_idr_lock. While this unfortunately increases the size of kernfs_node,
> it's the most straightforward thing to do and there likely are other places
> that can take advantage of RCU protection and improve scalability too.
> 
> Please see the patch descriptions for more details.
> 
> This patchset is on top of the current driver-core-next - dad3fb67ca1c
> ("kernfs: convert kernfs_idr_lock to an irq safe raw spinlock"), and also
> available in the following git branch.
> 
>  git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc.git kernfs-use-rcu
> 
> Thanks.
> 
> --
> tejun

Everything looks good to me and I can't trigger any oops with this one
applied. You can add my:

Tested-by: Andrea Righi <andrea.righi@canonical.com>

Thanks!
-Andrea

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/3] kernfs: Rearrange kernfs_node fields to reduce its size on 64bit
  2024-01-09 21:48 ` [PATCH 2/3] kernfs: Rearrange kernfs_node fields to reduce its size on 64bit Tejun Heo
@ 2024-01-10 15:18   ` Geert Uytterhoeven
  2024-01-10 18:26     ` Tejun Heo
  2024-01-10 18:28   ` [PATCH v2 " Tejun Heo
  1 sibling, 1 reply; 10+ messages in thread
From: Geert Uytterhoeven @ 2024-01-10 15:18 UTC (permalink / raw)
  To: Tejun Heo; +Cc: gregkh, andrea.righi, ast, linux-kernel

Hi Tejun,

On Tue, Jan 9, 2024 at 10:49 PM Tejun Heo <tj@kernel.org> wrote:
> Moving .flags and .mode right below .hash makes kernfs_node smaller by 8
> bytes on 64bit.
>
> Signed-off-by: Tejun Heo <tj@kernel.org>

Thanks for your patch!

> --- a/include/linux/kernfs.h
> +++ b/include/linux/kernfs.h
> @@ -206,6 +206,9 @@ struct kernfs_node {
>
>         const void              *ns;    /* namespace tag */
>         unsigned int            hash;   /* ns + name hash */
> +       unsigned short          flags;
> +       umode_t                 mode;
> +
>         union {
>                 struct kernfs_elem_dir          dir;
>                 struct kernfs_elem_symlink      symlink;
> @@ -220,8 +223,6 @@ struct kernfs_node {
>          */
>         u64                     id;
>
> -       unsigned short          flags;
> -       umode_t                 mode;
>         struct kernfs_iattrs    *iattr;

Note that there is now a hole at the end of the structure on 32-bit
architectures
where the alignment of u64 is 8 bytes.

Hence, sizeof(struct kernfs_node) grew from 104 to 112 bytes on (at
least) arm32 and rv32.
It did shrink by 8 bytes on amd64, arm64, mips64, and rv64.
Size is unchanged on ia32, m68k and sh.

I didn't check any other architectures.

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCHSET driver-core-next] kernfs: Protect kernfs_find_and_get_node_by_id() with RCU
  2024-01-09 21:48 [PATCHSET driver-core-next] kernfs: Protect kernfs_find_and_get_node_by_id() with RCU Tejun Heo
                   ` (3 preceding siblings ...)
  2024-01-10  7:40 ` [PATCHSET driver-core-next] kernfs: Protect kernfs_find_and_get_node_by_id() with RCU Andrea Righi
@ 2024-01-10 15:52 ` Greg KH
  2024-01-12 12:34 ` Geert Uytterhoeven
  5 siblings, 0 replies; 10+ messages in thread
From: Greg KH @ 2024-01-10 15:52 UTC (permalink / raw)
  To: Tejun Heo; +Cc: andrea.righi, ast, linux-kernel, geert

On Tue, Jan 09, 2024 at 11:48:01AM -1000, Tejun Heo wrote:
> The BPF helper bpf_cgroup_from_id() calls kernfs_find_and_get_node_by_id()
> which acquires kernfs_idr_lock, which is an non-raw non-IRQ-safe lock.
> kernfs_idr_lock used to be a non-irq-safe lock which could lead to deadlocks
> as bpf_cgroup_from_id() can be called from any BPF programs including e.g.
> the ones that attach to functions which are holding the scheduler rq lock.
> 
> To resolve the situation dad3fb67ca1c ("kernfs: convert kernfs_idr_lock to
> an irq safe raw spinlock") converted kernfs_idr_lock to an irq-safe raw
> spinlock. However, this was also broken as we call idr_alloc*() while
> holding the lock and idr itself uses an non-irq-safe lock and also calls
> into memory allocator.
> 
> Let's instead RCU protect kernfs_node and kernfs_root so that
> kernfs_find_and_get_node_by_id() can use rcu_read_lock() instead of
> kernfs_idr_lock. While this unfortunately increases the size of kernfs_node,
> it's the most straightforward thing to do and there likely are other places
> that can take advantage of RCU protection and improve scalability too.
> 
> Please see the patch descriptions for more details.
> 
> This patchset is on top of the current driver-core-next - dad3fb67ca1c
> ("kernfs: convert kernfs_idr_lock to an irq safe raw spinlock"), and also
> available in the following git branch.
> 
>  git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc.git kernfs-use-rcu

Thanks, I'll take the first patch now, for -rc1, and then the other 2
after that to give some testing time.

greg k-h

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/3] kernfs: Rearrange kernfs_node fields to reduce its size on 64bit
  2024-01-10 15:18   ` Geert Uytterhoeven
@ 2024-01-10 18:26     ` Tejun Heo
  0 siblings, 0 replies; 10+ messages in thread
From: Tejun Heo @ 2024-01-10 18:26 UTC (permalink / raw)
  To: Geert Uytterhoeven; +Cc: gregkh, andrea.righi, ast, linux-kernel

On Wed, Jan 10, 2024 at 04:18:36PM +0100, Geert Uytterhoeven wrote:
> Hi Tejun,
> 
> On Tue, Jan 9, 2024 at 10:49 PM Tejun Heo <tj@kernel.org> wrote:
> > Moving .flags and .mode right below .hash makes kernfs_node smaller by 8
> > bytes on 64bit.
> >
> > Signed-off-by: Tejun Heo <tj@kernel.org>
> 
> Thanks for your patch!
> 
> > --- a/include/linux/kernfs.h
> > +++ b/include/linux/kernfs.h
> > @@ -206,6 +206,9 @@ struct kernfs_node {
> >
> >         const void              *ns;    /* namespace tag */
> >         unsigned int            hash;   /* ns + name hash */
> > +       unsigned short          flags;
> > +       umode_t                 mode;
> > +
> >         union {
> >                 struct kernfs_elem_dir          dir;
> >                 struct kernfs_elem_symlink      symlink;
> > @@ -220,8 +223,6 @@ struct kernfs_node {
> >          */
> >         u64                     id;
> >
> > -       unsigned short          flags;
> > -       umode_t                 mode;
> >         struct kernfs_iattrs    *iattr;
> 
> Note that there is now a hole at the end of the structure on 32-bit
> architectures
> where the alignment of u64 is 8 bytes.
> 
> Hence, sizeof(struct kernfs_node) grew from 104 to 112 bytes on (at
> least) arm32 and rv32.
> It did shrink by 8 bytes on amd64, arm64, mips64, and rv64.
> Size is unchanged on ia32, m68k and sh.
> 
> I didn't check any other architectures.

Ah, thanks. I'll shuffle things a bit more so that the size doesn't increase
for 32bits.

-- 
tejun

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH v2 2/3] kernfs: Rearrange kernfs_node fields to reduce its size on 64bit
  2024-01-09 21:48 ` [PATCH 2/3] kernfs: Rearrange kernfs_node fields to reduce its size on 64bit Tejun Heo
  2024-01-10 15:18   ` Geert Uytterhoeven
@ 2024-01-10 18:28   ` Tejun Heo
  1 sibling, 0 replies; 10+ messages in thread
From: Tejun Heo @ 2024-01-10 18:28 UTC (permalink / raw)
  To: gregkh; +Cc: andrea.righi, ast, linux-kernel, geert

From: Tejun Heo <tj@kernel.org>
Subject: kernfs: Rearrange kernfs_node fields to reduce its size on 64bit

Moving .flags and .mode right below .hash makes kernfs_node smaller by 8
bytes on 64bit. To avoid creating a hole from 8 bytes alignment on 32bit
archs, .priv is moved below so that there are two 32bit pointers after the
64bit .id field.

v2: Updated to avoid size increase on 32bit noticed by Geert.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
---
The third patch applies fine without update.

 include/linux/kernfs.h |    8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

--- a/include/linux/kernfs.h
+++ b/include/linux/kernfs.h
@@ -206,22 +206,22 @@ struct kernfs_node {
 
 	const void		*ns;	/* namespace tag */
 	unsigned int		hash;	/* ns + name hash */
+	unsigned short		flags;
+	umode_t			mode;
+
 	union {
 		struct kernfs_elem_dir		dir;
 		struct kernfs_elem_symlink	symlink;
 		struct kernfs_elem_attr		attr;
 	};
 
-	void			*priv;
-
 	/*
 	 * 64bit unique ID.  On 64bit ino setups, id is the ino.  On 32bit,
 	 * the low 32bits are ino and upper generation.
 	 */
 	u64			id;
 
-	unsigned short		flags;
-	umode_t			mode;
+	void			*priv;
 	struct kernfs_iattrs	*iattr;
 };
 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCHSET driver-core-next] kernfs: Protect kernfs_find_and_get_node_by_id() with RCU
  2024-01-09 21:48 [PATCHSET driver-core-next] kernfs: Protect kernfs_find_and_get_node_by_id() with RCU Tejun Heo
                   ` (4 preceding siblings ...)
  2024-01-10 15:52 ` Greg KH
@ 2024-01-12 12:34 ` Geert Uytterhoeven
  5 siblings, 0 replies; 10+ messages in thread
From: Geert Uytterhoeven @ 2024-01-12 12:34 UTC (permalink / raw)
  To: Tejun Heo; +Cc: gregkh, andrea.righi, ast, linux-kernel

Hi Tejun,

On Tue, Jan 9, 2024 at 10:48 PM Tejun Heo <tj@kernel.org> wrote:
> The BPF helper bpf_cgroup_from_id() calls kernfs_find_and_get_node_by_id()
> which acquires kernfs_idr_lock, which is an non-raw non-IRQ-safe lock.
> kernfs_idr_lock used to be a non-irq-safe lock which could lead to deadlocks
> as bpf_cgroup_from_id() can be called from any BPF programs including e.g.
> the ones that attach to functions which are holding the scheduler rq lock.
>
> To resolve the situation dad3fb67ca1c ("kernfs: convert kernfs_idr_lock to
> an irq safe raw spinlock") converted kernfs_idr_lock to an irq-safe raw
> spinlock. However, this was also broken as we call idr_alloc*() while
> holding the lock and idr itself uses an non-irq-safe lock and also calls
> into memory allocator.
>
> Let's instead RCU protect kernfs_node and kernfs_root so that
> kernfs_find_and_get_node_by_id() can use rcu_read_lock() instead of
> kernfs_idr_lock. While this unfortunately increases the size of kernfs_node,
> it's the most straightforward thing to do and there likely are other places
> that can take advantage of RCU protection and improve scalability too.
>
> Please see the patch descriptions for more details.
>
> This patchset is on top of the current driver-core-next - dad3fb67ca1c
> ("kernfs: convert kernfs_idr_lock to an irq safe raw spinlock"), and also
> available in the following git branch.
>
>  git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc.git kernfs-use-rcu

No more BUGs seen (with v2) on the few platforms I tried, so
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2024-01-12 12:35 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-01-09 21:48 [PATCHSET driver-core-next] kernfs: Protect kernfs_find_and_get_node_by_id() with RCU Tejun Heo
2024-01-09 21:48 ` [PATCH 1/3] Revert "kernfs: convert kernfs_idr_lock to an irq safe raw spinlock" Tejun Heo
2024-01-09 21:48 ` [PATCH 2/3] kernfs: Rearrange kernfs_node fields to reduce its size on 64bit Tejun Heo
2024-01-10 15:18   ` Geert Uytterhoeven
2024-01-10 18:26     ` Tejun Heo
2024-01-10 18:28   ` [PATCH v2 " Tejun Heo
2024-01-09 21:48 ` [PATCH 3/3] kernfs: RCU protect kernfs_nodes and avoid kernfs_idr_lock in kernfs_find_and_get_node_by_id() Tejun Heo
2024-01-10  7:40 ` [PATCHSET driver-core-next] kernfs: Protect kernfs_find_and_get_node_by_id() with RCU Andrea Righi
2024-01-10 15:52 ` Greg KH
2024-01-12 12:34 ` Geert Uytterhoeven

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox