* [PATCH RFC v2 0/2] pidfs: use maple tree
@ 2024-12-09 13:46 Christian Brauner
2024-12-09 13:46 ` [PATCH RFC v2 1/2] maple_tree: make MT_FLAGS_LOCK_IRQ do something Christian Brauner
` (2 more replies)
0 siblings, 3 replies; 19+ messages in thread
From: Christian Brauner @ 2024-12-09 13:46 UTC (permalink / raw)
To: Liam R. Howlett, Matthew Wilcox
Cc: linux-fsdevel, maple-tree, Christian Brauner
Hey,
Ok, I wanted to give this another try as I'd really like to rely on the
maple tree supporting ULONG_MAX when BITS_PER_LONG is 64 as it makes
things a lot simpler overall.
As Willy didn't want additional users relying on an external lock I made
it so that we don't have to and can just use the mtree lock.
However, I need an irq safe variant which is why I added support for
this into the maple tree.
This is pullable from
https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git work.pidfs.maple_tree
Thanks!
Christian
---
Christian Brauner (2):
maple_tree: make MT_FLAGS_LOCK_IRQ do something
pidfs: use maple tree
fs/pidfs.c | 52 +++++++++++++++++++++++++++-------------------
include/linux/maple_tree.h | 16 ++++++++++++--
kernel/pid.c | 34 +++++++++++++++---------------
3 files changed, 62 insertions(+), 40 deletions(-)
---
base-commit: 963c8e506c6d4769d04fcb64d4bf783e4ef6093e
change-id: 20241206-work-pidfs-maple_tree-b322ff5399b7
^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH RFC v2 1/2] maple_tree: make MT_FLAGS_LOCK_IRQ do something
  2024-12-09 13:46 [PATCH RFC v2 0/2] pidfs: use maple tree Christian Brauner
@ 2024-12-09 13:46 ` Christian Brauner
  2024-12-09 13:46 ` [PATCH RFC v2 2/2] pidfs: use maple tree Christian Brauner
  2024-12-13 14:40 ` [PATCH RFC v2 0/2] " Liam R. Howlett
  2 siblings, 0 replies; 19+ messages in thread
From: Christian Brauner @ 2024-12-09 13:46 UTC (permalink / raw)
To: Liam R. Howlett, Matthew Wilcox
Cc: linux-fsdevel, maple-tree, Christian Brauner

I'm not sure what the original intention was, but I take it that it was
to indicate that the lock to be taken must be irq safe. I need this for
pidfs as it's called from alloc_pid() which expects irq safe locking.

Make mtree_{un}lock() check MT_FLAGS_LOCK_IRQ and, if present, call
spin_{un}lock_irq().

Signed-off-by: Christian Brauner <brauner@kernel.org>
---
 include/linux/maple_tree.h | 16 ++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/include/linux/maple_tree.h b/include/linux/maple_tree.h
index cbbcd18d418684c36a61a1439c3eb04cd17480b0..5cb9a48731f97e56b2fe43228808043e2f7e98bc 100644
--- a/include/linux/maple_tree.h
+++ b/include/linux/maple_tree.h
@@ -268,10 +268,22 @@ struct maple_tree {
 #define DEFINE_MTREE(name) \
 	struct maple_tree name = MTREE_INIT(name, 0)
 
-#define mtree_lock(mt)		spin_lock((&(mt)->ma_lock))
+static __always_inline void mtree_lock(struct maple_tree *mt)
+{
+	if (mt->ma_flags & MT_FLAGS_LOCK_IRQ)
+		spin_lock_irq(&mt->ma_lock);
+	else
+		spin_lock(&mt->ma_lock);
+}
+
+static __always_inline void mtree_unlock(struct maple_tree *mt)
+{
+	if (mt->ma_flags & MT_FLAGS_LOCK_IRQ)
+		spin_unlock_irq(&mt->ma_lock);
+	else
+		spin_unlock(&mt->ma_lock);
+}
 #define mtree_lock_nested(mas, subclass) \
 		spin_lock_nested((&(mt)->ma_lock), subclass)
-#define mtree_unlock(mt)	spin_unlock((&(mt)->ma_lock))
 
 /*
  * The Maple Tree squeezes various bits in at various points which aren't
-- 
2.45.2

^ permalink raw reply related	[flat|nested] 19+ messages in thread
* [PATCH RFC v2 2/2] pidfs: use maple tree
  2024-12-09 13:46 [PATCH RFC v2 0/2] pidfs: use maple tree Christian Brauner
  2024-12-09 13:46 ` [PATCH RFC v2 1/2] maple_tree: make MT_FLAGS_LOCK_IRQ do something Christian Brauner
@ 2024-12-09 13:46 ` Christian Brauner
  2024-12-13 10:35   ` Marek Szyprowski
  2024-12-13 14:40 ` [PATCH RFC v2 0/2] " Liam R. Howlett
  2 siblings, 1 reply; 19+ messages in thread
From: Christian Brauner @ 2024-12-09 13:46 UTC (permalink / raw)
To: Liam R. Howlett, Matthew Wilcox
Cc: linux-fsdevel, maple-tree, Christian Brauner

So far we've been using an idr to track pidfs inodes. For some time now
each struct pid has had a unique 64bit value that is used as the inode
number on 64 bit. That unique inode couldn't be used for looking up a
specific struct pid though.

Now that we support file handles we need this ability while avoiding
leaking actual pid identifiers into userspace, which can be problematic
in containers.

So far I had used an idr-based mechanism where the idr is used to
generate a 32 bit number and each time it wraps we increment an upper
bit value and generate a unique 64 bit value. The lower 32 bits are used
to look up the pid.

I've been looking at the maple tree because it now has
mas_alloc_cyclic(). Since it uses unsigned long it would simplify the
64bit implementation, and its dense node mode supposedly also helps to
mitigate fragmentation.
Signed-off-by: Christian Brauner <brauner@kernel.org>
---
 fs/pidfs.c   | 52 +++++++++++++++++++++++++++++++---------------------
 kernel/pid.c | 34 +++++++++++++++++-----------------
 2 files changed, 48 insertions(+), 38 deletions(-)

diff --git a/fs/pidfs.c b/fs/pidfs.c
index 7a1d606b09d7b315e780c81fc7341f4ec82f2639..4a622f906fc210d5e81ba2ac856cfe0ca930f219 100644
--- a/fs/pidfs.c
+++ b/fs/pidfs.c
@@ -19,14 +19,15 @@
 #include <linux/ipc_namespace.h>
 #include <linux/time_namespace.h>
 #include <linux/utsname.h>
+#include <linux/maple_tree.h>
 #include <net/net_namespace.h>
 
 #include "internal.h"
 #include "mount.h"
 
-static DEFINE_IDR(pidfs_ino_idr);
-
-static u32 pidfs_ino_upper_32_bits = 0;
+static struct maple_tree pidfs_ino_mtree = MTREE_INIT(pidfs_ino_mtree,
+						      MT_FLAGS_ALLOC_RANGE |
+						      MT_FLAGS_LOCK_IRQ);
 
 #if BITS_PER_LONG == 32
 /*
@@ -34,8 +35,6 @@ static u32 pidfs_ino_upper_32_bits = 0;
  * the higher 32 bits are the generation number. The starting
  * value for the inode number and the generation number is one.
  */
-static u32 pidfs_ino_lower_32_bits = 1;
-
 static inline unsigned long pidfs_ino(u64 ino)
 {
 	return lower_32_bits(ino);
@@ -49,8 +48,6 @@ static inline u32 pidfs_gen(u64 ino)
 
 #else
 
-static u32 pidfs_ino_lower_32_bits = 0;
-
 /* On 64 bit simply return ino. */
 static inline unsigned long pidfs_ino(u64 ino)
 {
@@ -71,30 +68,43 @@ static inline u32 pidfs_gen(u64 ino)
  */
 int pidfs_add_pid(struct pid *pid)
 {
-	u32 upper;
-	int lower;
+	static unsigned long lower_next = 0;
+	static u32 pidfs_ino_upper_32_bits = 0;
+	unsigned long lower;
+	int ret;
+	MA_STATE(mas, &pidfs_ino_mtree, 0, 0);
 
 	/*
 	 * Inode numbering for pidfs start at 2. This avoids collisions
 	 * with the root inode which is 1 for pseudo filesystems.
 	 */
-	lower = idr_alloc_cyclic(&pidfs_ino_idr, pid, 2, 0, GFP_ATOMIC);
-	if (lower >= 0 && lower < pidfs_ino_lower_32_bits)
-		pidfs_ino_upper_32_bits++;
-	upper = pidfs_ino_upper_32_bits;
-	pidfs_ino_lower_32_bits = lower;
-	if (lower < 0)
-		return lower;
-
-	pid->ino = ((u64)upper << 32) | lower;
+	mtree_lock(&pidfs_ino_mtree);
+	ret = mas_alloc_cyclic(&mas, &lower, pid, 2, ULONG_MAX, &lower_next,
+			       GFP_KERNEL);
+	if (ret < 0)
+		goto out_unlock;
+
+#if BITS_PER_LONG == 32
+	if (ret == 1) {
+		u32 higher;
+
+		if (check_add_overflow(pidfs_ino_upper_32_bits, 1, &higher))
+			goto out_unlock;
+		pidfs_ino_upper_32_bits = higher;
+	}
+#endif
+	pid->ino = ((u64)pidfs_ino_upper_32_bits << 32) | lower;
 	pid->stashed = NULL;
-	return 0;
+
+out_unlock:
+	mtree_unlock(&pidfs_ino_mtree);
+	return ret;
 }
 
 /* The idr number to remove is the lower 32 bits of the inode. */
 void pidfs_remove_pid(struct pid *pid)
 {
-	idr_remove(&pidfs_ino_idr, lower_32_bits(pid->ino));
+	mtree_erase(&pidfs_ino_mtree, pidfs_ino(pid->ino));
 }
 
 #ifdef CONFIG_PROC_FS
@@ -522,7 +532,7 @@ static struct pid *pidfs_ino_get_pid(u64 ino)
 
 	guard(rcu)();
 
-	pid = idr_find(&pidfs_ino_idr, lower_32_bits(pid_ino));
+	pid = mtree_load(&pidfs_ino_mtree, pid_ino);
 	if (!pid)
 		return NULL;
 
diff --git a/kernel/pid.c b/kernel/pid.c
index 6131543e7c090c164a2bac014f8eeee61926b13d..28100bbac4c130e192abf65d88cca9d330971c5c 100644
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -131,6 +131,8 @@ void free_pid(struct pid *pid)
 	int i;
 	unsigned long flags;
 
+	pidfs_remove_pid(pid);
+
 	spin_lock_irqsave(&pidmap_lock, flags);
 	for (i = 0; i <= pid->level; i++) {
 		struct upid *upid = pid->numbers + i;
@@ -152,7 +154,6 @@ void free_pid(struct pid *pid)
 		}
 
 		idr_remove(&ns->idr, upid->nr);
-		pidfs_remove_pid(pid);
 	}
 	spin_unlock_irqrestore(&pidmap_lock, flags);
 
@@ -249,16 +250,6 @@ struct pid *alloc_pid(struct pid_namespace *ns, pid_t *set_tid,
 		tmp = tmp->parent;
 	}
 
-	/*
-	 * ENOMEM is not the most obvious choice especially for the case
-	 * where the child subreaper has already exited and the pid
-	 * namespace denies the creation of any new processes. But ENOMEM
-	 * is what we have exposed to userspace for a long time and it is
-	 * documented behavior for pid namespaces. So we can't easily
-	 * change it even if there were an error code better suited.
-	 */
-	retval = -ENOMEM;
-
 	get_pid_ns(ns);
 	refcount_set(&pid->count, 1);
 	spin_lock_init(&pid->lock);
@@ -269,12 +260,23 @@ struct pid *alloc_pid(struct pid_namespace *ns, pid_t *set_tid,
 	INIT_HLIST_HEAD(&pid->inodes);
 
 	upid = pid->numbers + ns->level;
-	idr_preload(GFP_KERNEL);
-	spin_lock_irq(&pidmap_lock);
-	if (!(ns->pid_allocated & PIDNS_ADDING))
-		goto out_unlock;
+
 	retval = pidfs_add_pid(pid);
 	if (retval)
+		goto out_free;
+
+	/*
+	 * ENOMEM is not the most obvious choice especially for the case
+	 * where the child subreaper has already exited and the pid
+	 * namespace denies the creation of any new processes. But ENOMEM
+	 * is what we have exposed to userspace for a long time and it is
+	 * documented behavior for pid namespaces. So we can't easily
+	 * change it even if there were an error code better suited.
+	 */
+	retval = -ENOMEM;
+
+	spin_lock_irq(&pidmap_lock);
+	if (!(ns->pid_allocated & PIDNS_ADDING))
 		goto out_unlock;
 	for ( ; upid >= pid->numbers; --upid) {
 		/* Make the PID visible to find_pid_ns. */
@@ -282,13 +284,11 @@ struct pid *alloc_pid(struct pid_namespace *ns, pid_t *set_tid,
 		upid->ns->pid_allocated++;
 	}
 	spin_unlock_irq(&pidmap_lock);
-	idr_preload_end();
 
 	return pid;
 
 out_unlock:
 	spin_unlock_irq(&pidmap_lock);
-	idr_preload_end();
 	put_pid_ns(ns);
 
 out_free:
-- 
2.45.2

^ permalink raw reply related	[flat|nested] 19+ messages in thread
* Re: [PATCH RFC v2 2/2] pidfs: use maple tree
  2024-12-09 13:46 ` [PATCH RFC v2 2/2] pidfs: use maple tree Christian Brauner
@ 2024-12-13 10:35   ` Marek Szyprowski
  2024-12-13 13:07     ` Christian Brauner
  0 siblings, 1 reply; 19+ messages in thread
From: Marek Szyprowski @ 2024-12-13 10:35 UTC (permalink / raw)
To: Christian Brauner, Liam R. Howlett, Matthew Wilcox
Cc: linux-fsdevel, maple-tree

On 09.12.2024 14:46, Christian Brauner wrote:
> So far we've been using an idr to track pidfs inodes. For some time now
> each struct pid has a unique 64bit value that is used as the inode
> number on 64 bit. That unique inode couldn't be used for looking up a
> specific struct pid though.
>
> Now that we support file handles we need this ability while avoiding to
> leak actual pid identifiers into userspace which can be problematic in
> containers.
>
> So far I had used an idr-based mechanism where the idr is used to
> generate a 32 bit number and each time it wraps we increment an upper
> bit value and generate a unique 64 bit value. The lower 32 bits are used
> to lookup the pid.
>
> I've been looking at the maple tree because it now has
> mas_alloc_cyclic(). Since it uses unsigned long it would simplify the
> 64bit implementation and its dense node mode supposedly also helps to
> mitigate fragmentation.
>
> Signed-off-by: Christian Brauner <brauner@kernel.org>

This patch landed in today's linux-next as commit a2c8e88a30f7 ("pidfs:
use maple tree"). In my tests I found that it triggers the following
lockdep warning, which probably means that something has not been
properly initialized:

================================
WARNING: inconsistent lock state
6.13.0-rc1-00015-ga2c8e88a30f7 #15489 Not tainted
--------------------------------
inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
swapper/0/0 [HC1[1]:SC0[0]:HE0:SE1] takes:
c25d4d10 (&sighand->siglock){?.+.}-{3:3}, at: __lock_task_sighand+0x80/0x1bc
{HARDIRQ-ON-W} state was registered at:
  lockdep_hardirqs_on_prepare+0x1a4/0x2c0
  trace_hardirqs_on+0x94/0xa0
  _raw_spin_unlock_irq+0x20/0x50
  mtree_erase+0x88/0xbc
  free_pid+0xc/0xd4
  release_task+0x630/0x7a8
  do_exit+0x6b8/0xa64
  call_usermodehelper_exec_async+0x120/0x144
  ret_from_fork+0x14/0x28
irq event stamp: 1017892
hardirqs last enabled at (1017891): [<c0c8e510>] default_idle_call+0x18/0x13c
hardirqs last disabled at (1017892): [<c0100b94>] __irq_svc+0x54/0xd0
softirqs last enabled at (1017868): [<c013b410>] handle_softirqs+0x328/0x520
softirqs last disabled at (1017835): [<c013b7b4>] __irq_exit_rcu+0x144/0x1f0

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(&sighand->siglock);
  <Interrupt>
    lock(&sighand->siglock);

 *** DEADLOCK ***

2 locks held by swapper/0/0:
 #0: c137b4d0 (rcu_read_lock){....}-{1:3}, at: kill_pid_info_type+0x2c/0x168
 #1: c137b4d0 (rcu_read_lock){....}-{1:3}, at: __lock_task_sighand+0x0/0x1bc

stack backtrace:
CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 6.13.0-rc1-00015-ga2c8e88a30f7 #15489
Hardware name: Samsung Exynos (Flattened Device Tree)
Call trace:
 unwind_backtrace from show_stack+0x10/0x14
 show_stack from dump_stack_lvl+0x68/0x88
 dump_stack_lvl from print_usage_bug.part.0+0x24c/0x270
 print_usage_bug.part.0 from mark_lock.part.0+0xc20/0x129c
 mark_lock.part.0 from __lock_acquire+0xae8/0x2a00
 __lock_acquire from lock_acquire+0x134/0x384
 lock_acquire from _raw_spin_lock_irqsave+0x4c/0x68
 _raw_spin_lock_irqsave from __lock_task_sighand+0x80/0x1bc
 __lock_task_sighand from group_send_sig_info+0x120/0x1b4
 group_send_sig_info from kill_pid_info_type+0x8c/0x168
 kill_pid_info_type from it_real_fn+0x5c/0x120
 it_real_fn from __hrtimer_run_queues+0xcc/0x538
 __hrtimer_run_queues from hrtimer_interrupt+0x128/0x2c4
 hrtimer_interrupt from exynos4_mct_tick_isr+0x44/0x7c
 exynos4_mct_tick_isr from handle_percpu_devid_irq+0x84/0x158
 handle_percpu_devid_irq from generic_handle_domain_irq+0x24/0x34
 generic_handle_domain_irq from gic_handle_irq+0x88/0xa8
 gic_handle_irq from generic_handle_arch_irq+0x34/0x44
 generic_handle_arch_irq from __irq_svc+0x8c/0xd0
Exception stack(0xc1301f20 to 0xc1301f68)
...
 __irq_svc from default_idle_call+0x1c/0x13c
 default_idle_call from do_idle+0x23c/0x2ac
 do_idle from cpu_startup_entry+0x28/0x2c
 cpu_startup_entry from kernel_init+0x0/0x12c
--->8---

[...]

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland

^ permalink raw reply	[flat|nested] 19+ messages in thread
* Re: [PATCH RFC v2 2/2] pidfs: use maple tree
  2024-12-13 10:35   ` Marek Szyprowski
@ 2024-12-13 13:07     ` Christian Brauner
  2024-12-13 14:16       ` Marek Szyprowski
  0 siblings, 1 reply; 19+ messages in thread
From: Christian Brauner @ 2024-12-13 13:07 UTC (permalink / raw)
To: Marek Szyprowski
Cc: Liam R. Howlett, Matthew Wilcox, linux-fsdevel, maple-tree

On Fri, Dec 13, 2024 at 11:35:48AM +0100, Marek Szyprowski wrote:
> On 09.12.2024 14:46, Christian Brauner wrote:
> > So far we've been using an idr to track pidfs inodes. [...]
>
> This patch landed in today's linux-next as commit a2c8e88a30f7 ("pidfs:
> use maple tree"). In my tests I found that it triggers the following
> lockdep warning, which probably means that something has not been
> properly initialized:

Ah, no, I think the issue is that it didn't use the irq{save,restore}
spin lock variants in that codepath, as this is free_pid() which needs
them.

I pushed a fix. Please yell if this issue persists.

^ permalink raw reply	[flat|nested] 19+ messages in thread
* Re: [PATCH RFC v2 2/2] pidfs: use maple tree
  2024-12-13 13:07     ` Christian Brauner
@ 2024-12-13 14:16       ` Marek Szyprowski
  0 siblings, 0 replies; 19+ messages in thread
From: Marek Szyprowski @ 2024-12-13 14:16 UTC (permalink / raw)
To: Christian Brauner
Cc: Liam R. Howlett, Matthew Wilcox, linux-fsdevel, maple-tree

On 13.12.2024 14:07, Christian Brauner wrote:
> On Fri, Dec 13, 2024 at 11:35:48AM +0100, Marek Szyprowski wrote:
>> On 09.12.2024 14:46, Christian Brauner wrote:
>>> So far we've been using an idr to track pidfs inodes. [...]
>>
>> This patch landed in today's linux-next as commit a2c8e88a30f7 ("pidfs:
>> use maple tree"). In my tests I found that it triggers the following
>> lockdep warning, which probably means that something has not been
>> properly initialized:
>
> Ah, no, I think the issue is that it didn't use the irq{save,restore}
> spin lock variants in that codepath, as this is free_pid() which needs
> them.
>
> I pushed a fix. Please yell if this issue persists.

I've applied this patch:

https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git/commit/?h=vfs-6.14.pidfs&id=34a0a75fd0887b599d68088b1dd40b3e48cfdc42

onto next-20241213 and it fixed my issue. Thanks!

Feel free to add:

Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland

^ permalink raw reply	[flat|nested] 19+ messages in thread
* Re: [PATCH RFC v2 0/2] pidfs: use maple tree
  2024-12-09 13:46 [PATCH RFC v2 0/2] pidfs: use maple tree Christian Brauner
  2024-12-09 13:46 ` [PATCH RFC v2 1/2] maple_tree: make MT_FLAGS_LOCK_IRQ do something Christian Brauner
  2024-12-09 13:46 ` [PATCH RFC v2 2/2] pidfs: use maple tree Christian Brauner
@ 2024-12-13 14:40 ` Liam R. Howlett
  2024-12-13 18:51   ` Christian Brauner
  2 siblings, 1 reply; 19+ messages in thread
From: Liam R. Howlett @ 2024-12-13 14:40 UTC (permalink / raw)
To: Christian Brauner; +Cc: Matthew Wilcox, linux-fsdevel, maple-tree

* Christian Brauner <brauner@kernel.org> [241209 08:47]:
> Hey,
>
> Ok, I wanted to give this another try as I'd really like to rely on the
> maple tree supporting ULONG_MAX when BITS_PER_LONG is 64 as it makes
> things a lot simpler overall.
>
> As Willy didn't want additional users relying on an external lock I made
> it so that we don't have to and can just use the mtree lock.
>
> However, I need an irq safe variant which is why I added support for
> this into the maple tree.
>
> This is pullable from
> https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git work.pidfs.maple_tree

I've been meaning to respond to this thread.

I believe the flag is to tell the internal code what lock to use. If
you look at mas_nomem(), there is a retry loop that will drop the lock
to allocate and retry the operation. That function needs to support the
flag and use the correct lock/unlock.

The mas_lock()/mas_unlock() needs a mas_lock_irq()/mas_unlock_irq()
variant, which would be used in the correct context.

Does that make sense?

Thanks,
Liam

> Thanks!
> Christian
>
> ---
> Christian Brauner (2):
>       maple_tree: make MT_FLAGS_LOCK_IRQ do something
>       pidfs: use maple tree
>
>  fs/pidfs.c                 | 52 +++++++++++++++++++++++++++-------------------
>  include/linux/maple_tree.h | 16 ++++++++++++--
>  kernel/pid.c               | 34 +++++++++++++++---------------
>  3 files changed, 62 insertions(+), 40 deletions(-)
> ---
> base-commit: 963c8e506c6d4769d04fcb64d4bf783e4ef6093e
> change-id: 20241206-work-pidfs-maple_tree-b322ff5399b7

^ permalink raw reply	[flat|nested] 19+ messages in thread
* Re: [PATCH RFC v2 0/2] pidfs: use maple tree
  2024-12-13 14:40 ` [PATCH RFC v2 0/2] " Liam R. Howlett
@ 2024-12-13 18:51   ` Christian Brauner
  2024-12-13 18:53     ` Matthew Wilcox
  0 siblings, 1 reply; 19+ messages in thread
From: Christian Brauner @ 2024-12-13 18:51 UTC (permalink / raw)
To: Liam R. Howlett; +Cc: Matthew Wilcox, linux-fsdevel, maple-tree

On Fri, Dec 13, 2024 at 09:40:49AM -0500, Liam R. Howlett wrote:
> * Christian Brauner <brauner@kernel.org> [241209 08:47]:
> > Hey,
> >
> > Ok, I wanted to give this another try as I'd really like to rely on the
> > maple tree supporting ULONG_MAX when BITS_PER_LONG is 64 as it makes
> > things a lot simpler overall.
> >
> > As Willy didn't want additional users relying on an external lock I made
> > it so that we don't have to and can just use the mtree lock.
> >
> > However, I need an irq safe variant which is why I added support for
> > this into the maple tree.
> >
> > This is pullable from
> > https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git work.pidfs.maple_tree
>
> I've been meaning to respond to this thread.
>
> I believe the flag is to tell the internal code what lock to use. If
> you look at mas_nomem(), there is a retry loop that will drop the lock
> to allocate and retry the operation. That function needs to support the
> flag and use the correct lock/unlock.
>
> The mas_lock()/mas_unlock() needs a mas_lock_irq()/mas_unlock_irq()
> variant, which would be used in the correct context.

Yeah, it does. Did you see the patch that is included in the series?
I've replaced the macro with always inline functions that select the
lock based on the flag:

static __always_inline void mtree_lock(struct maple_tree *mt)
{
	if (mt->ma_flags & MT_FLAGS_LOCK_IRQ)
		spin_lock_irq(&mt->ma_lock);
	else
		spin_lock(&mt->ma_lock);
}
static __always_inline void mtree_unlock(struct maple_tree *mt)
{
	if (mt->ma_flags & MT_FLAGS_LOCK_IRQ)
		spin_unlock_irq(&mt->ma_lock);
	else
		spin_unlock(&mt->ma_lock);
}

Does that work for you?
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH RFC v2 0/2] pidfs: use maple tree
  2024-12-13 18:51   ` Christian Brauner
@ 2024-12-13 18:53     ` Matthew Wilcox
  2024-12-13 19:01       ` Christian Brauner
  0 siblings, 1 reply; 19+ messages in thread
From: Matthew Wilcox @ 2024-12-13 18:53 UTC (permalink / raw)
To: Christian Brauner; +Cc: Liam R. Howlett, linux-fsdevel, maple-tree

On Fri, Dec 13, 2024 at 07:51:50PM +0100, Christian Brauner wrote:
> Yeah, it does. Did you see the patch that is included in the series?
> I've replaced the macro with always inline functions that select the
> lock based on the flag:
>
> static __always_inline void mtree_lock(struct maple_tree *mt)
> {
> 	if (mt->ma_flags & MT_FLAGS_LOCK_IRQ)
> 		spin_lock_irq(&mt->ma_lock);
> 	else
> 		spin_lock(&mt->ma_lock);
> }
> static __always_inline void mtree_unlock(struct maple_tree *mt)
> {
> 	if (mt->ma_flags & MT_FLAGS_LOCK_IRQ)
> 		spin_unlock_irq(&mt->ma_lock);
> 	else
> 		spin_unlock(&mt->ma_lock);
> }
>
> Does that work for you?

See the way the XArray works; we're trying to keep the two APIs as
close as possible.

The caller should use mtree_lock_irq() or mtree_lock_irqsave()
as appropriate.

^ permalink raw reply	[flat|nested] 19+ messages in thread
* Re: [PATCH RFC v2 0/2] pidfs: use maple tree
  2024-12-13 18:53     ` Matthew Wilcox
@ 2024-12-13 19:01       ` Christian Brauner
  2024-12-13 19:25         ` Christian Brauner
  0 siblings, 1 reply; 19+ messages in thread
From: Christian Brauner @ 2024-12-13 19:01 UTC (permalink / raw)
To: Matthew Wilcox; +Cc: Liam R. Howlett, linux-fsdevel, maple-tree

On Fri, Dec 13, 2024 at 06:53:55PM +0000, Matthew Wilcox wrote:
> On Fri, Dec 13, 2024 at 07:51:50PM +0100, Christian Brauner wrote:
> > Yeah, it does. Did you see the patch that is included in the series?
> > I've replaced the macro with always inline functions that select the
> > lock based on the flag: [...]
> >
> > Does that work for you?
>
> See the way the XArray works; we're trying to keep the two APIs as
> close as possible.
>
> The caller should use mtree_lock_irq() or mtree_lock_irqsave()
> as appropriate.

Say I need:

spin_lock_irqsave(&mt->ma_lock, flags);
mas_erase(...);
  -> mas_nomem()
     -> mtree_unlock() // uses spin_unlock();
     // allocate
     -> mtree_lock() // uses spin_lock();
spin_unlock_irqrestore(&mt->ma_lock, flags);

So that doesn't work, right? IOW, the maple tree does internal drop and
retake of locks and they need to match the locks of the outer context.

So, I think I need a way to communicate to mas_*() what type of lock to
take, no? Any idea how you would like me to do this in case I'm not
wrong?

^ permalink raw reply	[flat|nested] 19+ messages in thread
* Re: [PATCH RFC v2 0/2] pidfs: use maple tree
  2024-12-13 19:01       ` Christian Brauner
@ 2024-12-13 19:25         ` Christian Brauner
  2024-12-13 20:11           ` Christian Brauner
  0 siblings, 1 reply; 19+ messages in thread
From: Christian Brauner @ 2024-12-13 19:25 UTC (permalink / raw)
To: Matthew Wilcox; +Cc: Liam R. Howlett, linux-fsdevel, maple-tree

On Fri, Dec 13, 2024 at 08:01:30PM +0100, Christian Brauner wrote:
> On Fri, Dec 13, 2024 at 06:53:55PM +0000, Matthew Wilcox wrote:
> > [...]
> > The caller should use mtree_lock_irq() or mtree_lock_irqsave()
> > as appropriate.
>
> Say I need:
>
> spin_lock_irqsave(&mt->ma_lock, flags);
> mas_erase(...);
>   -> mas_nomem()
>      -> mtree_unlock() // uses spin_unlock();
>      // allocate
>      -> mtree_lock() // uses spin_lock();
> spin_unlock_irqrestore(&mt->ma_lock, flags);
>
> So that doesn't work, right? IOW, the maple tree does internal drop and
> retake of locks and they need to match the locks of the outer context.
>
> So, I think I need a way to communicate to mas_*() what type of lock to
> take, no? Any idea how you would like me to do this in case I'm not
> wrong?

My first inclination has been to do it via MA_STATE() and the mas_flag
value but I'm open to any other ideas.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH RFC v2 0/2] pidfs: use maple tree
  2024-12-13 19:25         ` Christian Brauner
@ 2024-12-13 20:11           ` Christian Brauner
  2024-12-13 20:50             ` Christian Brauner
  2024-12-13 21:04             ` Liam R. Howlett
  0 siblings, 2 replies; 19+ messages in thread
From: Christian Brauner @ 2024-12-13 20:11 UTC (permalink / raw)
To: Matthew Wilcox; +Cc: Liam R. Howlett, linux-fsdevel, maple-tree

On Fri, Dec 13, 2024 at 08:25:21PM +0100, Christian Brauner wrote:
> On Fri, Dec 13, 2024 at 08:01:30PM +0100, Christian Brauner wrote:
> > On Fri, Dec 13, 2024 at 06:53:55PM +0000, Matthew Wilcox wrote:
> > > [...]
> > > The caller should use mtree_lock_irq() or mtree_lock_irqsave()
> > > as appropriate.
> >
> > Say I need:
> >
> > spin_lock_irqsave(&mt->ma_lock, flags);
> > mas_erase(...);
> >   -> mas_nomem()
> >      -> mtree_unlock() // uses spin_unlock();
> >      // allocate
> >      -> mtree_lock() // uses spin_lock();
> > spin_unlock_irqrestore(&mt->ma_lock, flags);
> >
> > So that doesn't work, right? IOW, the maple tree does internal drop and
> > retake of locks and they need to match the locks of the outer context.
> > > > So, I think I need a way to communicate to mas_*() what type of lock to > > take, no? Any idea how you would like me to do this in case I'm not > > wrong? > > My first inclination has been to do it via MA_STATE() and the mas_flag > value but I'm open to any other ideas. Braino on my part as free_pid() can be called with write_lock_irq() held. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH RFC v2 0/2] pidfs: use maple tree 2024-12-13 20:11 ` Christian Brauner @ 2024-12-13 20:50 ` Christian Brauner 2024-12-13 21:07 ` Christian Brauner 2024-12-13 21:13 ` Liam R. Howlett 2024-12-13 21:04 ` Liam R. Howlett 1 sibling, 2 replies; 19+ messages in thread From: Christian Brauner @ 2024-12-13 20:50 UTC (permalink / raw) To: Matthew Wilcox; +Cc: Liam R. Howlett, linux-fsdevel, maple-tree On Fri, Dec 13, 2024 at 09:11:04PM +0100, Christian Brauner wrote: > On Fri, Dec 13, 2024 at 08:25:21PM +0100, Christian Brauner wrote: > > On Fri, Dec 13, 2024 at 08:01:30PM +0100, Christian Brauner wrote: > > > On Fri, Dec 13, 2024 at 06:53:55PM +0000, Matthew Wilcox wrote: > > > > On Fri, Dec 13, 2024 at 07:51:50PM +0100, Christian Brauner wrote: > > > > > Yeah, it does. Did you see the patch that is included in the series? > > > > > I've replaced the macro with always inline functions that select the > > > > > lock based on the flag: > > > > > > > > > > static __always_inline void mtree_lock(struct maple_tree *mt) > > > > > { > > > > > if (mt->ma_flags & MT_FLAGS_LOCK_IRQ) > > > > > spin_lock_irq(&mt->ma_lock); > > > > > else > > > > > spin_lock(&mt->ma_lock); > > > > > } > > > > > static __always_inline void mtree_unlock(struct maple_tree *mt) > > > > > { > > > > > if (mt->ma_flags & MT_FLAGS_LOCK_IRQ) > > > > > spin_unlock_irq(&mt->ma_lock); > > > > > else > > > > > spin_unlock(&mt->ma_lock); > > > > > } > > > > > > > > > > Does that work for you? > > > > > > > > See the way the XArray works; we're trying to keep the two APIs as > > > > close as possible. > > > > > > > > The caller should use mtree_lock_irq() or mtree_lock_irqsave() > > > > as appropriate. 
> > > > > > Say I need: > > > > > > spin_lock_irqsave(&mt->ma_lock, flags); > > > mas_erase(...); > > > -> mas_nomem() > > > -> mtree_unlock() // uses spin_unlock(); > > > // allocate > > > -> mtree_lock() // uses spin_lock(); > > > spin_lock_irqrestore(&mt->ma_lock, flags); > > > > > > So that doesn't work, right? IOW, the maple tree does internal drop and > > > retake locks and they need to match the locks of the outer context. > > > > > > So, I think I need a way to communicate to mas_*() what type of lock to > > > take, no? Any idea how you would like me to do this in case I'm not > > > wrong? > > > > My first inclination has been to do it via MA_STATE() and the mas_flag > > value but I'm open to any other ideas. > > Braino on my part as free_pid() can be called with write_lock_irq() held. I don't think I can use the maple tree because even an mas_erase() operation may allocate memory and that just makes it rather unpleasant to use in e.g., free_pid(). So I think I'm going to explore using the xarray to get the benefits of ULONG_MAX indices and I see that btrfs is using it already for similar purposes. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH RFC v2 0/2] pidfs: use maple tree 2024-12-13 20:50 ` Christian Brauner @ 2024-12-13 21:07 ` Christian Brauner 2024-12-13 21:16 ` Liam R. Howlett 2024-12-13 21:13 ` Liam R. Howlett 1 sibling, 1 reply; 19+ messages in thread From: Christian Brauner @ 2024-12-13 21:07 UTC (permalink / raw) To: Matthew Wilcox; +Cc: Liam R. Howlett, linux-fsdevel, maple-tree On Fri, Dec 13, 2024 at 09:50:56PM +0100, Christian Brauner wrote: > On Fri, Dec 13, 2024 at 09:11:04PM +0100, Christian Brauner wrote: > > On Fri, Dec 13, 2024 at 08:25:21PM +0100, Christian Brauner wrote: > > > On Fri, Dec 13, 2024 at 08:01:30PM +0100, Christian Brauner wrote: > > > > On Fri, Dec 13, 2024 at 06:53:55PM +0000, Matthew Wilcox wrote: > > > > > On Fri, Dec 13, 2024 at 07:51:50PM +0100, Christian Brauner wrote: > > > > > > Yeah, it does. Did you see the patch that is included in the series? > > > > > > I've replaced the macro with always inline functions that select the > > > > > > lock based on the flag: > > > > > > > > > > > > static __always_inline void mtree_lock(struct maple_tree *mt) > > > > > > { > > > > > > if (mt->ma_flags & MT_FLAGS_LOCK_IRQ) > > > > > > spin_lock_irq(&mt->ma_lock); > > > > > > else > > > > > > spin_lock(&mt->ma_lock); > > > > > > } > > > > > > static __always_inline void mtree_unlock(struct maple_tree *mt) > > > > > > { > > > > > > if (mt->ma_flags & MT_FLAGS_LOCK_IRQ) > > > > > > spin_unlock_irq(&mt->ma_lock); > > > > > > else > > > > > > spin_unlock(&mt->ma_lock); > > > > > > } > > > > > > > > > > > > Does that work for you? > > > > > > > > > > See the way the XArray works; we're trying to keep the two APIs as > > > > > close as possible. > > > > > > > > > > The caller should use mtree_lock_irq() or mtree_lock_irqsave() > > > > > as appropriate. 
> > > > > > > > Say I need: > > > > > > > > spin_lock_irqsave(&mt->ma_lock, flags); > > > > mas_erase(...); > > > > -> mas_nomem() > > > > -> mtree_unlock() // uses spin_unlock(); > > > > // allocate > > > > -> mtree_lock() // uses spin_lock(); > > > > spin_lock_irqrestore(&mt->ma_lock, flags); > > > > > > > > So that doesn't work, right? IOW, the maple tree does internal drop and > > > > retake locks and they need to match the locks of the outer context. > > > > > > > > So, I think I need a way to communicate to mas_*() what type of lock to > > > > take, no? Any idea how you would like me to do this in case I'm not > > > > wrong? > > > > > > My first inclination has been to do it via MA_STATE() and the mas_flag > > > value but I'm open to any other ideas. > > > > Braino on my part as free_pid() can be called with write_lock_irq() held. > > I don't think I can use the maple tree because even an mas_erase() > operation may allocate memory and that just makes it rather unpleasant > to use in e.g., free_pid(). So I think I'm going to explore using the > xarray to get the benefits of ULONG_MAX indices and I see that btrfs is > using it already for similar purposes. Hm, __xa_alloc_cyclic() doesn't support ULONG_MAX indices. So any ideas how I can proceed? Can I use the maple tree to wipe an entry at a given index and have the guarantee it won't have to allocate memory? ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH RFC v2 0/2] pidfs: use maple tree 2024-12-13 21:07 ` Christian Brauner @ 2024-12-13 21:16 ` Liam R. Howlett 0 siblings, 0 replies; 19+ messages in thread From: Liam R. Howlett @ 2024-12-13 21:16 UTC (permalink / raw) To: Christian Brauner; +Cc: Matthew Wilcox, linux-fsdevel, maple-tree * Christian Brauner <brauner@kernel.org> [241213 16:07]: > On Fri, Dec 13, 2024 at 09:50:56PM +0100, Christian Brauner wrote: > > On Fri, Dec 13, 2024 at 09:11:04PM +0100, Christian Brauner wrote: > > > On Fri, Dec 13, 2024 at 08:25:21PM +0100, Christian Brauner wrote: > > > > On Fri, Dec 13, 2024 at 08:01:30PM +0100, Christian Brauner wrote: > > > > > On Fri, Dec 13, 2024 at 06:53:55PM +0000, Matthew Wilcox wrote: > > > > > > On Fri, Dec 13, 2024 at 07:51:50PM +0100, Christian Brauner wrote: > > > > > > > Yeah, it does. Did you see the patch that is included in the series? > > > > > > > I've replaced the macro with always inline functions that select the > > > > > > > lock based on the flag: > > > > > > > > > > > > > > static __always_inline void mtree_lock(struct maple_tree *mt) > > > > > > > { > > > > > > > if (mt->ma_flags & MT_FLAGS_LOCK_IRQ) > > > > > > > spin_lock_irq(&mt->ma_lock); > > > > > > > else > > > > > > > spin_lock(&mt->ma_lock); > > > > > > > } > > > > > > > static __always_inline void mtree_unlock(struct maple_tree *mt) > > > > > > > { > > > > > > > if (mt->ma_flags & MT_FLAGS_LOCK_IRQ) > > > > > > > spin_unlock_irq(&mt->ma_lock); > > > > > > > else > > > > > > > spin_unlock(&mt->ma_lock); > > > > > > > } > > > > > > > > > > > > > > Does that work for you? > > > > > > > > > > > > See the way the XArray works; we're trying to keep the two APIs as > > > > > > close as possible. > > > > > > > > > > > > The caller should use mtree_lock_irq() or mtree_lock_irqsave() > > > > > > as appropriate. 
> > > > > > > > > > Say I need: > > > > > > > > > > spin_lock_irqsave(&mt->ma_lock, flags); > > > > > mas_erase(...); > > > > > -> mas_nomem() > > > > > -> mtree_unlock() // uses spin_unlock(); > > > > > // allocate > > > > > -> mtree_lock() // uses spin_lock(); > > > > > spin_lock_irqrestore(&mt->ma_lock, flags); > > > > > > > > > > So that doesn't work, right? IOW, the maple tree does internal drop and > > > > > retake locks and they need to match the locks of the outer context. > > > > > > > > > > So, I think I need a way to communicate to mas_*() what type of lock to > > > > > take, no? Any idea how you would like me to do this in case I'm not > > > > > wrong? > > > > > > > > My first inclination has been to do it via MA_STATE() and the mas_flag > > > > value but I'm open to any other ideas. > > > > > > Braino on my part as free_pid() can be called with write_lock_irq() held. > > > > I don't think I can use the maple tree because even an mas_erase() > > operation may allocate memory and that just makes it rather unpleasant > > to use in e.g., free_pid(). So I think I'm going to explore using the > > xarray to get the benefits of ULONG_MAX indices and I see that btrfs is > > using it already for similar purposes. > > Hm, __xa_alloc_cyclic() doesn't support ULONG_MAX indices. So any ideas > how I can proceed? Can I use the maple tree to wipe an entry at a given > index and have the guarantee it won't have to allocate memory? There are two ways you can do this: 1. preallocate, then store in the locked area. 2. store XA_ZERO_ENTRY and translate that to NULL on reading. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH RFC v2 0/2] pidfs: use maple tree 2024-12-13 20:50 ` Christian Brauner 2024-12-13 21:07 ` Christian Brauner @ 2024-12-13 21:13 ` Liam R. Howlett 1 sibling, 0 replies; 19+ messages in thread From: Liam R. Howlett @ 2024-12-13 21:13 UTC (permalink / raw) To: Christian Brauner; +Cc: Matthew Wilcox, linux-fsdevel, maple-tree * Christian Brauner <brauner@kernel.org> [241213 15:51]: > On Fri, Dec 13, 2024 at 09:11:04PM +0100, Christian Brauner wrote: > > On Fri, Dec 13, 2024 at 08:25:21PM +0100, Christian Brauner wrote: > > > On Fri, Dec 13, 2024 at 08:01:30PM +0100, Christian Brauner wrote: > > > > On Fri, Dec 13, 2024 at 06:53:55PM +0000, Matthew Wilcox wrote: > > > > > On Fri, Dec 13, 2024 at 07:51:50PM +0100, Christian Brauner wrote: > > > > > > Yeah, it does. Did you see the patch that is included in the series? > > > > > > I've replaced the macro with always inline functions that select the > > > > > > lock based on the flag: > > > > > > > > > > > > static __always_inline void mtree_lock(struct maple_tree *mt) > > > > > > { > > > > > > if (mt->ma_flags & MT_FLAGS_LOCK_IRQ) > > > > > > spin_lock_irq(&mt->ma_lock); > > > > > > else > > > > > > spin_lock(&mt->ma_lock); > > > > > > } > > > > > > static __always_inline void mtree_unlock(struct maple_tree *mt) > > > > > > { > > > > > > if (mt->ma_flags & MT_FLAGS_LOCK_IRQ) > > > > > > spin_unlock_irq(&mt->ma_lock); > > > > > > else > > > > > > spin_unlock(&mt->ma_lock); > > > > > > } > > > > > > > > > > > > Does that work for you? > > > > > > > > > > See the way the XArray works; we're trying to keep the two APIs as > > > > > close as possible. > > > > > > > > > > The caller should use mtree_lock_irq() or mtree_lock_irqsave() > > > > > as appropriate. 
> > > > > > > > Say I need: > > > > > > > > spin_lock_irqsave(&mt->ma_lock, flags); > > > > mas_erase(...); > > > > -> mas_nomem() > > > > -> mtree_unlock() // uses spin_unlock(); > > > > // allocate > > > > -> mtree_lock() // uses spin_lock(); > > > > spin_lock_irqrestore(&mt->ma_lock, flags); > > > > > > > > So that doesn't work, right? IOW, the maple tree does internal drop and > > > > retake locks and they need to match the locks of the outer context. > > > > > > > > So, I think I need a way to communicate to mas_*() what type of lock to > > > > take, no? Any idea how you would like me to do this in case I'm not > > > > wrong? > > > > > > My first inclination has been to do it via MA_STATE() and the mas_flag > > > value but I'm open to any other ideas. > > > > Braino on my part as free_pid() can be called with write_lock_irq() held. > > I don't think I can use the maple tree because even an mas_erase() > operation may allocate memory and that just makes it rather unpleasant > to use in e.g., free_pid(). So I think I'm going to explore using the > xarray to get the benefits of ULONG_MAX indices and I see that btrfs is > using it already for similar purposes. Can you point to the code that concerns you? I'd like to understand the problem better and see if there's a way around it. By the way, I rarely use erase as that's for when people don't know the ranges of the store. I use a store (which overwrites) of a NULL to the range. This won't solve your allocation concerns. We do have the preallocation support for a known range and value. Thanks, Liam ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH RFC v2 0/2] pidfs: use maple tree 2024-12-13 20:11 ` Christian Brauner 2024-12-13 20:50 ` Christian Brauner @ 2024-12-13 21:04 ` Liam R. Howlett 2024-12-14 11:48 ` Christian Brauner 1 sibling, 1 reply; 19+ messages in thread From: Liam R. Howlett @ 2024-12-13 21:04 UTC (permalink / raw) To: Christian Brauner; +Cc: Matthew Wilcox, linux-fsdevel, maple-tree * Christian Brauner <brauner@kernel.org> [241213 15:11]: > On Fri, Dec 13, 2024 at 08:25:21PM +0100, Christian Brauner wrote: > > On Fri, Dec 13, 2024 at 08:01:30PM +0100, Christian Brauner wrote: > > > On Fri, Dec 13, 2024 at 06:53:55PM +0000, Matthew Wilcox wrote: > > > > On Fri, Dec 13, 2024 at 07:51:50PM +0100, Christian Brauner wrote: > > > > > Yeah, it does. Did you see the patch that is included in the series? > > > > > I've replaced the macro with always inline functions that select the > > > > > lock based on the flag: > > > > > > > > > > static __always_inline void mtree_lock(struct maple_tree *mt) > > > > > { > > > > > if (mt->ma_flags & MT_FLAGS_LOCK_IRQ) > > > > > spin_lock_irq(&mt->ma_lock); > > > > > else > > > > > spin_lock(&mt->ma_lock); > > > > > } > > > > > static __always_inline void mtree_unlock(struct maple_tree *mt) > > > > > { > > > > > if (mt->ma_flags & MT_FLAGS_LOCK_IRQ) > > > > > spin_unlock_irq(&mt->ma_lock); > > > > > else > > > > > spin_unlock(&mt->ma_lock); > > > > > } > > > > > > > > > > Does that work for you? > > > > > > > > See the way the XArray works; we're trying to keep the two APIs as > > > > close as possible. > > > > > > > > The caller should use mtree_lock_irq() or mtree_lock_irqsave() > > > > as appropriate. > > > > > > Say I need: > > > > > > spin_lock_irqsave(&mt->ma_lock, flags); > > > mas_erase(...); > > > -> mas_nomem() > > > -> mtree_unlock() // uses spin_unlock(); > > > // allocate > > > -> mtree_lock() // uses spin_lock(); > > > spin_lock_irqrestore(&mt->ma_lock, flags); > > > > > > So that doesn't work, right? 
IOW, the maple tree does internal drop and > > > retake locks and they need to match the locks of the outer context. > > > > > > So, I think I need a way to communicate to mas_*() what type of lock to > > > take, no? Any idea how you would like me to do this in case I'm not > > > wrong? > > > > My first inclination has been to do it via MA_STATE() and the mas_flag > > value but I'm open to any other ideas. > > Braino on my part as free_pid() can be called with write_lock_irq() held. Instead of checking the flag inside mas_lock()/mas_unlock(), the flag is checked in mas_nomem(), and the correct mas_lock_irq() pair would be called there. External callers would use the mas_lock_irq() pair directly instead of checking the flag. To keep the API as close as possible, we'd keep the mas_lock() the same and add the mas_lock_irq() as well as mas_lock_type(mas, lock_type). __xas_nomem() uses the (static) xas_lock_type() to lock/unlock for internal translations. Thanks, Liam ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH RFC v2 0/2] pidfs: use maple tree 2024-12-13 21:04 ` Liam R. Howlett @ 2024-12-14 11:48 ` Christian Brauner 2024-12-17 17:41 ` Liam R. Howlett 0 siblings, 1 reply; 19+ messages in thread From: Christian Brauner @ 2024-12-14 11:48 UTC (permalink / raw) To: Liam R. Howlett; +Cc: Matthew Wilcox, linux-fsdevel, maple-tree On Fri, Dec 13, 2024 at 04:04:40PM -0500, Liam R. Howlett wrote: > * Christian Brauner <brauner@kernel.org> [241213 15:11]: > > On Fri, Dec 13, 2024 at 08:25:21PM +0100, Christian Brauner wrote: > > > On Fri, Dec 13, 2024 at 08:01:30PM +0100, Christian Brauner wrote: > > > > On Fri, Dec 13, 2024 at 06:53:55PM +0000, Matthew Wilcox wrote: > > > > > On Fri, Dec 13, 2024 at 07:51:50PM +0100, Christian Brauner wrote: > > > > > > Yeah, it does. Did you see the patch that is included in the series? > > > > > > I've replaced the macro with always inline functions that select the > > > > > > lock based on the flag: > > > > > > > > > > > > static __always_inline void mtree_lock(struct maple_tree *mt) > > > > > > { > > > > > > if (mt->ma_flags & MT_FLAGS_LOCK_IRQ) > > > > > > spin_lock_irq(&mt->ma_lock); > > > > > > else > > > > > > spin_lock(&mt->ma_lock); > > > > > > } > > > > > > static __always_inline void mtree_unlock(struct maple_tree *mt) > > > > > > { > > > > > > if (mt->ma_flags & MT_FLAGS_LOCK_IRQ) > > > > > > spin_unlock_irq(&mt->ma_lock); > > > > > > else > > > > > > spin_unlock(&mt->ma_lock); > > > > > > } > > > > > > > > > > > > Does that work for you? > > > > > > > > > > See the way the XArray works; we're trying to keep the two APIs as > > > > > close as possible. > > > > > > > > > > The caller should use mtree_lock_irq() or mtree_lock_irqsave() > > > > > as appropriate. 
> > > >
> > > > Say I need:
> > > >
> > > > spin_lock_irqsave(&mt->ma_lock, flags);
> > > > mas_erase(...);
> > > >    -> mas_nomem()
> > > >       -> mtree_unlock() // uses spin_unlock();
> > > >       // allocate
> > > >       -> mtree_lock() // uses spin_lock();
> > > > spin_lock_irqrestore(&mt->ma_lock, flags);
> > > >
> > > > So that doesn't work, right? IOW, the maple tree does internal drop and
> > > > retake locks and they need to match the locks of the outer context.
> > > >
> > > > So, I think I need a way to communicate to mas_*() what type of lock to
> > > > take, no? Any idea how you would like me to do this in case I'm not
> > > > wrong?
> > >
> > > My first inclination has been to do it via MA_STATE() and the mas_flag
> > > value but I'm open to any other ideas.
> >
> > Braino on my part as free_pid() can be called with write_lock_irq() held.
>
> Instead of checking the flag inside mas_lock()/mas_unlock(), the flag is
> checked in mas_nomem(), and the correct mas_lock_irq() pair would be
> called there. External callers would use the mas_lock_irq() pair
> directly instead of checking the flag.

I'm probably being dense but say I have two different locations with two
different locking requirements - as is the case with alloc_pid() and
free_pid(). alloc_pid() just uses spin_lock_irq() and spin_unlock_irq()
but free_pid() requires spin_lock_irqsave() and
spin_unlock_irqrestore(). If the whole mtree is marked with a flag that
tells mtree_lock() to use spin_lock_irq() then it will use that also in
free_pid() where it should use spin_lock_irqsave().

So if I'm not completely off-track then we'd need a way to tell the
mtree to use different locks for different callsites. Or at least
override the locking requirements for specific calls by e.g. allowing a
flag to be specified in the MA_STATE() struct that's checked by e.g.
mas_nomem().
> To keep the API as close as possible, we'd keep the mas_lock() the same > and add the mas_lock_irq() as well as mas_lock_type(mas, lock_type). > __xas_nomem() uses the (static) xas_lock_type() to lock/unlock for > internal translations. > > Thanks, > Liam ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH RFC v2 0/2] pidfs: use maple tree 2024-12-14 11:48 ` Christian Brauner @ 2024-12-17 17:41 ` Liam R. Howlett 0 siblings, 0 replies; 19+ messages in thread From: Liam R. Howlett @ 2024-12-17 17:41 UTC (permalink / raw) To: Christian Brauner; +Cc: Matthew Wilcox, linux-fsdevel, maple-tree * Christian Brauner <brauner@kernel.org> [241214 06:48]: > On Fri, Dec 13, 2024 at 04:04:40PM -0500, Liam R. Howlett wrote: > > * Christian Brauner <brauner@kernel.org> [241213 15:11]: > > > On Fri, Dec 13, 2024 at 08:25:21PM +0100, Christian Brauner wrote: > > > > On Fri, Dec 13, 2024 at 08:01:30PM +0100, Christian Brauner wrote: > > > > > On Fri, Dec 13, 2024 at 06:53:55PM +0000, Matthew Wilcox wrote: > > > > > > On Fri, Dec 13, 2024 at 07:51:50PM +0100, Christian Brauner wrote: > > > > > > > Yeah, it does. Did you see the patch that is included in the series? > > > > > > > I've replaced the macro with always inline functions that select the > > > > > > > lock based on the flag: > > > > > > > > > > > > > > static __always_inline void mtree_lock(struct maple_tree *mt) > > > > > > > { > > > > > > > if (mt->ma_flags & MT_FLAGS_LOCK_IRQ) > > > > > > > spin_lock_irq(&mt->ma_lock); > > > > > > > else > > > > > > > spin_lock(&mt->ma_lock); > > > > > > > } > > > > > > > static __always_inline void mtree_unlock(struct maple_tree *mt) > > > > > > > { > > > > > > > if (mt->ma_flags & MT_FLAGS_LOCK_IRQ) > > > > > > > spin_unlock_irq(&mt->ma_lock); > > > > > > > else > > > > > > > spin_unlock(&mt->ma_lock); > > > > > > > } > > > > > > > > > > > > > > Does that work for you? > > > > > > > > > > > > See the way the XArray works; we're trying to keep the two APIs as > > > > > > close as possible. > > > > > > > > > > > > The caller should use mtree_lock_irq() or mtree_lock_irqsave() > > > > > > as appropriate. 
> > > > > > > > > > Say I need: > > > > > > > > > > spin_lock_irqsave(&mt->ma_lock, flags); > > > > > mas_erase(...); > > > > > -> mas_nomem() > > > > > -> mtree_unlock() // uses spin_unlock(); > > > > > // allocate > > > > > -> mtree_lock() // uses spin_lock(); > > > > > spin_lock_irqrestore(&mt->ma_lock, flags); > > > > > > > > > > So that doesn't work, right? IOW, the maple tree does internal drop and > > > > > retake locks and they need to match the locks of the outer context. > > > > > > > > > > So, I think I need a way to communicate to mas_*() what type of lock to > > > > > take, no? Any idea how you would like me to do this in case I'm not > > > > > wrong? > > > > > > > > My first inclination has been to do it via MA_STATE() and the mas_flag > > > > value but I'm open to any other ideas. > > > > > > Braino on my part as free_pid() can be called with write_lock_irq() held. > > > > Instead of checking the flag inside mas_lock()/mas_unlock(), the flag is > > checked in mas_nomem(), and the correct mas_lock_irq() pair would be > > called there. External callers would use the mas_lock_irq() pair > > directly instead of checking the flag. > > I'm probably being dense but say I have two different locations with two > different locking requirements - as is the case with alloc_pid() and > free_pid(). alloc_pid() just uses spin_lock_irq() and spin_unlock_irq() > but free_pid() requires spin_lock_irqsave() and > spin_unlock_irqrestore(). If the whole mtree is marked with a flag that > tells the mtree_lock() to use spin_lock_irq() then it will use that also > in free_pid() where it should use spin_lock_irqsave(). > > So if I'm not completely off-track then we'd need a way to tell the > mtree to use different locks for different callsites. Or at least > override the locking requirements for specific calls by e.g., allowing a > flag to be specified in the MA_STATE() struct that's checke by e.g., > mas_nomem(). 
The xarray doesn't seem to support different lock types like this. It
seems to support IRQ, BH, and just the spinlock. xa_lock_irqsave()
exists and is used. I think it enables/disables without restoring, then
restores at the end. Either that, or there aren't writes during the
irqsave/irqrestore locking sections that need to allocate.

You could preallocate for the free_pid() side and handle it there? ie:

retry:
	mas_set(mas, pid);
	mas_lock_irqsave(mas, saved);
	if (mas_preallocate(mas, NULL, gfp)) {
		mas_unlock_irqrestore(mas, saved);
		/* something about reclaim */
		goto retry;
	}
	mas_erase(mas); /* preallocated, nomem won't happen */
	mas_unlock_irqrestore(mas, saved);

>
> > To keep the API as close as possible, we'd keep the mas_lock() the same
> > and add the mas_lock_irq() as well as mas_lock_type(mas, lock_type).
> > __xas_nomem() uses the (static) xas_lock_type() to lock/unlock for
> > internal translations.
> >
> > Thanks,
> > Liam

^ permalink raw reply	[flat|nested] 19+ messages in thread