public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH v5 0/14] futex: Add support task local hash maps.
@ 2024-12-15 23:00 Sebastian Andrzej Siewior
  2024-12-15 23:00 ` [PATCH v5 01/14] futex: Create helper function to initialize a hash slot Sebastian Andrzej Siewior
                   ` (13 more replies)
  0 siblings, 14 replies; 19+ messages in thread
From: Sebastian Andrzej Siewior @ 2024-12-15 23:00 UTC (permalink / raw)
  To: linux-kernel
  Cc: André Almeida, Darren Hart, Davidlohr Bueso, Ingo Molnar,
	Juri Lelli, Peter Zijlstra, Thomas Gleixner, Valentin Schneider,
	Waiman Long

Hi,

this is a follow-up to
	https://lore.kernel.org/ZwVOMgBMxrw7BU9A@jlelli-thinkpadt14gen4.remote.csb

and adds support for a task local futex_hash_bucket, which can be
created via prctl().

This version supports resizing at runtime as well as automatic resizing
while creating threads. The upper limit is 256 * num_possible_cpus() but
I guess we can lower that.

I posted performance numbers for "perf bench futex hash" at
	https://lore.kernel.org/all/20241101110810.R3AnEqdu@linutronix.de/

I didn't run any new benchmarks, but the lkp bot reported an improvement
	https://lore.kernel.org/all/202412122315.1a745d81-lkp@intel.com/

While the performance of the 16 default buckets looks worse than 512
(after that point performance hardly changes, while before it roughly
doubles per step), be aware that those buckets are now task local (and
not shared with other processes), which seems to be sufficient in
general.
For systems with 512 CPUs and one database application we have the
resize: either the application resizes the hash itself or we offer auto
resize based on threads and CPUs. But be aware that a workload like
"xz huge_file.tar" will happily occupy all CPUs in the system while
using only a few locks in total, and not very often. So it would
probably perform as well with two hash buckets as with 512 in this
scenario.

v4…v5: https://lore.kernel.org/all/20241203164335.1125381-1-bigeasy@linutronix.de/
  - Changed the reference-tracking scheme: The reference is now
    dropped once the lock is dropped. The resize operation also requeues
    all users on the hash bucket from the old one to the new one.

v3…v4: https://lore.kernel.org/all/20241115172035.795842-1-bigeasy@linutronix.de/
  - Completed resize. Tested with wait/wake, lock_pi, requeue and
    requeue_pi.
  - Added auto resize during thread creation.
  - Fixed bucket initialisation of the global hash bucket, which
    sometimes resulted in a crash.

v2…v3: https://lore.kernel.org/all/20241028121921.1264150-1-bigeasy@linutronix.de/
  - The default auto size for auto creation is 16.
  - For the private hash jhash2 is used and only for the address.
  - My "perf bench futex hash" hacks have been added.
  - The structure moved from signal's struct to mm.
  - It is possible to resize it at runtime.

v1…v2 https://lore.kernel.org/all/20241026224306.982896-1-bigeasy@linutronix.de/:
  - Moved to struct signal_struct and is used process wide.
  - Automatically allocated once the first thread is created.

Sebastian


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH v5 01/14] futex: Create helper function to initialize a hash slot.
  2024-12-15 23:00 [PATCH v5 0/14] futex: Add support task local hash maps Sebastian Andrzej Siewior
@ 2024-12-15 23:00 ` Sebastian Andrzej Siewior
  2024-12-15 23:00 ` [PATCH v5 02/14] futex: Add basic infrastructure for local task local hash Sebastian Andrzej Siewior
                   ` (12 subsequent siblings)
  13 siblings, 0 replies; 19+ messages in thread
From: Sebastian Andrzej Siewior @ 2024-12-15 23:00 UTC (permalink / raw)
  To: linux-kernel
  Cc: André Almeida, Darren Hart, Davidlohr Bueso, Ingo Molnar,
	Juri Lelli, Peter Zijlstra, Thomas Gleixner, Valentin Schneider,
	Waiman Long, Sebastian Andrzej Siewior

Factor out the futex_hash_bucket initialisation into a helper function.
The helper function will be used in a follow-up patch implementing
process private hash buckets.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 kernel/futex/core.c | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/kernel/futex/core.c b/kernel/futex/core.c
index ebdd76b4ecbba..d1d3c7b358b23 100644
--- a/kernel/futex/core.c
+++ b/kernel/futex/core.c
@@ -1124,6 +1124,13 @@ void futex_exit_release(struct task_struct *tsk)
 	futex_cleanup_end(tsk, FUTEX_STATE_DEAD);
 }
 
+static void futex_hash_bucket_init(struct futex_hash_bucket *fhb)
+{
+	atomic_set(&fhb->waiters, 0);
+	plist_head_init(&fhb->chain);
+	spin_lock_init(&fhb->lock);
+}
+
 static int __init futex_init(void)
 {
 	unsigned int futex_shift;
@@ -1141,11 +1148,8 @@ static int __init futex_init(void)
 					       futex_hashsize, futex_hashsize);
 	futex_hashsize = 1UL << futex_shift;
 
-	for (i = 0; i < futex_hashsize; i++) {
-		atomic_set(&futex_queues[i].waiters, 0);
-		plist_head_init(&futex_queues[i].chain);
-		spin_lock_init(&futex_queues[i].lock);
-	}
+	for (i = 0; i < futex_hashsize; i++)
+		futex_hash_bucket_init(&futex_queues[i]);
 
 	return 0;
 }
-- 
2.45.2


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v5 02/14] futex: Add basic infrastructure for local task local hash.
  2024-12-15 23:00 [PATCH v5 0/14] futex: Add support task local hash maps Sebastian Andrzej Siewior
  2024-12-15 23:00 ` [PATCH v5 01/14] futex: Create helper function to initialize a hash slot Sebastian Andrzej Siewior
@ 2024-12-15 23:00 ` Sebastian Andrzej Siewior
  2024-12-15 23:00 ` [PATCH v5 03/14] futex: Allow automatic allocation of process wide futex hash Sebastian Andrzej Siewior
                   ` (11 subsequent siblings)
  13 siblings, 0 replies; 19+ messages in thread
From: Sebastian Andrzej Siewior @ 2024-12-15 23:00 UTC (permalink / raw)
  To: linux-kernel
  Cc: André Almeida, Darren Hart, Davidlohr Bueso, Ingo Molnar,
	Juri Lelli, Peter Zijlstra, Thomas Gleixner, Valentin Schneider,
	Waiman Long, Sebastian Andrzej Siewior

The futex hashmap is system wide and shared by random tasks. Each slot
is hashed based on its address and VMA. Due to randomized VMAs (and
memory allocations) the same logical lock (pointer) can end up in a
different hash bucket on each invocation of the application. This in
turn means that different applications may share a hash bucket on the
first invocation but not on the second, and it is not always clear which
applications will be involved. This can result in high latencies to
acquire the futex_hash_bucket::lock, especially if the lock owner is
limited to a CPU and cannot be effectively PI boosted.

Introduce a task local hash map. The hashmap can be allocated via
	prctl(PR_FUTEX_HASH, PR_FUTEX_HASH_SET_SLOTS, 0)

The `0' argument allocates a default number of 16 slots; a higher number
can be specified if desired. The current upper limit is 131072.
The allocated hashmap is used by all threads within a process.
A thread can check whether the private map has been allocated via
	prctl(PR_FUTEX_HASH, PR_FUTEX_HASH_GET_SLOTS);

which returns the current number of slots.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 include/linux/futex.h      | 20 ++++++++
 include/linux/mm_types.h   |  5 ++
 include/uapi/linux/prctl.h |  5 ++
 kernel/fork.c              |  2 +
 kernel/futex/core.c        | 99 ++++++++++++++++++++++++++++++++++++--
 kernel/sys.c               |  4 ++
 6 files changed, 132 insertions(+), 3 deletions(-)

diff --git a/include/linux/futex.h b/include/linux/futex.h
index b70df27d7e85c..943828db52234 100644
--- a/include/linux/futex.h
+++ b/include/linux/futex.h
@@ -77,6 +77,15 @@ void futex_exec_release(struct task_struct *tsk);
 
 long do_futex(u32 __user *uaddr, int op, u32 val, ktime_t *timeout,
 	      u32 __user *uaddr2, u32 val2, u32 val3);
+int futex_hash_prctl(unsigned long arg2, unsigned long arg3);
+int futex_hash_allocate_default(void);
+void futex_hash_free(struct mm_struct *mm);
+
+static inline void futex_mm_init(struct mm_struct *mm)
+{
+	mm->futex_hash_bucket = NULL;
+}
+
 #else
 static inline void futex_init_task(struct task_struct *tsk) { }
 static inline void futex_exit_recursive(struct task_struct *tsk) { }
@@ -88,6 +97,17 @@ static inline long do_futex(u32 __user *uaddr, int op, u32 val,
 {
 	return -EINVAL;
 }
+static inline int futex_hash_prctl(unsigned long arg2, unsigned long arg3)
+{
+	return -EINVAL;
+}
+static inline int futex_hash_allocate_default(void)
+{
+	return 0;
+}
+static inline void futex_hash_free(struct mm_struct *mm) { }
+static inline void futex_mm_init(struct mm_struct *mm) { }
+
 #endif
 
 #endif
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 7361a8f3ab68e..2337a2e481fd0 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -30,6 +30,7 @@
 #define INIT_PASID	0
 
 struct address_space;
+struct futex_hash_bucket;
 struct mem_cgroup;
 
 /*
@@ -902,6 +903,10 @@ struct mm_struct {
 		int mm_lock_seq;
 #endif
 
+#ifdef CONFIG_FUTEX
+		unsigned int			futex_hash_mask;
+		struct futex_hash_bucket	*futex_hash_bucket;
+#endif
 
 		unsigned long hiwater_rss; /* High-watermark of RSS usage */
 		unsigned long hiwater_vm;  /* High-water virtual memory usage */
diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
index 5c6080680cb27..55b843644c51a 100644
--- a/include/uapi/linux/prctl.h
+++ b/include/uapi/linux/prctl.h
@@ -353,4 +353,9 @@ struct prctl_mm_map {
  */
 #define PR_LOCK_SHADOW_STACK_STATUS      76
 
+/* FUTEX hash management */
+#define PR_FUTEX_HASH			77
+# define PR_FUTEX_HASH_SET_SLOTS	1
+# define PR_FUTEX_HASH_GET_SLOTS	2
+
 #endif /* _LINUX_PRCTL_H */
diff --git a/kernel/fork.c b/kernel/fork.c
index 1450b461d196a..cda8886f3a1d7 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1284,6 +1284,7 @@ static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p,
 	RCU_INIT_POINTER(mm->exe_file, NULL);
 	mmu_notifier_subscriptions_init(mm);
 	init_tlb_flush_pending(mm);
+	futex_mm_init(mm);
 #if defined(CONFIG_TRANSPARENT_HUGEPAGE) && !defined(CONFIG_SPLIT_PMD_PTLOCKS)
 	mm->pmd_huge_pte = NULL;
 #endif
@@ -1361,6 +1362,7 @@ static inline void __mmput(struct mm_struct *mm)
 	if (mm->binfmt)
 		module_put(mm->binfmt->module);
 	lru_gen_del_mm(mm);
+	futex_hash_free(mm);
 	mmdrop(mm);
 }
 
diff --git a/kernel/futex/core.c b/kernel/futex/core.c
index d1d3c7b358b23..b87bd27b73707 100644
--- a/kernel/futex/core.c
+++ b/kernel/futex/core.c
@@ -39,6 +39,7 @@
 #include <linux/memblock.h>
 #include <linux/fault-inject.h>
 #include <linux/slab.h>
+#include <linux/prctl.h>
 
 #include "futex.h"
 #include "../locking/rtmutex_common.h"
@@ -107,18 +108,40 @@ late_initcall(fail_futex_debugfs);
 
 #endif /* CONFIG_FAIL_FUTEX */
 
+static inline bool futex_key_is_private(union futex_key *key)
+{
+	/*
+	 * Relies on get_futex_key() to set either bit for shared
+	 * futexes -- see comment with union futex_key.
+	 */
+	return !(key->both.offset & (FUT_OFF_INODE | FUT_OFF_MMSHARED));
+}
+
 /**
  * futex_hash - Return the hash bucket in the global hash
  * @key:	Pointer to the futex key for which the hash is calculated
  *
  * We hash on the keys returned from get_futex_key (see below) and return the
- * corresponding hash bucket in the global hash.
+ * corresponding hash bucket in the global hash. If the futex is private and
+ * a local hash table has been allocated then that one is used instead.
  */
 struct futex_hash_bucket *futex_hash(union futex_key *key)
 {
-	u32 hash = jhash2((u32 *)key, offsetof(typeof(*key), both.offset) / 4,
-			  key->both.offset);
+	struct futex_hash_bucket *fhb;
+	u32 hash;
 
+	fhb = current->mm->futex_hash_bucket;
+	if (fhb && futex_key_is_private(key)) {
+		u32 hash_mask = current->mm->futex_hash_mask;
+
+		hash = jhash2((u32 *)key,
+			      offsetof(typeof(*key), both.offset) / 4,
+			      key->both.offset);
+		return &fhb[hash & hash_mask];
+	}
+	hash = jhash2((u32 *)key,
+		      offsetof(typeof(*key), both.offset) / 4,
+		      key->both.offset);
 	return &futex_queues[hash & (futex_hashsize - 1)];
 }
 
@@ -1131,6 +1154,76 @@ static void futex_hash_bucket_init(struct futex_hash_bucket *fhb)
 	spin_lock_init(&fhb->lock);
 }
 
+void futex_hash_free(struct mm_struct *mm)
+{
+	kvfree(mm->futex_hash_bucket);
+}
+
+static int futex_hash_allocate(unsigned int hash_slots)
+{
+	struct futex_hash_bucket *fhb;
+	int i;
+
+	if (current->mm->futex_hash_bucket)
+		return -EALREADY;
+
+	if (!thread_group_leader(current))
+		return -EINVAL;
+
+	if (hash_slots == 0)
+		hash_slots = 16;
+	if (hash_slots < 2)
+		hash_slots = 2;
+	if (hash_slots > 131072)
+		hash_slots = 131072;
+	if (!is_power_of_2(hash_slots))
+		hash_slots = rounddown_pow_of_two(hash_slots);
+
+	fhb = kvmalloc_array(hash_slots, sizeof(struct futex_hash_bucket), GFP_KERNEL_ACCOUNT);
+	if (!fhb)
+		return -ENOMEM;
+
+	current->mm->futex_hash_mask = hash_slots - 1;
+
+	for (i = 0; i < hash_slots; i++)
+		futex_hash_bucket_init(&fhb[i]);
+
+	current->mm->futex_hash_bucket = fhb;
+	return 0;
+}
+
+int futex_hash_allocate_default(void)
+{
+	return futex_hash_allocate(0);
+}
+
+static int futex_hash_get_slots(void)
+{
+	if (current->mm->futex_hash_bucket)
+		return current->mm->futex_hash_mask + 1;
+	return 0;
+}
+
+int futex_hash_prctl(unsigned long arg2, unsigned long arg3)
+{
+	int ret;
+
+	switch (arg2) {
+	case PR_FUTEX_HASH_SET_SLOTS:
+		ret = futex_hash_allocate(arg3);
+		break;
+
+	case PR_FUTEX_HASH_GET_SLOTS:
+		ret = futex_hash_get_slots();
+		break;
+
+	default:
+		ret = -EINVAL;
+		break;
+	}
+	return ret;
+}
+
 static int __init futex_init(void)
 {
 	unsigned int futex_shift;
diff --git a/kernel/sys.c b/kernel/sys.c
index c4c701c6f0b4d..d8081f1d07d11 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -52,6 +52,7 @@
 #include <linux/user_namespace.h>
 #include <linux/time_namespace.h>
 #include <linux/binfmts.h>
+#include <linux/futex.h>
 
 #include <linux/sched.h>
 #include <linux/sched/autogroup.h>
@@ -2809,6 +2810,9 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
 			return -EINVAL;
 		error = arch_lock_shadow_stack_status(me, arg2);
 		break;
+	case PR_FUTEX_HASH:
+		error = futex_hash_prctl(arg2, arg3);
+		break;
 	default:
 		error = -EINVAL;
 		break;
-- 
2.45.2


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v5 03/14] futex: Allow automatic allocation of process wide futex hash.
  2024-12-15 23:00 [PATCH v5 0/14] futex: Add support task local hash maps Sebastian Andrzej Siewior
  2024-12-15 23:00 ` [PATCH v5 01/14] futex: Create helper function to initialize a hash slot Sebastian Andrzej Siewior
  2024-12-15 23:00 ` [PATCH v5 02/14] futex: Add basic infrastructure for local task local hash Sebastian Andrzej Siewior
@ 2024-12-15 23:00 ` Sebastian Andrzej Siewior
  2024-12-15 23:00 ` [PATCH v5 04/14] futex: Hash only the address for private futexes Sebastian Andrzej Siewior
                   ` (10 subsequent siblings)
  13 siblings, 0 replies; 19+ messages in thread
From: Sebastian Andrzej Siewior @ 2024-12-15 23:00 UTC (permalink / raw)
  To: linux-kernel
  Cc: André Almeida, Darren Hart, Davidlohr Bueso, Ingo Molnar,
	Juri Lelli, Peter Zijlstra, Thomas Gleixner, Valentin Schneider,
	Waiman Long, Sebastian Andrzej Siewior

Allocate a default futex hash once a task creates its first thread.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 include/linux/futex.h | 12 ++++++++++++
 kernel/fork.c         | 24 ++++++++++++++++++++++++
 2 files changed, 36 insertions(+)

diff --git a/include/linux/futex.h b/include/linux/futex.h
index 943828db52234..bad377c30de5e 100644
--- a/include/linux/futex.h
+++ b/include/linux/futex.h
@@ -86,6 +86,13 @@ static inline void futex_mm_init(struct mm_struct *mm)
 	mm->futex_hash_bucket = NULL;
 }
 
+static inline bool futex_hash_requires_allocation(void)
+{
+	if (current->mm->futex_hash_bucket)
+		return false;
+	return true;
+}
+
 #else
 static inline void futex_init_task(struct task_struct *tsk) { }
 static inline void futex_exit_recursive(struct task_struct *tsk) { }
@@ -108,6 +115,11 @@ static inline int futex_hash_allocate_default(void)
 static inline void futex_hash_free(struct mm_struct *mm) { }
 static inline void futex_mm_init(struct mm_struct *mm) { }
 
+static inline bool futex_hash_requires_allocation(void)
+{
+	return false;
+}
+
 #endif
 
 #endif
diff --git a/kernel/fork.c b/kernel/fork.c
index cda8886f3a1d7..e34bb2a107a9d 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2130,6 +2130,15 @@ static void rv_task_fork(struct task_struct *p)
 #define rv_task_fork(p) do {} while (0)
 #endif
 
+static bool need_futex_hash_allocate_default(u64 clone_flags)
+{
+	if ((clone_flags & (CLONE_THREAD | CLONE_VM)) != (CLONE_THREAD | CLONE_VM))
+		return false;
+	if (!thread_group_empty(current))
+		return false;
+	return futex_hash_requires_allocation();
+}
+
 /*
  * This creates a new process as a copy of the old one,
  * but does not actually start it yet.
@@ -2507,6 +2516,21 @@ __latent_entropy struct task_struct *copy_process(
 	if (retval)
 		goto bad_fork_cancel_cgroup;
 
+	/*
+	 * Allocate a default futex hash for the user process once the first
+	 * thread spawns.
+	 */
+	if (need_futex_hash_allocate_default(clone_flags)) {
+		retval = futex_hash_allocate_default();
+		if (retval)
+			goto bad_fork_core_free;
+		/*
+		 * If we fail beyond this point we don't free the allocated
+		 * futex hash map. We assume that another thread will be
+		 * created and will make use of it. The hash map will be freed
+		 * once the main thread terminates.
+		 */
+	}
 	/*
 	 * From this point on we must avoid any synchronous user-space
 	 * communication until we take the tasklist-lock. In particular, we do
-- 
2.45.2


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v5 04/14] futex: Hash only the address for private futexes.
  2024-12-15 23:00 [PATCH v5 0/14] futex: Add support task local hash maps Sebastian Andrzej Siewior
                   ` (2 preceding siblings ...)
  2024-12-15 23:00 ` [PATCH v5 03/14] futex: Allow automatic allocation of process wide futex hash Sebastian Andrzej Siewior
@ 2024-12-15 23:00 ` Sebastian Andrzej Siewior
  2024-12-15 23:00 ` [PATCH v5 05/14] futex: Move private hashing into its own function Sebastian Andrzej Siewior
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 19+ messages in thread
From: Sebastian Andrzej Siewior @ 2024-12-15 23:00 UTC (permalink / raw)
  To: linux-kernel
  Cc: André Almeida, Darren Hart, Davidlohr Bueso, Ingo Molnar,
	Juri Lelli, Peter Zijlstra, Thomas Gleixner, Valentin Schneider,
	Waiman Long, Sebastian Andrzej Siewior

futex_hash() passes the whole futex_key to jhash2(). The first two
members are passed as the first argument and the offset as the "initial
value".

For private futexes, the mm part is always the same and the key is used
only within the process. By excluding the mm part from the hash, we
reduce the length passed to jhash2() from 4 words (16 / 4) to 2 words
(8 / 4). This avoids the __jhash_mix() part of jhash.

The resulting code is smaller and, based on testing, this variant
performs as well as the original or slightly better.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 kernel/futex/core.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/futex/core.c b/kernel/futex/core.c
index b87bd27b73707..583a0149d62c9 100644
--- a/kernel/futex/core.c
+++ b/kernel/futex/core.c
@@ -134,8 +134,8 @@ struct futex_hash_bucket *futex_hash(union futex_key *key)
 	if (fhb && futex_key_is_private(key)) {
 		u32 hash_mask = current->mm->futex_hash_mask;
 
-		hash = jhash2((u32 *)key,
-			      offsetof(typeof(*key), both.offset) / 4,
+		hash = jhash2((void *)&key->private.address,
+			      sizeof(key->private.address) / 4,
 			      key->both.offset);
 		return &fhb[hash & hash_mask];
 	}
-- 
2.45.2


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v5 05/14] futex: Move private hashing into its own function.
  2024-12-15 23:00 [PATCH v5 0/14] futex: Add support task local hash maps Sebastian Andrzej Siewior
                   ` (3 preceding siblings ...)
  2024-12-15 23:00 ` [PATCH v5 04/14] futex: Hash only the address for private futexes Sebastian Andrzej Siewior
@ 2024-12-15 23:00 ` Sebastian Andrzej Siewior
  2024-12-15 23:00 ` [PATCH v5 06/14] futex: Add helper which include the put of a hb after end of operation Sebastian Andrzej Siewior
                   ` (8 subsequent siblings)
  13 siblings, 0 replies; 19+ messages in thread
From: Sebastian Andrzej Siewior @ 2024-12-15 23:00 UTC (permalink / raw)
  To: linux-kernel
  Cc: André Almeida, Darren Hart, Davidlohr Bueso, Ingo Molnar,
	Juri Lelli, Peter Zijlstra, Thomas Gleixner, Valentin Schneider,
	Waiman Long, Sebastian Andrzej Siewior

The hashing of the private futex is slightly different and will be
needed again when moving a futex_q entry to a different hash bucket
after a resize.

Move the private hashing into its own function.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 kernel/futex/core.c | 21 ++++++++++++++-------
 1 file changed, 14 insertions(+), 7 deletions(-)

diff --git a/kernel/futex/core.c b/kernel/futex/core.c
index 583a0149d62c9..907b76590df16 100644
--- a/kernel/futex/core.c
+++ b/kernel/futex/core.c
@@ -117,6 +117,18 @@ static inline bool futex_key_is_private(union futex_key *key)
 	return !(key->both.offset & (FUT_OFF_INODE | FUT_OFF_MMSHARED));
 }
 
+static struct futex_hash_bucket *futex_hash_private(union futex_key *key,
+						    struct futex_hash_bucket *fhb,
+						    u32 hash_mask)
+{
+	u32 hash;
+
+	hash = jhash2((void *)&key->private.address,
+		      sizeof(key->private.address) / 4,
+		      key->both.offset);
+	return &fhb[hash & hash_mask];
+}
+
 /**
  * futex_hash - Return the hash bucket in the global hash
  * @key:	Pointer to the futex key for which the hash is calculated
@@ -131,14 +143,9 @@ struct futex_hash_bucket *futex_hash(union futex_key *key)
 	u32 hash;
 
 	fhb = current->mm->futex_hash_bucket;
-	if (fhb && futex_key_is_private(key)) {
-		u32 hash_mask = current->mm->futex_hash_mask;
+	if (fhb && futex_key_is_private(key))
+		return futex_hash_private(key, fhb, current->mm->futex_hash_mask);
 
-		hash = jhash2((void *)&key->private.address,
-			      sizeof(key->private.address) / 4,
-			      key->both.offset);
-		return &fhb[hash & hash_mask];
-	}
 	hash = jhash2((u32 *)key,
 		      offsetof(typeof(*key), both.offset) / 4,
 		      key->both.offset);
-- 
2.45.2


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v5 06/14] futex: Add helper which include the put of a hb after end of operation.
  2024-12-15 23:00 [PATCH v5 0/14] futex: Add support task local hash maps Sebastian Andrzej Siewior
                   ` (4 preceding siblings ...)
  2024-12-15 23:00 ` [PATCH v5 05/14] futex: Move private hashing into its own function Sebastian Andrzej Siewior
@ 2024-12-15 23:00 ` Sebastian Andrzej Siewior
  2024-12-16 18:48   ` Thomas Gleixner
  2024-12-15 23:00 ` [PATCH v5 07/14] futex: Move the retry_private label Sebastian Andrzej Siewior
                   ` (7 subsequent siblings)
  13 siblings, 1 reply; 19+ messages in thread
From: Sebastian Andrzej Siewior @ 2024-12-15 23:00 UTC (permalink / raw)
  To: linux-kernel
  Cc: André Almeida, Darren Hart, Davidlohr Bueso, Ingo Molnar,
	Juri Lelli, Peter Zijlstra, Thomas Gleixner, Valentin Schneider,
	Waiman Long, Sebastian Andrzej Siewior

With the planned resize scheme for the hash bucket (hb), a reference
count will be obtained during futex_hash() and dropped after the hb is
unlocked. Once the reference is dropped, the hb must not be used because
it may disappear during a resize.
To prepare the integration, rename
- futex_queue() to futex_queue_put()
- futex_q_unlock() to futex_q_unlock_put()
- double_unlock_hb() to double_unlock_hb_put()

all of which now additionally call futex_hash_put(), for now an empty
stub. Introduce futex_hb_unlock_put(), which is the unlock plus the
reference drop. Move futex_hb_waiters_dec() in front of the
unlock/reference drop where it is needed before the unlock.
Update comments referring to the functions accordingly.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 io_uring/futex.c        |  2 +-
 kernel/futex/core.c     | 12 ++++++++----
 kernel/futex/futex.h    | 31 ++++++++++++++++++++-----------
 kernel/futex/pi.c       | 19 ++++++++++---------
 kernel/futex/requeue.c  | 15 ++++++++-------
 kernel/futex/waitwake.c | 23 ++++++++++++-----------
 6 files changed, 59 insertions(+), 43 deletions(-)

diff --git a/io_uring/futex.c b/io_uring/futex.c
index e29662f039e1a..67246438da228 100644
--- a/io_uring/futex.c
+++ b/io_uring/futex.c
@@ -349,7 +349,7 @@ int io_futex_wait(struct io_kiocb *req, unsigned int issue_flags)
 		hlist_add_head(&req->hash_node, &ctx->futex_list);
 		io_ring_submit_unlock(ctx, issue_flags);
 
-		futex_queue(&ifd->q, hb);
+		futex_queue_put(&ifd->q, hb);
 		return IOU_ISSUE_SKIP_COMPLETE;
 	}
 
diff --git a/kernel/futex/core.c b/kernel/futex/core.c
index 907b76590df16..3cfdd4c02f261 100644
--- a/kernel/futex/core.c
+++ b/kernel/futex/core.c
@@ -152,6 +152,9 @@ struct futex_hash_bucket *futex_hash(union futex_key *key)
 	return &futex_queues[hash & (futex_hashsize - 1)];
 }
 
+void futex_hash_put(struct futex_hash_bucket *hb)
+{
+}
 
 /**
  * futex_setup_timer - set up the sleeping hrtimer.
@@ -543,8 +546,8 @@ struct futex_hash_bucket *futex_q_lock(struct futex_q *q)
 	 * Increment the counter before taking the lock so that
 	 * a potential waker won't miss a to-be-slept task that is
 	 * waiting for the spinlock. This is safe as all futex_q_lock()
-	 * users end up calling futex_queue(). Similarly, for housekeeping,
-	 * decrement the counter at futex_q_unlock() when some error has
+	 * users end up calling futex_queue_put(). Similarly, for housekeeping,
+	 * decrement the counter at futex_q_unlock_put() when some error has
 	 * occurred and we don't end up adding the task to the list.
 	 */
 	futex_hb_waiters_inc(hb); /* implies smp_mb(); (A) */
@@ -555,11 +558,12 @@ struct futex_hash_bucket *futex_q_lock(struct futex_q *q)
 	return hb;
 }
 
-void futex_q_unlock(struct futex_hash_bucket *hb)
+void futex_q_unlock_put(struct futex_hash_bucket *hb)
 	__releases(&hb->lock)
 {
 	spin_unlock(&hb->lock);
 	futex_hb_waiters_dec(hb);
+	futex_hash_put(hb);
 }
 
 void __futex_queue(struct futex_q *q, struct futex_hash_bucket *hb)
@@ -586,7 +590,7 @@ void __futex_queue(struct futex_q *q, struct futex_hash_bucket *hb)
  * @q:	The futex_q to unqueue
  *
  * The q->lock_ptr must not be held by the caller. A call to futex_unqueue() must
- * be paired with exactly one earlier call to futex_queue().
+ * be paired with exactly one earlier call to futex_queue_put().
  *
  * Return:
  *  - 1 - if the futex_q was still queued (and we removed unqueued it);
diff --git a/kernel/futex/futex.h b/kernel/futex/futex.h
index 618ce1fe870e9..5793546a48ebf 100644
--- a/kernel/futex/futex.h
+++ b/kernel/futex/futex.h
@@ -202,6 +202,7 @@ futex_setup_timer(ktime_t *time, struct hrtimer_sleeper *timeout,
 		  int flags, u64 range_ns);
 
 extern struct futex_hash_bucket *futex_hash(union futex_key *key);
+extern void futex_hash_put(struct futex_hash_bucket *hb);
 
 /**
  * futex_match - Check whether two futex keys are equal
@@ -288,23 +289,29 @@ extern void __futex_unqueue(struct futex_q *q);
 extern void __futex_queue(struct futex_q *q, struct futex_hash_bucket *hb);
 extern int futex_unqueue(struct futex_q *q);
 
+static inline void futex_hb_unlock_put(struct futex_hash_bucket *hb)
+{
+	spin_unlock(&hb->lock);
+	futex_hash_put(hb);
+}
+
 /**
- * futex_queue() - Enqueue the futex_q on the futex_hash_bucket
+ * futex_queue_put() - Enqueue the futex_q on the futex_hash_bucket
  * @q:	The futex_q to enqueue
  * @hb:	The destination hash bucket
  *
- * The hb->lock must be held by the caller, and is released here. A call to
- * futex_queue() is typically paired with exactly one call to futex_unqueue().  The
- * exceptions involve the PI related operations, which may use futex_unqueue_pi()
- * or nothing if the unqueue is done as part of the wake process and the unqueue
- * state is implicit in the state of woken task (see futex_wait_requeue_pi() for
- * an example).
+ * The hb->lock must be held by the caller, and is released here and the reference
+ * on the hb is dropped. A call to futex_queue_put() is typically paired with
+ * exactly one call to futex_unqueue(). The exceptions involve the PI related
+ * operations, which may use futex_unqueue_pi() or nothing if the unqueue is
+ * done as part of the wake process and the unqueue state is implicit in the
+ * state of woken task (see futex_wait_requeue_pi() for an example).
  */
-static inline void futex_queue(struct futex_q *q, struct futex_hash_bucket *hb)
+static inline void futex_queue_put(struct futex_q *q, struct futex_hash_bucket *hb)
 	__releases(&hb->lock)
 {
 	__futex_queue(q, hb);
-	spin_unlock(&hb->lock);
+	futex_hb_unlock_put(hb);
 }
 
 extern void futex_unqueue_pi(struct futex_q *q);
@@ -350,7 +357,7 @@ static inline int futex_hb_waiters_pending(struct futex_hash_bucket *hb)
 }
 
 extern struct futex_hash_bucket *futex_q_lock(struct futex_q *q);
-extern void futex_q_unlock(struct futex_hash_bucket *hb);
+extern void futex_q_unlock_put(struct futex_hash_bucket *hb);
 
 
 extern int futex_lock_pi_atomic(u32 __user *uaddr, struct futex_hash_bucket *hb,
@@ -380,11 +387,13 @@ double_lock_hb(struct futex_hash_bucket *hb1, struct futex_hash_bucket *hb2)
 }
 
 static inline void
-double_unlock_hb(struct futex_hash_bucket *hb1, struct futex_hash_bucket *hb2)
+double_unlock_hb_put(struct futex_hash_bucket *hb1, struct futex_hash_bucket *hb2)
 {
 	spin_unlock(&hb1->lock);
 	if (hb1 != hb2)
 		spin_unlock(&hb2->lock);
+	futex_hash_put(hb1);
+	futex_hash_put(hb2);
 }
 
 /* syscalls */
diff --git a/kernel/futex/pi.c b/kernel/futex/pi.c
index d62cca5ed8f4c..8561f94f21ed9 100644
--- a/kernel/futex/pi.c
+++ b/kernel/futex/pi.c
@@ -217,9 +217,9 @@ static int attach_to_pi_state(u32 __user *uaddr, u32 uval,
 	/*
 	 * We get here with hb->lock held, and having found a
 	 * futex_top_waiter(). This means that futex_lock_pi() of said futex_q
-	 * has dropped the hb->lock in between futex_queue() and futex_unqueue_pi(),
-	 * which in turn means that futex_lock_pi() still has a reference on
-	 * our pi_state.
+	 * has dropped the hb->lock in between futex_queue_put() and
+	 * futex_unqueue_pi(), which in turn means that futex_lock_pi() still
+	 * has a reference on our pi_state.
 	 *
 	 * The waiter holding a reference on @pi_state also protects against
 	 * the unlocked put_pi_state() in futex_unlock_pi(), futex_lock_pi()
@@ -963,7 +963,7 @@ int futex_lock_pi(u32 __user *uaddr, unsigned int flags, ktime_t *time, int tryl
 			 *   exit to complete.
 			 * - EAGAIN: The user space value changed.
 			 */
-			futex_q_unlock(hb);
+			futex_q_unlock_put(hb);
 			/*
 			 * Handle the case where the owner is in the middle of
 			 * exiting. Wait for the exit to complete otherwise
@@ -1086,7 +1086,7 @@ int futex_lock_pi(u32 __user *uaddr, unsigned int flags, ktime_t *time, int tryl
 	goto out;
 
 out_unlock_put_key:
-	futex_q_unlock(hb);
+	futex_q_unlock_put(hb);
 
 out:
 	if (to) {
@@ -1096,7 +1096,7 @@ int futex_lock_pi(u32 __user *uaddr, unsigned int flags, ktime_t *time, int tryl
 	return ret != -EINTR ? ret : -ERESTARTNOINTR;
 
 uaddr_faulted:
-	futex_q_unlock(hb);
+	futex_q_unlock_put(hb);
 
 	ret = fault_in_user_writeable(uaddr);
 	if (ret)
@@ -1196,7 +1196,7 @@ int futex_unlock_pi(u32 __user *uaddr, unsigned int flags)
 		}
 
 		get_pi_state(pi_state);
-		spin_unlock(&hb->lock);
+		futex_hb_unlock_put(hb);
 
 		/* drops pi_state->pi_mutex.wait_lock */
 		ret = wake_futex_pi(uaddr, uval, pi_state, rt_waiter);
@@ -1235,7 +1235,8 @@ int futex_unlock_pi(u32 __user *uaddr, unsigned int flags)
 	 * owner.
 	 */
 	if ((ret = futex_cmpxchg_value_locked(&curval, uaddr, uval, 0))) {
-		spin_unlock(&hb->lock);
+		futex_hb_unlock_put(hb);
+
 		switch (ret) {
 		case -EFAULT:
 			goto pi_faulted;
@@ -1255,7 +1256,7 @@ int futex_unlock_pi(u32 __user *uaddr, unsigned int flags)
 	ret = (curval == uval) ? 0 : -EAGAIN;
 
 out_unlock:
-	spin_unlock(&hb->lock);
+	futex_hb_unlock_put(hb);
 	return ret;
 
 pi_retry:
diff --git a/kernel/futex/requeue.c b/kernel/futex/requeue.c
index b47bb764b3520..80e99a498de28 100644
--- a/kernel/futex/requeue.c
+++ b/kernel/futex/requeue.c
@@ -58,7 +58,7 @@ enum {
 };
 
 const struct futex_q futex_q_init = {
-	/* list gets initialized in futex_queue()*/
+	/* list gets initialized in futex_queue_put() */
 	.wake		= futex_wake_mark,
 	.key		= FUTEX_KEY_INIT,
 	.bitset		= FUTEX_BITSET_MATCH_ANY,
@@ -456,8 +456,8 @@ int futex_requeue(u32 __user *uaddr1, unsigned int flags1,
 		ret = futex_get_value_locked(&curval, uaddr1);
 
 		if (unlikely(ret)) {
-			double_unlock_hb(hb1, hb2);
 			futex_hb_waiters_dec(hb2);
+			double_unlock_hb_put(hb1, hb2);
 
 			ret = get_user(curval, uaddr1);
 			if (ret)
@@ -542,8 +542,9 @@ int futex_requeue(u32 __user *uaddr1, unsigned int flags1,
 		 * waiter::requeue_state is correct.
 		 */
 		case -EFAULT:
-			double_unlock_hb(hb1, hb2);
 			futex_hb_waiters_dec(hb2);
+			double_unlock_hb_put(hb1, hb2);
+
 			ret = fault_in_user_writeable(uaddr2);
 			if (!ret)
 				goto retry;
@@ -556,8 +557,8 @@ int futex_requeue(u32 __user *uaddr1, unsigned int flags1,
 			 *   exit to complete.
 			 * - EAGAIN: The user space value changed.
 			 */
-			double_unlock_hb(hb1, hb2);
 			futex_hb_waiters_dec(hb2);
+			double_unlock_hb_put(hb1, hb2);
 			/*
 			 * Handle the case where the owner is in the middle of
 			 * exiting. Wait for the exit to complete otherwise
@@ -674,9 +675,9 @@ int futex_requeue(u32 __user *uaddr1, unsigned int flags1,
 	put_pi_state(pi_state);
 
 out_unlock:
-	double_unlock_hb(hb1, hb2);
-	wake_up_q(&wake_q);
 	futex_hb_waiters_dec(hb2);
+	double_unlock_hb_put(hb1, hb2);
+	wake_up_q(&wake_q);
 	return ret ? ret : task_count;
 }
 
@@ -814,7 +815,7 @@ int futex_wait_requeue_pi(u32 __user *uaddr, unsigned int flags,
 	 * shared futexes. We need to compare the keys:
 	 */
 	if (futex_match(&q.key, &key2)) {
-		futex_q_unlock(hb);
+		futex_q_unlock_put(hb);
 		ret = -EINVAL;
 		goto out;
 	}
diff --git a/kernel/futex/waitwake.c b/kernel/futex/waitwake.c
index 3a10375d95218..fdb9fcaaf9fba 100644
--- a/kernel/futex/waitwake.c
+++ b/kernel/futex/waitwake.c
@@ -195,7 +195,7 @@ int futex_wake(u32 __user *uaddr, unsigned int flags, int nr_wake, u32 bitset)
 		}
 	}
 
-	spin_unlock(&hb->lock);
+	futex_hb_unlock_put(hb);
 	wake_up_q(&wake_q);
 	return ret;
 }
@@ -274,7 +274,7 @@ int futex_wake_op(u32 __user *uaddr1, unsigned int flags, u32 __user *uaddr2,
 	double_lock_hb(hb1, hb2);
 	op_ret = futex_atomic_op_inuser(op, uaddr2);
 	if (unlikely(op_ret < 0)) {
-		double_unlock_hb(hb1, hb2);
+		double_unlock_hb_put(hb1, hb2);
 
 		if (!IS_ENABLED(CONFIG_MMU) ||
 		    unlikely(op_ret != -EFAULT && op_ret != -EAGAIN)) {
@@ -327,7 +327,7 @@ int futex_wake_op(u32 __user *uaddr1, unsigned int flags, u32 __user *uaddr2,
 	}
 
 out_unlock:
-	double_unlock_hb(hb1, hb2);
+	double_unlock_hb_put(hb1, hb2);
 	wake_up_q(&wake_q);
 	return ret;
 }
@@ -335,7 +335,7 @@ int futex_wake_op(u32 __user *uaddr1, unsigned int flags, u32 __user *uaddr2,
 static long futex_wait_restart(struct restart_block *restart);
 
 /**
- * futex_wait_queue() - futex_queue() and wait for wakeup, timeout, or signal
+ * futex_wait_queue() - futex_queue_put() and wait for wakeup, timeout, or signal
  * @hb:		the futex hash bucket, must be locked by the caller
  * @q:		the futex_q to queue up on
  * @timeout:	the prepared hrtimer_sleeper, or null for no timeout
@@ -346,11 +346,11 @@ void futex_wait_queue(struct futex_hash_bucket *hb, struct futex_q *q,
 	/*
 	 * The task state is guaranteed to be set before another task can
 	 * wake it. set_current_state() is implemented using smp_store_mb() and
-	 * futex_queue() calls spin_unlock() upon completion, both serializing
+	 * futex_queue_put() calls spin_unlock() upon completion, both serializing
 	 * access to the hash list and forcing another memory barrier.
 	 */
 	set_current_state(TASK_INTERRUPTIBLE|TASK_FREEZABLE);
-	futex_queue(q, hb);
+	futex_queue_put(q, hb);
 
 	/* Arm the timer */
 	if (timeout)
@@ -461,11 +461,12 @@ int futex_wait_multiple_setup(struct futex_vector *vs, int count, int *woken)
 			 * next futex. Queue each futex at this moment so hb can
 			 * be unlocked.
 			 */
-			futex_queue(q, hb);
+			futex_queue_put(q, hb);
 			continue;
 		}
 
-		futex_q_unlock(hb);
+		futex_q_unlock_put(hb);
+
 		__set_current_state(TASK_RUNNING);
 
 		/*
@@ -624,7 +625,7 @@ int futex_wait_setup(u32 __user *uaddr, u32 val, unsigned int flags,
 	ret = futex_get_value_locked(&uval, uaddr);
 
 	if (ret) {
-		futex_q_unlock(*hb);
+		futex_q_unlock_put(*hb);
 
 		ret = get_user(uval, uaddr);
 		if (ret)
@@ -637,7 +638,7 @@ int futex_wait_setup(u32 __user *uaddr, u32 val, unsigned int flags,
 	}
 
 	if (uval != val) {
-		futex_q_unlock(*hb);
+		futex_q_unlock_put(*hb);
 		ret = -EWOULDBLOCK;
 	}
 
@@ -665,7 +666,7 @@ int __futex_wait(u32 __user *uaddr, unsigned int flags, u32 val,
 	if (ret)
 		return ret;
 
-	/* futex_queue and wait for wakeup, timeout, or a signal. */
+	/* futex_queue_put and wait for wakeup, timeout, or a signal. */
 	futex_wait_queue(hb, &q, to);
 
 	/* If we were woken (and unqueued), we succeeded, whatever. */
-- 
2.45.2


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v5 07/14] futex: Move the retry_private label.
  2024-12-15 23:00 [PATCH v5 0/14] futex: Add support task local hash maps Sebastian Andrzej Siewior
                   ` (5 preceding siblings ...)
  2024-12-15 23:00 ` [PATCH v5 06/14] futex: Add helper which include the put of a hb after end of operation Sebastian Andrzej Siewior
@ 2024-12-15 23:00 ` Sebastian Andrzej Siewior
  2024-12-16 20:41   ` Thomas Gleixner
  2024-12-15 23:00 ` [PATCH v5 08/14] futex: Introduce futex_get_locked_hb() Sebastian Andrzej Siewior
                   ` (6 subsequent siblings)
  13 siblings, 1 reply; 19+ messages in thread
From: Sebastian Andrzej Siewior @ 2024-12-15 23:00 UTC (permalink / raw)
  To: linux-kernel
  Cc: André Almeida, Darren Hart, Davidlohr Bueso, Ingo Molnar,
	Juri Lelli, Peter Zijlstra, Thomas Gleixner, Valentin Schneider,
	Waiman Long, Sebastian Andrzej Siewior

The retry_private label in futex_requeue() and futex_wake_op() is
jumped to after the lock has been dropped as part of a retry operation.
This assumes that the hb does not need to be looked up again. Once the
hb can be resized, the hb can change after the reference is dropped.

Move the retry_private label before the hashing operation.
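
The hazard this guards against can be illustrated with a minimal
user-space sketch (illustrative only, not kernel code; all names are
made up):

```c
#include <stddef.h>

/* A toy "hash table" that can be replaced (resized) at any time. */
struct bucket { int id; };
static struct bucket table_a[16], table_b[64];
static struct bucket *active_table = table_a;
static size_t active_mask = 15;

static struct bucket *toy_hash(unsigned int key)
{
	return &active_table[key & active_mask];
}

static void toy_resize(void)
{
	active_table = table_b;
	active_mask = 63;
}

/* Returns 1 if the retried lookup lands in the new table. */
int demo(void)
{
	struct bucket *hb;
	int attempts = 0;

retry:	/* label placed *before* the hash lookup */
	hb = toy_hash(42);
	if (attempts++ == 0) {
		/*
		 * A resize happens while the lock/reference is dropped;
		 * the old bucket pointer is stale, so re-hash on retry.
		 */
		toy_resize();
		goto retry;
	}
	return hb == &table_b[42 & 63];
}
```

If the retry label sat after the lookup instead, the second pass would
reuse the stale bucket pointer into the old table.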

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 kernel/futex/requeue.c  | 2 +-
 kernel/futex/waitwake.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/futex/requeue.c b/kernel/futex/requeue.c
index 80e99a498de28..0395740ce5e71 100644
--- a/kernel/futex/requeue.c
+++ b/kernel/futex/requeue.c
@@ -443,10 +443,10 @@ int futex_requeue(u32 __user *uaddr1, unsigned int flags1,
 	if (requeue_pi && futex_match(&key1, &key2))
 		return -EINVAL;
 
+retry_private:
 	hb1 = futex_hash(&key1);
 	hb2 = futex_hash(&key2);
 
-retry_private:
 	futex_hb_waiters_inc(hb2);
 	double_lock_hb(hb1, hb2);
 
diff --git a/kernel/futex/waitwake.c b/kernel/futex/waitwake.c
index fdb9fcaaf9fba..ec73a6ea7462a 100644
--- a/kernel/futex/waitwake.c
+++ b/kernel/futex/waitwake.c
@@ -267,10 +267,10 @@ int futex_wake_op(u32 __user *uaddr1, unsigned int flags, u32 __user *uaddr2,
 	if (unlikely(ret != 0))
 		return ret;
 
+retry_private:
 	hb1 = futex_hash(&key1);
 	hb2 = futex_hash(&key2);
 
-retry_private:
 	double_lock_hb(hb1, hb2);
 	op_ret = futex_atomic_op_inuser(op, uaddr2);
 	if (unlikely(op_ret < 0)) {
-- 
2.45.2


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v5 08/14] futex: Introduce futex_get_locked_hb().
  2024-12-15 23:00 [PATCH v5 0/14] futex: Add support task local hash maps Sebastian Andrzej Siewior
                   ` (6 preceding siblings ...)
  2024-12-15 23:00 ` [PATCH v5 07/14] futex: Move the retry_private label Sebastian Andrzej Siewior
@ 2024-12-15 23:00 ` Sebastian Andrzej Siewior
  2024-12-15 23:00 ` [PATCH v5 09/14] futex: Allow to re-allocate the private hash bucket Sebastian Andrzej Siewior
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 19+ messages in thread
From: Sebastian Andrzej Siewior @ 2024-12-15 23:00 UTC (permalink / raw)
  To: linux-kernel
  Cc: André Almeida, Darren Hart, Davidlohr Bueso, Ingo Molnar,
	Juri Lelli, Peter Zijlstra, Thomas Gleixner, Valentin Schneider,
	Waiman Long, Sebastian Andrzej Siewior

futex_lock_pi() and __fixup_pi_state_owner() acquire the
futex_q::lock_ptr without holding a reference, assuming that the
previously obtained hb and the assigned lock_ptr are still valid. This
no longer holds once the hb can be resized: the hb becomes invalid
after the reference is dropped.

Introduce futex_get_locked_hb() to lock the hb recorded in
futex_q::lock_ptr. The lock pointer is read in an RCU section to ensure
that it does not go away if the hb has been replaced while the old
pointer was being observed. After locking, the pointer needs to be
compared again to check whether it changed. If so, the hb has been
replaced, the user has been moved to the new one and lock_ptr has been
updated. The lock operation needs to be redone in this case.
Once lock_ptr is unchanged, the futex_hash_bucket it belongs to can be
returned to the caller as the locked hb. Even though no reference is
held, the hb remains valid as long as the lock is held: a resize must
first move all users to the new hash bucket, so the task performing the
resize will block on this very lock.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 kernel/futex/core.c    | 27 +++++++++++++++++++++++++++
 kernel/futex/futex.h   |  2 +-
 kernel/futex/pi.c      |  9 +++++++--
 kernel/futex/requeue.c |  8 +++++---
 4 files changed, 40 insertions(+), 6 deletions(-)

diff --git a/kernel/futex/core.c b/kernel/futex/core.c
index 3cfdd4c02f261..6bccf48cdb049 100644
--- a/kernel/futex/core.c
+++ b/kernel/futex/core.c
@@ -639,6 +639,33 @@ int futex_unqueue(struct futex_q *q)
 	return ret;
 }
 
+struct futex_hash_bucket *futex_get_locked_hb(struct futex_q *q)
+{
+	struct futex_hash_bucket *hb;
+	spinlock_t *lock_ptr;
+
+	/*
+	 * See futex_unqueue() why lock_ptr can change.
+	 */
+	guard(rcu)();
+retry:
+	lock_ptr = READ_ONCE(q->lock_ptr);
+	spin_lock(lock_ptr);
+
+	if (unlikely(lock_ptr != q->lock_ptr)) {
+		spin_unlock(lock_ptr);
+		goto retry;
+	}
+
+	hb = container_of(lock_ptr, struct futex_hash_bucket, lock);
+	/*
+	 * We don't acquire a reference on the hb because we wouldn't get
+	 * one if a resize is in progress and we took the old hb->lock
+	 * before the task which is about to move us to the new hb.
+	 */
+	return hb;
+}
+
 /*
  * PI futexes can not be requeued and must remove themselves from the hash
  * bucket. The hash bucket lock (i.e. lock_ptr) is held.
diff --git a/kernel/futex/futex.h b/kernel/futex/futex.h
index 5793546a48ebf..143bf1523fa4a 100644
--- a/kernel/futex/futex.h
+++ b/kernel/futex/futex.h
@@ -196,7 +196,7 @@ enum futex_access {
 
 extern int get_futex_key(u32 __user *uaddr, unsigned int flags, union futex_key *key,
 			 enum futex_access rw);
-
+extern struct futex_hash_bucket *futex_get_locked_hb(struct futex_q *q);
 extern struct hrtimer_sleeper *
 futex_setup_timer(ktime_t *time, struct hrtimer_sleeper *timeout,
 		  int flags, u64 range_ns);
diff --git a/kernel/futex/pi.c b/kernel/futex/pi.c
index 8561f94f21ed9..506ba1ad8ff23 100644
--- a/kernel/futex/pi.c
+++ b/kernel/futex/pi.c
@@ -806,7 +806,7 @@ static int __fixup_pi_state_owner(u32 __user *uaddr, struct futex_q *q,
 		break;
 	}
 
-	spin_lock(q->lock_ptr);
+	futex_get_locked_hb(q);
 	raw_spin_lock_irq(&pi_state->pi_mutex.wait_lock);
 
 	/*
@@ -922,6 +922,7 @@ int futex_lock_pi(u32 __user *uaddr, unsigned int flags, ktime_t *time, int tryl
 	struct rt_mutex_waiter rt_waiter;
 	struct futex_hash_bucket *hb;
 	struct futex_q q = futex_q_init;
+	bool no_block_fp = false;
 	DEFINE_WAKE_Q(wake_q);
 	int res, ret;
 
@@ -988,6 +989,7 @@ int futex_lock_pi(u32 __user *uaddr, unsigned int flags, ktime_t *time, int tryl
 		ret = rt_mutex_futex_trylock(&q.pi_state->pi_mutex);
 		/* Fixup the trylock return value: */
 		ret = ret ? 0 : -EWOULDBLOCK;
+		no_block_fp = true;
 		goto no_block;
 	}
 
@@ -1024,6 +1026,7 @@ int futex_lock_pi(u32 __user *uaddr, unsigned int flags, ktime_t *time, int tryl
 	raw_spin_unlock_irq(&q.pi_state->pi_mutex.wait_lock);
 	wake_up_q(&wake_q);
 	preempt_enable();
+	futex_hash_put(hb);
 
 	if (ret) {
 		if (ret == 1)
@@ -1063,7 +1066,7 @@ int futex_lock_pi(u32 __user *uaddr, unsigned int flags, ktime_t *time, int tryl
 	 * spinlock/rtlock (which might enqueue its own rt_waiter) and fix up
 	 * the
 	 */
-	spin_lock(q.lock_ptr);
+	hb = futex_get_locked_hb(&q);
 	/*
 	 * Waiter is unqueued.
 	 */
@@ -1083,6 +1086,8 @@ int futex_lock_pi(u32 __user *uaddr, unsigned int flags, ktime_t *time, int tryl
 
 	futex_unqueue_pi(&q);
 	spin_unlock(q.lock_ptr);
+	if (no_block_fp)
+		futex_hash_put(hb);
 	goto out;
 
 out_unlock_put_key:
diff --git a/kernel/futex/requeue.c b/kernel/futex/requeue.c
index 0395740ce5e71..1f3ac76ce1229 100644
--- a/kernel/futex/requeue.c
+++ b/kernel/futex/requeue.c
@@ -826,15 +826,17 @@ int futex_wait_requeue_pi(u32 __user *uaddr, unsigned int flags,
 	switch (futex_requeue_pi_wakeup_sync(&q)) {
 	case Q_REQUEUE_PI_IGNORE:
 		/* The waiter is still on uaddr1 */
-		spin_lock(&hb->lock);
+		hb = futex_get_locked_hb(&q);
+
 		ret = handle_early_requeue_pi_wakeup(hb, &q, to);
 		spin_unlock(&hb->lock);
+
 		break;
 
 	case Q_REQUEUE_PI_LOCKED:
 		/* The requeue acquired the lock */
 		if (q.pi_state && (q.pi_state->owner != current)) {
-			spin_lock(q.lock_ptr);
+			futex_get_locked_hb(&q);
 			ret = fixup_pi_owner(uaddr2, &q, true);
 			/*
 			 * Drop the reference to the pi state which the
@@ -861,7 +863,7 @@ int futex_wait_requeue_pi(u32 __user *uaddr, unsigned int flags,
 		if (ret && !rt_mutex_cleanup_proxy_lock(pi_mutex, &rt_waiter))
 			ret = 0;
 
-		spin_lock(q.lock_ptr);
+		futex_get_locked_hb(&q);
 		debug_rt_mutex_free_waiter(&rt_waiter);
 		/*
 		 * Fixup the pi_state owner and possibly acquire the lock if we
-- 
2.45.2


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v5 09/14] futex: Allow to re-allocate the private hash bucket.
  2024-12-15 23:00 [PATCH v5 0/14] futex: Add support task local hash maps Sebastian Andrzej Siewior
                   ` (7 preceding siblings ...)
  2024-12-15 23:00 ` [PATCH v5 08/14] futex: Introduce futex_get_locked_hb() Sebastian Andrzej Siewior
@ 2024-12-15 23:00 ` Sebastian Andrzej Siewior
  2024-12-17 14:58   ` Sebastian Andrzej Siewior
  2024-12-15 23:00 ` [PATCH v5 10/14] futex: Resize futex hash table based on number of threads Sebastian Andrzej Siewior
                   ` (4 subsequent siblings)
  13 siblings, 1 reply; 19+ messages in thread
From: Sebastian Andrzej Siewior @ 2024-12-15 23:00 UTC (permalink / raw)
  To: linux-kernel
  Cc: André Almeida, Darren Hart, Davidlohr Bueso, Ingo Molnar,
	Juri Lelli, Peter Zijlstra, Thomas Gleixner, Valentin Schneider,
	Waiman Long, Sebastian Andrzej Siewior

The mm_struct::futex_hash_lock guards the futex_hash_bucket assignment/
replacement. The futex_hash_allocate()/ PR_FUTEX_HASH_SET_SLOTS
operation can now be invoked at runtime to resize an already existing
internal private futex_hash_bucket.

The reallocation is based on an idea by Thomas Gleixner: The initial
allocation of struct futex_hash_bucket_private sets the reference count
to one. Every user acquires a reference on the hb before using it and
drops it after it has enqueued itself on the hash bucket. This means
that no reference is held while the task is scheduled out waiting for
the wake up.
The resize allocates a new futex_hash_bucket_private and drops the
initial reference under the mm_struct::futex_hash_lock. If the reference
drop results in destruction of the object then users currently queued on
the hash bucket(s) are requeued on the new hash bucket. At the end
mm_struct::futex_hash_bucket is updated, the old pointer is RCU freed
and the mutex is dropped.
If the reference drop does not result in destruction of the object then
the new pointer is saved as mm_struct::futex_hash_new. In this case the
replacement is delayed to the user that drops the last reference. All
new users that fail to acquire a reference block on
mm_struct::futex_hash_lock and attempt to perform the replacement.

This scheme keeps the requirement that during a lock/unlock operation
all waiters block on the same futex_hash_bucket::lock.
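
The life cycle of the initial reference can be sketched with a plain
atomic counter standing in for rcuref_t (illustrative, single-threaded;
current_hash/pending_new mirror mm->futex_hash_bucket and
mm->futex_hash_new):

```c
#include <stdatomic.h>
#include <stddef.h>

struct toy_hash {
	atomic_int users;	/* stands in for rcuref_t */
	int slots;
};

static struct toy_hash *current_hash;
static struct toy_hash *pending_new;	/* mm->futex_hash_new analogue */

/* rcuref_get() analogue: fails once the count has dropped to zero. */
static int toy_get(struct toy_hash *h)
{
	int v = atomic_load(&h->users);

	if (v == 0)
		return 0;
	atomic_store(&h->users, v + 1);
	return 1;
}

/* Returns 1 if this was the last reference. */
static int toy_put(struct toy_hash *h)
{
	return atomic_fetch_sub(&h->users, 1) == 1;
}

/* Resize: drop the initial reference; if users remain, delay the swap. */
static void toy_resize(struct toy_hash *new_hash)
{
	if (toy_put(current_hash))
		current_hash = new_hash;	/* immediate replacement */
	else
		pending_new = new_hash;		/* last user will swap */
}

int demo(void)
{
	struct toy_hash old = { .slots = 16 };
	struct toy_hash bigger = { .slots = 64 };

	atomic_init(&old.users, 1);		/* initial reference */
	atomic_init(&bigger.users, 1);
	current_hash = &old;

	toy_get(&old);				/* a queued waiter */
	toy_resize(&bigger);			/* delayed: waiter holds a ref */
	if (current_hash != &old || pending_new != &bigger)
		return 0;

	if (toy_put(&old)) {			/* waiter drops the last ref */
		current_hash = pending_new;	/* ... and performs the swap */
		pending_new = NULL;
	}
	return current_hash == &bigger;
}
```

The demo shows only the delayed path; the requeueing of queued waiters
and the RCU free of the old structure are left out for brevity.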

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 include/linux/futex.h    |   3 +-
 include/linux/mm_types.h |   7 +-
 kernel/futex/core.c      | 243 +++++++++++++++++++++++++++++++++++----
 kernel/futex/futex.h     |   1 +
 kernel/futex/requeue.c   |   5 +
 kernel/futex/waitwake.c  |   4 +-
 6 files changed, 237 insertions(+), 26 deletions(-)

diff --git a/include/linux/futex.h b/include/linux/futex.h
index bad377c30de5e..3ced01a9c5218 100644
--- a/include/linux/futex.h
+++ b/include/linux/futex.h
@@ -83,7 +83,8 @@ void futex_hash_free(struct mm_struct *mm);
 
 static inline void futex_mm_init(struct mm_struct *mm)
 {
-	mm->futex_hash_bucket = NULL;
+	rcu_assign_pointer(mm->futex_hash_bucket, NULL);
+	mutex_init(&mm->futex_hash_lock);
 }
 
 static inline bool futex_hash_requires_allocation(void)
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 2337a2e481fd0..62fe872b381f8 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -30,7 +30,7 @@
 #define INIT_PASID	0
 
 struct address_space;
-struct futex_hash_bucket;
+struct futex_hash_bucket_private;
 struct mem_cgroup;
 
 /*
@@ -904,8 +904,9 @@ struct mm_struct {
 #endif
 
 #ifdef CONFIG_FUTEX
-		unsigned int			futex_hash_mask;
-		struct futex_hash_bucket	*futex_hash_bucket;
+		struct mutex				futex_hash_lock;
+		struct futex_hash_bucket_private	__rcu *futex_hash_bucket;
+		struct futex_hash_bucket_private	*futex_hash_new;
 #endif
 
 		unsigned long hiwater_rss; /* High-watermark of RSS usage */
diff --git a/kernel/futex/core.c b/kernel/futex/core.c
index 6bccf48cdb049..f80ae39f2a83a 100644
--- a/kernel/futex/core.c
+++ b/kernel/futex/core.c
@@ -40,6 +40,7 @@
 #include <linux/fault-inject.h>
 #include <linux/slab.h>
 #include <linux/prctl.h>
+#include <linux/rcuref.h>
 
 #include "futex.h"
 #include "../locking/rtmutex_common.h"
@@ -56,6 +57,12 @@ static struct {
 #define futex_queues   (__futex_data.queues)
 #define futex_hashsize (__futex_data.hashsize)
 
+struct futex_hash_bucket_private {
+	rcuref_t	users;
+	unsigned int	hash_mask;
+	struct rcu_head rcu;
+	struct futex_hash_bucket queues[];
+};
 
 /*
  * Fault injections for futexes.
@@ -129,6 +136,148 @@ static struct futex_hash_bucket *futex_hash_private(union futex_key *key,
 	return &fhb[hash & hash_mask];
 }
 
+static void futex_rehash_current_users(struct futex_hash_bucket_private *old,
+				       struct futex_hash_bucket_private *new)
+{
+	struct futex_hash_bucket *hb_old, *hb_new;
+	unsigned int slots = old->hash_mask + 1;
+	u32 hash_mask = new->hash_mask;
+	unsigned int i;
+
+	for (i = 0; i < slots; i++) {
+		struct futex_q *this, *tmp;
+
+		hb_old = &old->queues[i];
+
+		spin_lock(&hb_old->lock);
+		plist_for_each_entry_safe(this, tmp, &hb_old->chain, list) {
+
+			plist_del(&this->list, &hb_old->chain);
+			futex_hb_waiters_dec(hb_old);
+
+			WARN_ON_ONCE(this->lock_ptr != &hb_old->lock);
+
+			hb_new = futex_hash_private(&this->key, new->queues, hash_mask);
+			futex_hb_waiters_inc(hb_new);
+			/*
+			 * The new pointer isn't published yet but an already
+			 * moved user can unqueue itself so locking is needed.
+			 */
+			spin_lock_nested(&hb_new->lock, SINGLE_DEPTH_NESTING);
+			plist_add(&this->list, &hb_new->chain);
+			this->lock_ptr = &hb_new->lock;
+			spin_unlock(&hb_new->lock);
+		}
+		spin_unlock(&hb_old->lock);
+	}
+}
+
+struct futex_assign_cleanup {
+	struct futex_hash_bucket_private *rcu;
+	struct futex_hash_bucket_private *free;
+	struct futex_hash_bucket_private *free2;
+};
+
+static void __futex_assign_new_hb(struct futex_hash_bucket_private *hb_p_new,
+				  struct mm_struct *mm,
+				  struct futex_assign_cleanup *fu_cleanup)
+{
+	struct futex_hash_bucket_private *hb_p;
+	bool drop_last_ref = hb_p_new != NULL;
+
+	/*
+	 * If the supplied hb_p is NULL then we must have one in mm. We might
+	 * have both. The pointer with the larger number of slots is
+	 * considered. If we are late, we have none and someone else did the
+	 * work while we blocked on the lock.
+	 */
+	if (mm->futex_hash_new) {
+		if (hb_p_new) {
+			if (mm->futex_hash_new->hash_mask <= hb_p_new->hash_mask) {
+				fu_cleanup->free = mm->futex_hash_new;
+			} else {
+				fu_cleanup->free = hb_p_new;
+				hb_p_new = mm->futex_hash_new;
+			}
+		} else {
+			hb_p_new = mm->futex_hash_new;
+		}
+		mm->futex_hash_new = NULL;
+	}
+
+	/* Someone was quicker, the current mask is valid */
+	if (!hb_p_new)
+		return;
+
+	hb_p = rcu_dereference_check(mm->futex_hash_bucket,
+				     lockdep_is_held(&mm->futex_hash_lock));
+	if (hb_p) {
+		if (hb_p->hash_mask >= hb_p_new->hash_mask) {
+			/* It was increased again while we were waiting */
+			fu_cleanup->free2 = hb_p_new;
+			return;
+		}
+
+		if (drop_last_ref && !rcuref_put(&hb_p->users)) {
+			/* We are not the last user, let the last one continue */
+			mm->futex_hash_new = hb_p_new;
+			return;
+		}
+
+		futex_rehash_current_users(hb_p, hb_p_new);
+		fu_cleanup->rcu = hb_p;
+	}
+	rcu_assign_pointer(mm->futex_hash_bucket, hb_p_new);
+}
+
+static void futex_assign_cleanup(struct futex_assign_cleanup *fu_cleanup)
+{
+	kvfree(fu_cleanup->free);
+	kvfree(fu_cleanup->free2);
+	kvfree_rcu(fu_cleanup->rcu, rcu);
+}
+
+static void futex_assign_new_hb(struct futex_hash_bucket_private *hb_p_new)
+{
+	struct futex_assign_cleanup fu_cleanup = {};
+	struct mm_struct *mm = current->mm;
+
+	scoped_guard(mutex, &mm->futex_hash_lock)
+		__futex_assign_new_hb(hb_p_new, mm, &fu_cleanup);
+	futex_assign_cleanup(&fu_cleanup);
+}
+
+static struct futex_hash_bucket_private *futex_get_private_hb(union futex_key *key)
+{
+	struct mm_struct *mm = current->mm;
+
+	if (!futex_key_is_private(key))
+		return NULL;
+	/*
+	 * Ideally we don't loop. If there is a replacement in progress
+	 * then a new HB is already prepared. We fail to obtain a
+	 * reference only after the last user returned its reference.
+	 * In that case futex_assign_new_hb() blocks on futex_hash_lock
+	 * and we either have to perform the replacement or wait
+	 * while someone else is doing the job. Either way, after we
+	 * return we can acquire a reference on the new hash bucket
+	 * (unless it is replaced again).
+	 */
+again:
+	scoped_guard(rcu) {
+		struct futex_hash_bucket_private *hb_p;
+
+		hb_p = rcu_dereference(mm->futex_hash_bucket);
+		if (!hb_p)
+			return NULL;
+
+		if (rcuref_get(&hb_p->users))
+			return hb_p;
+	}
+	futex_assign_new_hb(NULL);
+	goto again;
+}
+
 /**
  * futex_hash - Return the hash bucket in the global hash
  * @key:	Pointer to the futex key for which the hash is calculated
@@ -139,12 +288,12 @@ static struct futex_hash_bucket *futex_hash_private(union futex_key *key,
  */
 struct futex_hash_bucket *futex_hash(union futex_key *key)
 {
-	struct futex_hash_bucket *fhb;
+	struct futex_hash_bucket_private *hb_p = NULL;
 	u32 hash;
 
-	fhb = current->mm->futex_hash_bucket;
-	if (fhb && futex_key_is_private(key))
-		return futex_hash_private(key, fhb, current->mm->futex_hash_mask);
+	hb_p = futex_get_private_hb(key);
+	if (hb_p)
+		return futex_hash_private(key, hb_p->queues, hb_p->hash_mask);
 
 	hash = jhash2((u32 *)key,
 		      offsetof(typeof(*key), both.offset) / 4,
@@ -154,6 +303,17 @@ struct futex_hash_bucket *futex_hash(union futex_key *key)
 
 void futex_hash_put(struct futex_hash_bucket *hb)
 {
+	struct futex_hash_bucket_private *hb_p;
+
+	if (hb->hb_slot == 0)
+		return;
+	hb_p = container_of(hb, struct futex_hash_bucket_private,
+			    queues[hb->hb_slot - 1]);
+
+	if (!rcuref_put(&hb_p->users))
+		return;
+
+	futex_assign_new_hb(NULL);
 }
 
 /**
@@ -601,6 +761,8 @@ int futex_unqueue(struct futex_q *q)
 	spinlock_t *lock_ptr;
 	int ret = 0;
 
+	/* RCU so lock_ptr is not going away during locking. */
+	guard(rcu)();
 	/* In the common case we don't take the spinlock, which is nice. */
 retry:
 	/*
@@ -1008,10 +1170,27 @@ static void compat_exit_robust_list(struct task_struct *curr)
 static void exit_pi_state_list(struct task_struct *curr)
 {
 	struct list_head *next, *head = &curr->pi_state_list;
+	struct futex_hash_bucket_private *hb_p;
 	struct futex_pi_state *pi_state;
 	struct futex_hash_bucket *hb;
 	union futex_key key = FUTEX_KEY_INIT;
 
+	/*
+	 * Lock the futex_hash_bucket to ensure that the hb remains unchanged.
+	 * This is important so we can invoke futex_hash() under the pi_lock.
+	 */
+	guard(mutex)(&curr->mm->futex_hash_lock);
+	hb_p = rcu_dereference_check(curr->mm->futex_hash_bucket,
+				     lockdep_is_held(&curr->mm->futex_hash_lock));
+	if (hb_p) {
+		if (rcuref_read(&hb_p->users) == 0) {
+			struct futex_assign_cleanup fu_cleanup = {};
+
+			__futex_assign_new_hb(NULL, curr->mm, &fu_cleanup);
+			futex_assign_cleanup(&fu_cleanup);
+		}
+	}
+
 	/*
 	 * We are a ZOMBIE and nobody can enqueue itself on
 	 * pi_state_list anymore, but we have to be careful
@@ -1037,6 +1216,7 @@ static void exit_pi_state_list(struct task_struct *curr)
 		if (!refcount_inc_not_zero(&pi_state->refcount)) {
 			raw_spin_unlock_irq(&curr->pi_lock);
 			cpu_relax();
+			futex_hash_put(hb);
 			raw_spin_lock_irq(&curr->pi_lock);
 			continue;
 		}
@@ -1053,6 +1233,7 @@ static void exit_pi_state_list(struct task_struct *curr)
 			/* retain curr->pi_lock for the loop invariant */
 			raw_spin_unlock(&pi_state->pi_mutex.wait_lock);
 			spin_unlock(&hb->lock);
+			futex_hash_put(hb);
 			put_pi_state(pi_state);
 			continue;
 		}
@@ -1065,6 +1246,7 @@ static void exit_pi_state_list(struct task_struct *curr)
 		raw_spin_unlock(&curr->pi_lock);
 		raw_spin_unlock_irq(&pi_state->pi_mutex.wait_lock);
 		spin_unlock(&hb->lock);
+		futex_hash_put(hb);
 
 		rt_mutex_futex_unlock(&pi_state->pi_mutex);
 		put_pi_state(pi_state);
@@ -1185,8 +1367,9 @@ void futex_exit_release(struct task_struct *tsk)
 	futex_cleanup_end(tsk, FUTEX_STATE_DEAD);
 }
 
-static void futex_hash_bucket_init(struct futex_hash_bucket *fhb)
+static void futex_hash_bucket_init(struct futex_hash_bucket *fhb, unsigned int slot)
 {
+	fhb->hb_slot = slot;
 	atomic_set(&fhb->waiters, 0);
 	plist_head_init(&fhb->chain);
 	spin_lock_init(&fhb->lock);
@@ -1194,20 +1377,25 @@ static void futex_hash_bucket_init(struct futex_hash_bucket *fhb)
 
 void futex_hash_free(struct mm_struct *mm)
 {
-	kvfree(mm->futex_hash_bucket);
+	struct futex_hash_bucket_private *hb_p;
+
+	/* We are the last one and we hold the initial reference */
+	hb_p = rcu_dereference_check(mm->futex_hash_bucket, true);
+	if (!hb_p)
+		return;
+
+	if (WARN_ON(!rcuref_put(&hb_p->users)))
+		return;
+
+	kvfree(hb_p);
 }
 
 static int futex_hash_allocate(unsigned int hash_slots)
 {
-	struct futex_hash_bucket *fhb;
+	struct futex_hash_bucket_private *hb_p;
+	size_t alloc_size;
 	int i;
 
-	if (current->mm->futex_hash_bucket)
-		return -EALREADY;
-
-	if (!thread_group_leader(current))
-		return -EINVAL;
-
 	if (hash_slots == 0)
 		hash_slots = 16;
 	if (hash_slots < 2)
@@ -1217,16 +1405,25 @@ static int futex_hash_allocate(unsigned int hash_slots)
 	if (!is_power_of_2(hash_slots))
 		hash_slots = rounddown_pow_of_two(hash_slots);
 
-	fhb = kvmalloc_array(hash_slots, sizeof(struct futex_hash_bucket), GFP_KERNEL_ACCOUNT);
-	if (!fhb)
+	if (unlikely(check_mul_overflow(hash_slots, sizeof(struct futex_hash_bucket),
+					&alloc_size)))
 		return -ENOMEM;
 
-	current->mm->futex_hash_mask = hash_slots - 1;
+	if (unlikely(check_add_overflow(alloc_size, sizeof(struct futex_hash_bucket_private),
+					&alloc_size)))
+		return -ENOMEM;
+
+	hb_p = kvmalloc(alloc_size, GFP_KERNEL_ACCOUNT);
+	if (!hb_p)
+		return -ENOMEM;
+
+	rcuref_init(&hb_p->users, 1);
+	hb_p->hash_mask = hash_slots - 1;
 
 	for (i = 0; i < hash_slots; i++)
-		futex_hash_bucket_init(&fhb[i]);
+		futex_hash_bucket_init(&hb_p->queues[i], i + 1);
 
-	current->mm->futex_hash_bucket = fhb;
+	futex_assign_new_hb(hb_p);
 	return 0;
 }
 
@@ -1237,8 +1434,12 @@ int futex_hash_allocate_default(void)
 
 static int futex_hash_get_slots(void)
 {
-	if (current->mm->futex_hash_bucket)
-		return current->mm->futex_hash_mask + 1;
+	struct futex_hash_bucket_private *hb_p;
+
+	guard(rcu)();
+	hb_p = rcu_dereference(current->mm->futex_hash_bucket);
+	if (hb_p)
+		return hb_p->hash_mask + 1;
 	return 0;
 }
 
@@ -1280,7 +1481,7 @@ static int __init futex_init(void)
 	futex_hashsize = 1UL << futex_shift;
 
 	for (i = 0; i < futex_hashsize; i++)
-		futex_hash_bucket_init(&futex_queues[i]);
+		futex_hash_bucket_init(&futex_queues[i], 0);
 
 	return 0;
 }
diff --git a/kernel/futex/futex.h b/kernel/futex/futex.h
index 143bf1523fa4a..7de1117c2eab0 100644
--- a/kernel/futex/futex.h
+++ b/kernel/futex/futex.h
@@ -115,6 +115,7 @@ static inline bool should_fail_futex(bool fshared)
  */
 struct futex_hash_bucket {
 	atomic_t waiters;
+	unsigned int hb_slot;
 	spinlock_t lock;
 	struct plist_head chain;
 } ____cacheline_aligned_in_smp;
diff --git a/kernel/futex/requeue.c b/kernel/futex/requeue.c
index 1f3ac76ce1229..684f4eff20854 100644
--- a/kernel/futex/requeue.c
+++ b/kernel/futex/requeue.c
@@ -87,6 +87,11 @@ void requeue_futex(struct futex_q *q, struct futex_hash_bucket *hb1,
 		futex_hb_waiters_inc(hb2);
 		plist_add(&q->list, &hb2->chain);
 		q->lock_ptr = &hb2->lock;
+		/*
+		 * hb1 and hb2 belong to the same futex_hash_bucket_private
+		 * because if we managed to get a reference on hb1 then it can't
+		 * be replaced. Therefore we avoid put(hb1)+get(hb2) here.
+		 */
 	}
 	q->key = *key2;
 }
diff --git a/kernel/futex/waitwake.c b/kernel/futex/waitwake.c
index ec73a6ea7462a..4dc71ff8911fd 100644
--- a/kernel/futex/waitwake.c
+++ b/kernel/futex/waitwake.c
@@ -173,8 +173,10 @@ int futex_wake(u32 __user *uaddr, unsigned int flags, int nr_wake, u32 bitset)
 	hb = futex_hash(&key);
 
 	/* Make sure we really have tasks to wakeup */
-	if (!futex_hb_waiters_pending(hb))
+	if (!futex_hb_waiters_pending(hb)) {
+		futex_hash_put(hb);
 		return ret;
+	}
 
 	spin_lock(&hb->lock);
 
-- 
2.45.2


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v5 10/14] futex: Resize futex hash table based on number of threads.
  2024-12-15 23:00 [PATCH v5 0/14] futex: Add support task local hash maps Sebastian Andrzej Siewior
                   ` (8 preceding siblings ...)
  2024-12-15 23:00 ` [PATCH v5 09/14] futex: Allow to re-allocate the private hash bucket Sebastian Andrzej Siewior
@ 2024-12-15 23:00 ` Sebastian Andrzej Siewior
  2024-12-15 23:00 ` [PATCH v5 11/14] futex: Use a hashmask instead of hashsize Sebastian Andrzej Siewior
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 19+ messages in thread
From: Sebastian Andrzej Siewior @ 2024-12-15 23:00 UTC (permalink / raw)
  To: linux-kernel
  Cc: André Almeida, Darren Hart, Davidlohr Bueso, Ingo Molnar,
	Juri Lelli, Peter Zijlstra, Thomas Gleixner, Valentin Schneider,
	Waiman Long, Sebastian Andrzej Siewior

Automatically size the hash bucket based on the number of threads. The
logic allocates 4 * number-of-threads slots, clamped between 16 and
futex_hashsize (the size used for the system wide hash bucket).

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 include/linux/futex.h | 12 ------------
 kernel/fork.c         |  4 +---
 kernel/futex/core.c   | 28 +++++++++++++++++++++++++---
 3 files changed, 26 insertions(+), 18 deletions(-)

diff --git a/include/linux/futex.h b/include/linux/futex.h
index 3ced01a9c5218..403b54526a081 100644
--- a/include/linux/futex.h
+++ b/include/linux/futex.h
@@ -87,13 +87,6 @@ static inline void futex_mm_init(struct mm_struct *mm)
 	mutex_init(&mm->futex_hash_lock);
 }
 
-static inline bool futex_hash_requires_allocation(void)
-{
-	if (current->mm->futex_hash_bucket)
-		return false;
-	return true;
-}
-
 #else
 static inline void futex_init_task(struct task_struct *tsk) { }
 static inline void futex_exit_recursive(struct task_struct *tsk) { }
@@ -116,11 +109,6 @@ static inline int futex_hash_allocate_default(void)
 static inline void futex_hash_free(struct mm_struct *mm) { }
 static inline void futex_mm_init(struct mm_struct *mm) { }
 
-static inline bool futex_hash_requires_allocation(void)
-{
-	return false;
-}
-
 #endif
 
 #endif
diff --git a/kernel/fork.c b/kernel/fork.c
index e34bb2a107a9d..35ec9958707c5 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2134,9 +2134,7 @@ static bool need_futex_hash_allocate_default(u64 clone_flags)
 {
 	if ((clone_flags & (CLONE_THREAD | CLONE_VM)) != (CLONE_THREAD | CLONE_VM))
 		return false;
-	if (!thread_group_empty(current))
-		return false;
-	return futex_hash_requires_allocation();
+	return true;
 }
 
 /*
diff --git a/kernel/futex/core.c b/kernel/futex/core.c
index f80ae39f2a83a..15e319239c282 100644
--- a/kernel/futex/core.c
+++ b/kernel/futex/core.c
@@ -64,6 +64,8 @@ struct futex_hash_bucket_private {
 	struct futex_hash_bucket queues[];
 };
 
+static unsigned int futex_default_max_buckets;
+
 /*
  * Fault injections for futexes.
  */
@@ -1400,8 +1402,8 @@ static int futex_hash_allocate(unsigned int hash_slots)
 		hash_slots = 16;
 	if (hash_slots < 2)
 		hash_slots = 2;
-	if (hash_slots > 131072)
-		hash_slots = 131072;
+	if (hash_slots > futex_default_max_buckets)
+		hash_slots = futex_default_max_buckets;
 	if (!is_power_of_2(hash_slots))
 		hash_slots = rounddown_pow_of_two(hash_slots);
 
@@ -1429,7 +1431,26 @@ static int futex_hash_allocate(unsigned int hash_slots)
 
 int futex_hash_allocate_default(void)
 {
-	return futex_hash_allocate(0);
+	unsigned int threads, buckets, current_buckets = 0;
+	struct futex_hash_bucket_private *hb_p;
+
+	if (!current->mm)
+		return 0;
+
+	scoped_guard(rcu) {
+		threads = get_nr_threads(current);
+		hb_p = rcu_dereference(current->mm->futex_hash_bucket);
+		if (hb_p)
+			current_buckets = hb_p->hash_mask + 1;
+	}
+
+	buckets = roundup_pow_of_two(4 * threads);
+	buckets = max(buckets, 16);
+	buckets = min(buckets, futex_default_max_buckets);
+	if (current_buckets >= buckets)
+		return 0;
+
+	return futex_hash_allocate(buckets);
 }
 
 static int futex_hash_get_slots(void)
@@ -1473,6 +1494,7 @@ static int __init futex_init(void)
 #else
 	futex_hashsize = roundup_pow_of_two(256 * num_possible_cpus());
 #endif
+	futex_default_max_buckets = futex_hashsize;
 
 	futex_queues = alloc_large_system_hash("futex", sizeof(*futex_queues),
 					       futex_hashsize, 0, 0,
-- 
2.45.2



* [PATCH v5 11/14] futex: Use a hashmask instead of hashsize.
  2024-12-15 23:00 [PATCH v5 0/14] futex: Add support task local hash maps Sebastian Andrzej Siewior
                   ` (9 preceding siblings ...)
  2024-12-15 23:00 ` [PATCH v5 10/14] futex: Resize futex hash table based on number of threads Sebastian Andrzej Siewior
@ 2024-12-15 23:00 ` Sebastian Andrzej Siewior
  2024-12-15 23:00 ` [PATCH v5 12/14] tools/perf: Add the prctl(PR_FUTEX_HASH,…) to futex-hash Sebastian Andrzej Siewior
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 19+ messages in thread
From: Sebastian Andrzej Siewior @ 2024-12-15 23:00 UTC (permalink / raw)
  To: linux-kernel
  Cc: André Almeida, Darren Hart, Davidlohr Bueso, Ingo Molnar,
	Juri Lelli, Peter Zijlstra, Thomas Gleixner, Valentin Schneider,
	Waiman Long, Sebastian Andrzej Siewior

The global hash uses futex_hashsize to store the number of hash
buckets that were allocated during system boot. On each futex_hash()
invocation, one is subtracted from this number to obtain the mask. This
can be optimized by storing the mask directly, avoiding the subtraction
on each futex_hash() invocation.

Rename futex_hashsize to futex_hashmask and store the mask of the
allocated hash map.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 kernel/futex/core.c | 24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/kernel/futex/core.c b/kernel/futex/core.c
index 15e319239c282..b237154d67df0 100644
--- a/kernel/futex/core.c
+++ b/kernel/futex/core.c
@@ -52,10 +52,10 @@
  */
 static struct {
 	struct futex_hash_bucket *queues;
-	unsigned long            hashsize;
+	unsigned long            hashmask;
 } __futex_data __read_mostly __aligned(2*sizeof(long));
 #define futex_queues   (__futex_data.queues)
-#define futex_hashsize (__futex_data.hashsize)
+#define futex_hashmask (__futex_data.hashmask)
 
 struct futex_hash_bucket_private {
 	rcuref_t	users;
@@ -300,7 +300,7 @@ struct futex_hash_bucket *futex_hash(union futex_key *key)
 	hash = jhash2((u32 *)key,
 		      offsetof(typeof(*key), both.offset) / 4,
 		      key->both.offset);
-	return &futex_queues[hash & (futex_hashsize - 1)];
+	return &futex_queues[hash & futex_hashmask];
 }
 
 void futex_hash_put(struct futex_hash_bucket *hb)
@@ -1486,25 +1486,25 @@ int futex_hash_prctl(unsigned long arg2, unsigned long arg3)
 
 static int __init futex_init(void)
 {
+	unsigned long i, hashsize;
 	unsigned int futex_shift;
-	unsigned long i;
 
 #ifdef CONFIG_BASE_SMALL
-	futex_hashsize = 16;
+	hashsize = 16;
 #else
-	futex_hashsize = roundup_pow_of_two(256 * num_possible_cpus());
+	hashsize = roundup_pow_of_two(256 * num_possible_cpus());
 #endif
-	futex_default_max_buckets = futex_hashsize;
+	futex_default_max_buckets = hashsize;
 
 	futex_queues = alloc_large_system_hash("futex", sizeof(*futex_queues),
-					       futex_hashsize, 0, 0,
+					       hashsize, 0, 0,
 					       &futex_shift, NULL,
-					       futex_hashsize, futex_hashsize);
-	futex_hashsize = 1UL << futex_shift;
+					       hashsize, hashsize);
+	hashsize = 1UL << futex_shift;
 
-	for (i = 0; i < futex_hashsize; i++)
+	for (i = 0; i < hashsize; i++)
 		futex_hash_bucket_init(&futex_queues[i], 0);
-
+	futex_hashmask = hashsize - 1;
 	return 0;
 }
 core_initcall(futex_init);
-- 
2.45.2



* [PATCH v5 12/14] tools/perf: Add the prctl(PR_FUTEX_HASH,…) to futex-hash.
  2024-12-15 23:00 [PATCH v5 0/14] futex: Add support task local hash maps Sebastian Andrzej Siewior
                   ` (10 preceding siblings ...)
  2024-12-15 23:00 ` [PATCH v5 11/14] futex: Use a hashmask instead of hashsize Sebastian Andrzej Siewior
@ 2024-12-15 23:00 ` Sebastian Andrzej Siewior
  2024-12-15 23:00 ` [PATCH v5 13/14] tools/perf: Use the current affinity for CPU pinning in futex-hash Sebastian Andrzej Siewior
  2024-12-15 23:00 ` [PATCH v5 14/14] tools/perf: Allocate futex locks on the local CPU-node Sebastian Andrzej Siewior
  13 siblings, 0 replies; 19+ messages in thread
From: Sebastian Andrzej Siewior @ 2024-12-15 23:00 UTC (permalink / raw)
  To: linux-kernel
  Cc: André Almeida, Darren Hart, Davidlohr Bueso, Ingo Molnar,
	Juri Lelli, Peter Zijlstra, Thomas Gleixner, Valentin Schneider,
	Waiman Long, Sebastian Andrzej Siewior

Wire up PR_FUTEX_HASH in futex-hash. Use the `-b' argument to specify
the number of buckets. Read the resulting number back and show it
during invocation.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 tools/perf/bench/futex-hash.c | 19 +++++++++++++++++--
 tools/perf/bench/futex.h      |  1 +
 2 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/tools/perf/bench/futex-hash.c b/tools/perf/bench/futex-hash.c
index b472eded521b1..e24e987ae213e 100644
--- a/tools/perf/bench/futex-hash.c
+++ b/tools/perf/bench/futex-hash.c
@@ -22,6 +22,7 @@
 #include <sys/time.h>
 #include <sys/mman.h>
 #include <perf/cpumap.h>
+#include <sys/prctl.h>
 
 #include "../util/mutex.h"
 #include "../util/stat.h"
@@ -53,6 +54,7 @@ static struct bench_futex_parameters params = {
 };
 
 static const struct option options[] = {
+	OPT_UINTEGER('b', "buckets", &params.nbuckets, "Task local futex buckets to allocate"),
 	OPT_UINTEGER('t', "threads", &params.nthreads, "Specify amount of threads"),
 	OPT_UINTEGER('r', "runtime", &params.runtime, "Specify runtime (in seconds)"),
 	OPT_UINTEGER('f', "futexes", &params.nfutexes, "Specify amount of futexes per threads"),
@@ -120,6 +122,10 @@ static void print_summary(void)
 	       (int)bench__runtime.tv_sec);
 }
 
+#define PR_FUTEX_HASH			77
+# define PR_FUTEX_HASH_SET_SLOTS	1
+# define PR_FUTEX_HASH_GET_SLOTS	2
+
 int bench_futex_hash(int argc, const char **argv)
 {
 	int ret = 0;
@@ -131,6 +137,7 @@ int bench_futex_hash(int argc, const char **argv)
 	struct perf_cpu_map *cpu;
 	int nrcpus;
 	size_t size;
+	int num_buckets;
 
 	argc = parse_options(argc, argv, options, bench_futex_hash_usage, 0);
 	if (argc) {
@@ -147,6 +154,14 @@ int bench_futex_hash(int argc, const char **argv)
 	act.sa_sigaction = toggle_done;
 	sigaction(SIGINT, &act, NULL);
 
+	ret = prctl(PR_FUTEX_HASH, PR_FUTEX_HASH_SET_SLOTS, params.nbuckets);
+	if (ret) {
+		printf("Allocation of %u hash buckets failed: %d/%m\n",
+		       params.nbuckets, ret);
+		goto errmem;
+	}
+	num_buckets = prctl(PR_FUTEX_HASH, PR_FUTEX_HASH_GET_SLOTS);
+
 	if (params.mlockall) {
 		if (mlockall(MCL_CURRENT | MCL_FUTURE))
 			err(EXIT_FAILURE, "mlockall");
@@ -162,8 +177,8 @@ int bench_futex_hash(int argc, const char **argv)
 	if (!params.fshared)
 		futex_flag = FUTEX_PRIVATE_FLAG;
 
-	printf("Run summary [PID %d]: %d threads, each operating on %d [%s] futexes for %d secs.\n\n",
-	       getpid(), params.nthreads, params.nfutexes, params.fshared ? "shared":"private", params.runtime);
+	printf("Run summary [PID %d]: %d threads, hash slots: %d each operating on %d [%s] futexes for %d secs.\n\n",
+	       getpid(), params.nthreads, num_buckets, params.nfutexes, params.fshared ? "shared":"private", params.runtime);
 
 	init_stats(&throughput_stats);
 	mutex_init(&thread_lock);
diff --git a/tools/perf/bench/futex.h b/tools/perf/bench/futex.h
index ebdc2b032afc1..abc353c63a9a4 100644
--- a/tools/perf/bench/futex.h
+++ b/tools/perf/bench/futex.h
@@ -20,6 +20,7 @@ struct bench_futex_parameters {
 	bool multi; /* lock-pi */
 	bool pi; /* requeue-pi */
 	bool broadcast; /* requeue */
+	unsigned int nbuckets;
 	unsigned int runtime; /* seconds*/
 	unsigned int nthreads;
 	unsigned int nfutexes;
-- 
2.45.2



* [PATCH v5 13/14] tools/perf: Use the current affinity for CPU pinning in futex-hash.
  2024-12-15 23:00 [PATCH v5 0/14] futex: Add support task local hash maps Sebastian Andrzej Siewior
                   ` (11 preceding siblings ...)
  2024-12-15 23:00 ` [PATCH v5 12/14] tools/perf: Add the prctl(PR_FUTEX_HASH,…) to futex-hash Sebastian Andrzej Siewior
@ 2024-12-15 23:00 ` Sebastian Andrzej Siewior
  2024-12-15 23:00 ` [PATCH v5 14/14] tools/perf: Allocate futex locks on the local CPU-node Sebastian Andrzej Siewior
  13 siblings, 0 replies; 19+ messages in thread
From: Sebastian Andrzej Siewior @ 2024-12-15 23:00 UTC (permalink / raw)
  To: linux-kernel
  Cc: André Almeida, Darren Hart, Davidlohr Bueso, Ingo Molnar,
	Juri Lelli, Peter Zijlstra, Thomas Gleixner, Valentin Schneider,
	Waiman Long, Sebastian Andrzej Siewior

In order to simplify NUMA local testing, let futex-hash use the current
affinity mask and pin the individual threads based on that mask.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 tools/perf/bench/futex-hash.c | 30 ++++++++++++++++++++++++------
 1 file changed, 24 insertions(+), 6 deletions(-)

diff --git a/tools/perf/bench/futex-hash.c b/tools/perf/bench/futex-hash.c
index e24e987ae213e..216b0d1301ffc 100644
--- a/tools/perf/bench/futex-hash.c
+++ b/tools/perf/bench/futex-hash.c
@@ -126,10 +126,24 @@ static void print_summary(void)
 # define PR_FUTEX_HASH_SET_SLOTS	1
 # define PR_FUTEX_HASH_GET_SLOTS	2
 
+static unsigned int get_cpu_bit(cpu_set_t *set, size_t set_size, unsigned int r_cpu)
+{
+	unsigned int cpu = 0;
+
+	do {
+		if (CPU_ISSET_S(cpu, set_size, set)) {
+			if (!r_cpu)
+				return cpu;
+			r_cpu--;
+		}
+		cpu++;
+	} while (1);
+}
+
 int bench_futex_hash(int argc, const char **argv)
 {
 	int ret = 0;
-	cpu_set_t *cpuset;
+	cpu_set_t *cpuset, cpuset_;
 	struct sigaction act;
 	unsigned int i;
 	pthread_attr_t thread_attr;
@@ -167,8 +181,12 @@ int bench_futex_hash(int argc, const char **argv)
 			err(EXIT_FAILURE, "mlockall");
 	}
 
+	ret = pthread_getaffinity_np(pthread_self(), sizeof(cpuset_), &cpuset_);
+	BUG_ON(ret);
+	nrcpus = CPU_COUNT(&cpuset_);
+
 	if (!params.nthreads) /* default to the number of CPUs */
-		params.nthreads = perf_cpu_map__nr(cpu);
+		params.nthreads = nrcpus;
 
 	worker = calloc(params.nthreads, sizeof(*worker));
 	if (!worker)
@@ -189,10 +207,9 @@ int bench_futex_hash(int argc, const char **argv)
 	pthread_attr_init(&thread_attr);
 	gettimeofday(&bench__start, NULL);
 
-	nrcpus = cpu__max_cpu().cpu;
-	cpuset = CPU_ALLOC(nrcpus);
+	cpuset = CPU_ALLOC(4096);
 	BUG_ON(!cpuset);
-	size = CPU_ALLOC_SIZE(nrcpus);
+	size = CPU_ALLOC_SIZE(4096);
 
 	for (i = 0; i < params.nthreads; i++) {
 		worker[i].tid = i;
@@ -202,7 +219,8 @@ int bench_futex_hash(int argc, const char **argv)
 
 		CPU_ZERO_S(size, cpuset);
 
-		CPU_SET_S(perf_cpu_map__cpu(cpu, i % perf_cpu_map__nr(cpu)).cpu, size, cpuset);
+		CPU_SET_S(get_cpu_bit(&cpuset_, sizeof(cpuset_), i % nrcpus), size, cpuset);
+
 		ret = pthread_attr_setaffinity_np(&thread_attr, size, cpuset);
 		if (ret) {
 			CPU_FREE(cpuset);
-- 
2.45.2



* [PATCH v5 14/14] tools/perf: Allocate futex locks on the local CPU-node.
  2024-12-15 23:00 [PATCH v5 0/14] futex: Add support task local hash maps Sebastian Andrzej Siewior
                   ` (12 preceding siblings ...)
  2024-12-15 23:00 ` [PATCH v5 13/14] tools/perf: Use the current affinity for CPU pinning in futex-hash Sebastian Andrzej Siewior
@ 2024-12-15 23:00 ` Sebastian Andrzej Siewior
  13 siblings, 0 replies; 19+ messages in thread
From: Sebastian Andrzej Siewior @ 2024-12-15 23:00 UTC (permalink / raw)
  To: linux-kernel
  Cc: André Almeida, Darren Hart, Davidlohr Bueso, Ingo Molnar,
	Juri Lelli, Peter Zijlstra, Thomas Gleixner, Valentin Schneider,
	Waiman Long, Sebastian Andrzej Siewior

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 tools/perf/bench/futex-hash.c | 17 ++++++++++++-----
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/tools/perf/bench/futex-hash.c b/tools/perf/bench/futex-hash.c
index 216b0d1301ffc..4c7c6677463f8 100644
--- a/tools/perf/bench/futex-hash.c
+++ b/tools/perf/bench/futex-hash.c
@@ -122,6 +122,8 @@ static void print_summary(void)
 	       (int)bench__runtime.tv_sec);
 }
 
+#include <numa.h>
+
 #define PR_FUTEX_HASH			77
 # define PR_FUTEX_HASH_SET_SLOTS	1
 # define PR_FUTEX_HASH_GET_SLOTS	2
@@ -212,14 +214,19 @@ int bench_futex_hash(int argc, const char **argv)
 	size = CPU_ALLOC_SIZE(4096);
 
 	for (i = 0; i < params.nthreads; i++) {
+		unsigned int cpu_num;
 		worker[i].tid = i;
-		worker[i].futex = calloc(params.nfutexes, sizeof(*worker[i].futex));
-		if (!worker[i].futex)
-			goto errmem;
 
 		CPU_ZERO_S(size, cpuset);
+		cpu_num = get_cpu_bit(&cpuset_, sizeof(cpuset_), i % nrcpus);
+		//worker[i].futex = calloc(params.nfutexes, sizeof(*worker[i].futex));
 
-		CPU_SET_S(get_cpu_bit(&cpuset_, sizeof(cpuset_), i % nrcpus), size, cpuset);
+		worker[i].futex = numa_alloc_onnode(params.nfutexes * sizeof(*worker[i].futex),
+						    numa_node_of_cpu(cpu_num));
+		if (worker[i].futex == MAP_FAILED || worker[i].futex == NULL)
+			goto errmem;
+
+		CPU_SET_S(cpu_num, size, cpuset);
 
 		ret = pthread_attr_setaffinity_np(&thread_attr, size, cpuset);
 		if (ret) {
@@ -271,7 +278,7 @@ int bench_futex_hash(int argc, const char **argv)
 				       &worker[i].futex[params.nfutexes-1], t);
 		}
 
-		zfree(&worker[i].futex);
+		numa_free(worker[i].futex, params.nfutexes * sizeof(*worker[i].futex));
 	}
 
 	print_summary();
-- 
2.45.2



* Re: [PATCH v5 06/14] futex: Add helper which include the put of a hb after end of operation.
  2024-12-15 23:00 ` [PATCH v5 06/14] futex: Add helper which include the put of a hb after end of operation Sebastian Andrzej Siewior
@ 2024-12-16 18:48   ` Thomas Gleixner
  2024-12-17 17:07     ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 19+ messages in thread
From: Thomas Gleixner @ 2024-12-16 18:48 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior, linux-kernel
  Cc: André Almeida, Darren Hart, Davidlohr Bueso, Ingo Molnar,
	Juri Lelli, Peter Zijlstra, Valentin Schneider, Waiman Long,
	Sebastian Andrzej Siewior

On Mon, Dec 16 2024 at 00:00, Sebastian Andrzej Siewior wrote:
> With the planned schema of resize of hb, a reference count will be
> obtained during futex_hash() and will be dropped after the hb is
> unlocked. Once the reference is dropped, the hb must not be used because
> it will disappear after a resize.
> To prepare the integration, rename
> - futex_hb_unlock() to futex_hb_unlock_put()
> - futex_queue() to futex_queue_put()
> - futex_q_unlock() to futex_q_unlock_put()
> - double_unlock_hb() to double_unlock_hb_put()
>
> which is additionally includes futex_hb_unlock_put(), an empty stub.
> Introduce futex_hb_unlock_put() which is the unlock plus the reference
> drop. Move futex_hb_waiters_dec() before the reference drop, if needed
> before the unlock.
> Update comments referring to the functions accordingly.

If I didn't know what you are talking about, it would have taken me a
while to decode the word salad above. It starts with the usage of 'hb',
an acronym which might be understandable for people familiar with the
futex code, but otherwise it's an arbitrary reference to nothing.

Assuming that 'hb' stands for hashbucket, the usage here is wrong:

      With the planned schema of resize of hb...

This is about resizing the hash not the hashbucket, right?

So something like this might be more to the point:

   futex: Prepare for reference counting of the process private hash

   To support runtime resizing of the process private hash, it's
   required to add a reference count to the hash structure. The
   reference count ensures that the hash cannot be resized or freed
   while a task is operating on it.

   The reference count will be obtained within futex_hash() and dropped
   once the hash bucket is unlocked and not longer required for the
   particular operation (queue, unqueue, wakeup etc.).

   This is achieved by:

    - appending _put() to existing functions so it's clear that they
      also put the hash reference and fixing up the usage sites

    - providing new helpers, which combine common operations (unlock,
      put), and using them at the appropriate places

    - providing new helper for standalone reference counting
      functionality and using them at places, where the unlock operation
      needs to be separate.
   
Hmm?
> -void futex_q_unlock(struct futex_hash_bucket *hb)
> +void futex_q_unlock_put(struct futex_hash_bucket *hb)
>  	__releases(&hb->lock)
>  {
>  	spin_unlock(&hb->lock);
>  	futex_hb_waiters_dec(hb);
> +	futex_hash_put(hb);

You missed a place to move the waiter_dec() before the unlock. Actually
this move should be in a separate preparatory patch, which does only
that. It also needs a proper change log which explains why this done,
why it is equivalent and not introducing a change in terms of ordering
rules. This:

> Move futex_hb_waiters_dec() before the reference drop, if needed
> before the unlock.

does not really give any clue at all.

>  		if (unlikely(ret)) {
> -			double_unlock_hb(hb1, hb2);
>  			futex_hb_waiters_dec(hb2);
> +			double_unlock_hb_put(hb1, hb2);

And having this move before the _put() change makes the latter a purely
mechanical change which lets the reader/reviewer focus on the reference
count rules and not be distracted by the waiter count changes.
  
> -	/* futex_queue and wait for wakeup, timeout, or a signal. */
> +	/* futex_queue_put and wait for wakeup, timeout, or a signal. */

When you fix up these comments, can you please use the fn() notation?

>  	futex_wait_queue(hb, &q, to);
>  
>  	/* If we were woken (and unqueued), we succeeded, whatever. */

Thanks,

        tglx


* Re: [PATCH v5 07/14] futex: Move the retry_private label.
  2024-12-15 23:00 ` [PATCH v5 07/14] futex: Move the retry_private label Sebastian Andrzej Siewior
@ 2024-12-16 20:41   ` Thomas Gleixner
  0 siblings, 0 replies; 19+ messages in thread
From: Thomas Gleixner @ 2024-12-16 20:41 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior, linux-kernel
  Cc: André Almeida, Darren Hart, Davidlohr Bueso, Ingo Molnar,
	Juri Lelli, Peter Zijlstra, Valentin Schneider, Waiman Long,
	Sebastian Andrzej Siewior

On Mon, Dec 16 2024 at 00:00, Sebastian Andrzej Siewior wrote:
> The label futex_requeue in futex_requeue() and futex_wake_op() is jumped
> after the lock is dropped in a retry operation.

The label is jumped? 

> This assumes that the hb does not need to be hashed again. If hb is
> resized then the hb can change if the reference is dropped.

Again 'hb' and the confusion of hash bucket (hb) resize.

> Move the retry_private label before the hashing operation.

The overall explanation is not really comprehensible.

    futex: Re-evaluate the hash bucket after dropping the lock

     Sebastian Andrzej Siewior wrote:

     In futex_requeue() and futex_wake_op() the hash bucket lock is
     dropped in the failure paths for handling page faults and other
     error scenarios. After that the code jumps back to retry_private
     which relocks the hash bucket[s] under the assumption that the hash
     bucket pointer which was retrieved via futex_hash() is still valid.
     
     With resizable private hash buckets, that assumption is no longer
     true as the waiters can be moved to a larger hash in the meantime.

     Move the retry_private label above the hashing function to handle
     this correctly.

Or so.

Thanks,

        tglx


* Re: [PATCH v5 09/14] futex: Allow to re-allocate the private hash bucket.
  2024-12-15 23:00 ` [PATCH v5 09/14] futex: Allow to re-allocate the private hash bucket Sebastian Andrzej Siewior
@ 2024-12-17 14:58   ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 19+ messages in thread
From: Sebastian Andrzej Siewior @ 2024-12-17 14:58 UTC (permalink / raw)
  To: linux-kernel
  Cc: André Almeida, Darren Hart, Davidlohr Bueso, Ingo Molnar,
	Juri Lelli, Peter Zijlstra, Thomas Gleixner, Valentin Schneider,
	Waiman Long

On 2024-12-16 00:00:13 [+0100], To linux-kernel@vger.kernel.org wrote:
> @@ -154,6 +303,17 @@ struct futex_hash_bucket *futex_hash(union futex_key *key)
>  
>  void futex_hash_put(struct futex_hash_bucket *hb)
>  {
> +	struct futex_hash_bucket_private *hb_p;
> +
> +	if (hb->hb_slot == 0)
> +		return;
> +	hb_p = container_of(hb, struct futex_hash_bucket_private,
> +			    queues[hb->hb_slot - 1]);
> +
> +	if (!rcuref_put(&hb_p->users))
> +		return;
> +
> +	futex_assign_new_hb(NULL);

I have reworked two things:
- I have to remove futex_assign_new_hb() here and let the next
  rcuref_get() perform the assignment instead. The reason is that the
  caller can change its task state (futex_wait_queue()) and blocking on
  the mutex will reset it. Instead of delaying it after the schedule()
  it looks simpler to delay it to the next rcuref_get() where it has to
  be handled anyway.

- The logic in __futex_assign_new_hb() didn't deal properly with delayed
  assigns.

>  }

Sebastian


* Re: [PATCH v5 06/14] futex: Add helper which include the put of a hb after end of operation.
  2024-12-16 18:48   ` Thomas Gleixner
@ 2024-12-17 17:07     ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 19+ messages in thread
From: Sebastian Andrzej Siewior @ 2024-12-17 17:07 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: linux-kernel, André Almeida, Darren Hart, Davidlohr Bueso,
	Ingo Molnar, Juri Lelli, Peter Zijlstra, Valentin Schneider,
	Waiman Long

On 2024-12-16 19:48:07 [+0100], Thomas Gleixner wrote:
> So something like this might be more to the point:
> 
>    futex: Prepare for reference counting of the process private hash
> 
>    To support runtime resizing of the process private hash, it's
>    required to add a reference count to the hash structure. The
>    reference count ensures that the hash cannot be resized or freed
>    while a task is operating on it.
> 
>    The reference count will be obtained within futex_hash() and dropped
>    once the hash bucket is unlocked and no longer required for the
>    particular operation (queue, unqueue, wakeup etc.).
> 
>    This is achieved by:
> 
>     - appending _put() to existing functions so it's clear that they
>       also put the hash reference and fixing up the usage sites
> 
>     - providing new helpers, which combine common operations (unlock,
>       put), and using them at the appropriate places
> 
>     - providing new helper for standalone reference counting
>       functionality and using them at places, where the unlock operation
>       needs to be separate.
>    
> Hmm?

much better.

> > -void futex_q_unlock(struct futex_hash_bucket *hb)
> > +void futex_q_unlock_put(struct futex_hash_bucket *hb)
> >  	__releases(&hb->lock)
> >  {
> >  	spin_unlock(&hb->lock);
> >  	futex_hb_waiters_dec(hb);
> > +	futex_hash_put(hb);
> 
> You missed a place to move the waiter_dec() before the unlock. Actually

This is fine because futex_hb_waiters_dec() happens before the reference
drop. However it is better to keep it symmetrical so I am going to move it.

> this move should be in a separate preparatory patch, which does only
> that. It also needs a proper change log which explains why this done,
> why it is equivalent and not introducing a change in terms of ordering
> rules. This:
>
> > Move futex_hb_waiters_dec() before the reference drop, if needed
> > before the unlock.
> 
> does not really give any clue at all.
> 
> >  		if (unlikely(ret)) {
> > -			double_unlock_hb(hb1, hb2);
> >  			futex_hb_waiters_dec(hb2);
> > +			double_unlock_hb_put(hb1, hb2);
> 
> And having this move before the _put() change makes the latter a purely
> mechanical change which lets the reader/reviewer focus on the reference
> count rules and not be distracted by the waiter count changes.

Okay, moving this into its own patch.

> > -	/* futex_queue and wait for wakeup, timeout, or a signal. */
> > +	/* futex_queue_put and wait for wakeup, timeout, or a signal. */
> 
> When you fix up these comments, can you please use the fn() notation?

sure

> Thanks,
> 
>         tglx

Sebastian


end of thread, other threads:[~2024-12-17 17:07 UTC | newest]

Thread overview: 19+ messages
2024-12-15 23:00 [PATCH v5 0/14] futex: Add support task local hash maps Sebastian Andrzej Siewior
2024-12-15 23:00 ` [PATCH v5 01/14] futex: Create helper function to initialize a hash slot Sebastian Andrzej Siewior
2024-12-15 23:00 ` [PATCH v5 02/14] futex: Add basic infrastructure for local task local hash Sebastian Andrzej Siewior
2024-12-15 23:00 ` [PATCH v5 03/14] futex: Allow automatic allocation of process wide futex hash Sebastian Andrzej Siewior
2024-12-15 23:00 ` [PATCH v5 04/14] futex: Hash only the address for private futexes Sebastian Andrzej Siewior
2024-12-15 23:00 ` [PATCH v5 05/14] futex: Move private hashing into its own function Sebastian Andrzej Siewior
2024-12-15 23:00 ` [PATCH v5 06/14] futex: Add helper which include the put of a hb after end of operation Sebastian Andrzej Siewior
2024-12-16 18:48   ` Thomas Gleixner
2024-12-17 17:07     ` Sebastian Andrzej Siewior
2024-12-15 23:00 ` [PATCH v5 07/14] futex: Move the retry_private label Sebastian Andrzej Siewior
2024-12-16 20:41   ` Thomas Gleixner
2024-12-15 23:00 ` [PATCH v5 08/14] futex: Introduce futex_get_locked_hb() Sebastian Andrzej Siewior
2024-12-15 23:00 ` [PATCH v5 09/14] futex: Allow to re-allocate the private hash bucket Sebastian Andrzej Siewior
2024-12-17 14:58   ` Sebastian Andrzej Siewior
2024-12-15 23:00 ` [PATCH v5 10/14] futex: Resize futex hash table based on number of threads Sebastian Andrzej Siewior
2024-12-15 23:00 ` [PATCH v5 11/14] futex: Use a hashmask instead of hashsize Sebastian Andrzej Siewior
2024-12-15 23:00 ` [PATCH v5 12/14] tools/perf: Add the prctl(PR_FUTEX_HASH,…) to futex-hash Sebastian Andrzej Siewior
2024-12-15 23:00 ` [PATCH v5 13/14] tools/perf: Use the current affinity for CPU pinning in futex-hash Sebastian Andrzej Siewior
2024-12-15 23:00 ` [PATCH v5 14/14] tools/perf: Allocate futex locks on the local CPU-node Sebastian Andrzej Siewior
