All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v5 0/5] reduce tasklist_lock hold time on exit and do some pid cleanup
@ 2025-02-05 20:09 Mateusz Guzik
  2025-02-05 20:09 ` [PATCH v5 1/5] exit: perform add_device_randomness() without tasklist_lock Mateusz Guzik
                   ` (5 more replies)
  0 siblings, 6 replies; 12+ messages in thread
From: Mateusz Guzik @ 2025-02-05 20:09 UTC (permalink / raw)
  To: ebiederm, oleg
  Cc: brauner, akpm, Liam.Howlett, linux-mm, linux-kernel,
	Mateusz Guzik

The clone side contends against exit side in a way which avoidably
exacerbates the problem by the latter waiting on locks held by the
former while holding the tasklist_lock.

Whacking this for both add_device_randomness and pids allocation gives
me a 15% speed up for thread creation/destruction in a 24-core vm.

The random patch is worth about 4%.

nothing blew up with lockdep, lightly tested so far

Bench (plop into will-it-scale):
$ cat tests/threadspawn1.c

char *testcase_description = "Thread creation and teardown";

static void *worker(void *arg)
{
	return (NULL);
}

void testcase(unsigned long long *iterations, unsigned long nr)
{
	pthread_t thread;
	int error;

	while (1) {
		error = pthread_create(&thread, NULL, worker, NULL);
		assert(error == 0);
		error = pthread_join(thread, NULL);
		assert(error == 0);
		(*iterations)++;
	}
}

v5:
- whack scripts/selinux/genheaders/genheaders which accidentally got in
- rebased on next-20250205

v4:
- justify moving get_pid in the commit message with a one-liner
- drop the tty unref patch -- it is completely optional and Oleg has his
  own variant
- add the ACK by Oleg

v3:
- keep procfs flush where it was, instead hoist get_pid outside of the
  lock
- make detach_pid et al accept an array argument of pids to populate
- sprinkle asserts
- drop irq trips around pidmap_lock
- move tty unref outside of tasklist_lock

Mateusz Guzik (5):
  exit: perform add_device_randomness() without tasklist_lock
  exit: hoist get_pid() in release_task() outside of tasklist_lock
  pid: sprinkle tasklist_lock asserts
  pid: perform free_pid() calls outside of tasklist_lock
  pid: drop irq disablement around pidmap_lock

 include/linux/pid.h |  7 ++--
 kernel/exit.c       | 36 +++++++++++++-------
 kernel/pid.c        | 82 +++++++++++++++++++++++++--------------------
 kernel/sys.c        | 14 +++++---
 4 files changed, 82 insertions(+), 57 deletions(-)

-- 
2.43.0



^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v5 1/5] exit: perform add_device_randomness() without tasklist_lock
  2025-02-05 20:09 [PATCH v5 0/5] reduce tasklist_lock hold time on exit and do some pid cleanup Mateusz Guzik
@ 2025-02-05 20:09 ` Mateusz Guzik
  2025-02-05 20:56   ` Oleg Nesterov
  2025-02-05 20:09 ` [PATCH v5 2/5] exit: hoist get_pid() in release_task() outside of tasklist_lock Mateusz Guzik
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 12+ messages in thread
From: Mateusz Guzik @ 2025-02-05 20:09 UTC (permalink / raw)
  To: ebiederm, oleg
  Cc: brauner, akpm, Liam.Howlett, linux-mm, linux-kernel,
	Mateusz Guzik

Parallel calls to add_device_randomness() contend on their own.

The clone side aleady runs outside of tasklist_lock, which in turn means
any caller on the exit side extends the tasklist_lock hold time while
contending on the random-private lock.

Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
---
 kernel/exit.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/exit.c b/kernel/exit.c
index 3485e5fc499e..c79b41509cd3 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -174,9 +174,6 @@ static void __exit_signal(struct task_struct *tsk)
 			sig->curr_target = next_thread(tsk);
 	}
 
-	add_device_randomness((const void*) &tsk->se.sum_exec_runtime,
-			      sizeof(unsigned long long));
-
 	/*
 	 * Accumulate here the counters for all threads as they die. We could
 	 * skip the group leader because it is the last user of signal_struct,
@@ -278,6 +275,9 @@ void release_task(struct task_struct *p)
 	write_unlock_irq(&tasklist_lock);
 	proc_flush_pid(thread_pid);
 	put_pid(thread_pid);
+	add_device_randomness(&p->se.sum_exec_runtime,
+			      sizeof(p->se.sum_exec_runtime));
+	free_pids(post.pids);
 	release_thread(p);
 	put_task_struct_rcu_user(p);
 
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v5 2/5] exit: hoist get_pid() in release_task() outside of tasklist_lock
  2025-02-05 20:09 [PATCH v5 0/5] reduce tasklist_lock hold time on exit and do some pid cleanup Mateusz Guzik
  2025-02-05 20:09 ` [PATCH v5 1/5] exit: perform add_device_randomness() without tasklist_lock Mateusz Guzik
@ 2025-02-05 20:09 ` Mateusz Guzik
  2025-02-05 20:09 ` [PATCH v5 3/5] pid: sprinkle tasklist_lock asserts Mateusz Guzik
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 12+ messages in thread
From: Mateusz Guzik @ 2025-02-05 20:09 UTC (permalink / raw)
  To: ebiederm, oleg
  Cc: brauner, akpm, Liam.Howlett, linux-mm, linux-kernel,
	Mateusz Guzik

reduces hold time as get_pid() contains an atomic

Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
---
 kernel/exit.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/exit.c b/kernel/exit.c
index c79b41509cd3..b5c0cbc6bdfb 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -248,9 +248,10 @@ void release_task(struct task_struct *p)
 
 	cgroup_release(p);
 
+	thread_pid = get_pid(p->thread_pid);
+
 	write_lock_irq(&tasklist_lock);
 	ptrace_release_task(p);
-	thread_pid = get_pid(p->thread_pid);
 	__exit_signal(p);
 
 	/*
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v5 3/5] pid: sprinkle tasklist_lock asserts
  2025-02-05 20:09 [PATCH v5 0/5] reduce tasklist_lock hold time on exit and do some pid cleanup Mateusz Guzik
  2025-02-05 20:09 ` [PATCH v5 1/5] exit: perform add_device_randomness() without tasklist_lock Mateusz Guzik
  2025-02-05 20:09 ` [PATCH v5 2/5] exit: hoist get_pid() in release_task() outside of tasklist_lock Mateusz Guzik
@ 2025-02-05 20:09 ` Mateusz Guzik
  2025-02-05 20:09 ` [PATCH v5 4/5] pid: perform free_pid() calls outside of tasklist_lock Mateusz Guzik
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 12+ messages in thread
From: Mateusz Guzik @ 2025-02-05 20:09 UTC (permalink / raw)
  To: ebiederm, oleg
  Cc: brauner, akpm, Liam.Howlett, linux-mm, linux-kernel,
	Mateusz Guzik

Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
---
 kernel/pid.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/kernel/pid.c b/kernel/pid.c
index 924084713be8..2ae872f689a7 100644
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -339,17 +339,23 @@ static struct pid **task_pid_ptr(struct task_struct *task, enum pid_type type)
  */
 void attach_pid(struct task_struct *task, enum pid_type type)
 {
-	struct pid *pid = *task_pid_ptr(task, type);
+	struct pid *pid;
+
+	lockdep_assert_held_write(&tasklist_lock);
+
+	pid = *task_pid_ptr(task, type);
 	hlist_add_head_rcu(&task->pid_links[type], &pid->tasks[type]);
 }
 
 static void __change_pid(struct task_struct *task, enum pid_type type,
 			struct pid *new)
 {
-	struct pid **pid_ptr = task_pid_ptr(task, type);
-	struct pid *pid;
+	struct pid **pid_ptr, *pid;
 	int tmp;
 
+	lockdep_assert_held_write(&tasklist_lock);
+
+	pid_ptr = task_pid_ptr(task, type);
 	pid = *pid_ptr;
 
 	hlist_del_rcu(&task->pid_links[type]);
@@ -386,6 +392,8 @@ void exchange_tids(struct task_struct *left, struct task_struct *right)
 	struct hlist_head *head1 = &pid1->tasks[PIDTYPE_PID];
 	struct hlist_head *head2 = &pid2->tasks[PIDTYPE_PID];
 
+	lockdep_assert_held_write(&tasklist_lock);
+
 	/* Swap the single entry tid lists */
 	hlists_swap_heads_rcu(head1, head2);
 
@@ -403,6 +411,7 @@ void transfer_pid(struct task_struct *old, struct task_struct *new,
 			   enum pid_type type)
 {
 	WARN_ON_ONCE(type == PIDTYPE_PID);
+	lockdep_assert_held_write(&tasklist_lock);
 	hlist_replace_rcu(&old->pid_links[type], &new->pid_links[type]);
 }
 
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v5 4/5] pid: perform free_pid() calls outside of tasklist_lock
  2025-02-05 20:09 [PATCH v5 0/5] reduce tasklist_lock hold time on exit and do some pid cleanup Mateusz Guzik
                   ` (2 preceding siblings ...)
  2025-02-05 20:09 ` [PATCH v5 3/5] pid: sprinkle tasklist_lock asserts Mateusz Guzik
@ 2025-02-05 20:09 ` Mateusz Guzik
  2025-02-07 18:31   ` Mark Brown
  2025-02-05 20:09 ` [PATCH v5 5/5] pid: drop irq disablement around pidmap_lock Mateusz Guzik
  2025-02-05 20:29 ` [PATCH v5 0/5] reduce tasklist_lock hold time on exit and do some pid cleanup Oleg Nesterov
  5 siblings, 1 reply; 12+ messages in thread
From: Mateusz Guzik @ 2025-02-05 20:09 UTC (permalink / raw)
  To: ebiederm, oleg
  Cc: brauner, akpm, Liam.Howlett, linux-mm, linux-kernel,
	Mateusz Guzik

As the clone side already executes pid allocation with only pidmap_lock
held, issuing free_pid() while still holding tasklist_lock exacerbates
total hold time of the latter.

The pid array is smuggled through newly added release_task_post struct
so that any extra things want to get moved out have means to do it.

Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
---
 include/linux/pid.h |  7 ++++---
 kernel/exit.c       | 27 +++++++++++++++++++--------
 kernel/pid.c        | 44 ++++++++++++++++++++++----------------------
 kernel/sys.c        | 14 +++++++++-----
 4 files changed, 54 insertions(+), 38 deletions(-)

diff --git a/include/linux/pid.h b/include/linux/pid.h
index 98837a1ff0f3..311ecebd7d56 100644
--- a/include/linux/pid.h
+++ b/include/linux/pid.h
@@ -101,9 +101,9 @@ extern struct pid *get_task_pid(struct task_struct *task, enum pid_type type);
  * these helpers must be called with the tasklist_lock write-held.
  */
 extern void attach_pid(struct task_struct *task, enum pid_type);
-extern void detach_pid(struct task_struct *task, enum pid_type);
-extern void change_pid(struct task_struct *task, enum pid_type,
-			struct pid *pid);
+void detach_pid(struct pid **pids, struct task_struct *task, enum pid_type);
+void change_pid(struct pid **pids, struct task_struct *task, enum pid_type,
+		struct pid *pid);
 extern void exchange_tids(struct task_struct *task, struct task_struct *old);
 extern void transfer_pid(struct task_struct *old, struct task_struct *new,
 			 enum pid_type);
@@ -129,6 +129,7 @@ extern struct pid *find_ge_pid(int nr, struct pid_namespace *);
 extern struct pid *alloc_pid(struct pid_namespace *ns, pid_t *set_tid,
 			     size_t set_tid_size);
 extern void free_pid(struct pid *pid);
+void free_pids(struct pid **pids);
 extern void disable_pid_allocation(struct pid_namespace *ns);
 
 /*
diff --git a/kernel/exit.c b/kernel/exit.c
index b5c0cbc6bdfb..0d6df671c8a8 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -122,14 +122,22 @@ static __init int kernel_exit_sysfs_init(void)
 late_initcall(kernel_exit_sysfs_init);
 #endif
 
-static void __unhash_process(struct task_struct *p, bool group_dead)
+/*
+ * For things release_task() would like to do *after* tasklist_lock is released.
+ */
+struct release_task_post {
+	struct pid *pids[PIDTYPE_MAX];
+};
+
+static void __unhash_process(struct release_task_post *post, struct task_struct *p,
+			     bool group_dead)
 {
 	nr_threads--;
-	detach_pid(p, PIDTYPE_PID);
+	detach_pid(post->pids, p, PIDTYPE_PID);
 	if (group_dead) {
-		detach_pid(p, PIDTYPE_TGID);
-		detach_pid(p, PIDTYPE_PGID);
-		detach_pid(p, PIDTYPE_SID);
+		detach_pid(post->pids, p, PIDTYPE_TGID);
+		detach_pid(post->pids, p, PIDTYPE_PGID);
+		detach_pid(post->pids, p, PIDTYPE_SID);
 
 		list_del_rcu(&p->tasks);
 		list_del_init(&p->sibling);
@@ -141,7 +149,7 @@ static void __unhash_process(struct task_struct *p, bool group_dead)
 /*
  * This function expects the tasklist_lock write-locked.
  */
-static void __exit_signal(struct task_struct *tsk)
+static void __exit_signal(struct release_task_post *post, struct task_struct *tsk)
 {
 	struct signal_struct *sig = tsk->signal;
 	bool group_dead = thread_group_leader(tsk);
@@ -194,7 +202,7 @@ static void __exit_signal(struct task_struct *tsk)
 	task_io_accounting_add(&sig->ioac, &tsk->ioac);
 	sig->sum_sched_runtime += tsk->se.sum_exec_runtime;
 	sig->nr_threads--;
-	__unhash_process(tsk, group_dead);
+	__unhash_process(post, tsk, group_dead);
 	write_sequnlock(&sig->stats_lock);
 
 	/*
@@ -236,10 +244,13 @@ void __weak release_thread(struct task_struct *dead_task)
 
 void release_task(struct task_struct *p)
 {
+	struct release_task_post post;
 	struct task_struct *leader;
 	struct pid *thread_pid;
 	int zap_leader;
 repeat:
+	memset(&post, 0, sizeof(post));
+
 	/* don't need to get the RCU readlock here - the process is dead and
 	 * can't be modifying its own credentials. But shut RCU-lockdep up */
 	rcu_read_lock();
@@ -252,7 +263,7 @@ void release_task(struct task_struct *p)
 
 	write_lock_irq(&tasklist_lock);
 	ptrace_release_task(p);
-	__exit_signal(p);
+	__exit_signal(&post, p);
 
 	/*
 	 * If we are the last non-leader member of the thread
diff --git a/kernel/pid.c b/kernel/pid.c
index 2ae872f689a7..73625f28c166 100644
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -88,20 +88,6 @@ struct pid_namespace init_pid_ns = {
 };
 EXPORT_SYMBOL_GPL(init_pid_ns);
 
-/*
- * Note: disable interrupts while the pidmap_lock is held as an
- * interrupt might come in and do read_lock(&tasklist_lock).
- *
- * If we don't disable interrupts there is a nasty deadlock between
- * detach_pid()->free_pid() and another cpu that does
- * spin_lock(&pidmap_lock) followed by an interrupt routine that does
- * read_lock(&tasklist_lock);
- *
- * After we clean up the tasklist_lock and know there are no
- * irq handlers that take it we can leave the interrupts enabled.
- * For now it is easier to be safe than to prove it can't happen.
- */
-
 static  __cacheline_aligned_in_smp DEFINE_SPINLOCK(pidmap_lock);
 seqcount_spinlock_t pidmap_lock_seq = SEQCNT_SPINLOCK_ZERO(pidmap_lock_seq, &pidmap_lock);
 
@@ -128,10 +114,11 @@ static void delayed_put_pid(struct rcu_head *rhp)
 
 void free_pid(struct pid *pid)
 {
-	/* We can be called with write_lock_irq(&tasklist_lock) held */
 	int i;
 	unsigned long flags;
 
+	lockdep_assert_not_held(&tasklist_lock);
+
 	spin_lock_irqsave(&pidmap_lock, flags);
 	for (i = 0; i <= pid->level; i++) {
 		struct upid *upid = pid->numbers + i;
@@ -160,6 +147,18 @@ void free_pid(struct pid *pid)
 	call_rcu(&pid->rcu, delayed_put_pid);
 }
 
+void free_pids(struct pid **pids)
+{
+	int tmp;
+
+	/*
+	 * This can batch pidmap_lock.
+	 */
+	for (tmp = PIDTYPE_MAX; --tmp >= 0; )
+		if (pids[tmp])
+			free_pid(pids[tmp]);
+}
+
 struct pid *alloc_pid(struct pid_namespace *ns, pid_t *set_tid,
 		      size_t set_tid_size)
 {
@@ -347,8 +346,8 @@ void attach_pid(struct task_struct *task, enum pid_type type)
 	hlist_add_head_rcu(&task->pid_links[type], &pid->tasks[type]);
 }
 
-static void __change_pid(struct task_struct *task, enum pid_type type,
-			struct pid *new)
+static void __change_pid(struct pid **pids, struct task_struct *task,
+			 enum pid_type type, struct pid *new)
 {
 	struct pid **pid_ptr, *pid;
 	int tmp;
@@ -370,18 +369,19 @@ static void __change_pid(struct task_struct *task, enum pid_type type,
 		if (pid_has_task(pid, tmp))
 			return;
 
-	free_pid(pid);
+	WARN_ON(pids[type]);
+	pids[type] = pid;
 }
 
-void detach_pid(struct task_struct *task, enum pid_type type)
+void detach_pid(struct pid **pids, struct task_struct *task, enum pid_type type)
 {
-	__change_pid(task, type, NULL);
+	__change_pid(pids, task, type, NULL);
 }
 
-void change_pid(struct task_struct *task, enum pid_type type,
+void change_pid(struct pid **pids, struct task_struct *task, enum pid_type type,
 		struct pid *pid)
 {
-	__change_pid(task, type, pid);
+	__change_pid(pids, task, type, pid);
 	attach_pid(task, type);
 }
 
diff --git a/kernel/sys.c b/kernel/sys.c
index cb366ff8703a..4efca8a97d62 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -1085,6 +1085,7 @@ SYSCALL_DEFINE2(setpgid, pid_t, pid, pid_t, pgid)
 {
 	struct task_struct *p;
 	struct task_struct *group_leader = current->group_leader;
+	struct pid *pids[PIDTYPE_MAX] = { 0 };
 	struct pid *pgrp;
 	int err;
 
@@ -1142,13 +1143,14 @@ SYSCALL_DEFINE2(setpgid, pid_t, pid, pid_t, pgid)
 		goto out;
 
 	if (task_pgrp(p) != pgrp)
-		change_pid(p, PIDTYPE_PGID, pgrp);
+		change_pid(pids, p, PIDTYPE_PGID, pgrp);
 
 	err = 0;
 out:
 	/* All paths lead to here, thus we are safe. -DaveM */
 	write_unlock_irq(&tasklist_lock);
 	rcu_read_unlock();
+	free_pids(pids);
 	return err;
 }
 
@@ -1222,21 +1224,22 @@ SYSCALL_DEFINE1(getsid, pid_t, pid)
 	return retval;
 }
 
-static void set_special_pids(struct pid *pid)
+static void set_special_pids(struct pid **pids, struct pid *pid)
 {
 	struct task_struct *curr = current->group_leader;
 
 	if (task_session(curr) != pid)
-		change_pid(curr, PIDTYPE_SID, pid);
+		change_pid(pids, curr, PIDTYPE_SID, pid);
 
 	if (task_pgrp(curr) != pid)
-		change_pid(curr, PIDTYPE_PGID, pid);
+		change_pid(pids, curr, PIDTYPE_PGID, pid);
 }
 
 int ksys_setsid(void)
 {
 	struct task_struct *group_leader = current->group_leader;
 	struct pid *sid = task_pid(group_leader);
+	struct pid *pids[PIDTYPE_MAX] = { 0 };
 	pid_t session = pid_vnr(sid);
 	int err = -EPERM;
 
@@ -1252,13 +1255,14 @@ int ksys_setsid(void)
 		goto out;
 
 	group_leader->signal->leader = 1;
-	set_special_pids(sid);
+	set_special_pids(pids, sid);
 
 	proc_clear_tty(group_leader);
 
 	err = session;
 out:
 	write_unlock_irq(&tasklist_lock);
+	free_pids(pids);
 	if (err > 0) {
 		proc_sid_connector(group_leader);
 		sched_autogroup_create_attach(group_leader);
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v5 5/5] pid: drop irq disablement around pidmap_lock
  2025-02-05 20:09 [PATCH v5 0/5] reduce tasklist_lock hold time on exit and do some pid cleanup Mateusz Guzik
                   ` (3 preceding siblings ...)
  2025-02-05 20:09 ` [PATCH v5 4/5] pid: perform free_pid() calls outside of tasklist_lock Mateusz Guzik
@ 2025-02-05 20:09 ` Mateusz Guzik
  2025-02-05 20:29 ` [PATCH v5 0/5] reduce tasklist_lock hold time on exit and do some pid cleanup Oleg Nesterov
  5 siblings, 0 replies; 12+ messages in thread
From: Mateusz Guzik @ 2025-02-05 20:09 UTC (permalink / raw)
  To: ebiederm, oleg
  Cc: brauner, akpm, Liam.Howlett, linux-mm, linux-kernel,
	Mateusz Guzik

It no longer serves any purpose now that the tasklist_lock ->
pidmap_lock ordering got eliminated.

Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
---
 kernel/pid.c | 23 +++++++++++------------
 1 file changed, 11 insertions(+), 12 deletions(-)

diff --git a/kernel/pid.c b/kernel/pid.c
index 73625f28c166..900193de4232 100644
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -115,11 +115,10 @@ static void delayed_put_pid(struct rcu_head *rhp)
 void free_pid(struct pid *pid)
 {
 	int i;
-	unsigned long flags;
 
 	lockdep_assert_not_held(&tasklist_lock);
 
-	spin_lock_irqsave(&pidmap_lock, flags);
+	spin_lock(&pidmap_lock);
 	for (i = 0; i <= pid->level; i++) {
 		struct upid *upid = pid->numbers + i;
 		struct pid_namespace *ns = upid->ns;
@@ -142,7 +141,7 @@ void free_pid(struct pid *pid)
 		idr_remove(&ns->idr, upid->nr);
 	}
 	pidfs_remove_pid(pid);
-	spin_unlock_irqrestore(&pidmap_lock, flags);
+	spin_unlock(&pidmap_lock);
 
 	call_rcu(&pid->rcu, delayed_put_pid);
 }
@@ -210,7 +209,7 @@ struct pid *alloc_pid(struct pid_namespace *ns, pid_t *set_tid,
 		}
 
 		idr_preload(GFP_KERNEL);
-		spin_lock_irq(&pidmap_lock);
+		spin_lock(&pidmap_lock);
 
 		if (tid) {
 			nr = idr_alloc(&tmp->idr, NULL, tid,
@@ -237,7 +236,7 @@ struct pid *alloc_pid(struct pid_namespace *ns, pid_t *set_tid,
 			nr = idr_alloc_cyclic(&tmp->idr, NULL, pid_min,
 					      pid_max, GFP_ATOMIC);
 		}
-		spin_unlock_irq(&pidmap_lock);
+		spin_unlock(&pidmap_lock);
 		idr_preload_end();
 
 		if (nr < 0) {
@@ -271,7 +270,7 @@ struct pid *alloc_pid(struct pid_namespace *ns, pid_t *set_tid,
 
 	upid = pid->numbers + ns->level;
 	idr_preload(GFP_KERNEL);
-	spin_lock_irq(&pidmap_lock);
+	spin_lock(&pidmap_lock);
 	if (!(ns->pid_allocated & PIDNS_ADDING))
 		goto out_unlock;
 	pidfs_add_pid(pid);
@@ -280,18 +279,18 @@ struct pid *alloc_pid(struct pid_namespace *ns, pid_t *set_tid,
 		idr_replace(&upid->ns->idr, pid, upid->nr);
 		upid->ns->pid_allocated++;
 	}
-	spin_unlock_irq(&pidmap_lock);
+	spin_unlock(&pidmap_lock);
 	idr_preload_end();
 
 	return pid;
 
 out_unlock:
-	spin_unlock_irq(&pidmap_lock);
+	spin_unlock(&pidmap_lock);
 	idr_preload_end();
 	put_pid_ns(ns);
 
 out_free:
-	spin_lock_irq(&pidmap_lock);
+	spin_lock(&pidmap_lock);
 	while (++i <= ns->level) {
 		upid = pid->numbers + i;
 		idr_remove(&upid->ns->idr, upid->nr);
@@ -301,7 +300,7 @@ struct pid *alloc_pid(struct pid_namespace *ns, pid_t *set_tid,
 	if (ns->pid_allocated == PIDNS_ADDING)
 		idr_set_cursor(&ns->idr, 0);
 
-	spin_unlock_irq(&pidmap_lock);
+	spin_unlock(&pidmap_lock);
 
 	kmem_cache_free(ns->pid_cachep, pid);
 	return ERR_PTR(retval);
@@ -309,9 +308,9 @@ struct pid *alloc_pid(struct pid_namespace *ns, pid_t *set_tid,
 
 void disable_pid_allocation(struct pid_namespace *ns)
 {
-	spin_lock_irq(&pidmap_lock);
+	spin_lock(&pidmap_lock);
 	ns->pid_allocated &= ~PIDNS_ADDING;
-	spin_unlock_irq(&pidmap_lock);
+	spin_unlock(&pidmap_lock);
 }
 
 struct pid *find_pid_ns(int nr, struct pid_namespace *ns)
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH v5 0/5] reduce tasklist_lock hold time on exit and do some pid cleanup
  2025-02-05 20:09 [PATCH v5 0/5] reduce tasklist_lock hold time on exit and do some pid cleanup Mateusz Guzik
                   ` (4 preceding siblings ...)
  2025-02-05 20:09 ` [PATCH v5 5/5] pid: drop irq disablement around pidmap_lock Mateusz Guzik
@ 2025-02-05 20:29 ` Oleg Nesterov
  5 siblings, 0 replies; 12+ messages in thread
From: Oleg Nesterov @ 2025-02-05 20:29 UTC (permalink / raw)
  To: Mateusz Guzik
  Cc: ebiederm, brauner, akpm, Liam.Howlett, linux-mm, linux-kernel

On 02/05, Mateusz Guzik wrote:
>
> Whacking this for both add_device_randomness and pids allocation gives
> me a 15% speed up for thread creation/destruction in a 24-core vm.

Nice. The whole series look good to me.

Oleg.



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v5 1/5] exit: perform add_device_randomness() without tasklist_lock
  2025-02-05 20:09 ` [PATCH v5 1/5] exit: perform add_device_randomness() without tasklist_lock Mateusz Guzik
@ 2025-02-05 20:56   ` Oleg Nesterov
  2025-02-06 10:40     ` Christian Brauner
  0 siblings, 1 reply; 12+ messages in thread
From: Oleg Nesterov @ 2025-02-05 20:56 UTC (permalink / raw)
  To: Mateusz Guzik
  Cc: ebiederm, brauner, akpm, Liam.Howlett, linux-mm, linux-kernel

On 02/05, Mateusz Guzik wrote:
>
> Parallel calls to add_device_randomness() contend on their own.
...
> +	add_device_randomness(&p->se.sum_exec_runtime,
> +			      sizeof(p->se.sum_exec_runtime));

OK, but

> +	free_pids(post.pids);

wait... free_pids() comes later in 4/5 ?

Oleg.



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v5 1/5] exit: perform add_device_randomness() without tasklist_lock
  2025-02-05 20:56   ` Oleg Nesterov
@ 2025-02-06 10:40     ` Christian Brauner
  0 siblings, 0 replies; 12+ messages in thread
From: Christian Brauner @ 2025-02-06 10:40 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Mateusz Guzik, ebiederm, akpm, Liam.Howlett, linux-mm,
	linux-kernel

On Wed, Feb 05, 2025 at 09:56:56PM +0100, Oleg Nesterov wrote:
> On 02/05, Mateusz Guzik wrote:
> >
> > Parallel calls to add_device_randomness() contend on their own.
> ...
> > +	add_device_randomness(&p->se.sum_exec_runtime,
> > +			      sizeof(p->se.sum_exec_runtime));
> 
> OK, but
> 
> > +	free_pids(post.pids);
> 
> wait... free_pids() comes later in 4/5 ?

Yes, that seems to be an accidental leftover. A

git rebase -S -i v6.14-rc1 -x "make -j7 O=build.v1/ kernel/"

caught this right away. Removed.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v5 4/5] pid: perform free_pid() calls outside of tasklist_lock
  2025-02-05 20:09 ` [PATCH v5 4/5] pid: perform free_pid() calls outside of tasklist_lock Mateusz Guzik
@ 2025-02-07 18:31   ` Mark Brown
  2025-02-07 19:45     ` Mateusz Guzik
  2025-02-07 19:51     ` Mateusz Guzik
  0 siblings, 2 replies; 12+ messages in thread
From: Mark Brown @ 2025-02-07 18:31 UTC (permalink / raw)
  To: Mateusz Guzik
  Cc: ebiederm, oleg, brauner, akpm, Liam.Howlett, linux-mm,
	linux-kernel, Aishwarya.TCV

[-- Attachment #1: Type: text/plain, Size: 6137 bytes --]

On Wed, Feb 05, 2025 at 09:09:28PM +0100, Mateusz Guzik wrote:
> As the clone side already executes pid allocation with only pidmap_lock
> held, issuing free_pid() while still holding tasklist_lock exacerbates
> total hold time of the latter.
> 
> The pid array is smuggled through newly added release_task_post struct
> so that any extra things want to get moved out have means to do it.

We are seeing failures in -next with the clone3 clone3_set_tid selftest
on at least arm and arm64 systems which have been bisected to this
commit, in -next as 88dec855ce117f52bcf88748ddcbb25b10f4f2fc (same
bisect for both):

For arm64 we're seeing a bunch of new failures including:

# # [389] Trying clone3() with CLONE_SET_TID to 391 and 0x0
# # File exists - Failed to create new process
# # [389] clone3() with CLONE_SET_TID 391 says: -17 - expected 0
# not ok 19 reallocate child TID with 1 TIDs and flags 0x0

# # [389] Trying clone3() with CLONE_SET_TID to 1 and 0x20000000
# # File exists - Failed to create new process
# # [389] clone3() with CLONE_SET_TID 1 says: -17 - expected 0
# not ok 21 create PID 1 in new NS with 2 TIDs and flags 0x20000000

# # [1] Trying clone3() with CLONE_SET_TID to 43 and 0x0
# # File exists - Failed to create new process
# # [1] clone3() with CLONE_SET_TID 43 says: -17 - expected 0
# not ok 24 check leak on invalid specific TID with 2 TIDs and flags 0x0

but there's more, 32 bit only seems to see one new failure (at least on
Beaglebone Black which is where I have that test running).

Full log for a run on arm64:

   https://lava.sirena.org.uk/scheduler/job/1102114

and arm:

   https://lava.sirena.org.uk/scheduler/job/1102088

Bisect log, the start is my tooling feeding in test results it already
has to hand in the commit range to seed the bisect:

# bad: [ed58d103e6da15a442ff87567898768dc3a66987] Add linux-next specific files for 20250207
# good: [653cd79f296fc6fa592cb9fa2f7b8494d5573a43] Merge branch 'for-linux-next-fixes' of https://gitlab.freedesktop.org/drm/misc/kernel.git
# good: [4c7518062d638837cea915e0ffe30f846780639a] ASoC: SOF: ipc4: Add support for split firmware releases
# good: [0a7c85b516830c0bb088b0bdb2f2c50c76fc531a] regulator: ad5398: Fix incorrect power down bit mask
# good: [215705db51eb23052c73126d2efb6acbc2db0424] spi: Replace custom fsleep() implementation
# good: [6603c5133daadbb3277fbd93be0d0d5b8ec928e8] ASoC: dt-bindings: atmel,at91-ssc: Convert to YAML format
# good: [25fac20edd09b60651eabcc57c187b1277f43d08] spi: gpio: Support a single always-selected device
# good: [652ffad172d089acb1a20e5fde1b66e687832b06] spi: fsi: Batch TX operations
# good: [6eab7034579917f207ca6d8e3f4e11e85e0ab7d5] ASoC: soc-core: Stop using of_property_read_bool() for non-boolean properties
# good: [f5aab0438ef17f01c5ecd25e61ae6a03f82a4586] regulator: pca9450: Fix enable register for LDO5
# good: [c1ac98492d1584d31f335d233a5cd7a4d4116e5a] spi: realtek-rtl-snand: Drop unneeded assignment for cache_type
# good: [5a6a461079decea452fdcae955bccecf92e07e97] regulator: ad5398: Add device tree support
# good: [995cf0e014b0144edf1125668a97c252c5ab775e] regmap: Reorder 'struct regmap'
# good: [89785306453ce6d949e783f6936821a0b7649ee2] spi: zynqmp-gqspi: Always acknowledge interrupts
# good: [0e11f2076e7fa3efa3e0a694bc4d30ef185f0f7d] Merge branch 'for-5.19/cleanup' into for-next
git bisect start 'ed58d103e6da15a442ff87567898768dc3a66987' '653cd79f296fc6fa592cb9fa2f7b8494d5573a43' '4c7518062d638837cea915e0ffe30f846780639a' '0a7c85b516830c0bb088b0bdb2f2c50c76fc531a' '215705db51eb23052c73126d2efb6acbc2db0424' '6603c5133daadbb3277fbd93be0d0d5b8ec928e8' '25fac20edd09b60651eabcc57c187b1277f43d08' '652ffad172d089acb1a20e5fde1b66e687832b06' '6eab7034579917f207ca6d8e3f4e11e85e0ab7d5' 'f5aab0438ef17f01c5ecd25e61ae6a03f82a4586' 'c1ac98492d1584d31f335d233a5cd7a4d4116e5a' '5a6a461079decea452fdcae955bccecf92e07e97' '995cf0e014b0144edf1125668a97c252c5ab775e' '89785306453ce6d949e783f6936821a0b7649ee2' '0e11f2076e7fa3efa3e0a694bc4d30ef185f0f7d'
# bad: [ed58d103e6da15a442ff87567898768dc3a66987] Add linux-next specific files for 20250207
git bisect bad ed58d103e6da15a442ff87567898768dc3a66987
# bad: [5a44f71c6ba5b0a7623c25047ac61ed852afbd84] Merge branch 'for-linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/daeinki/drm-exynos.git
git bisect bad 5a44f71c6ba5b0a7623c25047ac61ed852afbd84
# bad: [953e43d94b2048d260774e9ddbfd3f378aa2c256] Merge branch 'fs-next' of linux-next
git bisect bad 953e43d94b2048d260774e9ddbfd3f378aa2c256
# good: [2ab7baf7ebddff9faff6846648fd0753e5ee58c1] Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux.git
git bisect good 2ab7baf7ebddff9faff6846648fd0753e5ee58c1
# good: [09c7fd0d4465df7fd6cda5668bd32267884b6cbf] Merge branch '9p-next' of git://github.com/martinetd/linux
git bisect good 09c7fd0d4465df7fd6cda5668bd32267884b6cbf
# bad: [7a1f00b09c09aadbf33660a31f3f3d8ee6302c45] Merge branch 'vfs-6.15.iomap' into vfs.all
git bisect bad 7a1f00b09c09aadbf33660a31f3f3d8ee6302c45
# good: [9bc19073026d0dce6b7d17d22e74643c80283f8f] Merge branch 'vfs-6.15.mount' into vfs.all
git bisect good 9bc19073026d0dce6b7d17d22e74643c80283f8f
# bad: [ba734a751d30f59e3b10d6792a1e1b50e7b1fc0c] Merge branch 'kernel-6.15.tasklist_lock' into vfs.all
git bisect bad ba734a751d30f59e3b10d6792a1e1b50e7b1fc0c
# good: [d7c340391cb0f0882bc8d5d9798ef32b577a89bc] Merge branch 'vfs-6.15.pipe' into vfs.all
git bisect good d7c340391cb0f0882bc8d5d9798ef32b577a89bc
# good: [e88fed94388f62a28acfef4954348abe79557d19] pid: sprinkle tasklist_lock asserts
git bisect good e88fed94388f62a28acfef4954348abe79557d19
# bad: [5ca27e0557d722dd648f949dde6e4997f6255f18] pid: drop irq disablement around pidmap_lock
git bisect bad 5ca27e0557d722dd648f949dde6e4997f6255f18
# bad: [88dec855ce117f52bcf88748ddcbb25b10f4f2fc] pid: perform free_pid() calls outside of tasklist_lock
git bisect bad 88dec855ce117f52bcf88748ddcbb25b10f4f2fc
# first bad commit: [88dec855ce117f52bcf88748ddcbb25b10f4f2fc] pid: perform free_pid() calls outside of tasklist_lock

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v5 4/5] pid: perform free_pid() calls outside of tasklist_lock
  2025-02-07 18:31   ` Mark Brown
@ 2025-02-07 19:45     ` Mateusz Guzik
  2025-02-07 19:51     ` Mateusz Guzik
  1 sibling, 0 replies; 12+ messages in thread
From: Mateusz Guzik @ 2025-02-07 19:45 UTC (permalink / raw)
  To: Mark Brown
  Cc: ebiederm, oleg, brauner, akpm, Liam.Howlett, linux-mm,
	linux-kernel, Aishwarya.TCV

thanks for the report, i'll prod it over the weekend

On Fri, Feb 7, 2025 at 7:31 PM Mark Brown <broonie@kernel.org> wrote:
>
> On Wed, Feb 05, 2025 at 09:09:28PM +0100, Mateusz Guzik wrote:
> > As the clone side already executes pid allocation with only pidmap_lock
> > held, issuing free_pid() while still holding tasklist_lock exacerbates
> > total hold time of the latter.
> >
> > The pid array is smuggled through newly added release_task_post struct
> > so that any extra things want to get moved out have means to do it.
>
> We are seeing failures in -next with the clone3 clone3_set_tid selftest
> on at least arm and arm64 systems which have been bisected to this
> commit, in -next as 88dec855ce117f52bcf88748ddcbb25b10f4f2fc (same
> bisect for both):
>
> For arm64 we're seeing a bunch of new failures including:
>
> # # [389] Trying clone3() with CLONE_SET_TID to 391 and 0x0
> # # File exists - Failed to create new process
> # # [389] clone3() with CLONE_SET_TID 391 says: -17 - expected 0
> # not ok 19 reallocate child TID with 1 TIDs and flags 0x0
>
> # # [389] Trying clone3() with CLONE_SET_TID to 1 and 0x20000000
> # # File exists - Failed to create new process
> # # [389] clone3() with CLONE_SET_TID 1 says: -17 - expected 0
> # not ok 21 create PID 1 in new NS with 2 TIDs and flags 0x20000000
>
> # # [1] Trying clone3() with CLONE_SET_TID to 43 and 0x0
> # # File exists - Failed to create new process
> # # [1] clone3() with CLONE_SET_TID 43 says: -17 - expected 0
> # not ok 24 check leak on invalid specific TID with 2 TIDs and flags 0x0
>
> but there's more, 32 bit only seems to see one new failure (at least on
> Beaglebone Black which is where I have that test running).
>
> Full log for a run on arm64:
>
>    https://lava.sirena.org.uk/scheduler/job/1102114
>
> and arm:
>
>    https://lava.sirena.org.uk/scheduler/job/1102088
>
> Bisect log, the start is my tooling feeding in test results it already
> has to hand in the commit range to seed the bisect:
>
> # bad: [ed58d103e6da15a442ff87567898768dc3a66987] Add linux-next specific files for 20250207
> # good: [653cd79f296fc6fa592cb9fa2f7b8494d5573a43] Merge branch 'for-linux-next-fixes' of https://gitlab.freedesktop.org/drm/misc/kernel.git
> # good: [4c7518062d638837cea915e0ffe30f846780639a] ASoC: SOF: ipc4: Add support for split firmware releases
> # good: [0a7c85b516830c0bb088b0bdb2f2c50c76fc531a] regulator: ad5398: Fix incorrect power down bit mask
> # good: [215705db51eb23052c73126d2efb6acbc2db0424] spi: Replace custom fsleep() implementation
> # good: [6603c5133daadbb3277fbd93be0d0d5b8ec928e8] ASoC: dt-bindings: atmel,at91-ssc: Convert to YAML format
> # good: [25fac20edd09b60651eabcc57c187b1277f43d08] spi: gpio: Support a single always-selected device
> # good: [652ffad172d089acb1a20e5fde1b66e687832b06] spi: fsi: Batch TX operations
> # good: [6eab7034579917f207ca6d8e3f4e11e85e0ab7d5] ASoC: soc-core: Stop using of_property_read_bool() for non-boolean properties
> # good: [f5aab0438ef17f01c5ecd25e61ae6a03f82a4586] regulator: pca9450: Fix enable register for LDO5
> # good: [c1ac98492d1584d31f335d233a5cd7a4d4116e5a] spi: realtek-rtl-snand: Drop unneeded assignment for cache_type
> # good: [5a6a461079decea452fdcae955bccecf92e07e97] regulator: ad5398: Add device tree support
> # good: [995cf0e014b0144edf1125668a97c252c5ab775e] regmap: Reorder 'struct regmap'
> # good: [89785306453ce6d949e783f6936821a0b7649ee2] spi: zynqmp-gqspi: Always acknowledge interrupts
> # good: [0e11f2076e7fa3efa3e0a694bc4d30ef185f0f7d] Merge branch 'for-5.19/cleanup' into for-next
> git bisect start 'ed58d103e6da15a442ff87567898768dc3a66987' '653cd79f296fc6fa592cb9fa2f7b8494d5573a43' '4c7518062d638837cea915e0ffe30f846780639a' '0a7c85b516830c0bb088b0bdb2f2c50c76fc531a' '215705db51eb23052c73126d2efb6acbc2db0424' '6603c5133daadbb3277fbd93be0d0d5b8ec928e8' '25fac20edd09b60651eabcc57c187b1277f43d08' '652ffad172d089acb1a20e5fde1b66e687832b06' '6eab7034579917f207ca6d8e3f4e11e85e0ab7d5' 'f5aab0438ef17f01c5ecd25e61ae6a03f82a4586' 'c1ac98492d1584d31f335d233a5cd7a4d4116e5a' '5a6a461079decea452fdcae955bccecf92e07e97' '995cf0e014b0144edf1125668a97c252c5ab775e' '89785306453ce6d949e783f6936821a0b7649ee2' '0e11f2076e7fa3efa3e0a694bc4d30ef185f0f7d'
> # bad: [ed58d103e6da15a442ff87567898768dc3a66987] Add linux-next specific files for 20250207
> git bisect bad ed58d103e6da15a442ff87567898768dc3a66987
> # bad: [5a44f71c6ba5b0a7623c25047ac61ed852afbd84] Merge branch 'for-linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/daeinki/drm-exynos.git
> git bisect bad 5a44f71c6ba5b0a7623c25047ac61ed852afbd84
> # bad: [953e43d94b2048d260774e9ddbfd3f378aa2c256] Merge branch 'fs-next' of linux-next
> git bisect bad 953e43d94b2048d260774e9ddbfd3f378aa2c256
> # good: [2ab7baf7ebddff9faff6846648fd0753e5ee58c1] Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux.git
> git bisect good 2ab7baf7ebddff9faff6846648fd0753e5ee58c1
> # good: [09c7fd0d4465df7fd6cda5668bd32267884b6cbf] Merge branch '9p-next' of git://github.com/martinetd/linux
> git bisect good 09c7fd0d4465df7fd6cda5668bd32267884b6cbf
> # bad: [7a1f00b09c09aadbf33660a31f3f3d8ee6302c45] Merge branch 'vfs-6.15.iomap' into vfs.all
> git bisect bad 7a1f00b09c09aadbf33660a31f3f3d8ee6302c45
> # good: [9bc19073026d0dce6b7d17d22e74643c80283f8f] Merge branch 'vfs-6.15.mount' into vfs.all
> git bisect good 9bc19073026d0dce6b7d17d22e74643c80283f8f
> # bad: [ba734a751d30f59e3b10d6792a1e1b50e7b1fc0c] Merge branch 'kernel-6.15.tasklist_lock' into vfs.all
> git bisect bad ba734a751d30f59e3b10d6792a1e1b50e7b1fc0c
> # good: [d7c340391cb0f0882bc8d5d9798ef32b577a89bc] Merge branch 'vfs-6.15.pipe' into vfs.all
> git bisect good d7c340391cb0f0882bc8d5d9798ef32b577a89bc
> # good: [e88fed94388f62a28acfef4954348abe79557d19] pid: sprinkle tasklist_lock asserts
> git bisect good e88fed94388f62a28acfef4954348abe79557d19
> # bad: [5ca27e0557d722dd648f949dde6e4997f6255f18] pid: drop irq disablement around pidmap_lock
> git bisect bad 5ca27e0557d722dd648f949dde6e4997f6255f18
> # bad: [88dec855ce117f52bcf88748ddcbb25b10f4f2fc] pid: perform free_pid() calls outside of tasklist_lock
> git bisect bad 88dec855ce117f52bcf88748ddcbb25b10f4f2fc
> # first bad commit: [88dec855ce117f52bcf88748ddcbb25b10f4f2fc] pid: perform free_pid() calls outside of tasklist_lock



-- 
Mateusz Guzik <mjguzik gmail.com>


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v5 4/5] pid: perform free_pid() calls outside of tasklist_lock
  2025-02-07 18:31   ` Mark Brown
  2025-02-07 19:45     ` Mateusz Guzik
@ 2025-02-07 19:51     ` Mateusz Guzik
  1 sibling, 0 replies; 12+ messages in thread
From: Mateusz Guzik @ 2025-02-07 19:51 UTC (permalink / raw)
  To: Mark Brown
  Cc: ebiederm, oleg, brauner, akpm, Liam.Howlett, linux-mm,
	linux-kernel, Aishwarya.TCV

... and found

next-20250207 is somehow missing a call to free_pids() in release_task().

There was some fat-fingering in v5 and maybe previous versions which
made the call land in the wrong commit, I presume there was further
crappery elsewhere which concluded in the call not being present to
begin with.

I just confirmed the version is is intended to land has the call in
the right place:
https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git/commit/?h=kernel-6.15.tasklist_lock&id=7903f907a226058ed99f86e9924e082aea57fc45

As in this chunk fell out in -next:
diff --git a/kernel/exit.c b/kernel/exit.c
index b4fc3cc96ea4..0d6df671c8a8 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -289,6 +289,7 @@ void release_task(struct task_struct *p)
        put_pid(thread_pid);
        add_device_randomness(&p->se.sum_exec_runtime,
                              sizeof(p->se.sum_exec_runtime));
+       free_pids(post.pids);
        release_thread(p);
        put_task_struct_rcu_user(p);

On Fri, Feb 7, 2025 at 7:31 PM Mark Brown <broonie@kernel.org> wrote:
>
> On Wed, Feb 05, 2025 at 09:09:28PM +0100, Mateusz Guzik wrote:
> > As the clone side already executes pid allocation with only pidmap_lock
> > held, issuing free_pid() while still holding tasklist_lock exacerbates
> > total hold time of the latter.
> >
> > The pid array is smuggled through newly added release_task_post struct
> > so that any extra things want to get moved out have means to do it.
>
> We are seeing failures in -next with the clone3 clone3_set_tid selftest
> on at least arm and arm64 systems which have been bisected to this
> commit, in -next as 88dec855ce117f52bcf88748ddcbb25b10f4f2fc (same
> bisect for both):
>
> For arm64 we're seeing a bunch of new failures including:
>
> # # [389] Trying clone3() with CLONE_SET_TID to 391 and 0x0
> # # File exists - Failed to create new process
> # # [389] clone3() with CLONE_SET_TID 391 says: -17 - expected 0
> # not ok 19 reallocate child TID with 1 TIDs and flags 0x0
>
> # # [389] Trying clone3() with CLONE_SET_TID to 1 and 0x20000000
> # # File exists - Failed to create new process
> # # [389] clone3() with CLONE_SET_TID 1 says: -17 - expected 0
> # not ok 21 create PID 1 in new NS with 2 TIDs and flags 0x20000000
>
> # # [1] Trying clone3() with CLONE_SET_TID to 43 and 0x0
> # # File exists - Failed to create new process
> # # [1] clone3() with CLONE_SET_TID 43 says: -17 - expected 0
> # not ok 24 check leak on invalid specific TID with 2 TIDs and flags 0x0
>
> but there's more, 32 bit only seems to see one new failure (at least on
> Beaglebone Black which is where I have that test running).
>
> Full log for a run on arm64:
>
>    https://lava.sirena.org.uk/scheduler/job/1102114
>
> and arm:
>
>    https://lava.sirena.org.uk/scheduler/job/1102088
>
> Bisect log, the start is my tooling feeding in test results it already
> has to hand in the commit range to seed the bisect:
>
> # bad: [ed58d103e6da15a442ff87567898768dc3a66987] Add linux-next specific files for 20250207
> # good: [653cd79f296fc6fa592cb9fa2f7b8494d5573a43] Merge branch 'for-linux-next-fixes' of https://gitlab.freedesktop.org/drm/misc/kernel.git
> # good: [4c7518062d638837cea915e0ffe30f846780639a] ASoC: SOF: ipc4: Add support for split firmware releases
> # good: [0a7c85b516830c0bb088b0bdb2f2c50c76fc531a] regulator: ad5398: Fix incorrect power down bit mask
> # good: [215705db51eb23052c73126d2efb6acbc2db0424] spi: Replace custom fsleep() implementation
> # good: [6603c5133daadbb3277fbd93be0d0d5b8ec928e8] ASoC: dt-bindings: atmel,at91-ssc: Convert to YAML format
> # good: [25fac20edd09b60651eabcc57c187b1277f43d08] spi: gpio: Support a single always-selected device
> # good: [652ffad172d089acb1a20e5fde1b66e687832b06] spi: fsi: Batch TX operations
> # good: [6eab7034579917f207ca6d8e3f4e11e85e0ab7d5] ASoC: soc-core: Stop using of_property_read_bool() for non-boolean properties
> # good: [f5aab0438ef17f01c5ecd25e61ae6a03f82a4586] regulator: pca9450: Fix enable register for LDO5
> # good: [c1ac98492d1584d31f335d233a5cd7a4d4116e5a] spi: realtek-rtl-snand: Drop unneeded assignment for cache_type
> # good: [5a6a461079decea452fdcae955bccecf92e07e97] regulator: ad5398: Add device tree support
> # good: [995cf0e014b0144edf1125668a97c252c5ab775e] regmap: Reorder 'struct regmap'
> # good: [89785306453ce6d949e783f6936821a0b7649ee2] spi: zynqmp-gqspi: Always acknowledge interrupts
> # good: [0e11f2076e7fa3efa3e0a694bc4d30ef185f0f7d] Merge branch 'for-5.19/cleanup' into for-next
> git bisect start 'ed58d103e6da15a442ff87567898768dc3a66987' '653cd79f296fc6fa592cb9fa2f7b8494d5573a43' '4c7518062d638837cea915e0ffe30f846780639a' '0a7c85b516830c0bb088b0bdb2f2c50c76fc531a' '215705db51eb23052c73126d2efb6acbc2db0424' '6603c5133daadbb3277fbd93be0d0d5b8ec928e8' '25fac20edd09b60651eabcc57c187b1277f43d08' '652ffad172d089acb1a20e5fde1b66e687832b06' '6eab7034579917f207ca6d8e3f4e11e85e0ab7d5' 'f5aab0438ef17f01c5ecd25e61ae6a03f82a4586' 'c1ac98492d1584d31f335d233a5cd7a4d4116e5a' '5a6a461079decea452fdcae955bccecf92e07e97' '995cf0e014b0144edf1125668a97c252c5ab775e' '89785306453ce6d949e783f6936821a0b7649ee2' '0e11f2076e7fa3efa3e0a694bc4d30ef185f0f7d'
> # bad: [ed58d103e6da15a442ff87567898768dc3a66987] Add linux-next specific files for 20250207
> git bisect bad ed58d103e6da15a442ff87567898768dc3a66987
> # bad: [5a44f71c6ba5b0a7623c25047ac61ed852afbd84] Merge branch 'for-linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/daeinki/drm-exynos.git
> git bisect bad 5a44f71c6ba5b0a7623c25047ac61ed852afbd84
> # bad: [953e43d94b2048d260774e9ddbfd3f378aa2c256] Merge branch 'fs-next' of linux-next
> git bisect bad 953e43d94b2048d260774e9ddbfd3f378aa2c256
> # good: [2ab7baf7ebddff9faff6846648fd0753e5ee58c1] Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux.git
> git bisect good 2ab7baf7ebddff9faff6846648fd0753e5ee58c1
> # good: [09c7fd0d4465df7fd6cda5668bd32267884b6cbf] Merge branch '9p-next' of git://github.com/martinetd/linux
> git bisect good 09c7fd0d4465df7fd6cda5668bd32267884b6cbf
> # bad: [7a1f00b09c09aadbf33660a31f3f3d8ee6302c45] Merge branch 'vfs-6.15.iomap' into vfs.all
> git bisect bad 7a1f00b09c09aadbf33660a31f3f3d8ee6302c45
> # good: [9bc19073026d0dce6b7d17d22e74643c80283f8f] Merge branch 'vfs-6.15.mount' into vfs.all
> git bisect good 9bc19073026d0dce6b7d17d22e74643c80283f8f
> # bad: [ba734a751d30f59e3b10d6792a1e1b50e7b1fc0c] Merge branch 'kernel-6.15.tasklist_lock' into vfs.all
> git bisect bad ba734a751d30f59e3b10d6792a1e1b50e7b1fc0c
> # good: [d7c340391cb0f0882bc8d5d9798ef32b577a89bc] Merge branch 'vfs-6.15.pipe' into vfs.all
> git bisect good d7c340391cb0f0882bc8d5d9798ef32b577a89bc
> # good: [e88fed94388f62a28acfef4954348abe79557d19] pid: sprinkle tasklist_lock asserts
> git bisect good e88fed94388f62a28acfef4954348abe79557d19
> # bad: [5ca27e0557d722dd648f949dde6e4997f6255f18] pid: drop irq disablement around pidmap_lock
> git bisect bad 5ca27e0557d722dd648f949dde6e4997f6255f18
> # bad: [88dec855ce117f52bcf88748ddcbb25b10f4f2fc] pid: perform free_pid() calls outside of tasklist_lock
> git bisect bad 88dec855ce117f52bcf88748ddcbb25b10f4f2fc
> # first bad commit: [88dec855ce117f52bcf88748ddcbb25b10f4f2fc] pid: perform free_pid() calls outside of tasklist_lock



-- 
Mateusz Guzik <mjguzik gmail.com>


^ permalink raw reply related	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2025-02-07 19:52 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-02-05 20:09 [PATCH v5 0/5] reduce tasklist_lock hold time on exit and do some pid cleanup Mateusz Guzik
2025-02-05 20:09 ` [PATCH v5 1/5] exit: perform add_device_randomness() without tasklist_lock Mateusz Guzik
2025-02-05 20:56   ` Oleg Nesterov
2025-02-06 10:40     ` Christian Brauner
2025-02-05 20:09 ` [PATCH v5 2/5] exit: hoist get_pid() in release_task() outside of tasklist_lock Mateusz Guzik
2025-02-05 20:09 ` [PATCH v5 3/5] pid: sprinkle tasklist_lock asserts Mateusz Guzik
2025-02-05 20:09 ` [PATCH v5 4/5] pid: perform free_pid() calls outside of tasklist_lock Mateusz Guzik
2025-02-07 18:31   ` Mark Brown
2025-02-07 19:45     ` Mateusz Guzik
2025-02-07 19:51     ` Mateusz Guzik
2025-02-05 20:09 ` [PATCH v5 5/5] pid: drop irq disablement around pidmap_lock Mateusz Guzik
2025-02-05 20:29 ` [PATCH v5 0/5] reduce tasklist_lock hold time on exit and do some pid cleanup Oleg Nesterov

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.