public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH] getrusage: Use trylock when getting sighand lock.
@ 2024-01-17 19:25 Dylan Hatch
  2024-01-17 20:44 ` Oleg Nesterov
  2024-01-19 14:15 ` [PATCH 1/2] getrusage: move thread_group_cputime_adjusted() outside of lock_task_sighand() Oleg Nesterov
  0 siblings, 2 replies; 11+ messages in thread
From: Dylan Hatch @ 2024-01-17 19:25 UTC (permalink / raw)
  To: Andrew Morton, Oleg Nesterov, Kees Cook, Frederic Weisbecker,
	Joel Fernandes (Google), Dylan Hatch, Ard Biesheuvel,
	Matthew Wilcox (Oracle), Thomas Gleixner,
	Sebastian Andrzej Siewior, Eric W. Biederman, Vincent Whitchurch,
	Dmitry Vyukov, Luis Chamberlain, Mike Christie, David Hildenbrand,
	Catalin Marinas, Stefan Roesch, Joey Gouly, Josh Triplett,
	Helge Deller, Ondrej Mosnacek, Florent Revest, Miguel Ojeda
  Cc: linux-kernel

Processes with many threads run the risk of causing a hard lockup if
too many threads are calling getrusage() at once. This is because a
calling thread with RUSAGE_SELF spins on the sighand lock with irq
disabled, and the critical section of getrusage scales linearly with the
size of the process. All cpus may end up spinning on the sighand lock
for a long time because another thread has the lock and is busy
iterating over 250k+ threads.

In order to mitigate this, periodically re-enable interrupts while
waiting for the sighand lock. This approach lacks the FIFO fairness of a
normal spinlock mechanism, but this effect is somewhat contained to
different threads within the same process.

-- Alternatives Considered --

In an earlier version of the above approach, we added a cond_resched()
call when disabling interrupts to prevent soft lockups. This solution
turned out not to be workable on its own since getrusage() is called
from a non-preemptible context in kernel/exit.c, but could possibly be
adapted by having an alternate version of getrusage() that can be called
from a preemptible context.

Another alternative would be to have getruage() itself release the lock
and enable interrupts periodically while iterating over large numbers of
threads. However, this would be difficult to implement correctly, and
the correctness/consistency of the data reported by getrusage() would be
questionable.

One final alternative might be to add a per-process mutex for callers of
getrusage() to acquire before acquiring the sighand lock, or to be used
as a total replacement of the sigahnd lock. We haven't fully explored
what the implications of this might be.

Signed-off-by: Dylan Hatch <dylanbhatch@google.com>
---
 include/linux/sched/signal.h | 13 +++++++++++
 kernel/signal.c              | 43 ++++++++++++++++++++++++++++++++++++
 kernel/sys.c                 |  8 ++++++-
 3 files changed, 63 insertions(+), 1 deletion(-)

diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
index 3499c1a8b9295..7852f16139965 100644
--- a/include/linux/sched/signal.h
+++ b/include/linux/sched/signal.h
@@ -747,6 +747,19 @@ static inline struct sighand_struct *lock_task_sighand(struct task_struct *task,
 	return ret;
 }
 
+extern struct sighand_struct *__lock_task_sighand_safe(struct task_struct *task,
+							unsigned long *flags);
+
+static inline struct sighand_struct *lock_task_sighand_safe(struct task_struct *task,
+						       unsigned long *flags)
+{
+	struct sighand_struct *ret;
+
+	ret = __lock_task_sighand_safe(task, flags);
+	(void)__cond_lock(&task->sighand->siglock, ret);
+	return ret;
+}
+
 static inline void unlock_task_sighand(struct task_struct *task,
 						unsigned long *flags)
 {
diff --git a/kernel/signal.c b/kernel/signal.c
index 47a7602dfe8df..6d60c73b7ab91 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1397,6 +1397,49 @@ int zap_other_threads(struct task_struct *p)
 	return count;
 }
 
+struct sighand_struct *__lock_task_sighand_safe(struct task_struct *tsk,
+						unsigned long *flags)
+{
+	struct sighand_struct *sighand;
+	int n;
+	bool lock = false;
+
+again:
+	rcu_read_lock();
+	local_irq_save(*flags);
+	for (n = 0; n < 500; n++) {
+		sighand = rcu_dereference(tsk->sighand);
+		if (unlikely(sighand == NULL))
+			break;
+
+		/*
+		 * The downside of this approach is we loose the fairness of
+		 * FIFO waiting because the acqusition order between multiple
+		 * waiting tasks is effectively random.
+		 */
+		lock = spin_trylock(&sighand->siglock);
+		if (!lock) {
+			cpu_relax();
+			continue;
+		}
+
+		/* __lock_task_sighand has context explaining this check. */
+		if (likely(sighand == rcu_access_pointer(tsk->sighand)))
+			break;
+		spin_unlock(&sighand->siglock);
+		lock = false;
+	}
+	rcu_read_unlock();
+
+	/* Handle pending IRQ */
+	if (!lock && sighand) {
+		local_irq_restore(*flags);
+		goto again;
+	}
+
+	return sighand;
+}
+
 struct sighand_struct *__lock_task_sighand(struct task_struct *tsk,
 					   unsigned long *flags)
 {
diff --git a/kernel/sys.c b/kernel/sys.c
index e219fcfa112d8..1b1254a3c506b 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -1798,7 +1798,13 @@ void getrusage(struct task_struct *p, int who, struct rusage *r)
 		goto out;
 	}
 
-	if (!lock_task_sighand(p, &flags))
+	/*
+	 * We use lock_task_sighand_safe here instead of lock_task_sighand
+	 * because one process with many threads all calling getrusage may
+	 * otherwise cause an NMI watchdog timeout by disabling IRQs for too
+	 * long.
+	 */
+	if (!lock_task_sighand_safe(p, &flags))
 		return;
 
 	switch (who) {
-- 
2.43.0.381.gb435a96ce8-goog


^ permalink raw reply related	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2024-01-23  2:53 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-01-17 19:25 [RFC PATCH] getrusage: Use trylock when getting sighand lock Dylan Hatch
2024-01-17 20:44 ` Oleg Nesterov
2024-01-18 15:56   ` Oleg Nesterov
2024-01-19 14:15 ` [PATCH 1/2] getrusage: move thread_group_cputime_adjusted() outside of lock_task_sighand() Oleg Nesterov
2024-01-19 14:15   ` [PATCH 2/2] getrusage: use sig->stats_lock Oleg Nesterov
2024-01-20  3:27     ` Dylan Hatch
2024-01-21  4:45       ` Andrew Morton
2024-01-21 12:07         ` Oleg Nesterov
2024-01-23  2:53           ` Dylan Hatch
2024-01-21 22:32       ` Andrew Morton
2024-01-20  3:29   ` [PATCH 1/2] getrusage: move thread_group_cputime_adjusted() outside of lock_task_sighand() Dylan Hatch

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox