* [RFC PATCH 0/2] disable optimistic spinning for ftrace_lock
@ 2026-03-04 7:46 Yafang Shao
2026-03-04 7:46 ` [RFC PATCH 1/2] locking: add mutex_lock_nospin() Yafang Shao
2026-03-04 7:46 ` [RFC PATCH 2/2] ftrace: disable optimistic spinning for ftrace_lock Yafang Shao
0 siblings, 2 replies; 29+ messages in thread
From: Yafang Shao @ 2026-03-04 7:46 UTC (permalink / raw)
To: peterz, mingo, will, boqun, longman, rostedt, mhiramat,
mark.rutland, mathieu.desnoyers
Cc: linux-kernel, linux-trace-kernel, bpf, Yafang Shao
Background
==========
One of our latency-sensitive services reported random CPU %sys spikes.
After a thorough investigation, we identified the root cause. The key
kernel stacks are as follows:
- Task A
2026-02-14-16:53:40.938243: [CPU198] 2156302(bpftrace) cgrp:4019437 pod:4019253
find_kallsyms_symbol+142
module_address_lookup+104
kallsyms_lookup_buildid+203
kallsyms_lookup+20
print_rec+64
t_show+67
seq_read_iter+709
seq_read+165
vfs_read+165
ksys_read+103
__x64_sys_read+25
do_syscall_64+56
entry_SYSCALL_64_after_hwframe+100
This task (2156302, bpftrace) is reading the
/sys/kernel/debug/tracing/available_filter_functions file to check whether
a function is traceable:
https://github.com/bpftrace/bpftrace/blob/master/src/tracefs/tracefs.h#L21
Reading the available_filter_functions file is time-consuming, as it
contains tens of thousands of functions:
$ cat /sys/kernel/debug/tracing/available_filter_functions | wc -l
61081
$ time cat /sys/kernel/debug/tracing/available_filter_functions > /dev/null
real 0m0.452s
user 0m0.000s
sys 0m0.452s
Consequently, the ftrace_lock is held by this task for an extended period.
- Other Tasks
2026-02-14-16:53:41.437094: [CPU79] 2156308(bpftrace) cgrp:4019437 pod:4019253
mutex_spin_on_owner+108
__mutex_lock.constprop.0+1132
__mutex_lock_slowpath+19
mutex_lock+56
t_start+51
seq_read_iter+250
seq_read+165
vfs_read+165
ksys_read+103
__x64_sys_read+25
do_syscall_64+56
entry_SYSCALL_64_after_hwframe+100
Since ftrace_lock is held by Task-A and Task-A is actively running on a
CPU, all other tasks waiting for the same lock will spin on their
respective CPUs. This leads to increased CPU pressure.
Reproduction
============
This issue can be reproduced simply by running
`cat available_filter_functions`.
- Single process reading available_filter_functions:
$ time cat /sys/kernel/tracing/available_filter_functions > /dev/null
real 0m0.452s
user 0m0.001s
sys 0m0.451s
- Six processes reading available_filter_functions simultaneously:
for i in `seq 0 5`; do
time cat /sys/kernel/tracing/available_filter_functions > /dev/null &
done
The results are as follows:
real 0m1.801s
user 0m0.000s
sys 0m1.779s
real 0m1.804s
user 0m0.001s
sys 0m1.791s
real 0m1.805s
user 0m0.000s
sys 0m1.792s
real 0m1.806s
user 0m0.001s
sys 0m1.796s
As more processes are added, the system time increases correspondingly.
Solution
========
One approach is to optimize the reading of available_filter_functions to
make it as fast as possible. However, even then the underlying risk
remains: contention caused by optimistic spinning on the mutex.
Therefore, we need to consider an alternative solution that avoids
optimistic spinning for heavy mutexes that may be held for long durations.
Note that we do not want to disable CONFIG_MUTEX_SPIN_ON_OWNER entirely, as
that could lead to unexpected performance regressions.
In this series, a new wrapper, mutex_lock_nospin(), is introduced and
applied to ftrace_lock to selectively disable optimistic spinning.
Yafang Shao (2):
locking: add mutex_lock_nospin()
ftrace: disable optimistic spinning for ftrace_lock
include/linux/mutex.h | 3 +++
kernel/locking/mutex.c | 39 +++++++++++++++++++++++++------
kernel/trace/ftrace.c | 52 +++++++++++++++++++++---------------------
3 files changed, 61 insertions(+), 33 deletions(-)
--
2.47.3
^ permalink raw reply	[flat|nested] 29+ messages in thread

* [RFC PATCH 1/2] locking: add mutex_lock_nospin()
  2026-03-04  7:46 [RFC PATCH 0/2] disable optimistic spinning for ftrace_lock Yafang Shao
@ 2026-03-04  7:46 ` Yafang Shao
  2026-03-04  9:02   ` Peter Zijlstra
  2026-03-04  7:46 ` [RFC PATCH 2/2] ftrace: disable optimistic spinning for ftrace_lock Yafang Shao
  1 sibling, 1 reply; 29+ messages in thread
From: Yafang Shao @ 2026-03-04  7:46 UTC (permalink / raw)
  To: peterz, mingo, will, boqun, longman, rostedt, mhiramat,
	mark.rutland, mathieu.desnoyers
  Cc: linux-kernel, linux-trace-kernel, bpf, Yafang Shao

Introduce mutex_lock_nospin(), a helper that disables optimistic spinning
on the owner for specific heavy locks. This prevents long spinning times
that can lead to latency spikes for other tasks on the same runqueue.

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
 include/linux/mutex.h  |  3 +++
 kernel/locking/mutex.c | 39 ++++++++++++++++++++++++++++++++-------
 2 files changed, 35 insertions(+), 7 deletions(-)

diff --git a/include/linux/mutex.h b/include/linux/mutex.h
index ecaa0440f6ec..1e488bb24b57 100644
--- a/include/linux/mutex.h
+++ b/include/linux/mutex.h
@@ -189,11 +189,13 @@ extern int __must_check mutex_lock_interruptible_nested(struct mutex *lock,
 extern int __must_check _mutex_lock_killable(struct mutex *lock,
 		unsigned int subclass, struct lockdep_map *nest_lock) __cond_acquires(0, lock);
 extern void mutex_lock_io_nested(struct mutex *lock, unsigned int subclass) __acquires(lock);
+extern void mutex_lock_nospin_nested(struct mutex *lock, unsigned int subclass);
 
 #define mutex_lock(lock) mutex_lock_nested(lock, 0)
 #define mutex_lock_interruptible(lock) mutex_lock_interruptible_nested(lock, 0)
 #define mutex_lock_killable(lock) _mutex_lock_killable(lock, 0, NULL)
 #define mutex_lock_io(lock) mutex_lock_io_nested(lock, 0)
+#define mutex_lock_nospin(lock) mutex_lock_nospin_nested(lock, 0)
 
 #define mutex_lock_nest_lock(lock, nest_lock)				\
 do {									\
@@ -215,6 +217,7 @@ extern void mutex_lock(struct mutex *lock) __acquires(lock);
 extern int __must_check mutex_lock_interruptible(struct mutex *lock) __cond_acquires(0, lock);
 extern int __must_check mutex_lock_killable(struct mutex *lock) __cond_acquires(0, lock);
 extern void mutex_lock_io(struct mutex *lock) __acquires(lock);
+extern void mutex_lock_nospin(struct mutex *lock) __acquires(lock);
 
 # define mutex_lock_nested(lock, subclass) mutex_lock(lock)
 # define mutex_lock_interruptible_nested(lock, subclass) mutex_lock_interruptible(lock)
diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 2a1d165b3167..03d3b0749882 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -290,6 +290,14 @@ void __sched mutex_lock(struct mutex *lock)
 		__mutex_lock_slowpath(lock);
 }
 EXPORT_SYMBOL(mutex_lock);
+
+void __sched mutex_lock_nospin(struct mutex *lock)
+{
+	might_sleep();
+
+	if (!__mutex_trylock_fast(lock))
+		__mutex_lock_nospin(lock);
+}
 #endif
 
 #include "ww_mutex.h"
@@ -443,8 +451,11 @@ static inline int mutex_can_spin_on_owner(struct mutex *lock)
  */
 static __always_inline bool
 mutex_optimistic_spin(struct mutex *lock, struct ww_acquire_ctx *ww_ctx,
-		      struct mutex_waiter *waiter)
+		      struct mutex_waiter *waiter, const bool nospin)
 {
+	if (nospin)
+		return false;
+
 	if (!waiter) {
 		/*
 		 * The purpose of the mutex_can_spin_on_owner() function is
@@ -577,7 +588,8 @@ EXPORT_SYMBOL(ww_mutex_unlock);
 static __always_inline int __sched
 __mutex_lock_common(struct mutex *lock, unsigned int state, unsigned int subclass,
 		    struct lockdep_map *nest_lock, unsigned long ip,
-		    struct ww_acquire_ctx *ww_ctx, const bool use_ww_ctx)
+		    struct ww_acquire_ctx *ww_ctx, const bool use_ww_ctx,
+		    const bool nospin)
 {
 	DEFINE_WAKE_Q(wake_q);
 	struct mutex_waiter waiter;
@@ -615,7 +627,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int state, unsigned int subclas
 	trace_contention_begin(lock, LCB_F_MUTEX | LCB_F_SPIN);
 
 	if (__mutex_trylock(lock) ||
-	    mutex_optimistic_spin(lock, ww_ctx, NULL)) {
+	    mutex_optimistic_spin(lock, ww_ctx, NULL, nospin)) {
 		/* got the lock, yay! */
 		lock_acquired(&lock->dep_map, ip);
 		if (ww_ctx)
@@ -716,7 +728,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int state, unsigned int subclas
 		 * to run.
 		 */
 		clear_task_blocked_on(current, lock);
-		if (mutex_optimistic_spin(lock, ww_ctx, &waiter))
+		if (mutex_optimistic_spin(lock, ww_ctx, &waiter, nospin))
 			break;
 		set_task_blocked_on(current, lock);
 		trace_contention_begin(lock, LCB_F_MUTEX);
@@ -773,14 +785,21 @@ static int __sched
 __mutex_lock(struct mutex *lock, unsigned int state, unsigned int subclass,
 	     struct lockdep_map *nest_lock, unsigned long ip)
 {
-	return __mutex_lock_common(lock, state, subclass, nest_lock, ip, NULL, false);
+	return __mutex_lock_common(lock, state, subclass, nest_lock, ip, NULL, false, false);
+}
+
+static int __sched
+__mutex_lock_nospin(struct mutex *lock, unsigned int state, unsigned int subclass,
+		    struct lockdep_map *nest_lock, unsigned long ip)
+{
+	return __mutex_lock_common(lock, state, subclass, nest_lock, ip, NULL, false, true);
 }
 
 static int __sched
 __ww_mutex_lock(struct mutex *lock, unsigned int state, unsigned int subclass,
 		unsigned long ip, struct ww_acquire_ctx *ww_ctx)
 {
-	return __mutex_lock_common(lock, state, subclass, NULL, ip, ww_ctx, true);
+	return __mutex_lock_common(lock, state, subclass, NULL, ip, ww_ctx, true, false);
 }
 
 /**
@@ -861,11 +880,17 @@ mutex_lock_io_nested(struct mutex *lock, unsigned int subclass)
 
 	token = io_schedule_prepare();
 	__mutex_lock_common(lock, TASK_UNINTERRUPTIBLE,
-			    subclass, NULL, _RET_IP_, NULL, 0);
+			    subclass, NULL, _RET_IP_, NULL, 0, false);
 	io_schedule_finish(token);
 }
 EXPORT_SYMBOL_GPL(mutex_lock_io_nested);
 
+void __sched
+mutex_lock_nospin_nested(struct mutex *lock, unsigned int subclass)
+{
+	__mutex_lock_nospin(lock, TASK_UNINTERRUPTIBLE, subclass, NULL, _RET_IP_);
+}
+
 static inline int
 ww_mutex_deadlock_injection(struct ww_mutex *lock, struct ww_acquire_ctx *ctx)
 {
-- 
2.47.3

^ permalink raw reply related	[flat|nested] 29+ messages in thread
* Re: [RFC PATCH 1/2] locking: add mutex_lock_nospin()
  2026-03-04  7:46 ` [RFC PATCH 1/2] locking: add mutex_lock_nospin() Yafang Shao
@ 2026-03-04  9:02   ` Peter Zijlstra
  2026-03-04  9:37     ` Yafang Shao
  2026-03-04  9:54     ` David Laight
  0 siblings, 2 replies; 29+ messages in thread
From: Peter Zijlstra @ 2026-03-04  9:02 UTC (permalink / raw)
  To: Yafang Shao
  Cc: mingo, will, boqun, longman, rostedt, mhiramat, mark.rutland,
	mathieu.desnoyers, linux-kernel, linux-trace-kernel, bpf

On Wed, Mar 04, 2026 at 03:46:49PM +0800, Yafang Shao wrote:
> Introduce mutex_lock_nospin(), a helper that disables optimistic spinning
> on the owner for specific heavy locks. This prevents long spinning times
> that can lead to latency spikes for other tasks on the same runqueue.

This makes no sense; spinning stops on need_resched().

^ permalink raw reply	[flat|nested] 29+ messages in thread
* Re: [RFC PATCH 1/2] locking: add mutex_lock_nospin()
  2026-03-04  9:02   ` Peter Zijlstra
@ 2026-03-04  9:37     ` Yafang Shao
  2026-03-04 10:11       ` Peter Zijlstra
  0 siblings, 1 reply; 29+ messages in thread
From: Yafang Shao @ 2026-03-04  9:37 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, will, boqun, longman, rostedt, mhiramat, mark.rutland,
	mathieu.desnoyers, linux-kernel, linux-trace-kernel, bpf

On Wed, Mar 4, 2026 at 5:03 PM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Wed, Mar 04, 2026 at 03:46:49PM +0800, Yafang Shao wrote:
> > Introduce mutex_lock_nospin(), a helper that disables optimistic spinning
> > on the owner for specific heavy locks. This prevents long spinning times
> > that can lead to latency spikes for other tasks on the same runqueue.
>
> This makes no sense; spinning stops on need_resched().

Hello Peter,

The condition to stop spinning on need_resched() relies on the mutex
owner remaining unchanged. However, when multiple tasks contend for
the same lock, the owner can change frequently. This creates a
potential TOCTOU (Time of Check to Time of Use) issue.

  mutex_optimistic_spin
    owner = __mutex_trylock_or_owner(lock);
    mutex_spin_on_owner
      // the __mutex_owner(lock) might get a new owner.
      while (__mutex_owner(lock) == owner)

We observed high CPU pressure in production when this scenario occurred.
Below are the benchmark results when running 10 concurrent tasks:

  for i in `seq 0 9`; do
    time cat /sys/kernel/debug/tracing/available_filter_functions > /dev/null &
  done

- before this patch

  real 0m4.636s  user 0m0.001s  sys 0m3.773s
  real 0m5.157s  user 0m0.001s  sys 0m4.362s
  real 0m5.205s  user 0m0.000s  sys 0m4.538s
  real 0m5.212s  user 0m0.001s  sys 0m4.700s
  real 0m5.246s  user 0m0.001s  sys 0m4.501s
  real 0m5.254s  user 0m0.003s  sys 0m4.335s
  real 0m5.260s  user 0m0.003s  sys 0m4.525s
  real 0m5.267s  user 0m0.004s  sys 0m4.482s
  real 0m5.273s  user 0m0.002s  sys 0m4.215s
  real 0m5.285s  user 0m0.003s  sys 0m4.373s

- after this patch

  real 0m4.733s  user 0m0.002s  sys 0m0.511s
  real 0m4.740s  user 0m0.001s  sys 0m0.509s
  real 0m4.862s  user 0m0.001s  sys 0m0.513s
  real 0m4.884s  user 0m0.000s  sys 0m0.507s
  real 0m4.888s  user 0m0.003s  sys 0m0.513s
  real 0m4.888s  user 0m0.000s  sys 0m0.511s
  real 0m4.886s  user 0m0.003s  sys 0m0.508s
  real 0m4.952s  user 0m0.000s  sys 0m0.513s
  real 0m4.973s  user 0m0.001s  sys 0m0.510s
  real 0m5.042s  user 0m0.002s  sys 0m0.515s

The results show that system time dropped dramatically from ~4.5 seconds
to ~0.5 seconds, confirming that the patch can help reduce the issue.
Please correct me if I've misunderstood anything.

--
Regards
Yafang

^ permalink raw reply	[flat|nested] 29+ messages in thread
* Re: [RFC PATCH 1/2] locking: add mutex_lock_nospin()
  2026-03-04  9:37     ` Yafang Shao
@ 2026-03-04 10:11       ` Peter Zijlstra
  2026-03-04 11:52         ` Yafang Shao
  0 siblings, 1 reply; 29+ messages in thread
From: Peter Zijlstra @ 2026-03-04 10:11 UTC (permalink / raw)
  To: Yafang Shao
  Cc: mingo, will, boqun, longman, rostedt, mhiramat, mark.rutland,
	mathieu.desnoyers, linux-kernel, linux-trace-kernel, bpf

On Wed, Mar 04, 2026 at 05:37:31PM +0800, Yafang Shao wrote:
> On Wed, Mar 4, 2026 at 5:03 PM Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > On Wed, Mar 04, 2026 at 03:46:49PM +0800, Yafang Shao wrote:
> > > Introduce mutex_lock_nospin(), a helper that disables optimistic spinning
> > > on the owner for specific heavy locks. This prevents long spinning times
> > > that can lead to latency spikes for other tasks on the same runqueue.
> >
> > This makes no sense; spinning stops on need_resched().
>
> Hello Peter,
>
> The condition to stop spinning on need_resched() relies on the mutex
> owner remaining unchanged. However, when multiple tasks contend for
> the same lock, the owner can change frequently. This creates a
> potential TOCTOU (Time of Check to Time of Use) issue.
>
>   mutex_optimistic_spin
>     owner = __mutex_trylock_or_owner(lock);
>     mutex_spin_on_owner
>       // the __mutex_owner(lock) might get a new owner.
>       while (__mutex_owner(lock) == owner)
>

How do these new owners become the owner? Are they succeeding the
__mutex_trylock() that sits before mutex_optimistic_spin() and
effectively starving the spinner?

Something like the below would make a difference if that were so.

---
diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index c867f6c15530..0796e77a8c3b 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -521,7 +521,7 @@ static __always_inline bool
 mutex_optimistic_spin(struct mutex *lock, struct ww_acquire_ctx *ww_ctx,
 		      struct mutex_waiter *waiter)
 {
-	return false;
+	return __mutex_trylock(lock);
 }
 #endif
 
@@ -614,8 +614,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int state, unsigned int subclas
 	mutex_acquire_nest(&lock->dep_map, subclass, 0, nest_lock, ip);
 
 	trace_contention_begin(lock, LCB_F_MUTEX | LCB_F_SPIN);
 
-	if (__mutex_trylock(lock) ||
-	    mutex_optimistic_spin(lock, ww_ctx, NULL)) {
+	if (mutex_optimistic_spin(lock, ww_ctx, NULL)) {
 		/* got the lock, yay! */
 		lock_acquired(&lock->dep_map, ip);
 		if (ww_ctx)

^ permalink raw reply related	[flat|nested] 29+ messages in thread
* Re: [RFC PATCH 1/2] locking: add mutex_lock_nospin()
  2026-03-04 10:11       ` Peter Zijlstra
@ 2026-03-04 11:52         ` Yafang Shao
  2026-03-04 12:41           ` Peter Zijlstra
  0 siblings, 1 reply; 29+ messages in thread
From: Yafang Shao @ 2026-03-04 11:52 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, will, boqun, longman, rostedt, mhiramat, mark.rutland,
	mathieu.desnoyers, linux-kernel, linux-trace-kernel, bpf

On Wed, Mar 4, 2026 at 6:11 PM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Wed, Mar 04, 2026 at 05:37:31PM +0800, Yafang Shao wrote:
> > On Wed, Mar 4, 2026 at 5:03 PM Peter Zijlstra <peterz@infradead.org> wrote:
> > >
> > > On Wed, Mar 04, 2026 at 03:46:49PM +0800, Yafang Shao wrote:
> > > > Introduce mutex_lock_nospin(), a helper that disables optimistic spinning
> > > > on the owner for specific heavy locks. This prevents long spinning times
> > > > that can lead to latency spikes for other tasks on the same runqueue.
> > >
> > > This makes no sense; spinning stops on need_resched().
> >
> > Hello Peter,
> >
> > The condition to stop spinning on need_resched() relies on the mutex
> > owner remaining unchanged. However, when multiple tasks contend for
> > the same lock, the owner can change frequently. This creates a
> > potential TOCTOU (Time of Check to Time of Use) issue.
> >
> >   mutex_optimistic_spin
> >     owner = __mutex_trylock_or_owner(lock);
> >     mutex_spin_on_owner
> >       // the __mutex_owner(lock) might get a new owner.
> >       while (__mutex_owner(lock) == owner)
> >
>
> How do these new owners become the owner? Are they succeeding the
> __mutex_trylock() that sits before mutex_optimistic_spin() and
> effectively starving the spinner?
>
> Something like the below would make a difference if that were so.

The following change made no difference; concurrent runs still result
in prolonged system time.

real 0m5.265s  user 0m0.000s  sys 0m4.921s
real 0m5.295s  user 0m0.002s  sys 0m4.697s
real 0m5.293s  user 0m0.003s  sys 0m4.844s
real 0m5.303s  user 0m0.001s  sys 0m4.511s
real 0m5.303s  user 0m0.000s  sys 0m4.694s
real 0m5.302s  user 0m0.002s  sys 0m4.677s
real 0m5.313s  user 0m0.000s  sys 0m4.837s
real 0m5.327s  user 0m0.000s  sys 0m4.808s
real 0m5.330s  user 0m0.001s  sys 0m4.893s
real 0m5.358s  user 0m0.005s  sys 0m4.919s

Our kernel is not built with CONFIG_PREEMPT enabled, so prolonged
system time can lead to CPU pressure and potential latency spikes.
Since we can reliably reproduce this unnecessary spinning, why not
improve it to reduce the overhead?

--
Regards
Yafang

^ permalink raw reply	[flat|nested] 29+ messages in thread
* Re: [RFC PATCH 1/2] locking: add mutex_lock_nospin()
  2026-03-04 11:52         ` Yafang Shao
@ 2026-03-04 12:41           ` Peter Zijlstra
  2026-03-04 14:25             ` Yafang Shao
  0 siblings, 1 reply; 29+ messages in thread
From: Peter Zijlstra @ 2026-03-04 12:41 UTC (permalink / raw)
  To: Yafang Shao
  Cc: mingo, will, boqun, longman, rostedt, mhiramat, mark.rutland,
	mathieu.desnoyers, linux-kernel, linux-trace-kernel, bpf

On Wed, Mar 04, 2026 at 07:52:06PM +0800, Yafang Shao wrote:
> On Wed, Mar 4, 2026 at 6:11 PM Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > On Wed, Mar 04, 2026 at 05:37:31PM +0800, Yafang Shao wrote:
> > > On Wed, Mar 4, 2026 at 5:03 PM Peter Zijlstra <peterz@infradead.org> wrote:
> > > >
> > > > On Wed, Mar 04, 2026 at 03:46:49PM +0800, Yafang Shao wrote:
> > > > > Introduce mutex_lock_nospin(), a helper that disables optimistic spinning
> > > > > on the owner for specific heavy locks. This prevents long spinning times
> > > > > that can lead to latency spikes for other tasks on the same runqueue.
> > > >
> > > > This makes no sense; spinning stops on need_resched().
> > >
> > > Hello Peter,
> > >
> > > The condition to stop spinning on need_resched() relies on the mutex
> > > owner remaining unchanged. However, when multiple tasks contend for
> > > the same lock, the owner can change frequently. This creates a
> > > potential TOCTOU (Time of Check to Time of Use) issue.
> > >
> > >   mutex_optimistic_spin
> > >     owner = __mutex_trylock_or_owner(lock);
> > >     mutex_spin_on_owner
> > >       // the __mutex_owner(lock) might get a new owner.
> > >       while (__mutex_owner(lock) == owner)
> > >
> >
> > How do these new owners become the owner? Are they succeeding the
> > __mutex_trylock() that sits before mutex_optimistic_spin() and
> > effectively starving the spinner?
> >
> > Something like the below would make a difference if that were so.
>
> The following change made no difference; concurrent runs still result
> in prolonged system time.
>
> real 0m5.265s  user 0m0.000s  sys 0m4.921s
> real 0m5.295s  user 0m0.002s  sys 0m4.697s
> real 0m5.293s  user 0m0.003s  sys 0m4.844s
> real 0m5.303s  user 0m0.001s  sys 0m4.511s
> real 0m5.303s  user 0m0.000s  sys 0m4.694s
> real 0m5.302s  user 0m0.002s  sys 0m4.677s
> real 0m5.313s  user 0m0.000s  sys 0m4.837s
> real 0m5.327s  user 0m0.000s  sys 0m4.808s
> real 0m5.330s  user 0m0.001s  sys 0m4.893s
> real 0m5.358s  user 0m0.005s  sys 0m4.919s
>
> Our kernel is not built with CONFIG_PREEMPT enabled, so prolonged
> system time can lead to CPU pressure and potential latency spikes.
> Since we can reliably reproduce this unnecessary spinning, why not
> improve it to reduce the overhead?

If you cannot explain what the problem is (you haven't), there is
nothing to fix.

Also, current kernels cannot be built without PREEMPT; and if you care
about latency, running a PREEMPT=n kernel is daft. That said,
TIF_NEED_RESCHED should work irrespective of PREEMPT settings; the
PREEMPT settings just affect when and how you end up in schedule().

Even without PREEMPT, if there is a task waiting, either the wakeup or
the tick will set TIF_NEED_RESCHED and it should stop spinning. If there
is no task waiting, there is no actual latency, just burning CPU time,
and that isn't a problem per se.

What should happen is that the first spinner gets the lock next, the
next spinner is then promoted to first spinner and so on.

This chain continues, which means the lock owner is always on-cpu and
good progress is being made and there is no CPU contention, or the
spinner gets marked for preemption (as said, this does not require
PREEMPT=y) and will stop spinning and go sleep, or the owner goes to
sleep and all the spinners stop and also go sleep.

Again, you have not said anything specific enough to figure out what
happens on your end. You said the owner changes; this means there is
progress made. What isn't clear is if any one particular spinner is
starved (that would be a problem) or if this latency spike you observe
is worse than it would be from running a while(1) loop, in which case,
that's just how it is.

What is not sane is marking random locks with random properties just
because of a random workload.

^ permalink raw reply	[flat|nested] 29+ messages in thread
* Re: [RFC PATCH 1/2] locking: add mutex_lock_nospin()
  2026-03-04 12:41           ` Peter Zijlstra
@ 2026-03-04 14:25             ` Yafang Shao
  0 siblings, 0 replies; 29+ messages in thread
From: Yafang Shao @ 2026-03-04 14:25 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, will, boqun, longman, rostedt, mhiramat, mark.rutland,
	mathieu.desnoyers, linux-kernel, linux-trace-kernel, bpf

On Wed, Mar 4, 2026 at 8:41 PM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Wed, Mar 04, 2026 at 07:52:06PM +0800, Yafang Shao wrote:
> > On Wed, Mar 4, 2026 at 6:11 PM Peter Zijlstra <peterz@infradead.org> wrote:
> > >
> > > On Wed, Mar 04, 2026 at 05:37:31PM +0800, Yafang Shao wrote:
> > > > On Wed, Mar 4, 2026 at 5:03 PM Peter Zijlstra <peterz@infradead.org> wrote:
> > > > >
> > > > > On Wed, Mar 04, 2026 at 03:46:49PM +0800, Yafang Shao wrote:
> > > > > > Introduce mutex_lock_nospin(), a helper that disables optimistic spinning
> > > > > > on the owner for specific heavy locks. This prevents long spinning times
> > > > > > that can lead to latency spikes for other tasks on the same runqueue.
> > > > >
> > > > > This makes no sense; spinning stops on need_resched().
> > > >
> > > > Hello Peter,
> > > >
> > > > The condition to stop spinning on need_resched() relies on the mutex
> > > > owner remaining unchanged. However, when multiple tasks contend for
> > > > the same lock, the owner can change frequently. This creates a
> > > > potential TOCTOU (Time of Check to Time of Use) issue.
> > > >
> > > >   mutex_optimistic_spin
> > > >     owner = __mutex_trylock_or_owner(lock);
> > > >     mutex_spin_on_owner
> > > >       // the __mutex_owner(lock) might get a new owner.
> > > >       while (__mutex_owner(lock) == owner)
> > > >
> > >
> > > How do these new owners become the owner? Are they succeeding the
> > > __mutex_trylock() that sits before mutex_optimistic_spin() and
> > > effectively starving the spinner?
> > >
> > > Something like the below would make a difference if that were so.
> >
> > The following change made no difference; concurrent runs still result
> > in prolonged system time.
> >
> > Our kernel is not built with CONFIG_PREEMPT enabled, so prolonged
> > system time can lead to CPU pressure and potential latency spikes.
> > Since we can reliably reproduce this unnecessary spinning, why not
> > improve it to reduce the overhead?
>
> If you cannot explain what the problem is (you haven't), there is
> nothing to fix.
>
> Also, current kernels cannot be built without PREEMPT; and if you care
> about latency, running a PREEMPT=n kernel is daft. That said,
> TIF_NEED_RESCHED should work irrespective of PREEMPT settings; the
> PREEMPT settings just affect when and how you end up in schedule().
>
> Even without PREEMPT, if there is a task waiting, either the wakeup or
> the tick will set TIF_NEED_RESCHED and it should stop spinning. If there
> is no task waiting, there is no actual latency, just burning CPU time,
> and that isn't a problem per se.
>
> What should happen is that the first spinner gets the lock next, the
> next spinner is then promoted to first spinner and so on.
>
> This chain continues, which means the lock owner is always on-cpu and
> good progress is being made and there is no CPU contention, or the
> spinner gets marked for preemption (as said, this does not require
> PREEMPT=y) and will stop spinning and go sleep, or the owner goes to
> sleep and all the spinners stop and also go sleep.
>
> Again, you have not said anything specific enough to figure out what
> happens on your end. You said the owner changes; this means there is
> progress made. What isn't clear is if any one particular spinner is
> starved (that would be a problem)

As far as I can tell, no spinner is starved. The spinner and the
impacted task are interleaved, which is expected behavior.

> or if this latency spike you observe
> is worse than it would be from running a while(1) loop, in which case,
> that's just how it is.

The latency spike occurs because the impacted task must wait for the
spinner to voluntarily yield the CPU. In effect, the spinner behaves
similarly to a while (1) {} loop.

So the real problem here is that we should avoid unnecessary spinning.
Is there any reason we have to spin to speed up the ftrace_lock? I
believe not.

> What is not sane is marking random locks with random properties just
> because of a random workload.

--
Regards
Yafang

^ permalink raw reply	[flat|nested] 29+ messages in thread
* Re: [RFC PATCH 1/2] locking: add mutex_lock_nospin()
  2026-03-04  9:02   ` Peter Zijlstra
  2026-03-04  9:37     ` Yafang Shao
@ 2026-03-04  9:54     ` David Laight
  2026-03-04 20:57       ` Steven Rostedt
  1 sibling, 1 reply; 29+ messages in thread
From: David Laight @ 2026-03-04  9:54 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Yafang Shao, mingo, will, boqun, longman, rostedt, mhiramat,
	mark.rutland, mathieu.desnoyers, linux-kernel, linux-trace-kernel, bpf

On Wed, 4 Mar 2026 10:02:49 +0100
Peter Zijlstra <peterz@infradead.org> wrote:

> On Wed, Mar 04, 2026 at 03:46:49PM +0800, Yafang Shao wrote:
> > Introduce mutex_lock_nospin(), a helper that disables optimistic spinning
> > on the owner for specific heavy locks. This prevents long spinning times
> > that can lead to latency spikes for other tasks on the same runqueue.
>
> This makes no sense; spinning stops on need_resched().

That might still be an issue if a high-priority process is spinning.
But a %sys spike doesn't imply a latency spike.

Is this using the osq_lock.c code?
That will have problems on overprovisioned VMs; it tries to find out
whether the hypervisor has switched out, but ISTR that is flawed.

In reality a spin lock shouldn't be held for long enough to cause
any kind of latency issue.
So something in the code that reads the list of filter functions
needs to be done differently so that the lock isn't held for as long.

	David

^ permalink raw reply	[flat|nested] 29+ messages in thread
* Re: [RFC PATCH 1/2] locking: add mutex_lock_nospin()
  2026-03-04  9:54     ` David Laight
@ 2026-03-04 20:57       ` Steven Rostedt
  2026-03-04 21:44         ` David Laight
  0 siblings, 1 reply; 29+ messages in thread
From: Steven Rostedt @ 2026-03-04 20:57 UTC (permalink / raw)
  To: David Laight
  Cc: Peter Zijlstra, Yafang Shao, mingo, will, boqun, longman,
	mhiramat, mark.rutland, mathieu.desnoyers, linux-kernel,
	linux-trace-kernel, bpf

On Wed, 4 Mar 2026 09:54:15 +0000
David Laight <david.laight.linux@gmail.com> wrote:

> That might still be an issue if a high priority process is spinning.
> But a %sys spike doesn't imply a latency spike.
>
> Is this using the osq_lock.c code?
> That will have problems on overprovisioned VMs, it tries to find out
> whether the hypervisor has switched out - but ISTR that is flawed.
>
> In reality a spin lock shouldn't be held for long enough to cause
> any kind latency issue.
> So something in the code that reads the list of filter functions
> needs to be done differently so that the lock isn't held for as long.

It's not a spinlock, it's an adaptive mutex which spins while the owner of
the mutex is also still running on the CPU. If the spinner CPU triggers a
NEED_RESCHED or the owner goes to sleep, the spinner stops spinning and
goes to sleep too.

Honestly, this still looks like a non-issue or a corner case that I don't
think requires these changes.

This looks like one of those "Patient: Doctor, it hurts me when I do this.
Doctor: Then don't do that." cases.

Why is a production system having multiple users cat
available_filter_functions to begin with?

-- Steve

^ permalink raw reply	[flat|nested] 29+ messages in thread
* Re: [RFC PATCH 1/2] locking: add mutex_lock_nospin()
  2026-03-04 20:57       ` Steven Rostedt
@ 2026-03-04 21:44         ` David Laight
  2026-03-05  2:17           ` Yafang Shao
  0 siblings, 1 reply; 29+ messages in thread
From: David Laight @ 2026-03-04 21:44 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Peter Zijlstra, Yafang Shao, mingo, will, boqun, longman,
	mhiramat, mark.rutland, mathieu.desnoyers, linux-kernel,
	linux-trace-kernel, bpf

On Wed, 4 Mar 2026 15:57:42 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:

> On Wed, 4 Mar 2026 09:54:15 +0000
> David Laight <david.laight.linux@gmail.com> wrote:
>
> > That might still be an issue if a high priority process is spinning.
> > But a %sys spike doesn't imply a latency spike.
> >
> > Is this using the osq_lock.c code?
> > That will have problems on overprovisioned VMs, it tries to find out
> > whether the hypervisor has switched out - but ISTR that is flawed.
> >
> > In reality a spin lock shouldn't be held for long enough to cause
> > any kind latency issue.
> > So something in the code that reads the list of filter functions
> > needs to be done differently so that the lock isn't held for as long.
>
> It's not a spinlock, it's an adaptive mutex which spins while the owner of
> the mutex is also still running on the CPU. If the spinner CPU triggers a
> NEED_RESCHED or the owner goes to sleep, the spinner stops spinning and
> goes to sleep too.

I think half my brain knew that - otherwise I wouldn't have mentioned
the osq_lock.c code.
That all reminded me I've a patch that optimises that code a bit.
But I do remember thinking it ought to have an "I've been spinning long
enough, time to sleep" path.

	David

> Honestly, this still looks like a non-issue or a corner case that I don't
> think requires these changes.
>
> This looks like one of those "Patient: Doctor, it hurts me when I do this.
> Doctor: Then don't do that." cases.
>
> Why is a production system having multiple users cat
> available_filter_functions to begin with?
>
> -- Steve

^ permalink raw reply	[flat|nested] 29+ messages in thread
* Re: [RFC PATCH 1/2] locking: add mutex_lock_nospin() 2026-03-04 21:44 ` David Laight @ 2026-03-05 2:17 ` Yafang Shao 2026-03-05 2:28 ` Steven Rostedt 0 siblings, 1 reply; 29+ messages in thread From: Yafang Shao @ 2026-03-05 2:17 UTC (permalink / raw) To: David Laight Cc: Steven Rostedt, Peter Zijlstra, mingo, will, boqun, longman, mhiramat, mark.rutland, mathieu.desnoyers, linux-kernel, linux-trace-kernel, bpf On Thu, Mar 5, 2026 at 5:44 AM David Laight <david.laight.linux@gmail.com> wrote: > > On Wed, 4 Mar 2026 15:57:42 -0500 > Steven Rostedt <rostedt@goodmis.org> wrote: > > > On Wed, 4 Mar 2026 09:54:15 +0000 > > David Laight <david.laight.linux@gmail.com> wrote: > > > > > That might still be an issue if a high priority process is spinning. > > > But a %sys spike doesn't imply a latency spike. > > > > > > Is this using the osq_lock.c code? > > > That will have problems on overprovisioned VMs, it tries to find out > > > whether the hypervisor has switched out - but ISTR that is flawed. > > > > > > In reality a spin lock shouldn't be held for long enough to cause > > > any kind latency issue. > > > So something in the code that reads the list of filter functions > > > needs to be done differently so that the lock isn't held for as long. > > > > It's not a spinlock, it's an adaptive mutex which spins while the owner of > > the mutex is also still running on the CPU. If the spinner CPU triggers a > > NEED_RESCHED or the owner goes to sleep, the spinner stops spinning and > > goes to sleep too. > > I think half my brain knew that - otherwise I wouldn't have mentioned > the osq_lock.c code. > That all reminded me I've a patch that optimises that code a bit. > But I do remember thinking it ought to have a 'I been spinning long > enough, time to sleep' path. > > David > > > > > Honestly, this still looks like a non-issue or a corner case that I don't > > think requires these changes. > > > > This looks like one of those "Patient: Doctor it hurts me when I do this. 
> > Doctor: Then don't do that." cases. > > > > Why is a production system having multiple users cat > > avaliable_filter_functions to begin with? bpftrace is a widely used tool for online debugging and dynamic tracing. However, sysadmins may unknowingly run multiple bpftrace instances concurrently without realizing the potential impact on system performance. If this is your answer, I believe we should clearly document the following warning: Warning: Do not read available_filter_functions concurrently, as doing so can significantly degrade system performance and potentially impact production workloads. -- Regards Yafang ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [RFC PATCH 1/2] locking: add mutex_lock_nospin() 2026-03-05 2:17 ` Yafang Shao @ 2026-03-05 2:28 ` Steven Rostedt 2026-03-05 2:33 ` Yafang Shao 0 siblings, 1 reply; 29+ messages in thread From: Steven Rostedt @ 2026-03-05 2:28 UTC (permalink / raw) To: Yafang Shao Cc: David Laight, Peter Zijlstra, mingo, will, boqun, longman, mhiramat, mark.rutland, mathieu.desnoyers, linux-kernel, linux-trace-kernel, bpf On Thu, 5 Mar 2026 10:17:09 +0800 Yafang Shao <laoar.shao@gmail.com> wrote: > bpftrace is a widely used tool for online debugging and dynamic > tracing. However, sysadmins may unknowingly run multiple bpftrace > instances concurrently without realizing the potential impact on > system performance. > > If this is your answer, I believe we should clearly document the > following warning: > > Warning: Do not read available_filter_functions concurrently, as > doing so can significantly degrade system performance and potentially > impact production workloads. Or update bpftrace to cache that file. It's only updated on module load and unload, which isn't done much. There's no reason it needs to constantly read that file if bpftrace is being used constantly. -- Steve ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [RFC PATCH 1/2] locking: add mutex_lock_nospin() 2026-03-05 2:28 ` Steven Rostedt @ 2026-03-05 2:33 ` Yafang Shao 2026-03-05 3:00 ` Steven Rostedt 0 siblings, 1 reply; 29+ messages in thread From: Yafang Shao @ 2026-03-05 2:33 UTC (permalink / raw) To: Steven Rostedt Cc: David Laight, Peter Zijlstra, mingo, will, boqun, longman, mhiramat, mark.rutland, mathieu.desnoyers, linux-kernel, linux-trace-kernel, bpf On Thu, Mar 5, 2026 at 10:28 AM Steven Rostedt <rostedt@goodmis.org> wrote: > > On Thu, 5 Mar 2026 10:17:09 +0800 > Yafang Shao <laoar.shao@gmail.com> wrote: > > > bpftrace is a widely used tool for online debugging and dynamic > > tracing. However, sysadmins may unknowingly run multiple bpftrace > > instances concurrently without realizing the potential impact on > > system performance. > > > > If this is your answer, I believe we should clearly document the > > following warning: > > > > Warning: Do not read available_filter_functions concurrently, as > > doing so can significantly degrade system performance and potentially > > impact production workloads. > > Or update bpftrace to cache that file. It's only updated on module load > and unload, which isn't done much. There's no reason it needs to > constantly read that file if bpftrace is being used constantly. Other tools may also read available_filter_functions, requiring each one to be patched individually to avoid this flaw—a clearly impractical solution. -- Regards Yafang ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [RFC PATCH 1/2] locking: add mutex_lock_nospin() 2026-03-05 2:33 ` Yafang Shao @ 2026-03-05 3:00 ` Steven Rostedt 2026-03-05 3:08 ` Yafang Shao 0 siblings, 1 reply; 29+ messages in thread From: Steven Rostedt @ 2026-03-05 3:00 UTC (permalink / raw) To: Yafang Shao Cc: David Laight, Peter Zijlstra, mingo, will, boqun, longman, mhiramat, mark.rutland, mathieu.desnoyers, linux-kernel, linux-trace-kernel, bpf On Thu, 5 Mar 2026 10:33:00 +0800 Yafang Shao <laoar.shao@gmail.com> wrote: > Other tools may also read available_filter_functions, requiring each > one to be patched individually to avoid this flaw—a clearly > impractical solution. What exactly is the issue? If a task does a while 1 in user space, it wouldn't be much different. With PREEMPT_LAZY the most a task will spin in the kernel is one extra tick over a user space task spinning in user space. available_filter_functions is definitely not a hot path, so I personally don't care if it were to use "nospin". My worry is about adding this "special" mutex for a single corner case, and I want to know that it's a real bug before we add something special into the kernel. -- Steve ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [RFC PATCH 1/2] locking: add mutex_lock_nospin() 2026-03-05 3:00 ` Steven Rostedt @ 2026-03-05 3:08 ` Yafang Shao 2026-03-05 4:30 ` Waiman Long 0 siblings, 1 reply; 29+ messages in thread From: Yafang Shao @ 2026-03-05 3:08 UTC (permalink / raw) To: Steven Rostedt Cc: David Laight, Peter Zijlstra, mingo, will, boqun, longman, mhiramat, mark.rutland, mathieu.desnoyers, linux-kernel, linux-trace-kernel, bpf On Thu, Mar 5, 2026 at 11:00 AM Steven Rostedt <rostedt@goodmis.org> wrote: > > On Thu, 5 Mar 2026 10:33:00 +0800 > Yafang Shao <laoar.shao@gmail.com> wrote: > > > Other tools may also read available_filter_functions, requiring each > > one to be patched individually to avoid this flaw—a clearly > > impractical solution. > > What exactly is the issue? It makes no sense to spin unnecessarily when it can be avoided. We continuously improve the kernel to do the right thing—and unnecessary spinning is certainly not the right thing. > If a task does a while 1 in user space, it > wouldn't be much different. The while loop in user space performs actual work, whereas useless spinning does nothing but burn CPU cycles. My point is simple: if this unnecessary spinning isn't already considered an issue, it should be—it's something that clearly needs improvement. > With PREEMPT_LAZY the most a task will spin > in the kernel is one extra tick over a user space task spinning in user > space. > > available_filter_functions is definitely not a hot path, so I > personally don't care if it were to use "nospin". My worry is about > adding this "special" mutex for a single corner case, and I want to know > that its a real bug before we add something special into the kernel. > > -- Steve -- Regards Yafang ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [RFC PATCH 1/2] locking: add mutex_lock_nospin() 2026-03-05 3:08 ` Yafang Shao @ 2026-03-05 4:30 ` Waiman Long 2026-03-05 5:40 ` Yafang Shao 2026-03-05 9:32 ` David Laight 0 siblings, 2 replies; 29+ messages in thread From: Waiman Long @ 2026-03-05 4:30 UTC (permalink / raw) To: Yafang Shao, Steven Rostedt Cc: David Laight, Peter Zijlstra, mingo, will, boqun, mhiramat, mark.rutland, mathieu.desnoyers, linux-kernel, linux-trace-kernel, bpf On 3/4/26 10:08 PM, Yafang Shao wrote: > On Thu, Mar 5, 2026 at 11:00 AM Steven Rostedt <rostedt@goodmis.org> wrote: >> On Thu, 5 Mar 2026 10:33:00 +0800 >> Yafang Shao <laoar.shao@gmail.com> wrote: >> >>> Other tools may also read available_filter_functions, requiring each >>> one to be patched individually to avoid this flaw—a clearly >>> impractical solution. >> What exactly is the issue? > It makes no sense to spin unnecessarily when it can be avoided. We > continuously improve the kernel to do the right thing—and unnecessary > spinning is certainly not the right thing. > >> If a task does a while 1 in user space, it >> wouldn't be much different. > The while loop in user space performs actual work, whereas useless > spinning does nothing but burn CPU cycles. My point is simple: if this > unnecessary spinning isn't already considered an issue, it should > be—it's something that clearly needs improvement. The whole point of optimistic spinning is to reduce the lock acquisition latency. If the waiter sleeps, the unlock operation will have to wake up the waiter, which can have a variable latency depending on how busy the system is at the time. Yes, it is burning CPU cycles while spinning, but most workloads will gain performance with this optimistic spinning feature. 
You do have a point that for system monitoring tools that observe the system behavior, they shouldn't burn so much CPU time that they affect the performance of the real workloads the tools are monitoring. BTW, you should expand the commit log of patch 1 to include the rationale for why we should add this feature to mutex, as the information in the cover letter won't get included in the git log if this patch series is merged. You should also elaborate in a comment under what conditions this new mutex API should be used. Cheers, Longman ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [RFC PATCH 1/2] locking: add mutex_lock_nospin() 2026-03-05 4:30 ` Waiman Long @ 2026-03-05 5:40 ` Yafang Shao 2026-03-05 13:21 ` Steven Rostedt 2026-03-05 18:34 ` Waiman Long 2026-03-05 9:32 ` David Laight 1 sibling, 2 replies; 29+ messages in thread From: Yafang Shao @ 2026-03-05 5:40 UTC (permalink / raw) To: Waiman Long Cc: Steven Rostedt, David Laight, Peter Zijlstra, mingo, will, boqun, mhiramat, mark.rutland, mathieu.desnoyers, linux-kernel, linux-trace-kernel, bpf On Thu, Mar 5, 2026 at 12:30 PM Waiman Long <longman@redhat.com> wrote: > > On 3/4/26 10:08 PM, Yafang Shao wrote: > > On Thu, Mar 5, 2026 at 11:00 AM Steven Rostedt <rostedt@goodmis.org> wrote: > >> On Thu, 5 Mar 2026 10:33:00 +0800 > >> Yafang Shao <laoar.shao@gmail.com> wrote: > >> > >>> Other tools may also read available_filter_functions, requiring each > >>> one to be patched individually to avoid this flaw—a clearly > >>> impractical solution. > >> What exactly is the issue? > > It makes no sense to spin unnecessarily when it can be avoided. We > > continuously improve the kernel to do the right thing—and unnecessary > > spinning is certainly not the right thing. > > > >> If a task does a while 1 in user space, it > >> wouldn't be much different. > > The while loop in user space performs actual work, whereas useless > > spinning does nothing but burn CPU cycles. My point is simple: if this > > unnecessary spinning isn't already considered an issue, it should > > be—it's something that clearly needs improvement. > > The whole point of optimistic spinning is to reduce the lock acquisition > latency. If the waiter sleeps, the unlock operation will have to wake up > the waiter which can have a variable latency depending on how busy the > system is at the time. Yes, it is burning CPU cycles while spinning, > Most workloads will gain performance with this optimistic spinning > feature. 
You do have a point that for system monitoring tools that > observe the system behavior, they shouldn't burn that much CPU times > that affect performance of real workload that the tools are monitoring. Exactly. ftrace is intended for debugging and should not significantly impact real workloads. Therefore, it's reasonable to make it sleep if it cannot acquire the lock immediately, rather than spinning and consuming CPU cycles. > > BTW, you should expand the commit log of patch 1 to include the > rationale of why we should add this feature to mutex as the information > in the cover letter won't get included in the git log if this patch > series is merged. You should also elaborate in comment on under what > conditions should this this new mutex API be used. Sure. I will update it. BTW, these issues are notably hard to find. I suspect there are other locks out there with the same problem. -- Regards Yafang ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [RFC PATCH 1/2] locking: add mutex_lock_nospin() 2026-03-05 5:40 ` Yafang Shao @ 2026-03-05 13:21 ` Steven Rostedt 2026-03-06 2:22 ` Yafang Shao 2026-03-05 18:34 ` Waiman Long 1 sibling, 1 reply; 29+ messages in thread From: Steven Rostedt @ 2026-03-05 13:21 UTC (permalink / raw) To: Yafang Shao Cc: Waiman Long, David Laight, Peter Zijlstra, mingo, will, boqun, mhiramat, mark.rutland, mathieu.desnoyers, linux-kernel, linux-trace-kernel, bpf On Thu, 5 Mar 2026 13:40:27 +0800 Yafang Shao <laoar.shao@gmail.com> wrote: > Exactly. ftrace is intended for debugging and should not significantly > impact real workloads. Therefore, it's reasonable to make it sleep if > it cannot acquire the lock immediately, rather than spinning and > consuming CPU cycles. Actually, ftrace is more than just debugging. It is the infrastructure for live kernel patching as well. > > > > > BTW, you should expand the commit log of patch 1 to include the > > rationale of why we should add this feature to mutex as the information > > in the cover letter won't get included in the git log if this patch > > series is merged. You should also elaborate in comment on under what > > conditions should this this new mutex API be used. > > Sure. I will update it. > > BTW, these issues are notably hard to find. I suspect there are other > locks out there with the same problem. As I mentioned, I'm not against the change. I just want to make sure the rationale is strong enough to make the change. One thing that should be modified with your patch is the name. "nospin" references the implementation of the mutex. Instead it should be called something like: "noncritical" or "slowpath" stating that the grabbing of this mutex is not of a critical section. Maybe an entirely new interface should be defined: struct slow_mutex; slow_mutex_lock() slow_mutex_unlock() etc, that makes it obvious that this mutex may be held for long periods of time. 
In fact, this would be useful for RT workloads, as these mutexes could be flagged to warn RT critical tasks if those tasks were to take one of them. There has been some talk to mark paths in the kernel that RT tasks would get a SIGKILL if they were to hit a path that is known to be non deterministic. -- Steve ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [RFC PATCH 1/2] locking: add mutex_lock_nospin() 2026-03-05 13:21 ` Steven Rostedt @ 2026-03-06 2:22 ` Yafang Shao 2026-03-06 10:00 ` David Laight 0 siblings, 1 reply; 29+ messages in thread From: Yafang Shao @ 2026-03-06 2:22 UTC (permalink / raw) To: Steven Rostedt Cc: Waiman Long, David Laight, Peter Zijlstra, mingo, will, boqun, mhiramat, mark.rutland, mathieu.desnoyers, linux-kernel, linux-trace-kernel, bpf On Thu, Mar 5, 2026 at 9:20 PM Steven Rostedt <rostedt@goodmis.org> wrote: > > On Thu, 5 Mar 2026 13:40:27 +0800 > Yafang Shao <laoar.shao@gmail.com> wrote: > > > Exactly. ftrace is intended for debugging and should not significantly > > impact real workloads. Therefore, it's reasonable to make it sleep if > > it cannot acquire the lock immediately, rather than spinning and > > consuming CPU cycles. > > Actually, ftrace is more than just debugging. It is the infrastructure for > live kernel patching as well. good to know. > > > > > > > > > BTW, you should expand the commit log of patch 1 to include the > > > rationale of why we should add this feature to mutex as the information > > > in the cover letter won't get included in the git log if this patch > > > series is merged. You should also elaborate in comment on under what > > > conditions should this this new mutex API be used. > > > > Sure. I will update it. > > > > BTW, these issues are notably hard to find. I suspect there are other > > locks out there with the same problem. > > As I mentioned, I'm not against the change. I just want to make sure the > rationale is strong enough to make the change. > > One thing that should be modified with your patch is the name. "nospin" > references the implementation of the mutex. Instead it should be called > something like: "noncritical" or "slowpath" stating that the grabbing of > this mutex is not of a critical section. 
> > Maybe an entirely new interface should be defined: > > > struct slow_mutex; Is it necessary to define a new structure for this slow mutex? We could simply reuse the existing struct mutex instead. Alternatively, should we add some new flags to this slow_mutex for debugging purposes? > > slow_mutex_lock() > slow_mutex_unlock() These two APIs appear sufficient to handle this use case. > > etc, > > that makes it obvious that this mutex may be held for long periods of time. > In fact, this would be useful for RT workloads, as these mutexes could be > flagged to warn RT critical tasks if those tasks were to take one of them. > > There has been some talk to mark paths in the kernel that RT tasks would > get a SIGKILL if they were to hit a path that is known to be non > deterministic. Thanks for your information. -- Regards Yafang ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [RFC PATCH 1/2] locking: add mutex_lock_nospin() 2026-03-06 2:22 ` Yafang Shao @ 2026-03-06 10:00 ` David Laight 2026-03-09 2:34 ` Yafang Shao 0 siblings, 1 reply; 29+ messages in thread From: David Laight @ 2026-03-06 10:00 UTC (permalink / raw) To: Yafang Shao Cc: Steven Rostedt, Waiman Long, Peter Zijlstra, mingo, will, boqun, mhiramat, mark.rutland, mathieu.desnoyers, linux-kernel, linux-trace-kernel, bpf On Fri, 6 Mar 2026 10:22:11 +0800 Yafang Shao <laoar.shao@gmail.com> wrote: > On Thu, Mar 5, 2026 at 9:20 PM Steven Rostedt <rostedt@goodmis.org> wrote: > > > > On Thu, 5 Mar 2026 13:40:27 +0800 > > Yafang Shao <laoar.shao@gmail.com> wrote: > > > > > Exactly. ftrace is intended for debugging and should not significantly > > > impact real workloads. Therefore, it's reasonable to make it sleep if > > > it cannot acquire the lock immediately, rather than spinning and > > > consuming CPU cycles. > > > > Actually, ftrace is more than just debugging. It is the infrastructure for > > live kernel patching as well. > > good to know. > > > > > > > > > > > > > > BTW, you should expand the commit log of patch 1 to include the > > > > rationale of why we should add this feature to mutex as the information > > > > in the cover letter won't get included in the git log if this patch > > > > series is merged. You should also elaborate in comment on under what > > > > conditions should this this new mutex API be used. > > > > > > Sure. I will update it. > > > > > > BTW, these issues are notably hard to find. I suspect there are other > > > locks out there with the same problem. > > > > As I mentioned, I'm not against the change. I just want to make sure the > > rationale is strong enough to make the change. > > > > One thing that should be modified with your patch is the name. "nospin" > > references the implementation of the mutex. 
Instead it should be called > > something like: "noncritical" or "slowpath" stating that the grabbing of > > this mutex is not of a critical section. > > > > Maybe an entirely new interface should be defined: > > > > > > struct slow_mutex; > > Is it necessary to define a new structure for this slow mutex? We > > could simply reuse the existing struct mutex instead. Alternatively, > > should we add some new flags to this slow_mutex for debugging > > purposes? > > > > > > slow_mutex_lock() > > slow_mutex_unlock() > > These two APIs appear sufficient to handle this use case. Don't semaphores still exist? IIRC they always sleep. Although I wonder if the mutex needs to be held for as long as it is. ISTR one of the tracebacks was in the 'address to name' lookup, that code will be slow. Since the mutex can't be held across the multiple reads that are done to read the full list of tracepoints it must surely be possible to release it across the name lookup? David > > > > > etc, > > > > that makes it obvious that this mutex may be held for long periods of time. > > In fact, this would be useful for RT workloads, as these mutexes could be > > flagged to warn RT critical tasks if those tasks were to take one of them. > > > > There has been some talk to mark paths in the kernel that RT tasks would > > get a SIGKILL if they were to hit a path that is known to be non > > deterministic. > > Thanks for your information. > ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [RFC PATCH 1/2] locking: add mutex_lock_nospin() 2026-03-06 10:00 ` David Laight @ 2026-03-09 2:34 ` Yafang Shao 0 siblings, 0 replies; 29+ messages in thread From: Yafang Shao @ 2026-03-09 2:34 UTC (permalink / raw) To: David Laight, Steven Rostedt Cc: Waiman Long, Peter Zijlstra, mingo, will, boqun, mhiramat, mark.rutland, mathieu.desnoyers, linux-kernel, linux-trace-kernel, bpf On Fri, Mar 6, 2026 at 6:00 PM David Laight <david.laight.linux@gmail.com> wrote: > > On Fri, 6 Mar 2026 10:22:11 +0800 > Yafang Shao <laoar.shao@gmail.com> wrote: > > > On Thu, Mar 5, 2026 at 9:20 PM Steven Rostedt <rostedt@goodmis.org> wrote: > > > > > > On Thu, 5 Mar 2026 13:40:27 +0800 > > > Yafang Shao <laoar.shao@gmail.com> wrote: > > > > > > > Exactly. ftrace is intended for debugging and should not significantly > > > > impact real workloads. Therefore, it's reasonable to make it sleep if > > > > it cannot acquire the lock immediately, rather than spinning and > > > > consuming CPU cycles. > > > > > > Actually, ftrace is more than just debugging. It is the infrastructure for > > > live kernel patching as well. > > > > good to know. > > > > > > > > > > > > > > > > > > > BTW, you should expand the commit log of patch 1 to include the > > > > > rationale of why we should add this feature to mutex as the information > > > > > in the cover letter won't get included in the git log if this patch > > > > > series is merged. You should also elaborate in comment on under what > > > > > conditions should this this new mutex API be used. > > > > > > > > Sure. I will update it. > > > > > > > > BTW, these issues are notably hard to find. I suspect there are other > > > > locks out there with the same problem. > > > > > > As I mentioned, I'm not against the change. I just want to make sure the > > > rationale is strong enough to make the change. > > > > > > One thing that should be modified with your patch is the name. "nospin" > > > references the implementation of the mutex. 
Instead it should be called > > > something like: "noncritical" or "slowpath" stating that the grabbing of > > > this mutex is not of a critical section. > > > > > > Maybe an entirely new interface should be defined: > > > > > > > > > struct slow_mutex; > > > > Is it necessary to define a new structure for this slow mutex? We > > could simply reuse the existing struct mutex instead. Alternatively, > > should we add some new flags to this slow_mutex for debugging > > purposes? > > > > > > > > slow_mutex_lock() > > > slow_mutex_unlock() > > > > These two APIs appear sufficient to handle this use case. > > Don't semaphores still exist? While semaphores may present similar challenges, I'm not currently aware of specific instances that share this exact issue. Should we encounter any problematic semaphores in production workloads, we can address them at that time. > IIRC they always sleep. > > Although I wonder if the mutex need to be held for as long at it is. > ISTR one of the tracebacks was one the 'address to name' lookup, > that code will be slow. > Since the mutex can't be held across the multiple reads that are done > to read the full list of tracepoints it must surely be possible to > release it across the name lookup? That's a great point, though I'm not entirely certain at the moment. Perhaps Steven can provide further insight. -- Regards Yafang ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [RFC PATCH 1/2] locking: add mutex_lock_nospin() 2026-03-05 5:40 ` Yafang Shao 2026-03-05 13:21 ` Steven Rostedt @ 2026-03-05 18:34 ` Waiman Long 2026-03-05 18:44 ` Waiman Long 1 sibling, 1 reply; 29+ messages in thread From: Waiman Long @ 2026-03-05 18:34 UTC (permalink / raw) To: Yafang Shao Cc: Steven Rostedt, David Laight, Peter Zijlstra, mingo, will, boqun, mhiramat, mark.rutland, mathieu.desnoyers, linux-kernel, linux-trace-kernel, bpf On 3/5/26 12:40 AM, Yafang Shao wrote: > On Thu, Mar 5, 2026 at 12:30 PM Waiman Long <longman@redhat.com> wrote: >> On 3/4/26 10:08 PM, Yafang Shao wrote: >>> On Thu, Mar 5, 2026 at 11:00 AM Steven Rostedt <rostedt@goodmis.org> wrote: >>>> On Thu, 5 Mar 2026 10:33:00 +0800 >>>> Yafang Shao <laoar.shao@gmail.com> wrote: >>>> >>>>> Other tools may also read available_filter_functions, requiring each >>>>> one to be patched individually to avoid this flaw—a clearly >>>>> impractical solution. >>>> What exactly is the issue? >>> It makes no sense to spin unnecessarily when it can be avoided. We >>> continuously improve the kernel to do the right thing—and unnecessary >>> spinning is certainly not the right thing. >>> >>>> If a task does a while 1 in user space, it >>>> wouldn't be much different. >>> The while loop in user space performs actual work, whereas useless >>> spinning does nothing but burn CPU cycles. My point is simple: if this >>> unnecessary spinning isn't already considered an issue, it should >>> be—it's something that clearly needs improvement. >> The whole point of optimistic spinning is to reduce the lock acquisition >> latency. If the waiter sleeps, the unlock operation will have to wake up >> the waiter which can have a variable latency depending on how busy the >> system is at the time. Yes, it is burning CPU cycles while spinning, >> Most workloads will gain performance with this optimistic spinning >> feature. 
You do have a point that for system monitoring tools that >> observe the system behavior, they shouldn't burn that much CPU times >> that affect performance of real workload that the tools are monitoring. > Exactly. ftrace is intended for debugging and should not significantly > impact real workloads. Therefore, it's reasonable to make it sleep if > it cannot acquire the lock immediately, rather than spinning and > consuming CPU cycles. Your patch series uses wording that gives a negative connotation to optimistic spinning, making it look like a bad thing. In fact, it is just a request for a new mutex API for use cases where they can suffer higher latency in order to minimize the system overhead they incur. So don't bad-mouth optimistic spinning and emphasize the use cases you want to support with the new API in your next version. My 2 cents. Cheers, Longman ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [RFC PATCH 1/2] locking: add mutex_lock_nospin() 2026-03-05 18:34 ` Waiman Long @ 2026-03-05 18:44 ` Waiman Long 2026-03-06 2:27 ` Yafang Shao 0 siblings, 1 reply; 29+ messages in thread From: Waiman Long @ 2026-03-05 18:44 UTC (permalink / raw) To: Yafang Shao Cc: Steven Rostedt, David Laight, Peter Zijlstra, mingo, will, boqun, mhiramat, mark.rutland, mathieu.desnoyers, linux-kernel, linux-trace-kernel, bpf On 3/5/26 1:34 PM, Waiman Long wrote: > On 3/5/26 12:40 AM, Yafang Shao wrote: >> On Thu, Mar 5, 2026 at 12:30 PM Waiman Long <longman@redhat.com> wrote: >>> On 3/4/26 10:08 PM, Yafang Shao wrote: >>>> On Thu, Mar 5, 2026 at 11:00 AM Steven Rostedt >>>> <rostedt@goodmis.org> wrote: >>>>> On Thu, 5 Mar 2026 10:33:00 +0800 >>>>> Yafang Shao <laoar.shao@gmail.com> wrote: >>>>> >>>>>> Other tools may also read available_filter_functions, requiring each >>>>>> one to be patched individually to avoid this flaw—a clearly >>>>>> impractical solution. >>>>> What exactly is the issue? >>>> It makes no sense to spin unnecessarily when it can be avoided. We >>>> continuously improve the kernel to do the right thing—and unnecessary >>>> spinning is certainly not the right thing. >>>> >>>>> If a task does a while 1 in user space, it >>>>> wouldn't be much different. >>>> The while loop in user space performs actual work, whereas useless >>>> spinning does nothing but burn CPU cycles. My point is simple: if this >>>> unnecessary spinning isn't already considered an issue, it should >>>> be—it's something that clearly needs improvement. >>> The whole point of optimistic spinning is to reduce the lock >>> acquisition >>> latency. If the waiter sleeps, the unlock operation will have to >>> wake up >>> the waiter which can have a variable latency depending on how busy the >>> system is at the time. Yes, it is burning CPU cycles while spinning, >>> Most workloads will gain performance with this optimistic spinning >>> feature. 
You do have a point that for system monitoring tools that >>> observe the system behavior, they shouldn't burn that much CPU times >>> that affect performance of real workload that the tools are monitoring. >> Exactly. ftrace is intended for debugging and should not significantly >> impact real workloads. Therefore, it's reasonable to make it sleep if >> it cannot acquire the lock immediately, rather than spinning and >> consuming CPU cycles. > > Your patch series use wordings that give a negative connotation about > optimistic spinning making it look like a bad thing. In fact, it is > just a request for a new mutex API for use cases where they can suffer > higher latency in order to minimize the system overhead they incur. So > don't bad-mouth optimistic spinning and emphasize the use cases you > want to support with the new API in your next version. BTW, for any new mutex API introduced, you should also provide an equivalent version in kernel/locking/rtmutex_api.c for PREEMPT_RT kernel. Cheers, Longman ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [RFC PATCH 1/2] locking: add mutex_lock_nospin()
  2026-03-05 18:44 ` Waiman Long
@ 2026-03-06  2:27   ` Yafang Shao
  0 siblings, 0 replies; 29+ messages in thread
From: Yafang Shao @ 2026-03-06 2:27 UTC (permalink / raw)
To: Waiman Long
Cc: Steven Rostedt, David Laight, Peter Zijlstra, mingo, will, boqun,
    mhiramat, mark.rutland, mathieu.desnoyers, linux-kernel,
    linux-trace-kernel, bpf

On Fri, Mar 6, 2026 at 2:45 AM Waiman Long <longman@redhat.com> wrote:
>
> On 3/5/26 1:34 PM, Waiman Long wrote:
> > On 3/5/26 12:40 AM, Yafang Shao wrote:
> >> On Thu, Mar 5, 2026 at 12:30 PM Waiman Long <longman@redhat.com> wrote:
> >>> On 3/4/26 10:08 PM, Yafang Shao wrote:
> >>>> On Thu, Mar 5, 2026 at 11:00 AM Steven Rostedt
> >>>> <rostedt@goodmis.org> wrote:
> >>>>> On Thu, 5 Mar 2026 10:33:00 +0800
> >>>>> Yafang Shao <laoar.shao@gmail.com> wrote:
> >>>>>
> >>>>>> Other tools may also read available_filter_functions, requiring each
> >>>>>> one to be patched individually to avoid this flaw—a clearly
> >>>>>> impractical solution.
> >>>>> What exactly is the issue?
> >>>> It makes no sense to spin unnecessarily when it can be avoided. We
> >>>> continuously improve the kernel to do the right thing—and unnecessary
> >>>> spinning is certainly not the right thing.
> >>>>
> >>>>> If a task does a while 1 in user space, it
> >>>>> wouldn't be much different.
> >>>> The while loop in user space performs actual work, whereas useless
> >>>> spinning does nothing but burn CPU cycles. My point is simple: if this
> >>>> unnecessary spinning isn't already considered an issue, it should
> >>>> be—it's something that clearly needs improvement.
> >>> The whole point of optimistic spinning is to reduce the lock
> >>> acquisition
> >>> latency. If the waiter sleeps, the unlock operation will have to
> >>> wake up
> >>> the waiter which can have a variable latency depending on how busy the
> >>> system is at the time. Yes, it is burning CPU cycles while spinning,
> >>> Most workloads will gain performance with this optimistic spinning
> >>> feature. You do have a point that for system monitoring tools that
> >>> observe the system behavior, they shouldn't burn that much CPU times
> >>> that affect performance of real workload that the tools are monitoring.
> >> Exactly. ftrace is intended for debugging and should not significantly
> >> impact real workloads. Therefore, it's reasonable to make it sleep if
> >> it cannot acquire the lock immediately, rather than spinning and
> >> consuming CPU cycles.
> >
> > Your patch series use wordings that give a negative connotation about
> > optimistic spinning making it look like a bad thing.

Perhaps I didn't phrase that well. I do understand that optimistic
spinning is valuable in use cases where we wouldn't want to disable
CONFIG_MUTEX_SPIN_ON_OWNER.

> > In fact, it is
> > just a request for a new mutex API for use cases where they can suffer
> > higher latency in order to minimize the system overhead they incur. So
> > don't bad-mouth optimistic spinning and emphasize the use cases you
> > want to support with the new API in your next version.
>
> BTW, for any new mutex API introduced, you should also provide an
> equivalent version in kernel/locking/rtmutex_api.c for PREEMPT_RT kernel.

Thanks for the suggestion.

-- 
Regards
Yafang
* Re: [RFC PATCH 1/2] locking: add mutex_lock_nospin()
  2026-03-05  4:30 ` Waiman Long
  2026-03-05  5:40   ` Yafang Shao
@ 2026-03-05  9:32   ` David Laight
  2026-03-05 19:00     ` Waiman Long
  1 sibling, 1 reply; 29+ messages in thread
From: David Laight @ 2026-03-05 9:32 UTC (permalink / raw)
To: Waiman Long
Cc: Yafang Shao, Steven Rostedt, Peter Zijlstra, mingo, will, boqun,
    mhiramat, mark.rutland, mathieu.desnoyers, linux-kernel,
    linux-trace-kernel, bpf

On Wed, 4 Mar 2026 23:30:40 -0500
Waiman Long <longman@redhat.com> wrote:

> On 3/4/26 10:08 PM, Yafang Shao wrote:
> > On Thu, Mar 5, 2026 at 11:00 AM Steven Rostedt <rostedt@goodmis.org> wrote:
> >> On Thu, 5 Mar 2026 10:33:00 +0800
> >> Yafang Shao <laoar.shao@gmail.com> wrote:
> >>
> >>> Other tools may also read available_filter_functions, requiring each
> >>> one to be patched individually to avoid this flaw—a clearly
> >>> impractical solution.
> >> What exactly is the issue?
> > It makes no sense to spin unnecessarily when it can be avoided. We
> > continuously improve the kernel to do the right thing—and unnecessary
> > spinning is certainly not the right thing.
> >
> >> If a task does a while 1 in user space, it
> >> wouldn't be much different.
> > The while loop in user space performs actual work, whereas useless
> > spinning does nothing but burn CPU cycles. My point is simple: if this
> > unnecessary spinning isn't already considered an issue, it should
> > be—it's something that clearly needs improvement.
>
> The whole point of optimistic spinning is to reduce the lock acquisition
> latency. If the waiter sleeps, the unlock operation will have to wake up
> the waiter which can have a variable latency depending on how busy the
> system is at the time. Yes, it is burning CPU cycles while spinning,
> Most workloads will gain performance with this optimistic spinning
> feature. You do have a point that for system monitoring tools that
> observe the system behavior, they shouldn't burn that much CPU times
> that affect performance of real workload that the tools are monitoring.
>
> BTW, you should expand the commit log of patch 1 to include the
> rationale of why we should add this feature to mutex as the information
> in the cover letter won't get included in the git log if this patch
> series is merged. You should also elaborate in comment on under what
> conditions should this new mutex API be used.

Isn't changing mutex_lock() the wrong place anyway?
What you need is for the code holding the lock to indicate that
it isn't worth waiters spinning because the lock will be held
for a long time.

	David

>
> Cheers,
> Longman
>
* Re: [RFC PATCH 1/2] locking: add mutex_lock_nospin()
  2026-03-05  9:32 ` David Laight
@ 2026-03-05 19:00   ` Waiman Long
  2026-03-06  2:33     ` Yafang Shao
  0 siblings, 1 reply; 29+ messages in thread
From: Waiman Long @ 2026-03-05 19:00 UTC (permalink / raw)
To: David Laight
Cc: Yafang Shao, Steven Rostedt, Peter Zijlstra, mingo, will, boqun,
    mhiramat, mark.rutland, mathieu.desnoyers, linux-kernel,
    linux-trace-kernel, bpf

On 3/5/26 4:32 AM, David Laight wrote:
> On Wed, 4 Mar 2026 23:30:40 -0500
> Waiman Long <longman@redhat.com> wrote:
>
>> On 3/4/26 10:08 PM, Yafang Shao wrote:
>>> On Thu, Mar 5, 2026 at 11:00 AM Steven Rostedt <rostedt@goodmis.org> wrote:
>>>> On Thu, 5 Mar 2026 10:33:00 +0800
>>>> Yafang Shao <laoar.shao@gmail.com> wrote:
>>>>
>>>>> Other tools may also read available_filter_functions, requiring each
>>>>> one to be patched individually to avoid this flaw—a clearly
>>>>> impractical solution.
>>>> What exactly is the issue?
>>> It makes no sense to spin unnecessarily when it can be avoided. We
>>> continuously improve the kernel to do the right thing—and unnecessary
>>> spinning is certainly not the right thing.
>>>
>>>> If a task does a while 1 in user space, it
>>>> wouldn't be much different.
>>> The while loop in user space performs actual work, whereas useless
>>> spinning does nothing but burn CPU cycles. My point is simple: if this
>>> unnecessary spinning isn't already considered an issue, it should
>>> be—it's something that clearly needs improvement.
>> The whole point of optimistic spinning is to reduce the lock acquisition
>> latency. If the waiter sleeps, the unlock operation will have to wake up
>> the waiter which can have a variable latency depending on how busy the
>> system is at the time. Yes, it is burning CPU cycles while spinning,
>> Most workloads will gain performance with this optimistic spinning
>> feature. You do have a point that for system monitoring tools that
>> observe the system behavior, they shouldn't burn that much CPU times
>> that affect performance of real workload that the tools are monitoring.
>>
>> BTW, you should expand the commit log of patch 1 to include the
>> rationale of why we should add this feature to mutex as the information
>> in the cover letter won't get included in the git log if this patch
>> series is merged. You should also elaborate in comment on under what
>> conditions should this new mutex API be used.
> Isn't changing mutex_lock() the wrong place anyway?
> What you need is for the code holding the lock to indicate that
> it isn't worth waiters spinning because the lock will be held
> for a long time.

I have actually thought about having a flag somewhere in the mutex
itself to indicate that optimistic spinning isn't needed. However the
owner field is running out of usable flag bits. The other option is to
add it to osq as it doesn't really need to use the full 32 bits for the
tail. In this case, we can just initialize the mutex to say that we
don't need optimistic spinning and no new mutex_lock() API will be needed.

Cheers,
Longman
* Re: [RFC PATCH 1/2] locking: add mutex_lock_nospin()
  2026-03-05 19:00 ` Waiman Long
@ 2026-03-06  2:33   ` Yafang Shao
  0 siblings, 0 replies; 29+ messages in thread
From: Yafang Shao @ 2026-03-06 2:33 UTC (permalink / raw)
To: Waiman Long
Cc: David Laight, Steven Rostedt, Peter Zijlstra, mingo, will, boqun,
    mhiramat, mark.rutland, mathieu.desnoyers, linux-kernel,
    linux-trace-kernel, bpf

On Fri, Mar 6, 2026 at 3:00 AM Waiman Long <longman@redhat.com> wrote:
>
> On 3/5/26 4:32 AM, David Laight wrote:
> > On Wed, 4 Mar 2026 23:30:40 -0500
> > Waiman Long <longman@redhat.com> wrote:
> >
> >> On 3/4/26 10:08 PM, Yafang Shao wrote:
> >>> On Thu, Mar 5, 2026 at 11:00 AM Steven Rostedt <rostedt@goodmis.org> wrote:
> >>>> On Thu, 5 Mar 2026 10:33:00 +0800
> >>>> Yafang Shao <laoar.shao@gmail.com> wrote:
> >>>>
> >>>>> Other tools may also read available_filter_functions, requiring each
> >>>>> one to be patched individually to avoid this flaw—a clearly
> >>>>> impractical solution.
> >>>> What exactly is the issue?
> >>> It makes no sense to spin unnecessarily when it can be avoided. We
> >>> continuously improve the kernel to do the right thing—and unnecessary
> >>> spinning is certainly not the right thing.
> >>>
> >>>> If a task does a while 1 in user space, it
> >>>> wouldn't be much different.
> >>> The while loop in user space performs actual work, whereas useless
> >>> spinning does nothing but burn CPU cycles. My point is simple: if this
> >>> unnecessary spinning isn't already considered an issue, it should
> >>> be—it's something that clearly needs improvement.
> >> The whole point of optimistic spinning is to reduce the lock acquisition
> >> latency. If the waiter sleeps, the unlock operation will have to wake up
> >> the waiter which can have a variable latency depending on how busy the
> >> system is at the time. Yes, it is burning CPU cycles while spinning,
> >> Most workloads will gain performance with this optimistic spinning
> >> feature. You do have a point that for system monitoring tools that
> >> observe the system behavior, they shouldn't burn that much CPU times
> >> that affect performance of real workload that the tools are monitoring.
> >>
> >> BTW, you should expand the commit log of patch 1 to include the
> >> rationale of why we should add this feature to mutex as the information
> >> in the cover letter won't get included in the git log if this patch
> >> series is merged. You should also elaborate in comment on under what
> >> conditions should this new mutex API be used.
> > Isn't changing mutex_lock() the wrong place anyway?
> > What you need is for the code holding the lock to indicate that
> > it isn't worth waiters spinning because the lock will be held
> > for a long time.
>
> I have actually thought about having a flag somewhere in the mutex
> itself to indicate that optimistic spinning isn't needed. However the
> owner field is running out of usable flag bits.

True. Introducing a new MUTEX_FLAGS would likely require a substantial
refactor of the mutex code, which may not be worth it.

> The other option is to
> add it to osq as it doesn't really need to use the full 32 bits for the
> tail. In this case, we can just initialize the mutex to say that we
> don't need optimistic spinning and no new mutex_lock() API will be needed.

I believe a new mutex_lock() variant would be clearer and easier to
understand. It also has the advantage of requiring minimal changes to
the existing mutex code.

-- 
Regards
Yafang
* [RFC PATCH 2/2] ftrace: disable optimistic spinning for ftrace_lock
  2026-03-04  7:46 [RFC PATCH 0/2] disable optimistic spinning for ftrace_lock Yafang Shao
  2026-03-04  7:46 ` [RFC PATCH 1/2] locking: add mutex_lock_nospin() Yafang Shao
@ 2026-03-04  7:46 ` Yafang Shao
  1 sibling, 0 replies; 29+ messages in thread
From: Yafang Shao @ 2026-03-04 7:46 UTC (permalink / raw)
To: peterz, mingo, will, boqun, longman, rostedt, mhiramat,
    mark.rutland, mathieu.desnoyers
Cc: linux-kernel, linux-trace-kernel, bpf, Yafang Shao

mutex_lock_nospin() is used for ftrace_lock to selectively disable
optimistic spinning.

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
 kernel/trace/ftrace.c | 52 +++++++++++++++++++++----------------------
 1 file changed, 26 insertions(+), 26 deletions(-)

diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 827fb9a0bf0d..b8cca4f76118 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -1284,7 +1284,7 @@ static void clear_ftrace_mod_list(struct list_head *head)
 	if (!head)
 		return;
 
-	mutex_lock(&ftrace_lock);
+	mutex_lock_nospin(&ftrace_lock);
 	list_for_each_entry_safe(p, n, head, list)
 		free_ftrace_mod(p);
 	mutex_unlock(&ftrace_lock);
@@ -4254,7 +4254,7 @@ static void *t_start(struct seq_file *m, loff_t *pos)
 	void *p = NULL;
 	loff_t l;
 
-	mutex_lock(&ftrace_lock);
+	mutex_lock_nospin(&ftrace_lock);
 
 	if (unlikely(ftrace_disabled))
 		return NULL;
@@ -4362,7 +4362,7 @@ static __init void ftrace_check_work_func(struct work_struct *work)
 	struct ftrace_page *pg;
 	struct dyn_ftrace *rec;
 
-	mutex_lock(&ftrace_lock);
+	mutex_lock_nospin(&ftrace_lock);
 	do_for_each_ftrace_rec(pg, rec) {
 		test_for_valid_rec(rec);
 	} while_for_each_ftrace_rec();
@@ -5123,7 +5123,7 @@ static void process_mod_list(struct list_head *head, struct ftrace_ops *ops,
 	if (!new_hash)
 		goto out; /* warn? */
 
-	mutex_lock(&ftrace_lock);
+	mutex_lock_nospin(&ftrace_lock);
 
 	list_for_each_entry_safe(ftrace_mod, n, head, list) {
@@ -5159,7 +5159,7 @@ static void process_mod_list(struct list_head *head, struct ftrace_ops *ops,
 	if (enable && list_empty(head))
 		new_hash->flags &= ~FTRACE_HASH_FL_MOD;
 
-	mutex_lock(&ftrace_lock);
+	mutex_lock_nospin(&ftrace_lock);
 	ftrace_hash_move_and_update_ops(ops, orig_hash, new_hash, enable);
@@ -5465,7 +5465,7 @@ register_ftrace_function_probe(char *glob, struct trace_array *tr,
 		return -EINVAL;
 
-	mutex_lock(&ftrace_lock);
+	mutex_lock_nospin(&ftrace_lock);
 	/* Check if the probe_ops is already registered */
 	list_for_each_entry(iter, &tr->func_probes, list) {
 		if (iter->probe_ops == probe_ops) {
@@ -5540,7 +5540,7 @@ register_ftrace_function_probe(char *glob, struct trace_array *tr,
 		}
 	}
 
-	mutex_lock(&ftrace_lock);
+	mutex_lock_nospin(&ftrace_lock);
 
 	if (!count) {
 		/* Nothing was added? */
@@ -5619,7 +5619,7 @@ unregister_ftrace_function_probe_func(char *glob, struct trace_array *tr,
 			return -EINVAL;
 	}
 
-	mutex_lock(&ftrace_lock);
+	mutex_lock_nospin(&ftrace_lock);
 	/* Check if the probe_ops is already registered */
 	list_for_each_entry(iter, &tr->func_probes, list) {
 		if (iter->probe_ops == probe_ops) {
@@ -5679,7 +5679,7 @@ unregister_ftrace_function_probe_func(char *glob, struct trace_array *tr,
 		goto out_unlock;
 	}
 
-	mutex_lock(&ftrace_lock);
+	mutex_lock_nospin(&ftrace_lock);
 
 	WARN_ON(probe->ref < count);
@@ -5943,7 +5943,7 @@ ftrace_set_hash(struct ftrace_ops *ops, unsigned char *buf, int len,
 		goto out_regex_unlock;
 	}
 
-	mutex_lock(&ftrace_lock);
+	mutex_lock_nospin(&ftrace_lock);
 	ret = ftrace_hash_move_and_update_ops(ops, orig_hash, hash, enable);
 	mutex_unlock(&ftrace_lock);
@@ -6205,7 +6205,7 @@ __modify_ftrace_direct(struct ftrace_ops *ops, unsigned long addr)
 	 * Now the ftrace_ops_list_func() is called to do the direct callers.
 	 * We can safely change the direct functions attached to each entry.
 	 */
-	mutex_lock(&ftrace_lock);
+	mutex_lock_nospin(&ftrace_lock);
 
 	size = 1 << hash->size_bits;
 	for (i = 0; i < size; i++) {
@@ -6625,7 +6625,7 @@ int update_ftrace_direct_mod(struct ftrace_ops *ops, struct ftrace_hash *hash, b
 	 * Now the ftrace_ops_list_func() is called to do the direct callers.
 	 * We can safely change the direct functions attached to each entry.
 	 */
-	mutex_lock(&ftrace_lock);
+	mutex_lock_nospin(&ftrace_lock);
 
 	size = 1 << hash->size_bits;
 	for (i = 0; i < size; i++) {
@@ -6980,7 +6980,7 @@ int ftrace_regex_release(struct inode *inode, struct file *file)
 		} else
 			orig_hash = &iter->ops->func_hash->notrace_hash;
 
-		mutex_lock(&ftrace_lock);
+		mutex_lock_nospin(&ftrace_lock);
 		ftrace_hash_move_and_update_ops(iter->ops, orig_hash,
 						iter->hash, filter_hash);
 		mutex_unlock(&ftrace_lock);
@@ -7464,7 +7464,7 @@ void ftrace_create_filter_files(struct ftrace_ops *ops,
  */
 void ftrace_destroy_filter_files(struct ftrace_ops *ops)
 {
-	mutex_lock(&ftrace_lock);
+	mutex_lock_nospin(&ftrace_lock);
 	if (ops->flags & FTRACE_OPS_FL_ENABLED)
 		ftrace_shutdown(ops, 0);
 	ops->flags |= FTRACE_OPS_FL_DELETED;
@@ -7571,7 +7571,7 @@ static int ftrace_process_locs(struct module *mod,
 	if (!start_pg)
 		return -ENOMEM;
 
-	mutex_lock(&ftrace_lock);
+	mutex_lock_nospin(&ftrace_lock);
 
 	/*
 	 * Core and each module needs their own pages, as
@@ -7868,7 +7868,7 @@ void ftrace_release_mod(struct module *mod)
 	struct ftrace_page *tmp_page = NULL;
 	struct ftrace_page *pg;
 
-	mutex_lock(&ftrace_lock);
+	mutex_lock_nospin(&ftrace_lock);
 
 	/*
	 * To avoid the UAF problem after the module is unloaded, the
@@ -7938,7 +7938,7 @@ void ftrace_module_enable(struct module *mod)
 	struct dyn_ftrace *rec;
 	struct ftrace_page *pg;
 
-	mutex_lock(&ftrace_lock);
+	mutex_lock_nospin(&ftrace_lock);
 
 	if (ftrace_disabled)
 		goto out_unlock;
@@ -8267,7 +8267,7 @@ void ftrace_free_mem(struct module *mod, void *start_ptr, void *end_ptr)
 	key.ip = start;
 	key.flags = end;	/* overload flags, as it is unsigned long */
 
-	mutex_lock(&ftrace_lock);
+	mutex_lock_nospin(&ftrace_lock);
 
 	/*
 	 * If we are freeing module init memory, then check if
@@ -8686,7 +8686,7 @@ static void clear_ftrace_pids(struct trace_array *tr, int type)
 
 void ftrace_clear_pids(struct trace_array *tr)
 {
-	mutex_lock(&ftrace_lock);
+	mutex_lock_nospin(&ftrace_lock);
 
 	clear_ftrace_pids(tr, TRACE_PIDS | TRACE_NO_PIDS);
@@ -8695,7 +8695,7 @@ void ftrace_clear_pids(struct trace_array *tr)
 
 static void ftrace_pid_reset(struct trace_array *tr, int type)
 {
-	mutex_lock(&ftrace_lock);
+	mutex_lock_nospin(&ftrace_lock);
 	clear_ftrace_pids(tr, type);
 	ftrace_update_pid_func();
@@ -8713,7 +8713,7 @@ static void *fpid_start(struct seq_file *m, loff_t *pos)
 	struct trace_pid_list *pid_list;
 	struct trace_array *tr = m->private;
 
-	mutex_lock(&ftrace_lock);
+	mutex_lock_nospin(&ftrace_lock);
 	rcu_read_lock_sched();
 
 	pid_list = rcu_dereference_sched(tr->function_pids);
@@ -8766,7 +8766,7 @@ static void *fnpid_start(struct seq_file *m, loff_t *pos)
 	struct trace_pid_list *pid_list;
 	struct trace_array *tr = m->private;
 
-	mutex_lock(&ftrace_lock);
+	mutex_lock_nospin(&ftrace_lock);
 	rcu_read_lock_sched();
 
 	pid_list = rcu_dereference_sched(tr->function_no_pids);
@@ -9057,7 +9057,7 @@ static int prepare_direct_functions_for_ipmodify(struct ftrace_ops *ops)
 		unsigned long ip = entry->ip;
 		bool found_op = false;
 
-		mutex_lock(&ftrace_lock);
+		mutex_lock_nospin(&ftrace_lock);
 		do_for_each_ftrace_op(op, ftrace_ops_list) {
 			if (!(op->flags & FTRACE_OPS_FL_DIRECT))
 				continue;
@@ -9106,7 +9106,7 @@ static void cleanup_direct_functions_after_ipmodify(struct ftrace_ops *ops)
 		unsigned long ip = entry->ip;
 		bool found_op = false;
 
-		mutex_lock(&ftrace_lock);
+		mutex_lock_nospin(&ftrace_lock);
 		do_for_each_ftrace_op(op, ftrace_ops_list) {
 			if (!(op->flags & FTRACE_OPS_FL_DIRECT))
 				continue;
@@ -9153,7 +9153,7 @@ static int register_ftrace_function_nolock(struct ftrace_ops *ops)
 	ftrace_ops_init(ops);
 
-	mutex_lock(&ftrace_lock);
+	mutex_lock_nospin(&ftrace_lock);
 
 	ret = ftrace_startup(ops, 0);
@@ -9200,7 +9200,7 @@ int unregister_ftrace_function(struct ftrace_ops *ops)
 {
 	int ret;
 
-	mutex_lock(&ftrace_lock);
+	mutex_lock_nospin(&ftrace_lock);
 	ret = ftrace_shutdown(ops, 0);
 	mutex_unlock(&ftrace_lock);
-- 
2.47.3
end of thread, other threads:[~2026-03-09 2:34 UTC | newest]

Thread overview: 29+ messages
2026-03-04  7:46 [RFC PATCH 0/2] disable optimistic spinning for ftrace_lock Yafang Shao
2026-03-04  7:46 ` [RFC PATCH 1/2] locking: add mutex_lock_nospin() Yafang Shao
2026-03-04  9:02   ` Peter Zijlstra
2026-03-04  9:37     ` Yafang Shao
2026-03-04 10:11       ` Peter Zijlstra
2026-03-04 11:52         ` Yafang Shao
2026-03-04 12:41           ` Peter Zijlstra
2026-03-04 14:25             ` Yafang Shao
2026-03-04  9:54   ` David Laight
2026-03-04 20:57     ` Steven Rostedt
2026-03-04 21:44       ` David Laight
2026-03-05  2:17         ` Yafang Shao
2026-03-05  2:28           ` Steven Rostedt
2026-03-05  2:33             ` Yafang Shao
2026-03-05  3:00               ` Steven Rostedt
2026-03-05  3:08                 ` Yafang Shao
2026-03-05  4:30                   ` Waiman Long
2026-03-05  5:40                     ` Yafang Shao
2026-03-05 13:21                       ` Steven Rostedt
2026-03-06  2:22                         ` Yafang Shao
2026-03-06 10:00                           ` David Laight
2026-03-09  2:34                             ` Yafang Shao
2026-03-05 18:34                     ` Waiman Long
2026-03-05 18:44                       ` Waiman Long
2026-03-06  2:27                         ` Yafang Shao
2026-03-05  9:32                   ` David Laight
2026-03-05 19:00                     ` Waiman Long
2026-03-06  2:33                       ` Yafang Shao
2026-03-04  7:46 ` [RFC PATCH 2/2] ftrace: disable optimistic spinning for ftrace_lock Yafang Shao