* [RFC PATCH 0/2] disable optimistic spinning for ftrace_lock
@ 2026-03-04 7:46 Yafang Shao
2026-03-04 7:46 ` [RFC PATCH 1/2] locking: add mutex_lock_nospin() Yafang Shao
2026-03-04 7:46 ` [RFC PATCH 2/2] ftrace: disable optimistic spinning for ftrace_lock Yafang Shao
0 siblings, 2 replies; 29+ messages in thread
From: Yafang Shao @ 2026-03-04 7:46 UTC (permalink / raw)
To: peterz, mingo, will, boqun, longman, rostedt, mhiramat,
mark.rutland, mathieu.desnoyers
Cc: linux-kernel, linux-trace-kernel, bpf, Yafang Shao
Background
==========
One of our latency-sensitive services reported random CPU %sys spikes.
After a thorough investigation, we identified the root cause. The key
kernel stacks are as follows:
- Task A
2026-02-14-16:53:40.938243: [CPU198] 2156302(bpftrace) cgrp:4019437 pod:4019253
find_kallsyms_symbol+142
module_address_lookup+104
kallsyms_lookup_buildid+203
kallsyms_lookup+20
print_rec+64
t_show+67
seq_read_iter+709
seq_read+165
vfs_read+165
ksys_read+103
__x64_sys_read+25
do_syscall_64+56
entry_SYSCALL_64_after_hwframe+100
This task (2156302, bpftrace) is reading the
/sys/kernel/debug/tracing/available_filter_functions file to check whether
a function is traceable:
https://github.com/bpftrace/bpftrace/blob/master/src/tracefs/tracefs.h#L21
Reading the available_filter_functions file is time-consuming, as it
contains tens of thousands of functions:
$ cat /sys/kernel/debug/tracing/available_filter_functions | wc -l
61081
$ time cat /sys/kernel/debug/tracing/available_filter_functions > /dev/null
real 0m0.452s
user 0m0.000s
sys 0m0.452s
Consequently, the ftrace_lock is held by this task for an extended period.
- Other Tasks
2026-02-14-16:53:41.437094: [CPU79] 2156308(bpftrace) cgrp:4019437 pod:4019253
mutex_spin_on_owner+108
__mutex_lock.constprop.0+1132
__mutex_lock_slowpath+19
mutex_lock+56
t_start+51
seq_read_iter+250
seq_read+165
vfs_read+165
ksys_read+103
__x64_sys_read+25
do_syscall_64+56
entry_SYSCALL_64_after_hwframe+100
Since ftrace_lock is held by Task-A and Task-A is actively running on a
CPU, all other tasks waiting for the same lock will spin on their
respective CPUs. This leads to increased CPU pressure.
Reproduction
============
This issue can be reproduced simply by running
`cat available_filter_functions`.
- Single process reading available_filter_functions:
$ time cat /sys/kernel/tracing/available_filter_functions > /dev/null
real 0m0.452s
user 0m0.001s
sys 0m0.451s
- Six processes reading available_filter_functions simultaneously:
for i in `seq 0 5`; do
time cat /sys/kernel/tracing/available_filter_functions > /dev/null &
done
The results are as follows:
real 0m1.801s
user 0m0.000s
sys 0m1.779s
real 0m1.804s
user 0m0.001s
sys 0m1.791s
real 0m1.805s
user 0m0.000s
sys 0m1.792s
real 0m1.806s
user 0m0.001s
sys 0m1.796s
As more processes are added, the system time increases correspondingly.
Solution
========
One approach is to optimize the reading of available_filter_functions to
make it as fast as possible. However, even then the underlying risk
remains: contention caused by optimistic spinning on the mutex.
Therefore, we need to consider an alternative solution that avoids
optimistic spinning for heavy mutexes that may be held for long durations.
Note that we do not want to disable CONFIG_MUTEX_SPIN_ON_OWNER entirely, as
that could lead to unexpected performance regressions.
In this series, a new wrapper, mutex_lock_nospin(), is introduced and
applied to ftrace_lock to selectively disable optimistic spinning.
Yafang Shao (2):
locking: add mutex_lock_nospin()
ftrace: disable optimistic spinning for ftrace_lock
include/linux/mutex.h | 3 +++
kernel/locking/mutex.c | 39 +++++++++++++++++++++++++------
kernel/trace/ftrace.c | 52 +++++++++++++++++++++---------------------
3 files changed, 61 insertions(+), 33 deletions(-)
--
2.47.3
^ permalink raw reply	[flat|nested] 29+ messages in thread

* [RFC PATCH 1/2] locking: add mutex_lock_nospin()
  2026-03-04  7:46 [RFC PATCH 0/2] disable optimistic spinning for ftrace_lock Yafang Shao
@ 2026-03-04  7:46 ` Yafang Shao
  2026-03-04  9:02   ` Peter Zijlstra
  2026-03-04  7:46 ` [RFC PATCH 2/2] ftrace: disable optimistic spinning for ftrace_lock Yafang Shao
  1 sibling, 1 reply; 29+ messages in thread
From: Yafang Shao @ 2026-03-04  7:46 UTC (permalink / raw)
  To: peterz, mingo, will, boqun, longman, rostedt, mhiramat,
	mark.rutland, mathieu.desnoyers
  Cc: linux-kernel, linux-trace-kernel, bpf, Yafang Shao

Introduce mutex_lock_nospin(), a helper that disables optimistic spinning
on the owner for specific heavy locks. This prevents long spinning times
that can lead to latency spikes for other tasks on the same runqueue.

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
 include/linux/mutex.h  |  3 +++
 kernel/locking/mutex.c | 39 ++++++++++++++++++++++++++++++++-------
 2 files changed, 35 insertions(+), 7 deletions(-)

diff --git a/include/linux/mutex.h b/include/linux/mutex.h
index ecaa0440f6ec..1e488bb24b57 100644
--- a/include/linux/mutex.h
+++ b/include/linux/mutex.h
@@ -189,11 +189,13 @@ extern int __must_check mutex_lock_interruptible_nested(struct mutex *lock,
 extern int __must_check _mutex_lock_killable(struct mutex *lock,
 		unsigned int subclass, struct lockdep_map *nest_lock) __cond_acquires(0, lock);
 extern void mutex_lock_io_nested(struct mutex *lock, unsigned int subclass) __acquires(lock);
+extern void mutex_lock_nospin_nested(struct mutex *lock, unsigned int subclass);
 
 #define mutex_lock(lock) mutex_lock_nested(lock, 0)
 #define mutex_lock_interruptible(lock) mutex_lock_interruptible_nested(lock, 0)
 #define mutex_lock_killable(lock) _mutex_lock_killable(lock, 0, NULL)
 #define mutex_lock_io(lock) mutex_lock_io_nested(lock, 0)
+#define mutex_lock_nospin(lock) mutex_lock_nospin_nested(lock, 0)
 
 #define mutex_lock_nest_lock(lock, nest_lock)				\
 do {									\
@@ -215,6 +217,7 @@ extern void mutex_lock(struct mutex *lock) __acquires(lock);
 extern int __must_check mutex_lock_interruptible(struct mutex *lock) __cond_acquires(0, lock);
 extern int __must_check mutex_lock_killable(struct mutex *lock) __cond_acquires(0, lock);
 extern void mutex_lock_io(struct mutex *lock) __acquires(lock);
+extern void mutex_lock_nospin(struct mutex *lock) __acquires(lock);
 
 # define mutex_lock_nested(lock, subclass) mutex_lock(lock)
 # define mutex_lock_interruptible_nested(lock, subclass) mutex_lock_interruptible(lock)
diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 2a1d165b3167..03d3b0749882 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -290,6 +290,14 @@ void __sched mutex_lock(struct mutex *lock)
 		__mutex_lock_slowpath(lock);
 }
 EXPORT_SYMBOL(mutex_lock);
+
+void __sched mutex_lock_nospin(struct mutex *lock)
+{
+	might_sleep();
+
+	if (!__mutex_trylock_fast(lock))
+		__mutex_lock_nospin(lock);
+}
 #endif
 
 #include "ww_mutex.h"
@@ -443,8 +451,11 @@ static inline int mutex_can_spin_on_owner(struct mutex *lock)
  */
 static __always_inline bool
 mutex_optimistic_spin(struct mutex *lock, struct ww_acquire_ctx *ww_ctx,
-		      struct mutex_waiter *waiter)
+		      struct mutex_waiter *waiter, const bool nospin)
 {
+	if (nospin)
+		return false;
+
 	if (!waiter) {
 		/*
 		 * The purpose of the mutex_can_spin_on_owner() function is
@@ -577,7 +588,8 @@ EXPORT_SYMBOL(ww_mutex_unlock);
 static __always_inline int __sched
 __mutex_lock_common(struct mutex *lock, unsigned int state, unsigned int subclass,
 		    struct lockdep_map *nest_lock, unsigned long ip,
-		    struct ww_acquire_ctx *ww_ctx, const bool use_ww_ctx)
+		    struct ww_acquire_ctx *ww_ctx, const bool use_ww_ctx,
+		    const bool nospin)
 {
 	DEFINE_WAKE_Q(wake_q);
 	struct mutex_waiter waiter;
@@ -615,7 +627,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int state, unsigned int subclas
 	trace_contention_begin(lock, LCB_F_MUTEX | LCB_F_SPIN);
 
 	if (__mutex_trylock(lock) ||
-	    mutex_optimistic_spin(lock, ww_ctx, NULL)) {
+	    mutex_optimistic_spin(lock, ww_ctx, NULL, nospin)) {
 		/* got the lock, yay! */
 		lock_acquired(&lock->dep_map, ip);
 		if (ww_ctx)
@@ -716,7 +728,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int state, unsigned int subclas
 		 * to run.
 		 */
 		clear_task_blocked_on(current, lock);
-		if (mutex_optimistic_spin(lock, ww_ctx, &waiter))
+		if (mutex_optimistic_spin(lock, ww_ctx, &waiter, nospin))
 			break;
 		set_task_blocked_on(current, lock);
 		trace_contention_begin(lock, LCB_F_MUTEX);
@@ -773,14 +785,21 @@ static int __sched
 __mutex_lock(struct mutex *lock, unsigned int state, unsigned int subclass,
 	     struct lockdep_map *nest_lock, unsigned long ip)
 {
-	return __mutex_lock_common(lock, state, subclass, nest_lock, ip, NULL, false);
+	return __mutex_lock_common(lock, state, subclass, nest_lock, ip, NULL, false, false);
+}
+
+static int __sched
+__mutex_lock_nospin(struct mutex *lock, unsigned int state, unsigned int subclass,
+		    struct lockdep_map *nest_lock, unsigned long ip)
+{
+	return __mutex_lock_common(lock, state, subclass, nest_lock, ip, NULL, false, true);
 }
 
 static int __sched
 __ww_mutex_lock(struct mutex *lock, unsigned int state, unsigned int subclass,
 		unsigned long ip, struct ww_acquire_ctx *ww_ctx)
 {
-	return __mutex_lock_common(lock, state, subclass, NULL, ip, ww_ctx, true);
+	return __mutex_lock_common(lock, state, subclass, NULL, ip, ww_ctx, true, false);
 }
 
 /**
@@ -861,11 +880,17 @@ mutex_lock_io_nested(struct mutex *lock, unsigned int subclass)
 
 	token = io_schedule_prepare();
 	__mutex_lock_common(lock, TASK_UNINTERRUPTIBLE,
-			    subclass, NULL, _RET_IP_, NULL, 0);
+			    subclass, NULL, _RET_IP_, NULL, 0, false);
 	io_schedule_finish(token);
 }
 EXPORT_SYMBOL_GPL(mutex_lock_io_nested);
 
+void __sched
+mutex_lock_nospin_nested(struct mutex *lock, unsigned int subclass)
+{
+	__mutex_lock_nospin(lock, TASK_UNINTERRUPTIBLE, subclass, NULL, _RET_IP_);
+}
+
 static inline int
 ww_mutex_deadlock_injection(struct ww_mutex *lock, struct ww_acquire_ctx *ctx)
 {
-- 
2.47.3

^ permalink raw reply related	[flat|nested] 29+ messages in thread
* Re: [RFC PATCH 1/2] locking: add mutex_lock_nospin()
  2026-03-04  7:46 ` [RFC PATCH 1/2] locking: add mutex_lock_nospin() Yafang Shao
@ 2026-03-04  9:02   ` Peter Zijlstra
  2026-03-04  9:37     ` Yafang Shao
  2026-03-04  9:54     ` David Laight
  0 siblings, 2 replies; 29+ messages in thread
From: Peter Zijlstra @ 2026-03-04  9:02 UTC (permalink / raw)
  To: Yafang Shao
  Cc: mingo, will, boqun, longman, rostedt, mhiramat, mark.rutland,
	mathieu.desnoyers, linux-kernel, linux-trace-kernel, bpf

On Wed, Mar 04, 2026 at 03:46:49PM +0800, Yafang Shao wrote:
> Introduce mutex_lock_nospin(), a helper that disables optimistic spinning
> on the owner for specific heavy locks. This prevents long spinning times
> that can lead to latency spikes for other tasks on the same runqueue.

This makes no sense; spinning stops on need_resched().

^ permalink raw reply	[flat|nested] 29+ messages in thread
* Re: [RFC PATCH 1/2] locking: add mutex_lock_nospin()
  2026-03-04  9:02   ` Peter Zijlstra
@ 2026-03-04  9:37     ` Yafang Shao
  2026-03-04 10:11       ` Peter Zijlstra
  0 siblings, 1 reply; 29+ messages in thread
From: Yafang Shao @ 2026-03-04  9:37 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, will, boqun, longman, rostedt, mhiramat, mark.rutland,
	mathieu.desnoyers, linux-kernel, linux-trace-kernel, bpf

On Wed, Mar 4, 2026 at 5:03 PM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Wed, Mar 04, 2026 at 03:46:49PM +0800, Yafang Shao wrote:
> > Introduce mutex_lock_nospin(), a helper that disables optimistic spinning
> > on the owner for specific heavy locks. This prevents long spinning times
> > that can lead to latency spikes for other tasks on the same runqueue.
>
> This makes no sense; spinning stops on need_resched().

Hello Peter,

The condition to stop spinning on need_resched() relies on the mutex
owner remaining unchanged. However, when multiple tasks contend for
the same lock, the owner can change frequently. This creates a
potential TOCTOU (Time of Check to Time of Use) issue.

  mutex_optimistic_spin
    owner = __mutex_trylock_or_owner(lock);
    mutex_spin_on_owner
      // the __mutex_owner(lock) might get a new owner.
      while (__mutex_owner(lock) == owner)

We observed high CPU pressure in production when this scenario occurred.
Below are the benchmark results when running 10 concurrent tasks:

  for i in `seq 0 9`; do
    time cat /sys/kernel/debug/tracing/available_filter_functions > /dev/null &
  done

- before this patch

  real 0m4.636s  user 0m0.001s  sys 0m3.773s
  real 0m5.157s  user 0m0.001s  sys 0m4.362s
  real 0m5.205s  user 0m0.000s  sys 0m4.538s
  real 0m5.212s  user 0m0.001s  sys 0m4.700s
  real 0m5.246s  user 0m0.001s  sys 0m4.501s
  real 0m5.254s  user 0m0.003s  sys 0m4.335s
  real 0m5.260s  user 0m0.003s  sys 0m4.525s
  real 0m5.267s  user 0m0.004s  sys 0m4.482s
  real 0m5.273s  user 0m0.002s  sys 0m4.215s
  real 0m5.285s  user 0m0.003s  sys 0m4.373s

- after this patch

  real 0m4.733s  user 0m0.002s  sys 0m0.511s
  real 0m4.740s  user 0m0.001s  sys 0m0.509s
  real 0m4.862s  user 0m0.001s  sys 0m0.513s
  real 0m4.884s  user 0m0.000s  sys 0m0.507s
  real 0m4.888s  user 0m0.003s  sys 0m0.513s
  real 0m4.888s  user 0m0.000s  sys 0m0.511s
  real 0m4.886s  user 0m0.003s  sys 0m0.508s
  real 0m4.952s  user 0m0.000s  sys 0m0.513s
  real 0m4.973s  user 0m0.001s  sys 0m0.510s
  real 0m5.042s  user 0m0.002s  sys 0m0.515s

The results show that system time dropped dramatically from ~4.5 seconds
to ~0.5 seconds, confirming that the patch can help reduce the issue.
Please correct me if I've misunderstood anything.

--
Regards
Yafang

^ permalink raw reply	[flat|nested] 29+ messages in thread
* Re: [RFC PATCH 1/2] locking: add mutex_lock_nospin()
  2026-03-04  9:37     ` Yafang Shao
@ 2026-03-04 10:11       ` Peter Zijlstra
  2026-03-04 11:52         ` Yafang Shao
  0 siblings, 1 reply; 29+ messages in thread
From: Peter Zijlstra @ 2026-03-04 10:11 UTC (permalink / raw)
  To: Yafang Shao
  Cc: mingo, will, boqun, longman, rostedt, mhiramat, mark.rutland,
	mathieu.desnoyers, linux-kernel, linux-trace-kernel, bpf

On Wed, Mar 04, 2026 at 05:37:31PM +0800, Yafang Shao wrote:
> On Wed, Mar 4, 2026 at 5:03 PM Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > On Wed, Mar 04, 2026 at 03:46:49PM +0800, Yafang Shao wrote:
> > > Introduce mutex_lock_nospin(), a helper that disables optimistic spinning
> > > on the owner for specific heavy locks. This prevents long spinning times
> > > that can lead to latency spikes for other tasks on the same runqueue.
> >
> > This makes no sense; spinning stops on need_resched().
>
> Hello Peter,
>
> The condition to stop spinning on need_resched() relies on the mutex
> owner remaining unchanged. However, when multiple tasks contend for
> the same lock, the owner can change frequently. This creates a
> potential TOCTOU (Time of Check to Time of Use) issue.
>
>   mutex_optimistic_spin
>     owner = __mutex_trylock_or_owner(lock);
>     mutex_spin_on_owner
>       // the __mutex_owner(lock) might get a new owner.
>       while (__mutex_owner(lock) == owner)
>

How do these new owners become the owner? Are they succeeding the
__mutex_trylock() that sits before mutex_optimistic_spin() and
effectively starving the spinner?

Something like the below would make a difference if that were so.

---
diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index c867f6c15530..0796e77a8c3b 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -521,7 +521,7 @@ static __always_inline bool
 mutex_optimistic_spin(struct mutex *lock, struct ww_acquire_ctx *ww_ctx,
 		      struct mutex_waiter *waiter)
 {
-	return false;
+	return __mutex_trylock(lock);
 }
 #endif
 
@@ -614,8 +614,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int state, unsigned int subclas
 	mutex_acquire_nest(&lock->dep_map, subclass, 0, nest_lock, ip);
 
 	trace_contention_begin(lock, LCB_F_MUTEX | LCB_F_SPIN);
 
-	if (__mutex_trylock(lock) ||
-	    mutex_optimistic_spin(lock, ww_ctx, NULL)) {
+	if (mutex_optimistic_spin(lock, ww_ctx, NULL)) {
 		/* got the lock, yay! */
 		lock_acquired(&lock->dep_map, ip);
 		if (ww_ctx)

^ permalink raw reply related	[flat|nested] 29+ messages in thread
* Re: [RFC PATCH 1/2] locking: add mutex_lock_nospin()
  2026-03-04 10:11       ` Peter Zijlstra
@ 2026-03-04 11:52         ` Yafang Shao
  2026-03-04 12:41           ` Peter Zijlstra
  0 siblings, 1 reply; 29+ messages in thread
From: Yafang Shao @ 2026-03-04 11:52 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, will, boqun, longman, rostedt, mhiramat, mark.rutland,
	mathieu.desnoyers, linux-kernel, linux-trace-kernel, bpf

On Wed, Mar 4, 2026 at 6:11 PM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Wed, Mar 04, 2026 at 05:37:31PM +0800, Yafang Shao wrote:
> > On Wed, Mar 4, 2026 at 5:03 PM Peter Zijlstra <peterz@infradead.org> wrote:
> > >
> > > On Wed, Mar 04, 2026 at 03:46:49PM +0800, Yafang Shao wrote:
> > > > Introduce mutex_lock_nospin(), a helper that disables optimistic spinning
> > > > on the owner for specific heavy locks. This prevents long spinning times
> > > > that can lead to latency spikes for other tasks on the same runqueue.
> > >
> > > This makes no sense; spinning stops on need_resched().
> >
> > Hello Peter,
> >
> > The condition to stop spinning on need_resched() relies on the mutex
> > owner remaining unchanged. However, when multiple tasks contend for
> > the same lock, the owner can change frequently. This creates a
> > potential TOCTOU (Time of Check to Time of Use) issue.
> >
> >   mutex_optimistic_spin
> >     owner = __mutex_trylock_or_owner(lock);
> >     mutex_spin_on_owner
> >       // the __mutex_owner(lock) might get a new owner.
> >       while (__mutex_owner(lock) == owner)
> >
>
> How do these new owners become the owner? Are they succeeding the
> __mutex_trylock() that sits before mutex_optimistic_spin() and
> effectively starving the spinner?
>
> Something like the below would make a difference if that were so.

The following change made no difference; concurrent runs still result
in prolonged system time.

real 0m5.265s  user 0m0.000s  sys 0m4.921s
real 0m5.295s  user 0m0.002s  sys 0m4.697s
real 0m5.293s  user 0m0.003s  sys 0m4.844s
real 0m5.303s  user 0m0.001s  sys 0m4.511s
real 0m5.303s  user 0m0.000s  sys 0m4.694s
real 0m5.302s  user 0m0.002s  sys 0m4.677s
real 0m5.313s  user 0m0.000s  sys 0m4.837s
real 0m5.327s  user 0m0.000s  sys 0m4.808s
real 0m5.330s  user 0m0.001s  sys 0m4.893s
real 0m5.358s  user 0m0.005s  sys 0m4.919s

Our kernel is not built with CONFIG_PREEMPT enabled, so prolonged
system time can lead to CPU pressure and potential latency spikes.
Since we can reliably reproduce this unnecessary spinning, why not
improve it to reduce the overhead?

--
Regards
Yafang

^ permalink raw reply	[flat|nested] 29+ messages in thread
* Re: [RFC PATCH 1/2] locking: add mutex_lock_nospin()
  2026-03-04 11:52         ` Yafang Shao
@ 2026-03-04 12:41           ` Peter Zijlstra
  2026-03-04 14:25             ` Yafang Shao
  0 siblings, 1 reply; 29+ messages in thread
From: Peter Zijlstra @ 2026-03-04 12:41 UTC (permalink / raw)
  To: Yafang Shao
  Cc: mingo, will, boqun, longman, rostedt, mhiramat, mark.rutland,
	mathieu.desnoyers, linux-kernel, linux-trace-kernel, bpf

On Wed, Mar 04, 2026 at 07:52:06PM +0800, Yafang Shao wrote:
> On Wed, Mar 4, 2026 at 6:11 PM Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > On Wed, Mar 04, 2026 at 05:37:31PM +0800, Yafang Shao wrote:
> > > On Wed, Mar 4, 2026 at 5:03 PM Peter Zijlstra <peterz@infradead.org> wrote:
> > > >
> > > > On Wed, Mar 04, 2026 at 03:46:49PM +0800, Yafang Shao wrote:
> > > > > Introduce mutex_lock_nospin(), a helper that disables optimistic spinning
> > > > > on the owner for specific heavy locks. This prevents long spinning times
> > > > > that can lead to latency spikes for other tasks on the same runqueue.
> > > >
> > > > This makes no sense; spinning stops on need_resched().
> > >
> > > Hello Peter,
> > >
> > > The condition to stop spinning on need_resched() relies on the mutex
> > > owner remaining unchanged. However, when multiple tasks contend for
> > > the same lock, the owner can change frequently. This creates a
> > > potential TOCTOU (Time of Check to Time of Use) issue.
> > >
> > >   mutex_optimistic_spin
> > >     owner = __mutex_trylock_or_owner(lock);
> > >     mutex_spin_on_owner
> > >       // the __mutex_owner(lock) might get a new owner.
> > >       while (__mutex_owner(lock) == owner)
> > >
> >
> > How do these new owners become the owner? Are they succeeding the
> > __mutex_trylock() that sits before mutex_optimistic_spin() and
> > effectively starving the spinner?
> >
> > Something like the below would make a difference if that were so.
>
> The following change made no difference; concurrent runs still result
> in prolonged system time.
>
> real 0m5.265s  user 0m0.000s  sys 0m4.921s
> real 0m5.295s  user 0m0.002s  sys 0m4.697s
> real 0m5.293s  user 0m0.003s  sys 0m4.844s
> real 0m5.303s  user 0m0.001s  sys 0m4.511s
> real 0m5.303s  user 0m0.000s  sys 0m4.694s
> real 0m5.302s  user 0m0.002s  sys 0m4.677s
> real 0m5.313s  user 0m0.000s  sys 0m4.837s
> real 0m5.327s  user 0m0.000s  sys 0m4.808s
> real 0m5.330s  user 0m0.001s  sys 0m4.893s
> real 0m5.358s  user 0m0.005s  sys 0m4.919s
>
> Our kernel is not built with CONFIG_PREEMPT enabled, so prolonged
> system time can lead to CPU pressure and potential latency spikes.
> Since we can reliably reproduce this unnecessary spinning, why not
> improve it to reduce the overhead?

If you cannot explain what the problem is (you haven't), there is
nothing to fix.

Also, current kernels cannot be built without PREEMPT; and if you care
about latency, running a PREEMPT=n kernel is daft. That said,
TIF_NEED_RESCHED should work irrespective of PREEMPT settings; the
PREEMPT settings just affect when and how you end up in schedule().

Even without PREEMPT, if there is a task waiting, either the wakeup or
the tick will set TIF_NEED_RESCHED and it should stop spinning. If there
is no task waiting, there is no actual latency, just burning CPU time,
and that isn't a problem per se.

What should happen is that the first spinner gets the lock next, the
next spinner is then promoted to first spinner and so on.

This chain continues, which means the lock owner is always on-cpu and
good progress is being made and there is no CPU contention, or the
spinner gets marked for preemption (as said, this does not require
PREEMPT=y) and will stop spinning and go sleep, or the owner goes to
sleep and all the spinners stop and also go sleep.

Again, you have not said anything specific enough to figure out what
happens on your end. You said the owner changes; this means there is
progress made. What isn't clear is if any one particular spinner is
starved (that would be a problem) or if this latency spike you observe
is worse than it would be from running a while(1) loop, in which case,
that's just how it is.

What is not sane is marking random locks with random properties just
because of a random workload.

^ permalink raw reply	[flat|nested] 29+ messages in thread
* Re: [RFC PATCH 1/2] locking: add mutex_lock_nospin()
  2026-03-04 12:41           ` Peter Zijlstra
@ 2026-03-04 14:25             ` Yafang Shao
  0 siblings, 0 replies; 29+ messages in thread
From: Yafang Shao @ 2026-03-04 14:25 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, will, boqun, longman, rostedt, mhiramat, mark.rutland,
	mathieu.desnoyers, linux-kernel, linux-trace-kernel, bpf

On Wed, Mar 4, 2026 at 8:41 PM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Wed, Mar 04, 2026 at 07:52:06PM +0800, Yafang Shao wrote:
> > On Wed, Mar 4, 2026 at 6:11 PM Peter Zijlstra <peterz@infradead.org> wrote:
> > >
> > > On Wed, Mar 04, 2026 at 05:37:31PM +0800, Yafang Shao wrote:
> > > > On Wed, Mar 4, 2026 at 5:03 PM Peter Zijlstra <peterz@infradead.org> wrote:
> > > > >
> > > > > On Wed, Mar 04, 2026 at 03:46:49PM +0800, Yafang Shao wrote:
> > > > > > Introduce mutex_lock_nospin(), a helper that disables optimistic spinning
> > > > > > on the owner for specific heavy locks. This prevents long spinning times
> > > > > > that can lead to latency spikes for other tasks on the same runqueue.
> > > > >
> > > > > This makes no sense; spinning stops on need_resched().
> > > >
> > > > Hello Peter,
> > > >
> > > > The condition to stop spinning on need_resched() relies on the mutex
> > > > owner remaining unchanged. However, when multiple tasks contend for
> > > > the same lock, the owner can change frequently. This creates a
> > > > potential TOCTOU (Time of Check to Time of Use) issue.
> > > >
> > > >   mutex_optimistic_spin
> > > >     owner = __mutex_trylock_or_owner(lock);
> > > >     mutex_spin_on_owner
> > > >       // the __mutex_owner(lock) might get a new owner.
> > > >       while (__mutex_owner(lock) == owner)
> > > >
> > >
> > > How do these new owners become the owner? Are they succeeding the
> > > __mutex_trylock() that sits before mutex_optimistic_spin() and
> > > effectively starving the spinner?
> > >
> > > Something like the below would make a difference if that were so.
> >
> > The following change made no difference; concurrent runs still result
> > in prolonged system time.
> >
> > Our kernel is not built with CONFIG_PREEMPT enabled, so prolonged
> > system time can lead to CPU pressure and potential latency spikes.
> > Since we can reliably reproduce this unnecessary spinning, why not
> > improve it to reduce the overhead?
>
> If you cannot explain what the problem is (you haven't), there is
> nothing to fix.
>
> Also, current kernels cannot be built without PREEMPT; and if you care
> about latency, running a PREEMPT=n kernel is daft. That said,
> TIF_NEED_RESCHED should work irrespective of PREEMPT settings; the
> PREEMPT settings just affect when and how you end up in schedule().
>
> Even without PREEMPT, if there is a task waiting, either the wakeup or
> the tick will set TIF_NEED_RESCHED and it should stop spinning. If there
> is no task waiting, there is no actual latency, just burning CPU time,
> and that isn't a problem per se.
>
> What should happen is that the first spinner gets the lock next, the
> next spinner is then promoted to first spinner and so on.
>
> This chain continues, which means the lock owner is always on-cpu and
> good progress is being made and there is no CPU contention, or the
> spinner gets marked for preemption (as said, this does not require
> PREEMPT=y) and will stop spinning and go sleep, or the owner goes to
> sleep and all the spinners stop and also go sleep.
>
> Again, you have not said anything specific enough to figure out what
> happens on your end. You said the owner changes; this means there is
> progress made. What isn't clear is if any one particular spinner is
> starved (that would be a problem)

As far as I can tell, no spinner is starved. The spinner and the
impacted task are interleaved, which is expected behavior.

> or if this latency spike you observe
> is worse than it would be from running a while(1) loop, in which case,
> that's just how it is.

The latency spike occurs because the impacted task must wait for the
spinner to voluntarily yield the CPU. In effect, the spinner behaves
similarly to a while (1) {} loop.

So the real problem here is that we should avoid unnecessary spinning.
Is there any reason we have to spin to speed up the ftrace_lock? I
believe not.

> What is not sane is marking random locks with random properties just
> because of a random workload.

--
Regards
Yafang

^ permalink raw reply	[flat|nested] 29+ messages in thread
* Re: [RFC PATCH 1/2] locking: add mutex_lock_nospin()
  2026-03-04  9:02   ` Peter Zijlstra
  2026-03-04  9:37     ` Yafang Shao
@ 2026-03-04  9:54     ` David Laight
  2026-03-04 20:57       ` Steven Rostedt
  1 sibling, 1 reply; 29+ messages in thread
From: David Laight @ 2026-03-04  9:54 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Yafang Shao, mingo, will, boqun, longman, rostedt, mhiramat,
	mark.rutland, mathieu.desnoyers, linux-kernel, linux-trace-kernel, bpf

On Wed, 4 Mar 2026 10:02:49 +0100
Peter Zijlstra <peterz@infradead.org> wrote:

> On Wed, Mar 04, 2026 at 03:46:49PM +0800, Yafang Shao wrote:
> > Introduce mutex_lock_nospin(), a helper that disables optimistic spinning
> > on the owner for specific heavy locks. This prevents long spinning times
> > that can lead to latency spikes for other tasks on the same runqueue.
>
> This makes no sense; spinning stops on need_resched().

That might still be an issue if a high-priority process is spinning.
But a %sys spike doesn't imply a latency spike.

Is this using the osq_lock.c code?
That will have problems on overprovisioned VMs; it tries to find out
whether the hypervisor has switched out, but ISTR that is flawed.

In reality a spin lock shouldn't be held for long enough to cause
any kind of latency issue.
So something in the code that reads the list of filter functions
needs to be done differently so that the lock isn't held for as long.

	David

^ permalink raw reply	[flat|nested] 29+ messages in thread
* Re: [RFC PATCH 1/2] locking: add mutex_lock_nospin()
  2026-03-04  9:54     ` David Laight
@ 2026-03-04 20:57       ` Steven Rostedt
  2026-03-04 21:44         ` David Laight
  0 siblings, 1 reply; 29+ messages in thread
From: Steven Rostedt @ 2026-03-04 20:57 UTC (permalink / raw)
  To: David Laight
  Cc: Peter Zijlstra, Yafang Shao, mingo, will, boqun, longman,
	mhiramat, mark.rutland, mathieu.desnoyers, linux-kernel,
	linux-trace-kernel, bpf

On Wed, 4 Mar 2026 09:54:15 +0000
David Laight <david.laight.linux@gmail.com> wrote:

> That might still be an issue if a high priority process is spinning.
> But a %sys spike doesn't imply a latency spike.
>
> Is this using the osq_lock.c code?
> That will have problems on overprovisioned VMs, it tries to find out
> whether the hypervisor has switched out - but ISTR that is flawed.
>
> In reality a spin lock shouldn't be held for long enough to cause
> any kind latency issue.
> So something in the code that reads the list of filter functions
> needs to be done differently so that the lock isn't held for as long.

It's not a spinlock, it's an adaptive mutex which spins while the owner of
the mutex is also still running on the CPU. If the spinner CPU triggers a
NEED_RESCHED or the owner goes to sleep, the spinner stops spinning and
goes to sleep too.

Honestly, this still looks like a non-issue or a corner case that I don't
think requires these changes.

This looks like one of those "Patient: Doctor, it hurts me when I do this.
Doctor: Then don't do that." cases.

Why is a production system having multiple users cat
available_filter_functions to begin with?

-- Steve

^ permalink raw reply	[flat|nested] 29+ messages in thread
* Re: [RFC PATCH 1/2] locking: add mutex_lock_nospin()
  2026-03-04 20:57       ` Steven Rostedt
@ 2026-03-04 21:44         ` David Laight
  2026-03-05  2:17           ` Yafang Shao
  0 siblings, 1 reply; 29+ messages in thread
From: David Laight @ 2026-03-04 21:44 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Peter Zijlstra, Yafang Shao, mingo, will, boqun, longman,
	mhiramat, mark.rutland, mathieu.desnoyers, linux-kernel,
	linux-trace-kernel, bpf

On Wed, 4 Mar 2026 15:57:42 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:

> On Wed, 4 Mar 2026 09:54:15 +0000
> David Laight <david.laight.linux@gmail.com> wrote:
>
> > That might still be an issue if a high priority process is spinning.
> > But a %sys spike doesn't imply a latency spike.
> >
> > Is this using the osq_lock.c code?
> > That will have problems on overprovisioned VMs, it tries to find out
> > whether the hypervisor has switched out - but ISTR that is flawed.
> >
> > In reality a spin lock shouldn't be held for long enough to cause
> > any kind latency issue.
> > So something in the code that reads the list of filter functions
> > needs to be done differently so that the lock isn't held for as long.
>
> It's not a spinlock, it's an adaptive mutex which spins while the owner of
> the mutex is also still running on the CPU. If the spinner CPU triggers a
> NEED_RESCHED or the owner goes to sleep, the spinner stops spinning and
> goes to sleep too.

I think half my brain knew that - otherwise I wouldn't have mentioned
the osq_lock.c code.
That all reminded me I've a patch that optimises that code a bit.
But I do remember thinking it ought to have an "I've been spinning long
enough, time to sleep" path.

	David

> Honestly, this still looks like a non-issue or a corner case that I don't
> think requires these changes.
>
> This looks like one of those "Patient: Doctor, it hurts me when I do this.
> Doctor: Then don't do that." cases.
>
> Why is a production system having multiple users cat
> available_filter_functions to begin with?
>
> -- Steve

^ permalink raw reply	[flat|nested] 29+ messages in thread
* Re: [RFC PATCH 1/2] locking: add mutex_lock_nospin() 2026-03-04 21:44 ` David Laight @ 2026-03-05 2:17 ` Yafang Shao 2026-03-05 2:28 ` Steven Rostedt 0 siblings, 1 reply; 29+ messages in thread From: Yafang Shao @ 2026-03-05 2:17 UTC (permalink / raw) To: David Laight Cc: Steven Rostedt, Peter Zijlstra, mingo, will, boqun, longman, mhiramat, mark.rutland, mathieu.desnoyers, linux-kernel, linux-trace-kernel, bpf On Thu, Mar 5, 2026 at 5:44 AM David Laight <david.laight.linux@gmail.com> wrote: > > On Wed, 4 Mar 2026 15:57:42 -0500 > Steven Rostedt <rostedt@goodmis.org> wrote: > > > On Wed, 4 Mar 2026 09:54:15 +0000 > > David Laight <david.laight.linux@gmail.com> wrote: > > > > > That might still be an issue if a high priority process is spinning. > > > But a %sys spike doesn't imply a latency spike. > > > > > > Is this using the osq_lock.c code? > > > That will have problems on overprovisioned VMs, it tries to find out > > > whether the hypervisor has switched out - but ISTR that is flawed. > > > > > > In reality a spin lock shouldn't be held for long enough to cause > > > any kind latency issue. > > > So something in the code that reads the list of filter functions > > > needs to be done differently so that the lock isn't held for as long. > > > > It's not a spinlock, it's an adaptive mutex which spins while the owner of > > the mutex is also still running on the CPU. If the spinner CPU triggers a > > NEED_RESCHED or the owner goes to sleep, the spinner stops spinning and > > goes to sleep too. > > I think half my brain knew that - otherwise I wouldn't have mentioned > the osq_lock.c code. > That all reminded me I've a patch that optimises that code a bit. > But I do remember thinking it ought to have a 'I been spinning long > enough, time to sleep' path. > > David > > > > > Honestly, this still looks like a non-issue or a corner case that I don't > > think requires these changes. > > > > This looks like one of those "Patient: Doctor it hurts me when I do this. 
> > Doctor: Then don't do that." cases. > > > > Why is a production system having multiple users cat > > avaliable_filter_functions to begin with? bpftrace is a widely used tool for online debugging and dynamic tracing. However, sysadmins may unknowingly run multiple bpftrace instances concurrently without realizing the potential impact on system performance. If this is your answer, I believe we should clearly document the following warning: Warning: Do not read available_filter_functions concurrently, as doing so can significantly degrade system performance and potentially impact production workloads. -- Regards Yafang ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [RFC PATCH 1/2] locking: add mutex_lock_nospin() 2026-03-05 2:17 ` Yafang Shao @ 2026-03-05 2:28 ` Steven Rostedt 2026-03-05 2:33 ` Yafang Shao 0 siblings, 1 reply; 29+ messages in thread From: Steven Rostedt @ 2026-03-05 2:28 UTC (permalink / raw) To: Yafang Shao Cc: David Laight, Peter Zijlstra, mingo, will, boqun, longman, mhiramat, mark.rutland, mathieu.desnoyers, linux-kernel, linux-trace-kernel, bpf On Thu, 5 Mar 2026 10:17:09 +0800 Yafang Shao <laoar.shao@gmail.com> wrote: > bpftrace is a widely used tool for online debugging and dynamic > tracing. However, sysadmins may unknowingly run multiple bpftrace > instances concurrently without realizing the potential impact on > system performance. > > If this is your answer, I believe we should clearly document the > following warning: > > Warning: Do not read available_filter_functions concurrently, as > doing so can significantly degrade system performance and potentially > impact production workloads. Or update bpftrace to cache that file. It's only updated on module load and unload, which isn't done much. There's no reason it needs to constantly read that file if bpftrace is being used constantly. -- Steve ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [RFC PATCH 1/2] locking: add mutex_lock_nospin() 2026-03-05 2:28 ` Steven Rostedt @ 2026-03-05 2:33 ` Yafang Shao 2026-03-05 3:00 ` Steven Rostedt 0 siblings, 1 reply; 29+ messages in thread From: Yafang Shao @ 2026-03-05 2:33 UTC (permalink / raw) To: Steven Rostedt Cc: David Laight, Peter Zijlstra, mingo, will, boqun, longman, mhiramat, mark.rutland, mathieu.desnoyers, linux-kernel, linux-trace-kernel, bpf On Thu, Mar 5, 2026 at 10:28 AM Steven Rostedt <rostedt@goodmis.org> wrote: > > On Thu, 5 Mar 2026 10:17:09 +0800 > Yafang Shao <laoar.shao@gmail.com> wrote: > > > bpftrace is a widely used tool for online debugging and dynamic > > tracing. However, sysadmins may unknowingly run multiple bpftrace > > instances concurrently without realizing the potential impact on > > system performance. > > > > If this is your answer, I believe we should clearly document the > > following warning: > > > > Warning: Do not read available_filter_functions concurrently, as > > doing so can significantly degrade system performance and potentially > > impact production workloads. > > Or update bpftrace to cache that file. It's only updated on module load > and unload, which isn't done much. There's no reason it needs to > constantly read that file if bpftrace is being used constantly. Other tools may also read available_filter_functions, requiring each one to be patched individually to avoid this flaw—a clearly impractical solution. -- Regards Yafang ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [RFC PATCH 1/2] locking: add mutex_lock_nospin() 2026-03-05 2:33 ` Yafang Shao @ 2026-03-05 3:00 ` Steven Rostedt 2026-03-05 3:08 ` Yafang Shao 0 siblings, 1 reply; 29+ messages in thread From: Steven Rostedt @ 2026-03-05 3:00 UTC (permalink / raw) To: Yafang Shao Cc: David Laight, Peter Zijlstra, mingo, will, boqun, longman, mhiramat, mark.rutland, mathieu.desnoyers, linux-kernel, linux-trace-kernel, bpf On Thu, 5 Mar 2026 10:33:00 +0800 Yafang Shao <laoar.shao@gmail.com> wrote: > Other tools may also read available_filter_functions, requiring each > one to be patched individually to avoid this flaw—a clearly > impractical solution. What exactly is the issue? If a task does a while 1 in user space, it wouldn't be much different. With PREEMPT_LAZY the most a task will spin in the kernel is one extra tick over a user space task spinning in user space. available_filter_functions is definitely not a hot path, so I personally don't care if it were to use "nospin". My worry is about adding this "special" mutex for a single corner case, and I want to know that it's a real bug before we add something special into the kernel. -- Steve ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [RFC PATCH 1/2] locking: add mutex_lock_nospin() 2026-03-05 3:00 ` Steven Rostedt @ 2026-03-05 3:08 ` Yafang Shao 2026-03-05 4:30 ` Waiman Long 0 siblings, 1 reply; 29+ messages in thread From: Yafang Shao @ 2026-03-05 3:08 UTC (permalink / raw) To: Steven Rostedt Cc: David Laight, Peter Zijlstra, mingo, will, boqun, longman, mhiramat, mark.rutland, mathieu.desnoyers, linux-kernel, linux-trace-kernel, bpf On Thu, Mar 5, 2026 at 11:00 AM Steven Rostedt <rostedt@goodmis.org> wrote: > > On Thu, 5 Mar 2026 10:33:00 +0800 > Yafang Shao <laoar.shao@gmail.com> wrote: > > > Other tools may also read available_filter_functions, requiring each > > one to be patched individually to avoid this flaw—a clearly > > impractical solution. > > What exactly is the issue? It makes no sense to spin unnecessarily when it can be avoided. We continuously improve the kernel to do the right thing—and unnecessary spinning is certainly not the right thing. > If a task does a while 1 in user space, it > wouldn't be much different. The while loop in user space performs actual work, whereas useless spinning does nothing but burn CPU cycles. My point is simple: if this unnecessary spinning isn't already considered an issue, it should be—it's something that clearly needs improvement. > With PREEMPT_LAZY the most a task will spin > in the kernel is one extra tick over a user space task spinning in user > space. > > available_filter_functions is definitely not a hot path, so I > personally don't care if it were to use "nospin". My worry is about > adding this "special" mutex for a single corner case, and I want to know > that its a real bug before we add something special into the kernel. > > -- Steve -- Regards Yafang ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [RFC PATCH 1/2] locking: add mutex_lock_nospin() 2026-03-05 3:08 ` Yafang Shao @ 2026-03-05 4:30 ` Waiman Long 2026-03-05 5:40 ` Yafang Shao 2026-03-05 9:32 ` David Laight 0 siblings, 2 replies; 29+ messages in thread From: Waiman Long @ 2026-03-05 4:30 UTC (permalink / raw) To: Yafang Shao, Steven Rostedt Cc: David Laight, Peter Zijlstra, mingo, will, boqun, mhiramat, mark.rutland, mathieu.desnoyers, linux-kernel, linux-trace-kernel, bpf On 3/4/26 10:08 PM, Yafang Shao wrote: > On Thu, Mar 5, 2026 at 11:00 AM Steven Rostedt <rostedt@goodmis.org> wrote: >> On Thu, 5 Mar 2026 10:33:00 +0800 >> Yafang Shao <laoar.shao@gmail.com> wrote: >> >>> Other tools may also read available_filter_functions, requiring each >>> one to be patched individually to avoid this flaw—a clearly >>> impractical solution. >> What exactly is the issue? > It makes no sense to spin unnecessarily when it can be avoided. We > continuously improve the kernel to do the right thing—and unnecessary > spinning is certainly not the right thing. > >> If a task does a while 1 in user space, it >> wouldn't be much different. > The while loop in user space performs actual work, whereas useless > spinning does nothing but burn CPU cycles. My point is simple: if this > unnecessary spinning isn't already considered an issue, it should > be—it's something that clearly needs improvement. The whole point of optimistic spinning is to reduce the lock acquisition latency. If the waiter sleeps, the unlock operation will have to wake up the waiter, which can have a variable latency depending on how busy the system is at the time. Yes, it is burning CPU cycles while spinning, but most workloads will gain performance with this optimistic spinning feature. 
You do have a point that for system monitoring tools that observe the system behavior, they shouldn't burn so much CPU time that they affect the performance of the real workloads the tools are monitoring. BTW, you should expand the commit log of patch 1 to include the rationale for why we should add this feature to mutex, as the information in the cover letter won't get included in the git log if this patch series is merged. You should also elaborate in a comment under what conditions this new mutex API should be used. Cheers, Longman ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [RFC PATCH 1/2] locking: add mutex_lock_nospin() 2026-03-05 4:30 ` Waiman Long @ 2026-03-05 5:40 ` Yafang Shao 2026-03-05 13:21 ` Steven Rostedt 2026-03-05 18:34 ` Waiman Long 2026-03-05 9:32 ` David Laight 1 sibling, 2 replies; 29+ messages in thread From: Yafang Shao @ 2026-03-05 5:40 UTC (permalink / raw) To: Waiman Long Cc: Steven Rostedt, David Laight, Peter Zijlstra, mingo, will, boqun, mhiramat, mark.rutland, mathieu.desnoyers, linux-kernel, linux-trace-kernel, bpf On Thu, Mar 5, 2026 at 12:30 PM Waiman Long <longman@redhat.com> wrote: > > On 3/4/26 10:08 PM, Yafang Shao wrote: > > On Thu, Mar 5, 2026 at 11:00 AM Steven Rostedt <rostedt@goodmis.org> wrote: > >> On Thu, 5 Mar 2026 10:33:00 +0800 > >> Yafang Shao <laoar.shao@gmail.com> wrote: > >> > >>> Other tools may also read available_filter_functions, requiring each > >>> one to be patched individually to avoid this flaw—a clearly > >>> impractical solution. > >> What exactly is the issue? > > It makes no sense to spin unnecessarily when it can be avoided. We > > continuously improve the kernel to do the right thing—and unnecessary > > spinning is certainly not the right thing. > > > >> If a task does a while 1 in user space, it > >> wouldn't be much different. > > The while loop in user space performs actual work, whereas useless > > spinning does nothing but burn CPU cycles. My point is simple: if this > > unnecessary spinning isn't already considered an issue, it should > > be—it's something that clearly needs improvement. > > The whole point of optimistic spinning is to reduce the lock acquisition > latency. If the waiter sleeps, the unlock operation will have to wake up > the waiter which can have a variable latency depending on how busy the > system is at the time. Yes, it is burning CPU cycles while spinning, > Most workloads will gain performance with this optimistic spinning > feature. 
You do have a point that for system monitoring tools that > observe the system behavior, they shouldn't burn that much CPU times > that affect performance of real workload that the tools are monitoring. Exactly. ftrace is intended for debugging and should not significantly impact real workloads. Therefore, it's reasonable to make it sleep if it cannot acquire the lock immediately, rather than spinning and consuming CPU cycles. > > BTW, you should expand the commit log of patch 1 to include the > rationale of why we should add this feature to mutex as the information > in the cover letter won't get included in the git log if this patch > series is merged. You should also elaborate in comment on under what > conditions should this this new mutex API be used. Sure. I will update it. BTW, these issues are notably hard to find. I suspect there are other locks out there with the same problem. -- Regards Yafang ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [RFC PATCH 1/2] locking: add mutex_lock_nospin() 2026-03-05 5:40 ` Yafang Shao @ 2026-03-05 13:21 ` Steven Rostedt 2026-03-06 2:22 ` Yafang Shao 2026-03-05 18:34 ` Waiman Long 1 sibling, 1 reply; 29+ messages in thread From: Steven Rostedt @ 2026-03-05 13:21 UTC (permalink / raw) To: Yafang Shao Cc: Waiman Long, David Laight, Peter Zijlstra, mingo, will, boqun, mhiramat, mark.rutland, mathieu.desnoyers, linux-kernel, linux-trace-kernel, bpf On Thu, 5 Mar 2026 13:40:27 +0800 Yafang Shao <laoar.shao@gmail.com> wrote: > Exactly. ftrace is intended for debugging and should not significantly > impact real workloads. Therefore, it's reasonable to make it sleep if > it cannot acquire the lock immediately, rather than spinning and > consuming CPU cycles. Actually, ftrace is more than just debugging. It is the infrastructure for live kernel patching as well. > > > > > BTW, you should expand the commit log of patch 1 to include the > > rationale of why we should add this feature to mutex as the information > > in the cover letter won't get included in the git log if this patch > > series is merged. You should also elaborate in comment on under what > > conditions should this this new mutex API be used. > > Sure. I will update it. > > BTW, these issues are notably hard to find. I suspect there are other > locks out there with the same problem. As I mentioned, I'm not against the change. I just want to make sure the rationale is strong enough to make the change. One thing that should be modified with your patch is the name. "nospin" references the implementation of the mutex. Instead it should be called something like: "noncritical" or "slowpath" stating that the grabbing of this mutex is not of a critical section. Maybe an entirely new interface should be defined: struct slow_mutex; slow_mutex_lock() slow_mutex_unlock() etc, that makes it obvious that this mutex may be held for long periods of time. 
In fact, this would be useful for RT workloads, as these mutexes could be flagged to warn RT critical tasks if those tasks were to take one of them. There has been some talk to mark paths in the kernel that RT tasks would get a SIGKILL if they were to hit a path that is known to be non deterministic. -- Steve ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [RFC PATCH 1/2] locking: add mutex_lock_nospin() 2026-03-05 13:21 ` Steven Rostedt @ 2026-03-06 2:22 ` Yafang Shao 2026-03-06 10:00 ` David Laight 0 siblings, 1 reply; 29+ messages in thread From: Yafang Shao @ 2026-03-06 2:22 UTC (permalink / raw) To: Steven Rostedt Cc: Waiman Long, David Laight, Peter Zijlstra, mingo, will, boqun, mhiramat, mark.rutland, mathieu.desnoyers, linux-kernel, linux-trace-kernel, bpf On Thu, Mar 5, 2026 at 9:20 PM Steven Rostedt <rostedt@goodmis.org> wrote: > > On Thu, 5 Mar 2026 13:40:27 +0800 > Yafang Shao <laoar.shao@gmail.com> wrote: > > > Exactly. ftrace is intended for debugging and should not significantly > > impact real workloads. Therefore, it's reasonable to make it sleep if > > it cannot acquire the lock immediately, rather than spinning and > > consuming CPU cycles. > > Actually, ftrace is more than just debugging. It is the infrastructure for > live kernel patching as well. good to know. > > > > > > > > > BTW, you should expand the commit log of patch 1 to include the > > > rationale of why we should add this feature to mutex as the information > > > in the cover letter won't get included in the git log if this patch > > > series is merged. You should also elaborate in comment on under what > > > conditions should this this new mutex API be used. > > > > Sure. I will update it. > > > > BTW, these issues are notably hard to find. I suspect there are other > > locks out there with the same problem. > > As I mentioned, I'm not against the change. I just want to make sure the > rationale is strong enough to make the change. > > One thing that should be modified with your patch is the name. "nospin" > references the implementation of the mutex. Instead it should be called > something like: "noncritical" or "slowpath" stating that the grabbing of > this mutex is not of a critical section. 
> > Maybe an entirely new interface should be defined: > > > struct slow_mutex; Is it necessary to define a new structure for this slow mutex? We could simply reuse the existing struct mutex instead. Alternatively, should we add some new flags to this slow_mutex for debugging purposes? > > slow_mutex_lock() > slow_mutex_unlock() These two APIs appear sufficient to handle this use case. > > etc, > > that makes it obvious that this mutex may be held for long periods of time. > In fact, this would be useful for RT workloads, as these mutexes could be > flagged to warn RT critical tasks if those tasks were to take one of them. > > There has been some talk to mark paths in the kernel that RT tasks would > get a SIGKILL if they were to hit a path that is known to be non > deterministic. Thanks for your information. -- Regards Yafang ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [RFC PATCH 1/2] locking: add mutex_lock_nospin() 2026-03-06 2:22 ` Yafang Shao @ 2026-03-06 10:00 ` David Laight 2026-03-09 2:34 ` Yafang Shao 0 siblings, 1 reply; 29+ messages in thread From: David Laight @ 2026-03-06 10:00 UTC (permalink / raw) To: Yafang Shao Cc: Steven Rostedt, Waiman Long, Peter Zijlstra, mingo, will, boqun, mhiramat, mark.rutland, mathieu.desnoyers, linux-kernel, linux-trace-kernel, bpf On Fri, 6 Mar 2026 10:22:11 +0800 Yafang Shao <laoar.shao@gmail.com> wrote: > On Thu, Mar 5, 2026 at 9:20 PM Steven Rostedt <rostedt@goodmis.org> wrote: > > > > On Thu, 5 Mar 2026 13:40:27 +0800 > > Yafang Shao <laoar.shao@gmail.com> wrote: > > > > > Exactly. ftrace is intended for debugging and should not significantly > > > impact real workloads. Therefore, it's reasonable to make it sleep if > > > it cannot acquire the lock immediately, rather than spinning and > > > consuming CPU cycles. > > > > Actually, ftrace is more than just debugging. It is the infrastructure for > > live kernel patching as well. > > good to know. > > > > > > > > > > > > > > BTW, you should expand the commit log of patch 1 to include the > > > > rationale of why we should add this feature to mutex as the information > > > > in the cover letter won't get included in the git log if this patch > > > > series is merged. You should also elaborate in comment on under what > > > > conditions should this this new mutex API be used. > > > > > > Sure. I will update it. > > > > > > BTW, these issues are notably hard to find. I suspect there are other > > > locks out there with the same problem. > > > > As I mentioned, I'm not against the change. I just want to make sure the > > rationale is strong enough to make the change. > > > > One thing that should be modified with your patch is the name. "nospin" > > references the implementation of the mutex. 
Instead it should be called > > something like: "noncritical" or "slowpath" stating that the grabbing of > > this mutex is not of a critical section. > > > > Maybe an entirely new interface should be defined: > > > > > > struct slow_mutex; > > Is it necessary to define a new structure for this slow mutex? We > > could simply reuse the existing struct mutex instead. Alternatively, > > should we add some new flags to this slow_mutex for debugging > > purposes? > > > > > > slow_mutex_lock() > > slow_mutex_unlock() > > These two APIs appear sufficient to handle this use case. Don't semaphores still exist? IIRC they always sleep. Although I wonder if the mutex needs to be held for as long as it is. ISTR one of the tracebacks was in the 'address to name' lookup, that code will be slow. Since the mutex can't be held across the multiple reads that are done to read the full list of tracepoints it must surely be possible to release it across the name lookup? David > > > > > etc, > > > > that makes it obvious that this mutex may be held for long periods of time. > > In fact, this would be useful for RT workloads, as these mutexes could be > > flagged to warn RT critical tasks if those tasks were to take one of them. > > > > There has been some talk to mark paths in the kernel that RT tasks would > > get a SIGKILL if they were to hit a path that is known to be non > > deterministic. > > Thanks for your information. > ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [RFC PATCH 1/2] locking: add mutex_lock_nospin() 2026-03-06 10:00 ` David Laight @ 2026-03-09 2:34 ` Yafang Shao 0 siblings, 0 replies; 29+ messages in thread From: Yafang Shao @ 2026-03-09 2:34 UTC (permalink / raw) To: David Laight, Steven Rostedt Cc: Waiman Long, Peter Zijlstra, mingo, will, boqun, mhiramat, mark.rutland, mathieu.desnoyers, linux-kernel, linux-trace-kernel, bpf On Fri, Mar 6, 2026 at 6:00 PM David Laight <david.laight.linux@gmail.com> wrote: > > On Fri, 6 Mar 2026 10:22:11 +0800 > Yafang Shao <laoar.shao@gmail.com> wrote: > > > On Thu, Mar 5, 2026 at 9:20 PM Steven Rostedt <rostedt@goodmis.org> wrote: > > > > > > On Thu, 5 Mar 2026 13:40:27 +0800 > > > Yafang Shao <laoar.shao@gmail.com> wrote: > > > > > > > Exactly. ftrace is intended for debugging and should not significantly > > > > impact real workloads. Therefore, it's reasonable to make it sleep if > > > > it cannot acquire the lock immediately, rather than spinning and > > > > consuming CPU cycles. > > > > > > Actually, ftrace is more than just debugging. It is the infrastructure for > > > live kernel patching as well. > > > > good to know. > > > > > > > > > > > > > > > > > > > BTW, you should expand the commit log of patch 1 to include the > > > > > rationale of why we should add this feature to mutex as the information > > > > > in the cover letter won't get included in the git log if this patch > > > > > series is merged. You should also elaborate in comment on under what > > > > > conditions should this this new mutex API be used. > > > > > > > > Sure. I will update it. > > > > > > > > BTW, these issues are notably hard to find. I suspect there are other > > > > locks out there with the same problem. > > > > > > As I mentioned, I'm not against the change. I just want to make sure the > > > rationale is strong enough to make the change. > > > > > > One thing that should be modified with your patch is the name. "nospin" > > > references the implementation of the mutex. 
Instead it should be called > > > something like: "noncritical" or "slowpath" stating that the grabbing of > > > this mutex is not of a critical section. > > > > > > Maybe an entirely new interface should be defined: > > > > > > > > > struct slow_mutex; > > > > Is it necessary to define a new structure for this slow mutex? We > > could simply reuse the existing struct mutex instead. Alternatively, > > should we add some new flags to this slow_mutex for debugging > > purposes? > > > > > > > > slow_mutex_lock() > > > slow_mutex_unlock() > > > > These two APIs appear sufficient to handle this use case. > > Don't semaphores still exist? While semaphores may present similar challenges, I'm not currently aware of specific instances that share this exact issue. Should we encounter any problematic semaphores in production workloads, we can address them at that time. > IIRC they always sleep. > > Although I wonder if the mutex need to be held for as long at it is. > ISTR one of the tracebacks was one the 'address to name' lookup, > that code will be slow. > Since the mutex can't be held across the multiple reads that are done > to read the full list of tracepoints it must surely be possible to > release it across the name lookup? That's a great point, though I'm not entirely certain at the moment. Perhaps Steven can provide further insight. -- Regards Yafang ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [RFC PATCH 1/2] locking: add mutex_lock_nospin() 2026-03-05 5:40 ` Yafang Shao 2026-03-05 13:21 ` Steven Rostedt @ 2026-03-05 18:34 ` Waiman Long 2026-03-05 18:44 ` Waiman Long 1 sibling, 1 reply; 29+ messages in thread From: Waiman Long @ 2026-03-05 18:34 UTC (permalink / raw) To: Yafang Shao Cc: Steven Rostedt, David Laight, Peter Zijlstra, mingo, will, boqun, mhiramat, mark.rutland, mathieu.desnoyers, linux-kernel, linux-trace-kernel, bpf On 3/5/26 12:40 AM, Yafang Shao wrote: > On Thu, Mar 5, 2026 at 12:30 PM Waiman Long <longman@redhat.com> wrote: >> On 3/4/26 10:08 PM, Yafang Shao wrote: >>> On Thu, Mar 5, 2026 at 11:00 AM Steven Rostedt <rostedt@goodmis.org> wrote: >>>> On Thu, 5 Mar 2026 10:33:00 +0800 >>>> Yafang Shao <laoar.shao@gmail.com> wrote: >>>> >>>>> Other tools may also read available_filter_functions, requiring each >>>>> one to be patched individually to avoid this flaw—a clearly >>>>> impractical solution. >>>> What exactly is the issue? >>> It makes no sense to spin unnecessarily when it can be avoided. We >>> continuously improve the kernel to do the right thing—and unnecessary >>> spinning is certainly not the right thing. >>> >>>> If a task does a while 1 in user space, it >>>> wouldn't be much different. >>> The while loop in user space performs actual work, whereas useless >>> spinning does nothing but burn CPU cycles. My point is simple: if this >>> unnecessary spinning isn't already considered an issue, it should >>> be—it's something that clearly needs improvement. >> The whole point of optimistic spinning is to reduce the lock acquisition >> latency. If the waiter sleeps, the unlock operation will have to wake up >> the waiter which can have a variable latency depending on how busy the >> system is at the time. Yes, it is burning CPU cycles while spinning, >> Most workloads will gain performance with this optimistic spinning >> feature. 
You do have a point that for system monitoring tools that >> observe the system behavior, they shouldn't burn that much CPU times >> that affect performance of real workload that the tools are monitoring. > Exactly. ftrace is intended for debugging and should not significantly > impact real workloads. Therefore, it's reasonable to make it sleep if > it cannot acquire the lock immediately, rather than spinning and > consuming CPU cycles. Your patch series uses wording that gives a negative connotation to optimistic spinning, making it look like a bad thing. In fact, it is just a request for a new mutex API for use cases where they can suffer higher latency in order to minimize the system overhead they incur. So don't bad-mouth optimistic spinning and emphasize the use cases you want to support with the new API in your next version. My 2 cents. Cheers, Longman ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [RFC PATCH 1/2] locking: add mutex_lock_nospin() 2026-03-05 18:34 ` Waiman Long @ 2026-03-05 18:44 ` Waiman Long 2026-03-06 2:27 ` Yafang Shao 0 siblings, 1 reply; 29+ messages in thread From: Waiman Long @ 2026-03-05 18:44 UTC (permalink / raw) To: Yafang Shao Cc: Steven Rostedt, David Laight, Peter Zijlstra, mingo, will, boqun, mhiramat, mark.rutland, mathieu.desnoyers, linux-kernel, linux-trace-kernel, bpf On 3/5/26 1:34 PM, Waiman Long wrote: > On 3/5/26 12:40 AM, Yafang Shao wrote: >> On Thu, Mar 5, 2026 at 12:30 PM Waiman Long <longman@redhat.com> wrote: >>> On 3/4/26 10:08 PM, Yafang Shao wrote: >>>> On Thu, Mar 5, 2026 at 11:00 AM Steven Rostedt >>>> <rostedt@goodmis.org> wrote: >>>>> On Thu, 5 Mar 2026 10:33:00 +0800 >>>>> Yafang Shao <laoar.shao@gmail.com> wrote: >>>>> >>>>>> Other tools may also read available_filter_functions, requiring each >>>>>> one to be patched individually to avoid this flaw—a clearly >>>>>> impractical solution. >>>>> What exactly is the issue? >>>> It makes no sense to spin unnecessarily when it can be avoided. We >>>> continuously improve the kernel to do the right thing—and unnecessary >>>> spinning is certainly not the right thing. >>>> >>>>> If a task does a while 1 in user space, it >>>>> wouldn't be much different. >>>> The while loop in user space performs actual work, whereas useless >>>> spinning does nothing but burn CPU cycles. My point is simple: if this >>>> unnecessary spinning isn't already considered an issue, it should >>>> be—it's something that clearly needs improvement. >>> The whole point of optimistic spinning is to reduce the lock >>> acquisition >>> latency. If the waiter sleeps, the unlock operation will have to >>> wake up >>> the waiter which can have a variable latency depending on how busy the >>> system is at the time. Yes, it is burning CPU cycles while spinning, >>> Most workloads will gain performance with this optimistic spinning >>> feature. 
You do have a point that for system monitoring tools that >>> observe the system behavior, they shouldn't burn that much CPU times >>> that affect performance of real workload that the tools are monitoring. >> Exactly. ftrace is intended for debugging and should not significantly >> impact real workloads. Therefore, it's reasonable to make it sleep if >> it cannot acquire the lock immediately, rather than spinning and >> consuming CPU cycles. > > Your patch series use wordings that give a negative connotation about > optimistic spinning making it look like a bad thing. In fact, it is > just a request for a new mutex API for use cases where they can suffer > higher latency in order to minimize the system overhead they incur. So > don't bad-mouth optimistic spinning and emphasize the use cases you > want to support with the new API in your next version. BTW, for any new mutex API introduced, you should also provide an equivalent version in kernel/locking/rtmutex_api.c for PREEMPT_RT kernel. Cheers, Longman ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [RFC PATCH 1/2] locking: add mutex_lock_nospin()
  2026-03-05 18:44 ` Waiman Long
@ 2026-03-06  2:27   ` Yafang Shao
  0 siblings, 0 replies; 29+ messages in thread
From: Yafang Shao @ 2026-03-06 2:27 UTC (permalink / raw)
To: Waiman Long
Cc: Steven Rostedt, David Laight, Peter Zijlstra, mingo, will, boqun,
    mhiramat, mark.rutland, mathieu.desnoyers, linux-kernel,
    linux-trace-kernel, bpf

On Fri, Mar 6, 2026 at 2:45 AM Waiman Long <longman@redhat.com> wrote:
>
> On 3/5/26 1:34 PM, Waiman Long wrote:
> > On 3/5/26 12:40 AM, Yafang Shao wrote:
> >> On Thu, Mar 5, 2026 at 12:30 PM Waiman Long <longman@redhat.com> wrote:
> >>> On 3/4/26 10:08 PM, Yafang Shao wrote:
> >>>> On Thu, Mar 5, 2026 at 11:00 AM Steven Rostedt
> >>>> <rostedt@goodmis.org> wrote:
> >>>>> On Thu, 5 Mar 2026 10:33:00 +0800
> >>>>> Yafang Shao <laoar.shao@gmail.com> wrote:
> >>>>>
> >>>>>> Other tools may also read available_filter_functions, requiring each
> >>>>>> one to be patched individually to avoid this flaw—a clearly
> >>>>>> impractical solution.
> >>>>> What exactly is the issue?
> >>>> It makes no sense to spin unnecessarily when it can be avoided. We
> >>>> continuously improve the kernel to do the right thing—and unnecessary
> >>>> spinning is certainly not the right thing.
> >>>>
> >>>>> If a task does a while 1 in user space, it
> >>>>> wouldn't be much different.
> >>>> The while loop in user space performs actual work, whereas useless
> >>>> spinning does nothing but burn CPU cycles. My point is simple: if this
> >>>> unnecessary spinning isn't already considered an issue, it should
> >>>> be—it's something that clearly needs improvement.
> >>> The whole point of optimistic spinning is to reduce the lock
> >>> acquisition
> >>> latency. If the waiter sleeps, the unlock operation will have to
> >>> wake up
> >>> the waiter which can have a variable latency depending on how busy the
> >>> system is at the time. Yes, it is burning CPU cycles while spinning,
> >>> Most workloads will gain performance with this optimistic spinning
> >>> feature. You do have a point that for system monitoring tools that
> >>> observe the system behavior, they shouldn't burn that much CPU times
> >>> that affect performance of real workload that the tools are monitoring.
> >> Exactly. ftrace is intended for debugging and should not significantly
> >> impact real workloads. Therefore, it's reasonable to make it sleep if
> >> it cannot acquire the lock immediately, rather than spinning and
> >> consuming CPU cycles.
> >
> > Your patch series use wordings that give a negative connotation about
> > optimistic spinning making it look like a bad thing.

Perhaps I didn't phrase that well. I do understand that optimistic
spinning is valuable in use cases where we wouldn't want to disable
CONFIG_MUTEX_SPIN_ON_OWNER.

> > In fact, it is
> > just a request for a new mutex API for use cases where they can suffer
> > higher latency in order to minimize the system overhead they incur. So
> > don't bad-mouth optimistic spinning and emphasize the use cases you
> > want to support with the new API in your next version.
>
> BTW, for any new mutex API introduced, you should also provide an
> equivalent version in kernel/locking/rtmutex_api.c for PREEMPT_RT kernel.

Thanks for the suggestion.

-- 
Regards
Yafang
* Re: [RFC PATCH 1/2] locking: add mutex_lock_nospin()
  2026-03-05  4:30 ` Waiman Long
  2026-03-05  5:40   ` Yafang Shao
@ 2026-03-05  9:32   ` David Laight
  2026-03-05 19:00     ` Waiman Long
  1 sibling, 1 reply; 29+ messages in thread
From: David Laight @ 2026-03-05 9:32 UTC (permalink / raw)
To: Waiman Long
Cc: Yafang Shao, Steven Rostedt, Peter Zijlstra, mingo, will, boqun,
    mhiramat, mark.rutland, mathieu.desnoyers, linux-kernel,
    linux-trace-kernel, bpf

On Wed, 4 Mar 2026 23:30:40 -0500
Waiman Long <longman@redhat.com> wrote:

> On 3/4/26 10:08 PM, Yafang Shao wrote:
> > On Thu, Mar 5, 2026 at 11:00 AM Steven Rostedt <rostedt@goodmis.org> wrote:
> >> On Thu, 5 Mar 2026 10:33:00 +0800
> >> Yafang Shao <laoar.shao@gmail.com> wrote:
> >>
> >>> Other tools may also read available_filter_functions, requiring each
> >>> one to be patched individually to avoid this flaw—a clearly
> >>> impractical solution.
> >> What exactly is the issue?
> > It makes no sense to spin unnecessarily when it can be avoided. We
> > continuously improve the kernel to do the right thing—and unnecessary
> > spinning is certainly not the right thing.
> >
> >> If a task does a while 1 in user space, it
> >> wouldn't be much different.
> > The while loop in user space performs actual work, whereas useless
> > spinning does nothing but burn CPU cycles. My point is simple: if this
> > unnecessary spinning isn't already considered an issue, it should
> > be—it's something that clearly needs improvement.
>
> The whole point of optimistic spinning is to reduce the lock acquisition
> latency. If the waiter sleeps, the unlock operation will have to wake up
> the waiter which can have a variable latency depending on how busy the
> system is at the time. Yes, it is burning CPU cycles while spinning,
> Most workloads will gain performance with this optimistic spinning
> feature. You do have a point that for system monitoring tools that
> observe the system behavior, they shouldn't burn that much CPU times
> that affect performance of real workload that the tools are monitoring.
>
> BTW, you should expand the commit log of patch 1 to include the
> rationale of why we should add this feature to mutex as the information
> in the cover letter won't get included in the git log if this patch
> series is merged. You should also elaborate in comment on under what
> conditions should this new mutex API be used.

Isn't changing mutex_lock() the wrong place anyway?
What you need is for the code holding the lock to indicate that
it isn't worth waiters spinning because the lock will be held
for a long time.

	David

>
> Cheers,
> Longman
>
* Re: [RFC PATCH 1/2] locking: add mutex_lock_nospin()
  2026-03-05  9:32 ` David Laight
@ 2026-03-05 19:00   ` Waiman Long
  2026-03-06  2:33     ` Yafang Shao
  0 siblings, 1 reply; 29+ messages in thread
From: Waiman Long @ 2026-03-05 19:00 UTC (permalink / raw)
To: David Laight
Cc: Yafang Shao, Steven Rostedt, Peter Zijlstra, mingo, will, boqun,
    mhiramat, mark.rutland, mathieu.desnoyers, linux-kernel,
    linux-trace-kernel, bpf

On 3/5/26 4:32 AM, David Laight wrote:
> On Wed, 4 Mar 2026 23:30:40 -0500
> Waiman Long <longman@redhat.com> wrote:
>
>> On 3/4/26 10:08 PM, Yafang Shao wrote:
>>> On Thu, Mar 5, 2026 at 11:00 AM Steven Rostedt <rostedt@goodmis.org> wrote:
>>>> On Thu, 5 Mar 2026 10:33:00 +0800
>>>> Yafang Shao <laoar.shao@gmail.com> wrote:
>>>>
>>>>> Other tools may also read available_filter_functions, requiring each
>>>>> one to be patched individually to avoid this flaw—a clearly
>>>>> impractical solution.
>>>> What exactly is the issue?
>>> It makes no sense to spin unnecessarily when it can be avoided. We
>>> continuously improve the kernel to do the right thing—and unnecessary
>>> spinning is certainly not the right thing.
>>>
>>>> If a task does a while 1 in user space, it
>>>> wouldn't be much different.
>>> The while loop in user space performs actual work, whereas useless
>>> spinning does nothing but burn CPU cycles. My point is simple: if this
>>> unnecessary spinning isn't already considered an issue, it should
>>> be—it's something that clearly needs improvement.
>> The whole point of optimistic spinning is to reduce the lock acquisition
>> latency. If the waiter sleeps, the unlock operation will have to wake up
>> the waiter which can have a variable latency depending on how busy the
>> system is at the time. Yes, it is burning CPU cycles while spinning,
>> Most workloads will gain performance with this optimistic spinning
>> feature. You do have a point that for system monitoring tools that
>> observe the system behavior, they shouldn't burn that much CPU times
>> that affect performance of real workload that the tools are monitoring.
>>
>> BTW, you should expand the commit log of patch 1 to include the
>> rationale of why we should add this feature to mutex as the information
>> in the cover letter won't get included in the git log if this patch
>> series is merged. You should also elaborate in comment on under what
>> conditions should this new mutex API be used.
> Isn't changing mutex_lock() the wrong place anyway?
> What you need is for the code holding the lock to indicate that
> it isn't worth waiters spinning because the lock will be held
> for a long time.

I have actually thought about having a flag somewhere in the mutex
itself to indicate that optimistic spinning isn't needed. However the
owner field is running out of usable flag bits. The other option is to
add it to osq as it doesn't really need to use the full 32 bits for the
tail. In this case, we can just initialize the mutex to say that we
don't need optimistic spinning and no new mutex_lock() API will be needed.

Cheers,
Longman
* Re: [RFC PATCH 1/2] locking: add mutex_lock_nospin()
  2026-03-05 19:00 ` Waiman Long
@ 2026-03-06  2:33   ` Yafang Shao
  0 siblings, 0 replies; 29+ messages in thread
From: Yafang Shao @ 2026-03-06 2:33 UTC (permalink / raw)
To: Waiman Long
Cc: David Laight, Steven Rostedt, Peter Zijlstra, mingo, will, boqun,
    mhiramat, mark.rutland, mathieu.desnoyers, linux-kernel,
    linux-trace-kernel, bpf

On Fri, Mar 6, 2026 at 3:00 AM Waiman Long <longman@redhat.com> wrote:
>
> On 3/5/26 4:32 AM, David Laight wrote:
> > On Wed, 4 Mar 2026 23:30:40 -0500
> > Waiman Long <longman@redhat.com> wrote:
> >
> >> On 3/4/26 10:08 PM, Yafang Shao wrote:
> >>> On Thu, Mar 5, 2026 at 11:00 AM Steven Rostedt <rostedt@goodmis.org> wrote:
> >>>> On Thu, 5 Mar 2026 10:33:00 +0800
> >>>> Yafang Shao <laoar.shao@gmail.com> wrote:
> >>>>
> >>>>> Other tools may also read available_filter_functions, requiring each
> >>>>> one to be patched individually to avoid this flaw—a clearly
> >>>>> impractical solution.
> >>>> What exactly is the issue?
> >>> It makes no sense to spin unnecessarily when it can be avoided. We
> >>> continuously improve the kernel to do the right thing—and unnecessary
> >>> spinning is certainly not the right thing.
> >>>
> >>>> If a task does a while 1 in user space, it
> >>>> wouldn't be much different.
> >>> The while loop in user space performs actual work, whereas useless
> >>> spinning does nothing but burn CPU cycles. My point is simple: if this
> >>> unnecessary spinning isn't already considered an issue, it should
> >>> be—it's something that clearly needs improvement.
> >> The whole point of optimistic spinning is to reduce the lock acquisition
> >> latency. If the waiter sleeps, the unlock operation will have to wake up
> >> the waiter which can have a variable latency depending on how busy the
> >> system is at the time. Yes, it is burning CPU cycles while spinning,
> >> Most workloads will gain performance with this optimistic spinning
> >> feature. You do have a point that for system monitoring tools that
> >> observe the system behavior, they shouldn't burn that much CPU times
> >> that affect performance of real workload that the tools are monitoring.
> >>
> >> BTW, you should expand the commit log of patch 1 to include the
> >> rationale of why we should add this feature to mutex as the information
> >> in the cover letter won't get included in the git log if this patch
> >> series is merged. You should also elaborate in comment on under what
> >> conditions should this new mutex API be used.
> > Isn't changing mutex_lock() the wrong place anyway?
> > What you need is for the code holding the lock to indicate that
> > it isn't worth waiters spinning because the lock will be held
> > for a long time.
>
> I have actually thought about having a flag somewhere in the mutex
> itself to indicate that optimistic spinning isn't needed. However the
> owner field is running out of usable flag bits.

True. Introducing a new MUTEX_FLAGS would likely require a substantial
refactor of the mutex code, which may not be worth it.

> The other option is to
> add it to osq as it doesn't really need to use the full 32 bits for the
> tail. In this case, we can just initialize the mutex to say that we
> don't need optimistic spinning and no new mutex_lock() API will be needed.

I believe a new mutex_lock() variant would be clearer and easier to
understand. It also has the advantage of requiring minimal changes to
the existing mutex code.

-- 
Regards
Yafang
* [RFC PATCH 2/2] ftrace: disable optimistic spinning for ftrace_lock
  2026-03-04  7:46 [RFC PATCH 0/2] disable optimistic spinning for ftrace_lock Yafang Shao
  2026-03-04  7:46 ` [RFC PATCH 1/2] locking: add mutex_lock_nospin() Yafang Shao
@ 2026-03-04  7:46 ` Yafang Shao
  1 sibling, 0 replies; 29+ messages in thread
From: Yafang Shao @ 2026-03-04 7:46 UTC (permalink / raw)
To: peterz, mingo, will, boqun, longman, rostedt, mhiramat,
    mark.rutland, mathieu.desnoyers
Cc: linux-kernel, linux-trace-kernel, bpf, Yafang Shao

mutex_lock_nospin() is used for ftrace_lock to selectively disable
optimistic spinning.

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
 kernel/trace/ftrace.c | 52 +++++++++++++++++++++----------------------
 1 file changed, 26 insertions(+), 26 deletions(-)

diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 827fb9a0bf0d..b8cca4f76118 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -1284,7 +1284,7 @@ static void clear_ftrace_mod_list(struct list_head *head)
 	if (!head)
 		return;
 
-	mutex_lock(&ftrace_lock);
+	mutex_lock_nospin(&ftrace_lock);
 	list_for_each_entry_safe(p, n, head, list)
 		free_ftrace_mod(p);
 	mutex_unlock(&ftrace_lock);
@@ -4254,7 +4254,7 @@ static void *t_start(struct seq_file *m, loff_t *pos)
 	void *p = NULL;
 	loff_t l;
 
-	mutex_lock(&ftrace_lock);
+	mutex_lock_nospin(&ftrace_lock);
 
 	if (unlikely(ftrace_disabled))
 		return NULL;
@@ -4362,7 +4362,7 @@ static __init void ftrace_check_work_func(struct work_struct *work)
 	struct ftrace_page *pg;
 	struct dyn_ftrace *rec;
 
-	mutex_lock(&ftrace_lock);
+	mutex_lock_nospin(&ftrace_lock);
 	do_for_each_ftrace_rec(pg, rec) {
 		test_for_valid_rec(rec);
 	} while_for_each_ftrace_rec();
@@ -5123,7 +5123,7 @@ static void process_mod_list(struct list_head *head, struct ftrace_ops *ops,
 	if (!new_hash)
 		goto out; /* warn? */
 
-	mutex_lock(&ftrace_lock);
+	mutex_lock_nospin(&ftrace_lock);
 
 	list_for_each_entry_safe(ftrace_mod, n, head, list) {
@@ -5159,7 +5159,7 @@ static void process_mod_list(struct list_head *head, struct ftrace_ops *ops,
 	if (enable && list_empty(head))
 		new_hash->flags &= ~FTRACE_HASH_FL_MOD;
 
-	mutex_lock(&ftrace_lock);
+	mutex_lock_nospin(&ftrace_lock);
 	ftrace_hash_move_and_update_ops(ops, orig_hash, new_hash, enable);
@@ -5465,7 +5465,7 @@ register_ftrace_function_probe(char *glob, struct trace_array *tr,
 		return -EINVAL;
 
-	mutex_lock(&ftrace_lock);
+	mutex_lock_nospin(&ftrace_lock);
 	/* Check if the probe_ops is already registered */
 	list_for_each_entry(iter, &tr->func_probes, list) {
 		if (iter->probe_ops == probe_ops) {
@@ -5540,7 +5540,7 @@ register_ftrace_function_probe(char *glob, struct trace_array *tr,
 		}
 	}
 
-	mutex_lock(&ftrace_lock);
+	mutex_lock_nospin(&ftrace_lock);
 
 	if (!count) {
 		/* Nothing was added? */
@@ -5619,7 +5619,7 @@ unregister_ftrace_function_probe_func(char *glob, struct trace_array *tr,
 			return -EINVAL;
 	}
 
-	mutex_lock(&ftrace_lock);
+	mutex_lock_nospin(&ftrace_lock);
 	/* Check if the probe_ops is already registered */
 	list_for_each_entry(iter, &tr->func_probes, list) {
 		if (iter->probe_ops == probe_ops) {
@@ -5679,7 +5679,7 @@ unregister_ftrace_function_probe_func(char *glob, struct trace_array *tr,
 		goto out_unlock;
 	}
 
-	mutex_lock(&ftrace_lock);
+	mutex_lock_nospin(&ftrace_lock);
 
 	WARN_ON(probe->ref < count);
@@ -5943,7 +5943,7 @@ ftrace_set_hash(struct ftrace_ops *ops, unsigned char *buf, int len,
 		goto out_regex_unlock;
 	}
 
-	mutex_lock(&ftrace_lock);
+	mutex_lock_nospin(&ftrace_lock);
 	ret = ftrace_hash_move_and_update_ops(ops, orig_hash, hash, enable);
 	mutex_unlock(&ftrace_lock);
@@ -6205,7 +6205,7 @@ __modify_ftrace_direct(struct ftrace_ops *ops, unsigned long addr)
 	 * Now the ftrace_ops_list_func() is called to do the direct callers.
 	 * We can safely change the direct functions attached to each entry.
 	 */
-	mutex_lock(&ftrace_lock);
+	mutex_lock_nospin(&ftrace_lock);
 
 	size = 1 << hash->size_bits;
 	for (i = 0; i < size; i++) {
@@ -6625,7 +6625,7 @@ int update_ftrace_direct_mod(struct ftrace_ops *ops, struct ftrace_hash *hash, b
 	 * Now the ftrace_ops_list_func() is called to do the direct callers.
 	 * We can safely change the direct functions attached to each entry.
 	 */
-	mutex_lock(&ftrace_lock);
+	mutex_lock_nospin(&ftrace_lock);
 
 	size = 1 << hash->size_bits;
 	for (i = 0; i < size; i++) {
@@ -6980,7 +6980,7 @@ int ftrace_regex_release(struct inode *inode, struct file *file)
 		} else
 			orig_hash = &iter->ops->func_hash->notrace_hash;
 
-		mutex_lock(&ftrace_lock);
+		mutex_lock_nospin(&ftrace_lock);
 		ftrace_hash_move_and_update_ops(iter->ops, orig_hash,
 						iter->hash, filter_hash);
 		mutex_unlock(&ftrace_lock);
@@ -7464,7 +7464,7 @@ void ftrace_create_filter_files(struct ftrace_ops *ops,
  */
 void ftrace_destroy_filter_files(struct ftrace_ops *ops)
 {
-	mutex_lock(&ftrace_lock);
+	mutex_lock_nospin(&ftrace_lock);
 	if (ops->flags & FTRACE_OPS_FL_ENABLED)
 		ftrace_shutdown(ops, 0);
 	ops->flags |= FTRACE_OPS_FL_DELETED;
@@ -7571,7 +7571,7 @@ static int ftrace_process_locs(struct module *mod,
 	if (!start_pg)
 		return -ENOMEM;
 
-	mutex_lock(&ftrace_lock);
+	mutex_lock_nospin(&ftrace_lock);
 
 	/*
 	 * Core and each module needs their own pages, as
@@ -7868,7 +7868,7 @@ void ftrace_release_mod(struct module *mod)
 	struct ftrace_page *tmp_page = NULL;
 	struct ftrace_page *pg;
 
-	mutex_lock(&ftrace_lock);
+	mutex_lock_nospin(&ftrace_lock);
 
 	/*
	 * To avoid the UAF problem after the module is unloaded, the
@@ -7938,7 +7938,7 @@ void ftrace_module_enable(struct module *mod)
 	struct dyn_ftrace *rec;
 	struct ftrace_page *pg;
 
-	mutex_lock(&ftrace_lock);
+	mutex_lock_nospin(&ftrace_lock);
 
 	if (ftrace_disabled)
 		goto out_unlock;
@@ -8267,7 +8267,7 @@ void ftrace_free_mem(struct module *mod, void *start_ptr, void *end_ptr)
 	key.ip = start;
 	key.flags = end;	/* overload flags, as it is unsigned long */
 
-	mutex_lock(&ftrace_lock);
+	mutex_lock_nospin(&ftrace_lock);
 
 	/*
 	 * If we are freeing module init memory, then check if
@@ -8686,7 +8686,7 @@ static void clear_ftrace_pids(struct trace_array *tr, int type)
 
 void ftrace_clear_pids(struct trace_array *tr)
 {
-	mutex_lock(&ftrace_lock);
+	mutex_lock_nospin(&ftrace_lock);
 
 	clear_ftrace_pids(tr, TRACE_PIDS | TRACE_NO_PIDS);
@@ -8695,7 +8695,7 @@ void ftrace_clear_pids(struct trace_array *tr)
 
 static void ftrace_pid_reset(struct trace_array *tr, int type)
 {
-	mutex_lock(&ftrace_lock);
+	mutex_lock_nospin(&ftrace_lock);
 	clear_ftrace_pids(tr, type);
 	ftrace_update_pid_func();
@@ -8713,7 +8713,7 @@ static void *fpid_start(struct seq_file *m, loff_t *pos)
 	struct trace_pid_list *pid_list;
 	struct trace_array *tr = m->private;
 
-	mutex_lock(&ftrace_lock);
+	mutex_lock_nospin(&ftrace_lock);
 	rcu_read_lock_sched();
 
 	pid_list = rcu_dereference_sched(tr->function_pids);
@@ -8766,7 +8766,7 @@ static void *fnpid_start(struct seq_file *m, loff_t *pos)
 	struct trace_pid_list *pid_list;
 	struct trace_array *tr = m->private;
 
-	mutex_lock(&ftrace_lock);
+	mutex_lock_nospin(&ftrace_lock);
 	rcu_read_lock_sched();
 
 	pid_list = rcu_dereference_sched(tr->function_no_pids);
@@ -9057,7 +9057,7 @@ static int prepare_direct_functions_for_ipmodify(struct ftrace_ops *ops)
 		unsigned long ip = entry->ip;
 		bool found_op = false;
 
-		mutex_lock(&ftrace_lock);
+		mutex_lock_nospin(&ftrace_lock);
 		do_for_each_ftrace_op(op, ftrace_ops_list) {
 			if (!(op->flags & FTRACE_OPS_FL_DIRECT))
 				continue;
@@ -9106,7 +9106,7 @@ static void cleanup_direct_functions_after_ipmodify(struct ftrace_ops *ops)
 		unsigned long ip = entry->ip;
 		bool found_op = false;
 
-		mutex_lock(&ftrace_lock);
+		mutex_lock_nospin(&ftrace_lock);
 		do_for_each_ftrace_op(op, ftrace_ops_list) {
 			if (!(op->flags & FTRACE_OPS_FL_DIRECT))
 				continue;
@@ -9153,7 +9153,7 @@ static int register_ftrace_function_nolock(struct ftrace_ops *ops)
 	ftrace_ops_init(ops);
 
-	mutex_lock(&ftrace_lock);
+	mutex_lock_nospin(&ftrace_lock);
 
 	ret = ftrace_startup(ops, 0);
@@ -9200,7 +9200,7 @@ int unregister_ftrace_function(struct ftrace_ops *ops)
 {
 	int ret;
 
-	mutex_lock(&ftrace_lock);
+	mutex_lock_nospin(&ftrace_lock);
 	ret = ftrace_shutdown(ops, 0);
 	mutex_unlock(&ftrace_lock);
-- 
2.47.3
end of thread, other threads:[~2026-03-09 2:34 UTC | newest]

Thread overview: 29+ messages
2026-03-04  7:46 [RFC PATCH 0/2] disable optimistic spinning for ftrace_lock Yafang Shao
2026-03-04  7:46 ` [RFC PATCH 1/2] locking: add mutex_lock_nospin() Yafang Shao
2026-03-04  9:02   ` Peter Zijlstra
2026-03-04  9:37     ` Yafang Shao
2026-03-04 10:11       ` Peter Zijlstra
2026-03-04 11:52         ` Yafang Shao
2026-03-04 12:41           ` Peter Zijlstra
2026-03-04 14:25             ` Yafang Shao
2026-03-04  9:54   ` David Laight
2026-03-04 20:57     ` Steven Rostedt
2026-03-04 21:44       ` David Laight
2026-03-05  2:17         ` Yafang Shao
2026-03-05  2:28           ` Steven Rostedt
2026-03-05  2:33             ` Yafang Shao
2026-03-05  3:00               ` Steven Rostedt
2026-03-05  3:08                 ` Yafang Shao
2026-03-05  4:30                   ` Waiman Long
2026-03-05  5:40                     ` Yafang Shao
2026-03-05 13:21                       ` Steven Rostedt
2026-03-06  2:22                         ` Yafang Shao
2026-03-06 10:00                           ` David Laight
2026-03-09  2:34                             ` Yafang Shao
2026-03-05 18:34                     ` Waiman Long
2026-03-05 18:44                       ` Waiman Long
2026-03-06  2:27                         ` Yafang Shao
2026-03-05  9:32                   ` David Laight
2026-03-05 19:00                     ` Waiman Long
2026-03-06  2:33                       ` Yafang Shao
2026-03-04  7:46 ` [RFC PATCH 2/2] ftrace: disable optimistic spinning for ftrace_lock Yafang Shao