From: Yafang Shao <laoar.shao@gmail.com>
To: peterz@infradead.org, mingo@redhat.com, will@kernel.org,
boqun@kernel.org, longman@redhat.com, rostedt@goodmis.org,
mhiramat@kernel.org, mark.rutland@arm.com,
mathieu.desnoyers@efficios.com
Cc: linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org,
bpf@vger.kernel.org, Yafang Shao <laoar.shao@gmail.com>
Subject: [RFC PATCH 0/2] disable optimistic spinning for ftrace_lock
Date: Wed, 4 Mar 2026 15:46:48 +0800
Message-ID: <20260304074650.58165-1-laoar.shao@gmail.com>
Background
==========
One of our latency-sensitive services reported random CPU %sys spikes.
After a thorough investigation, we identified the root cause. The key
kernel stacks are as follows:
- Task A
2026-02-14-16:53:40.938243: [CPU198] 2156302(bpftrace) cgrp:4019437 pod:4019253
find_kallsyms_symbol+142
module_address_lookup+104
kallsyms_lookup_buildid+203
kallsyms_lookup+20
print_rec+64
t_show+67
seq_read_iter+709
seq_read+165
vfs_read+165
ksys_read+103
__x64_sys_read+25
do_syscall_64+56
entry_SYSCALL_64_after_hwframe+100
This task (2156302, bpftrace) is reading the
/sys/kernel/debug/tracing/available_filter_functions file to check whether
a function is traceable:
https://github.com/bpftrace/bpftrace/blob/master/src/tracefs/tracefs.h#L21
Reading the available_filter_functions file is time-consuming, as it
contains tens of thousands of functions:
$ cat /sys/kernel/debug/tracing/available_filter_functions | wc -l
61081
$ time cat /sys/kernel/debug/tracing/available_filter_functions > /dev/null
real 0m0.452s
user 0m0.000s
sys 0m0.452s
Consequently, the ftrace_lock is held by this task for an extended period.
- Other Tasks
2026-02-14-16:53:41.437094: [CPU79] 2156308(bpftrace) cgrp:4019437 pod:4019253
mutex_spin_on_owner+108
__mutex_lock.constprop.0+1132
__mutex_lock_slowpath+19
mutex_lock+56
t_start+51
seq_read_iter+250
seq_read+165
vfs_read+165
ksys_read+103
__x64_sys_read+25
do_syscall_64+56
entry_SYSCALL_64_after_hwframe+100
Since ftrace_lock is held by Task-A and Task-A is actively running on a
CPU, all other tasks waiting for the same lock will spin on their
respective CPUs. This leads to increased CPU pressure.
Reproduction
============
This issue can be reproduced simply by running
`cat available_filter_functions`.
- Single process reading available_filter_functions:
$ time cat /sys/kernel/tracing/available_filter_functions > /dev/null
real 0m0.452s
user 0m0.001s
sys 0m0.451s
- Six processes reading available_filter_functions simultaneously:
for i in `seq 0 5`; do
time cat /sys/kernel/tracing/available_filter_functions > /dev/null &
done
The results are as follows:
real 0m1.801s
user 0m0.000s
sys 0m1.779s
real 0m1.804s
user 0m0.001s
sys 0m1.791s
real 0m1.805s
user 0m0.000s
sys 0m1.792s
real 0m1.806s
user 0m0.001s
sys 0m1.796s
As more processes are added, the system time increases correspondingly.
Solution
========
One approach is to optimize the reading of available_filter_functions
itself. However, even a faster reader does not remove the underlying
problem: any mutex held for a long time while its owner stays on-CPU will
cause waiters to burn cycles in optimistic spinning.
Therefore, we need to consider an alternative solution that avoids
optimistic spinning for heavy mutexes that may be held for long durations.
Note that we do not want to disable CONFIG_MUTEX_SPIN_ON_OWNER entirely, as
that could lead to unexpected performance regressions.
In this patch set, a new wrapper, mutex_lock_nospin(), is introduced and
used for ftrace_lock to selectively disable optimistic spinning.
Yafang Shao (2):
locking: add mutex_lock_nospin()
ftrace: disable optimistic spinning for ftrace_lock
include/linux/mutex.h | 3 +++
kernel/locking/mutex.c | 39 +++++++++++++++++++++++++------
kernel/trace/ftrace.c | 52 +++++++++++++++++++++---------------------
3 files changed, 61 insertions(+), 33 deletions(-)
--
2.47.3
Thread overview: 29+ messages
2026-03-04 7:46 Yafang Shao [this message]
2026-03-04 7:46 ` [RFC PATCH 1/2] locking: add mutex_lock_nospin() Yafang Shao
2026-03-04 9:02 ` Peter Zijlstra
2026-03-04 9:37 ` Yafang Shao
2026-03-04 10:11 ` Peter Zijlstra
2026-03-04 11:52 ` Yafang Shao
2026-03-04 12:41 ` Peter Zijlstra
2026-03-04 14:25 ` Yafang Shao
2026-03-04 9:54 ` David Laight
2026-03-04 20:57 ` Steven Rostedt
2026-03-04 21:44 ` David Laight
2026-03-05 2:17 ` Yafang Shao
2026-03-05 2:28 ` Steven Rostedt
2026-03-05 2:33 ` Yafang Shao
2026-03-05 3:00 ` Steven Rostedt
2026-03-05 3:08 ` Yafang Shao
2026-03-05 4:30 ` Waiman Long
2026-03-05 5:40 ` Yafang Shao
2026-03-05 13:21 ` Steven Rostedt
2026-03-06 2:22 ` Yafang Shao
2026-03-06 10:00 ` David Laight
2026-03-09 2:34 ` Yafang Shao
2026-03-05 18:34 ` Waiman Long
2026-03-05 18:44 ` Waiman Long
2026-03-06 2:27 ` Yafang Shao
2026-03-05 9:32 ` David Laight
2026-03-05 19:00 ` Waiman Long
2026-03-06 2:33 ` Yafang Shao
2026-03-04 7:46 ` [RFC PATCH 2/2] ftrace: disable optimistic spinning for ftrace_lock Yafang Shao