From: Peter Zijlstra <peterz@infradead.org>
To: Dmitry Ilvokhin <d@ilvokhin.com>
Cc: Ingo Molnar <mingo@redhat.com>, Will Deacon <will@kernel.org>,
Boqun Feng <boqun@kernel.org>, Waiman Long <longman@redhat.com>,
Thomas Bogendoerfer <tsbogend@alpha.franken.de>,
Juergen Gross <jgross@suse.com>,
Ajay Kaher <ajay.kaher@broadcom.com>,
Alexey Makhalov <alexey.makhalov@broadcom.com>,
Broadcom internal kernel review list
<bcm-kernel-feedback-list@broadcom.com>,
Thomas Gleixner <tglx@kernel.org>, Borislav Petkov <bp@alien8.de>,
Dave Hansen <dave.hansen@linux.intel.com>,
x86@kernel.org, "H. Peter Anvin" <hpa@zytor.com>,
Arnd Bergmann <arnd@arndb.de>, Dennis Zhou <dennis@kernel.org>,
Tejun Heo <tj@kernel.org>, Christoph Lameter <cl@gentwo.org>,
Steven Rostedt <rostedt@goodmis.org>,
Masami Hiramatsu <mhiramat@kernel.org>,
Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
linux-kernel@vger.kernel.org, linux-mips@vger.kernel.org,
virtualization@lists.linux.dev, linux-arch@vger.kernel.org,
linux-mm@kvack.org, linux-trace-kernel@vger.kernel.org,
kernel-team@meta.com, "Paul E. McKenney" <paulmck@kernel.org>
Subject: Re: [PATCH v6 5/7] locking: Add contended_release tracepoint to qspinlock
Date: Wed, 3 Jun 2026 14:08:11 +0200
Message-ID: <20260603120811.GW3493090@noisy.programming.kicks-ass.net>
In-Reply-To: <agXBb0ga_6HJrrnm@shell.ilvokhin.com>
On Thu, May 14, 2026 at 12:34:55PM +0000, Dmitry Ilvokhin wrote:
> Baseline, in best case scenario of least number of executed
> instructions.
>
> 3e0: endbr64 ; 4 bytes (always executed)
> 3e4: movb $0x0,(%rdi) ; 3 bytes (unlock,
> ; always executed)
> 3e7: decl %gs:__preempt_count ; 7 bytes (always executed)
> 3ee: je 3f5 ; 2 bytes (always executed)
> 3f0: jmp __x86_return_thunk ; 5 bytes (executed if above
> ; je is not taken)
> ; rest is not executed
> 3f5: call __SCT__preempt_schedule ; 5 bytes
> 3fa: jmp __x86_return_thunk ; 5 bytes
>
> Tracepoint (again same case of least number of executed instructions).
>
> bc0: endbr64 ; 4 bytes (always executed)
> bc4: xchg %ax,%ax ; 2 bytes (always executed, this is an
> ; only addition on the execution path).
> bc6: movb $0x0,(%rdi) ; 3 bytes (unlock, always executed)
> bc9: decl %gs:__preempt_count ; 7 bytes (always executed)
> bd0: je bde ; 2 bytes (always executed)
> bd2: jmp __x86_return_thunk ; 5 bytes (executed if above
> ; je is not taken)
> ; rest is not executed
> bd7: call queued_spin_release_traced ; 5 bytes
> bdc: jmp bc9 ; 2 bytes
> bde: call __SCT__preempt_schedule ; 5 bytes
> be3: jmp __x86_return_thunk ; 5 bytes
>
So I've been playing with this a bit, and it is all really sad.

Now, since pretty much everybody+dog will have PARAVIRT_SPINLOCKS=y, the
'best' solution would be replacing that paravirt call with a
static_call(), which actually shrinks the code by 1 byte.

And then this tracepoint nonsense can simply use a different unlock
function, just like paravirt.

Current code, with the indirect call through pv_ops_lock:

0000 00000000000001d0 <_raw_spin_unlock>:
0000 1d0: f3 0f 1e fa endbr64
0004 1d4: ff 15 00 00 00 00 call *0x0(%rip) # 1da <_raw_spin_unlock+0xa> 1d6: R_X86_64_PC32 pv_ops_lock+0x4
000a 1da: 65 ff 0d 00 00 00 00 decl %gs:0x0(%rip) # 1e1 <_raw_spin_unlock+0x11> 1dd: R_X86_64_PC32 __preempt_count-0x4
0011 1e1: 74 06 je 1e9 <_raw_spin_unlock+0x19>
0013 1e3: 2e e9 00 00 00 00 cs jmp 1e9 <_raw_spin_unlock+0x19> 1e5: R_X86_64_PLT32 __x86_return_thunk-0x4
0019 1e9: e8 00 00 00 00 call 1ee <_raw_spin_unlock+0x1e> 1ea: R_X86_64_PLT32 __SCT__preempt_schedule-0x4
001e 1ee: 2e e9 00 00 00 00 cs jmp 1f4 <_raw_spin_unlock+0x24> 1f0: R_X86_64_PLT32 __x86_return_thunk-0x4

With the static_call() trampoline instead:

0000 00000000000001d0 <_raw_spin_unlock>:
0000 1d0: f3 0f 1e fa endbr64
0004 1d4: e8 00 00 00 00 call 1d9 <_raw_spin_unlock+0x9> 1d5: R_X86_64_PLT32 __SCT__queued_spin_unlock-0x4
0009 1d9: 65 ff 0d 00 00 00 00 decl %gs:0x0(%rip) # 1e0 <_raw_spin_unlock+0x10> 1dc: R_X86_64_PC32 __preempt_count-0x4
0010 1e0: 74 06 je 1e8 <_raw_spin_unlock+0x18>
0012 1e2: 2e e9 00 00 00 00 cs jmp 1e8 <_raw_spin_unlock+0x18> 1e4: R_X86_64_PLT32 __x86_return_thunk-0x4
0018 1e8: e8 00 00 00 00 call 1ed <_raw_spin_unlock+0x1d> 1e9: R_X86_64_PLT32 __SCT__preempt_schedule-0x4
001d 1ed: 2e e9 00 00 00 00 cs jmp 1f3 <_raw_spin_unlock+0x23> 1ef: R_X86_64_PLT32 __x86_return_thunk-0x4

Something like the patch below, which is completely untested beyond
building kernel/locking/spinlock.o (with clang-23).

Also, I think someone should go do some performance runs with
ARCH_INLINE_SPIN_* set for x86, just like for s390.

Of course, this only covers x86, and it doesn't tie in nicely with
tracepoints, but it does give the sanest asm.
---
diff --git a/arch/x86/hyperv/hv_spinlock.c b/arch/x86/hyperv/hv_spinlock.c
index 210b494e4de0..6b4bdea18218 100644
--- a/arch/x86/hyperv/hv_spinlock.c
+++ b/arch/x86/hyperv/hv_spinlock.c
@@ -78,8 +78,8 @@ void __init hv_init_spinlocks(void)
pr_info("PV spinlocks enabled\n");
__pv_init_lock_hash();
- pv_ops_lock.queued_spin_lock_slowpath = __pv_queued_spin_lock_slowpath;
- pv_ops_lock.queued_spin_unlock = PV_CALLEE_SAVE(__pv_queued_spin_unlock);
+ static_call_update(queued_spin_lock_slowpath, __pv_queued_spin_lock_slowpath);
+ static_call_update(queued_spin_unlock, __raw_callee_save___pv_queued_spin_unlock);
pv_ops_lock.wait = hv_qlock_wait;
pv_ops_lock.kick = hv_qlock_kick;
pv_ops_lock.vcpu_is_preempted = PV_CALLEE_SAVE(hv_vcpu_is_preempted);
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 1d506e5d6f46..52e7ce10df57 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -225,7 +225,6 @@
#define X86_FEATURE_EPT_AD ( 8*32+17) /* "ept_ad" Intel Extended Page Table access-dirty bit */
#define X86_FEATURE_VMCALL ( 8*32+18) /* Hypervisor supports the VMCALL instruction */
#define X86_FEATURE_VMW_VMMCALL ( 8*32+19) /* VMware prefers VMMCALL hypercall instruction */
-#define X86_FEATURE_PVUNLOCK ( 8*32+20) /* PV unlock function */
#define X86_FEATURE_VCPUPREEMPT ( 8*32+21) /* PV vcpu_is_preempted function */
#define X86_FEATURE_TDX_GUEST ( 8*32+22) /* "tdx_guest" Intel Trust Domain Extensions Guest */
diff --git a/arch/x86/include/asm/paravirt-spinlock.h b/arch/x86/include/asm/paravirt-spinlock.h
index 7beffcb08ed6..553910f92906 100644
--- a/arch/x86/include/asm/paravirt-spinlock.h
+++ b/arch/x86/include/asm/paravirt-spinlock.h
@@ -3,6 +3,7 @@
#define _ASM_X86_PARAVIRT_SPINLOCK_H
#include <asm/paravirt_types.h>
+#include <linux/static_call_types.h>
#ifdef CONFIG_SMP
#include <asm/spinlock_types.h>
@@ -11,9 +12,6 @@
struct qspinlock;
struct pv_lock_ops {
- void (*queued_spin_lock_slowpath)(struct qspinlock *lock, u32 val);
- struct paravirt_callee_save queued_spin_unlock;
-
void (*wait)(u8 *ptr, u8 val);
void (*kick)(int cpu);
@@ -26,20 +24,23 @@ extern struct pv_lock_ops pv_ops_lock;
extern void native_queued_spin_lock_slowpath(struct qspinlock *lock, u32 val);
extern void __pv_init_lock_hash(void);
extern void __pv_queued_spin_lock_slowpath(struct qspinlock *lock, u32 val);
+extern void __raw_callee_save___native_queued_spin_unlock(struct qspinlock *lock);
extern void __raw_callee_save___pv_queued_spin_unlock(struct qspinlock *lock);
extern bool nopvspin;
+DECLARE_STATIC_CALL(queued_spin_lock_slowpath, native_queued_spin_lock_slowpath);
+DECLARE_STATIC_CALL(queued_spin_unlock, __raw_callee_save___native_queued_spin_unlock);
+
static __always_inline void pv_queued_spin_lock_slowpath(struct qspinlock *lock,
u32 val)
{
- PVOP_VCALL2(pv_ops_lock, queued_spin_lock_slowpath, lock, val);
+ static_call(queued_spin_lock_slowpath)(lock, val);
}
static __always_inline void pv_queued_spin_unlock(struct qspinlock *lock)
{
- PVOP_ALT_VCALLEE1(pv_ops_lock, queued_spin_unlock, lock,
- "movb $0, (%%" _ASM_ARG1 ")",
- ALT_NOT(X86_FEATURE_PVUNLOCK));
+ __STATIC_CALL_MOD_ADDRESSABLE(queued_spin_unlock);
+ asm volatile ("call " STATIC_CALL_TRAMP_STR(queued_spin_unlock) : ASM_CALL_CONSTRAINT);
}
static __always_inline bool pv_vcpu_is_preempted(long cpu)
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 29226d112029..5908d9fd94bb 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -1135,9 +1135,8 @@ void __init kvm_spinlock_init(void)
pr_info("PV spinlocks enabled\n");
__pv_init_lock_hash();
- pv_ops_lock.queued_spin_lock_slowpath = __pv_queued_spin_lock_slowpath;
- pv_ops_lock.queued_spin_unlock =
- PV_CALLEE_SAVE(__pv_queued_spin_unlock);
+ static_call_update(queued_spin_lock_slowpath, __pv_queued_spin_lock_slowpath);
+ static_call_update(queued_spin_unlock, __raw_callee_save___pv_queued_spin_unlock);
pv_ops_lock.wait = kvm_wait;
pv_ops_lock.kick = kvm_kick_cpu;
diff --git a/arch/x86/kernel/paravirt-spinlocks.c b/arch/x86/kernel/paravirt-spinlocks.c
index 95452444868f..28407304f123 100644
--- a/arch/x86/kernel/paravirt-spinlocks.c
+++ b/arch/x86/kernel/paravirt-spinlocks.c
@@ -25,9 +25,12 @@ __visible void __native_queued_spin_unlock(struct qspinlock *lock)
}
PV_CALLEE_SAVE_REGS_THUNK(__native_queued_spin_unlock);
+DEFINE_STATIC_CALL(queued_spin_lock_slowpath, native_queued_spin_lock_slowpath);
+DEFINE_STATIC_CALL(queued_spin_unlock, __raw_callee_save___native_queued_spin_unlock);
+
bool pv_is_native_spin_unlock(void)
{
- return pv_ops_lock.queued_spin_unlock.func ==
+ return static_call_query(queued_spin_unlock) ==
__raw_callee_save___native_queued_spin_unlock;
}
@@ -45,16 +48,11 @@ bool pv_is_native_vcpu_is_preempted(void)
void __init paravirt_set_cap(void)
{
- if (!pv_is_native_spin_unlock())
- setup_force_cpu_cap(X86_FEATURE_PVUNLOCK);
-
if (!pv_is_native_vcpu_is_preempted())
setup_force_cpu_cap(X86_FEATURE_VCPUPREEMPT);
}
struct pv_lock_ops pv_ops_lock = {
- .queued_spin_lock_slowpath = native_queued_spin_lock_slowpath,
- .queued_spin_unlock = PV_CALLEE_SAVE(__native_queued_spin_unlock),
.wait = paravirt_nop,
.kick = paravirt_nop,
.vcpu_is_preempted = PV_CALLEE_SAVE(__native_vcpu_is_preempted),
diff --git a/arch/x86/kernel/static_call.c b/arch/x86/kernel/static_call.c
index 61592e41a6b1..1268b7dc93b1 100644
--- a/arch/x86/kernel/static_call.c
+++ b/arch/x86/kernel/static_call.c
@@ -3,6 +3,7 @@
#include <linux/memory.h>
#include <linux/bug.h>
#include <asm/text-patching.h>
+#include <asm/paravirt-spinlock.h>
enum insn_type {
CALL = 0, /* site call */
@@ -31,6 +32,15 @@ static const u8 retinsn[] = { RET_INSN_OPCODE, 0xcc, 0xcc, 0xcc, 0xcc };
*/
static const u8 warninsn[] = { 0x67, 0x48, 0x0f, 0xb9, 0x3a };
+/*
+ * ds ds movb $0, (_ASM_ARG1)
+ */
+#ifdef CONFIG_64BIT
+static const u8 unlockinsn[] = { 0x3e, 0x3e, 0xc6, 0x07, 0x00 };
+#else
+static const u8 unlockinsn[] = { 0x3e, 0x3e, 0xc6, 0x00, 0x00 };
+#endif
+
static u8 __is_Jcc(u8 *insn) /* Jcc.d32 */
{
u8 ret = 0;
@@ -78,6 +88,10 @@ static void __ref __static_call_transform(void *insn, enum insn_type type,
emulate = code;
code = &warninsn;
}
+ if (func == &__raw_callee_save___native_queued_spin_unlock) {
+ emulate = code;
+ code = &unlockinsn;
+ }
break;
case NOP:
diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c
index 83ac24ead289..f718e535ea7c 100644
--- a/arch/x86/xen/spinlock.c
+++ b/arch/x86/xen/spinlock.c
@@ -134,9 +134,8 @@ void __init xen_init_spinlocks(void)
printk(KERN_DEBUG "xen: PV spinlocks enabled\n");
__pv_init_lock_hash();
- pv_ops_lock.queued_spin_lock_slowpath = __pv_queued_spin_lock_slowpath;
- pv_ops_lock.queued_spin_unlock =
- PV_CALLEE_SAVE(__pv_queued_spin_unlock);
+ static_call_update(queued_spin_lock_slowpath, __pv_queued_spin_lock_slowpath);
+ static_call_update(queued_spin_unlock, __raw_callee_save___pv_queued_spin_unlock);
pv_ops_lock.wait = xen_qlock_wait;
pv_ops_lock.kick = xen_qlock_kick;
pv_ops_lock.vcpu_is_preempted = PV_CALLEE_SAVE(xen_vcpu_stolen);
diff --git a/tools/arch/x86/include/asm/cpufeatures.h b/tools/arch/x86/include/asm/cpufeatures.h
index 86d17b195e79..61541f042f74 100644
--- a/tools/arch/x86/include/asm/cpufeatures.h
+++ b/tools/arch/x86/include/asm/cpufeatures.h
@@ -225,7 +225,6 @@
#define X86_FEATURE_EPT_AD ( 8*32+17) /* "ept_ad" Intel Extended Page Table access-dirty bit */
#define X86_FEATURE_VMCALL ( 8*32+18) /* Hypervisor supports the VMCALL instruction */
#define X86_FEATURE_VMW_VMMCALL ( 8*32+19) /* VMware prefers VMMCALL hypercall instruction */
-#define X86_FEATURE_PVUNLOCK ( 8*32+20) /* PV unlock function */
#define X86_FEATURE_VCPUPREEMPT ( 8*32+21) /* PV vcpu_is_preempted function */
#define X86_FEATURE_TDX_GUEST ( 8*32+22) /* "tdx_guest" Intel Trust Domain Extensions Guest */