From: Dmitry Ilvokhin <d@ilvokhin.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>, Will Deacon <will@kernel.org>,
Boqun Feng <boqun@kernel.org>, Waiman Long <longman@redhat.com>,
Thomas Bogendoerfer <tsbogend@alpha.franken.de>,
Juergen Gross <jgross@suse.com>,
Ajay Kaher <ajay.kaher@broadcom.com>,
Alexey Makhalov <alexey.makhalov@broadcom.com>,
Broadcom internal kernel review list
<bcm-kernel-feedback-list@broadcom.com>,
Thomas Gleixner <tglx@kernel.org>, Borislav Petkov <bp@alien8.de>,
Dave Hansen <dave.hansen@linux.intel.com>,
x86@kernel.org, "H. Peter Anvin" <hpa@zytor.com>,
Arnd Bergmann <arnd@arndb.de>, Dennis Zhou <dennis@kernel.org>,
Tejun Heo <tj@kernel.org>, Christoph Lameter <cl@gentwo.org>,
Steven Rostedt <rostedt@goodmis.org>,
Masami Hiramatsu <mhiramat@kernel.org>,
Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
linux-kernel@vger.kernel.org, linux-mips@vger.kernel.org,
virtualization@lists.linux.dev, linux-arch@vger.kernel.org,
linux-mm@kvack.org, linux-trace-kernel@vger.kernel.org,
kernel-team@meta.com, "Paul E. McKenney" <paulmck@kernel.org>
Subject: Re: [PATCH v6 5/7] locking: Add contended_release tracepoint to qspinlock
Date: Wed, 3 Jun 2026 14:17:27 +0000 [thread overview]
Message-ID: <aiA3dzjKVq2_cpSY@shell.ilvokhin.com> (raw)
In-Reply-To: <20260603120811.GW3493090@noisy.programming.kicks-ass.net>
On Wed, Jun 03, 2026 at 02:08:11PM +0200, Peter Zijlstra wrote:
> On Thu, May 14, 2026 at 12:34:55PM +0000, Dmitry Ilvokhin wrote:
>
> > Baseline, in best case scenario of least number of executed
> > instructions.
> >
> > 3e0: endbr64 ; 4 bytes (always executed)
> > 3e4: movb $0x0,(%rdi) ; 3 bytes (unlock,
> > ; always executed)
> > 3e7: decl %gs:__preempt_count ; 7 bytes (always executed)
> > 3ee: je 3f5 ; 2 bytes (always executed)
> > 3f0: jmp __x86_return_thunk ; 5 bytes (executed if above
> > ; je is not taken)
> > ; rest is not executed
> > 3f5: call __SCT__preempt_schedule ; 5 bytes
> > 3fa: jmp __x86_return_thunk ; 5 bytes
> >
> > Tracepoint (again same case of least number of executed instructions).
> >
> > bc0: endbr64 ; 4 bytes (always executed)
> > bc4: xchg %ax,%ax ; 2 bytes (always executed, this is an
> > ; only addition on the execution path).
> > bc6: movb $0x0,(%rdi) ; 3 bytes (unlock, always executed)
> > bc9: decl %gs:__preempt_count ; 7 bytes (always executed)
> > bd0: je bde ; 2 bytes (always executed)
> > bd2: jmp __x86_return_thunk ; 5 bytes (executed if above
> > ; je is not taken)
> > ; rest is not executed
> > bd7: call queued_spin_release_traced ; 5 bytes
> > bdc: jmp bc9 ; 2 bytes
> > bde: call __SCT__preempt_schedule ; 5 bytes
> > be3: jmp __x86_return_thunk ; 5 bytes
> >
>
> So I've been playing with this a bit, and it is all really sad.
>
> Now, since pretty much everybody+dog will have PARAVIRT_SPINLOCK=y, the
> 'best' solution would be changing that paravirt call with a
> static_call(), that actually shrinks the code by 1 byte.
>
> And then this tracepoint nonsense can simply use a different unlock
> function, just like paravirt.
>
> 0000 00000000000001d0 <_raw_spin_unlock>:
> 0000 1d0: f3 0f 1e fa endbr64
> 0004 1d4: ff 15 00 00 00 00 call *0x0(%rip) # 1da <_raw_spin_unlock+0xa> 1d6: R_X86_64_PC32 pv_ops_lock+0x4
> 000a 1da: 65 ff 0d 00 00 00 00 decl %gs:0x0(%rip) # 1e1 <_raw_spin_unlock+0x11> 1dd: R_X86_64_PC32 __preempt_count-0x4
> 0011 1e1: 74 06 je 1e9 <_raw_spin_unlock+0x19>
> 0013 1e3: 2e e9 00 00 00 00 cs jmp 1e9 <_raw_spin_unlock+0x19> 1e5: R_X86_64_PLT32 __x86_return_thunk-0x4
> 0019 1e9: e8 00 00 00 00 call 1ee <_raw_spin_unlock+0x1e> 1ea: R_X86_64_PLT32 __SCT__preempt_schedule-0x4
> 001e 1ee: 2e e9 00 00 00 00 cs jmp 1f4 <_raw_spin_unlock+0x24> 1f0: R_X86_64_PLT32 __x86_return_thunk-0x4
>
>
> 0000 00000000000001d0 <_raw_spin_unlock>:
> 0000 1d0: f3 0f 1e fa endbr64
> 0004 1d4: e8 00 00 00 00 call 1d9 <_raw_spin_unlock+0x9> 1d5: R_X86_64_PLT32 __SCT__queued_spin_unlock-0x4
> 0009 1d9: 65 ff 0d 00 00 00 00 decl %gs:0x0(%rip) # 1e0 <_raw_spin_unlock+0x10> 1dc: R_X86_64_PC32 __preempt_count-0x4
> 0010 1e0: 74 06 je 1e8 <_raw_spin_unlock+0x18>
> 0012 1e2: 2e e9 00 00 00 00 cs jmp 1e8 <_raw_spin_unlock+0x18> 1e4: R_X86_64_PLT32 __x86_return_thunk-0x4
> 0018 1e8: e8 00 00 00 00 call 1ed <_raw_spin_unlock+0x1d> 1e9: R_X86_64_PLT32 __SCT__preempt_schedule-0x4
> 001d 1ed: 2e e9 00 00 00 00 cs jmp 1f3 <_raw_spin_unlock+0x23> 1ef: R_X86_64_PLT32 __x86_return_thunk-0x4
>
>
> Something a little like so, which is completely untested, except to
> build kernel/locking/spinlock.o (with clang-23).
Thanks a lot for taking a look, Peter.
I like the static_call idea. It's truly zero cost on x86 (and, as you
note, even a byte smaller). The one caveat is that it relies on
HAVE_STATIC_CALL_INLINE to stay free.
So my plan would be: static_call where HAVE_STATIC_CALL_INLINE is
available (x86), and a static branch fallback elsewhere, gated behind a
default-off config so it imposes nothing on arches/kernels that don't
opt in. I'm mostly interested in x86, but would like arm64 to work too,
which would use the fallback.
Concretely:
1. Split the sleepable-lock patches out and send them separately.
They're independent of the static call work and look far less
controversial.
2. Convert the paravirt spinlock unlock to a static_call, as the
foundation for the unlock tracepoint. I'm happy to take a stab at it.
Let me know if you'd rather do it yourself.
3. Build the unlock tracepoint on top: static_call where it's cheap,
config-gated static_branch fallback where it isn't.
Does this plan sound reasonable to you?
>
> Also, I think someone should go do some performance runs with
> ARCH_INLINE_SPIN_* set for x86 just like for s390.
That's a good point, I'll run benchmarks and report back with the
results.
next prev parent reply other threads:[~2026-06-03 14:17 UTC|newest]
Thread overview: 24+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-05-05 17:09 [PATCH v6 0/7] locking: contended_release tracepoint instrumentation Dmitry Ilvokhin
2026-05-05 17:09 ` [PATCH v6 1/7] tracing/lock: Remove unnecessary linux/sched.h include Dmitry Ilvokhin
2026-05-05 17:09 ` [PATCH v6 2/7] locking/percpu-rwsem: Extract __percpu_up_read() Dmitry Ilvokhin
2026-05-05 17:09 ` [PATCH v6 3/7] locking: Add contended_release tracepoint to sleepable locks Dmitry Ilvokhin
2026-05-05 17:09 ` [PATCH v6 4/7] locking: Factor out queued_spin_release() Dmitry Ilvokhin
2026-05-13 15:37 ` Steven Rostedt
2026-05-05 17:09 ` [PATCH v6 5/7] locking: Add contended_release tracepoint to qspinlock Dmitry Ilvokhin
2026-05-13 15:41 ` Steven Rostedt
2026-05-14 14:13 ` Dmitry Ilvokhin
2026-05-14 16:03 ` Steven Rostedt
2026-05-15 14:40 ` Dmitry Ilvokhin
2026-05-13 19:33 ` Peter Zijlstra
2026-05-14 12:34 ` Dmitry Ilvokhin
2026-05-27 13:30 ` Dmitry Ilvokhin
2026-06-03 12:08 ` Peter Zijlstra
2026-06-03 14:17 ` Dmitry Ilvokhin [this message]
2026-06-03 14:26 ` Peter Zijlstra
2026-06-11 7:17 ` Dmitry Ilvokhin
2026-06-11 13:44 ` Peter Zijlstra
2026-05-05 17:09 ` [PATCH v6 6/7] locking: Factor out __queued_read_unlock()/__queued_write_unlock() Dmitry Ilvokhin
2026-05-13 15:41 ` Steven Rostedt
2026-05-05 17:09 ` [PATCH v6 7/7] locking: Add contended_release tracepoint to qrwlock Dmitry Ilvokhin
2026-05-13 15:43 ` Steven Rostedt
2026-05-13 19:26 ` [PATCH v6 0/7] locking: contended_release tracepoint instrumentation Peter Zijlstra
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aiA3dzjKVq2_cpSY@shell.ilvokhin.com \
--to=d@ilvokhin.com \
--cc=ajay.kaher@broadcom.com \
--cc=alexey.makhalov@broadcom.com \
--cc=arnd@arndb.de \
--cc=bcm-kernel-feedback-list@broadcom.com \
--cc=boqun@kernel.org \
--cc=bp@alien8.de \
--cc=cl@gentwo.org \
--cc=dave.hansen@linux.intel.com \
--cc=dennis@kernel.org \
--cc=hpa@zytor.com \
--cc=jgross@suse.com \
--cc=kernel-team@meta.com \
--cc=linux-arch@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mips@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=linux-trace-kernel@vger.kernel.org \
--cc=longman@redhat.com \
--cc=mathieu.desnoyers@efficios.com \
--cc=mhiramat@kernel.org \
--cc=mingo@redhat.com \
--cc=paulmck@kernel.org \
--cc=peterz@infradead.org \
--cc=rostedt@goodmis.org \
--cc=tglx@kernel.org \
--cc=tj@kernel.org \
--cc=tsbogend@alpha.franken.de \
--cc=virtualization@lists.linux.dev \
--cc=will@kernel.org \
--cc=x86@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.