Linux virtualization list
 help / color / mirror / Atom feed
From: Dmitry Ilvokhin <d@ilvokhin.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>, Will Deacon <will@kernel.org>,
	Boqun Feng <boqun@kernel.org>, Waiman Long <longman@redhat.com>,
	Thomas Bogendoerfer <tsbogend@alpha.franken.de>,
	Juergen Gross <jgross@suse.com>,
	Ajay Kaher <ajay.kaher@broadcom.com>,
	Alexey Makhalov <alexey.makhalov@broadcom.com>,
	Broadcom internal kernel review list
	<bcm-kernel-feedback-list@broadcom.com>,
	Thomas Gleixner <tglx@kernel.org>, Borislav Petkov <bp@alien8.de>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	x86@kernel.org, "H. Peter Anvin" <hpa@zytor.com>,
	Arnd Bergmann <arnd@arndb.de>, Dennis Zhou <dennis@kernel.org>,
	Tejun Heo <tj@kernel.org>, Christoph Lameter <cl@gentwo.org>,
	Steven Rostedt <rostedt@goodmis.org>,
	Masami Hiramatsu <mhiramat@kernel.org>,
	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
	linux-kernel@vger.kernel.org, linux-mips@vger.kernel.org,
	virtualization@lists.linux.dev, linux-arch@vger.kernel.org,
	linux-mm@kvack.org, linux-trace-kernel@vger.kernel.org,
	kernel-team@meta.com, "Paul E. McKenney" <paulmck@kernel.org>
Subject: Re: [PATCH v6 5/7] locking: Add contended_release tracepoint to qspinlock
Date: Wed, 3 Jun 2026 14:17:27 +0000	[thread overview]
Message-ID: <aiA3dzjKVq2_cpSY@shell.ilvokhin.com> (raw)
In-Reply-To: <20260603120811.GW3493090@noisy.programming.kicks-ass.net>

On Wed, Jun 03, 2026 at 02:08:11PM +0200, Peter Zijlstra wrote:
> On Thu, May 14, 2026 at 12:34:55PM +0000, Dmitry Ilvokhin wrote:
> 
> > Baseline, in best case scenario of least number of executed
> > instructions.
> > 
> >     3e0:  endbr64                          ; 4 bytes (always executed)
> >     3e4:  movb $0x0,(%rdi)                 ; 3 bytes (unlock,
> >                                            ; always executed)
> >     3e7:  decl %gs:__preempt_count         ; 7 bytes (always executed)
> >     3ee:  je   3f5                         ; 2 bytes (always executed)
> >     3f0:  jmp  __x86_return_thunk          ; 5 bytes (executed if above
> >                                            ; je is not taken)
> >                                            ; rest is not executed
> >     3f5:  call __SCT__preempt_schedule     ; 5 bytes
> >     3fa:  jmp  __x86_return_thunk          ; 5 bytes
> > 
> > Tracepoint (again same case of least number of executed instructions).
> > 
> >     bc0:  endbr64                          ; 4 bytes (always executed)
> >     bc4:  xchg %ax,%ax                     ; 2 bytes (always executed, this is an
> >                                            ; only addition on the execution path).
> >     bc6:  movb $0x0,(%rdi)                 ; 3 bytes (unlock, always executed)
> >     bc9:  decl %gs:__preempt_count         ; 7 bytes (always executed)
> >     bd0:  je   bde                         ; 2 bytes (always executed)
> >     bd2:  jmp  __x86_return_thunk          ; 5 bytes (executed if above
> >                                            ; je is not taken)
> >                                            ; rest is not executed
> >     bd7:  call queued_spin_release_traced  ; 5 bytes
> >     bdc:  jmp  bc9                         ; 2 bytes
> >     bde:  call __SCT__preempt_schedule     ; 5 bytes
> >     be3:  jmp  __x86_return_thunk          ; 5 bytes
> > 
> 
> So I've been playing with this a bit, and it is all really sad.
> 
> Now, since pretty much everybody+dog will have PARAVIRT_SPINLOCK=y, the
> 'best' solution would be changing that paravirt call with a
> static_call(), that actually shrinks the code by 1 byte.
> 
> And then this tracepoint nonsense can simply use a different unlock
> function, just like paravirt.
> 
> 0000 00000000000001d0 <_raw_spin_unlock>:
> 0000  1d0:	f3 0f 1e fa          	endbr64
> 0004  1d4:	ff 15 00 00 00 00    	call   *0x0(%rip)        # 1da <_raw_spin_unlock+0xa>	1d6: R_X86_64_PC32	pv_ops_lock+0x4
> 000a  1da:	65 ff 0d 00 00 00 00 	decl   %gs:0x0(%rip)        # 1e1 <_raw_spin_unlock+0x11>	1dd: R_X86_64_PC32	__preempt_count-0x4
> 0011  1e1:	74 06                	je     1e9 <_raw_spin_unlock+0x19>
> 0013  1e3:	2e e9 00 00 00 00    	cs jmp 1e9 <_raw_spin_unlock+0x19>	1e5: R_X86_64_PLT32	__x86_return_thunk-0x4
> 0019  1e9:	e8 00 00 00 00       	call   1ee <_raw_spin_unlock+0x1e>	1ea: R_X86_64_PLT32	__SCT__preempt_schedule-0x4
> 001e  1ee:	2e e9 00 00 00 00    	cs jmp 1f4 <_raw_spin_unlock+0x24>	1f0: R_X86_64_PLT32	__x86_return_thunk-0x4
> 
> 
> 0000 00000000000001d0 <_raw_spin_unlock>:
> 0000  1d0:	f3 0f 1e fa          	endbr64
> 0004  1d4:	e8 00 00 00 00       	call   1d9 <_raw_spin_unlock+0x9>	1d5: R_X86_64_PLT32	__SCT__queued_spin_unlock-0x4
> 0009  1d9:	65 ff 0d 00 00 00 00 	decl   %gs:0x0(%rip)        # 1e0 <_raw_spin_unlock+0x10>	1dc: R_X86_64_PC32	__preempt_count-0x4
> 0010  1e0:	74 06                	je     1e8 <_raw_spin_unlock+0x18>
> 0012  1e2:	2e e9 00 00 00 00    	cs jmp 1e8 <_raw_spin_unlock+0x18>	1e4: R_X86_64_PLT32	__x86_return_thunk-0x4
> 0018  1e8:	e8 00 00 00 00       	call   1ed <_raw_spin_unlock+0x1d>	1e9: R_X86_64_PLT32	__SCT__preempt_schedule-0x4
> 001d  1ed:	2e e9 00 00 00 00    	cs jmp 1f3 <_raw_spin_unlock+0x23>	1ef: R_X86_64_PLT32	__x86_return_thunk-0x4
> 
> 
> Something a little like so, which is completely untested, except to
> build kernel/locking/spinlock.o (with clang-23).

Thanks a lot for taking a look, Peter.

I like the static_call idea. It's truly zero cost on x86 (and, as you
note, even a byte smaller). The one caveat is that it relies on
HAVE_STATIC_CALL_INLINE to stay free. 

So my plan would be: static_call where HAVE_STATIC_CALL_INLINE is
available (x86), and a static branch fallback elsewhere, gated behind a
default-off config so it imposes nothing on arches/kernels that don't
opt in. I'm mostly interested in x86, but would like arm64 to work too,
which would use the fallback.

Concretely:

1. Split the sleepable-lock patches out and send them separately.
   They're independent of the static call work and look far less
   controversial.

2. Convert the paravirt spinlock unlock to a static_call, as the
   foundation for the unlock tracepoint. I'm happy to take a stab at it.
   Let me know if you'd rather do it yourself.

3. Build the unlock tracepoint on top: static_call where it's cheap,
   config-gated static_branch fallback where it isn't.

Does this plan sound reasonable to you?

> 
> Also, I think someone should go do some performance runs with
> ARCH_INLINE_SPIN_* set for x86 just like for s390.

That's a good point, I'll run benchmarks and report back with the
results.

  reply	other threads:[~2026-06-03 14:17 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-05 17:09 [PATCH v6 0/7] locking: contended_release tracepoint instrumentation Dmitry Ilvokhin
2026-05-05 17:09 ` [PATCH v6 1/7] tracing/lock: Remove unnecessary linux/sched.h include Dmitry Ilvokhin
2026-05-05 17:09 ` [PATCH v6 2/7] locking/percpu-rwsem: Extract __percpu_up_read() Dmitry Ilvokhin
2026-05-05 17:09 ` [PATCH v6 3/7] locking: Add contended_release tracepoint to sleepable locks Dmitry Ilvokhin
2026-05-05 17:09 ` [PATCH v6 4/7] locking: Factor out queued_spin_release() Dmitry Ilvokhin
2026-05-13 15:37   ` Steven Rostedt
2026-05-05 17:09 ` [PATCH v6 5/7] locking: Add contended_release tracepoint to qspinlock Dmitry Ilvokhin
2026-05-13 15:41   ` Steven Rostedt
2026-05-14 14:13     ` Dmitry Ilvokhin
2026-05-14 16:03       ` Steven Rostedt
2026-05-15 14:40         ` Dmitry Ilvokhin
2026-05-13 19:33   ` Peter Zijlstra
2026-05-14 12:34     ` Dmitry Ilvokhin
2026-05-27 13:30       ` Dmitry Ilvokhin
2026-06-03 12:08       ` Peter Zijlstra
2026-06-03 14:17         ` Dmitry Ilvokhin [this message]
2026-06-03 14:26           ` Peter Zijlstra
2026-05-05 17:09 ` [PATCH v6 6/7] locking: Factor out __queued_read_unlock()/__queued_write_unlock() Dmitry Ilvokhin
2026-05-13 15:41   ` Steven Rostedt
2026-05-05 17:09 ` [PATCH v6 7/7] locking: Add contended_release tracepoint to qrwlock Dmitry Ilvokhin
2026-05-13 15:43   ` Steven Rostedt
2026-05-13 19:26 ` [PATCH v6 0/7] locking: contended_release tracepoint instrumentation Peter Zijlstra

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aiA3dzjKVq2_cpSY@shell.ilvokhin.com \
    --to=d@ilvokhin.com \
    --cc=ajay.kaher@broadcom.com \
    --cc=alexey.makhalov@broadcom.com \
    --cc=arnd@arndb.de \
    --cc=bcm-kernel-feedback-list@broadcom.com \
    --cc=boqun@kernel.org \
    --cc=bp@alien8.de \
    --cc=cl@gentwo.org \
    --cc=dave.hansen@linux.intel.com \
    --cc=dennis@kernel.org \
    --cc=hpa@zytor.com \
    --cc=jgross@suse.com \
    --cc=kernel-team@meta.com \
    --cc=linux-arch@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mips@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-trace-kernel@vger.kernel.org \
    --cc=longman@redhat.com \
    --cc=mathieu.desnoyers@efficios.com \
    --cc=mhiramat@kernel.org \
    --cc=mingo@redhat.com \
    --cc=paulmck@kernel.org \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=tglx@kernel.org \
    --cc=tj@kernel.org \
    --cc=tsbogend@alpha.franken.de \
    --cc=virtualization@lists.linux.dev \
    --cc=will@kernel.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox