Date: Wed, 3 Jun 2026 14:17:27 +0000
From: Dmitry Ilvokhin
To: Peter Zijlstra
Cc: Ingo Molnar, Will Deacon, Boqun Feng, Waiman Long,
	Thomas Bogendoerfer, Juergen Gross, Ajay Kaher, Alexey Makhalov,
	Broadcom internal kernel review list, Thomas Gleixner,
	Borislav Petkov, Dave Hansen, x86@kernel.org, "H. Peter Anvin",
	Arnd Bergmann, Dennis Zhou, Tejun Heo, Christoph Lameter,
	Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
	linux-kernel@vger.kernel.org, linux-mips@vger.kernel.org,
	virtualization@lists.linux.dev, linux-arch@vger.kernel.org,
	linux-mm@kvack.org, linux-trace-kernel@vger.kernel.org,
	kernel-team@meta.com, "Paul E. McKenney"
Subject: Re: [PATCH v6 5/7] locking: Add contended_release tracepoint to qspinlock
References: <5d7ea75ffe74a785e6b234ada9f23c6373d4b4c1.1777999826.git.d@ilvokhin.com>
	<20260513193342.GB2545104@noisy.programming.kicks-ass.net>
	<20260603120811.GW3493090@noisy.programming.kicks-ass.net>
In-Reply-To: <20260603120811.GW3493090@noisy.programming.kicks-ass.net>

On Wed, Jun 03, 2026 at 02:08:11PM +0200, Peter Zijlstra wrote:
> On Thu, May 14, 2026 at 12:34:55PM +0000, Dmitry Ilvokhin wrote:
>
> > Baseline, in best case scenario of least number of executed
> > instructions.
> >
> >  3e0: endbr64                       ; 4 bytes (always executed)
> >  3e4: movb $0x0,(%rdi)              ; 3 bytes (unlock,
> >                                     ;   always executed)
> >  3e7: decl %gs:__preempt_count      ; 7 bytes (always executed)
> >  3ee: je 3f5                        ; 2 bytes (always executed)
> >  3f0: jmp __x86_return_thunk        ; 5 bytes (executed if above
> >                                     ;   je is not taken)
> >                                     ; rest is not executed
> >  3f5: call __SCT__preempt_schedule  ; 5 bytes
> >  3fa: jmp __x86_return_thunk        ; 5 bytes
> >
> > Tracepoint (again same case of least number of executed instructions).
> >
> >  bc0: endbr64                          ; 4 bytes (always executed)
> >  bc4: xchg %ax,%ax                     ; 2 bytes (always executed, this is an
> >                                        ;   only addition on the execution path).
> >  bc6: movb $0x0,(%rdi)                 ; 3 bytes (unlock, always executed)
> >  bc9: decl %gs:__preempt_count         ; 7 bytes (always executed)
> >  bd0: je bde                           ; 2 bytes (always executed)
> >  bd2: jmp __x86_return_thunk           ; 5 bytes (executed if above
> >                                        ;   je is not taken)
> >                                        ; rest is not executed
> >  bd7: call queued_spin_release_traced  ; 5 bytes
> >  bdc: jmp bc9                          ; 2 bytes
> >  bde: call __SCT__preempt_schedule     ; 5 bytes
> >  be3: jmp __x86_return_thunk           ; 5 bytes
> >
>
> So I've been playing with this a bit, and it is all really sad.
>
> Now, since pretty much everybody+dog will have PARAVIRT_SPINLOCK=y, the
> 'best' solution would be changing that paravirt call with a
> static_call(), that actually shrinks the code by 1 byte.
>
> And then this tracepoint nonsense can simply use a different unlock
> function, just like paravirt.
>
> 0000 00000000000001d0 <_raw_spin_unlock>:
> 0000  1d0: f3 0f 1e fa           endbr64
> 0004  1d4: ff 15 00 00 00 00     call *0x0(%rip)  # 1da <_raw_spin_unlock+0xa>  1d6: R_X86_64_PC32 pv_ops_lock+0x4
> 000a  1da: 65 ff 0d 00 00 00 00  decl %gs:0x0(%rip)  # 1e1 <_raw_spin_unlock+0x11>  1dd: R_X86_64_PC32 __preempt_count-0x4
> 0011  1e1: 74 06                 je 1e9 <_raw_spin_unlock+0x19>
> 0013  1e3: 2e e9 00 00 00 00     cs jmp 1e9 <_raw_spin_unlock+0x19>  1e5: R_X86_64_PLT32 __x86_return_thunk-0x4
> 0019  1e9: e8 00 00 00 00        call 1ee <_raw_spin_unlock+0x1e>  1ea: R_X86_64_PLT32 __SCT__preempt_schedule-0x4
> 001e  1ee: 2e e9 00 00 00 00     cs jmp 1f4 <_raw_spin_unlock+0x24>  1f0: R_X86_64_PLT32 __x86_return_thunk-0x4
>
>
> 0000 00000000000001d0 <_raw_spin_unlock>:
> 0000  1d0: f3 0f 1e fa           endbr64
> 0004  1d4: e8 00 00 00 00        call 1d9 <_raw_spin_unlock+0x9>  1d5: R_X86_64_PLT32 __SCT__queued_spin_unlock-0x4
> 0009  1d9: 65 ff 0d 00 00 00 00  decl %gs:0x0(%rip)  # 1e0 <_raw_spin_unlock+0x10>  1dc: R_X86_64_PC32 __preempt_count-0x4
> 0010  1e0: 74 06                 je 1e8 <_raw_spin_unlock+0x18>
> 0012  1e2: 2e e9 00 00 00 00     cs jmp 1e8 <_raw_spin_unlock+0x18>  1e4: R_X86_64_PLT32 __x86_return_thunk-0x4
> 0018  1e8: e8 00 00 00 00        call 1ed <_raw_spin_unlock+0x1d>  1e9: R_X86_64_PLT32 __SCT__preempt_schedule-0x4
> 001d  1ed: 2e e9 00 00 00 00     cs jmp 1f3 <_raw_spin_unlock+0x23>  1ef: R_X86_64_PLT32 __x86_return_thunk-0x4
>
>
> Something a little like so, which is completely untested, except to
> build kernel/locking/spinlock.o (with clang-23).

Thanks a lot for taking a look, Peter.

I like the static_call idea. It's truly zero cost on x86 (and, as you
note, even a byte smaller).
The one caveat is that it relies on HAVE_STATIC_CALL_INLINE to stay free.
So my plan would be: static_call where HAVE_STATIC_CALL_INLINE is
available (x86), and a static branch fallback elsewhere, gated behind a
default-off config so it imposes nothing on arches/kernels that don't opt
in. I'm mostly interested in x86, but would like arm64 to work too, which
would use the fallback.

Concretely:

1. Split the sleepable-lock patches out and send them separately.
   They're independent of the static call work and look far less
   controversial.

2. Convert the paravirt spinlock unlock to a static_call, as the
   foundation for the unlock tracepoint. I'm happy to take a stab at it.
   Let me know if you'd rather do it yourself.

3. Build the unlock tracepoint on top: static_call where it's cheap,
   config-gated static_branch fallback where it isn't.

Does this plan sound reasonable to you?

>
> Also, I think someone should go do some performance runs with
> ARCH_INLINE_SPIN_* set for x86 just like for s390.

That's a good point, I'll run benchmarks and report back with the
results.
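
To make the shape of step 3 concrete, here is a rough userspace model of
the two dispatch strategies. It is only a sketch and not kernel code: the
function pointer merely stands in for a static_call() target (a real
inline static_call patches the call site, so there is no indirect call),
the bool stands in for a static branch, and every name in it
(model_lock, model_unlock, MODEL_HAVE_STATIC_CALL_INLINE, the waiters
field) is hypothetical and made up for illustration.

```c
/*
 * Userspace model only; all names are hypothetical and nothing here is
 * the kernel's static_call or static_branch API.
 */
#include <assert.h>
#include <stdbool.h>

struct model_lock {
	unsigned char locked;
	unsigned int waiters;	/* hypothetical contention hint */
};

/* Counts contended releases, standing in for a tracepoint hit. */
static unsigned long traced_releases;

/* Plain unlock: what runs when tracing is off. */
static void model_unlock_plain(struct model_lock *lock)
{
	lock->locked = 0;
}

/* Traced unlock: notes a release with waiters present, then unlocks. */
static void model_unlock_traced(struct model_lock *lock)
{
	if (lock->waiters)
		traced_releases++;
	lock->locked = 0;
}

#ifdef MODEL_HAVE_STATIC_CALL_INLINE
/* static_call-like: enabling tracing retargets the call itself. */
static void (*model_unlock_target)(struct model_lock *) = model_unlock_plain;

static void model_trace_enable(bool on)
{
	model_unlock_target = on ? model_unlock_traced : model_unlock_plain;
}

static void model_unlock(struct model_lock *lock)
{
	assert(lock->locked);	/* unlocking a free lock is a model bug */
	model_unlock_target(lock);
}
#else
/* static_branch-like fallback: a test on the unlock path picks the variant. */
static bool model_trace_on;

static void model_trace_enable(bool on)
{
	model_trace_on = on;
}

static void model_unlock(struct model_lock *lock)
{
	assert(lock->locked);	/* unlocking a free lock is a model bug */
	if (model_trace_on)
		model_unlock_traced(lock);
	else
		model_unlock_plain(lock);
}
#endif
```

Callers see the same model_unlock() and model_trace_enable() in both
builds; only the mechanism that selects the traced variant differs, which
is the property the config-gated fallback would need to preserve.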