From: Heiko Carstens <hca@linux.ibm.com>
To: Yang Shi <yang@os.amperecomputing.com>
Cc: David Laight <david.laight.linux@gmail.com>,
Alexander Gordeev <agordeev@linux.ibm.com>,
Sven Schnelle <svens@linux.ibm.com>,
Vasily Gorbik <gor@linux.ibm.com>,
Christian Borntraeger <borntraeger@linux.ibm.com>,
Juergen Christ <jchrist@linux.ibm.com>,
"Christoph Lameter (Ampere)" <cl@gentwo.org>,
Peter Zijlstra <peterz@infradead.org>,
Shrikanth Hegde <sshegde@linux.ibm.com>,
linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org
Subject: Re: [PATCH v3 0/9] s390: Improve this_cpu operations
Date: Thu, 28 May 2026 16:14:41 +0200
Message-ID: <20260528141441.15387D07-hca@linux.ibm.com>
In-Reply-To: <cfdb20a6-5621-417b-9202-d788ca34251b@os.amperecomputing.com>

On Wed, May 27, 2026 at 04:44:31PM -0700, Yang Shi wrote:
> On 5/22/26 2:18 AM, Heiko Carstens wrote:
> > It is amazing to see the performance improvements you see on arm64, however
> > I believe that is mainly because of the large amount of code which is
> > generated by the arm64 implementations of the preempt primitives
> > __preempt_count_add() and __preempt_count_dec_and_test().
>
> Yes, we need 4 instructions on ARM64 for disabling/enabling preempt (one
> instruction to load the current pointer, the other 3 to RMW preempt_count).
> So I can remove 8 instructions in total for a single this_cpu op. That's a
> lot. Given that this_cpu ops are heavily used in the kernel, we end up
> running fewer instructions and getting a better icache hit rate, which
> also helps reduce cross-node traffic on 2-socket systems.

You save more. Look at arm64's __preempt_count_dec_and_test()
implementation: it is RMW + compare + READ + compare.

preempt_enable() generates this code, where x1 seems to contain the
preempt_count pointer:

  80:   f9400420        ldr     x0, [x1, #8]
  84:   d1000400        sub     x0, x0, #0x1
  88:   b9000820        str     w0, [x1, #8]
  8c:   b4000060        cbz     x0, 98 <bar+0x58>
  90:   f9400420        ldr     x0, [x1, #8]
  94:   b5000040        cbnz    x0, 9c <bar+0x5c>
  98:   94000000        bl      0 <preempt_schedule_notrace>
  9c:   ...
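
The sequence above can be modelled in plain C to make the "RMW + compare +
READ + compare" shape easier to see. This is only a userspace sketch, not
the kernel source: the struct and function names are made up for
illustration, and it assumes a layout with a 32-bit count next to a 32-bit
inverted need_resched word (0 meaning "reschedule pending"), read together
as one 64-bit value.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the per-thread preempt state (names are assumptions). */
struct model_preempt {
	uint32_t count;        /* nesting depth of preempt_disable() */
	uint32_t need_resched; /* inverted: 0 means a reschedule is pending */
};

/* Combined 64-bit view: count in the low half, need_resched in the high half. */
static inline uint64_t model_read64(const volatile struct model_preempt *p)
{
	return ((uint64_t)p->need_resched << 32) | p->count;
}

static inline bool model_dec_and_test(volatile struct model_preempt *p)
{
	uint64_t pc = model_read64(p);  /* ldr  x0, [x1, #8]   (64-bit load)  */

	pc--;                           /* sub  x0, x0, #0x1                  */
	p->count = (uint32_t)pc;        /* str  w0, [x1, #8]   (32-bit store) */

	if (!pc)                        /* cbz  x0, 98                        */
		return true;

	/* Re-read, since the flag half may have changed in between. */
	return !model_read64(p);        /* ldr  x0, [x1, #8] ; cbnz x0, 9c    */
}
```

A true return corresponds to falling through to the bl at offset 98. The
second load exists because the load and the store are separate
instructions, so the flag half can change between them.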

I assume arm64's instruction set does not allow for better code for
__preempt_count_dec_and_test(), even if you were to fold the need_resched
bit into preempt_count and use atomic instructions + inline assembly with
flag output operands when modifying preempt_count.

As of now only x86 and s390 are doing that.
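
For comparison, the folded variant can be sketched in portable C as a single
read-modify-write whose result is tested directly. This is a userspace
illustration under stated assumptions, not the x86 or s390 implementation:
the names are hypothetical, the inverted need_resched flag is assumed to
live in the top bit of a 32-bit word, and a C11 atomic stands in for the
single instruction plus flag output operand that the architecture code
would use.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical layout: bit 31 is the inverted need_resched flag
 * (set = no reschedule pending), bits 0..30 are the preempt count.
 */
#define MODEL_NEED_RESCHED_INV UINT32_C(0x80000000)

static inline bool model_folded_dec_and_test(_Atomic uint32_t *pc)
{
	/*
	 * One RMW, no second load: the new value is zero only when the
	 * count drops to 0 and the inverted flag bit is already clear,
	 * i.e. the old value was exactly 1.
	 */
	return atomic_fetch_sub_explicit(pc, 1, memory_order_relaxed) == 1;
}
```

Because count and flag change in one operation, there is no window between
a load and a store, so the re-read and second compare are not needed.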