From: Heiko Carstens <hca@linux.ibm.com>
To: Yang Shi <yang@os.amperecomputing.com>
Cc: David Laight <david.laight.linux@gmail.com>,
Alexander Gordeev <agordeev@linux.ibm.com>,
Sven Schnelle <svens@linux.ibm.com>,
Vasily Gorbik <gor@linux.ibm.com>,
Christian Borntraeger <borntraeger@linux.ibm.com>,
Juergen Christ <jchrist@linux.ibm.com>,
"Christoph Lameter (Ampere)" <cl@gentwo.org>,
Peter Zijlstra <peterz@infradead.org>,
Shrikanth Hegde <sshegde@linux.ibm.com>,
linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org
Subject: Re: [PATCH v3 0/9] s390: Improve this_cpu operations
Date: Thu, 28 May 2026 16:14:41 +0200
Message-ID: <20260528141441.15387D07-hca@linux.ibm.com>
In-Reply-To: <cfdb20a6-5621-417b-9202-d788ca34251b@os.amperecomputing.com>
On Wed, May 27, 2026 at 04:44:31PM -0700, Yang Shi wrote:
> On 5/22/26 2:18 AM, Heiko Carstens wrote:
> > It is amazing to see the performance improvements you see on arm64, however
> > I believe that is mainly because of the large amount of code which is
> > generated by the arm64 implementations of the preempt primitives
> > __preempt_count_add() and __preempt_count_dec_and_test().
>
> Yes, we need 4 instructions on ARM64 for disabling/enabling preempt (one
> instruction is used to load current pointer, the other 3 instructions are
> used to RMW preempt_count). So I can remove 8 instructions in total for a
> single this_cpu ops. That's a lot. Given this_cpu ops are heavily used in
> kernel, we end up running fewer instructions and having better icache hit
> rate, the better icache hit rate also helps reduce cross node traffic for
> 2-socket system.
You save more. Look at arm64's __preempt_count_dec_and_test()
implementation: it is RMW + compare + READ + compare.

preempt_enable() generates this code, where x1 seems to contain the
preempt_count pointer:

  80:   f9400420        ldr     x0, [x1, #8]
  84:   d1000400        sub     x0, x0, #0x1
  88:   b9000820        str     w0, [x1, #8]
  8c:   b4000060        cbz     x0, 98 <bar+0x58>
  90:   f9400420        ldr     x0, [x1, #8]
  94:   b5000040        cbnz    x0, 9c <bar+0x5c>
  98:   94000000        bl      0 <preempt_schedule_notrace>
  9c:   ...
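
For readers who prefer C over disassembly, the sequence above can be
modelled roughly as follows. This is only a userspace sketch, not the
kernel source: the union layout (little-endian, count in the low half,
a need_resched word in the high half that is 0 when a reschedule is
pending) is an assumption, and the names preempt_model,
model_dec_and_test() and model_run() are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical userspace model, not kernel code.
 * Assumed layout (little-endian): low 32 bits hold the nesting count,
 * high 32 bits hold a need_resched word that is 0 when a reschedule
 * is pending and 1 otherwise.
 */
union preempt_model {
	uint64_t whole;
	struct {
		uint32_t count;
		uint32_t need_resched;
	} part;
};

/* Returns true when the caller should reschedule. */
static bool model_dec_and_test(volatile union preempt_model *p)
{
	uint64_t pc;

	assert(p->part.count != 0);      /* unbalanced enable would be a bug */

	pc = p->whole;                   /* READ:    ldr x0, [x1, #8] */
	pc--;                            /* MODIFY:  sub x0, x0, #0x1 */
	p->part.count = (uint32_t)pc;    /* WRITE:   str w0, [x1, #8] */

	if (pc == 0)                     /* compare: cbz x0           */
		return true;
	return p->whole == 0;            /* READ + compare: ldr, cbnz */
}

/* Helper for experiments: build a state and run one decrement. */
static bool model_run(uint32_t count, uint32_t need_resched)
{
	volatile union preempt_model p;

	p.part.count = count;
	p.part.need_resched = need_resched;
	return model_dec_and_test(&p);
}
```

In this model only the count half is written back, so the second load
is what would pick up a need_resched change made by an interrupt
between the first load and the store. With nothing interrupting,
model_run(1, 0) reports that a reschedule is due, while model_run(1, 1)
and model_run(2, 0) do not.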
I assume arm64's instruction set would not allow for better code for
__preempt_count_dec_and_test() even if you were to fold the need_resched
bit into preempt_count and use atomic instructions + inline assembly with
flag output operands when modifying preempt_count.

As of now only x86 and s390 are doing that.
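
To illustrate what folding buys, here is a minimal userspace sketch of
a folded decrement-and-test. It is not the x86 or s390 implementation:
the flag bit value, its inverted meaning (set = no reschedule pending)
and the names MODEL_NEED_RESCHED_INV, folded_dec_and_test() and
folded_run() are assumptions, and the GCC/Clang __atomic_sub_fetch()
builtin stands in for the inline assembly with flag output operands.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bit: set means NO reschedule is pending (inverted). */
#define MODEL_NEED_RESCHED_INV 0x80000000u

/*
 * One read-modify-write, one test of the result. The result is zero
 * only if the nesting count dropped to zero AND the inverted flag bit
 * was already clear, i.e. a reschedule is pending.
 */
static bool folded_dec_and_test(uint32_t *pc)
{
	return __atomic_sub_fetch(pc, 1, __ATOMIC_RELAXED) == 0;
}

/* Helper for experiments: build a state and run one decrement. */
static bool folded_run(uint32_t count, bool resched_pending)
{
	uint32_t pc = count | (resched_pending ? 0u : MODEL_NEED_RESCHED_INV);

	assert(count != 0 && count < MODEL_NEED_RESCHED_INV);
	return folded_dec_and_test(&pc);
}
```

Because count and flag live in the same word and are updated by a
single operation, no second load is needed to recheck the flag.
folded_run(1, true) returns true; folded_run(1, false) and
folded_run(2, true) return false.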