From: David Laight <david.laight.linux@gmail.com>
To: Yang Shi <yang@os.amperecomputing.com>
Cc: Heiko Carstens <hca@linux.ibm.com>,
Alexander Gordeev <agordeev@linux.ibm.com>,
Sven Schnelle <svens@linux.ibm.com>,
Vasily Gorbik <gor@linux.ibm.com>,
Christian Borntraeger <borntraeger@linux.ibm.com>,
Juergen Christ <jchrist@linux.ibm.com>,
"Christoph Lameter (Ampere)" <cl@gentwo.org>,
Peter Zijlstra <peterz@infradead.org>,
Shrikanth Hegde <sshegde@linux.ibm.com>,
linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org
Subject: Re: [PATCH v3 0/9] s390: Improve this_cpu operations
Date: Thu, 28 May 2026 21:34:05 +0100
Message-ID: <20260528213406.134cf354@pumpkin>
In-Reply-To: <37aa12be-f3ee-4b6d-8fcc-33ccdec2725b@os.amperecomputing.com>
On Thu, 28 May 2026 12:19:43 -0700
Yang Shi <yang@os.amperecomputing.com> wrote:
> On 5/28/26 2:03 AM, David Laight wrote:
> > On Wed, 27 May 2026 16:44:31 -0700
> > Yang Shi <yang@os.amperecomputing.com> wrote:
> >
> >> On 5/22/26 2:18 AM, Heiko Carstens wrote:
> > ...
> >>> It is amazing to see the performance improvements you see on arm64, however
> >>> I believe that is mainly because of the large amount of code which is
> >>> generated by the arm64 implementations of the preempt primitives
> >>> __preempt_count_add() and __preempt_count_dec_and_test().
> >> Yes, we need 4 instructions on ARM64 for disabling/enabling preempt (one
> >> instruction is used to load current pointer, the other 3 instructions
> >> are used to RMW preempt_count). So I can remove 8 instructions in total
> >> for a single this_cpu ops. That's a lot. Given this_cpu ops are heavily
> >> used in kernel, we end up running fewer instructions and having better
> >> icache hit rate, the better icache hit rate also helps reduce cross node
> >> traffic for 2-socket system.
> > Is 'current' kept in a cpu hardware register?
>
> Yes, sp_el0. But it is a special register, we need move it to a general
> register before any ARM64 instructions can access it.
That is what I thought.
(Hmm... isn't that the userspace stack register?)
>
> > With the process switch code updating current->per_cpu_data.
> >
> > That might mean that you can access per-cpu data without disabling
> > preemption (for single ops) using the same technique as s390.
> > So something like:
> > mov %ra, current
> > movb per_cpu_reg(%ra), $b
> > mov %rb, per_cpu_data(%ra)
> > // per-cpu access using %rb, process switch code will update %rb
> > movb per_cpu_reg(%ra), $255
> >
> > An add will need to use a cmpxchg loop.
> > For simplicity use a fixed register for %rb.
>
> TBH, I can't say I fully understand what you proposed. But it sounds
> like this one
> https://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git/commit/?id=84ee5f23f93d4a650e828f831da9ed29c54623c5
Not really, although it does describe one way to do an atomic add.
For things like per-cpu stats you don't really care if the
'wrong' stats are changed, but the R and W (of the RMW) need to go to the
same address.
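
As a rough user-space sketch of that point (C11 atomics standing in for
whatever the architecture provides; NR_CPUS, stat[] and stat_add() are
made-up names, not anything from the kernel), the address is picked once
and both halves of the RMW then use it:

```c
#include <assert.h>
#include <stdatomic.h>

#define NR_CPUS 4			/* hypothetical cpu count */

static _Atomic long stat[NR_CPUS];	/* one counter per cpu */

/*
 * 'cpu' is whichever cpu the caller sampled. It may be stale by the
 * time the add happens, which is fine for statistics.
 */
static void stat_add(int cpu, long delta)
{
	assert(cpu >= 0 && cpu < NR_CPUS);

	_Atomic long *p = &stat[cpu];	/* address fixed once, here */
	long old = atomic_load_explicit(p, memory_order_relaxed);

	/* On failure 'old' is reloaded from *p, so just retry. */
	while (!atomic_compare_exchange_weak_explicit(p, &old, old + delta,
						      memory_order_relaxed,
						      memory_order_relaxed))
		;
}
```

Even if the task migrates between the load and the cmpxchg, the update
still lands on the counter that was read.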
The proposal in that commit reserved a 'general register' for the per-cpu
data all the time.
Like the s390 code this all started with, I'm suggesting that the code
tells the context switch code that a specific register contains the base
of the per-cpu data. On context switch that register is changed to be the
base address of the per-cpu data for the new cpu.
So outside of the code accessing per-cpu data the register can be used normally.
I don't think you need to look at the opcode in the process switch code
(the s390 code did). Even checking that %rb (above) contains the per-cpu
data address is really optional.
I suggested using a fixed register, meaning 'always use the same register',
to save the difficulty of generating $n from %rn.
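
A toy model of the fixup, in plain C rather than real context switch
code. Everything here is an assumption for illustration: struct task,
per_cpu_reg, percpu_base() and switch_fixup() are invented names, and
255 is just the 'no register' value from the pseudo-asm above:

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS		2
#define NR_REGS		8
#define NO_PERCPU_REG	255	/* the $255 in the pseudo-asm */

struct task {
	uintptr_t regs[NR_REGS];	/* saved general registers */
	unsigned char per_cpu_reg;	/* reg holding per-cpu base, or 255 */
};

static long percpu_area[NR_CPUS][4];	/* stand-in per-cpu data */

static uintptr_t percpu_base(int cpu)
{
	return (uintptr_t)percpu_area[cpu];
}

/* Run when 't' is about to resume on 'new_cpu'. */
static void switch_fixup(struct task *t, int new_cpu)
{
	if (t->per_cpu_reg == NO_PERCPU_REG)
		return;			/* not inside a per-cpu access */

	assert(t->per_cpu_reg < NR_REGS);
	/* No opcode inspection: just rewrite the flagged register. */
	t->regs[t->per_cpu_reg] = percpu_base(new_cpu);
}
```

Outside the marked window per_cpu_reg is 255, so the saved register is
left alone and can hold anything.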
-- David
>
> Thanks,
> Yang
>
> >
> > -- David
>