Linux s390 Architecture development
From: David Laight <david.laight.linux@gmail.com>
To: Yang Shi <yang@os.amperecomputing.com>
Cc: Heiko Carstens <hca@linux.ibm.com>,
	Alexander Gordeev <agordeev@linux.ibm.com>,
	Sven Schnelle <svens@linux.ibm.com>,
	Vasily Gorbik <gor@linux.ibm.com>,
	Christian Borntraeger <borntraeger@linux.ibm.com>,
	Juergen Christ <jchrist@linux.ibm.com>,
	"Christoph Lameter (Ampere)" <cl@gentwo.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Shrikanth Hegde <sshegde@linux.ibm.com>,
	linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org
Subject: Re: [PATCH v3 0/9] s390: Improve this_cpu operations
Date: Thu, 28 May 2026 21:34:05 +0100
Message-ID: <20260528213406.134cf354@pumpkin>
In-Reply-To: <37aa12be-f3ee-4b6d-8fcc-33ccdec2725b@os.amperecomputing.com>

On Thu, 28 May 2026 12:19:43 -0700
Yang Shi <yang@os.amperecomputing.com> wrote:

> On 5/28/26 2:03 AM, David Laight wrote:
> > On Wed, 27 May 2026 16:44:31 -0700
> > Yang Shi <yang@os.amperecomputing.com> wrote:
> >  
> >> On 5/22/26 2:18 AM, Heiko Carstens wrote:  
> > ...  
> >>> It is amazing to see the performance improvements you see on arm64, however
> >>> I believe that is mainly because of the large amount of code which is
> >>> generated by the arm64 implementations of the preempt primitives
> >>> __preempt_count_add() and __preempt_count_dec_and_test().  
> >> Yes, we need 4 instructions on ARM64 for disabling/enabling preempt (one
> >> instruction is used to load current pointer, the other 3 instructions
> >> are used to RMW preempt_count). So I can remove 8 instructions in total
> >> for a single this_cpu ops. That's a lot. Given this_cpu ops are heavily
> >> used in kernel, we end up running fewer instructions and having better
> >> icache hit rate, the better icache hit rate also helps reduce cross node
> >> traffic for 2-socket system.  
> > Is 'current' kept in a cpu hardware register?  
> 
> Yes, sp_el0. But it is a special register, we need move it to a general 
> register before any ARM64 instructions can access it.

That is what I thought.
(Hmm... isn't that the userspace stack register?)

> 
> > With the process switch code updating current->per_cpu_data.
> >
> > That might mean that you can access per-cpu data without disabling
> > preemption (for single ops) using the same technique as s390.
> > So something like:
> > 	mov %ra, current
> > 	movb per_cpu_reg(%ra), $b
> > 	mov %rb, per_cpu_data(%ra)
> > 	// per-cpu access using %rb, process switch code will update %rb
> > 	movb per_cpu_reg(%ra), $255
> >
> > An add will need to use a cmpxchg loop.
> > For simplicity use a fixed register for %rb.  
> 
> TBH, I can't say I fully understand what you proposed. But it sounds 
> like this one 
> https://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git/commit/?id=84ee5f23f93d4a650e828f831da9ed29c54623c5

Not really, although it does describe one way to do an atomic add.
For things like per-cpu stats you don't really care if the
'wrong' stats are changed, but the R and W (of the RMW) need to go to the
same address.
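
As a rough illustration of the "same address" requirement, here is a
user-space sketch with C11 atomics. The slot address is computed once and
the compare-exchange loop always retries against that address, so the read
and the write can never be split across two per-cpu areas. The name
stat_add() and the idea of passing the slot as a pointer are assumptions
made for this example; this is not the kernel's or the linked commit's code.

```c
#include <assert.h>
#include <stdatomic.h>

/*
 * Hypothetical helper: add 'delta' to one statistics slot.
 * The caller resolves the slot address once; every retry of the
 * compare-exchange targets that same address.
 */
static void stat_add(_Atomic unsigned long *slot, unsigned long delta)
{
	unsigned long old = atomic_load_explicit(slot, memory_order_relaxed);

	/* On failure 'old' is refreshed with the current value, then retry. */
	while (!atomic_compare_exchange_weak_explicit(slot, &old, old + delta,
						      memory_order_relaxed,
						      memory_order_relaxed))
		;
}
```

If the task migrates after the address was resolved, the update lands in
the previous cpu's slot, which is the 'wrong stats' case that is harmless
for counters.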

That proposal reserved a 'general register' for the per-cpu data all the time.

Like the s390 code this all started with, I'm suggesting that the code
tells the context switch code that a specific register contains the base
of the per-cpu data. On context switch that register is changed to the
base address of the per-cpu data for the new cpu.
So outside of the code accessing per-cpu data the register can be used normally.
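
A minimal single-threaded model of that scheme in C may make it more
concrete. Everything here is hypothetical and only mirrors the pseudo-asm
quoted above: struct task stands in for 'current', per_cpu_reg is the
marker byte (255 meaning 'no register holds a per-cpu base'), and
migrate() plays the part of the process switch code. It does not model
atomicity of the access itself.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS      2
#define NR_REGS      4
#define NO_PCPU_REG  255	/* marker: no register holds a per-cpu base */

/* Hypothetical per-cpu area with a single counter. */
struct pcpu_area { unsigned long counter; };
static struct pcpu_area pcpu[NR_CPUS];

/* Hypothetical task state: saved registers plus the marker byte. */
struct task {
	uintptr_t regs[NR_REGS];
	uint8_t per_cpu_reg;	/* index into regs[], or NO_PCPU_REG */
	int cpu;
};

/* Publish which register is used, then load the per-cpu base into it. */
static void pcpu_enter(struct task *t, uint8_t reg)
{
	assert(reg < NR_REGS);
	t->per_cpu_reg = reg;
	t->regs[reg] = (uintptr_t)&pcpu[t->cpu];
}

/* Leave the section: the register is an ordinary register again. */
static void pcpu_exit(struct task *t)
{
	t->per_cpu_reg = NO_PCPU_REG;
}

/* Stand-in for the process switch: fix up the marked register, if any. */
static void migrate(struct task *t, int new_cpu)
{
	t->cpu = new_cpu;
	if (t->per_cpu_reg != NO_PCPU_REG)
		t->regs[t->per_cpu_reg] = (uintptr_t)&pcpu[new_cpu];
}

/* The task migrates between loading the base and using it. */
static int demo(void)
{
	struct task t = { .per_cpu_reg = NO_PCPU_REG, .cpu = 0 };

	pcpu_enter(&t, 1);
	migrate(&t, 1);		/* regs[1] now points at cpu 1's area */
	((struct pcpu_area *)t.regs[1])->counter += 1;
	pcpu_exit(&t);

	return pcpu[0].counter == 0 && pcpu[1].counter == 1;
}
```

In this model the increment lands in cpu 1's area even though the base
was loaded while the task was on cpu 0, and a register that is not marked
is left untouched by migrate().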

I don't think you need to look at the opcode in the process switch code
(the s390 code did); even checking that %rb (above) contains the per-cpu
data address is really optional.

I suggested using a fixed register, meaning 'always use the same register',
to save the difficulty of generating $n from %rn.

-- David

> 
> Thanks,
> Yang
> 
> >
> > -- David  
> 