From: Heiko Carstens <hca@linux.ibm.com>
To: Yang Shi <yang@os.amperecomputing.com>
Cc: David Laight <david.laight.linux@gmail.com>,
	Alexander Gordeev <agordeev@linux.ibm.com>,
	Sven Schnelle <svens@linux.ibm.com>,
	Vasily Gorbik <gor@linux.ibm.com>,
	Christian Borntraeger <borntraeger@linux.ibm.com>,
	Juergen Christ <jchrist@linux.ibm.com>,
	"Christoph Lameter (Ampere)" <cl@gentwo.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Shrikanth Hegde <sshegde@linux.ibm.com>,
	linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org
Subject: Re: [PATCH v3 0/9] s390: Improve this_cpu operations
Date: Thu, 28 May 2026 16:14:41 +0200
Message-ID: <20260528141441.15387D07-hca@linux.ibm.com>
In-Reply-To: <cfdb20a6-5621-417b-9202-d788ca34251b@os.amperecomputing.com>

On Wed, May 27, 2026 at 04:44:31PM -0700, Yang Shi wrote:
> On 5/22/26 2:18 AM, Heiko Carstens wrote:
> > It is amazing to see the performance improvements you see on arm64, however
> > I believe that is mainly because of the large amount of code which is
> > generated by the arm64 implementations of the preempt primitives
> > __preempt_count_add() and __preempt_count_dec_and_test().
> 
> Yes, we need 4 instructions on ARM64 for disabling/enabling preempt (one
> instruction is used to load current pointer, the other 3 instructions are
> used to RMW preempt_count). So I can remove 8 instructions in total for a
> single this_cpu ops. That's a lot. Given this_cpu ops are heavily used in
> kernel, we end up running fewer instructions and having better icache hit
> rate, the better icache hit rate also helps reduce cross node traffic for
> 2-socket system.

You save more. Look at arm64's __preempt_count_dec_and_test()
implementation: it is RMW + compare + READ + compare.

preempt_enable() generates this code, where x1 seems to contain the
preempt_count pointer:

  80:   f9400420        ldr     x0, [x1, #8]
  84:   d1000400        sub     x0, x0, #0x1
  88:   b9000820        str     w0, [x1, #8]
  8c:   b4000060        cbz     x0, 98 <bar+0x58>
  90:   f9400420        ldr     x0, [x1, #8]
  94:   b5000040        cbnz    x0, 9c <bar+0x5c>
  98:   94000000        bl      0 <preempt_schedule_notrace>
  9c:   ...
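
The listing above can be read as the following C. This is only a rough
user-space model, not the kernel source: union preempt_model,
model_dec_and_test() and model_selftest() are hypothetical names. It assumes
a little-endian layout, so the 32-bit count is the low half of the 64-bit
word, and it assumes a need_resched half of zero means a reschedule is
pending.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical user-space model, not kernel code. Assumes little-endian,
 * so "count" aliases the low 32 bits of "whole", and assumes that
 * need_resched == 0 means "a reschedule is pending".
 */
union preempt_model {
	uint64_t whole;
	struct {
		uint32_t count;
		uint32_t need_resched;
	} s;
};

/* Returns true when the caller would go on to call the scheduler. */
static bool model_dec_and_test(volatile union preempt_model *p)
{
	uint64_t pc = p->whole;       /* 80: 64-bit load           */

	pc--;                         /* 84: decrement             */
	p->s.count = (uint32_t)pc;    /* 88: 32-bit store of count */
	if (pc == 0)                  /* 8c: first compare         */
		return true;
	return p->whole == 0;         /* 90/94: reload and compare */
}

static void model_selftest(void)
{
	union preempt_model p;

	p.whole = 1;                       /* count 1, reschedule pending */
	assert(model_dec_and_test(&p));

	p.whole = (UINT64_C(1) << 32) | 1; /* count 1, nothing pending */
	assert(!model_dec_and_test(&p));
	assert(p.s.count == 0);
}
```

The second load is there because the load/store pair is not atomic: the
need_resched half may change between the first load and the store, so the
word is read again before deciding.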

I assume arm64's instruction set does not allow for better code for
__preempt_count_dec_and_test() if you were to fold the need_resched bit
into preempt_count and use atomic instructions plus inline assembly with
flag output operands when modifying preempt_count. As of now only x86 and
s390 are doing that.
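
For comparison, the folded approach can be sketched in portable C11. This
is an illustration under stated assumptions, not the x86 or s390
implementation: MODEL_NO_RESCHED, folded_dec_and_test() and
folded_selftest() are hypothetical names, and the flag is assumed to be
inverted, so that the whole word is zero only when the count is zero and a
reschedule is pending.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical: bit 31 set means "no reschedule pending" (inverted flag). */
#define MODEL_NO_RESCHED 0x80000000u

/* One atomic RMW; no second read of the word is needed. */
static bool folded_dec_and_test(atomic_uint *pc)
{
	/* fetch_sub returns the old value; old == 1 means the new value is 0 */
	return atomic_fetch_sub_explicit(pc, 1u, memory_order_relaxed) == 1u;
}

static void folded_selftest(void)
{
	atomic_uint pc;

	atomic_init(&pc, MODEL_NO_RESCHED | 1u);   /* count 1, nothing pending */
	assert(!folded_dec_and_test(&pc));

	atomic_store_explicit(&pc, 1u, memory_order_relaxed); /* count 1, pending */
	assert(folded_dec_and_test(&pc));
}
```

Portable C does not guarantee that the compiler turns this into a single
flag-setting instruction followed by a branch. That is the part the inline
assembly with flag output operands is meant to pin down.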

