From: Ankur Arora <ankur.a.arora@oracle.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Ankur Arora <ankur.a.arora@oracle.com>,
	linux-kernel@vger.kernel.org, linux-arch@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, linux-pm@vger.kernel.org,
	bpf@vger.kernel.org, arnd@arndb.de, catalin.marinas@arm.com,
	will@kernel.org, peterz@infradead.org, mark.rutland@arm.com,
	harisokn@amazon.com, cl@gentwo.org, ast@kernel.org,
	rafael@kernel.org, daniel.lezcano@linaro.org, memxor@gmail.com,
	zhenglifeng1@huawei.com, xueshuai@linux.alibaba.com,
	rdunlap@infradead.org, david.laight.linux@gmail.com,
	joao.m.martins@oracle.com, boris.ostrovsky@oracle.com,
	konrad.wilk@oracle.com
Subject: Re: [PATCH v10 00/12] barrier: Add smp_cond_load_{relaxed,acquire}_timeout()
Date: Mon, 16 Mar 2026 15:08:07 -0700
Message-ID: <874imftol4.fsf@oracle.com>
In-Reply-To: <20260315184925.b6f93386e918ca79614843e3@linux-foundation.org>


Andrew Morton <akpm@linux-foundation.org> writes:

> On Sun, 15 Mar 2026 18:36:39 -0700 Ankur Arora <ankur.a.arora@oracle.com> wrote:
>
>> Hi,
>>
>> This series adds waited variants of the smp_cond_load() primitives:
>> smp_cond_load_relaxed_timeout(), and smp_cond_load_acquire_timeout().
>>
>> ...
>>
>
> How are we to determine that this change is successful, useful, etc?

Good point. This series was split off from a larger one:
  https://lore.kernel.org/lkml/20250218213337.377987-1-ankur.a.arora@oracle.com/

The series enables ARCH_HAS_CPU_RELAX on arm64, which should allow
relatively cheap polling in idle there.
However, it needs a few more patches from the series above to do that.

> Reduced CPU consumption?  Reduced energy usage?  Improved latencies?

With the additional patches this should improve wakeup latency:

  I ran the sched-pipe test with processes on VCPUs 4 and 5 with
  kvm-arm.wfi_trap_policy=notrap.

  # perf stat -r 5 --cpu 4,5 -e task-clock,cycles,instructions,sched:sched_wake_idle_without_ipi \
  perf bench sched pipe -l 1000000 -c 4

  # No haltpoll (and, no TIF_POLLING_NRFLAG):

  Performance counter stats for 'CPU(s) 4,5' (5 runs):

         25,229.57 msec task-clock                       #    2.000 CPUs utilized               ( +-  7.75% )
    45,821,250,284      cycles                           #    1.816 GHz                         ( +- 10.07% )
    26,557,496,665      instructions                     #    0.58  insn per cycle              ( +-  0.21% )
                 0      sched:sched_wake_idle_without_ipi #    0.000 /sec

       12.615 +- 0.977 seconds time elapsed  ( +-  7.75% )


  # Haltpoll:

  Performance counter stats for 'CPU(s) 4,5' (5 runs):

         15,131.58 msec task-clock                       #    2.000 CPUs utilized               ( +- 10.00% )
    34,158,188,839      cycles                           #    2.257 GHz                         ( +-  6.91% )
    20,824,950,916      instructions                     #    0.61  insn per cycle              ( +-  0.09% )
         1,983,822      sched:sched_wake_idle_without_ipi #  131.105 K/sec                       ( +-  0.78% )

        7.566 +- 0.756 seconds time elapsed  ( +- 10.00% )

  We get a decent boost just because we are executing ~20% fewer
  instructions. Not sure how the cpu frequency scaling works in a VM but
  we also run at a higher frequency.

(That specifically applies to guests, but that series also enables this
with acpi-idle for baremetal.)

(From: https://lore.kernel.org/lkml/877c9zhk68.fsf@oracle.com/)
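
As a quick cross-check of the quoted figures, the relative reductions
can be computed directly from the two perf runs. The helper below is a
hypothetical illustration (pct_reduction() is not part of either
series) and only restates arithmetic on the numbers quoted above:

```c
#include <assert.h>

/*
 * Hypothetical helper, for illustration only: percentage by which
 * `after` is lower than `before`.
 */
static double pct_reduction(double before, double after)
{
	assert(before > 0.0);
	return (before - after) / before * 100.0;
}
```

Fed the quoted counters, pct_reduction(26557496665.0, 20824950916.0)
gives roughly 21.6 for instructions, pct_reduction(45821250284.0,
34158188839.0) roughly 25.5 for cycles, and pct_reduction(12.615, 7.566)
roughly 40.0 for elapsed time. Note that both runs carry a run-to-run
variance of about 8-10% on elapsed time.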

>> Finally update poll_idle() and resilient queued spinlocks to use them.
>
> Have you identified other suitable sites for conversion?

Haven't found other places in the core kernel where this could be used.
I think one reason is that the typical kernel wait is unbounded.

There are some in drivers/ that have this pattern. For instance, I think
__arm_smmu_cmdq_poll_until_msi() in drivers/iommu/arm/arm-smmu-v3 could
be converted.

However, as David Laight pointed out in this thread
(https://lore.kernel.org/lkml/20260214113122.70627a8b@pumpkin/),
this would be fine so long as the polling is on memory, but would
need some work to handle MMIO.
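
For readers unfamiliar with the pattern, here is a minimal userspace
sketch of a bounded poll on a memory location, written with C11 atomics.
It is not the kernel interface: poll_flag_timeout(), now_ns() and the
choice to check the clock only every 256 spins are all assumptions made
for illustration. It also only covers the "polling on memory" case; an
MMIO register would need a device accessor rather than a plain load.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Monotonic time in nanoseconds (hypothetical helper). */
static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/*
 * Hypothetical sketch: spin with relaxed loads until (*ptr & mask) is
 * non-zero or until timeout_ns has elapsed. Returns the last value
 * loaded; the caller re-tests the condition to tell success from
 * timeout.
 */
static uint32_t poll_flag_timeout(_Atomic uint32_t *ptr, uint32_t mask,
				  uint64_t timeout_ns)
{
	uint64_t deadline = now_ns() + timeout_ns;
	unsigned int spins = 0;
	uint32_t val;

	assert(ptr != NULL);

	for (;;) {
		val = atomic_load_explicit(ptr, memory_order_relaxed);
		if (val & mask)
			break;
		/* Assumption: read the clock only every 256 spins. */
		if ((++spins & 0xff) == 0 && now_ns() >= deadline)
			break;
	}
	return val;
}
```

A caller would invoke something like poll_flag_timeout(&flag, 0x1,
1000000) and then check the returned value against the mask. The loop
body here is a plain busy-wait; an architecture-specific version could
wait on an event instead of spinning.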

>>  Documentation/atomic_t.txt           | 14 +++--
>>  arch/arm64/Kconfig                   |  3 +
>>  arch/arm64/include/asm/barrier.h     | 23 +++++++
>>  arch/arm64/include/asm/cmpxchg.h     | 62 +++++++++++++++----
>>  arch/arm64/include/asm/delay-const.h | 27 +++++++++
>>  arch/arm64/include/asm/rqspinlock.h  | 85 --------------------------
>>  arch/arm64/lib/delay.c               | 15 ++---
>>  drivers/cpuidle/poll_state.c         | 21 +------
>>  drivers/soc/qcom/rpmh-rsc.c          |  8 +--
>>  include/asm-generic/barrier.h        | 90 ++++++++++++++++++++++++++++
>>  include/linux/atomic.h               | 10 ++++
>>  include/linux/atomic/atomic-long.h   | 18 +++---
>>  include/linux/sched/idle.h           | 29 +++++++++
>>  kernel/bpf/rqspinlock.c              | 77 +++++++++++++++---------
>>  scripts/atomic/gen-atomic-long.sh    | 16 +++--
>>  15 files changed, 320 insertions(+), 178 deletions(-)
>>  create mode 100644 arch/arm64/include/asm/delay-const.h
>
> Some sort of testing in lib/tests/ would be appropriate and useful.

Makes sense. Will add.

Thanks
--
ankur
