The Linux Kernel Mailing List
 help / color / mirror / Atom feed
From: Ankur Arora <ankur.a.arora@oracle.com>
To: Ankur Arora <ankur.a.arora@oracle.com>
Cc: linux-kernel@vger.kernel.org, linux-arch@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, linux-pm@vger.kernel.org,
	bpf@vger.kernel.org, arnd@arndb.de, catalin.marinas@arm.com,
	will@kernel.org, peterz@infradead.org, akpm@linux-foundation.org,
	mark.rutland@arm.com, harisokn@amazon.com, cl@gentwo.org,
	ast@kernel.org, rafael@kernel.org, daniel.lezcano@linaro.org,
	memxor@gmail.com, zhenglifeng1@huawei.com,
	xueshuai@linux.alibaba.com, rdunlap@infradead.org,
	david.laight.linux@gmail.com, broonie@kernel.org,
	joao.m.martins@oracle.com, boris.ostrovsky@oracle.com,
	konrad.wilk@oracle.com, ashok.bhat@arm.com
Subject: Re: [PATCH v12 00/15] barrier: Add smp_cond_load_{relaxed,acquire}_timeout()
Date: Tue, 09 Jun 2026 23:44:32 -0700	[thread overview]
Message-ID: <87h5na6gj3.fsf@oracle.com> (raw)
In-Reply-To: <20260608080440.127491-1-ankur.a.arora@oracle.com>


Summarizing all of the bot reviews (sashiko/bpf-bot):

Most of the comments are same as v11. Let me outline the ones I think
are notable:

  - edge cases around (timeout is -1, S64_MAX, U64_MAX).

    I've noted in the first patch how these cases are probably best
    addressed at review time instead of complicating the implementation
    like in https://lore.kernel.org/lkml/874iklm1uy.fsf@oracle.com/

  - as a side-effect of enabling ARCH_HAS_CPU_RELAX, acpi_processor_setup_cstate()
    enables a NOP poll_idle() unintentionally (patch-2). I've described
    it in more detail in my reply to that patch.

    Will fix this.

  - potentially missed control dependency in the timeout case of
    smp_cond_load_acquire_timeout(). Probably need a better fix for
    this than I have.

    Need more thinktime as the bots would say.  Will address this one
    and the one below in reply to patches 6, 7.

  - possibly torn reads with atomic64_cond_read_*_timeout() on 32-bit
    architectures.

Ankur

Ankur Arora <ankur.a.arora@oracle.com> writes:

> Hi,
>
> Main change in this version:
>
>   - addressed some review comments from sashiko (see commit notes)
>     - The one notable change is to the implementation of
>       smp_cond_load_acquire_timeout() where there was a missed
>       control dependency in the timeout case.
>       All the others are minor.
>   - fixed a low probability race in the kunit test added in v11.
>   - added a bunch of kunit tests validating the implementation's
>     use of the clock.
>
> Andrew, if the changes look okay, could we take this in your mm-nomm
> tree as before?
>
> The core kernel often uses smp_cond_load_{relaxed,acquire}() to spin
> on condition variables with architectural primitives used to avoid
> hammering the relevant cachelines.
>
> (This primitive can vary greatly across architectures: on x86 it's a
> cpu_relax() to slow down the pipeline. On arm64, this is a __cmpwait()
> which waits for a cacheline to change state in a time limited fashion.)
>
> Regardless of architectural details, typical smp_cond_load*() usage
> does not allow for termination until the condition change occurs.
>
> Beyond the core kernel, there are cases where it is useful to additionally
> terminate on a timeout. Two cases:
>
>   - cpuidle poll_idle(): wait for need-resched until the cpuidle polling
>     duration expires.
>
>   - rqspinlock: nested qspinlock acquisition that terminates on timeout
>     or deadlock.
>
> Accordingly add two interfaces (with their generic and arm64 specific
> implementations):
>
>    smp_cond_load_relaxed_timeout(ptr, cond_expr, time_expr, timeout)
>    smp_cond_load_acquire_timeout(ptr, cond_expr, time_expr, timeout)
>
> Also add tif_need_resched_relaxed_wait() which wraps the polling
> pattern and its scheduler specific details in poll_idle().
> In addition add atomic_cond_read_*_timeout(),
> atomic64_cond_read_*_timeout(), and atomic_long wrappers.
>
> Structurally, both the smp_cond_load_*_timeout() interfaces are similar
> to smp_cond_load*(), with the addition of a rate-limited time-check.
>
> Usage
> ==
>
> These interfaces drop straight-forwardly into the rqspinlock logic
> since qspinlock already uses smp_cond_load*(), and the time-check
> extension can now be used for timeout and deadlock handling.
>
> Using tif_need_resched_relaxed_wait() in poll_idle() removes any
> architectural details allowing arm64 to straight-forwardly support
> that path.
> (However, for efficiency reasons cpuidle/poll_state.c continues to
> depend on ARCH_HAS_CPU_RELAX since that is defined on architectures
> with an optimized architectural primitive.)
>
>
> Performance
> ==
>
> Apart from simplifications due to this change, supporting polling in
> cpuidle on arm64 helps improve wakeup latency (needs a few cpuidle/acpi
> patches):
>
>
>   # perf stat -r 5 --cpu 4,5 -e task-clock,cycles,instructions,sched:sched_wake_idle_without_ipi \
>   perf bench sched pipe -l 1000000 -c 4
>
>   # No haltpoll (and, no TIF_POLLING_NRFLAG):
>
>   Performance counter stats for 'CPU(s) 4,5' (5 runs):
>
>          25,229.57 msec task-clock                       #    2.000 CPUs utilized               ( +-  7.75% )
>     45,821,250,284      cycles                           #    1.816 GHz                         ( +- 10.07% )
>     26,557,496,665      instructions                     #    0.58  insn per cycle              ( +-  0.21% )
>                  0      sched:sched_wake_idle_without_ipi #    0.000 /sec
>
>        12.615 +- 0.977 seconds time elapsed  ( +-  7.75% )
>
>
>   # Haltpoll:
>
>   Performance counter stats for 'CPU(s) 4,5' (5 runs):
>
>          15,131.58 msec task-clock                       #    2.000 CPUs utilized               ( +- 10.00% )
>     34,158,188,839      cycles                           #    2.257 GHz                         ( +-  6.91% )
>     20,824,950,916      instructions                     #    0.61  insn per cycle              ( +-  0.09% )
>          1,983,822      sched:sched_wake_idle_without_ipi #  131.105 K/sec                       ( +-  0.78% )
>
>         7.566 +- 0.756 seconds time elapsed  ( +- 10.00% )
>
>   We get improved latency because we don't switch in and out of a
>   deeper sleep state or from the hypervisor. This also causes us to
>   execute ~20% fewer instructions.
>
>
> Haris Okanovic also saw improvement in real workloads due to the
> cpuidle changes: "observed 4-6% improvements in memcahed, cassandra,
> mysql, and postgresql under certain loads. Other applications likely
> benefit too." [12]
>
>
> Changelog:
>   v11 [13] (as listed above):
>     - addressed some review comments from sashiko (see commit notes)
>       - The one notable change is to the implementation of
>         smp_cond_load_acquire_timeout() where there was a missed
>         control dependency in the timeout case.
>       All the others are minor.
>     - fixed a low probability race in the kunit test added in v11.
>     - added a bunch of kunit tests validating the implementation's
>       use of the clock.
>
>   v10 [10]:
>    - add a comment mentioning that smp_cond_load_relaxed_timeout() might
>      be using architectural primitives that don't support MMIO.
>      (David Laight, Catalin Marinas)
>    - added a kunit test for smp_cond_load_relaxed_timeout() (Andrew
>      Morton.)
>
>   v9 [9]:
>    - s/@cond/@cond_expr/ (Randy Dunlap)
>    - Clarify that SMP_TIMEOUT_POLL_COUNT is only around memory
>      addresses. (David Laight)
>    - Add the missing config ARCH_HAS_CPU_RELAX in arch/arm64/Kconfig.
>      (Catalin Marinas).
>    - Switch to arch_counter_get_cntvct_stable() (via __delay_cycles())
>      in the cmpwait path instead of using arch_timer_read_counter().
>      (Catalin Marinas)
>
>   v8 [0]:
>    - Defer evaluation of @time_expr_ns to when we hit the slowpath.
>       (comment from Alexei Starovoitov).
>
>    - Mention that cpu_poll_relax() is better than raw CPU polling
>      only where ARCH_HAS_CPU_RELAX is defined.
>      - also define ARCH_HAS_CPU_RELAX for arm64.
>       (Came out of a discussion with Will Deacon.)
>
>    - Split out WFET and WFE handling. I was doing both of these
>      in a common handler.
>      (From Will Deacon and in an earlier revision by Catalin Marinas.)
>
>    - Add mentions of atomic_cond_read_{relaxed,acquire}(),
>      atomic_cond_read_{relaxed,acquire}_timeout() in
>      Documentation/atomic_t.txt.
>
>    - Use the BIT() macro to do the checking in tif_bitset_relaxed_wait().
>
>    - Cleanup unnecessary assignments, casts etc in poll_idle().
>      (From Rafael Wysocki.)
>
>    - Fixup warnings from kernel build robot
>
>
>   v7 [1]:
>    - change the interface to separately provide the timeout. This is
>      useful for supporting WFET and similar primitives which can do
>      timed waiting (suggested by Arnd Bergmann).
>
>    - Adapting rqspinlock code to this changed interface also
>      necessitated allowing time_expr to fail.
>    - rqspinlock changes to adapt to the new smp_cond_load_acquire_timeout().
>
>    - add WFET support (suggested by Arnd Bergmann).
>    - add support for atomic-long wrappers.
>    - add a new scheduler interface tif_need_resched_relaxed_wait() which
>      encapsulates the polling logic used by poll_idle().
>      - interface suggested by (Rafael J. Wysocki).
>
>
>   v6 [2]:
>    - fixup missing timeout parameters in atomic64_cond_read_*_timeout()
>    - remove a race between setting of TIF_NEED_RESCHED and the call to
>      smp_cond_load_relaxed_timeout(). This would mean that dev->poll_time_limit
>      would be set even if we hadn't spent any time waiting.
>      (The original check compared against local_clock(), which would have been
>      fine, but I was instead using a cheaper check against _TIF_NEED_RESCHED.)
>    (Both from meta-CI bot)
>
>
>   v5 [3]:
>    - use cpu_poll_relax() instead of cpu_relax().
>    - instead of defining an arm64 specific
>      smp_cond_load_relaxed_timeout(), just define the appropriate
>      cpu_poll_relax().
>    - re-read the target pointer when we exit due to the time-check.
>    - s/SMP_TIMEOUT_SPIN_COUNT/SMP_TIMEOUT_POLL_COUNT/
>    (Suggested by Will Deacon)
>
>    - add atomic_cond_read_*_timeout() and atomic64_cond_read_*_timeout()
>      interfaces.
>    - rqspinlock: use atomic_cond_read_acquire_timeout().
>    - cpuidle: use smp_cond_load_relaxed_tiemout() for polling.
>    (Suggested by Catalin Marinas)
>
>    - rqspinlock: define SMP_TIMEOUT_POLL_COUNT to be 16k for non arm64
>
>
>   v4 [4]:
>     - naming change 's/timewait/timeout/'
>     - resilient spinlocks: get rid of res_smp_cond_load_acquire_waiting()
>       and fixup use of RES_CHECK_TIMEOUT().
>     (Both suggested by Catalin Marinas)
>
>   v3 [5]:
>     - further interface simplifications (suggested by Catalin Marinas)
>
>   v2 [6]:
>     - simplified the interface (suggested by Catalin Marinas)
>        - get rid of wait_policy, and a multitude of constants
>        - adds a slack parameter
>       This helped remove a fair amount of duplicated code duplication and in
>       hindsight unnecessary constants.
>
>   v1 [7]:
>      - add wait_policy (coarse and fine)
>      - derive spin-count etc at runtime instead of using arbitrary
>        constants.
>
> Haris Okanovic tested v4 of this series with poll_idle()/haltpoll patches. [8]
>
> Comments appreciated!
>
> Thanks
> Ankur
>
>  [0] https://lore.kernel.org/lkml/20251215044919.460086-1-ankur.a.arora@oracle.com/
>  [1] https://lore.kernel.org/lkml/20251028053136.692462-1-ankur.a.arora@oracle.com/
>  [2] https://lore.kernel.org/lkml/20250911034655.3916002-1-ankur.a.arora@oracle.com/
>  [3] https://lore.kernel.org/lkml/20250911034655.3916002-1-ankur.a.arora@oracle.com/
>  [4] https://lore.kernel.org/lkml/20250829080735.3598416-1-ankur.a.arora@oracle.com/
>  [5] https://lore.kernel.org/lkml/20250627044805.945491-1-ankur.a.arora@oracle.com/
>  [6] https://lore.kernel.org/lkml/20250502085223.1316925-1-ankur.a.arora@oracle.com/
>  [7] https://lore.kernel.org/lkml/20250203214911.898276-1-ankur.a.arora@oracle.com/
>  [8] https://lore.kernel.org/lkml/2cecbf7fb23ee83a4ce027e1be3f46f97efd585c.camel@amazon.com/
>  [9] https://lore.kernel.org/lkml/20260209023153.2661784-1-ankur.a.arora@oracle.com/
>  [10] https://lore.kernel.org/lkml/20260316013651.3225328-1-ankur.a.arora@oracle.com/
>  [11] https://lore.kernel.org/lkml/20230809134837.GM212435@hirez.programming.kicks-ass.net/
>  [12] https://lore.kernel.org/lkml/c6f3c8d3f1f2e89a9dc7ae22482973b5a51b08cb.camel@amazon.com/
>
> Cc: Arnd Bergmann <arnd@arndb.de>
> Cc: Will Deacon <will@kernel.org>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: "Rafael J. Wysocki" <rafael@kernel.org>
> Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
> Cc: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> Cc: Alexei Starovoitov <ast@kernel.org>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: bpf@vger.kernel.org
> Cc: linux-arch@vger.kernel.org
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: linux-pm@vger.kernel.org
>
> Ankur Arora (15):
>   asm-generic: barrier: Add smp_cond_load_relaxed_timeout()
>   arm64: barrier: Support smp_cond_load_relaxed_timeout()
>   arm64/delay: move some constants out to a separate header
>   arm64: support WFET in smp_cond_load_relaxed_timeout()
>   arm64: rqspinlock: Remove private copy of
>     smp_cond_load_acquire_timewait()
>   asm-generic: barrier: Add smp_cond_load_acquire_timeout()
>   atomic: Add atomic_cond_read_*_timeout()
>   locking/atomic: scripts: build atomic_long_cond_read_*_timeout()
>   bpf/rqspinlock: switch check_timeout() to a clock interface
>   bpf/rqspinlock: Use smp_cond_load_acquire_timeout()
>   sched: add need-resched timed wait interface
>   cpuidle/poll_state: Wait for need-resched via
>     tif_need_resched_relaxed_wait()
>   arm64/delay: enable testing smp_cond_load_relaxed_timeout()
>   barrier: add tests for smp_cond_load_*_timeout()
>   barrier: add clock tests for smp_cond_load_relaxed_timeout()
>
>  Documentation/atomic_t.txt           |  14 +-
>  arch/arm64/Kconfig                   |   3 +
>  arch/arm64/include/asm/barrier.h     |  23 ++++
>  arch/arm64/include/asm/cmpxchg.h     |  62 +++++++--
>  arch/arm64/include/asm/delay-const.h |  28 ++++
>  arch/arm64/include/asm/rqspinlock.h  |  85 ------------
>  arch/arm64/lib/delay.c               |  17 +--
>  drivers/clocksource/arm_arch_timer.c |   2 +
>  drivers/cpuidle/poll_state.c         |  21 +--
>  drivers/soc/qcom/rpmh-rsc.c          |   8 +-
>  include/asm-generic/barrier.h        |  97 ++++++++++++++
>  include/linux/atomic.h               |  10 ++
>  include/linux/atomic/atomic-long.h   |  18 ++-
>  include/linux/sched/idle.h           |  29 +++++
>  kernel/bpf/rqspinlock.c              |  77 +++++++----
>  lib/Kconfig.debug                    |  10 ++
>  lib/tests/Makefile                   |   1 +
>  lib/tests/barrier-timeout-test.c     | 185 +++++++++++++++++++++++++++
>  scripts/atomic/gen-atomic-long.sh    |  16 ++-
>  19 files changed, 528 insertions(+), 178 deletions(-)
>  create mode 100644 arch/arm64/include/asm/delay-const.h
>  create mode 100644 lib/tests/barrier-timeout-test.c


--
ankur

      parent reply	other threads:[~2026-06-10  6:45 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-08  8:04 [PATCH v12 00/15] barrier: Add smp_cond_load_{relaxed,acquire}_timeout() Ankur Arora
2026-06-08  8:04 ` [PATCH v12 01/15] asm-generic: barrier: Add smp_cond_load_relaxed_timeout() Ankur Arora
2026-06-08  8:53   ` bot+bpf-ci
2026-06-08  8:04 ` [PATCH v12 02/15] arm64: barrier: Support smp_cond_load_relaxed_timeout() Ankur Arora
2026-06-08  8:53   ` bot+bpf-ci
2026-06-10  6:32     ` Ankur Arora
2026-06-08  8:04 ` [PATCH v12 03/15] arm64/delay: move some constants out to a separate header Ankur Arora
2026-06-08  8:04 ` [PATCH v12 04/15] arm64: support WFET in smp_cond_load_relaxed_timeout() Ankur Arora
2026-06-08  8:04 ` [PATCH v12 05/15] arm64: rqspinlock: Remove private copy of smp_cond_load_acquire_timewait() Ankur Arora
2026-06-08  8:53   ` bot+bpf-ci
2026-06-08  8:04 ` [PATCH v12 06/15] asm-generic: barrier: Add smp_cond_load_acquire_timeout() Ankur Arora
2026-06-08  8:04 ` [PATCH v12 07/15] atomic: Add atomic_cond_read_*_timeout() Ankur Arora
2026-06-08  8:04 ` [PATCH v12 08/15] locking/atomic: scripts: build atomic_long_cond_read_*_timeout() Ankur Arora
2026-06-08  8:04 ` [PATCH v12 09/15] bpf/rqspinlock: switch check_timeout() to a clock interface Ankur Arora
2026-06-08  8:04 ` [PATCH v12 10/15] bpf/rqspinlock: Use smp_cond_load_acquire_timeout() Ankur Arora
2026-06-08  9:04   ` bot+bpf-ci
2026-06-08  8:04 ` [PATCH v12 11/15] sched: add need-resched timed wait interface Ankur Arora
2026-06-08  8:04 ` [PATCH v12 12/15] cpuidle/poll_state: Wait for need-resched via tif_need_resched_relaxed_wait() Ankur Arora
2026-06-08  8:04 ` [PATCH v12 13/15] arm64/delay: enable testing smp_cond_load_relaxed_timeout() Ankur Arora
2026-06-08  8:04 ` [PATCH v12 14/15] barrier: add tests for smp_cond_load_*_timeout() Ankur Arora
2026-06-08  8:04 ` [PATCH v12 15/15] barrier: add clock tests for smp_cond_load_relaxed_timeout() Ankur Arora
2026-06-10  6:44 ` Ankur Arora [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87h5na6gj3.fsf@oracle.com \
    --to=ankur.a.arora@oracle.com \
    --cc=akpm@linux-foundation.org \
    --cc=arnd@arndb.de \
    --cc=ashok.bhat@arm.com \
    --cc=ast@kernel.org \
    --cc=boris.ostrovsky@oracle.com \
    --cc=bpf@vger.kernel.org \
    --cc=broonie@kernel.org \
    --cc=catalin.marinas@arm.com \
    --cc=cl@gentwo.org \
    --cc=daniel.lezcano@linaro.org \
    --cc=david.laight.linux@gmail.com \
    --cc=harisokn@amazon.com \
    --cc=joao.m.martins@oracle.com \
    --cc=konrad.wilk@oracle.com \
    --cc=linux-arch@vger.kernel.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pm@vger.kernel.org \
    --cc=mark.rutland@arm.com \
    --cc=memxor@gmail.com \
    --cc=peterz@infradead.org \
    --cc=rafael@kernel.org \
    --cc=rdunlap@infradead.org \
    --cc=will@kernel.org \
    --cc=xueshuai@linux.alibaba.com \
    --cc=zhenglifeng1@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox