From: Ankur Arora <ankur.a.arora@oracle.com>
To: Ankur Arora <ankur.a.arora@oracle.com>
Cc: linux-kernel@vger.kernel.org, linux-arch@vger.kernel.org,
linux-arm-kernel@lists.infradead.org, linux-pm@vger.kernel.org,
bpf@vger.kernel.org, arnd@arndb.de, catalin.marinas@arm.com,
will@kernel.org, peterz@infradead.org, akpm@linux-foundation.org,
mark.rutland@arm.com, harisokn@amazon.com, cl@gentwo.org,
ast@kernel.org, rafael@kernel.org, daniel.lezcano@linaro.org,
memxor@gmail.com, zhenglifeng1@huawei.com,
xueshuai@linux.alibaba.com, rdunlap@infradead.org,
david.laight.linux@gmail.com, broonie@kernel.org,
joao.m.martins@oracle.com, boris.ostrovsky@oracle.com,
konrad.wilk@oracle.com, ashok.bhat@arm.com
Subject: Re: [PATCH v12 00/15] barrier: Add smp_cond_load_{relaxed,acquire}_timeout()
Date: Tue, 09 Jun 2026 23:44:32 -0700 [thread overview]
Message-ID: <87h5na6gj3.fsf@oracle.com> (raw)
In-Reply-To: <20260608080440.127491-1-ankur.a.arora@oracle.com>
Summarizing all of the bot reviews (sashiko/bpf-bot):
Most of the comments are same as v11. Let me outline the ones I think
are notable:
- edge cases around (timeout is -1, S64_MAX, U64_MAX).
I've noted in the first patch how these cases are probably best
addressed at review time instead of complicating the implementation
like in https://lore.kernel.org/lkml/874iklm1uy.fsf@oracle.com/
- as a side-effect of enabling ARCH_HAS_CPU_RELAX, acpi_processor_setup_cstate()
enables a NOP poll_idle() unintentionally (patch-2). I've described
it in more detail in my reply to that patch.
Will fix this.
- potentially missed control dependency in the timeout case of
smp_cond_load_acquire_timeout(). Probably need a better fix for
this than I have.
Need more thinktime as the bots would say. Will address this one
and the one below in reply to patches 6, 7.
- possibly torn reads with atomic64_cond_read_*_timeout() on 32-bit
architectures.
Ankur
Ankur Arora <ankur.a.arora@oracle.com> writes:
> Hi,
>
> Main change in this version:
>
> - addressed some review comments from sashiko (see commit notes)
> - The one notable change is to the implementation of
> smp_cond_load_acquire_timeout() where there was a missed
> control dependency in the timeout case.
> All the others are minor.
> - fixed a low probability race in the kunit test added in v11.
> - added a bunch of kunit tests validating the implementation's
> use of the clock.
>
> Andrew, if the changes look okay, could we take this in your mm-nomm
> tree as before?
>
> The core kernel often uses smp_cond_load_{relaxed,acquire}() to spin
> on condition variables with architectural primitives used to avoid
> hammering the relevant cachelines.
>
> (This primitive can vary greatly across architectures: on x86 it's a
> cpu_relax() to slow down the pipeline. On arm64, this is a __cmpwait()
> which waits for a cacheline to change state in a time limited fashion.)
>
> Regardless of architectural details, typical smp_cond_load*() usage
> does not allow for termination until the condition change occurs.
>
> Beyond the core kernel, there are cases where it is useful to additionally
> terminate on a timeout. Two cases:
>
> - cpuidle poll_idle(): wait for need-resched until the cpuidle polling
> duration expires.
>
> - rqspinlock: nested qspinlock acquisition that terminates on timeout
> or deadlock.
>
> Accordingly add two interfaces (with their generic and arm64 specific
> implementations):
>
> smp_cond_load_relaxed_timeout(ptr, cond_expr, time_expr, timeout)
> smp_cond_load_acquire_timeout(ptr, cond_expr, time_expr, timeout)
>
> Also add tif_need_resched_relaxed_wait() which wraps the polling
> pattern and its scheduler specific details in poll_idle().
> In addition add atomic_cond_read_*_timeout(),
> atomic64_cond_read_*_timeout(), and atomic_long wrappers.
>
> Structurally, both the smp_cond_load_*_timeout() interfaces are similar
> to smp_cond_load*(), with the addition of a rate-limited time-check.
>
> Usage
> ==
>
> These interfaces drop straight-forwardly into the rqspinlock logic
> since qspinlock already uses smp_cond_load*(), and the time-check
> extension can now be used for timeout and deadlock handling.
>
> Using tif_need_resched_relaxed_wait() in poll_idle() removes any
> architectural details allowing arm64 to straight-forwardly support
> that path.
> (However, for efficiency reasons cpuidle/poll_state.c continues to
> depend on ARCH_HAS_CPU_RELAX since that is defined on architectures
> with an optimized architectural primitive.)
>
>
> Performance
> ==
>
> Apart from simplifications due to this change, supporting polling in
> cpuidle on arm64 helps improve wakeup latency (needs a few cpuidle/acpi
> patches):
>
>
> # perf stat -r 5 --cpu 4,5 -e task-clock,cycles,instructions,sched:sched_wake_idle_without_ipi \
> perf bench sched pipe -l 1000000 -c 4
>
> # No haltpoll (and, no TIF_POLLING_NRFLAG):
>
> Performance counter stats for 'CPU(s) 4,5' (5 runs):
>
> 25,229.57 msec task-clock # 2.000 CPUs utilized ( +- 7.75% )
> 45,821,250,284 cycles # 1.816 GHz ( +- 10.07% )
> 26,557,496,665 instructions # 0.58 insn per cycle ( +- 0.21% )
> 0 sched:sched_wake_idle_without_ipi # 0.000 /sec
>
> 12.615 +- 0.977 seconds time elapsed ( +- 7.75% )
>
>
> # Haltpoll:
>
> Performance counter stats for 'CPU(s) 4,5' (5 runs):
>
> 15,131.58 msec task-clock # 2.000 CPUs utilized ( +- 10.00% )
> 34,158,188,839 cycles # 2.257 GHz ( +- 6.91% )
> 20,824,950,916 instructions # 0.61 insn per cycle ( +- 0.09% )
> 1,983,822 sched:sched_wake_idle_without_ipi # 131.105 K/sec ( +- 0.78% )
>
> 7.566 +- 0.756 seconds time elapsed ( +- 10.00% )
>
> We get improved latency because we don't switch in and out of a
> deeper sleep state or from the hypervisor. This also causes us to
> execute ~20% fewer instructions.
>
>
> Haris Okanovic also saw improvement in real workloads due to the
> cpuidle changes: "observed 4-6% improvements in memcahed, cassandra,
> mysql, and postgresql under certain loads. Other applications likely
> benefit too." [12]
>
>
> Changelog:
> v11 [13] (as listed above):
> - addressed some review comments from sashiko (see commit notes)
> - The one notable change is to the implementation of
> smp_cond_load_acquire_timeout() where there was a missed
> control dependency in the timeout case.
> All the others are minor.
> - fixed a low probability race in the kunit test added in v11.
> - added a bunch of kunit tests validating the implementation's
> use of the clock.
>
> v10 [10]:
> - add a comment mentioning that smp_cond_load_relaxed_timeout() might
> be using architectural primitives that don't support MMIO.
> (David Laight, Catalin Marinas)
> - added a kunit test for smp_cond_load_relaxed_timeout() (Andrew
> Morton.)
>
> v9 [9]:
> - s/@cond/@cond_expr/ (Randy Dunlap)
> - Clarify that SMP_TIMEOUT_POLL_COUNT is only around memory
> addresses. (David Laight)
> - Add the missing config ARCH_HAS_CPU_RELAX in arch/arm64/Kconfig.
> (Catalin Marinas).
> - Switch to arch_counter_get_cntvct_stable() (via __delay_cycles())
> in the cmpwait path instead of using arch_timer_read_counter().
> (Catalin Marinas)
>
> v8 [0]:
> - Defer evaluation of @time_expr_ns to when we hit the slowpath.
> (comment from Alexei Starovoitov).
>
> - Mention that cpu_poll_relax() is better than raw CPU polling
> only where ARCH_HAS_CPU_RELAX is defined.
> - also define ARCH_HAS_CPU_RELAX for arm64.
> (Came out of a discussion with Will Deacon.)
>
> - Split out WFET and WFE handling. I was doing both of these
> in a common handler.
> (From Will Deacon and in an earlier revision by Catalin Marinas.)
>
> - Add mentions of atomic_cond_read_{relaxed,acquire}(),
> atomic_cond_read_{relaxed,acquire}_timeout() in
> Documentation/atomic_t.txt.
>
> - Use the BIT() macro to do the checking in tif_bitset_relaxed_wait().
>
> - Cleanup unnecessary assignments, casts etc in poll_idle().
> (From Rafael Wysocki.)
>
> - Fixup warnings from kernel build robot
>
>
> v7 [1]:
> - change the interface to separately provide the timeout. This is
> useful for supporting WFET and similar primitives which can do
> timed waiting (suggested by Arnd Bergmann).
>
> - Adapting rqspinlock code to this changed interface also
> necessitated allowing time_expr to fail.
> - rqspinlock changes to adapt to the new smp_cond_load_acquire_timeout().
>
> - add WFET support (suggested by Arnd Bergmann).
> - add support for atomic-long wrappers.
> - add a new scheduler interface tif_need_resched_relaxed_wait() which
> encapsulates the polling logic used by poll_idle().
> - interface suggested by (Rafael J. Wysocki).
>
>
> v6 [2]:
> - fixup missing timeout parameters in atomic64_cond_read_*_timeout()
> - remove a race between setting of TIF_NEED_RESCHED and the call to
> smp_cond_load_relaxed_timeout(). This would mean that dev->poll_time_limit
> would be set even if we hadn't spent any time waiting.
> (The original check compared against local_clock(), which would have been
> fine, but I was instead using a cheaper check against _TIF_NEED_RESCHED.)
> (Both from meta-CI bot)
>
>
> v5 [3]:
> - use cpu_poll_relax() instead of cpu_relax().
> - instead of defining an arm64 specific
> smp_cond_load_relaxed_timeout(), just define the appropriate
> cpu_poll_relax().
> - re-read the target pointer when we exit due to the time-check.
> - s/SMP_TIMEOUT_SPIN_COUNT/SMP_TIMEOUT_POLL_COUNT/
> (Suggested by Will Deacon)
>
> - add atomic_cond_read_*_timeout() and atomic64_cond_read_*_timeout()
> interfaces.
> - rqspinlock: use atomic_cond_read_acquire_timeout().
> - cpuidle: use smp_cond_load_relaxed_tiemout() for polling.
> (Suggested by Catalin Marinas)
>
> - rqspinlock: define SMP_TIMEOUT_POLL_COUNT to be 16k for non arm64
>
>
> v4 [4]:
> - naming change 's/timewait/timeout/'
> - resilient spinlocks: get rid of res_smp_cond_load_acquire_waiting()
> and fixup use of RES_CHECK_TIMEOUT().
> (Both suggested by Catalin Marinas)
>
> v3 [5]:
> - further interface simplifications (suggested by Catalin Marinas)
>
> v2 [6]:
> - simplified the interface (suggested by Catalin Marinas)
> - get rid of wait_policy, and a multitude of constants
> - adds a slack parameter
> This helped remove a fair amount of duplicated code duplication and in
> hindsight unnecessary constants.
>
> v1 [7]:
> - add wait_policy (coarse and fine)
> - derive spin-count etc at runtime instead of using arbitrary
> constants.
>
> Haris Okanovic tested v4 of this series with poll_idle()/haltpoll patches. [8]
>
> Comments appreciated!
>
> Thanks
> Ankur
>
> [0] https://lore.kernel.org/lkml/20251215044919.460086-1-ankur.a.arora@oracle.com/
> [1] https://lore.kernel.org/lkml/20251028053136.692462-1-ankur.a.arora@oracle.com/
> [2] https://lore.kernel.org/lkml/20250911034655.3916002-1-ankur.a.arora@oracle.com/
> [3] https://lore.kernel.org/lkml/20250911034655.3916002-1-ankur.a.arora@oracle.com/
> [4] https://lore.kernel.org/lkml/20250829080735.3598416-1-ankur.a.arora@oracle.com/
> [5] https://lore.kernel.org/lkml/20250627044805.945491-1-ankur.a.arora@oracle.com/
> [6] https://lore.kernel.org/lkml/20250502085223.1316925-1-ankur.a.arora@oracle.com/
> [7] https://lore.kernel.org/lkml/20250203214911.898276-1-ankur.a.arora@oracle.com/
> [8] https://lore.kernel.org/lkml/2cecbf7fb23ee83a4ce027e1be3f46f97efd585c.camel@amazon.com/
> [9] https://lore.kernel.org/lkml/20260209023153.2661784-1-ankur.a.arora@oracle.com/
> [10] https://lore.kernel.org/lkml/20260316013651.3225328-1-ankur.a.arora@oracle.com/
> [11] https://lore.kernel.org/lkml/20230809134837.GM212435@hirez.programming.kicks-ass.net/
> [12] https://lore.kernel.org/lkml/c6f3c8d3f1f2e89a9dc7ae22482973b5a51b08cb.camel@amazon.com/
>
> Cc: Arnd Bergmann <arnd@arndb.de>
> Cc: Will Deacon <will@kernel.org>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: "Rafael J. Wysocki" <rafael@kernel.org>
> Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
> Cc: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> Cc: Alexei Starovoitov <ast@kernel.org>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: bpf@vger.kernel.org
> Cc: linux-arch@vger.kernel.org
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: linux-pm@vger.kernel.org
>
> Ankur Arora (15):
> asm-generic: barrier: Add smp_cond_load_relaxed_timeout()
> arm64: barrier: Support smp_cond_load_relaxed_timeout()
> arm64/delay: move some constants out to a separate header
> arm64: support WFET in smp_cond_load_relaxed_timeout()
> arm64: rqspinlock: Remove private copy of
> smp_cond_load_acquire_timewait()
> asm-generic: barrier: Add smp_cond_load_acquire_timeout()
> atomic: Add atomic_cond_read_*_timeout()
> locking/atomic: scripts: build atomic_long_cond_read_*_timeout()
> bpf/rqspinlock: switch check_timeout() to a clock interface
> bpf/rqspinlock: Use smp_cond_load_acquire_timeout()
> sched: add need-resched timed wait interface
> cpuidle/poll_state: Wait for need-resched via
> tif_need_resched_relaxed_wait()
> arm64/delay: enable testing smp_cond_load_relaxed_timeout()
> barrier: add tests for smp_cond_load_*_timeout()
> barrier: add clock tests for smp_cond_load_relaxed_timeout()
>
> Documentation/atomic_t.txt | 14 +-
> arch/arm64/Kconfig | 3 +
> arch/arm64/include/asm/barrier.h | 23 ++++
> arch/arm64/include/asm/cmpxchg.h | 62 +++++++--
> arch/arm64/include/asm/delay-const.h | 28 ++++
> arch/arm64/include/asm/rqspinlock.h | 85 ------------
> arch/arm64/lib/delay.c | 17 +--
> drivers/clocksource/arm_arch_timer.c | 2 +
> drivers/cpuidle/poll_state.c | 21 +--
> drivers/soc/qcom/rpmh-rsc.c | 8 +-
> include/asm-generic/barrier.h | 97 ++++++++++++++
> include/linux/atomic.h | 10 ++
> include/linux/atomic/atomic-long.h | 18 ++-
> include/linux/sched/idle.h | 29 +++++
> kernel/bpf/rqspinlock.c | 77 +++++++----
> lib/Kconfig.debug | 10 ++
> lib/tests/Makefile | 1 +
> lib/tests/barrier-timeout-test.c | 185 +++++++++++++++++++++++++++
> scripts/atomic/gen-atomic-long.sh | 16 ++-
> 19 files changed, 528 insertions(+), 178 deletions(-)
> create mode 100644 arch/arm64/include/asm/delay-const.h
> create mode 100644 lib/tests/barrier-timeout-test.c
--
ankur
prev parent reply other threads:[~2026-06-10 6:45 UTC|newest]
Thread overview: 32+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-06-08 8:04 [PATCH v12 00/15] barrier: Add smp_cond_load_{relaxed,acquire}_timeout() Ankur Arora
2026-06-08 8:04 ` [PATCH v12 01/15] asm-generic: barrier: Add smp_cond_load_relaxed_timeout() Ankur Arora
2026-06-08 8:25 ` sashiko-bot
2026-06-08 8:53 ` bot+bpf-ci
2026-06-08 8:04 ` [PATCH v12 02/15] arm64: barrier: Support smp_cond_load_relaxed_timeout() Ankur Arora
2026-06-08 8:31 ` sashiko-bot
2026-06-08 8:53 ` bot+bpf-ci
2026-06-10 6:32 ` Ankur Arora
2026-06-08 8:04 ` [PATCH v12 03/15] arm64/delay: move some constants out to a separate header Ankur Arora
2026-06-08 8:22 ` sashiko-bot
2026-06-08 8:04 ` [PATCH v12 04/15] arm64: support WFET in smp_cond_load_relaxed_timeout() Ankur Arora
2026-06-08 8:27 ` sashiko-bot
2026-06-08 8:04 ` [PATCH v12 05/15] arm64: rqspinlock: Remove private copy of smp_cond_load_acquire_timewait() Ankur Arora
2026-06-08 8:19 ` sashiko-bot
2026-06-08 8:53 ` bot+bpf-ci
2026-06-08 8:04 ` [PATCH v12 06/15] asm-generic: barrier: Add smp_cond_load_acquire_timeout() Ankur Arora
2026-06-08 8:27 ` sashiko-bot
2026-06-08 8:04 ` [PATCH v12 07/15] atomic: Add atomic_cond_read_*_timeout() Ankur Arora
2026-06-08 8:23 ` sashiko-bot
2026-06-08 8:04 ` [PATCH v12 08/15] locking/atomic: scripts: build atomic_long_cond_read_*_timeout() Ankur Arora
2026-06-08 8:04 ` [PATCH v12 09/15] bpf/rqspinlock: switch check_timeout() to a clock interface Ankur Arora
2026-06-08 8:04 ` [PATCH v12 10/15] bpf/rqspinlock: Use smp_cond_load_acquire_timeout() Ankur Arora
2026-06-08 9:04 ` bot+bpf-ci
2026-06-08 8:04 ` [PATCH v12 11/15] sched: add need-resched timed wait interface Ankur Arora
2026-06-08 8:04 ` [PATCH v12 12/15] cpuidle/poll_state: Wait for need-resched via tif_need_resched_relaxed_wait() Ankur Arora
2026-06-08 8:31 ` sashiko-bot
2026-06-08 8:04 ` [PATCH v12 13/15] arm64/delay: enable testing smp_cond_load_relaxed_timeout() Ankur Arora
2026-06-08 8:32 ` sashiko-bot
2026-06-08 8:04 ` [PATCH v12 14/15] barrier: add tests for smp_cond_load_*_timeout() Ankur Arora
2026-06-08 8:04 ` [PATCH v12 15/15] barrier: add clock tests for smp_cond_load_relaxed_timeout() Ankur Arora
2026-06-08 8:34 ` sashiko-bot
2026-06-10 6:44 ` Ankur Arora [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87h5na6gj3.fsf@oracle.com \
--to=ankur.a.arora@oracle.com \
--cc=akpm@linux-foundation.org \
--cc=arnd@arndb.de \
--cc=ashok.bhat@arm.com \
--cc=ast@kernel.org \
--cc=boris.ostrovsky@oracle.com \
--cc=bpf@vger.kernel.org \
--cc=broonie@kernel.org \
--cc=catalin.marinas@arm.com \
--cc=cl@gentwo.org \
--cc=daniel.lezcano@linaro.org \
--cc=david.laight.linux@gmail.com \
--cc=harisokn@amazon.com \
--cc=joao.m.martins@oracle.com \
--cc=konrad.wilk@oracle.com \
--cc=linux-arch@vger.kernel.org \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-pm@vger.kernel.org \
--cc=mark.rutland@arm.com \
--cc=memxor@gmail.com \
--cc=peterz@infradead.org \
--cc=rafael@kernel.org \
--cc=rdunlap@infradead.org \
--cc=will@kernel.org \
--cc=xueshuai@linux.alibaba.com \
--cc=zhenglifeng1@huawei.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.