From: David Laight <david.laight.linux@gmail.com>
To: Marco Elver <elver@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
Will Deacon <will@kernel.org>, Ingo Molnar <mingo@kernel.org>,
Thomas Gleixner <tglx@linutronix.de>,
Boqun Feng <boqun.feng@gmail.com>,
Waiman Long <longman@redhat.com>,
Bart Van Assche <bvanassche@acm.org>,
llvm@lists.linux.dev, Catalin Marinas <catalin.marinas@arm.com>,
Arnd Bergmann <arnd@arndb.de>,
linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH v3 2/3] arm64: Optimize __READ_ONCE() with CONFIG_LTO=y
Date: Fri, 30 Jan 2026 15:11:30 +0000 [thread overview]
Message-ID: <20260130151130.6026999b@pumpkin> (raw)
In-Reply-To: <20260130132951.2714396-3-elver@google.com>
On Fri, 30 Jan 2026 14:28:25 +0100
Marco Elver <elver@google.com> wrote:
> Rework arm64 LTO __READ_ONCE() to improve code generation as follows:
>
> 1. Replace _Generic-based __unqual_scalar_typeof() with more complete
> __rwonce_typeof_unqual(). This strips qualifiers from all types, not
> just integer types, which is required to be able to assign (must be
> non-const) to __u.__val in the non-atomic case (required for #2).
>
> Once our minimum compiler versions are bumped, this just becomes
> TYPEOF_UNQUAL() (or typeof_unqual() should we decide to adopt C23
> naming). Sadly the fallback version of __rwonce_typeof_unqual() cannot
> be used as a general TYPEOF_UNQUAL() fallback (see code comments).
>
> One subtle point here is that non-integer types of __val could be const
> or volatile within the union with the old __unqual_scalar_typeof(), if
> the passed variable is const or volatile. This would then result in a
> forced load from the stack if __u.__val is volatile; in the case of
> const, it does look odd if the underlying storage changes, but the
> compiler is told said member is "const" -- it smells like UB.
>
> 2. Eliminate the atomic flag and ternary conditional expression. Move
> the fallback volatile load into the default case of the switch,
> ensuring __u is unconditionally initialized across all paths.
> The statement expression now unconditionally returns __u.__val.
>
> This refactoring appears to help the compiler improve (or fix) code
> generation.
>
> With a defconfig + LTO + debug options builds, we observe different
> codegen for the following functions:
>
> btrfs_reclaim_sweep (708 -> 1032 bytes)
> btrfs_sinfo_bg_reclaim_threshold_store (200 -> 204 bytes)
> check_mem_access (3652 -> 3692 bytes) [inlined bpf_map_is_rdonly]
> console_flush_all (1268 -> 1264 bytes)
> console_lock_spinning_disable_and_check (180 -> 176 bytes)
> igb_add_filter (640 -> 636 bytes)
> igb_config_tx_modes (2404 -> 2400 bytes)
> kvm_vcpu_on_spin (480 -> 476 bytes)
> map_freeze (376 -> 380 bytes)
> netlink_bind (1664 -> 1656 bytes)
> nmi_cpu_backtrace (404 -> 400 bytes)
> set_rps_cpu (516 -> 520 bytes)
> swap_cluster_readahead (944 -> 932 bytes)
> tcp_accecn_third_ack (328 -> 336 bytes)
> tcp_create_openreq_child (1764 -> 1772 bytes)
> tcp_data_queue (5784 -> 5892 bytes)
> tcp_ecn_rcv_synack (620 -> 628 bytes)
> xen_manage_runstate_time (944 -> 896 bytes)
> xen_steal_clock (340 -> 296 bytes)
>
> Increase of some functions are due to more aggressive inlining due to
> better codegen (in this build, e.g. bpf_map_is_rdonly is no longer
> present due to being inlined completely).
>
> Signed-off-by: Marco Elver <elver@google.com>
Having most of the comment in the commit message and a short one in the
code look good.
I think it will also fix a 'bleat' from min() about a signed v unsigned compare.
The ?: causes 'u8' to be promoted to 'int' with the expected outcome.
Reviewed-by: David Laight <david.laight.linux>@gmail.com
> ---
> v3:
> * Comment.
>
> v2:
> * Add __rwonce_typeof_unqual() as fallback for old compilers.
> ---
> arch/arm64/include/asm/rwonce.h | 21 +++++++++++++++++----
> 1 file changed, 17 insertions(+), 4 deletions(-)
>
> diff --git a/arch/arm64/include/asm/rwonce.h b/arch/arm64/include/asm/rwonce.h
> index fc0fb42b0b64..42c9e8429274 100644
> --- a/arch/arm64/include/asm/rwonce.h
> +++ b/arch/arm64/include/asm/rwonce.h
> @@ -19,6 +19,20 @@
> "ldapr" #sfx "\t" #regs, \
> ARM64_HAS_LDAPR)
>
> +#ifdef USE_TYPEOF_UNQUAL
> +#define __rwonce_typeof_unqual(x) TYPEOF_UNQUAL(x)
> +#else
> +/*
> + * Fallback for older compilers (Clang < 19).
> + *
> + * Uses the fact that, for all supported Clang versions, 'auto' correctly drops
> + * qualifiers. Unlike typeof_unqual(), the type must be completely defined, i.e.
> + * no forward-declared struct pointer dereferences. The array-to-pointer decay
> + * case does not matter for usage in READ_ONCE() either.
> + */
> +#define __rwonce_typeof_unqual(x) typeof(({ auto ____t = (x); ____t; }))
> +#endif
> +
> /*
> * When building with LTO, there is an increased risk of the compiler
> * converting an address dependency headed by a READ_ONCE() invocation
> @@ -32,8 +46,7 @@
> #define __READ_ONCE(x) \
> ({ \
> typeof(&(x)) __x = &(x); \
> - int atomic = 1; \
> - union { __unqual_scalar_typeof(*__x) __val; char __c[1]; } __u; \
> + union { __rwonce_typeof_unqual(*__x) __val; char __c[1]; } __u; \
> switch (sizeof(x)) { \
> case 1: \
> asm volatile(__LOAD_RCPC(b, %w0, %1) \
> @@ -56,9 +69,9 @@
> : "Q" (*__x) : "memory"); \
> break; \
> default: \
> - atomic = 0; \
> + __u.__val = *(volatile typeof(*__x) *)__x; \
> } \
> - atomic ? (typeof(*__x))__u.__val : (*(volatile typeof(*__x) *)__x);\
> + __u.__val; \
> })
>
> #endif /* !BUILD_VDSO */
next prev parent reply other threads:[~2026-01-30 15:11 UTC|newest]
Thread overview: 37+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-01-30 13:28 [PATCH v3 0/3] arm64: Fixes for __READ_ONCE() with CONFIG_LTO=y Marco Elver
2026-01-30 13:28 ` [PATCH v3 1/3] arm64: Fix non-atomic " Marco Elver
2026-01-30 15:06 ` David Laight
2026-01-30 13:28 ` [PATCH v3 2/3] arm64: Optimize " Marco Elver
2026-01-30 15:11 ` David Laight [this message]
2026-02-02 15:36 ` Will Deacon
2026-02-02 16:01 ` Peter Zijlstra
2026-02-02 16:05 ` Will Deacon
2026-02-02 17:48 ` Marco Elver
2026-02-02 19:28 ` David Laight
2026-01-30 13:28 ` [PATCH v3 3/3] arm64, compiler-context-analysis: Permit alias analysis through " Marco Elver
2026-01-30 15:13 ` David Laight
2026-02-02 15:39 ` Will Deacon
2026-02-02 19:29 ` David Laight
2026-02-03 11:47 ` Will Deacon
2026-02-04 10:46 ` Marco Elver
2026-02-04 13:14 ` Peter Zijlstra
2026-02-04 14:15 ` Will Deacon
2026-02-06 15:09 ` Marco Elver
2026-02-06 18:26 ` David Laight
2026-02-15 21:55 ` Marco Elver
2026-02-15 22:16 ` David Laight
2026-02-15 22:43 ` Marco Elver
2026-02-15 23:18 ` David Laight
2026-02-15 23:40 ` Linus Torvalds
2026-02-16 11:09 ` David Laight
2026-02-16 15:32 ` Linus Torvalds
2026-02-16 17:43 ` David Laight
2026-02-17 12:16 ` Peter Zijlstra
2026-02-17 14:25 ` David Laight
2026-02-17 16:23 ` Linus Torvalds
2026-02-17 16:32 ` Linus Torvalds
2026-02-18 19:34 ` Boqun Feng
2026-02-18 20:18 ` Linus Torvalds
2026-02-19 15:21 ` Gary Guo
2026-02-19 18:36 ` Linus Torvalds
2026-02-02 19:13 ` [PATCH v3 0/3] arm64: Fixes for " Will Deacon
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260130151130.6026999b@pumpkin \
--to=david.laight.linux@gmail.com \
--cc=arnd@arndb.de \
--cc=boqun.feng@gmail.com \
--cc=bvanassche@acm.org \
--cc=catalin.marinas@arm.com \
--cc=elver@google.com \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-kernel@vger.kernel.org \
--cc=llvm@lists.linux.dev \
--cc=longman@redhat.com \
--cc=mingo@kernel.org \
--cc=peterz@infradead.org \
--cc=tglx@linutronix.de \
--cc=will@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.