[PATCH v3 0/3] arm64: Fixes for __READ_ONCE() with CONFIG

public inbox for linux-arm-kernel@lists.infradead.org
 help / color / mirror / Atom feed

* [PATCH v3 0/3] arm64: Fixes for __READ_ONCE() with CONFIG_LTO=y
@ 2026-01-30 13:28 Marco Elver
  2026-01-30 13:28 ` [PATCH v3 1/3] arm64: Fix non-atomic " Marco Elver
                   ` (3 more replies)
  0 siblings, 4 replies; 37+ messages in thread
From: Marco Elver @ 2026-01-30 13:28 UTC (permalink / raw)
  To: elver, Peter Zijlstra, Will Deacon
  Cc: Ingo Molnar, Thomas Gleixner, Boqun Feng, Waiman Long,
	Bart Van Assche, llvm, David Laight, Catalin Marinas,
	Arnd Bergmann, linux-arm-kernel, linux-kernel

While investigating a Clang Context Analysis [1] false positive [2], I started
to dig deeper into arm64's __READ_ONCE() implementation with LTO. That rabbit
hole led me to find one critical bug with the current implementation (patch 1),
and subtle improvements that then enabled me to fix the original false positive.

Patch 1 fixes a bug where READ_ONCE() on types larger than 8 bytes (non-atomic
fallback) incorrectly qualified the pointer rather than the pointee as volatile.
This resulted in a lack of "once" semantics for large struct loads.

Patch 2 refactors the macro to use __rwonce_typeof_unqual() and
eliminates the ternary conditional.

Building on the refactor, patch 3 fixes the context analysis false positive, by
helping its alias analysis "see through" the __READ_ONCE despite the inline asm.

## Note on Alternative for Patch 3

An alternative considered for the Context Analysis fix was introducing a helper
function to redirect the pointer alias; specifically passing a pointer to
const-pointer does not invalidate an alias either (casting away the const is a
deliberate escape hatch, albeit somewhat unusual looking). This approach was
slightly more verbose, so the simpler approach was chosen for now. It is
preserved here for future reference in case we need it for something else:

	static __always_inline void __set_pointer_opaque(void *const *dst, const void *val)
	{
	    *(void **)dst = (void *)val;
	}

	...
	__set_pointer_opaque((void *const *)&__ret, &__u.__val);
	...

[1] https://docs.kernel.org/next/dev-tools/context-analysis.html
[2] https://lore.kernel.org/all/202601221040.TeM0ihff-lkp@intel.com/

---

v3:
* Comments-smithing.
* Use 'typeof(*__ret) __val'

v2:
* Add __rwonce_typeof_unqual() as fallback for old compilers.

Marco Elver (3):
  arm64: Fix non-atomic __READ_ONCE() with CONFIG_LTO=y
  arm64: Optimize __READ_ONCE() with CONFIG_LTO=y
  arm64, compiler-context-analysis: Permit alias analysis through
    __READ_ONCE() with CONFIG_LTO=y

 arch/arm64/include/asm/rwonce.h | 27 ++++++++++++++++++++++-----
 1 file changed, 22 insertions(+), 5 deletions(-)

-- 
2.53.0.rc1.225.gd81095ad13-goog

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH v3 1/3] arm64: Fix non-atomic __READ_ONCE() with CONFIG_LTO=y
  2026-01-30 13:28 [PATCH v3 0/3] arm64: Fixes for __READ_ONCE() with CONFIG_LTO=y Marco Elver
@ 2026-01-30 13:28 ` Marco Elver
  2026-01-30 15:06   ` David Laight
  2026-01-30 13:28 ` [PATCH v3 2/3] arm64: Optimize " Marco Elver
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 37+ messages in thread
From: Marco Elver @ 2026-01-30 13:28 UTC (permalink / raw)
  To: elver, Peter Zijlstra, Will Deacon
  Cc: Ingo Molnar, Thomas Gleixner, Boqun Feng, Waiman Long,
	Bart Van Assche, llvm, David Laight, Catalin Marinas,
	Arnd Bergmann, linux-arm-kernel, linux-kernel, stable, Boqun Feng

The implementation of __READ_ONCE() under CONFIG_LTO=y incorrectly
qualified the fallback "once" access for types larger than 8 bytes,
which are not atomic but should still happen "once" and suppress common
compiler optimizations.

The cast `volatile typeof(__x)` applied the volatile qualifier to the
pointer type itself rather than the pointee. This created a volatile
pointer to a non-volatile type, which violated __READ_ONCE() semantics.

Fix this by casting to `volatile typeof(*__x) *`.

With a defconfig + LTO + debug options build, we see the following
functions to be affected:

	xen_manage_runstate_time (884 -> 944 bytes)
	xen_steal_clock (248 -> 340 bytes)
	  ^-- use __READ_ONCE() to load vcpu_runstate_info structs

Fixes: e35123d83ee3 ("arm64: lto: Strengthen READ_ONCE() to acquire when CONFIG_LTO=y")
Cc: <stable@vger.kernel.org>
Reviewed-by: Boqun Feng <boqun@kernel.org>
Signed-off-by: Marco Elver <elver@google.com>
---
 arch/arm64/include/asm/rwonce.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/rwonce.h b/arch/arm64/include/asm/rwonce.h
index 78beceec10cd..fc0fb42b0b64 100644
--- a/arch/arm64/include/asm/rwonce.h
+++ b/arch/arm64/include/asm/rwonce.h
@@ -58,7 +58,7 @@
 	default:							\
 		atomic = 0;						\
 	}								\
-	atomic ? (typeof(*__x))__u.__val : (*(volatile typeof(__x))__x);\
+	atomic ? (typeof(*__x))__u.__val : (*(volatile typeof(*__x) *)__x);\
 })
 
 #endif	/* !BUILD_VDSO */
-- 
2.53.0.rc1.225.gd81095ad13-goog



^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v3 2/3] arm64: Optimize __READ_ONCE() with CONFIG_LTO=y
  2026-01-30 13:28 [PATCH v3 0/3] arm64: Fixes for __READ_ONCE() with CONFIG_LTO=y Marco Elver
  2026-01-30 13:28 ` [PATCH v3 1/3] arm64: Fix non-atomic " Marco Elver
@ 2026-01-30 13:28 ` Marco Elver
  2026-01-30 15:11   ` David Laight
  2026-02-02 15:36   ` Will Deacon
  2026-01-30 13:28 ` [PATCH v3 3/3] arm64, compiler-context-analysis: Permit alias analysis through " Marco Elver
  2026-02-02 19:13 ` [PATCH v3 0/3] arm64: Fixes for " Will Deacon
  3 siblings, 2 replies; 37+ messages in thread
From: Marco Elver @ 2026-01-30 13:28 UTC (permalink / raw)
  To: elver, Peter Zijlstra, Will Deacon
  Cc: Ingo Molnar, Thomas Gleixner, Boqun Feng, Waiman Long,
	Bart Van Assche, llvm, David Laight, Catalin Marinas,
	Arnd Bergmann, linux-arm-kernel, linux-kernel

Rework arm64 LTO __READ_ONCE() to improve code generation as follows:

1. Replace _Generic-based __unqual_scalar_typeof() with more complete
   __rwonce_typeof_unqual(). This strips qualifiers from all types, not
   just integer types, which is required to be able to assign (must be
   non-const) to __u.__val in the non-atomic case (required for #2).

Once our minimum compiler versions are bumped, this just becomes
TYPEOF_UNQUAL() (or typeof_unqual() should we decide to adopt C23
naming).  Sadly the fallback version of __rwonce_typeof_unqual() cannot
be used as a general TYPEOF_UNQUAL() fallback (see code comments).

One subtle point here is that non-integer types of __val could be const
or volatile within the union with the old __unqual_scalar_typeof(), if
the passed variable is const or volatile. This would then result in a
forced load from the stack if __u.__val is volatile; in the case of
const, it does look odd if the underlying storage changes, but the
compiler is told said member is "const" -- it smells like UB.

2. Eliminate the atomic flag and ternary conditional expression. Move
   the fallback volatile load into the default case of the switch,
   ensuring __u is unconditionally initialized across all paths.
   The statement expression now unconditionally returns __u.__val.

This refactoring appears to help the compiler improve (or fix) code
generation.

With a defconfig + LTO + debug options builds, we observe different
codegen for the following functions:

	btrfs_reclaim_sweep (708 -> 1032 bytes)
	btrfs_sinfo_bg_reclaim_threshold_store (200 -> 204 bytes)
	check_mem_access (3652 -> 3692 bytes) [inlined bpf_map_is_rdonly]
	console_flush_all (1268 -> 1264 bytes)
	console_lock_spinning_disable_and_check (180 -> 176 bytes)
	igb_add_filter (640 -> 636 bytes)
	igb_config_tx_modes (2404 -> 2400 bytes)
	kvm_vcpu_on_spin (480 -> 476 bytes)
	map_freeze (376 -> 380 bytes)
	netlink_bind (1664 -> 1656 bytes)
	nmi_cpu_backtrace (404 -> 400 bytes)
	set_rps_cpu (516 -> 520 bytes)
	swap_cluster_readahead (944 -> 932 bytes)
	tcp_accecn_third_ack (328 -> 336 bytes)
	tcp_create_openreq_child (1764 -> 1772 bytes)
	tcp_data_queue (5784 -> 5892 bytes)
	tcp_ecn_rcv_synack (620 -> 628 bytes)
	xen_manage_runstate_time (944 -> 896 bytes)
	xen_steal_clock (340 -> 296 bytes)

Increase of some functions are due to more aggressive inlining due to
better codegen (in this build, e.g. bpf_map_is_rdonly is no longer
present due to being inlined completely).

Signed-off-by: Marco Elver <elver@google.com>
---
v3:
* Comment.

v2:
* Add __rwonce_typeof_unqual() as fallback for old compilers.
---
 arch/arm64/include/asm/rwonce.h | 21 +++++++++++++++++----
 1 file changed, 17 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/include/asm/rwonce.h b/arch/arm64/include/asm/rwonce.h
index fc0fb42b0b64..42c9e8429274 100644
--- a/arch/arm64/include/asm/rwonce.h
+++ b/arch/arm64/include/asm/rwonce.h
@@ -19,6 +19,20 @@
 		"ldapr"	#sfx "\t" #regs,				\
 	ARM64_HAS_LDAPR)
 
+#ifdef USE_TYPEOF_UNQUAL
+#define __rwonce_typeof_unqual(x) TYPEOF_UNQUAL(x)
+#else
+/*
+ * Fallback for older compilers (Clang < 19).
+ *
+ * Uses the fact that, for all supported Clang versions, 'auto' correctly drops
+ * qualifiers. Unlike typeof_unqual(), the type must be completely defined, i.e.
+ * no forward-declared struct pointer dereferences.  The array-to-pointer decay
+ * case does not matter for usage in READ_ONCE() either.
+ */
+#define __rwonce_typeof_unqual(x) typeof(({ auto ____t = (x); ____t; }))
+#endif
+
 /*
  * When building with LTO, there is an increased risk of the compiler
  * converting an address dependency headed by a READ_ONCE() invocation
@@ -32,8 +46,7 @@
 #define __READ_ONCE(x)							\
 ({									\
 	typeof(&(x)) __x = &(x);					\
-	int atomic = 1;							\
-	union { __unqual_scalar_typeof(*__x) __val; char __c[1]; } __u;	\
+	union { __rwonce_typeof_unqual(*__x) __val; char __c[1]; } __u;	\
 	switch (sizeof(x)) {						\
 	case 1:								\
 		asm volatile(__LOAD_RCPC(b, %w0, %1)			\
@@ -56,9 +69,9 @@
 			: "Q" (*__x) : "memory");			\
 		break;							\
 	default:							\
-		atomic = 0;						\
+		__u.__val = *(volatile typeof(*__x) *)__x;		\
 	}								\
-	atomic ? (typeof(*__x))__u.__val : (*(volatile typeof(*__x) *)__x);\
+	__u.__val;							\
 })
 
 #endif	/* !BUILD_VDSO */
-- 
2.53.0.rc1.225.gd81095ad13-goog



^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v3 3/3] arm64, compiler-context-analysis: Permit alias analysis through __READ_ONCE() with CONFIG_LTO=y
  2026-01-30 13:28 [PATCH v3 0/3] arm64: Fixes for __READ_ONCE() with CONFIG_LTO=y Marco Elver
  2026-01-30 13:28 ` [PATCH v3 1/3] arm64: Fix non-atomic " Marco Elver
  2026-01-30 13:28 ` [PATCH v3 2/3] arm64: Optimize " Marco Elver
@ 2026-01-30 13:28 ` Marco Elver
  2026-01-30 15:13   ` David Laight
  2026-02-02 15:39   ` Will Deacon
  2026-02-02 19:13 ` [PATCH v3 0/3] arm64: Fixes for " Will Deacon
  3 siblings, 2 replies; 37+ messages in thread
From: Marco Elver @ 2026-01-30 13:28 UTC (permalink / raw)
  To: elver, Peter Zijlstra, Will Deacon
  Cc: Ingo Molnar, Thomas Gleixner, Boqun Feng, Waiman Long,
	Bart Van Assche, llvm, David Laight, Catalin Marinas,
	Arnd Bergmann, linux-arm-kernel, linux-kernel, kernel test robot,
	Boqun Feng

When enabling Clang's Context Analysis (aka. Thread Safety Analysis) on
kernel/futex/core.o (see Peter's changes at [1]), in arm64 LTO builds we
could see:

| kernel/futex/core.c:982:1: warning: spinlock 'atomic ? __u.__val : q->lock_ptr' is still held at the end of function [-Wthread-safety-analysis]
|      982 | }
|          | ^
|    kernel/futex/core.c:976:2: note: spinlock acquired here
|      976 |         spin_lock(lock_ptr);
|          |         ^
| kernel/futex/core.c:982:1: warning: expecting spinlock 'q->lock_ptr' to be held at the end of function [-Wthread-safety-analysis]
|      982 | }
|          | ^
|    kernel/futex/core.c:966:6: note: spinlock acquired here
|      966 | void futex_q_lockptr_lock(struct futex_q *q)
|          |      ^
|    2 warnings generated.

Where we have:

	extern void futex_q_lockptr_lock(struct futex_q *q) __acquires(q->lock_ptr);
	..
	void futex_q_lockptr_lock(struct futex_q *q)
	{
		spinlock_t *lock_ptr;

		/*
		 * See futex_unqueue() why lock_ptr can change.
		 */
		guard(rcu)();
	retry:
>>		lock_ptr = READ_ONCE(q->lock_ptr);
		spin_lock(lock_ptr);
	...
	}

At the time of the above report (prior to removal of the 'atomic' flag),
Clang Thread Safety Analysis's alias analysis resolved 'lock_ptr' to
'atomic ?  __u.__val : q->lock_ptr' (now just '__u.__val'), and used
this as the identity of the context lock given it cannot "see through"
the inline assembly; however, we want 'q->lock_ptr' as the canonical
context lock.

While for code generation the compiler simplified to '__u.__val' for
pointers (8 byte case -> 'atomic' was set), TSA's analysis (a) happens
much earlier on the AST, and (b) would be the wrong deduction.

Now that we've gotten rid of the 'atomic' ternary comparison, we can
return '__u.__val' through a pointer that we initialize with '&x', but
then update via a pointer-to-pointer. When READ_ONCE()'ing a context
lock pointer, TSA's alias analysis does not invalidate the initial alias
when updated through the pointer-to-pointer, and we make it effectively
"see through" the __READ_ONCE().

Code generation is unchanged.

Link: https://lkml.kernel.org/r/20260121110704.221498346@infradead.org [1]
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202601221040.TeM0ihff-lkp@intel.com/
Cc: Peter Zijlstra <peterz@infradead.org>
Tested-by: Boqun Feng <boqun@kernel.org>
Signed-off-by: Marco Elver <elver@google.com>
---
v3:
* Use 'typeof(*__ret)'.
* Commit message.

v2:
* Rebase.
---
 arch/arm64/include/asm/rwonce.h | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/include/asm/rwonce.h b/arch/arm64/include/asm/rwonce.h
index 42c9e8429274..b7de74d4bf07 100644
--- a/arch/arm64/include/asm/rwonce.h
+++ b/arch/arm64/include/asm/rwonce.h
@@ -45,8 +45,12 @@
  */
 #define __READ_ONCE(x)							\
 ({									\
-	typeof(&(x)) __x = &(x);					\
-	union { __rwonce_typeof_unqual(*__x) __val; char __c[1]; } __u;	\
+	auto __x = &(x);						\
+	auto __ret = (__rwonce_typeof_unqual(*__x) *)__x;		\
+	/* Hides alias reassignment from Clang's -Wthread-safety. */	\
+	auto __retp = &__ret;						\
+	union { typeof(*__ret) __val; char __c[1]; } __u;		\
+	*__retp = &__u.__val;						\
 	switch (sizeof(x)) {						\
 	case 1:								\
 		asm volatile(__LOAD_RCPC(b, %w0, %1)			\
@@ -71,7 +75,7 @@
 	default:							\
 		__u.__val = *(volatile typeof(*__x) *)__x;		\
 	}								\
-	__u.__val;							\
+	*__ret;								\
 })
 
 #endif	/* !BUILD_VDSO */
-- 
2.53.0.rc1.225.gd81095ad13-goog



^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 1/3] arm64: Fix non-atomic __READ_ONCE() with CONFIG_LTO=y
  2026-01-30 13:28 ` [PATCH v3 1/3] arm64: Fix non-atomic " Marco Elver
@ 2026-01-30 15:06   ` David Laight
  0 siblings, 0 replies; 37+ messages in thread
From: David Laight @ 2026-01-30 15:06 UTC (permalink / raw)
  To: Marco Elver
  Cc: Peter Zijlstra, Will Deacon, Ingo Molnar, Thomas Gleixner,
	Boqun Feng, Waiman Long, Bart Van Assche, llvm, Catalin Marinas,
	Arnd Bergmann, linux-arm-kernel, linux-kernel, stable, Boqun Feng

On Fri, 30 Jan 2026 14:28:24 +0100
Marco Elver <elver@google.com> wrote:

> The implementation of __READ_ONCE() under CONFIG_LTO=y incorrectly
> qualified the fallback "once" access for types larger than 8 bytes,
> which are not atomic but should still happen "once" and suppress common
> compiler optimizations.
> 
> The cast `volatile typeof(__x)` applied the volatile qualifier to the
> pointer type itself rather than the pointee. This created a volatile
> pointer to a non-volatile type, which violated __READ_ONCE() semantics.
> 
> Fix this by casting to `volatile typeof(*__x) *`.
> 
> With a defconfig + LTO + debug options build, we see the following
> functions to be affected:
> 
> 	xen_manage_runstate_time (884 -> 944 bytes)
> 	xen_steal_clock (248 -> 340 bytes)
> 	  ^-- use __READ_ONCE() to load vcpu_runstate_info structs
> 
> Fixes: e35123d83ee3 ("arm64: lto: Strengthen READ_ONCE() to acquire when CONFIG_LTO=y")
> Cc: <stable@vger.kernel.org>
> Reviewed-by: Boqun Feng <boqun@kernel.org>
> Signed-off-by: Marco Elver <elver@google.com>

I found this in some testing (on godbolt), so:

Tested-by: David Laight <david.laight.linux@gmail.com>

> ---
>  arch/arm64/include/asm/rwonce.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/include/asm/rwonce.h b/arch/arm64/include/asm/rwonce.h
> index 78beceec10cd..fc0fb42b0b64 100644
> --- a/arch/arm64/include/asm/rwonce.h
> +++ b/arch/arm64/include/asm/rwonce.h
> @@ -58,7 +58,7 @@
>  	default:							\
>  		atomic = 0;						\
>  	}								\
> -	atomic ? (typeof(*__x))__u.__val : (*(volatile typeof(__x))__x);\
> +	atomic ? (typeof(*__x))__u.__val : (*(volatile typeof(*__x) *)__x);\
>  })
>  
>  #endif	/* !BUILD_VDSO */



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 2/3] arm64: Optimize __READ_ONCE() with CONFIG_LTO=y
  2026-01-30 13:28 ` [PATCH v3 2/3] arm64: Optimize " Marco Elver
@ 2026-01-30 15:11   ` David Laight
  2026-02-02 15:36   ` Will Deacon
  1 sibling, 0 replies; 37+ messages in thread
From: David Laight @ 2026-01-30 15:11 UTC (permalink / raw)
  To: Marco Elver
  Cc: Peter Zijlstra, Will Deacon, Ingo Molnar, Thomas Gleixner,
	Boqun Feng, Waiman Long, Bart Van Assche, llvm, Catalin Marinas,
	Arnd Bergmann, linux-arm-kernel, linux-kernel

On Fri, 30 Jan 2026 14:28:25 +0100
Marco Elver <elver@google.com> wrote:

> Rework arm64 LTO __READ_ONCE() to improve code generation as follows:
> 
> 1. Replace _Generic-based __unqual_scalar_typeof() with more complete
>    __rwonce_typeof_unqual(). This strips qualifiers from all types, not
>    just integer types, which is required to be able to assign (must be
>    non-const) to __u.__val in the non-atomic case (required for #2).
> 
> Once our minimum compiler versions are bumped, this just becomes
> TYPEOF_UNQUAL() (or typeof_unqual() should we decide to adopt C23
> naming).  Sadly the fallback version of __rwonce_typeof_unqual() cannot
> be used as a general TYPEOF_UNQUAL() fallback (see code comments).
> 
> One subtle point here is that non-integer types of __val could be const
> or volatile within the union with the old __unqual_scalar_typeof(), if
> the passed variable is const or volatile. This would then result in a
> forced load from the stack if __u.__val is volatile; in the case of
> const, it does look odd if the underlying storage changes, but the
> compiler is told said member is "const" -- it smells like UB.
> 
> 2. Eliminate the atomic flag and ternary conditional expression. Move
>    the fallback volatile load into the default case of the switch,
>    ensuring __u is unconditionally initialized across all paths.
>    The statement expression now unconditionally returns __u.__val.
> 
> This refactoring appears to help the compiler improve (or fix) code
> generation.
> 
> With a defconfig + LTO + debug options builds, we observe different
> codegen for the following functions:
> 
> 	btrfs_reclaim_sweep (708 -> 1032 bytes)
> 	btrfs_sinfo_bg_reclaim_threshold_store (200 -> 204 bytes)
> 	check_mem_access (3652 -> 3692 bytes) [inlined bpf_map_is_rdonly]
> 	console_flush_all (1268 -> 1264 bytes)
> 	console_lock_spinning_disable_and_check (180 -> 176 bytes)
> 	igb_add_filter (640 -> 636 bytes)
> 	igb_config_tx_modes (2404 -> 2400 bytes)
> 	kvm_vcpu_on_spin (480 -> 476 bytes)
> 	map_freeze (376 -> 380 bytes)
> 	netlink_bind (1664 -> 1656 bytes)
> 	nmi_cpu_backtrace (404 -> 400 bytes)
> 	set_rps_cpu (516 -> 520 bytes)
> 	swap_cluster_readahead (944 -> 932 bytes)
> 	tcp_accecn_third_ack (328 -> 336 bytes)
> 	tcp_create_openreq_child (1764 -> 1772 bytes)
> 	tcp_data_queue (5784 -> 5892 bytes)
> 	tcp_ecn_rcv_synack (620 -> 628 bytes)
> 	xen_manage_runstate_time (944 -> 896 bytes)
> 	xen_steal_clock (340 -> 296 bytes)
> 
> Increase of some functions are due to more aggressive inlining due to
> better codegen (in this build, e.g. bpf_map_is_rdonly is no longer
> present due to being inlined completely).
> 
> Signed-off-by: Marco Elver <elver@google.com>

Having most of the comment in the commit message and a short one in the
code look good.

I think it will also fix a 'bleat' from min() about a signed v unsigned compare.
The ?: causes 'u8' to be promoted to 'int' with the expected outcome.

Reviewed-by: David Laight <david.laight.linux>@gmail.com

> ---
> v3:
> * Comment.
> 
> v2:
> * Add __rwonce_typeof_unqual() as fallback for old compilers.
> ---
>  arch/arm64/include/asm/rwonce.h | 21 +++++++++++++++++----
>  1 file changed, 17 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/rwonce.h b/arch/arm64/include/asm/rwonce.h
> index fc0fb42b0b64..42c9e8429274 100644
> --- a/arch/arm64/include/asm/rwonce.h
> +++ b/arch/arm64/include/asm/rwonce.h
> @@ -19,6 +19,20 @@
>  		"ldapr"	#sfx "\t" #regs,				\
>  	ARM64_HAS_LDAPR)
>  
> +#ifdef USE_TYPEOF_UNQUAL
> +#define __rwonce_typeof_unqual(x) TYPEOF_UNQUAL(x)
> +#else
> +/*
> + * Fallback for older compilers (Clang < 19).
> + *
> + * Uses the fact that, for all supported Clang versions, 'auto' correctly drops
> + * qualifiers. Unlike typeof_unqual(), the type must be completely defined, i.e.
> + * no forward-declared struct pointer dereferences.  The array-to-pointer decay
> + * case does not matter for usage in READ_ONCE() either.
> + */
> +#define __rwonce_typeof_unqual(x) typeof(({ auto ____t = (x); ____t; }))
> +#endif
> +
>  /*
>   * When building with LTO, there is an increased risk of the compiler
>   * converting an address dependency headed by a READ_ONCE() invocation
> @@ -32,8 +46,7 @@
>  #define __READ_ONCE(x)							\
>  ({									\
>  	typeof(&(x)) __x = &(x);					\
> -	int atomic = 1;							\
> -	union { __unqual_scalar_typeof(*__x) __val; char __c[1]; } __u;	\
> +	union { __rwonce_typeof_unqual(*__x) __val; char __c[1]; } __u;	\
>  	switch (sizeof(x)) {						\
>  	case 1:								\
>  		asm volatile(__LOAD_RCPC(b, %w0, %1)			\
> @@ -56,9 +69,9 @@
>  			: "Q" (*__x) : "memory");			\
>  		break;							\
>  	default:							\
> -		atomic = 0;						\
> +		__u.__val = *(volatile typeof(*__x) *)__x;		\
>  	}								\
> -	atomic ? (typeof(*__x))__u.__val : (*(volatile typeof(*__x) *)__x);\
> +	__u.__val;							\
>  })
>  
>  #endif	/* !BUILD_VDSO */



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 3/3] arm64, compiler-context-analysis: Permit alias analysis through __READ_ONCE() with CONFIG_LTO=y
  2026-01-30 13:28 ` [PATCH v3 3/3] arm64, compiler-context-analysis: Permit alias analysis through " Marco Elver
@ 2026-01-30 15:13   ` David Laight
  2026-02-02 15:39   ` Will Deacon
  1 sibling, 0 replies; 37+ messages in thread
From: David Laight @ 2026-01-30 15:13 UTC (permalink / raw)
  To: Marco Elver
  Cc: Peter Zijlstra, Will Deacon, Ingo Molnar, Thomas Gleixner,
	Boqun Feng, Waiman Long, Bart Van Assche, llvm, Catalin Marinas,
	Arnd Bergmann, linux-arm-kernel, linux-kernel, kernel test robot,
	Boqun Feng

On Fri, 30 Jan 2026 14:28:26 +0100
Marco Elver <elver@google.com> wrote:

> When enabling Clang's Context Analysis (aka. Thread Safety Analysis) on
> kernel/futex/core.o (see Peter's changes at [1]), in arm64 LTO builds we
> could see:
> 
> | kernel/futex/core.c:982:1: warning: spinlock 'atomic ? __u.__val : q->lock_ptr' is still held at the end of function [-Wthread-safety-analysis]
> |      982 | }
> |          | ^
> |    kernel/futex/core.c:976:2: note: spinlock acquired here
> |      976 |         spin_lock(lock_ptr);
> |          |         ^
> | kernel/futex/core.c:982:1: warning: expecting spinlock 'q->lock_ptr' to be held at the end of function [-Wthread-safety-analysis]
> |      982 | }
> |          | ^
> |    kernel/futex/core.c:966:6: note: spinlock acquired here
> |      966 | void futex_q_lockptr_lock(struct futex_q *q)
> |          |      ^
> |    2 warnings generated.
> 
> Where we have:
> 
> 	extern void futex_q_lockptr_lock(struct futex_q *q) __acquires(q->lock_ptr);
> 	..
> 	void futex_q_lockptr_lock(struct futex_q *q)
> 	{
> 		spinlock_t *lock_ptr;
> 
> 		/*
> 		 * See futex_unqueue() why lock_ptr can change.
> 		 */
> 		guard(rcu)();
> 	retry:
> >>		lock_ptr = READ_ONCE(q->lock_ptr);  
> 		spin_lock(lock_ptr);
> 	...
> 	}
> 
> At the time of the above report (prior to removal of the 'atomic' flag),
> Clang Thread Safety Analysis's alias analysis resolved 'lock_ptr' to
> 'atomic ?  __u.__val : q->lock_ptr' (now just '__u.__val'), and used
> this as the identity of the context lock given it cannot "see through"
> the inline assembly; however, we want 'q->lock_ptr' as the canonical
> context lock.
> 
> While for code generation the compiler simplified to '__u.__val' for
> pointers (8 byte case -> 'atomic' was set), TSA's analysis (a) happens
> much earlier on the AST, and (b) would be the wrong deduction.
> 
> Now that we've gotten rid of the 'atomic' ternary comparison, we can
> return '__u.__val' through a pointer that we initialize with '&x', but
> then update via a pointer-to-pointer. When READ_ONCE()'ing a context
> lock pointer, TSA's alias analysis does not invalidate the initial alias
> when updated through the pointer-to-pointer, and we make it effectively
> "see through" the __READ_ONCE().
> 
> Code generation is unchanged.
> 
> Link: https://lkml.kernel.org/r/20260121110704.221498346@infradead.org [1]
> Reported-by: kernel test robot <lkp@intel.com>
> Closes: https://lore.kernel.org/oe-kbuild-all/202601221040.TeM0ihff-lkp@intel.com/
> Cc: Peter Zijlstra <peterz@infradead.org>
> Tested-by: Boqun Feng <boqun@kernel.org>
> Signed-off-by: Marco Elver <elver@google.com>

LGTM (for an obscure definition of G).

Reviewed-by: David Laight <david.laight.linux@gmail.com>

> ---
> v3:
> * Use 'typeof(*__ret)'.
> * Commit message.
> 
> v2:
> * Rebase.
> ---
>  arch/arm64/include/asm/rwonce.h | 10 +++++++---
>  1 file changed, 7 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/rwonce.h b/arch/arm64/include/asm/rwonce.h
> index 42c9e8429274..b7de74d4bf07 100644
> --- a/arch/arm64/include/asm/rwonce.h
> +++ b/arch/arm64/include/asm/rwonce.h
> @@ -45,8 +45,12 @@
>   */
>  #define __READ_ONCE(x)							\
>  ({									\
> -	typeof(&(x)) __x = &(x);					\
> -	union { __rwonce_typeof_unqual(*__x) __val; char __c[1]; } __u;	\
> +	auto __x = &(x);						\
> +	auto __ret = (__rwonce_typeof_unqual(*__x) *)__x;		\
> +	/* Hides alias reassignment from Clang's -Wthread-safety. */	\
> +	auto __retp = &__ret;						\
> +	union { typeof(*__ret) __val; char __c[1]; } __u;		\
> +	*__retp = &__u.__val;						\
>  	switch (sizeof(x)) {						\
>  	case 1:								\
>  		asm volatile(__LOAD_RCPC(b, %w0, %1)			\
> @@ -71,7 +75,7 @@
>  	default:							\
>  		__u.__val = *(volatile typeof(*__x) *)__x;		\
>  	}								\
> -	__u.__val;							\
> +	*__ret;								\
>  })
>  
>  #endif	/* !BUILD_VDSO */



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 2/3] arm64: Optimize __READ_ONCE() with CONFIG_LTO=y
  2026-01-30 13:28 ` [PATCH v3 2/3] arm64: Optimize " Marco Elver
  2026-01-30 15:11   ` David Laight
@ 2026-02-02 15:36   ` Will Deacon
  2026-02-02 16:01     ` Peter Zijlstra
  2026-02-02 19:28     ` David Laight
  1 sibling, 2 replies; 37+ messages in thread
From: Will Deacon @ 2026-02-02 15:36 UTC (permalink / raw)
  To: Marco Elver
  Cc: Peter Zijlstra, Ingo Molnar, Thomas Gleixner, Boqun Feng,
	Waiman Long, Bart Van Assche, llvm, David Laight, Catalin Marinas,
	Arnd Bergmann, linux-arm-kernel, linux-kernel

On Fri, Jan 30, 2026 at 02:28:25PM +0100, Marco Elver wrote:
> Rework arm64 LTO __READ_ONCE() to improve code generation as follows:
> 
> 1. Replace _Generic-based __unqual_scalar_typeof() with more complete
>    __rwonce_typeof_unqual(). This strips qualifiers from all types, not
>    just integer types, which is required to be able to assign (must be
>    non-const) to __u.__val in the non-atomic case (required for #2).
> 
> Once our minimum compiler versions are bumped, this just becomes
> TYPEOF_UNQUAL() (or typeof_unqual() should we decide to adopt C23
> naming).  Sadly the fallback version of __rwonce_typeof_unqual() cannot
> be used as a general TYPEOF_UNQUAL() fallback (see code comments).
> 
> One subtle point here is that non-integer types of __val could be const
> or volatile within the union with the old __unqual_scalar_typeof(), if
> the passed variable is const or volatile. This would then result in a
> forced load from the stack if __u.__val is volatile; in the case of
> const, it does look odd if the underlying storage changes, but the
> compiler is told said member is "const" -- it smells like UB.
> 
> 2. Eliminate the atomic flag and ternary conditional expression. Move
>    the fallback volatile load into the default case of the switch,
>    ensuring __u is unconditionally initialized across all paths.
>    The statement expression now unconditionally returns __u.__val.
> 
> This refactoring appears to help the compiler improve (or fix) code
> generation.
> 
> With a defconfig + LTO + debug options builds, we observe different
> codegen for the following functions:
> 
> 	btrfs_reclaim_sweep (708 -> 1032 bytes)
> 	btrfs_sinfo_bg_reclaim_threshold_store (200 -> 204 bytes)
> 	check_mem_access (3652 -> 3692 bytes) [inlined bpf_map_is_rdonly]
> 	console_flush_all (1268 -> 1264 bytes)
> 	console_lock_spinning_disable_and_check (180 -> 176 bytes)
> 	igb_add_filter (640 -> 636 bytes)
> 	igb_config_tx_modes (2404 -> 2400 bytes)
> 	kvm_vcpu_on_spin (480 -> 476 bytes)
> 	map_freeze (376 -> 380 bytes)
> 	netlink_bind (1664 -> 1656 bytes)
> 	nmi_cpu_backtrace (404 -> 400 bytes)
> 	set_rps_cpu (516 -> 520 bytes)
> 	swap_cluster_readahead (944 -> 932 bytes)
> 	tcp_accecn_third_ack (328 -> 336 bytes)
> 	tcp_create_openreq_child (1764 -> 1772 bytes)
> 	tcp_data_queue (5784 -> 5892 bytes)
> 	tcp_ecn_rcv_synack (620 -> 628 bytes)
> 	xen_manage_runstate_time (944 -> 896 bytes)
> 	xen_steal_clock (340 -> 296 bytes)
> 
> Increase of some functions are due to more aggressive inlining due to
> better codegen (in this build, e.g. bpf_map_is_rdonly is no longer
> present due to being inlined completely).
> 
> Signed-off-by: Marco Elver <elver@google.com>
> ---
> v3:
> * Comment.
> 
> v2:
> * Add __rwonce_typeof_unqual() as fallback for old compilers.
> ---
>  arch/arm64/include/asm/rwonce.h | 21 +++++++++++++++++----
>  1 file changed, 17 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/rwonce.h b/arch/arm64/include/asm/rwonce.h
> index fc0fb42b0b64..42c9e8429274 100644
> --- a/arch/arm64/include/asm/rwonce.h
> +++ b/arch/arm64/include/asm/rwonce.h
> @@ -19,6 +19,20 @@
>  		"ldapr"	#sfx "\t" #regs,				\
>  	ARM64_HAS_LDAPR)
>  
> +#ifdef USE_TYPEOF_UNQUAL
> +#define __rwonce_typeof_unqual(x) TYPEOF_UNQUAL(x)
> +#else
> +/*
> + * Fallback for older compilers (Clang < 19).
> + *
> + * Uses the fact that, for all supported Clang versions, 'auto' correctly drops
> + * qualifiers. Unlike typeof_unqual(), the type must be completely defined, i.e.
> + * no forward-declared struct pointer dereferences.  The array-to-pointer decay
> + * case does not matter for usage in READ_ONCE() either.
> + */
> +#define __rwonce_typeof_unqual(x) typeof(({ auto ____t = (x); ____t; }))
> +#endif

I know that CONFIG_LTO practically depends on Clang, but it's a bit
grotty relying on that assumption here. Ideally, it would be
straightforward to enable the strong READ_ONCE() semantics on arm64
regardless of the compiler.

>  /*
>   * When building with LTO, there is an increased risk of the compiler
>   * converting an address dependency headed by a READ_ONCE() invocation
> @@ -32,8 +46,7 @@
>  #define __READ_ONCE(x)							\
>  ({									\
>  	typeof(&(x)) __x = &(x);					\
> -	int atomic = 1;							\
> -	union { __unqual_scalar_typeof(*__x) __val; char __c[1]; } __u;	\
> +	union { __rwonce_typeof_unqual(*__x) __val; char __c[1]; } __u;	\
>  	switch (sizeof(x)) {						\
>  	case 1:								\
>  		asm volatile(__LOAD_RCPC(b, %w0, %1)			\
> @@ -56,9 +69,9 @@
>  			: "Q" (*__x) : "memory");			\
>  		break;							\
>  	default:							\
> -		atomic = 0;						\
> +		__u.__val = *(volatile typeof(*__x) *)__x;		\

Since we're not providing acquire semantics for the non-atomic case,
what we really want is the generic definition of __READ_ONCE() from
include/asm-generic/rwonce.h here. The header inclusion mess prevents
that, but why can't we just inline that definition here for the
'default' case? If TYPEOF_UNQUAL() leads to better codegen, shouldn't
we use that to implement __unqual_scalar_typeof() when it is available?

I fear I'm missing something here, but it just feels like we're
optimising a pretty niche case (arm64 + LTO + non-atomic __READ_ONCE())
in a way that looks more generally applicable.

Will


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 3/3] arm64, compiler-context-analysis: Permit alias analysis through __READ_ONCE() with CONFIG_LTO=y
  2026-01-30 13:28 ` [PATCH v3 3/3] arm64, compiler-context-analysis: Permit alias analysis through " Marco Elver
  2026-01-30 15:13   ` David Laight
@ 2026-02-02 15:39   ` Will Deacon
  2026-02-02 19:29     ` David Laight
  1 sibling, 1 reply; 37+ messages in thread
From: Will Deacon @ 2026-02-02 15:39 UTC (permalink / raw)
  To: Marco Elver
  Cc: Peter Zijlstra, Ingo Molnar, Thomas Gleixner, Boqun Feng,
	Waiman Long, Bart Van Assche, llvm, David Laight, Catalin Marinas,
	Arnd Bergmann, linux-arm-kernel, linux-kernel, kernel test robot,
	Boqun Feng

On Fri, Jan 30, 2026 at 02:28:26PM +0100, Marco Elver wrote:
> When enabling Clang's Context Analysis (aka. Thread Safety Analysis) on
> kernel/futex/core.o (see Peter's changes at [1]), in arm64 LTO builds we
> could see:
> 
> | kernel/futex/core.c:982:1: warning: spinlock 'atomic ? __u.__val : q->lock_ptr' is still held at the end of function [-Wthread-safety-analysis]
> |      982 | }
> |          | ^
> |    kernel/futex/core.c:976:2: note: spinlock acquired here
> |      976 |         spin_lock(lock_ptr);
> |          |         ^
> | kernel/futex/core.c:982:1: warning: expecting spinlock 'q->lock_ptr' to be held at the end of function [-Wthread-safety-analysis]
> |      982 | }
> |          | ^
> |    kernel/futex/core.c:966:6: note: spinlock acquired here
> |      966 | void futex_q_lockptr_lock(struct futex_q *q)
> |          |      ^
> |    2 warnings generated.
> 
> Where we have:
> 
> 	extern void futex_q_lockptr_lock(struct futex_q *q) __acquires(q->lock_ptr);
> 	..
> 	void futex_q_lockptr_lock(struct futex_q *q)
> 	{
> 		spinlock_t *lock_ptr;
> 
> 		/*
> 		 * See futex_unqueue() why lock_ptr can change.
> 		 */
> 		guard(rcu)();
> 	retry:
> >>		lock_ptr = READ_ONCE(q->lock_ptr);
> 		spin_lock(lock_ptr);
> 	...
> 	}
> 
> At the time of the above report (prior to removal of the 'atomic' flag),
> Clang Thread Safety Analysis's alias analysis resolved 'lock_ptr' to
> 'atomic ?  __u.__val : q->lock_ptr' (now just '__u.__val'), and used
> this as the identity of the context lock given it cannot "see through"
> the inline assembly; however, we want 'q->lock_ptr' as the canonical
> context lock.
> 
> While for code generation the compiler simplified to '__u.__val' for
> pointers (8 byte case -> 'atomic' was set), TSA's analysis (a) happens
> much earlier on the AST, and (b) would be the wrong deduction.
> 
> Now that we've gotten rid of the 'atomic' ternary comparison, we can
> return '__u.__val' through a pointer that we initialize with '&x', but
> then update via a pointer-to-pointer. When READ_ONCE()'ing a context
> lock pointer, TSA's alias analysis does not invalidate the initial alias
> when updated through the pointer-to-pointer, and we make it effectively
> "see through" the __READ_ONCE().
> 
> Code generation is unchanged.
> 
> Link: https://lkml.kernel.org/r/20260121110704.221498346@infradead.org [1]
> Reported-by: kernel test robot <lkp@intel.com>
> Closes: https://lore.kernel.org/oe-kbuild-all/202601221040.TeM0ihff-lkp@intel.com/
> Cc: Peter Zijlstra <peterz@infradead.org>
> Tested-by: Boqun Feng <boqun@kernel.org>
> Signed-off-by: Marco Elver <elver@google.com>
> ---
> v3:
> * Use 'typeof(*__ret)'.
> * Commit message.
> 
> v2:
> * Rebase.
> ---
>  arch/arm64/include/asm/rwonce.h | 10 +++++++---
>  1 file changed, 7 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/rwonce.h b/arch/arm64/include/asm/rwonce.h
> index 42c9e8429274..b7de74d4bf07 100644
> --- a/arch/arm64/include/asm/rwonce.h
> +++ b/arch/arm64/include/asm/rwonce.h
> @@ -45,8 +45,12 @@
>   */
>  #define __READ_ONCE(x)							\
>  ({									\
> -	typeof(&(x)) __x = &(x);					\
> -	union { __rwonce_typeof_unqual(*__x) __val; char __c[1]; } __u;	\
> +	auto __x = &(x);						\
> +	auto __ret = (__rwonce_typeof_unqual(*__x) *)__x;		\
> +	/* Hides alias reassignment from Clang's -Wthread-safety. */	\
> +	auto __retp = &__ret;						\
> +	union { typeof(*__ret) __val; char __c[1]; } __u;		\
> +	*__retp = &__u.__val;						\
>  	switch (sizeof(x)) {						\
>  	case 1:								\
>  		asm volatile(__LOAD_RCPC(b, %w0, %1)			\
> @@ -71,7 +75,7 @@
>  	default:							\
>  		__u.__val = *(volatile typeof(*__x) *)__x;		\
>  	}								\
> -	__u.__val;							\
> +	*__ret;								\
>  })

What does GCC do with this? :/

Will


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 2/3] arm64: Optimize __READ_ONCE() with CONFIG_LTO=y
  2026-02-02 15:36   ` Will Deacon
@ 2026-02-02 16:01     ` Peter Zijlstra
  2026-02-02 16:05       ` Will Deacon
  2026-02-02 19:28     ` David Laight
  1 sibling, 1 reply; 37+ messages in thread
From: Peter Zijlstra @ 2026-02-02 16:01 UTC (permalink / raw)
  To: Will Deacon
  Cc: Marco Elver, Ingo Molnar, Thomas Gleixner, Boqun Feng,
	Waiman Long, Bart Van Assche, llvm, David Laight, Catalin Marinas,
	Arnd Bergmann, linux-arm-kernel, linux-kernel

On Mon, Feb 02, 2026 at 03:36:40PM +0000, Will Deacon wrote:

> Since we're not providing acquire semantics for the non-atomic case,
> what we really want is the generic definition of __READ_ONCE() from
> include/asm-generic/rwonce.h here. The header inclusion mess prevents
> that, but why can't we just inline that definition here for the
> 'default' case? If TYPEOF_UNQUAL() leads to better codegen, shouldn't
> we use that to implement __unqual_scalar_typeof() when it is available?

We are?

---

commit fd69b2f7d5f4e1d89cea4cdfa6f15e7fa53d8358
Author: Peter Zijlstra <peterz@infradead.org>
Date:   Fri Jan 16 19:18:16 2026 +0100

    compiler: Use __typeof_unqual__() for __unqual_scalar_typeof()
    
    The recent changes to get_unaligned() resulted in a new sparse warning:
    
       net/rds/ib_cm.c:96:35: sparse: sparse: incorrect type in argument 1 (different modifiers) @@     expected void * @@     got restricted __be64 const * @@
       net/rds/ib_cm.c:96:35: sparse:     expected void *
       net/rds/ib_cm.c:96:35: sparse:     got restricted __be64 const *
    
    The updated get_unaligned_t() uses __unqual_scalar_typeof() to get an
    unqualified type. This works correctly for the compilers, but fails for
    sparse when the data type is __be64 (or any other __beNN variant).
    
    On sparse runs (C=[12]) __beNN types are annotated with
    __attribute__((bitwise)).
    
    That annotation allows sparse to detect incompatible operations on __beNN
    variables, but it also prevents sparse from evaluating the _Generic() in
    __unqual_scalar_typeof() and map __beNN to a unqualified scalar type, so it
    ends up with the default, i.e. the original qualified type of a 'const
    __beNN' pointer. That then ends up as the first pointer argument to
    builtin_memcpy(), which obviously causes the above sparse warnings.
    
    The sparse git tree supports typeof_unqual() now, which allows to use it
    instead of the _Generic() based __unqual_scalar_typeof(). With that sparse
    correctly evaluates the unqualified type and keeps the __beNN logic intact.
    
    The downside is that this requires a top of tree sparse build and an old
    sparse version will emit a metric ton of incomprehensible error messages
    before it dies with a segfault.
    
    Therefore implement a sanity check which validates that the checker is
    available and capable of handling typeof_unqual(). Emit a warning if not so
    the user can take informed action.
    
    [ tglx: Move the evaluation of USE_TYPEOF_UNQUAL to compiler_types.h so it is
            set before use and implement the sanity checker ]
    
    Reported-by: kernel test robot <lkp@intel.com>
    Signed-off-by: Peter Zijlstra <peterz@infradead.org>
    Signed-off-by: Thomas Gleixner <tglx@kernel.org>
    Acked-by: Ian Rogers <irogers@google.com>
    Link: https://patch.msgid.link/87ecnp2zh3.ffs@tglx
    Closes: https://lore.kernel.org/oe-kbuild-all/202601150001.sKSN644a-lkp@intel.com/

diff --git a/Makefile b/Makefile
index 9d38125263fb..179c9d9a56dd 100644
--- a/Makefile
+++ b/Makefile
@@ -1187,6 +1187,14 @@ CHECKFLAGS += $(if $(CONFIG_CPU_BIG_ENDIAN),-mbig-endian,-mlittle-endian)
 # the checker needs the correct machine size
 CHECKFLAGS += $(if $(CONFIG_64BIT),-m64,-m32)
 
+# Validate the checker is available and functional
+ifneq ($(KBUILD_CHECKSRC), 0)
+  ifneq ($(shell $(srctree)/scripts/checker-valid.sh $(CHECK) $(CHECKFLAGS)), 1)
+    $(warning C=$(KBUILD_CHECKSRC) specified, but $(CHECK) is not available or not up to date)
+    KBUILD_CHECKSRC = 0
+  endif
+endif
+
 # Default kernel image to build when no specific target is given.
 # KBUILD_IMAGE may be overruled on the command line or
 # set in the environment
diff --git a/include/linux/compiler.h b/include/linux/compiler.h
index 04487c9bd751..c601222b495a 100644
--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -230,16 +230,6 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val,
 	__BUILD_BUG_ON_ZERO_MSG(!__is_noncstr(p), \
 				"must be non-C-string (not NUL-terminated)")
 
-/*
- * Use __typeof_unqual__() when available.
- *
- * XXX: Remove test for __CHECKER__ once
- * sparse learns about __typeof_unqual__().
- */
-#if CC_HAS_TYPEOF_UNQUAL && !defined(__CHECKER__)
-# define USE_TYPEOF_UNQUAL 1
-#endif
-
 /*
  * Define TYPEOF_UNQUAL() to use __typeof_unqual__() as typeof
  * operator when available, to return an unqualified type of the exp.
diff --git a/include/linux/compiler_types.h b/include/linux/compiler_types.h
index d3318a3c2577..377df1e64096 100644
--- a/include/linux/compiler_types.h
+++ b/include/linux/compiler_types.h
@@ -562,6 +562,14 @@ struct ftrace_likely_data {
 #define asm_inline asm
 #endif
 
+#ifndef __ASSEMBLY__
+/*
+ * Use __typeof_unqual__() when available.
+ */
+#if CC_HAS_TYPEOF_UNQUAL || defined(__CHECKER__)
+# define USE_TYPEOF_UNQUAL 1
+#endif
+
 /* Are two types/vars the same type (ignoring qualifiers)? */
 #define __same_type(a, b) __builtin_types_compatible_p(typeof(a), typeof(b))
 
@@ -569,6 +577,7 @@ struct ftrace_likely_data {
  * __unqual_scalar_typeof(x) - Declare an unqualified scalar type, leaving
  *			       non-scalar types unchanged.
  */
+#ifndef USE_TYPEOF_UNQUAL
 /*
  * Prefer C11 _Generic for better compile-times and simpler code. Note: 'char'
  * is not type-compatible with 'signed char', and we define a separate case.
@@ -586,6 +595,10 @@ struct ftrace_likely_data {
 			 __scalar_type_to_expr_cases(long),		\
 			 __scalar_type_to_expr_cases(long long),	\
 			 default: (x)))
+#else
+#define __unqual_scalar_typeof(x) __typeof_unqual__(x)
+#endif
+#endif /* !__ASSEMBLY__ */
 
 /* Is this type a native word size -- useful for atomic operations */
 #define __native_word(t) \
diff --git a/scripts/checker-valid.sh b/scripts/checker-valid.sh
new file mode 100755
index 000000000000..625a789ed1c8
--- /dev/null
+++ b/scripts/checker-valid.sh
@@ -0,0 +1,19 @@
+#!/bin/sh -eu
+# SPDX-License-Identifier: GPL-2.0
+
+[ ! -x "$(command -v "$1")" ] && exit 1
+
+tmp_file=$(mktemp)
+trap "rm -f $tmp_file" EXIT
+
+cat << EOF >$tmp_file
+static inline int u(const int *q)
+{
+	__typeof_unqual__(*q) v = *q;
+	return v;
+}
+EOF
+
+# sparse happily exits with 0 on error so validate
+# there is none on stderr. Use awk as grep is a pain with sh -e
+$@ $tmp_file 2>&1 | awk -v c=1 '/error/{c=0}END{print c}'


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 2/3] arm64: Optimize __READ_ONCE() with CONFIG_LTO=y
  2026-02-02 16:01     ` Peter Zijlstra
@ 2026-02-02 16:05       ` Will Deacon
  2026-02-02 17:48         ` Marco Elver
  0 siblings, 1 reply; 37+ messages in thread
From: Will Deacon @ 2026-02-02 16:05 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Marco Elver, Ingo Molnar, Thomas Gleixner, Boqun Feng,
	Waiman Long, Bart Van Assche, llvm, David Laight, Catalin Marinas,
	Arnd Bergmann, linux-arm-kernel, linux-kernel

On Mon, Feb 02, 2026 at 05:01:39PM +0100, Peter Zijlstra wrote:
> On Mon, Feb 02, 2026 at 03:36:40PM +0000, Will Deacon wrote:
> 
> > Since we're not providing acquire semantics for the non-atomic case,
> > what we really want is the generic definition of __READ_ONCE() from
> > include/asm-generic/rwonce.h here. The header inclusion mess prevents
> > that, but why can't we just inline that definition here for the
> > 'default' case? If TYPEOF_UNQUAL() leads to better codegen, shouldn't
> > we use that to implement __unqual_scalar_typeof() when it is available?
> 
> We are?

Great! Then I don't grok why we need to choose between
__unqual_scalar_typeof() and __typeof_unqual__() in the arch code. We
should just use the former and it will DTRT.

Will


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 2/3] arm64: Optimize __READ_ONCE() with CONFIG_LTO=y
  2026-02-02 16:05       ` Will Deacon
@ 2026-02-02 17:48         ` Marco Elver
  0 siblings, 0 replies; 37+ messages in thread
From: Marco Elver @ 2026-02-02 17:48 UTC (permalink / raw)
  To: Will Deacon
  Cc: Peter Zijlstra, Ingo Molnar, Thomas Gleixner, Boqun Feng,
	Waiman Long, Bart Van Assche, llvm, David Laight, Catalin Marinas,
	Arnd Bergmann, linux-arm-kernel, linux-kernel

On Mon, Feb 02, 2026 at 03:36:40PM +0000, Will Deacon wrote:
> I know that CONFIG_LTO practically depends on Clang, but it's a bit
> grotty relying on that assumption here. Ideally, it would be
> straightforward to enable the strong READ_ONCE() semantics on arm64
> regardless of the compiler.

Does it matter for GCC versions that do not support LTO? Because I'm
quite sure that if, one day, we add support for GCC LTO, that GCC
version will be new enough that it'll just take the __typeof_unqual__()
version and it'll "just work".

The problem with older GCC versions was that their __auto_type did not
actually strip qualifiers (which it should have) -- this was fixed at
some point.

On Mon, Feb 02, 2026 at 04:05PM +0000, Will Deacon wrote:
> On Mon, Feb 02, 2026 at 05:01:39PM +0100, Peter Zijlstra wrote:
> > On Mon, Feb 02, 2026 at 03:36:40PM +0000, Will Deacon wrote:
> > 
> > > Since we're not providing acquire semantics for the non-atomic case,
> > > what we really want is the generic definition of __READ_ONCE() from
> > > include/asm-generic/rwonce.h here. The header inclusion mess prevents
> > > that, but why can't we just inline that definition here for the
> > > 'default' case? If TYPEOF_UNQUAL() leads to better codegen, shouldn't
> > > we use that to implement __unqual_scalar_typeof() when it is available?
> > 
> > We are?
> 
> Great! Then I don't grok why we need to choose between
> __unqual_scalar_typeof() and __typeof_unqual__() in the arch code. We
> should just use the former and it will DTRT.

The old __unqual_scalar_typeof() is still broken where
__typeof_unqual__() is unavailable - for the arm64 + LTO case that'd be
Clang <= 18, which we still have to support.

We could probably just ignore the performance issue ('volatile' reload
from stack, rare enough though given volatile variables are not usually
allowed) for these older versions and just say "use the newer compiler
to get better perf", but the 'const' issue will break the build:

| --- a/arch/arm64/include/asm/rwonce.h
| +++ b/arch/arm64/include/asm/rwonce.h
| @@ -46,7 +46,7 @@
|  #define __READ_ONCE(x)							\
|  ({									\
|  	auto __x = &(x);						\
| -	auto __ret = (__rwonce_typeof_unqual(*__x) *)__x;		\
| +	auto __ret = (__unqual_scalar_typeof(*__x) *)__x;		\
|  	/* Hides alias reassignment from Clang's -Wthread-safety. */	\
|  	auto __retp = &__ret;						\
|  	union { typeof(*__ret) __val; char __c[1]; } __u;		\

Results in:

| In file included from arch/arm64/kernel/asm-offsets.c:11:
| In file included from ./include/linux/arm_sdei.h:8:
| In file included from ./include/acpi/ghes.h:5:
| In file included from ./include/acpi/apei.h:9:
| In file included from ./include/linux/acpi.h:15:
| In file included from ./include/linux/device.h:32:
| In file included from ./include/linux/device/driver.h:21:
| In file included from ./include/linux/module.h:20:
| In file included from ./include/linux/elf.h:6:
| In file included from ./arch/arm64/include/asm/elf.h:141:
| ./include/linux/fs.h:1344:9: error: cannot assign to non-static data member '__val' with const-qualified type 'typeof (*__ret)' (aka 'struct fown_struct *const')
|  1344 |         return READ_ONCE(file->f_owner);
|       |                ^~~~~~~~~~~~~~~~~~~~~~~~
| ./include/asm-generic/rwonce.h:50:2: note: expanded from macro 'READ_ONCE'
|    50 |         __READ_ONCE(x);                                                 \
|       |         ^~~~~~~~~~~~~~
| ./arch/arm64/include/asm/rwonce.h:76:13: note: expanded from macro '__READ_ONCE'
|    76 |                 __u.__val = *(volatile typeof(*__x) *)__x;              \
|       |                 ~~~~~~~~~ ^
| ./include/linux/fs.h:1344:9: note: non-static data member '__val' declared const here
|  1344 |         return READ_ONCE(file->f_owner);
|       |                ^~~~~~~~~~~~~~~~~~~~~~~~
| ./include/asm-generic/rwonce.h:50:2: note: expanded from macro 'READ_ONCE'
|    50 |         __READ_ONCE(x);                                                 \
|       |         ^~~~~~~~~~~~~~
| ./arch/arm64/include/asm/rwonce.h:52:25: note: expanded from macro '__READ_ONCE'
|    52 |         union { typeof(*__ret) __val; char __c[1]; } __u;               \
|       |                 ~~~~~~~~~~~~~~~^~~~~

... and many many more such errors.

It's an unfortunate mess today, but I hope sooner than later we bump the
minimum compiler versions that we can just unconditionally use
__typeof_unqual__() and delete __unqual_scalar_typeof(),
__rwonce_typeof_unqual() workaround and all the other code that appears
to be conditional on USE_TYPEOF_UNQUAL:

	% git grep USE_TYPEOF_UNQUAL
	arch/x86/include/asm/percpu.h:#if defined(CONFIG_USE_X86_SEG_SUPPORT) && defined(USE_TYPEOF_UNQUAL)


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 0/3] arm64: Fixes for __READ_ONCE() with CONFIG_LTO=y
  2026-01-30 13:28 [PATCH v3 0/3] arm64: Fixes for __READ_ONCE() with CONFIG_LTO=y Marco Elver
                   ` (2 preceding siblings ...)
  2026-01-30 13:28 ` [PATCH v3 3/3] arm64, compiler-context-analysis: Permit alias analysis through " Marco Elver
@ 2026-02-02 19:13 ` Will Deacon
  3 siblings, 0 replies; 37+ messages in thread
From: Will Deacon @ 2026-02-02 19:13 UTC (permalink / raw)
  To: Peter Zijlstra, Marco Elver
  Cc: catalin.marinas, kernel-team, Will Deacon, Ingo Molnar,
	Thomas Gleixner, Boqun Feng, Waiman Long, Bart Van Assche, llvm,
	David Laight, Arnd Bergmann, linux-arm-kernel, linux-kernel

On Fri, 30 Jan 2026 14:28:23 +0100, Marco Elver wrote:
> While investigating a Clang Context Analysis [1] false positive [2], I started
> to dig deeper into arm64's __READ_ONCE() implementation with LTO. That rabbit
> hole led me to find one critical bug with the current implementation (patch 1),
> and subtle improvements that then enabled me to fix the original false positive.
> 
> Patch 1 fixes a bug where READ_ONCE() on types larger than 8 bytes (non-atomic
> fallback) incorrectly qualified the pointer rather than the pointee as volatile.
> This resulted in a lack of "once" semantics for large struct loads.
> 
> [...]

Applied first patch only to arm64 (for-next/core), thanks!

[1/3] arm64: Fix non-atomic __READ_ONCE() with CONFIG_LTO=y
      https://git.kernel.org/arm64/c/bb0c99e08ab9

Cheers,
-- 
Will

https://fixes.arm64.dev
https://next.arm64.dev
https://will.arm64.dev


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 2/3] arm64: Optimize __READ_ONCE() with CONFIG_LTO=y
  2026-02-02 15:36   ` Will Deacon
  2026-02-02 16:01     ` Peter Zijlstra
@ 2026-02-02 19:28     ` David Laight
  1 sibling, 0 replies; 37+ messages in thread
From: David Laight @ 2026-02-02 19:28 UTC (permalink / raw)
  To: Will Deacon
  Cc: Marco Elver, Peter Zijlstra, Ingo Molnar, Thomas Gleixner,
	Boqun Feng, Waiman Long, Bart Van Assche, llvm, Catalin Marinas,
	Arnd Bergmann, linux-arm-kernel, linux-kernel

On Mon, 2 Feb 2026 15:36:40 +0000
Will Deacon <will@kernel.org> wrote:

> On Fri, Jan 30, 2026 at 02:28:25PM +0100, Marco Elver wrote:
> > Rework arm64 LTO __READ_ONCE() to improve code generation as follows:
...				\
> >  	default:							\
> > -		atomic = 0;						\
> > +		__u.__val = *(volatile typeof(*__x) *)__x;		\  
> 
> Since we're not providing acquire semantics for the non-atomic case,
> what we really want is the generic definition of __READ_ONCE() from
> include/asm-generic/rwonce.h here. The header inclusion mess prevents
> that, but why can't we just inline that definition here for the
> 'default' case? If TYPEOF_UNQUAL() leads to better codegen, shouldn't
> we use that to implement __unqual_scalar_typeof() when it is available?
> 
> I fear I'm missing something here, but it just feels like we're
> optimising a pretty niche case (arm64 + LTO + non-atomic __READ_ONCE())
> in a way that looks more generally applicable.

Is that path even needed?

I'm sure I've built an x86-64 allmodconfig with it being an error.
If you look back in the history it used to be an error.

Anything to simplify READ_ONCE() will noticeably speed up build times.
Even on x86 just removing the check that the size is 1, 2, 4 or 8
makes a measurable difference - and that check doesn't need to be done
on every compile.

I'm not setup to do arm builds - never mind LTO ones.
(Yes, I know, it 'just' involves downloading the toolchain.)

	David


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 3/3] arm64, compiler-context-analysis: Permit alias analysis through __READ_ONCE() with CONFIG_LTO=y
  2026-02-02 15:39   ` Will Deacon
@ 2026-02-02 19:29     ` David Laight
  2026-02-03 11:47       ` Will Deacon
  0 siblings, 1 reply; 37+ messages in thread
From: David Laight @ 2026-02-02 19:29 UTC (permalink / raw)
  To: Will Deacon
  Cc: Marco Elver, Peter Zijlstra, Ingo Molnar, Thomas Gleixner,
	Boqun Feng, Waiman Long, Bart Van Assche, llvm, Catalin Marinas,
	Arnd Bergmann, linux-arm-kernel, linux-kernel, kernel test robot,
	Boqun Feng

On Mon, 2 Feb 2026 15:39:36 +0000
Will Deacon <will@kernel.org> wrote:

> On Fri, Jan 30, 2026 at 02:28:26PM +0100, Marco Elver wrote:
> > When enabling Clang's Context Analysis (aka. Thread Safety Analysis) on
> > kernel/futex/core.o (see Peter's changes at [1]), in arm64 LTO builds we
> > could see:
> > 
> > | kernel/futex/core.c:982:1: warning: spinlock 'atomic ? __u.__val : q->lock_ptr' is still held at the end of function [-Wthread-safety-analysis]
> > |      982 | }
> > |          | ^
> > |    kernel/futex/core.c:976:2: note: spinlock acquired here
> > |      976 |         spin_lock(lock_ptr);
> > |          |         ^
> > | kernel/futex/core.c:982:1: warning: expecting spinlock 'q->lock_ptr' to be held at the end of function [-Wthread-safety-analysis]
> > |      982 | }
> > |          | ^
> > |    kernel/futex/core.c:966:6: note: spinlock acquired here
> > |      966 | void futex_q_lockptr_lock(struct futex_q *q)
> > |          |      ^
> > |    2 warnings generated.
> > 
> > Where we have:
> > 
> > 	extern void futex_q_lockptr_lock(struct futex_q *q) __acquires(q->lock_ptr);
> > 	..
> > 	void futex_q_lockptr_lock(struct futex_q *q)
> > 	{
> > 		spinlock_t *lock_ptr;
> > 
> > 		/*
> > 		 * See futex_unqueue() why lock_ptr can change.
> > 		 */
> > 		guard(rcu)();
> > 	retry:  
> > >>		lock_ptr = READ_ONCE(q->lock_ptr);  
> > 		spin_lock(lock_ptr);
> > 	...
> > 	}
> > 
> > At the time of the above report (prior to removal of the 'atomic' flag),
> > Clang Thread Safety Analysis's alias analysis resolved 'lock_ptr' to
> > 'atomic ?  __u.__val : q->lock_ptr' (now just '__u.__val'), and used
> > this as the identity of the context lock given it cannot "see through"
> > the inline assembly; however, we want 'q->lock_ptr' as the canonical
> > context lock.
> > 
> > While for code generation the compiler simplified to '__u.__val' for
> > pointers (8 byte case -> 'atomic' was set), TSA's analysis (a) happens
> > much earlier on the AST, and (b) would be the wrong deduction.
> > 
> > Now that we've gotten rid of the 'atomic' ternary comparison, we can
> > return '__u.__val' through a pointer that we initialize with '&x', but
> > then update via a pointer-to-pointer. When READ_ONCE()'ing a context
> > lock pointer, TSA's alias analysis does not invalidate the initial alias
> > when updated through the pointer-to-pointer, and we make it effectively
> > "see through" the __READ_ONCE().
> > 
> > Code generation is unchanged.
> > 
> > Link: https://lkml.kernel.org/r/20260121110704.221498346@infradead.org [1]
> > Reported-by: kernel test robot <lkp@intel.com>
> > Closes: https://lore.kernel.org/oe-kbuild-all/202601221040.TeM0ihff-lkp@intel.com/
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Tested-by: Boqun Feng <boqun@kernel.org>
> > Signed-off-by: Marco Elver <elver@google.com>
> > ---
> > v3:
> > * Use 'typeof(*__ret)'.
> > * Commit message.
> > 
> > v2:
> > * Rebase.
> > ---
> >  arch/arm64/include/asm/rwonce.h | 10 +++++++---
> >  1 file changed, 7 insertions(+), 3 deletions(-)
> > 
> > diff --git a/arch/arm64/include/asm/rwonce.h b/arch/arm64/include/asm/rwonce.h
> > index 42c9e8429274..b7de74d4bf07 100644
> > --- a/arch/arm64/include/asm/rwonce.h
> > +++ b/arch/arm64/include/asm/rwonce.h
> > @@ -45,8 +45,12 @@
> >   */
> >  #define __READ_ONCE(x)							\
> >  ({									\
> > -	typeof(&(x)) __x = &(x);					\
> > -	union { __rwonce_typeof_unqual(*__x) __val; char __c[1]; } __u;	\
> > +	auto __x = &(x);						\
> > +	auto __ret = (__rwonce_typeof_unqual(*__x) *)__x;		\
> > +	/* Hides alias reassignment from Clang's -Wthread-safety. */	\
> > +	auto __retp = &__ret;						\
> > +	union { typeof(*__ret) __val; char __c[1]; } __u;		\
> > +	*__retp = &__u.__val;						\
> >  	switch (sizeof(x)) {						\
> >  	case 1:								\
> >  		asm volatile(__LOAD_RCPC(b, %w0, %1)			\
> > @@ -71,7 +75,7 @@
> >  	default:							\
> >  		__u.__val = *(volatile typeof(*__x) *)__x;		\
> >  	}								\
> > -	__u.__val;							\
> > +	*__ret;								\
> >  })  
> 
> What does GCC do with this? :/

GCC currently doesn't see it, LTO is clang only.

	David

> 
> Will



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 3/3] arm64, compiler-context-analysis: Permit alias analysis through __READ_ONCE() with CONFIG_LTO=y
  2026-02-02 19:29     ` David Laight
@ 2026-02-03 11:47       ` Will Deacon
  2026-02-04 10:46         ` Marco Elver
  0 siblings, 1 reply; 37+ messages in thread
From: Will Deacon @ 2026-02-03 11:47 UTC (permalink / raw)
  To: David Laight
  Cc: Marco Elver, Peter Zijlstra, Ingo Molnar, Thomas Gleixner,
	Boqun Feng, Waiman Long, Bart Van Assche, llvm, Catalin Marinas,
	Arnd Bergmann, linux-arm-kernel, linux-kernel, kernel test robot,
	Boqun Feng

On Mon, Feb 02, 2026 at 07:29:23PM +0000, David Laight wrote:
> On Mon, 2 Feb 2026 15:39:36 +0000
> Will Deacon <will@kernel.org> wrote:
> 
> > On Fri, Jan 30, 2026 at 02:28:26PM +0100, Marco Elver wrote:
> > > When enabling Clang's Context Analysis (aka. Thread Safety Analysis) on
> > > kernel/futex/core.o (see Peter's changes at [1]), in arm64 LTO builds we
> > > could see:
> > > 
> > > | kernel/futex/core.c:982:1: warning: spinlock 'atomic ? __u.__val : q->lock_ptr' is still held at the end of function [-Wthread-safety-analysis]
> > > |      982 | }
> > > |          | ^
> > > |    kernel/futex/core.c:976:2: note: spinlock acquired here
> > > |      976 |         spin_lock(lock_ptr);
> > > |          |         ^
> > > | kernel/futex/core.c:982:1: warning: expecting spinlock 'q->lock_ptr' to be held at the end of function [-Wthread-safety-analysis]
> > > |      982 | }
> > > |          | ^
> > > |    kernel/futex/core.c:966:6: note: spinlock acquired here
> > > |      966 | void futex_q_lockptr_lock(struct futex_q *q)
> > > |          |      ^
> > > |    2 warnings generated.
> > > 
> > > Where we have:
> > > 
> > > 	extern void futex_q_lockptr_lock(struct futex_q *q) __acquires(q->lock_ptr);
> > > 	..
> > > 	void futex_q_lockptr_lock(struct futex_q *q)
> > > 	{
> > > 		spinlock_t *lock_ptr;
> > > 
> > > 		/*
> > > 		 * See futex_unqueue() why lock_ptr can change.
> > > 		 */
> > > 		guard(rcu)();
> > > 	retry:  
> > > >>		lock_ptr = READ_ONCE(q->lock_ptr);  
> > > 		spin_lock(lock_ptr);
> > > 	...
> > > 	}
> > > 
> > > At the time of the above report (prior to removal of the 'atomic' flag),
> > > Clang Thread Safety Analysis's alias analysis resolved 'lock_ptr' to
> > > 'atomic ?  __u.__val : q->lock_ptr' (now just '__u.__val'), and used
> > > this as the identity of the context lock given it cannot "see through"
> > > the inline assembly; however, we want 'q->lock_ptr' as the canonical
> > > context lock.
> > > 
> > > While for code generation the compiler simplified to '__u.__val' for
> > > pointers (8 byte case -> 'atomic' was set), TSA's analysis (a) happens
> > > much earlier on the AST, and (b) would be the wrong deduction.
> > > 
> > > Now that we've gotten rid of the 'atomic' ternary comparison, we can
> > > return '__u.__val' through a pointer that we initialize with '&x', but
> > > then update via a pointer-to-pointer. When READ_ONCE()'ing a context
> > > lock pointer, TSA's alias analysis does not invalidate the initial alias
> > > when updated through the pointer-to-pointer, and we make it effectively
> > > "see through" the __READ_ONCE().
> > > 
> > > Code generation is unchanged.
> > > 
> > > Link: https://lkml.kernel.org/r/20260121110704.221498346@infradead.org [1]
> > > Reported-by: kernel test robot <lkp@intel.com>
> > > Closes: https://lore.kernel.org/oe-kbuild-all/202601221040.TeM0ihff-lkp@intel.com/
> > > Cc: Peter Zijlstra <peterz@infradead.org>
> > > Tested-by: Boqun Feng <boqun@kernel.org>
> > > Signed-off-by: Marco Elver <elver@google.com>
> > > ---
> > > v3:
> > > * Use 'typeof(*__ret)'.
> > > * Commit message.
> > > 
> > > v2:
> > > * Rebase.
> > > ---
> > >  arch/arm64/include/asm/rwonce.h | 10 +++++++---
> > >  1 file changed, 7 insertions(+), 3 deletions(-)
> > > 
> > > diff --git a/arch/arm64/include/asm/rwonce.h b/arch/arm64/include/asm/rwonce.h
> > > index 42c9e8429274..b7de74d4bf07 100644
> > > --- a/arch/arm64/include/asm/rwonce.h
> > > +++ b/arch/arm64/include/asm/rwonce.h
> > > @@ -45,8 +45,12 @@
> > >   */
> > >  #define __READ_ONCE(x)							\
> > >  ({									\
> > > -	typeof(&(x)) __x = &(x);					\
> > > -	union { __rwonce_typeof_unqual(*__x) __val; char __c[1]; } __u;	\
> > > +	auto __x = &(x);						\
> > > +	auto __ret = (__rwonce_typeof_unqual(*__x) *)__x;		\
> > > +	/* Hides alias reassignment from Clang's -Wthread-safety. */	\
> > > +	auto __retp = &__ret;						\
> > > +	union { typeof(*__ret) __val; char __c[1]; } __u;		\
> > > +	*__retp = &__u.__val;						\
> > >  	switch (sizeof(x)) {						\
> > >  	case 1:								\
> > >  		asm volatile(__LOAD_RCPC(b, %w0, %1)			\
> > > @@ -71,7 +75,7 @@
> > >  	default:							\
> > >  		__u.__val = *(volatile typeof(*__x) *)__x;		\
> > >  	}								\
> > > -	__u.__val;							\
> > > +	*__ret;								\
> > >  })  
> > 
> > What does GCC do with this? :/
> 
> GCC currently doesn't see it, LTO is clang only.

LTO is just one way that a compiler could end up breaking dependency
chains, so I really want to maintain the option to enable this path for
GCC in case we run into problems caused by other optimisations in future.

Will


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 3/3] arm64, compiler-context-analysis: Permit alias analysis through __READ_ONCE() with CONFIG_LTO=y
  2026-02-03 11:47       ` Will Deacon
@ 2026-02-04 10:46         ` Marco Elver
  2026-02-04 13:14           ` Peter Zijlstra
  0 siblings, 1 reply; 37+ messages in thread
From: Marco Elver @ 2026-02-04 10:46 UTC (permalink / raw)
  To: Will Deacon
  Cc: David Laight, Peter Zijlstra, Ingo Molnar, Thomas Gleixner,
	Boqun Feng, Waiman Long, Bart Van Assche, llvm, Catalin Marinas,
	Arnd Bergmann, linux-arm-kernel, linux-kernel, kernel test robot,
	Boqun Feng

On Tue, 3 Feb 2026 at 12:47, Will Deacon <will@kernel.org> wrote:
[...]
> > > What does GCC do with this? :/
> >
> > GCC currently doesn't see it, LTO is clang only.
>
> LTO is just one way that a compiler could end up breaking dependency
> chains, so I really want to maintain the option to enable this path for
> GCC in case we run into problems caused by other optimisations in future.

It will work for GCC, but only from GCC 11. Before that __auto_type
does not drop qualifiers:
https://godbolt.org/z/sc5bcnzKd (switch to GCC 11 to see it compile)

So to summarize, all supported Clang versions deal with __auto_type
correctly for the fallback; GCC from version 11 does (kernel currently
supports GCC 8 and above). From GCC 14 and Clang 19 we have
__typeof_unqual__.

I really don't see another way forward; there's no other good way to
solve this issue. I would advise against pessimizing new compilers and
features because maybe one day we might still want to enable this
version of READ_ONCE() for GCC 8-10.

Should we one day choose to enable this READ_ONCE() version for GCC,
we will (a) either have bumped the minimum GCC version to 11+, or (b)
we can only do so from GCC 11. At this point GCC 11 was released 5
years ago!

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 3/3] arm64, compiler-context-analysis: Permit alias analysis through __READ_ONCE() with CONFIG_LTO=y
  2026-02-04 10:46         ` Marco Elver
@ 2026-02-04 13:14           ` Peter Zijlstra
  2026-02-04 14:15             ` Will Deacon
  0 siblings, 1 reply; 37+ messages in thread
From: Peter Zijlstra @ 2026-02-04 13:14 UTC (permalink / raw)
  To: Marco Elver
  Cc: Will Deacon, David Laight, Ingo Molnar, Thomas Gleixner,
	Boqun Feng, Waiman Long, Bart Van Assche, llvm, Catalin Marinas,
	Arnd Bergmann, linux-arm-kernel, linux-kernel, kernel test robot,
	Boqun Feng, Linus Torvalds

On Wed, Feb 04, 2026 at 11:46:02AM +0100, Marco Elver wrote:
> On Tue, 3 Feb 2026 at 12:47, Will Deacon <will@kernel.org> wrote:
> [...]
> > > > What does GCC do with this? :/
> > >
> > > GCC currently doesn't see it, LTO is clang only.
> >
> > LTO is just one way that a compiler could end up breaking dependency
> > chains, so I really want to maintain the option to enable this path for
> > GCC in case we run into problems caused by other optimisations in future.
> 
> It will work for GCC, but only from GCC 11. Before that __auto_type
> does not drop qualifiers:
> https://godbolt.org/z/sc5bcnzKd (switch to GCC 11 to see it compile)
> 
> So to summarize, all supported Clang versions deal with __auto_type
> correctly for the fallback; GCC from version 11 does (kernel currently
> supports GCC 8 and above). From GCC 14 and Clang 19 we have
> __typeof_unqual__.
> 
> I really don't see another way forward; there's no other good way to
> solve this issue. I would advise against pessimizing new compilers and
> features because maybe one day we might still want to enable this
> version of READ_ONCE() for GCC 8-10.
> 
> Should we one day choose to enable this READ_ONCE() version for GCC,
> we will (a) either have bumped the minimum GCC version to 11+, or (b)
> we can only do so from GCC 11. At this point GCC 11 was released 5
> years ago!

There is, from this thread:

  https://lkml.kernel.org/r/20260111182010.GH3634291@ZenIV

another trick to strip qualifiers:

  #define unqual_non_array(T) __typeof__(((T(*)(void))0)())

which will work from GCC-8.4 onwards. Arguably, it should be possible to
raise the minimum from 8 to 8.4 (IMO).

But yes; in general I think it is fine to have 'old' compilers generate
suboptimal code.


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 3/3] arm64, compiler-context-analysis: Permit alias analysis through __READ_ONCE() with CONFIG_LTO=y
  2026-02-04 13:14           ` Peter Zijlstra
@ 2026-02-04 14:15             ` Will Deacon
  2026-02-06 15:09               ` Marco Elver
  0 siblings, 1 reply; 37+ messages in thread
From: Will Deacon @ 2026-02-04 14:15 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Marco Elver, David Laight, Ingo Molnar, Thomas Gleixner,
	Boqun Feng, Waiman Long, Bart Van Assche, llvm, Catalin Marinas,
	Arnd Bergmann, linux-arm-kernel, linux-kernel, kernel test robot,
	Boqun Feng, Linus Torvalds

On Wed, Feb 04, 2026 at 02:14:00PM +0100, Peter Zijlstra wrote:
> On Wed, Feb 04, 2026 at 11:46:02AM +0100, Marco Elver wrote:
> > On Tue, 3 Feb 2026 at 12:47, Will Deacon <will@kernel.org> wrote:
> > [...]
> > > > > What does GCC do with this? :/
> > > >
> > > > GCC currently doesn't see it, LTO is clang only.
> > >
> > > LTO is just one way that a compiler could end up breaking dependency
> > > chains, so I really want to maintain the option to enable this path for
> > > GCC in case we run into problems caused by other optimisations in future.
> > 
> > It will work for GCC, but only from GCC 11. Before that __auto_type
> > does not drop qualifiers:
> > https://godbolt.org/z/sc5bcnzKd (switch to GCC 11 to see it compile)
> > 
> > So to summarize, all supported Clang versions deal with __auto_type
> > correctly for the fallback; GCC from version 11 does (kernel currently
> > supports GCC 8 and above). From GCC 14 and Clang 19 we have
> > __typeof_unqual__.
> > 
> > I really don't see another way forward; there's no other good way to
> > solve this issue. I would advise against pessimizing new compilers and
> > features because maybe one day we might still want to enable this
> > version of READ_ONCE() for GCC 8-10.
> > 
> > Should we one day choose to enable this READ_ONCE() version for GCC,
> > we will (a) either have bumped the minimum GCC version to 11+, or (b)
> > we can only do so from GCC 11. At this point GCC 11 was released 5
> > years ago!
> 
> There is, from this thread:
> 
>   https://lkml.kernel.org/r/20260111182010.GH3634291@ZenIV
> 
> another trick to strip qualifiers:
> 
>   #define unqual_non_array(T) __typeof__(((T(*)(void))0)())
> 
> which will work from GCC-8.4 onwards. Arguably, it should be possible to
> raise the minimum from 8 to 8.4 (IMO).

That sounds reasonable to me but I'm not usually the one to push back
on raising the minimum compiler version!

> But yes; in general I think it is fine to have 'old' compilers generate
> suboptimal code.

I'm absolutely fine with the codegen being terrible for ancient
toolchains as long as it's correct.

Will


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 3/3] arm64, compiler-context-analysis: Permit alias analysis through __READ_ONCE() with CONFIG_LTO=y
  2026-02-04 14:15             ` Will Deacon
@ 2026-02-06 15:09               ` Marco Elver
  2026-02-06 18:26                 ` David Laight
  0 siblings, 1 reply; 37+ messages in thread
From: Marco Elver @ 2026-02-06 15:09 UTC (permalink / raw)
  To: Will Deacon
  Cc: Peter Zijlstra, David Laight, Ingo Molnar, Thomas Gleixner,
	Boqun Feng, Waiman Long, Bart Van Assche, llvm, Catalin Marinas,
	Arnd Bergmann, linux-arm-kernel, linux-kernel, kernel test robot,
	Boqun Feng, Linus Torvalds, Al Viro

 On Wed, 4 Feb 2026 at 15:15, Will Deacon <will@kernel.org> wrote:
>
> On Wed, Feb 04, 2026 at 02:14:00PM +0100, Peter Zijlstra wrote:
> > On Wed, Feb 04, 2026 at 11:46:02AM +0100, Marco Elver wrote:
> > > On Tue, 3 Feb 2026 at 12:47, Will Deacon <will@kernel.org> wrote:
> > > [...]
> > > > > > What does GCC do with this? :/
> > > > >
> > > > > GCC currently doesn't see it, LTO is clang only.
> > > >
> > > > LTO is just one way that a compiler could end up breaking dependency
> > > > chains, so I really want to maintain the option to enable this path for
> > > > GCC in case we run into problems caused by other optimisations in future.
> > >
> > > It will work for GCC, but only from GCC 11. Before that __auto_type
> > > does not drop qualifiers:
> > > https://godbolt.org/z/sc5bcnzKd (switch to GCC 11 to see it compile)
> > >
> > > So to summarize, all supported Clang versions deal with __auto_type
> > > correctly for the fallback; GCC from version 11 does (kernel currently
> > > supports GCC 8 and above). From GCC 14 and Clang 19 we have
> > > __typeof_unqual__.
> > >
> > > I really don't see another way forward; there's no other good way to
> > > solve this issue. I would advise against pessimizing new compilers and
> > > features because maybe one day we might still want to enable this
> > > version of READ_ONCE() for GCC 8-10.
> > >
> > > Should we one day choose to enable this READ_ONCE() version for GCC,
> > > we will (a) either have bumped the minimum GCC version to 11+, or (b)
> > > we can only do so from GCC 11. At this point GCC 11 was released 5
> > > years ago!
> >
> > There is, from this thread:
> >
> >   https://lkml.kernel.org/r/20260111182010.GH3634291@ZenIV
> >
> > another trick to strip qualifiers:
> >
> >   #define unqual_non_array(T) __typeof__(((T(*)(void))0)())
> >
> > which will work from GCC-8.4 onwards. Arguably, it should be possible to
> > raise the minimum from 8 to 8.4 (IMO).

That looks like an interesting option.

> That sounds reasonable to me but I'm not usually the one to push back
> on raising the minimum compiler version!
>
> > But yes; in general I think it is fine to have 'old' compilers generate
> > suboptimal code.
>
> I'm absolutely fine with the codegen being terrible for ancient
> toolchains as long as it's correct.

From that discussion a month ago and this one, it seems we need
something to fix __unqual_scalar_typeof().

What's the way forward?

1. Bump minimum GCC version to 8.4. Replace __unqual_scalar_typeof()
for old compilers with the better unqual_non_array hack?

2. Leave __unqual_scalar_typeof() as-is. The patch "compiler: Use
__typeof_unqual__() for __unqual_scalar_typeof()" will fix the codegen
issues for new compilers. Doesn't fix not dropping 'const' for old
compilers for non-scalar types, and requires localized workarounds
(like this patch here).

Either way we need a fix for this arm64 LTO version to fix the
context-analysis "see through" the inline asm (how this patch series
started).

Option #1 needs a lot more due-diligence and testing that it all works
for all compilers and configs (opening Pandora's Box :-)). For option
#2 we just need these patches here to at least fix the acute issue
with this arm64 LTO version.


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 3/3] arm64, compiler-context-analysis: Permit alias analysis through __READ_ONCE() with CONFIG_LTO=y
  2026-02-06 15:09               ` Marco Elver
@ 2026-02-06 18:26                 ` David Laight
  2026-02-15 21:55                   ` Marco Elver
  0 siblings, 1 reply; 37+ messages in thread
From: David Laight @ 2026-02-06 18:26 UTC (permalink / raw)
  To: Marco Elver
  Cc: Will Deacon, Peter Zijlstra, Ingo Molnar, Thomas Gleixner,
	Boqun Feng, Waiman Long, Bart Van Assche, llvm, Catalin Marinas,
	Arnd Bergmann, linux-arm-kernel, linux-kernel, kernel test robot,
	Boqun Feng, Linus Torvalds, Al Viro

On Fri, 6 Feb 2026 16:09:35 +0100
Marco Elver <elver@google.com> wrote:

>  On Wed, 4 Feb 2026 at 15:15, Will Deacon <will@kernel.org> wrote:
> >
> > On Wed, Feb 04, 2026 at 02:14:00PM +0100, Peter Zijlstra wrote:  
> > > On Wed, Feb 04, 2026 at 11:46:02AM +0100, Marco Elver wrote:  
> > > > On Tue, 3 Feb 2026 at 12:47, Will Deacon <will@kernel.org> wrote:
> > > > [...]  
> > > > > > > What does GCC do with this? :/  
> > > > > >
> > > > > > GCC currently doesn't see it, LTO is clang only.  
> > > > >
> > > > > LTO is just one way that a compiler could end up breaking dependency
> > > > > chains, so I really want to maintain the option to enable this path for
> > > > > GCC in case we run into problems caused by other optimisations in future.  
> > > >
> > > > It will work for GCC, but only from GCC 11. Before that __auto_type
> > > > does not drop qualifiers:
> > > > https://godbolt.org/z/sc5bcnzKd (switch to GCC 11 to see it compile)
> > > >
> > > > So to summarize, all supported Clang versions deal with __auto_type
> > > > correctly for the fallback; GCC from version 11 does (kernel currently
> > > > supports GCC 8 and above). From GCC 14 and Clang 19 we have
> > > > __typeof_unqual__.
> > > >
> > > > I really don't see another way forward; there's no other good way to
> > > > solve this issue. I would advise against pessimizing new compilers and
> > > > features because maybe one day we might still want to enable this
> > > > version of READ_ONCE() for GCC 8-10.
> > > >
> > > > Should we one day choose to enable this READ_ONCE() version for GCC,
> > > > we will (a) either have bumped the minimum GCC version to 11+, or (b)
> > > > we can only do so from GCC 11. At this point GCC 11 was released 5
> > > > years ago!  
> > >
> > > There is, from this thread:
> > >
> > >   https://lkml.kernel.org/r/20260111182010.GH3634291@ZenIV
> > >
> > > another trick to strip qualifiers:
> > >
> > >   #define unqual_non_array(T) __typeof__(((T(*)(void))0)())
> > >
> > > which will work from GCC-8.4 onwards. Arguably, it should be possible to
> > > raise the minimum from 8 to 8.4 (IMO).  
> 
> That looks like an interesting option.
> 
> > That sounds reasonable to me but I'm not usually the one to push back
> > on raising the minimum compiler version!
> >  
> > > But yes; in general I think it is fine to have 'old' compilers generate
> > > suboptimal code.  
> >
> > I'm absolutely fine with the codegen being terrible for ancient
> > toolchains as long as it's correct.  
> 
> From that discussion a month ago and this one, it seems we need
> something to fix __unqual_scalar_typeof().
> 
> What's the way forward?
> 
> 1. Bump minimum GCC version to 8.4. Replace __unqual_scalar_typeof()
> for old compilers with the better unqual_non_array hack?
> 
> 2. Leave __unqual_scalar_typeof() as-is. The patch "compiler: Use
> __typeof_unqual__() for __unqual_scalar_typeof()" will fix the codegen
> issues for new compilers. Doesn't fix not dropping 'const' for old
> compilers for non-scalar types, and requires localized workarounds
> (like this patch here).
> 
> Either way we need a fix for this arm64 LTO version to fix the
> context-analysis "see through" the inline asm (how this patch series
> started).
> 
> Option #1 needs a lot more due-diligence and testing that it all works
> for all compilers and configs (opening Pandora's Box :-)). For option
> #2 we just need these patches here to at least fix the acute issue
> with this arm64 LTO version.

Option 3.

Look are where/why they are used and change the code to do it differently.
Don't forget the similar __unsigned_scalar_typeof() in bitfield.h.
(I posted a patch that nuked that one not long ago - used sizeof instead.)

The one in minmax_array (in minmax.h) is particularly pointless.
The value 'suffers' integer promotion as soon as it is used, nothing
wrong with 'auto _x = x + 0' there.
That will work elsewhere.

	David




^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 3/3] arm64, compiler-context-analysis: Permit alias analysis through __READ_ONCE() with CONFIG_LTO=y
  2026-02-06 18:26                 ` David Laight
@ 2026-02-15 21:55                   ` Marco Elver
  2026-02-15 22:16                     ` David Laight
  0 siblings, 1 reply; 37+ messages in thread
From: Marco Elver @ 2026-02-15 21:55 UTC (permalink / raw)
  To: David Laight
  Cc: Will Deacon, Peter Zijlstra, Ingo Molnar, Thomas Gleixner,
	Boqun Feng, Waiman Long, Bart Van Assche, llvm, Catalin Marinas,
	Arnd Bergmann, linux-arm-kernel, linux-kernel, kernel test robot,
	Boqun Feng, Linus Torvalds, Al Viro

On Fri, 6 Feb 2026 at 19:26, David Laight <david.laight.linux@gmail.com> wrote:
> On Fri, 6 Feb 2026 16:09:35 +0100
> Marco Elver <elver@google.com> wrote:
>
> >  On Wed, 4 Feb 2026 at 15:15, Will Deacon <will@kernel.org> wrote:
> > >
> > > On Wed, Feb 04, 2026 at 02:14:00PM +0100, Peter Zijlstra wrote:
> > > > On Wed, Feb 04, 2026 at 11:46:02AM +0100, Marco Elver wrote:
> > > > > On Tue, 3 Feb 2026 at 12:47, Will Deacon <will@kernel.org> wrote:
> > > > > [...]
> > > > > > > > What does GCC do with this? :/
> > > > > > >
> > > > > > > GCC currently doesn't see it, LTO is clang only.
> > > > > >
> > > > > > LTO is just one way that a compiler could end up breaking dependency
> > > > > > chains, so I really want to maintain the option to enable this path for
> > > > > > GCC in case we run into problems caused by other optimisations in future.
> > > > >
> > > > > It will work for GCC, but only from GCC 11. Before that __auto_type
> > > > > does not drop qualifiers:
> > > > > https://godbolt.org/z/sc5bcnzKd (switch to GCC 11 to see it compile)
> > > > >
> > > > > So to summarize, all supported Clang versions deal with __auto_type
> > > > > correctly for the fallback; GCC from version 11 does (kernel currently
> > > > > supports GCC 8 and above). From GCC 14 and Clang 19 we have
> > > > > __typeof_unqual__.
> > > > >
> > > > > I really don't see another way forward; there's no other good way to
> > > > > solve this issue. I would advise against pessimizing new compilers and
> > > > > features because maybe one day we might still want to enable this
> > > > > version of READ_ONCE() for GCC 8-10.
> > > > >
> > > > > Should we one day choose to enable this READ_ONCE() version for GCC,
> > > > > we will (a) either have bumped the minimum GCC version to 11+, or (b)
> > > > > we can only do so from GCC 11. At this point GCC 11 was released 5
> > > > > years ago!
> > > >
> > > > There is, from this thread:
> > > >
> > > >   https://lkml.kernel.org/r/20260111182010.GH3634291@ZenIV
> > > >
> > > > another trick to strip qualifiers:
> > > >
> > > >   #define unqual_non_array(T) __typeof__(((T(*)(void))0)())
> > > >
> > > > which will work from GCC-8.4 onwards. Arguably, it should be possible to
> > > > raise the minimum from 8 to 8.4 (IMO).
> >
> > That looks like an interesting option.
> >
> > > That sounds reasonable to me but I'm not usually the one to push back
> > > on raising the minimum compiler version!
> > >
> > > > But yes; in general I think it is fine to have 'old' compilers generate
> > > > suboptimal code.
> > >
> > > I'm absolutely fine with the codegen being terrible for ancient
> > > toolchains as long as it's correct.
> >
> > From that discussion a month ago and this one, it seems we need
> > something to fix __unqual_scalar_typeof().
> >
> > What's the way forward?
> >
> > 1. Bump minimum GCC version to 8.4. Replace __unqual_scalar_typeof()
> > for old compilers with the better unqual_non_array hack?
> >
> > 2. Leave __unqual_scalar_typeof() as-is. The patch "compiler: Use
> > __typeof_unqual__() for __unqual_scalar_typeof()" will fix the codegen
> > issues for new compilers. Doesn't fix not dropping 'const' for old
> > compilers for non-scalar types, and requires localized workarounds
> > (like this patch here).
> >
> > Either way we need a fix for this arm64 LTO version to fix the
> > context-analysis "see through" the inline asm (how this patch series
> > started).
> >
> > Option #1 needs a lot more due-diligence and testing that it all works
> > for all compilers and configs (opening Pandora's Box :-)). For option
> > #2 we just need these patches here to at least fix the acute issue
> > with this arm64 LTO version.
>
> Option 3.
>
> Look are where/why they are used and change the code to do it differently.
> Don't forget the similar __unsigned_scalar_typeof() in bitfield.h.
> (I posted a patch that nuked that one not long ago - used sizeof instead.)
>
> The one in minmax_array (in minmax.h) is particularly pointless.
> The value 'suffers' integer promotion as soon as it is used, nothing
> wrong with 'auto _x = x + 0' there.
> That will work elsewhere.

Agreed that getting rid of __unqual_scalar_typeof() in favor of 'auto'
where possible is the way to go.

Unfortunately I spent the last week occasionally glancing at this
arm64 READ_ONCE problem, and could not come up with something that
avoids using typeof_unqual() or __unqual_scalar_typeof(). I'm inclined
to go with the unqual_non_array hack, but not make this available as a
macro for general use - we have too many of these horrid macros, don't
want to add more to this hack pile.


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 3/3] arm64, compiler-context-analysis: Permit alias analysis through __READ_ONCE() with CONFIG_LTO=y
  2026-02-15 21:55                   ` Marco Elver
@ 2026-02-15 22:16                     ` David Laight
  2026-02-15 22:43                       ` Marco Elver
  0 siblings, 1 reply; 37+ messages in thread
From: David Laight @ 2026-02-15 22:16 UTC (permalink / raw)
  To: Marco Elver
  Cc: Will Deacon, Peter Zijlstra, Ingo Molnar, Thomas Gleixner,
	Boqun Feng, Waiman Long, Bart Van Assche, llvm, Catalin Marinas,
	Arnd Bergmann, linux-arm-kernel, linux-kernel, kernel test robot,
	Boqun Feng, Linus Torvalds, Al Viro

On Sun, 15 Feb 2026 22:55:44 +0100
Marco Elver <elver@google.com> wrote:

> On Fri, 6 Feb 2026 at 19:26, David Laight <david.laight.linux@gmail.com> wrote:
> > On Fri, 6 Feb 2026 16:09:35 +0100
> > Marco Elver <elver@google.com> wrote:
> >  
> > >  On Wed, 4 Feb 2026 at 15:15, Will Deacon <will@kernel.org> wrote:  
> > > >
> > > > On Wed, Feb 04, 2026 at 02:14:00PM +0100, Peter Zijlstra wrote:  
> > > > > On Wed, Feb 04, 2026 at 11:46:02AM +0100, Marco Elver wrote:  
> > > > > > On Tue, 3 Feb 2026 at 12:47, Will Deacon <will@kernel.org> wrote:
> > > > > > [...]  
> > > > > > > > > What does GCC do with this? :/  
> > > > > > > >
> > > > > > > > GCC currently doesn't see it, LTO is clang only.  
> > > > > > >
> > > > > > > LTO is just one way that a compiler could end up breaking dependency
> > > > > > > chains, so I really want to maintain the option to enable this path for
> > > > > > > GCC in case we run into problems caused by other optimisations in future.  
> > > > > >
> > > > > > It will work for GCC, but only from GCC 11. Before that __auto_type
> > > > > > does not drop qualifiers:
> > > > > > https://godbolt.org/z/sc5bcnzKd (switch to GCC 11 to see it compile)
> > > > > >
> > > > > > So to summarize, all supported Clang versions deal with __auto_type
> > > > > > correctly for the fallback; GCC from version 11 does (kernel currently
> > > > > > supports GCC 8 and above). From GCC 14 and Clang 19 we have
> > > > > > __typeof_unqual__.
> > > > > >
> > > > > > I really don't see another way forward; there's no other good way to
> > > > > > solve this issue. I would advise against pessimizing new compilers and
> > > > > > features because maybe one day we might still want to enable this
> > > > > > version of READ_ONCE() for GCC 8-10.
> > > > > >
> > > > > > Should we one day choose to enable this READ_ONCE() version for GCC,
> > > > > > we will (a) either have bumped the minimum GCC version to 11+, or (b)
> > > > > > we can only do so from GCC 11. At this point GCC 11 was released 5
> > > > > > years ago!  
> > > > >
> > > > > There is, from this thread:
> > > > >
> > > > >   https://lkml.kernel.org/r/20260111182010.GH3634291@ZenIV
> > > > >
> > > > > another trick to strip qualifiers:
> > > > >
> > > > >   #define unqual_non_array(T) __typeof__(((T(*)(void))0)())
> > > > >
> > > > > which will work from GCC-8.4 onwards. Arguably, it should be possible to
> > > > > raise the minimum from 8 to 8.4 (IMO).  
> > >
> > > That looks like an interesting option.
> > >  
> > > > That sounds reasonable to me but I'm not usually the one to push back
> > > > on raising the minimum compiler version!
> > > >  
> > > > > But yes; in general I think it is fine to have 'old' compilers generate
> > > > > suboptimal code.  
> > > >
> > > > I'm absolutely fine with the codegen being terrible for ancient
> > > > toolchains as long as it's correct.  
> > >
> > > From that discussion a month ago and this one, it seems we need
> > > something to fix __unqual_scalar_typeof().
> > >
> > > What's the way forward?
> > >
> > > 1. Bump minimum GCC version to 8.4. Replace __unqual_scalar_typeof()
> > > for old compilers with the better unqual_non_array hack?
> > >
> > > 2. Leave __unqual_scalar_typeof() as-is. The patch "compiler: Use
> > > __typeof_unqual__() for __unqual_scalar_typeof()" will fix the codegen
> > > issues for new compilers. Doesn't fix not dropping 'const' for old
> > > compilers for non-scalar types, and requires localized workarounds
> > > (like this patch here).
> > >
> > > Either way we need a fix for this arm64 LTO version to fix the
> > > context-analysis "see through" the inline asm (how this patch series
> > > started).
> > >
> > > Option #1 needs a lot more due-diligence and testing that it all works
> > > for all compilers and configs (opening Pandora's Box :-)). For option
> > > #2 we just need these patches here to at least fix the acute issue
> > > with this arm64 LTO version.  
> >
> > Option 3.
> >
> > Look are where/why they are used and change the code to do it differently.
> > Don't forget the similar __unsigned_scalar_typeof() in bitfield.h.
> > (I posted a patch that nuked that one not long ago - used sizeof instead.)
> >
> > The one in minmax_array (in minmax.h) is particularly pointless.
> > The value 'suffers' integer promotion as soon as it is used, nothing
> > wrong with 'auto _x = x + 0' there.
> > That will work elsewhere.  
> 
> Agreed that getting rid of __unqual_scalar_typeof() in favor of 'auto'
> where possible is the way to go.
> 
> Unfortunately I spent the last week occasionally glancing at this
> arm64 READ_ONCE problem, and could not come up with something that
> avoids using typeof_unqual() or __unqual_scalar_typeof(). I'm inclined
> to go with the unqual_non_array hack, but not make this available as a
> macro for general use - we have too many of these horrid macros, don't
> want to add more to this hack pile.

Agreed, having to do such things inside what are already horrid 'functions'
is one thing, but when they get used in 'normal' code it is silly.

Have you checked whether sizes other than 1, 2, 4 and 8 are ever used?
There aren't any in an x86-64 allmodconfig build and it used to be an error.
Even if there are handful, having to use a different define wouldn't
really be an issue.
Removing that support would make READ_ONCE() easier to write/understand
and (hopefully) compile faster - there is a measurable cost for the
'size check' in the x86-64 build, the arm LTO expansion must be significant.

	David





^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 3/3] arm64, compiler-context-analysis: Permit alias analysis through __READ_ONCE() with CONFIG_LTO=y
  2026-02-15 22:16                     ` David Laight
@ 2026-02-15 22:43                       ` Marco Elver
  2026-02-15 23:18                         ` David Laight
  2026-02-15 23:40                         ` Linus Torvalds
  0 siblings, 2 replies; 37+ messages in thread
From: Marco Elver @ 2026-02-15 22:43 UTC (permalink / raw)
  To: David Laight
  Cc: Will Deacon, Peter Zijlstra, Ingo Molnar, Thomas Gleixner,
	Boqun Feng, Waiman Long, Bart Van Assche, llvm, Catalin Marinas,
	Arnd Bergmann, linux-arm-kernel, linux-kernel, kernel test robot,
	Boqun Feng, Linus Torvalds, Al Viro

On Sun, 15 Feb 2026 at 23:16, David Laight <david.laight.linux@gmail.com> wrote:
>
> On Sun, 15 Feb 2026 22:55:44 +0100
> Marco Elver <elver@google.com> wrote:
>
> > On Fri, 6 Feb 2026 at 19:26, David Laight <david.laight.linux@gmail.com> wrote:
> > > On Fri, 6 Feb 2026 16:09:35 +0100
> > > Marco Elver <elver@google.com> wrote:
> > >
> > > >  On Wed, 4 Feb 2026 at 15:15, Will Deacon <will@kernel.org> wrote:
> > > > >
> > > > > On Wed, Feb 04, 2026 at 02:14:00PM +0100, Peter Zijlstra wrote:
> > > > > > On Wed, Feb 04, 2026 at 11:46:02AM +0100, Marco Elver wrote:
> > > > > > > On Tue, 3 Feb 2026 at 12:47, Will Deacon <will@kernel.org> wrote:
> > > > > > > [...]
> > > > > > > > > > What does GCC do with this? :/
> > > > > > > > >
> > > > > > > > > GCC currently doesn't see it, LTO is clang only.
> > > > > > > >
> > > > > > > > LTO is just one way that a compiler could end up breaking dependency
> > > > > > > > chains, so I really want to maintain the option to enable this path for
> > > > > > > > GCC in case we run into problems caused by other optimisations in future.
> > > > > > >
> > > > > > > It will work for GCC, but only from GCC 11. Before that __auto_type
> > > > > > > does not drop qualifiers:
> > > > > > > https://godbolt.org/z/sc5bcnzKd (switch to GCC 11 to see it compile)
> > > > > > >
> > > > > > > So to summarize, all supported Clang versions deal with __auto_type
> > > > > > > correctly for the fallback; GCC from version 11 does (kernel currently
> > > > > > > supports GCC 8 and above). From GCC 14 and Clang 19 we have
> > > > > > > __typeof_unqual__.
> > > > > > >
> > > > > > > I really don't see another way forward; there's no other good way to
> > > > > > > solve this issue. I would advise against pessimizing new compilers and
> > > > > > > features because maybe one day we might still want to enable this
> > > > > > > version of READ_ONCE() for GCC 8-10.
> > > > > > >
> > > > > > > Should we one day choose to enable this READ_ONCE() version for GCC,
> > > > > > > we will (a) either have bumped the minimum GCC version to 11+, or (b)
> > > > > > > we can only do so from GCC 11. At this point GCC 11 was released 5
> > > > > > > years ago!
> > > > > >
> > > > > > There is, from this thread:
> > > > > >
> > > > > >   https://lkml.kernel.org/r/20260111182010.GH3634291@ZenIV
> > > > > >
> > > > > > another trick to strip qualifiers:
> > > > > >
> > > > > >   #define unqual_non_array(T) __typeof__(((T(*)(void))0)())
> > > > > >
> > > > > > which will work from GCC-8.4 onwards. Arguably, it should be possible to
> > > > > > raise the minimum from 8 to 8.4 (IMO).
> > > >
> > > > That looks like an interesting option.
> > > >
> > > > > That sounds reasonable to me but I'm not usually the one to push back
> > > > > on raising the minimum compiler version!
> > > > >
> > > > > > But yes; in general I think it is fine to have 'old' compilers generate
> > > > > > suboptimal code.
> > > > >
> > > > > I'm absolutely fine with the codegen being terrible for ancient
> > > > > toolchains as long as it's correct.
> > > >
> > > > From that discussion a month ago and this one, it seems we need
> > > > something to fix __unqual_scalar_typeof().
> > > >
> > > > What's the way forward?
> > > >
> > > > 1. Bump minimum GCC version to 8.4. Replace __unqual_scalar_typeof()
> > > > for old compilers with the better unqual_non_array hack?
> > > >
> > > > 2. Leave __unqual_scalar_typeof() as-is. The patch "compiler: Use
> > > > __typeof_unqual__() for __unqual_scalar_typeof()" will fix the codegen
> > > > issues for new compilers. Doesn't fix not dropping 'const' for old
> > > > compilers for non-scalar types, and requires localized workarounds
> > > > (like this patch here).
> > > >
> > > > Either way we need a fix for this arm64 LTO version to fix the
> > > > context-analysis "see through" the inline asm (how this patch series
> > > > started).
> > > >
> > > > Option #1 needs a lot more due-diligence and testing that it all works
> > > > for all compilers and configs (opening Pandora's Box :-)). For option
> > > > #2 we just need these patches here to at least fix the acute issue
> > > > with this arm64 LTO version.
> > >
> > > Option 3.
> > >
> > > Look are where/why they are used and change the code to do it differently.
> > > Don't forget the similar __unsigned_scalar_typeof() in bitfield.h.
> > > (I posted a patch that nuked that one not long ago - used sizeof instead.)
> > >
> > > The one in minmax_array (in minmax.h) is particularly pointless.
> > > The value 'suffers' integer promotion as soon as it is used, nothing
> > > wrong with 'auto _x = x + 0' there.
> > > That will work elsewhere.
> >
> > Agreed that getting rid of __unqual_scalar_typeof() in favor of 'auto'
> > where possible is the way to go.
> >
> > Unfortunately I spent the last week occasionally glancing at this
> > arm64 READ_ONCE problem, and could not come up with something that
> > avoids using typeof_unqual() or __unqual_scalar_typeof(). I'm inclined
> > to go with the unqual_non_array hack, but not make this available as a
> > macro for general use - we have too many of these horrid macros, don't
> > want to add more to this hack pile.
>
> Agreed, having to do such things inside what are already horrid 'functions'
> is one thing, but when they get used in 'normal' code it is silly.
>
> Have you checked whether sizes other than 1, 2, 4 and 8 are ever used?
> There aren't any in an x86-64 allmodconfig build and it used to be an error.
> Even if there are handful, having to use a different define wouldn't
> really be an issue.
> Removing that support would make READ_ONCE() easier to write/understand
> and (hopefully) compile faster - there is a measurable cost for the
> 'size check' in the x86-64 build, the arm LTO expansion must be significant.

I found e.g. xen_get_runstate_snapshot_cpu_delta() uses the >8 byte
case via __READ_ONCE(). READ_ONCE() itself is already restricted to <=
8 bytes (due to that static assert), but that itself uses the
__READ_ONCE() helper which these patches were touching.

We could invert the game: have READ_ONCE() which just deals with <= 8
bytes. And __READ_ONCE() which uses READ_ONCE() if <= 8 bytes, and the
non-atomic case if >8 bytes. However, I fear the static size check
won't go away because the asm-generic version of __READ_ONCE() happily
works on any type (it's just a volatile cast+deref) - I don't know how
we'd enforce the size limit otherwise.


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 3/3] arm64, compiler-context-analysis: Permit alias analysis through __READ_ONCE() with CONFIG_LTO=y
  2026-02-15 22:43                       ` Marco Elver
@ 2026-02-15 23:18                         ` David Laight
  2026-02-15 23:40                         ` Linus Torvalds
  1 sibling, 0 replies; 37+ messages in thread
From: David Laight @ 2026-02-15 23:18 UTC (permalink / raw)
  To: Marco Elver
  Cc: Will Deacon, Peter Zijlstra, Ingo Molnar, Thomas Gleixner,
	Boqun Feng, Waiman Long, Bart Van Assche, llvm, Catalin Marinas,
	Arnd Bergmann, linux-arm-kernel, linux-kernel, kernel test robot,
	Boqun Feng, Linus Torvalds, Al Viro

On Sun, 15 Feb 2026 23:43:23 +0100
Marco Elver <elver@google.com> wrote:

> On Sun, 15 Feb 2026 at 23:16, David Laight <david.laight.linux@gmail.com> wrote:
> >
> > On Sun, 15 Feb 2026 22:55:44 +0100
> > Marco Elver <elver@google.com> wrote:
> >  
> > > On Fri, 6 Feb 2026 at 19:26, David Laight <david.laight.linux@gmail.com> wrote:  
> > > > On Fri, 6 Feb 2026 16:09:35 +0100
> > > > Marco Elver <elver@google.com> wrote:
> > > >  
> > > > >  On Wed, 4 Feb 2026 at 15:15, Will Deacon <will@kernel.org> wrote:  
> > > > > >
> > > > > > On Wed, Feb 04, 2026 at 02:14:00PM +0100, Peter Zijlstra wrote:  
> > > > > > > On Wed, Feb 04, 2026 at 11:46:02AM +0100, Marco Elver wrote:  
> > > > > > > > On Tue, 3 Feb 2026 at 12:47, Will Deacon <will@kernel.org> wrote:
> > > > > > > > [...]  
> > > > > > > > > > > What does GCC do with this? :/  
> > > > > > > > > >
> > > > > > > > > > GCC currently doesn't see it, LTO is clang only.  
> > > > > > > > >
> > > > > > > > > LTO is just one way that a compiler could end up breaking dependency
> > > > > > > > > chains, so I really want to maintain the option to enable this path for
> > > > > > > > > GCC in case we run into problems caused by other optimisations in future.  
> > > > > > > >
> > > > > > > > It will work for GCC, but only from GCC 11. Before that __auto_type
> > > > > > > > does not drop qualifiers:
> > > > > > > > https://godbolt.org/z/sc5bcnzKd (switch to GCC 11 to see it compile)
> > > > > > > >
> > > > > > > > So to summarize, all supported Clang versions deal with __auto_type
> > > > > > > > correctly for the fallback; GCC from version 11 does (kernel currently
> > > > > > > > supports GCC 8 and above). From GCC 14 and Clang 19 we have
> > > > > > > > __typeof_unqual__.
> > > > > > > >
> > > > > > > > I really don't see another way forward; there's no other good way to
> > > > > > > > solve this issue. I would advise against pessimizing new compilers and
> > > > > > > > features because maybe one day we might still want to enable this
> > > > > > > > version of READ_ONCE() for GCC 8-10.
> > > > > > > >
> > > > > > > > Should we one day choose to enable this READ_ONCE() version for GCC,
> > > > > > > > we will (a) either have bumped the minimum GCC version to 11+, or (b)
> > > > > > > > we can only do so from GCC 11. At this point GCC 11 was released 5
> > > > > > > > years ago!  
> > > > > > >
> > > > > > > There is, from this thread:
> > > > > > >
> > > > > > >   https://lkml.kernel.org/r/20260111182010.GH3634291@ZenIV
> > > > > > >
> > > > > > > another trick to strip qualifiers:
> > > > > > >
> > > > > > >   #define unqual_non_array(T) __typeof__(((T(*)(void))0)())
> > > > > > >
> > > > > > > which will work from GCC-8.4 onwards. Arguably, it should be possible to
> > > > > > > raise the minimum from 8 to 8.4 (IMO).  
> > > > >
> > > > > That looks like an interesting option.
> > > > >  
> > > > > > That sounds reasonable to me but I'm not usually the one to push back
> > > > > > on raising the minimum compiler version!
> > > > > >  
> > > > > > > But yes; in general I think it is fine to have 'old' compilers generate
> > > > > > > suboptimal code.  
> > > > > >
> > > > > > I'm absolutely fine with the codegen being terrible for ancient
> > > > > > toolchains as long as it's correct.  
> > > > >
> > > > > From that discussion a month ago and this one, it seems we need
> > > > > something to fix __unqual_scalar_typeof().
> > > > >
> > > > > What's the way forward?
> > > > >
> > > > > 1. Bump minimum GCC version to 8.4. Replace __unqual_scalar_typeof()
> > > > > for old compilers with the better unqual_non_array hack?
> > > > >
> > > > > 2. Leave __unqual_scalar_typeof() as-is. The patch "compiler: Use
> > > > > __typeof_unqual__() for __unqual_scalar_typeof()" will fix the codegen
> > > > > issues for new compilers. Doesn't fix not dropping 'const' for old
> > > > > compilers for non-scalar types, and requires localized workarounds
> > > > > (like this patch here).
> > > > >
> > > > > Either way we need a fix for this arm64 LTO version to fix the
> > > > > context-analysis "see through" the inline asm (how this patch series
> > > > > started).
> > > > >
> > > > > Option #1 needs a lot more due-diligence and testing that it all works
> > > > > for all compilers and configs (opening Pandora's Box :-)). For option
> > > > > #2 we just need these patches here to at least fix the acute issue
> > > > > with this arm64 LTO version.  
> > > >
> > > > Option 3.
> > > >
> > > > Look are where/why they are used and change the code to do it differently.
> > > > Don't forget the similar __unsigned_scalar_typeof() in bitfield.h.
> > > > (I posted a patch that nuked that one not long ago - used sizeof instead.)
> > > >
> > > > The one in minmax_array (in minmax.h) is particularly pointless.
> > > > The value 'suffers' integer promotion as soon as it is used, nothing
> > > > wrong with 'auto _x = x + 0' there.
> > > > That will work elsewhere.  
> > >
> > > Agreed that getting rid of __unqual_scalar_typeof() in favor of 'auto'
> > > where possible is the way to go.
> > >
> > > Unfortunately I spent the last week occasionally glancing at this
> > > arm64 READ_ONCE problem, and could not come up with something that
> > > avoids using typeof_unqual() or __unqual_scalar_typeof(). I'm inclined
> > > to go with the unqual_non_array hack, but not make this available as a
> > > macro for general use - we have too many of these horrid macros, don't
> > > want to add more to this hack pile.  
> >
> > Agreed, having to do such things inside what are already horrid 'functions'
> > is one thing, but when they get used in 'normal' code it is silly.
> >
> > Have you checked whether sizes other than 1, 2, 4 and 8 are ever used?
> > There aren't any in an x86-64 allmodconfig build and it used to be an error.
> > Even if there are handful, having to use a different define wouldn't
> > really be an issue.
> > Removing that support would make READ_ONCE() easier to write/understand
> > and (hopefully) compile faster - there is a measurable cost for the
> > 'size check' in the x86-64 build, the arm LTO expansion must be significant.  
> 
> I found e.g. xen_get_runstate_snapshot_cpu_delta() uses the >8 byte
> case via __READ_ONCE(). READ_ONCE() itself is already restricted to <=
> 8 bytes (due to that static assert), but that itself uses the
> __READ_ONCE() helper which these patches were touching.

One thing that might reduce the cost of that static_assert is to move
the error_function out of it - defining that in every expansion can't help.
A few places do that, but it really needs a helper - say:
#define compiletime_assert_fn(fn, msg) \
	__noreturn extern void fn(void) __compiletime_assert(msg) 

> 
> We could invert the game: have READ_ONCE() which just deals with <= 8
> bytes. And __READ_ONCE() which uses READ_ONCE() if <= 8 bytes, and the
> non-atomic case if >8 bytes. However, I fear the static size check
> won't go away because the asm-generic version of __READ_ONCE() happily
> works on any type (it's just a volatile cast+deref) - I don't know how
> we'd enforce the size limit otherwise.

That should probably be a NON_ATOMIC_READ_ONCE() that doesn't 'fall-back'.

	David




^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 3/3] arm64, compiler-context-analysis: Permit alias analysis through __READ_ONCE() with CONFIG_LTO=y
  2026-02-15 22:43                       ` Marco Elver
  2026-02-15 23:18                         ` David Laight
@ 2026-02-15 23:40                         ` Linus Torvalds
  2026-02-16 11:09                           ` David Laight
  1 sibling, 1 reply; 37+ messages in thread
From: Linus Torvalds @ 2026-02-15 23:40 UTC (permalink / raw)
  To: Marco Elver
  Cc: David Laight, Will Deacon, Peter Zijlstra, Ingo Molnar,
	Thomas Gleixner, Boqun Feng, Waiman Long, Bart Van Assche, llvm,
	Catalin Marinas, Arnd Bergmann, linux-arm-kernel, linux-kernel,
	kernel test robot, Boqun Feng, Al Viro

On Sun, 15 Feb 2026 at 14:44, Marco Elver <elver@google.com> wrote:
>
> I found e.g. xen_get_runstate_snapshot_cpu_delta() uses the >8 byte
> case via __READ_ONCE(). READ_ONCE() itself is already restricted to <=
> 8 bytes (due to that static assert), but that itself uses the
> __READ_ONCE() helper which these patches were touching.

I think we could just make __READ_ONCE() be totally unchecked - just
make it be "const volatile typeof()" and leave it at that.

Anybody who uses __READ_ONCE() would then have to deal with it.

There are very few users of that left, I think it's mainly
arch_atomic_read(), which just doesn't want any of the checking that a
regular "READ_ONCE()" does.

In fact, there are *so* few users left that I think we could probably
just remove __READ_ONCE() entirely, and make our "atomic_t" use
"volatile" in the type itself.

I generally absolutely hate volatile on data structures - it's a
design mistake (an understandable one: it was at least partly designed
for memory mapped IO accesses), and the volatile should be in the
accesses, not the data, because very often the volatility of the data
depends on context, not on the data.

But our "atomic_t" is already properly wrapped, and nobody should be
accessing it with anything but our helpers, so putting the volatile
there looks ok.

             Linus

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 3/3] arm64, compiler-context-analysis: Permit alias analysis through __READ_ONCE() with CONFIG_LTO=y
  2026-02-15 23:40                         ` Linus Torvalds
@ 2026-02-16 11:09                           ` David Laight
  2026-02-16 15:32                             ` Linus Torvalds
  0 siblings, 1 reply; 37+ messages in thread
From: David Laight @ 2026-02-16 11:09 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Marco Elver, Will Deacon, Peter Zijlstra, Ingo Molnar,
	Thomas Gleixner, Boqun Feng, Waiman Long, Bart Van Assche, llvm,
	Catalin Marinas, Arnd Bergmann, linux-arm-kernel, linux-kernel,
	kernel test robot, Boqun Feng, Al Viro

On Sun, 15 Feb 2026 15:40:07 -0800
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Sun, 15 Feb 2026 at 14:44, Marco Elver <elver@google.com> wrote:
> >
> > I found e.g. xen_get_runstate_snapshot_cpu_delta() uses the >8 byte
> > case via __READ_ONCE(). READ_ONCE() itself is already restricted to <=
> > 8 bytes (due to that static assert), but that itself uses the
> > __READ_ONCE() helper which these patches were touching.  
> 
> I think we could just make __READ_ONCE() be totally unchecked - just
> make it be "const volatile typeof()" and leave it at that.
> 
> Anybody who uses __READ_ONCE() would then have to deal with it.
> 
> There are very few users of that left, I think it's mainly
> arch_atomic_read(), which just doesn't want any of the checking that a
> regular "READ_ONCE()" does.
> 
> In fact, there are *so* few users left that I think we could probably
> just remove __READ_ONCE() entirely, and make our "atomic_t" use
> "volatile" in the type itself.
> 
> I generally absolutely hate volatile on data structures - it's a
> design mistake (an understandable one: it was at least partly designed
> for memory mapped IO accesses), and the volatile should be in the
> accesses, not the data, because very often the volatility of the data
> depends on context, not on the data.

IIRC The bots are now bleating when there is a READ_ONCE() in one
place but a write without a matching WRITE_ONCE().
That is effectively forcing all the accesses to be volatile regardless
of the context.
Which is pretty much equivalent to making the structure members volatile.
So if all you care about is 'inverted CSE' and read/write tearing
then volatile structure members DTRT.
What you probably don't want is 'volatile struct foo *'.

volatile structure members are almost free, you lose CSE and some versions
of gcc 'forget' that a load zero/sign extended a byte and do it again.
(I had to use barrier() rather than volatile in some code where I really
did care about single instructions.)

I've never see gcc reload a local, but I have seen it do CSE then
spill the value to stack.

	David


> 
> But our "atomic_t" is already properly wrapped, and nobody should be
> accessing it with anything but our helpers, so putting the volatile
> there looks ok.
> 
>              Linus
> 



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 3/3] arm64, compiler-context-analysis: Permit alias analysis through __READ_ONCE() with CONFIG_LTO=y
  2026-02-16 11:09                           ` David Laight
@ 2026-02-16 15:32                             ` Linus Torvalds
  2026-02-16 17:43                               ` David Laight
  0 siblings, 1 reply; 37+ messages in thread
From: Linus Torvalds @ 2026-02-16 15:32 UTC (permalink / raw)
  To: David Laight
  Cc: Marco Elver, Will Deacon, Peter Zijlstra, Ingo Molnar,
	Thomas Gleixner, Boqun Feng, Waiman Long, Bart Van Assche, llvm,
	Catalin Marinas, Arnd Bergmann, linux-arm-kernel, linux-kernel,
	kernel test robot, Boqun Feng, Al Viro

On Mon, 16 Feb 2026 at 03:09, David Laight <david.laight.linux@gmail.com> wrote:
>
> volatile structure members are almost free

No, gcc does absolutely horrible things with volatiles. It disables a
lot of very basic stuff.

Try doing something as simple as a "var++" on a volatile, and cry.

We need to have explicit 'READ_ONCE()' etc atomic accesses, just to
make it very very clear that the compiler will generate shit code.

We do *not* hide them and make them implicit by marking data
structures volatile. I very much want those explicit
READ_ONCE/WRITE_ONCE cases.

           Linus

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 3/3] arm64, compiler-context-analysis: Permit alias analysis through __READ_ONCE() with CONFIG_LTO=y
  2026-02-16 15:32                             ` Linus Torvalds
@ 2026-02-16 17:43                               ` David Laight
  2026-02-17 12:16                                 ` Peter Zijlstra
  2026-02-17 16:23                                 ` Linus Torvalds
  0 siblings, 2 replies; 37+ messages in thread
From: David Laight @ 2026-02-16 17:43 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Marco Elver, Will Deacon, Peter Zijlstra, Ingo Molnar,
	Thomas Gleixner, Boqun Feng, Waiman Long, Bart Van Assche, llvm,
	Catalin Marinas, Arnd Bergmann, linux-arm-kernel, linux-kernel,
	kernel test robot, Boqun Feng, Al Viro

On Mon, 16 Feb 2026 07:32:53 -0800
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Mon, 16 Feb 2026 at 03:09, David Laight <david.laight.linux@gmail.com> wrote:
> >
> > volatile structure members are almost free  
> 
> No, gcc does absolutely horrible things with volatiles. It disables a
> lot of very basic stuff.
> 
> Try doing something as simple as a "var++" on a volatile, and cry.

On x86 I just see a load, inc, store - not that surprising really.
(clang did do 'inc memory'.)

It's not as though 'inc memory' is atomic (without a lock prefix).

Also var++ will be 3 u-ops the same as the read, inc, write so the
underlying execution is much the same (ok you might save on the
address generation and the compiler doesn't have to find a register name,
but I don't remember anything modern being limited by instruction retirement).
You might save a bit of I-cache.

I've an idea that gcc converts var++ to 'var = var + 1' early on
and then might manage to convert it back later.
A good way of running out of registers,

> 
> We need to have explicit 'READ_ONCE()' etc atomic accesses, just to
> make it very very clear that the compiler will generate shit code.
> 
> We do *not* hide them and make them implicit by marking data
> structures volatile. I very much want those explicit
> READ_ONCE/WRITE_ONCE cases.

I guess you count as the boss :-)

	David

> 
>            Linus



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 3/3] arm64, compiler-context-analysis: Permit alias analysis through __READ_ONCE() with CONFIG_LTO=y
  2026-02-16 17:43                               ` David Laight
@ 2026-02-17 12:16                                 ` Peter Zijlstra
  2026-02-17 14:25                                   ` David Laight
  2026-02-17 16:23                                 ` Linus Torvalds
  1 sibling, 1 reply; 37+ messages in thread
From: Peter Zijlstra @ 2026-02-17 12:16 UTC (permalink / raw)
  To: David Laight
  Cc: Linus Torvalds, Marco Elver, Will Deacon, Ingo Molnar,
	Thomas Gleixner, Boqun Feng, Waiman Long, Bart Van Assche, llvm,
	Catalin Marinas, Arnd Bergmann, linux-arm-kernel, linux-kernel,
	kernel test robot, Boqun Feng, Al Viro

On Mon, Feb 16, 2026 at 05:43:24PM +0000, David Laight wrote:
> On Mon, 16 Feb 2026 07:32:53 -0800
> Linus Torvalds <torvalds@linux-foundation.org> wrote:
> 
> > On Mon, 16 Feb 2026 at 03:09, David Laight <david.laight.linux@gmail.com> wrote:
> > >
> > > volatile structure members are almost free  
> > 
> > No, gcc does absolutely horrible things with volatiles. It disables a
> > lot of very basic stuff.
> > 
> > Try doing something as simple as a "var++" on a volatile, and cry.
> 
> On x86 I just see a load, inc, store - not that surprising really.
> (clang did do 'inc memory'.)
> 
> It's not as though 'inc memory' is atomic (without a lock prefix).
> 
> Also var++ will be 3 u-ops the same as the read, inc, write so the
> underlying execution is much the same (ok you might save on the
> address generation and the compiler doesn't have to find a register name,
> but I don't remember anything modern being limited by instruction retirement).
> You might save a bit of I-cache.

Interrupts can tell the difference, and that matters.


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 3/3] arm64, compiler-context-analysis: Permit alias analysis through __READ_ONCE() with CONFIG_LTO=y
  2026-02-17 12:16                                 ` Peter Zijlstra
@ 2026-02-17 14:25                                   ` David Laight
  0 siblings, 0 replies; 37+ messages in thread
From: David Laight @ 2026-02-17 14:25 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linus Torvalds, Marco Elver, Will Deacon, Ingo Molnar,
	Thomas Gleixner, Boqun Feng, Waiman Long, Bart Van Assche, llvm,
	Catalin Marinas, Arnd Bergmann, linux-arm-kernel, linux-kernel,
	kernel test robot, Boqun Feng, Al Viro

On Tue, 17 Feb 2026 13:16:19 +0100
Peter Zijlstra <peterz@infradead.org> wrote:

> On Mon, Feb 16, 2026 at 05:43:24PM +0000, David Laight wrote:
> > On Mon, 16 Feb 2026 07:32:53 -0800
> > Linus Torvalds <torvalds@linux-foundation.org> wrote:
> >   
> > > On Mon, 16 Feb 2026 at 03:09, David Laight <david.laight.linux@gmail.com> wrote:  
> > > >
> > > > volatile structure members are almost free    
> > > 
> > > No, gcc does absolutely horrible things with volatiles. It disables a
> > > lot of very basic stuff.
> > > 
> > > Try doing something as simple as a "var++" on a volatile, and cry.  
> > 
> > On x86 I just see a load, inc, store - not that surprising really.
> > (clang did do 'inc memory'.)
> > 
> > It's not as though 'inc memory' is atomic (without a lock prefix).
> > 
> > Also var++ will be 3 u-ops the same as the read, inc, write so the
> > underlying execution is much the same (ok you might save on the
> > address generation and the compiler doesn't have to find a register name,
> > but I don't remember anything modern being limited by instruction retirement).
> > You might save a bit of I-cache.  
> 
> Interrupts can tell the difference, and that matters.

Just makes it the same as every other architecture.

The only way to guarantee an 'add to memory' instruction is to use an
asm statement.

	David


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 3/3] arm64, compiler-context-analysis: Permit alias analysis through __READ_ONCE() with CONFIG_LTO=y
  2026-02-16 17:43                               ` David Laight
  2026-02-17 12:16                                 ` Peter Zijlstra
@ 2026-02-17 16:23                                 ` Linus Torvalds
  2026-02-17 16:32                                   ` Linus Torvalds
  1 sibling, 1 reply; 37+ messages in thread
From: Linus Torvalds @ 2026-02-17 16:23 UTC (permalink / raw)
  To: David Laight
  Cc: Marco Elver, Will Deacon, Peter Zijlstra, Ingo Molnar,
	Thomas Gleixner, Boqun Feng, Waiman Long, Bart Van Assche, llvm,
	Catalin Marinas, Arnd Bergmann, linux-arm-kernel, linux-kernel,
	kernel test robot, Boqun Feng, Al Viro

On Mon, 16 Feb 2026 at 09:43, David Laight <david.laight.linux@gmail.com> wrote:
>
> >
> > Try doing something as simple as a "var++" on a volatile, and cry.
>
> On x86 I just see a load, inc, store - not that surprising really.
> (clang did do 'inc memory'.)
>
> It's not as though 'inc memory' is atomic (without a lock prefix).

That's not my point. My point is that it makes for absolutely
disgusting - and pointless - code generation.

That load + inc + store is a sign of the compiler missing truly basic
optimizations because "volatile" is so badly designed.

The thing is, we typically even *want* a single load. We actually want
not only to have basic optimizations that don't even change the
semantics - we typically even want CSE.

So we want basic optimizations and combining loads. The main reason to
use READ_ONCE() is actually a worry about compilers doing even worse
things, namely rematerialization or memory accesses - something that
compilers don't even do, because it's a bad idea, but people still
worry because they are _allowed_ to do it and who knows when something
silly happens.

So I want that READ_ONCE(), not "volatile" on data structures, because
*some* day we can rely on more modern things and compilers will
actually get it right if we do it as

 #define READ_ONCE(ptr) __atomic_load_n(ptr, __ATOMIC_RELAXED)

or similar.

But last time I looked at it - which was admittedly a few years ago -
the compilers we supported didn't actually do anything reasonable here
(ie the built-in atomics were fundamentally worse than the ones we do
by hand, and even basic things like __atomic_load_n() weren't
actually; better than just using 'volatile'.

Maybe that has changed. We've upgraded minimum compilers since.

             Linus

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 3/3] arm64, compiler-context-analysis: Permit alias analysis through __READ_ONCE() with CONFIG_LTO=y
  2026-02-17 16:23                                 ` Linus Torvalds
@ 2026-02-17 16:32                                   ` Linus Torvalds
  2026-02-18 19:34                                     ` Boqun Feng
  2026-02-19 15:21                                     ` Gary Guo
  0 siblings, 2 replies; 37+ messages in thread
From: Linus Torvalds @ 2026-02-17 16:32 UTC (permalink / raw)
  To: David Laight
  Cc: Marco Elver, Will Deacon, Peter Zijlstra, Ingo Molnar,
	Thomas Gleixner, Boqun Feng, Waiman Long, Bart Van Assche, llvm,
	Catalin Marinas, Arnd Bergmann, linux-arm-kernel, linux-kernel,
	kernel test robot, Boqun Feng, Al Viro

On Tue, 17 Feb 2026 at 08:23, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> But last time I looked at it - which was admittedly a few years ago -
> the compilers we supported didn't actually do anything reasonable here
> (ie the built-in atomics were fundamentally worse than the ones we do
> by hand, and even basic things like __atomic_load_n() weren't
> actually; better than just using 'volatile'.
>
> Maybe that has changed. We've upgraded minimum compilers since.

I  just checked a few test-cases, and I don't think anything has changed.

All the trivial things where __atomic_load_n(__ATOMIC_RELAXED) *could*
do better than just our old code using "cast pointer to volatile"
resulted in no better code generation.

So we're sticking to that "cast to volatile" not because it's great,
but because it continues to be reliable and no worse than the fancy
modern versions.

But at least it's explicit in the code, not hidden away in code that
*looks* like it just dereferences another field in a structure.

               Linus


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 3/3] arm64, compiler-context-analysis: Permit alias analysis through __READ_ONCE() with CONFIG_LTO=y
  2026-02-17 16:32                                   ` Linus Torvalds
@ 2026-02-18 19:34                                     ` Boqun Feng
  2026-02-18 20:18                                       ` Linus Torvalds
  2026-02-19 15:21                                     ` Gary Guo
  1 sibling, 1 reply; 37+ messages in thread
From: Boqun Feng @ 2026-02-18 19:34 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: David Laight, Marco Elver, Will Deacon, Peter Zijlstra,
	Ingo Molnar, Thomas Gleixner, Boqun Feng, Waiman Long,
	Bart Van Assche, llvm, Catalin Marinas, Arnd Bergmann,
	linux-arm-kernel, linux-kernel, kernel test robot, Al Viro,
	Alice Ryhl, Gary Guo

On Tue, Feb 17, 2026 at 08:32:32AM -0800, Linus Torvalds wrote:
> On Tue, 17 Feb 2026 at 08:23, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > But last time I looked at it - which was admittedly a few years ago -
> > the compilers we supported didn't actually do anything reasonable here
> > (ie the built-in atomics were fundamentally worse than the ones we do
> > by hand, and even basic things like __atomic_load_n() weren't
> > actually; better than just using 'volatile'.
> >
> > Maybe that has changed. We've upgraded minimum compilers since.
> 
> I  just checked a few test-cases, and I don't think anything has changed.
> 
> All the trivial things where __atomic_load_n(__ATOMIC_RELAXED) *could*
> do better than just our old code using "cast pointer to volatile"
> resulted in no better code generation.
> 

By any chance, could you share the test cases? It'll be really helpful
to represent what semantics we really want to compiler/language folks.

[Cc Alice and Gary]

Regards,
Boqun

> So we're sticking to that "cast to volatile" not because it's great,
> but because it continues to be reliable and no worse than the fancy
> modern versions.
> 
> But at least it's explicit in the code, not hidden away in code that
> *looks* like it just dereferences another field in a structure.
> 
>                Linus


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 3/3] arm64, compiler-context-analysis: Permit alias analysis through __READ_ONCE() with CONFIG_LTO=y
  2026-02-18 19:34                                     ` Boqun Feng
@ 2026-02-18 20:18                                       ` Linus Torvalds
  0 siblings, 0 replies; 37+ messages in thread
From: Linus Torvalds @ 2026-02-18 20:18 UTC (permalink / raw)
  To: Boqun Feng
  Cc: David Laight, Marco Elver, Will Deacon, Peter Zijlstra,
	Ingo Molnar, Thomas Gleixner, Boqun Feng, Waiman Long,
	Bart Van Assche, llvm, Catalin Marinas, Arnd Bergmann,
	linux-arm-kernel, linux-kernel, kernel test robot, Al Viro,
	Alice Ryhl, Gary Guo

On Wed, 18 Feb 2026 at 11:34, Boqun Feng <boqun@kernel.org> wrote:
>
> > I  just checked a few test-cases, and I don't think anything has changed.
> >
> > All the trivial things where __atomic_load_n(__ATOMIC_RELAXED) *could*
> > do better than just our old code using "cast pointer to volatile"
> > resulted in no better code generation.
> >
>
> By any chance, could you share the test cases? It'll be really helpful
> to represent what semantics we really want to compiler/language folks.

Lol, it was just a one-liner test where I compared a volatile pointer
with a regular pointer and a __atomic_load_n(), I didn't even save it.
It was checking two different bits in the same word.

It was something trivial like "(*ptr & 1) && (*ptr & 2)".

With a "volatile", that obviously has to give two loads, because
that's the required behavior of "the load is visible as an IO".

But what I want from __atomic_load_n() is that it does all the
optimizations that *hardware* is allowed to do on a load.  Loads are
not "visible as an IO", but they have ordering requirements, and they
cannot be torn.

That means that basic CSE is obviously allowed, so loading the pointer
value twice results in the compiler just doing a single load.

But it has a subtler meaning: while CSE is allowed, a load has to be
done at least once. That means that you can combine them, but you
can't hoist them outside loops and mke them go away entirely inside
the loop.

So my basic test-case is literally just check for that trivial case.
And the atomics fail that completely. Which makes them entirely
pointless. They don't generate any better code than "volatile" does.

And if they don't generate better code, then they are worthless
complexity and garbage.

And yes, that "check two different bits" actually comes from real use,
ie it comes from my annoyance with "test_bit()" kind of semantics.

[ Time passes ]

Bah. Based on the above, I just recreated a trivial test-case of what
I think *should* work. Here:

    #ifdef BAD
    #define ACCESS(p) (*(p))
    #elif defined(VOLATILE)
    #define ACCESS(p) (*(volatile unsigned long *)(p))
    #else
    #define ACCESS(p) __atomic_load_n(p,__ATOMIC_RELAXED)
    #endif

    static inline bool test_bit(int n, unsigned long *p)
    {
        return (1ul << n) & ACCESS(p);
    }

    int myfn(unsigned long *a)
    {
        while (!test_bit(3, a))
                /* Busy loop doing nothing*/ ;
        return test_bit(1, a) && test_bit(2,a);
    }

and then just do something like this:

    gcc -O2 -c -DVOLATILE t.c && objdump --disassemble --no-show-raw-insn t.o

to see what gcc generates.

The -DVOLATILE case works:

0000000000000000 <myfn>:
   0: mov    (%rdi),%rax
   3: test   $0x8,%al
   5: je     0 <myfn>
   7: mov    (%rdi),%rdx
   a: xor    %eax,%eax
   c: and    $0x2,%edx
   f: je     1b <myfn+0x1b>
  11: mov    (%rdi),%rax
  14: shr    $0x2,%rax
  18: and    $0x1,%eax
  1b: ret

Look, it loops until bit three is set, then it ands bits 1 and two,
but it has up to three loads to do this all.

With just plain loads (the -DBAD case), you get this:

0000000000000000 <myfn>:
   0: mov    (%rdi),%rax
   3: test   $0x8,%al
   5: je     13 <myfn+0x13>
   7: not    %rax
   a: test   $0x6,%al
   c: sete   %al
   f: movzbl %al,%eax
  12: ret
  13: jmp    13 <myfn+0x13>

which just results in an endless loop if bit three wasn't set, because
the atomic load was hoisted outside the loop. Plus there was no
guarantee that you couldn't have TOCTOU races (which this example
doesn't show). So that BAD case really is unacceptable, because it's
actively buggy.

And the __atomic_load_n() case generates pretty much the same thing as volatile:

0000000000000000 <myfn>:
   0: mov    (%rdi),%rax
   3: test   $0x8,%al
   5: je     0 <myfn>
   7: mov    (%rdi),%rdx
   a: xor    %eax,%eax
   c: and    $0x2,%edx
   f: jne    18 <myfn+0x18>
  11: ret
  12: nopw   0x0(%rax,%rax,1)
  18: mov    (%rdi),%rax
  1b: shr    $0x2,%rax
  1f: and    $0x1,%eax
  22: ret

except it's actually randomly worse for some unfathomable reason.

So tell me: why should anybody ever use that "modern" thing that
generates worse code?

What I *WANT* is something like this:

0000000000000030 <myfn>:
  30: mov    (%rdi),%rax
  33: test   $0x8,%al
  35: je     30 <myfn2>
  37: not    %rax
  3a: test   $0x6,%al
  3c: sete   %al
  3f: movzbl %al,%eax
  42: ret

and I can make the compiler do that by just doing the CSE myself and
not using a test_bit() helper.

But until the atomics do this *trivial* case right, there is
absolutely no way that we'll use them in the kernel.

And no, clang does no better than gcc. It's possible that the C
standards have specified the semantics of atomics badly and compilers
are forced to generate crap - I wouldn't be surprised.

But more likely it's that nobody cares, and compilers generate crap
because the atomics are complicated enough that it's not worth the
pain to do anything better.

But as long as nobody cares about those code generations issues,
"volatile" is simply superior.

It is standard and portable, doesn't rely on modern compilers, and it
generates equivalent or better code.

I feel like I'm not asking for much. I'm literally asking for an
__atomic_load_n() that is better than volatile

Right now it is WORSE.

              Linus

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 3/3] arm64, compiler-context-analysis: Permit alias analysis through __READ_ONCE() with CONFIG_LTO=y
  2026-02-17 16:32                                   ` Linus Torvalds
  2026-02-18 19:34                                     ` Boqun Feng
@ 2026-02-19 15:21                                     ` Gary Guo
  2026-02-19 18:36                                       ` Linus Torvalds
  1 sibling, 1 reply; 37+ messages in thread
From: Gary Guo @ 2026-02-19 15:21 UTC (permalink / raw)
  To: Linus Torvalds, David Laight
  Cc: Marco Elver, Will Deacon, Peter Zijlstra, Ingo Molnar,
	Thomas Gleixner, Boqun Feng, Waiman Long, Bart Van Assche, llvm,
	Catalin Marinas, Arnd Bergmann, linux-arm-kernel, linux-kernel,
	kernel test robot, Boqun Feng, Al Viro

On Tue Feb 17, 2026 at 4:32 PM GMT, Linus Torvalds wrote:
> On Tue, 17 Feb 2026 at 08:23, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
>>
>> But last time I looked at it - which was admittedly a few years ago -
>> the compilers we supported didn't actually do anything reasonable here
>> (ie the built-in atomics were fundamentally worse than the ones we do
>> by hand, and even basic things like __atomic_load_n() weren't
>> actually; better than just using 'volatile'.
>>
>> Maybe that has changed. We've upgraded minimum compilers since.
>
> I  just checked a few test-cases, and I don't think anything has changed.
>
> All the trivial things where __atomic_load_n(__ATOMIC_RELAXED) *could*
> do better than just our old code using "cast pointer to volatile"
> resulted in no better code generation.

I think compilers are *very* conservative in implementing any optimization
related to atomics, to the extent that normal atomics load/stores are
effectively volatile atomic load/stores and thus the optimization you are
referring to here does not get performed.

However there is some limited optimizations being done to fences, at least I
know in LLVM multiple fences do get merged.

E.g. if you have 

    atomic_thread_fence(memory_order_acquire);
    atomic_thread_fence(memory_order_acq_rel);

the first fence is abosrbed into the second.

>
> So we're sticking to that "cast to volatile" not because it's great,
> but because it continues to be reliable and no worse than the fancy
> modern versions.
>
> But at least it's explicit in the code, not hidden away in code that
> *looks* like it just dereferences another field in a structure.

The implicit load/store is arguably the biggest design failure of C11 atomics.
However we can still typedef like atomic_t or use "cast to _Atomic". But as you
said, currently this does not really yield any benefits over "cast to volatile".

Best,
Gary



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 3/3] arm64, compiler-context-analysis: Permit alias analysis through __READ_ONCE() with CONFIG_LTO=y
  2026-02-19 15:21                                     ` Gary Guo
@ 2026-02-19 18:36                                       ` Linus Torvalds
  0 siblings, 0 replies; 37+ messages in thread
From: Linus Torvalds @ 2026-02-19 18:36 UTC (permalink / raw)
  To: Gary Guo
  Cc: David Laight, Marco Elver, Will Deacon, Peter Zijlstra,
	Ingo Molnar, Thomas Gleixner, Boqun Feng, Waiman Long,
	Bart Van Assche, llvm, Catalin Marinas, Arnd Bergmann,
	linux-arm-kernel, linux-kernel, kernel test robot, Boqun Feng,
	Al Viro

On Thu, 19 Feb 2026 at 07:21, Gary Guo <gary@garyguo.net> wrote:
>
> However there is some limited optimizations being done to fences, at least I
> know in LLVM multiple fences do get merged.

I don't think we have any real use for that, in that we typically have
no real need for combining fences.

Yes, we have hacks for "I did an atomic op, so now I'm using a special
fence that turns into a nop on x86 because the atomic already acted as
a fence" (ie "smp_mb__after_atomic()" etc). So we have some pain
points, but it's pain points we've dealt with.

Having the compiler reliably deal with that would have been great - 25
years ago.

But we have the code, and we have to use our own architecture-specific
atomics anyway because the compiler built-ins aren't reliable or good
enough.

The thing is, inline asm is *reliable*. It's reliable exactly because
it isn't a thousand different cases that the compiler has to get
right. It's *one* case - admittedly a complicated one - that the
compiler has to get right, but then you can do pretty much anything.

And having seen the process - and read the language - of the C memory
ordering standard, I do not believe anybody can ever understand that
thing. Memory ordering is very subtle and hard to understand even when
well documented. The C standard documentation for it is definitely not
helping.

I'm not surprised that compilers don't do better at atomics, because
if I was a compiler writer, I'd have long since given up trying to
figure out the meaning of the standards language and just gone "yeah,
it's an atomic, I won't do anything fancy".

(Which is obviously exactly what happens to volatile too, but at least
there you can *understand* the rules, at least unless you got confused
by the C++ horrors, at which point you throw your hands up again).

             Linus

^ permalink raw reply	[flat|nested] 37+ messages in thread

end of thread, other threads:[~2026-02-19 18:36 UTC | newest]

Thread overview: 37+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-01-30 13:28 [PATCH v3 0/3] arm64: Fixes for __READ_ONCE() with CONFIG_LTO=y Marco Elver
2026-01-30 13:28 ` [PATCH v3 1/3] arm64: Fix non-atomic " Marco Elver
2026-01-30 15:06   ` David Laight
2026-01-30 13:28 ` [PATCH v3 2/3] arm64: Optimize " Marco Elver
2026-01-30 15:11   ` David Laight
2026-02-02 15:36   ` Will Deacon
2026-02-02 16:01     ` Peter Zijlstra
2026-02-02 16:05       ` Will Deacon
2026-02-02 17:48         ` Marco Elver
2026-02-02 19:28     ` David Laight
2026-01-30 13:28 ` [PATCH v3 3/3] arm64, compiler-context-analysis: Permit alias analysis through " Marco Elver
2026-01-30 15:13   ` David Laight
2026-02-02 15:39   ` Will Deacon
2026-02-02 19:29     ` David Laight
2026-02-03 11:47       ` Will Deacon
2026-02-04 10:46         ` Marco Elver
2026-02-04 13:14           ` Peter Zijlstra
2026-02-04 14:15             ` Will Deacon
2026-02-06 15:09               ` Marco Elver
2026-02-06 18:26                 ` David Laight
2026-02-15 21:55                   ` Marco Elver
2026-02-15 22:16                     ` David Laight
2026-02-15 22:43                       ` Marco Elver
2026-02-15 23:18                         ` David Laight
2026-02-15 23:40                         ` Linus Torvalds
2026-02-16 11:09                           ` David Laight
2026-02-16 15:32                             ` Linus Torvalds
2026-02-16 17:43                               ` David Laight
2026-02-17 12:16                                 ` Peter Zijlstra
2026-02-17 14:25                                   ` David Laight
2026-02-17 16:23                                 ` Linus Torvalds
2026-02-17 16:32                                   ` Linus Torvalds
2026-02-18 19:34                                     ` Boqun Feng
2026-02-18 20:18                                       ` Linus Torvalds
2026-02-19 15:21                                     ` Gary Guo
2026-02-19 18:36                                       ` Linus Torvalds
2026-02-02 19:13 ` [PATCH v3 0/3] arm64: Fixes for " Will Deacon

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox