* [RFC PATCH v4 1/4] compiler.h: Introduce ptr_eq() to preserve address dependency
[not found] <20251218014531.3793471-1-mathieu.desnoyers@efficios.com>
@ 2025-12-18 1:45 ` Mathieu Desnoyers
2025-12-18 9:03 ` David Laight
2025-12-18 1:45 ` [RFC PATCH v4 2/4] Documentation: RCU: Refer to ptr_eq() Mathieu Desnoyers
1 sibling, 1 reply; 7+ messages in thread
From: Mathieu Desnoyers @ 2025-12-18 1:45 UTC (permalink / raw)
To: Boqun Feng, Joel Fernandes, Paul E. McKenney
Cc: linux-kernel, Mathieu Desnoyers, Nicholas Piggin,
Michael Ellerman, Greg Kroah-Hartman, Sebastian Andrzej Siewior,
Will Deacon, Peter Zijlstra, Alan Stern, John Stultz,
Neeraj Upadhyay, Linus Torvalds, Andrew Morton,
Frederic Weisbecker, Josh Triplett, Uladzislau Rezki,
Steven Rostedt, Lai Jiangshan, Zqiang, Ingo Molnar, Waiman Long,
Mark Rutland, Thomas Gleixner, Vlastimil Babka, maged.michael,
Mateusz Guzik, Jonas Oberhauser, rcu, linux-mm, lkmm, Gary Guo,
Nikita Popov, llvm
Compiler CSE and SSA GVN optimizations can cause the address dependency
of addresses returned by rcu_dereference to be lost when comparing those
pointers with either constants or previously loaded pointers.
Introduce ptr_eq() to compare two addresses while preserving the address
dependencies for later use of the address. It should be used when
comparing an address returned by rcu_dereference().
This is needed to prevent the compiler CSE and SSA GVN optimizations
from using @a (or @b) in places where the source refers to @b (or @a)
based on the fact that after the comparison, the two are known to be
equal, which does not preserve address dependencies and allows the
following misordering speculations:
- If @b is a constant, the compiler can issue the loads which depend
on @a before loading @a.
- If @b is a register populated by a prior load, weakly-ordered
CPUs can speculate loads which depend on @a before loading @a.
The same logic applies with @a and @b swapped.
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Suggested-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Reviewed-by: Boqun Feng <boqun.feng@gmail.com>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Tested-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Acked-by: "Paul E. McKenney" <paulmck@kernel.org>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: John Stultz <jstultz@google.com>
Cc: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Uladzislau Rezki <urezki@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Zqiang <qiang.zhang1211@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Waiman Long <longman@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: maged.michael@gmail.com
Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc: Gary Guo <gary@garyguo.net>
Cc: Jonas Oberhauser <jonas.oberhauser@huaweicloud.com>
Cc: rcu@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: lkmm@lists.linux.dev
Cc: Nikita Popov <github@npopov.com>
Cc: llvm@lists.linux.dev
---
Changes since v0:
- Include feedback from Alan Stern.
---
include/linux/compiler.h | 63 ++++++++++++++++++++++++++++++++++++++++
1 file changed, 63 insertions(+)
diff --git a/include/linux/compiler.h b/include/linux/compiler.h
index 5b45ea7dff3e..c5ca3b54c112 100644
--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -163,6 +163,69 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val,
__asm__ ("" : "=r" (var) : "0" (var))
#endif
+/*
+ * Compare two addresses while preserving the address dependencies for
+ * later use of the address. It should be used when comparing an address
+ * returned by rcu_dereference().
+ *
+ * This is needed to prevent the compiler CSE and SSA GVN optimizations
+ * from using @a (or @b) in places where the source refers to @b (or @a)
+ * based on the fact that after the comparison, the two are known to be
+ * equal, which does not preserve address dependencies and allows the
+ * following misordering speculations:
+ *
+ * - If @b is a constant, the compiler can issue the loads which depend
+ * on @a before loading @a.
+ * - If @b is a register populated by a prior load, weakly-ordered
+ * CPUs can speculate loads which depend on @a before loading @a.
+ *
+ * The same logic applies with @a and @b swapped.
+ *
+ * Return value: true if pointers are equal, false otherwise.
+ *
+ * The compiler barrier() is ineffective at fixing this issue. It does
+ * not prevent the compiler CSE from losing the address dependency:
+ *
+ * int fct_2_volatile_barriers(void)
+ * {
+ * int *a, *b;
+ *
+ * do {
+ * a = READ_ONCE(p);
+ * asm volatile ("" : : : "memory");
+ * b = READ_ONCE(p);
+ * } while (a != b);
+ * asm volatile ("" : : : "memory"); <-- barrier()
+ * return *b;
+ * }
+ *
+ * With gcc 14.2 (arm64):
+ *
+ * fct_2_volatile_barriers:
+ * adrp x0, .LANCHOR0
+ * add x0, x0, :lo12:.LANCHOR0
+ * .L2:
+ * ldr x1, [x0] <-- x1 populated by first load.
+ * ldr x2, [x0]
+ * cmp x1, x2
+ * bne .L2
+ * ldr w0, [x1] <-- x1 is used for access which should depend on b.
+ * ret
+ *
+ * On weakly-ordered architectures, this lets CPU speculation use the
+ * result from the first load to speculate "ldr w0, [x1]" before
+ * "ldr x2, [x0]".
+ * Based on the RCU documentation, the control dependency does not
+ * prevent the CPU from speculating loads.
+ */
+static __always_inline
+int ptr_eq(const volatile void *a, const volatile void *b)
+{
+ OPTIMIZER_HIDE_VAR(a);
+ OPTIMIZER_HIDE_VAR(b);
+ return a == b;
+}
+
#define __UNIQUE_ID(prefix) __PASTE(__PASTE(__UNIQUE_ID_, prefix), __COUNTER__)
/**
--
2.39.5
* [RFC PATCH v4 2/4] Documentation: RCU: Refer to ptr_eq()
[not found] <20251218014531.3793471-1-mathieu.desnoyers@efficios.com>
2025-12-18 1:45 ` [RFC PATCH v4 1/4] compiler.h: Introduce ptr_eq() to preserve address dependency Mathieu Desnoyers
@ 2025-12-18 1:45 ` Mathieu Desnoyers
1 sibling, 0 replies; 7+ messages in thread
From: Mathieu Desnoyers @ 2025-12-18 1:45 UTC (permalink / raw)
To: Boqun Feng, Joel Fernandes, Paul E. McKenney
Cc: linux-kernel, Mathieu Desnoyers, Nicholas Piggin,
Michael Ellerman, Greg Kroah-Hartman, Sebastian Andrzej Siewior,
Will Deacon, Peter Zijlstra, Alan Stern, John Stultz,
Neeraj Upadhyay, Linus Torvalds, Andrew Morton,
Frederic Weisbecker, Josh Triplett, Uladzislau Rezki,
Steven Rostedt, Lai Jiangshan, Zqiang, Ingo Molnar, Waiman Long,
Mark Rutland, Thomas Gleixner, Vlastimil Babka, maged.michael,
Mateusz Guzik, Jonas Oberhauser, rcu, linux-mm, lkmm, Gary Guo,
Nikita Popov, llvm
Refer to ptr_eq() in the rcu_dereference() documentation.
ptr_eq() is a mechanism that preserves address dependencies when
comparing pointers, and should be favored when comparing a pointer
obtained from rcu_dereference() against another pointer.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Acked-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: John Stultz <jstultz@google.com>
Cc: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Uladzislau Rezki <urezki@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Zqiang <qiang.zhang1211@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Waiman Long <longman@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: maged.michael@gmail.com
Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc: Gary Guo <gary@garyguo.net>
Cc: Jonas Oberhauser <jonas.oberhauser@huaweicloud.com>
Cc: rcu@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: lkmm@lists.linux.dev
Cc: Nikita Popov <github@npopov.com>
Cc: llvm@lists.linux.dev
---
Changes since v1:
- Include feedback from Paul E. McKenney.
Changes since v0:
- Include feedback from Alan Stern.
---
Documentation/RCU/rcu_dereference.rst | 38 +++++++++++++++++++++++----
1 file changed, 33 insertions(+), 5 deletions(-)
diff --git a/Documentation/RCU/rcu_dereference.rst b/Documentation/RCU/rcu_dereference.rst
index 2524dcdadde2..de6175bf430f 100644
--- a/Documentation/RCU/rcu_dereference.rst
+++ b/Documentation/RCU/rcu_dereference.rst
@@ -104,11 +104,12 @@ readers working properly:
after such branches, but can speculate loads, which can again
result in misordering bugs.
-- Be very careful about comparing pointers obtained from
- rcu_dereference() against non-NULL values. As Linus Torvalds
- explained, if the two pointers are equal, the compiler could
- substitute the pointer you are comparing against for the pointer
- obtained from rcu_dereference(). For example::
+- Use operations that preserve address dependencies (such as
+ "ptr_eq()") to compare pointers obtained from rcu_dereference()
+ against non-NULL pointers. As Linus Torvalds explained, if the
+ two pointers are equal, the compiler could substitute the
+ pointer you are comparing against for the pointer obtained from
+ rcu_dereference(). For example::
p = rcu_dereference(gp);
if (p == &default_struct)
@@ -125,6 +126,29 @@ readers working properly:
On ARM and Power hardware, the load from "default_struct.a"
can now be speculated, such that it might happen before the
rcu_dereference(). This could result in bugs due to misordering.
+ Performing the comparison with "ptr_eq()" ensures the compiler
+ does not perform such transformation.
+
+ If the comparison is against another pointer, the compiler is
+ allowed to use either pointer for the following accesses, which
+ loses the address dependency and allows weakly-ordered
+ architectures such as ARM and PowerPC to speculate the
+ address-dependent load before rcu_dereference(). For example::
+
+ p1 = READ_ONCE(gp);
+ p2 = rcu_dereference(gp);
+ if (p1 == p2) /* BUGGY!!! */
+ do_default(p2->a);
+
+ The compiler can use p1->a rather than p2->a, destroying the
+ address dependency. Performing the comparison with "ptr_eq()"
+ ensures the compiler preserves the address dependencies.
+ Corrected code::
+
+ p1 = READ_ONCE(gp);
+ p2 = rcu_dereference(gp);
+ if (ptr_eq(p1, p2))
+ do_default(p2->a);
However, comparisons are OK in the following cases:
@@ -204,6 +228,10 @@ readers working properly:
comparison will provide exactly the information that the
compiler needs to deduce the value of the pointer.
+ When in doubt, use operations that preserve address dependencies
+ (such as "ptr_eq()") to compare pointers obtained from
+ rcu_dereference() against non-NULL pointers.
+
- Disable any value-speculation optimizations that your compiler
might provide, especially if you are making use of feedback-based
optimizations that take data collected from prior runs. Such
--
2.39.5
* Re: [RFC PATCH v4 1/4] compiler.h: Introduce ptr_eq() to preserve address dependency
2025-12-18 1:45 ` [RFC PATCH v4 1/4] compiler.h: Introduce ptr_eq() to preserve address dependency Mathieu Desnoyers
@ 2025-12-18 9:03 ` David Laight
2025-12-18 13:51 ` Mathieu Desnoyers
2025-12-18 14:27 ` Gary Guo
0 siblings, 2 replies; 7+ messages in thread
From: David Laight @ 2025-12-18 9:03 UTC (permalink / raw)
To: Mathieu Desnoyers
Cc: Boqun Feng, Joel Fernandes, Paul E. McKenney, linux-kernel,
Nicholas Piggin, Michael Ellerman, Greg Kroah-Hartman,
Sebastian Andrzej Siewior, Will Deacon, Peter Zijlstra,
Alan Stern, John Stultz, Neeraj Upadhyay, Linus Torvalds,
Andrew Morton, Frederic Weisbecker, Josh Triplett,
Uladzislau Rezki, Steven Rostedt, Lai Jiangshan, Zqiang,
Ingo Molnar, Waiman Long, Mark Rutland, Thomas Gleixner,
Vlastimil Babka, maged.michael, Mateusz Guzik, Jonas Oberhauser,
rcu, linux-mm, lkmm, Gary Guo, Nikita Popov, llvm
On Wed, 17 Dec 2025 20:45:28 -0500
Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:
> Compiler CSE and SSA GVN optimizations can cause the address dependency
> of addresses returned by rcu_dereference to be lost when comparing those
> pointers with either constants or previously loaded pointers.
>
> Introduce ptr_eq() to compare two addresses while preserving the address
> dependencies for later use of the address. It should be used when
> comparing an address returned by rcu_dereference().
>
> This is needed to prevent the compiler CSE and SSA GVN optimizations
> from using @a (or @b) in places where the source refers to @b (or @a)
> based on the fact that after the comparison, the two are known to be
> equal, which does not preserve address dependencies and allows the
> following misordering speculations:
>
> - If @b is a constant, the compiler can issue the loads which depend
> on @a before loading @a.
> - If @b is a register populated by a prior load, weakly-ordered
> CPUs can speculate loads which depend on @a before loading @a.
>
> The same logic applies with @a and @b swapped.
>
> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
> Suggested-by: Boqun Feng <boqun.feng@gmail.com>
> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> Reviewed-by: Boqun Feng <boqun.feng@gmail.com>
> Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> Tested-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> Acked-by: "Paul E. McKenney" <paulmck@kernel.org>
> Acked-by: Alan Stern <stern@rowland.harvard.edu>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> Cc: "Paul E. McKenney" <paulmck@kernel.org>
> Cc: Will Deacon <will@kernel.org>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Boqun Feng <boqun.feng@gmail.com>
> Cc: Alan Stern <stern@rowland.harvard.edu>
> Cc: John Stultz <jstultz@google.com>
> Cc: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com>
> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: Boqun Feng <boqun.feng@gmail.com>
> Cc: Frederic Weisbecker <frederic@kernel.org>
> Cc: Joel Fernandes <joel@joelfernandes.org>
> Cc: Josh Triplett <josh@joshtriplett.org>
> Cc: Uladzislau Rezki <urezki@gmail.com>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Lai Jiangshan <jiangshanlai@gmail.com>
> Cc: Zqiang <qiang.zhang1211@gmail.com>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Waiman Long <longman@redhat.com>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Cc: maged.michael@gmail.com
> Cc: Mateusz Guzik <mjguzik@gmail.com>
> Cc: Gary Guo <gary@garyguo.net>
> Cc: Jonas Oberhauser <jonas.oberhauser@huaweicloud.com>
> Cc: rcu@vger.kernel.org
> Cc: linux-mm@kvack.org
> Cc: lkmm@lists.linux.dev
> Cc: Nikita Popov <github@npopov.com>
> Cc: llvm@lists.linux.dev
> ---
> Changes since v0:
> - Include feedback from Alan Stern.
> ---
> include/linux/compiler.h | 63 ++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 63 insertions(+)
>
> diff --git a/include/linux/compiler.h b/include/linux/compiler.h
> index 5b45ea7dff3e..c5ca3b54c112 100644
> --- a/include/linux/compiler.h
> +++ b/include/linux/compiler.h
> @@ -163,6 +163,69 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val,
> __asm__ ("" : "=r" (var) : "0" (var))
> #endif
>
> +/*
> + * Compare two addresses while preserving the address dependencies for
> + * later use of the address. It should be used when comparing an address
> + * returned by rcu_dereference().
> + *
> + * This is needed to prevent the compiler CSE and SSA GVN optimizations
> + * from using @a (or @b) in places where the source refers to @b (or @a)
> + * based on the fact that after the comparison, the two are known to be
> + * equal, which does not preserve address dependencies and allows the
> + * following misordering speculations:
> + *
> + * - If @b is a constant, the compiler can issue the loads which depend
> + * on @a before loading @a.
> + * - If @b is a register populated by a prior load, weakly-ordered
> + * CPUs can speculate loads which depend on @a before loading @a.
> + *
> + * The same logic applies with @a and @b swapped.
> + *
> + * Return value: true if pointers are equal, false otherwise.
> + *
> + * The compiler barrier() is ineffective at fixing this issue. It does
> + * not prevent the compiler CSE from losing the address dependency:
> + *
> + * int fct_2_volatile_barriers(void)
> + * {
> + * int *a, *b;
> + *
> + * do {
> + * a = READ_ONCE(p);
> + * asm volatile ("" : : : "memory");
> + * b = READ_ONCE(p);
> + * } while (a != b);
> + * asm volatile ("" : : : "memory"); <-- barrier()
> + * return *b;
> + * }
> + *
> + * With gcc 14.2 (arm64):
> + *
> + * fct_2_volatile_barriers:
> + * adrp x0, .LANCHOR0
> + * add x0, x0, :lo12:.LANCHOR0
> + * .L2:
> + * ldr x1, [x0] <-- x1 populated by first load.
> + * ldr x2, [x0]
> + * cmp x1, x2
> + * bne .L2
> + * ldr w0, [x1] <-- x1 is used for access which should depend on b.
> + * ret
> + *
> + * On weakly-ordered architectures, this lets CPU speculation use the
> + * result from the first load to speculate "ldr w0, [x1]" before
> + * "ldr x2, [x0]".
> + * Based on the RCU documentation, the control dependency does not
> + * prevent the CPU from speculating loads.
I'm not sure that example (of something that doesn't work) is really necessary.
The simple example of, given:
return a == b ? *a : 0;
the generated code might speculatively dereference 'b' (not a) before returning
zero when the pointers are different.
David
> + */
> +static __always_inline
> +int ptr_eq(const volatile void *a, const volatile void *b)
> +{
> + OPTIMIZER_HIDE_VAR(a);
> + OPTIMIZER_HIDE_VAR(b);
> + return a == b;
> +}
> +
> #define __UNIQUE_ID(prefix) __PASTE(__PASTE(__UNIQUE_ID_, prefix), __COUNTER__)
>
> /**
* Re: [RFC PATCH v4 1/4] compiler.h: Introduce ptr_eq() to preserve address dependency
2025-12-18 9:03 ` David Laight
@ 2025-12-18 13:51 ` Mathieu Desnoyers
2025-12-18 15:54 ` David Laight
2025-12-18 14:27 ` Gary Guo
1 sibling, 1 reply; 7+ messages in thread
From: Mathieu Desnoyers @ 2025-12-18 13:51 UTC (permalink / raw)
To: David Laight
Cc: Boqun Feng, Joel Fernandes, Paul E. McKenney, linux-kernel,
Nicholas Piggin, Michael Ellerman, Greg Kroah-Hartman,
Sebastian Andrzej Siewior, Will Deacon, Peter Zijlstra,
Alan Stern, John Stultz, Neeraj Upadhyay, Linus Torvalds,
Andrew Morton, Frederic Weisbecker, Josh Triplett,
Uladzislau Rezki, Steven Rostedt, Lai Jiangshan, Zqiang,
Ingo Molnar, Waiman Long, Mark Rutland, Thomas Gleixner,
Vlastimil Babka, maged.michael, Mateusz Guzik, Jonas Oberhauser,
rcu, linux-mm, lkmm, Gary Guo, Nikita Popov, llvm
On 2025-12-18 04:03, David Laight wrote:
[...]
>> + *
>> + * The compiler barrier() is ineffective at fixing this issue. It does
>> + * not prevent the compiler CSE from losing the address dependency:
>> + *
>> + * int fct_2_volatile_barriers(void)
>> + * {
>> + * int *a, *b;
>> + *
>> + * do {
>> + * a = READ_ONCE(p);
>> + * asm volatile ("" : : : "memory");
>> + * b = READ_ONCE(p);
>> + * } while (a != b);
>> + * asm volatile ("" : : : "memory"); <-- barrier()
>> + * return *b;
>> + * }
>> + *
>> + * With gcc 14.2 (arm64):
>> + *
>> + * fct_2_volatile_barriers:
>> + * adrp x0, .LANCHOR0
>> + * add x0, x0, :lo12:.LANCHOR0
>> + * .L2:
>> + * ldr x1, [x0] <-- x1 populated by first load.
>> + * ldr x2, [x0]
>> + * cmp x1, x2
>> + * bne .L2
>> + * ldr w0, [x1] <-- x1 is used for access which should depend on b.
>> + * ret
>> + *
>> + * On weakly-ordered architectures, this lets CPU speculation use the
>> + * result from the first load to speculate "ldr w0, [x1]" before
>> + * "ldr x2, [x0]".
>> + * Based on the RCU documentation, the control dependency does not
>> + * prevent the CPU from speculating loads.
>
> I'm not sure that example (of something that doesn't work) is really necessary.
> The simple example of, given:
> return a == b ? *a : 0;
> the generated code might speculatively dereference 'b' (not a) before returning
> zero when the pointers are different.
In the past discussion that led to this new API, AFAIU, Linus made it
clear that this counter example needs to be in a comment:
https://lore.kernel.org/lkml/CAHk-=wgBgh5U+dyNaN=+XCdcm2OmgSRbcH4Vbtk8i5ZDGwStSA@mail.gmail.com/
This counter-example is what convinced him that this addresses a real
issue.
Thanks,
Mathieu
--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com
* Re: [RFC PATCH v4 1/4] compiler.h: Introduce ptr_eq() to preserve address dependency
2025-12-18 9:03 ` David Laight
2025-12-18 13:51 ` Mathieu Desnoyers
@ 2025-12-18 14:27 ` Gary Guo
2025-12-18 16:12 ` David Laight
1 sibling, 1 reply; 7+ messages in thread
From: Gary Guo @ 2025-12-18 14:27 UTC (permalink / raw)
To: David Laight
Cc: Mathieu Desnoyers, Boqun Feng, Joel Fernandes, Paul E. McKenney,
linux-kernel, Nicholas Piggin, Michael Ellerman,
Greg Kroah-Hartman, Sebastian Andrzej Siewior, Will Deacon,
Peter Zijlstra, Alan Stern, John Stultz, Neeraj Upadhyay,
Linus Torvalds, Andrew Morton, Frederic Weisbecker, Josh Triplett,
Uladzislau Rezki, Steven Rostedt, Lai Jiangshan, Zqiang,
Ingo Molnar, Waiman Long, Mark Rutland, Thomas Gleixner,
Vlastimil Babka, maged.michael, Mateusz Guzik, Jonas Oberhauser,
rcu, linux-mm, lkmm, Nikita Popov, llvm
On Thu, 18 Dec 2025 09:03:13 +0000
David Laight <david.laight.linux@gmail.com> wrote:
> On Wed, 17 Dec 2025 20:45:28 -0500
> Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:
>
> > diff --git a/include/linux/compiler.h b/include/linux/compiler.h
> > index 5b45ea7dff3e..c5ca3b54c112 100644
> > --- a/include/linux/compiler.h
> > +++ b/include/linux/compiler.h
> > @@ -163,6 +163,69 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val,
> > __asm__ ("" : "=r" (var) : "0" (var))
> > #endif
> >
> > +/*
> > + * Compare two addresses while preserving the address dependencies for
> > + * later use of the address. It should be used when comparing an address
> > + * returned by rcu_dereference().
> > + *
> > + * This is needed to prevent the compiler CSE and SSA GVN optimizations
> > + * from using @a (or @b) in places where the source refers to @b (or @a)
> > + * based on the fact that after the comparison, the two are known to be
> > + * equal, which does not preserve address dependencies and allows the
> > + * following misordering speculations:
> > + *
> > + * - If @b is a constant, the compiler can issue the loads which depend
> > + * on @a before loading @a.
> > + * - If @b is a register populated by a prior load, weakly-ordered
> > + * CPUs can speculate loads which depend on @a before loading @a.
> > + *
> > + * The same logic applies with @a and @b swapped.
> > + *
> > + * Return value: true if pointers are equal, false otherwise.
> > + *
> > + * The compiler barrier() is ineffective at fixing this issue. It does
> > + * not prevent the compiler CSE from losing the address dependency:
> > + *
> > + * int fct_2_volatile_barriers(void)
> > + * {
> > + * int *a, *b;
> > + *
> > + * do {
> > + * a = READ_ONCE(p);
> > + * asm volatile ("" : : : "memory");
> > + * b = READ_ONCE(p);
> > + * } while (a != b);
> > + * asm volatile ("" : : : "memory"); <-- barrier()
> > + * return *b;
> > + * }
> > + *
> > + * With gcc 14.2 (arm64):
> > + *
> > + * fct_2_volatile_barriers:
> > + * adrp x0, .LANCHOR0
> > + * add x0, x0, :lo12:.LANCHOR0
> > + * .L2:
> > + * ldr x1, [x0] <-- x1 populated by first load.
> > + * ldr x2, [x0]
> > + * cmp x1, x2
> > + * bne .L2
> > + * ldr w0, [x1] <-- x1 is used for access which should depend on b.
> > + * ret
> > + *
> > + * On weakly-ordered architectures, this lets CPU speculation use the
> > + * result from the first load to speculate "ldr w0, [x1]" before
> > + * "ldr x2, [x0]".
> > + * Based on the RCU documentation, the control dependency does not
> > + * prevent the CPU from speculating loads.
>
> I'm not sure that example (of something that doesn't work) is really necessary.
> The simple example of, given:
> return a == b ? *a : 0;
> the generated code might speculatively dereference 'b' (not a) before returning
> zero when the pointers are different.
I'm not sure I understand what you're saying.
`b` cannot be speculatively dereferenced by the compiler in code-path
where pointers are different, as the compiler cannot ascertain that it is
valid.
The speculative execution on the processor side *does not* matter here as
it needs to honour address dependency (unless you're Alpha, which is why we
add a `mb()` in each `READ_ONCE`).
Best,
Gary
* Re: [RFC PATCH v4 1/4] compiler.h: Introduce ptr_eq() to preserve address dependency
2025-12-18 13:51 ` Mathieu Desnoyers
@ 2025-12-18 15:54 ` David Laight
0 siblings, 0 replies; 7+ messages in thread
From: David Laight @ 2025-12-18 15:54 UTC (permalink / raw)
To: Mathieu Desnoyers
Cc: Boqun Feng, Joel Fernandes, Paul E. McKenney, linux-kernel,
Nicholas Piggin, Michael Ellerman, Greg Kroah-Hartman,
Sebastian Andrzej Siewior, Will Deacon, Peter Zijlstra,
Alan Stern, John Stultz, Neeraj Upadhyay, Linus Torvalds,
Andrew Morton, Frederic Weisbecker, Josh Triplett,
Uladzislau Rezki, Steven Rostedt, Lai Jiangshan, Zqiang,
Ingo Molnar, Waiman Long, Mark Rutland, Thomas Gleixner,
Vlastimil Babka, maged.michael, Mateusz Guzik, Jonas Oberhauser,
rcu, linux-mm, lkmm, Gary Guo, Nikita Popov, llvm
On Thu, 18 Dec 2025 08:51:02 -0500
Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:
> On 2025-12-18 04:03, David Laight wrote:
> [...]
> >> + *
> >> + * The compiler barrier() is ineffective at fixing this issue. It does
> >> + * not prevent the compiler CSE from losing the address dependency:
> >> + *
> >> + * int fct_2_volatile_barriers(void)
> >> + * {
> >> + * int *a, *b;
> >> + *
> >> + * do {
> >> + * a = READ_ONCE(p);
> >> + * asm volatile ("" : : : "memory");
> >> + * b = READ_ONCE(p);
> >> + * } while (a != b);
> >> + * asm volatile ("" : : : "memory"); <-- barrier()
> >> + * return *b;
> >> + * }
> >> + *
> >> + * With gcc 14.2 (arm64):
> >> + *
> >> + * fct_2_volatile_barriers:
> >> + * adrp x0, .LANCHOR0
> >> + * add x0, x0, :lo12:.LANCHOR0
> >> + * .L2:
> >> + * ldr x1, [x0] <-- x1 populated by first load.
> >> + * ldr x2, [x0]
> >> + * cmp x1, x2
> >> + * bne .L2
> >> + * ldr w0, [x1] <-- x1 is used for access which should depend on b.
> >> + * ret
> >> + *
> >> + * On weakly-ordered architectures, this lets CPU speculation use the
> >> + * result from the first load to speculate "ldr w0, [x1]" before
> >> + * "ldr x2, [x0]".
> >> + * Based on the RCU documentation, the control dependency does not
> >> + * prevent the CPU from speculating loads.
> >
> > I'm not sure that example (of something that doesn't work) is really necessary.
> > The simple example of, given:
> > return a == b ? *a : 0;
> > the generated code might speculatively dereference 'b' (not a) before returning
> > zero when the pointers are different.
>
> In the past discussion that led to this new API, AFAIU, Linus made it
> clear that this counter example needs to be in a comment:
I might remember that...
But if you read the proposed comment it starts looking like an example.
It is also very long for the file it is in - even if clearly marked as why
the same effect can't be achieved with barrier().
Maybe the long gory comment belongs in the rst file?
I do wonder if some places need this:
#define OPTIMISER_HIDE_VAL(x) ({ auto _x = x; OPTIMIZER_HIDE_VAR(_x); _x; })
Then you could do:
#define ptr_eq(x, y) (OPTIMISER_HIDE_VAL(x) == OPTIMISER_HIDE_VAL(y))
which includes the check that the pointers are the same type.
But it would be more generally useful for hiding constants from the optimiser.
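A minimal userspace sketch of the macro idea above, assuming GCC or Clang (statement expressions, __typeof__ and inline asm are GNU extensions). HIDE_VAR, HIDE_VAL, PTR_EQ_SKETCH and value_if_same are hypothetical names used for illustration only; HIDE_VAR copies the OPTIMIZER_HIDE_VAR() body shown in the patch context, and __typeof__ stands in for auto:

```c
/*
 * Same inline-asm trick as the kernel's OPTIMIZER_HIDE_VAR(): the empty
 * asm claims to read and rewrite var, so the compiler can no longer
 * track where the value came from.
 */
#define HIDE_VAR(var) __asm__ ("" : "=r" (var) : "0" (var))

/* Hypothetical value-returning variant: copy x, hide the copy, yield it. */
#define HIDE_VAL(x) ({ __typeof__(x) _x = (x); HIDE_VAR(_x); _x; })

/*
 * Hypothetical macro form of ptr_eq(). Because both operands keep their
 * own types, comparing pointers of distinct types draws a compiler
 * diagnostic instead of being silently accepted through void *.
 */
#define PTR_EQ_SKETCH(x, y) (HIDE_VAL(x) == HIDE_VAL(y))

/* Illustration only: dereference b when both pointers compare equal. */
static int value_if_same(int *a, int *b)
{
	return PTR_EQ_SKETCH(a, b) ? *b : 0;
}
```

As written, HIDE_VAL() would not accept a const-qualified argument, since the asm output operand must be writable; that limitation belongs to this sketch, not to the proposal.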
David
>
> https://lore.kernel.org/lkml/CAHk-=wgBgh5U+dyNaN=+XCdcm2OmgSRbcH4Vbtk8i5ZDGwStSA@mail.gmail.com/
>
> This counter-example is what convinced him that this addresses a real
> issue.
>
> Thanks,
>
> Mathieu
>
>
* Re: [RFC PATCH v4 1/4] compiler.h: Introduce ptr_eq() to preserve address dependency
2025-12-18 14:27 ` Gary Guo
@ 2025-12-18 16:12 ` David Laight
0 siblings, 0 replies; 7+ messages in thread
From: David Laight @ 2025-12-18 16:12 UTC (permalink / raw)
To: Gary Guo
Cc: Mathieu Desnoyers, Boqun Feng, Joel Fernandes, Paul E. McKenney,
linux-kernel, Nicholas Piggin, Michael Ellerman,
Greg Kroah-Hartman, Sebastian Andrzej Siewior, Will Deacon,
Peter Zijlstra, Alan Stern, John Stultz, Neeraj Upadhyay,
Linus Torvalds, Andrew Morton, Frederic Weisbecker, Josh Triplett,
Uladzislau Rezki, Steven Rostedt, Lai Jiangshan, Zqiang,
Ingo Molnar, Waiman Long, Mark Rutland, Thomas Gleixner,
Vlastimil Babka, maged.michael, Mateusz Guzik, Jonas Oberhauser,
rcu, linux-mm, lkmm, Nikita Popov, llvm
On Thu, 18 Dec 2025 14:27:36 +0000
Gary Guo <gary@garyguo.net> wrote:
> On Thu, 18 Dec 2025 09:03:13 +0000
> David Laight <david.laight.linux@gmail.com> wrote:
>
> > On Wed, 17 Dec 2025 20:45:28 -0500
> > Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:
> >
> > > diff --git a/include/linux/compiler.h b/include/linux/compiler.h
> > > index 5b45ea7dff3e..c5ca3b54c112 100644
> > > --- a/include/linux/compiler.h
> > > +++ b/include/linux/compiler.h
> > > @@ -163,6 +163,69 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val,
> > > __asm__ ("" : "=r" (var) : "0" (var))
> > > #endif
> > >
> > > +/*
> > > + * Compare two addresses while preserving the address dependencies for
> > > + * later use of the address. It should be used when comparing an address
> > > + * returned by rcu_dereference().
> > > + *
> > > + * This is needed to prevent the compiler CSE and SSA GVN optimizations
> > > + * from using @a (or @b) in places where the source refers to @b (or @a)
> > > + * based on the fact that after the comparison, the two are known to be
> > > + * equal, which does not preserve address dependencies and allows the
> > > + * following misordering speculations:
> > > + *
> > > + * - If @b is a constant, the compiler can issue the loads which depend
> > > + * on @a before loading @a.
> > > + * - If @b is a register populated by a prior load, weakly-ordered
> > > + * CPUs can speculate loads which depend on @a before loading @a.
> > > + *
> > > + * The same logic applies with @a and @b swapped.
> > > + *
> > > + * Return value: true if pointers are equal, false otherwise.
> > > + *
> > > + * The compiler barrier() is ineffective at fixing this issue. It does
> > > + * not prevent the compiler CSE from losing the address dependency:
> > > + *
> > > + * int fct_2_volatile_barriers(void)
> > > + * {
> > > + * int *a, *b;
> > > + *
> > > + * do {
> > > + * a = READ_ONCE(p);
> > > + * asm volatile ("" : : : "memory");
> > > + * b = READ_ONCE(p);
> > > + * } while (a != b);
> > > + * asm volatile ("" : : : "memory"); <-- barrier()
> > > + * return *b;
> > > + * }
> > > + *
> > > + * With gcc 14.2 (arm64):
> > > + *
> > > + * fct_2_volatile_barriers:
> > > + * adrp x0, .LANCHOR0
> > > + * add x0, x0, :lo12:.LANCHOR0
> > > + * .L2:
> > > + * ldr x1, [x0] <-- x1 populated by first load.
> > > + * ldr x2, [x0]
> > > + * cmp x1, x2
> > > + * bne .L2
> > > + * ldr w0, [x1] <-- x1 is used for access which should depend on b.
> > > + * ret
> > > + *
> > > + * On weakly-ordered architectures, this lets CPU speculation use the
> > > + * result from the first load to speculate "ldr w0, [x1]" before
> > > + * "ldr x2, [x0]".
> > > + * Based on the RCU documentation, the control dependency does not
> > > + * prevent the CPU from speculating loads.
> >
> > I'm not sure that example (of something that doesn't work) is really necessary.
> > The simple example of, given:
> > return a == b ? *a : 0;
> > the generated code might speculatively dereference 'b' (not a) before returning
> > zero when the pointers are different.
>
> I'm not sure I understand what you're saying.
>
> `b` cannot be speculatively dereferenced by the compiler in code-path
> where pointers are different, as the compiler cannot ascertain that it is
> valid.
The 'validity' doesn't matter for speculative execution.
> The speculative execution on the processor side *does not* matter here as
> it needs to honour address dependency (unless you're Alpha, which is why we
> add a `mb()` in each `READ_ONCE`).
There isn't an 'address dependency', that is the problem.
The issue is that 'a == b ? *a : 0' and 'a == b ? *b : 0' always evaluate
to the same value and the compiler will (effectively) substitute one for the
other.
But sometimes you really do care which pointer is speculatively dereferenced
when they are different.
Memory barriers can only enforce the order of the reads of 'a', 'b' and '*a',
they won't change whether the generated code contains '*a' or '*b'.
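To make the substitution concrete, here is a small userspace sketch, assuming GCC or Clang. OPTIMIZER_HIDE_VAR() and ptr_eq() are copied from the patch (minus __always_inline); load_plain and load_ptr_eq are hypothetical helpers added only for comparison:

```c
/* Userspace copy of the kernel macro used by the patch. */
#define OPTIMIZER_HIDE_VAR(var) __asm__ ("" : "=r" (var) : "0" (var))

/* ptr_eq() as proposed in patch 1/4, without __always_inline. */
static inline int ptr_eq(const volatile void *a, const volatile void *b)
{
	OPTIMIZER_HIDE_VAR(a);
	OPTIMIZER_HIDE_VAR(b);
	return a == b;
}

/*
 * Plain comparison: after "a == b" the compiler knows both pointers
 * hold the same value and is free to emit the load through either one.
 */
static int load_plain(int *a, int *b)
{
	return a == b ? *b : 0;
}

/*
 * Hidden comparison: the compared values are laundered copies, so the
 * compiler learns nothing about the caller's a and b, and the load is
 * expected to keep using b.
 */
static int load_ptr_eq(int *a, int *b)
{
	return ptr_eq(a, b) ? *b : 0;
}
```

Both helpers return the same values for the same inputs. Any difference shows up only in the generated code, for instance when comparing "gcc -O2 -S" output for the two functions on arm64.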
David
>
> Best,
> Gary
>
>
>