Building the Linux kernel with Clang and LLVM
 help / color / mirror / Atom feed
* [RFC PATCH 1/4] compiler.h: Introduce ptr_eq() to preserve address dependency
       [not found] <20241002010205.1341915-1-mathieu.desnoyers@efficios.com>
@ 2024-10-02  1:02 ` Mathieu Desnoyers
  2024-10-03  0:08   ` Joel Fernandes
  2024-10-02  1:02 ` [RFC PATCH 2/4] Documentation: RCU: Refer to ptr_eq() Mathieu Desnoyers
  1 sibling, 1 reply; 5+ messages in thread
From: Mathieu Desnoyers @ 2024-10-02  1:02 UTC (permalink / raw)
  To: Linus Torvalds, Andrew Morton, Peter Zijlstra
  Cc: linux-kernel, Mathieu Desnoyers, Nicholas Piggin,
	Michael Ellerman, Greg Kroah-Hartman, Sebastian Andrzej Siewior,
	Paul E. McKenney, Will Deacon, Boqun Feng, Alan Stern,
	John Stultz, Neeraj Upadhyay, Frederic Weisbecker, Joel Fernandes,
	Josh Triplett, Uladzislau Rezki, Steven Rostedt, Lai Jiangshan,
	Zqiang, Ingo Molnar, Waiman Long, Mark Rutland, Thomas Gleixner,
	Vlastimil Babka, maged.michael, Mateusz Guzik, Jonas Oberhauser,
	rcu, linux-mm, lkmm, Gary Guo, Nikita Popov, llvm

Compiler CSE and SSA GVN optimizations can cause the address dependency
of addresses returned by rcu_dereference to be lost when comparing those
pointers with either constants or previously loaded pointers.

Introduce ptr_eq() to compare two addresses while preserving the address
dependencies for later use of the address. It should be used when
comparing an address returned by rcu_dereference().

This is needed to prevent the compiler CSE and SSA GVN optimizations
from using @a (or @b) in places where the source refers to @b (or @a)
based on the fact that after the comparison, the two are known to be
equal, which does not preserve address dependencies and allows the
following misordering speculations:

- If @b is a constant, the compiler can issue the loads which depend
  on @a before loading @a.
- If @b is a register populated by a prior load, weakly-ordered
  CPUs can speculate loads which depend on @a before loading @a.

The same logic applies with @a and @b swapped.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Suggested-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Reviewed-by: Boqun Feng <boqun.feng@gmail.com>
Acked-by: "Paul E. McKenney" <paulmck@kernel.org>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: John Stultz <jstultz@google.com>
Cc: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Uladzislau Rezki <urezki@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Zqiang <qiang.zhang1211@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Waiman Long <longman@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: maged.michael@gmail.com
Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc: Gary Guo <gary@garyguo.net>
Cc: Jonas Oberhauser <jonas.oberhauser@huaweicloud.com>
Cc: rcu@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: lkmm@lists.linux.dev
Cc: Nikita Popov <github@npopov.com>
Cc: llvm@lists.linux.dev
---
Changes since v0:
- Include feedback from Alan Stern.
---
 include/linux/compiler.h | 63 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 63 insertions(+)

diff --git a/include/linux/compiler.h b/include/linux/compiler.h
index 2df665fa2964..75a378ae7af1 100644
--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -186,6 +186,69 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val,
 	__asm__ ("" : "=r" (var) : "0" (var))
 #endif
 
+/*
+ * Compare two addresses while preserving the address dependencies for
+ * later use of the address. It should be used when comparing an address
+ * returned by rcu_dereference().
+ *
+ * This is needed to prevent the compiler CSE and SSA GVN optimizations
+ * from using @a (or @b) in places where the source refers to @b (or @a)
+ * based on the fact that after the comparison, the two are known to be
+ * equal, which does not preserve address dependencies and allows the
+ * following misordering speculations:
+ *
+ * - If @b is a constant, the compiler can issue the loads which depend
+ *   on @a before loading @a.
+ * - If @b is a register populated by a prior load, weakly-ordered
+ *   CPUs can speculate loads which depend on @a before loading @a.
+ *
+ * The same logic applies with @a and @b swapped.
+ *
+ * Return value: true if pointers are equal, false otherwise.
+ *
+ * The compiler barrier() is ineffective at fixing this issue. It does
+ * not prevent the compiler CSE from losing the address dependency:
+ *
+ * int fct_2_volatile_barriers(void)
+ * {
+ *     int *a, *b;
+ *
+ *     do {
+ *         a = READ_ONCE(p);
+ *         asm volatile ("" : : : "memory");
+ *         b = READ_ONCE(p);
+ *     } while (a != b);
+ *     asm volatile ("" : : : "memory");  <-- barrier()
+ *     return *b;
+ * }
+ *
+ * With gcc 14.2 (arm64):
+ *
+ * fct_2_volatile_barriers:
+ *         adrp    x0, .LANCHOR0
+ *         add     x0, x0, :lo12:.LANCHOR0
+ * .L2:
+ *         ldr     x1, [x0]  <-- x1 populated by first load.
+ *         ldr     x2, [x0]
+ *         cmp     x1, x2
+ *         bne     .L2
+ *         ldr     w0, [x1]  <-- x1 is used for access which should depend on b.
+ *         ret
+ *
+ * On weakly-ordered architectures, this lets CPU speculation use the
+ * result from the first load to speculate "ldr w0, [x1]" before
+ * "ldr x2, [x0]".
+ * Based on the RCU documentation, the control dependency does not
+ * prevent the CPU from speculating loads.
+ */
+static __always_inline
+int ptr_eq(const volatile void *a, const volatile void *b)
+{
+	OPTIMIZER_HIDE_VAR(a);
+	OPTIMIZER_HIDE_VAR(b);
+	return a == b;
+}
+
 #define __UNIQUE_ID(prefix) __PASTE(__PASTE(__UNIQUE_ID_, prefix), __COUNTER__)
 
 /**
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [RFC PATCH 2/4] Documentation: RCU: Refer to ptr_eq()
       [not found] <20241002010205.1341915-1-mathieu.desnoyers@efficios.com>
  2024-10-02  1:02 ` [RFC PATCH 1/4] compiler.h: Introduce ptr_eq() to preserve address dependency Mathieu Desnoyers
@ 2024-10-02  1:02 ` Mathieu Desnoyers
  1 sibling, 0 replies; 5+ messages in thread
From: Mathieu Desnoyers @ 2024-10-02  1:02 UTC (permalink / raw)
  To: Linus Torvalds, Andrew Morton, Peter Zijlstra
  Cc: linux-kernel, Mathieu Desnoyers, Nicholas Piggin,
	Michael Ellerman, Greg Kroah-Hartman, Sebastian Andrzej Siewior,
	Paul E. McKenney, Will Deacon, Boqun Feng, Alan Stern,
	John Stultz, Neeraj Upadhyay, Frederic Weisbecker, Joel Fernandes,
	Josh Triplett, Uladzislau Rezki, Steven Rostedt, Lai Jiangshan,
	Zqiang, Ingo Molnar, Waiman Long, Mark Rutland, Thomas Gleixner,
	Vlastimil Babka, maged.michael, Mateusz Guzik, Jonas Oberhauser,
	rcu, linux-mm, lkmm, Gary Guo, Nikita Popov, llvm

Refer to ptr_eq() in the rcu_dereference() documentation.

ptr_eq() is a mechanism that preserves address dependencies when
comparing pointers, and should be favored when comparing a pointer
obtained from rcu_dereference() against another pointer.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Acked-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: John Stultz <jstultz@google.com>
Cc: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Uladzislau Rezki <urezki@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Zqiang <qiang.zhang1211@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Waiman Long <longman@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: maged.michael@gmail.com
Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc: Gary Guo <gary@garyguo.net>
Cc: Jonas Oberhauser <jonas.oberhauser@huaweicloud.com>
Cc: rcu@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: lkmm@lists.linux.dev
Cc: Nikita Popov <github@npopov.com>
Cc: llvm@lists.linux.dev
---
Changes since v0:
- Include feedback from Alan Stern.

Changes since v1:
- Include feedback from Paul E. McKenney.
---
 Documentation/RCU/rcu_dereference.rst | 38 +++++++++++++++++++++++----
 1 file changed, 33 insertions(+), 5 deletions(-)

diff --git a/Documentation/RCU/rcu_dereference.rst b/Documentation/RCU/rcu_dereference.rst
index 2524dcdadde2..de6175bf430f 100644
--- a/Documentation/RCU/rcu_dereference.rst
+++ b/Documentation/RCU/rcu_dereference.rst
@@ -104,11 +104,12 @@ readers working properly:
 	after such branches, but can speculate loads, which can again
 	result in misordering bugs.
 
--	Be very careful about comparing pointers obtained from
-	rcu_dereference() against non-NULL values.  As Linus Torvalds
-	explained, if the two pointers are equal, the compiler could
-	substitute the pointer you are comparing against for the pointer
-	obtained from rcu_dereference().  For example::
+-	Use operations that preserve address dependencies (such as
+	"ptr_eq()") to compare pointers obtained from rcu_dereference()
+	against non-NULL pointers. As Linus Torvalds explained, if the
+	two pointers are equal, the compiler could substitute the
+	pointer you are comparing against for the pointer obtained from
+	rcu_dereference().  For example::
 
 		p = rcu_dereference(gp);
 		if (p == &default_struct)
@@ -125,6 +126,29 @@ readers working properly:
 	On ARM and Power hardware, the load from "default_struct.a"
 	can now be speculated, such that it might happen before the
 	rcu_dereference().  This could result in bugs due to misordering.
+	Performing the comparison with "ptr_eq()" ensures the compiler
+	does not perform such transformation.
+
+	If the comparison is against another pointer, the compiler is
+	allowed to use either pointer for the following accesses, which
+	loses the address dependency and allows weakly-ordered
+	architectures such as ARM and PowerPC to speculate the
+	address-dependent load before rcu_dereference().  For example::
+
+		p1 = READ_ONCE(gp);
+		p2 = rcu_dereference(gp);
+		if (p1 == p2)  /* BUGGY!!! */
+			do_default(p2->a);
+
+	The compiler can use p1->a rather than p2->a, destroying the
+	address dependency.  Performing the comparison with "ptr_eq()"
+	ensures the compiler preserves the address dependencies.
+	Corrected code::
+
+		p1 = READ_ONCE(gp);
+		p2 = rcu_dereference(gp);
+		if (ptr_eq(p1, p2))
+			do_default(p2->a);
 
 	However, comparisons are OK in the following cases:
 
@@ -204,6 +228,10 @@ readers working properly:
 		comparison will provide exactly the information that the
 		compiler needs to deduce the value of the pointer.
 
+	When in doubt, use operations that preserve address dependencies
+	(such as "ptr_eq()") to compare pointers obtained from
+	rcu_dereference() against non-NULL pointers.
+
 -	Disable any value-speculation optimizations that your compiler
 	might provide, especially if you are making use of feedback-based
 	optimizations that take data collected from prior runs.  Such
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [RFC PATCH 1/4] compiler.h: Introduce ptr_eq() to preserve address dependency
  2024-10-02  1:02 ` [RFC PATCH 1/4] compiler.h: Introduce ptr_eq() to preserve address dependency Mathieu Desnoyers
@ 2024-10-03  0:08   ` Joel Fernandes
  2024-10-03 14:19     ` Mathieu Desnoyers
  0 siblings, 1 reply; 5+ messages in thread
From: Joel Fernandes @ 2024-10-03  0:08 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Linus Torvalds, Andrew Morton, Peter Zijlstra, linux-kernel,
	Nicholas Piggin, Michael Ellerman, Greg Kroah-Hartman,
	Sebastian Andrzej Siewior, Paul E. McKenney, Will Deacon,
	Boqun Feng, Alan Stern, John Stultz, Neeraj Upadhyay,
	Frederic Weisbecker, Josh Triplett, Uladzislau Rezki,
	Steven Rostedt, Lai Jiangshan, Zqiang, Ingo Molnar, Waiman Long,
	Mark Rutland, Thomas Gleixner, Vlastimil Babka, maged.michael,
	Mateusz Guzik, Jonas Oberhauser, rcu, linux-mm, lkmm, Gary Guo,
	Nikita Popov, llvm

On Tue, Oct 01, 2024 at 09:02:02PM -0400, Mathieu Desnoyers wrote:
> Compiler CSE and SSA GVN optimizations can cause the address dependency
> of addresses returned by rcu_dereference to be lost when comparing those
> pointers with either constants or previously loaded pointers.
> 
> Introduce ptr_eq() to compare two addresses while preserving the address
> dependencies for later use of the address. It should be used when
> comparing an address returned by rcu_dereference().
> 
> This is needed to prevent the compiler CSE and SSA GVN optimizations
> from using @a (or @b) in places where the source refers to @b (or @a)
> based on the fact that after the comparison, the two are known to be
> equal, which does not preserve address dependencies and allows the
> following misordering speculations:
> 
> - If @b is a constant, the compiler can issue the loads which depend
>   on @a before loading @a.
> - If @b is a register populated by a prior load, weakly-ordered
>   CPUs can speculate loads which depend on @a before loading @a.
> 
> The same logic applies with @a and @b swapped.
> 
[...]
> +/*
> + * Compare two addresses while preserving the address dependencies for
> + * later use of the address. It should be used when comparing an address
> + * returned by rcu_dereference().
> + *
> + * This is needed to prevent the compiler CSE and SSA GVN optimizations
> + * from using @a (or @b) in places where the source refers to @b (or @a)
> + * based on the fact that after the comparison, the two are known to be
> + * equal, which does not preserve address dependencies and allows the
> + * following misordering speculations:
> + *
> + * - If @b is a constant, the compiler can issue the loads which depend
> + *   on @a before loading @a.
> + * - If @b is a register populated by a prior load, weakly-ordered
> + *   CPUs can speculate loads which depend on @a before loading @a.
> + *
> + * The same logic applies with @a and @b swapped.
> + *
> + * Return value: true if pointers are equal, false otherwise.
> + *
> + * The compiler barrier() is ineffective at fixing this issue. It does
> + * not prevent the compiler CSE from losing the address dependency:
> + *
> + * int fct_2_volatile_barriers(void)
> + * {
> + *     int *a, *b;
> + *
> + *     do {
> + *         a = READ_ONCE(p);
> + *         asm volatile ("" : : : "memory");
> + *         b = READ_ONCE(p);
> + *     } while (a != b);
> + *     asm volatile ("" : : : "memory");  <-- barrier()
> + *     return *b;
> + * }
> + *
> + * With gcc 14.2 (arm64):
> + *
> + * fct_2_volatile_barriers:
> + *         adrp    x0, .LANCHOR0
> + *         add     x0, x0, :lo12:.LANCHOR0
> + * .L2:
> + *         ldr     x1, [x0]  <-- x1 populated by first load.
> + *         ldr     x2, [x0]
> + *         cmp     x1, x2
> + *         bne     .L2
> + *         ldr     w0, [x1]  <-- x1 is used for access which should depend on b.
> + *         ret
> + *

I could reproduce this in compiler explorer, but I'm curious what flags are
you using? For me it does a bunch of usage of the stack for temporary storage
(still incorrectly returns *a though as you pointed).

Interestingly, if I just move the comparison into an an __always_inline__
function like below, but without the optimizer hide stuff, gcc 14.2 on arm64
does generate the correct code:

static inline __attribute__((__always_inline__)) int ptr_eq(const volatile void *a, const volatile void *b)
{
    /* No OPTIMIZER_HIDE_VAR */
    return a == b;
}

volatile int *p = 0;

int fct_2_volatile_barriers()
{
    int *a, *b;

    do {
        a = READ_ONCE(p);
        asm volatile ("" : : : "memory");
        b = READ_ONCE(p);
    } while (!ptr_eq(a, b));
    asm volatile ("" : : : "memory");  // barrier()
    return *b;
}

But not sure if it fixes the speculation issue you referred to.

Putting back the OPTIMIZER_HIDE_VAR() then just seems to pass the a and b
stored on the stack through a washing machine:

        ldr     x0, [sp, 8]
        str     x0, [sp, 8]
        ldr     x0, [sp]
        str     x0, [sp]

And here I thought the "" in OPTIMIZER_HIDE_VAR was not supposed to generate
any code but I guess it is still a NOOP.

Anyway, as such this LGTM since whether OPTIMIZER_HIDE_VAR() used or not, it
does fix the problem.

Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>

thanks,

 - Joel


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [RFC PATCH 1/4] compiler.h: Introduce ptr_eq() to preserve address dependency
  2024-10-03  0:08   ` Joel Fernandes
@ 2024-10-03 14:19     ` Mathieu Desnoyers
  2024-10-03 22:09       ` Joel Fernandes
  0 siblings, 1 reply; 5+ messages in thread
From: Mathieu Desnoyers @ 2024-10-03 14:19 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Linus Torvalds, Andrew Morton, Peter Zijlstra, linux-kernel,
	Nicholas Piggin, Michael Ellerman, Greg Kroah-Hartman,
	Sebastian Andrzej Siewior, Paul E. McKenney, Will Deacon,
	Boqun Feng, Alan Stern, John Stultz, Neeraj Upadhyay,
	Frederic Weisbecker, Josh Triplett, Uladzislau Rezki,
	Steven Rostedt, Lai Jiangshan, Zqiang, Ingo Molnar, Waiman Long,
	Mark Rutland, Thomas Gleixner, Vlastimil Babka, maged.michael,
	Mateusz Guzik, Jonas Oberhauser, rcu, linux-mm, lkmm, Gary Guo,
	Nikita Popov, llvm

On 2024-10-03 02:08, Joel Fernandes wrote:
> On Tue, Oct 01, 2024 at 09:02:02PM -0400, Mathieu Desnoyers wrote:
>> Compiler CSE and SSA GVN optimizations can cause the address dependency
>> of addresses returned by rcu_dereference to be lost when comparing those
>> pointers with either constants or previously loaded pointers.
>>
>> Introduce ptr_eq() to compare two addresses while preserving the address
>> dependencies for later use of the address. It should be used when
>> comparing an address returned by rcu_dereference().
>>
>> This is needed to prevent the compiler CSE and SSA GVN optimizations
>> from using @a (or @b) in places where the source refers to @b (or @a)
>> based on the fact that after the comparison, the two are known to be
>> equal, which does not preserve address dependencies and allows the
>> following misordering speculations:
>>
>> - If @b is a constant, the compiler can issue the loads which depend
>>    on @a before loading @a.
>> - If @b is a register populated by a prior load, weakly-ordered
>>    CPUs can speculate loads which depend on @a before loading @a.
>>
>> The same logic applies with @a and @b swapped.
>>
> [...]
>> +/*
>> + * Compare two addresses while preserving the address dependencies for
>> + * later use of the address. It should be used when comparing an address
>> + * returned by rcu_dereference().
>> + *
>> + * This is needed to prevent the compiler CSE and SSA GVN optimizations
>> + * from using @a (or @b) in places where the source refers to @b (or @a)
>> + * based on the fact that after the comparison, the two are known to be
>> + * equal, which does not preserve address dependencies and allows the
>> + * following misordering speculations:
>> + *
>> + * - If @b is a constant, the compiler can issue the loads which depend
>> + *   on @a before loading @a.
>> + * - If @b is a register populated by a prior load, weakly-ordered
>> + *   CPUs can speculate loads which depend on @a before loading @a.
>> + *
>> + * The same logic applies with @a and @b swapped.
>> + *
>> + * Return value: true if pointers are equal, false otherwise.
>> + *
>> + * The compiler barrier() is ineffective at fixing this issue. It does
>> + * not prevent the compiler CSE from losing the address dependency:
>> + *
>> + * int fct_2_volatile_barriers(void)
>> + * {
>> + *     int *a, *b;
>> + *
>> + *     do {
>> + *         a = READ_ONCE(p);
>> + *         asm volatile ("" : : : "memory");
>> + *         b = READ_ONCE(p);
>> + *     } while (a != b);
>> + *     asm volatile ("" : : : "memory");  <-- barrier()
>> + *     return *b;
>> + * }
>> + *
>> + * With gcc 14.2 (arm64):
>> + *
>> + * fct_2_volatile_barriers:
>> + *         adrp    x0, .LANCHOR0
>> + *         add     x0, x0, :lo12:.LANCHOR0
>> + * .L2:
>> + *         ldr     x1, [x0]  <-- x1 populated by first load.
>> + *         ldr     x2, [x0]
>> + *         cmp     x1, x2
>> + *         bne     .L2
>> + *         ldr     w0, [x1]  <-- x1 is used for access which should depend on b.
>> + *         ret
>> + *
> 
> I could reproduce this in compiler explorer, but I'm curious what flags are
> you using? For me it does a bunch of usage of the stack for temporary storage
> (still incorrectly returns *a though as you pointed).

You are probably missing "-O2".


> 
> Interestingly, if I just move the comparison into an an __always_inline__
> function like below, but without the optimizer hide stuff, gcc 14.2 on arm64
> does generate the correct code:

Make sure you compile in -O2. Based on a quick check here the hide var
is needed to make sure the compiler does the intended behavior in O2.

> 
> static inline __attribute__((__always_inline__)) int ptr_eq(const volatile void *a, const volatile void *b)
> {
>      /* No OPTIMIZER_HIDE_VAR */
>      return a == b;
> }
> 
> volatile int *p = 0;
> 
> int fct_2_volatile_barriers()
> {
>      int *a, *b;
> 
>      do {
>          a = READ_ONCE(p);
>          asm volatile ("" : : : "memory");
>          b = READ_ONCE(p);
>      } while (!ptr_eq(a, b));
>      asm volatile ("" : : : "memory");  // barrier()
>      return *b;
> }
> 
> But not sure if it fixes the speculation issue you referred to.

Not in -O2.

> 
> Putting back the OPTIMIZER_HIDE_VAR() then just seems to pass the a and b
> stored on the stack through a washing machine:
> 
>          ldr     x0, [sp, 8]
>          str     x0, [sp, 8]
>          ldr     x0, [sp]
>          str     x0, [sp]

That washing machine looks like the result of -O0.

> 
> And here I thought the "" in OPTIMIZER_HIDE_VAR was not supposed to generate
> any code but I guess it is still a NOOP.

The hide var will only emit an extra register movement to copy the
register to a temporary. That's one extra instruction but not as bad
as what you observe in -O0.

> 
> Anyway, as such this LGTM since whether OPTIMIZER_HIDE_VAR() used or not, it
> does fix the problem.

hide var is needed in O2.

> 
> Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>

Please double-check with -O2, and let me know if you still agree with
the patch :)

Thanks,

Mathieu


> 
> thanks,
> 
>   - Joel
> 

-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [RFC PATCH 1/4] compiler.h: Introduce ptr_eq() to preserve address dependency
  2024-10-03 14:19     ` Mathieu Desnoyers
@ 2024-10-03 22:09       ` Joel Fernandes
  0 siblings, 0 replies; 5+ messages in thread
From: Joel Fernandes @ 2024-10-03 22:09 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Linus Torvalds, Andrew Morton, Peter Zijlstra, linux-kernel,
	Nicholas Piggin, Michael Ellerman, Greg Kroah-Hartman,
	Sebastian Andrzej Siewior, Paul E. McKenney, Will Deacon,
	Boqun Feng, Alan Stern, John Stultz, Neeraj Upadhyay,
	Frederic Weisbecker, Josh Triplett, Uladzislau Rezki,
	Steven Rostedt, Lai Jiangshan, Zqiang, Ingo Molnar, Waiman Long,
	Mark Rutland, Thomas Gleixner, Vlastimil Babka, maged.michael,
	Mateusz Guzik, Jonas Oberhauser, rcu, linux-mm, lkmm, Gary Guo,
	Nikita Popov, llvm

On Thu, Oct 3, 2024 at 10:21 AM Mathieu Desnoyers
<mathieu.desnoyers@efficios.com> wrote:
> >
> > Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
>
> Please double-check with -O2, and let me know if you still agree with
> the patch :)
>

You are quite right, with -O2 I can indeed see that the optimize hide
var fixes it.

FWIW:
Tested-by: Joel Fernandes (Google) <joel@joelfernandes.org>

Thanks!

 - Joel

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2024-10-03 22:10 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <20241002010205.1341915-1-mathieu.desnoyers@efficios.com>
2024-10-02  1:02 ` [RFC PATCH 1/4] compiler.h: Introduce ptr_eq() to preserve address dependency Mathieu Desnoyers
2024-10-03  0:08   ` Joel Fernandes
2024-10-03 14:19     ` Mathieu Desnoyers
2024-10-03 22:09       ` Joel Fernandes
2024-10-02  1:02 ` [RFC PATCH 2/4] Documentation: RCU: Refer to ptr_eq() Mathieu Desnoyers

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox