public inbox for linux-arm-kernel@lists.infradead.org
* [PATCH] ARM: pcpu: ensure __my_cpu_offset cannot be re-ordered across barrier()
@ 2013-06-03 16:53 Will Deacon
  2013-06-03 19:33 ` Nicolas Pitre
  0 siblings, 1 reply; 6+ messages in thread
From: Will Deacon @ 2013-06-03 16:53 UTC (permalink / raw)
  To: linux-arm-kernel

__my_cpu_offset is non-volatile, since we want its value to be cached
when we access several per-cpu variables in a row with preemption
disabled. This means that we rely on preempt_{en,dis}able to hazard
with the operation via the barrier() macro, so that we can't end up
migrating CPUs without reloading the per-cpu offset.

Unfortunately, GCC doesn't treat a "memory" clobber on a non-volatile
asm block as a side-effect, and will happily re-order it before other
memory clobbers (including those in preempt_disable()) and cache the
value. This has been observed to break the cmpxchg logic in the slub
allocator, leading to livelock in kmem_cache_alloc in mainline kernels.

This patch adds a dummy memory output operand to __my_cpu_offset,
forcing it to be ordered with respect to the barrier() macro.

Cc: Rob Herring <rob.herring@calxeda.com>
Cc: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 arch/arm/include/asm/percpu.h | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/arch/arm/include/asm/percpu.h b/arch/arm/include/asm/percpu.h
index 968c0a1..93970eb 100644
--- a/arch/arm/include/asm/percpu.h
+++ b/arch/arm/include/asm/percpu.h
@@ -30,8 +30,12 @@ static inline void set_my_cpu_offset(unsigned long off)
 static inline unsigned long __my_cpu_offset(void)
 {
 	unsigned long off;
-	/* Read TPIDRPRW */
-	asm("mrc p15, 0, %0, c13, c0, 4" : "=r" (off) : : "memory");
+	/*
+	 * Read TPIDRPRW.
+	 * We want to allow caching the value, so avoid using volatile and
+	 * instead use a fake memory access to hazard against barrier().
+	 */
+	asm("mrc p15, 0, %0, c13, c0, 4" : "=r" (off), "=Qo" (off));
 	return off;
 }
 #define __my_cpu_offset __my_cpu_offset()
-- 
1.8.2.2

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH] ARM: pcpu: ensure __my_cpu_offset cannot be re-ordered across barrier()
  2013-06-03 16:53 [PATCH] ARM: pcpu: ensure __my_cpu_offset cannot be re-ordered across barrier() Will Deacon
@ 2013-06-03 19:33 ` Nicolas Pitre
  2013-06-04 10:39   ` Will Deacon
  0 siblings, 1 reply; 6+ messages in thread
From: Nicolas Pitre @ 2013-06-03 19:33 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, 3 Jun 2013, Will Deacon wrote:

> __my_cpu_offset is non-volatile, since we want its value to be cached
> when we access several per-cpu variables in a row with preemption
> disabled. This means that we rely on preempt_{en,dis}able to hazard
> with the operation via the barrier() macro, so that we can't end up
> migrating CPUs without reloading the per-cpu offset.
> 
> Unfortunately, GCC doesn't treat a "memory" clobber on a non-volatile
> asm block as a side-effect, and will happily re-order it before other
> memory clobbers (including those in preempt_disable()) and cache the
> value. This has been observed to break the cmpxchg logic in the slub
> allocator, leading to livelock in kmem_cache_alloc in mainline kernels.
> 
> This patch adds a dummy memory output operand to __my_cpu_offset,
> forcing it to be ordered with respect to the barrier() macro.
> 
> Cc: Rob Herring <rob.herring@calxeda.com>
> Cc: Nicolas Pitre <nico@fluxnic.net>
> Signed-off-by: Will Deacon <will.deacon@arm.com>
> ---
>  arch/arm/include/asm/percpu.h | 8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm/include/asm/percpu.h b/arch/arm/include/asm/percpu.h
> index 968c0a1..93970eb 100644
> --- a/arch/arm/include/asm/percpu.h
> +++ b/arch/arm/include/asm/percpu.h
> @@ -30,8 +30,12 @@ static inline void set_my_cpu_offset(unsigned long off)
>  static inline unsigned long __my_cpu_offset(void)
>  {
>  	unsigned long off;
> -	/* Read TPIDRPRW */
> -	asm("mrc p15, 0, %0, c13, c0, 4" : "=r" (off) : : "memory");
> +	/*
> +	 * Read TPIDRPRW.
> +	 * We want to allow caching the value, so avoid using volatile and
> +	 * instead use a fake memory access to hazard against barrier().
> +	 */
> +	asm("mrc p15, 0, %0, c13, c0, 4" : "=r" (off), "=Qo" (off));
>  	return off;
>  }

This doesn't work with the little test I wrote to see the compiler 
behavior change.  Here's my test in case it is flawed:

static inline unsigned long __my_cpu_offset(void)
{
       unsigned long off;
       //asm("mrc p15, 0, %0, c13, c0, 4" : "=r" (off) : : "memory");
       asm("mrc p15, 0, %0, c13, c0, 4" : "=r" (off), "=Qo" (off));
       return off;
}

#define barrier() asm volatile ("" : : : "memory")

int foo(int *a, int *b, char *c)
{
	int x, y, z;

	x = a[__my_cpu_offset()];
	barrier();
	y = b[__my_cpu_offset()];
	z = c[__my_cpu_offset()];
	barrier();
	return  x + y + z + a[__my_cpu_offset()];
}

With the original memory clobber (commented out above) the asm output 
is:

foo:
        mrc     p15, 0, r3, c13, c0, 4
        ldr     ip, [r0, r3, asl #2]
        ldr     r1, [r1, r3, asl #2]
        ldrb    r2, [r2, r3]    @ zero_extendqisi2
        ldr     r3, [r0, r3, asl #2]
        add     r0, ip, r1
        add     r0, r0, r2
        add     r0, r0, r3
        bx      lr

So this confirms that the memory barrier is not respected in any way.

With your change we get this instead:

foo:
        sub     sp, sp, #8
        mrc     p15, 0, r3, c13, c0, 4
        str     r3, [sp, #4]
        ldr     ip, [r0, r3, asl #2]
        ldr     r1, [r1, r3, asl #2]
        str     r3, [sp, #4]
        ldrb    r2, [r2, r3]    @ zero_extendqisi2
        ldr     r3, [r0, r3, asl #2]
        add     r0, ip, r1
        add     r0, r0, r2
        add     r0, r0, r3
        add     sp, sp, #8
        bx      lr

So not only is the barrier completely ineffective at forcing a reload of the 
CPU offset, but it also introduces a useless store onto the stack.  Same behavior 
with gcc versions 4.7.3, 4.6.2 and 4.5.1 (yes, that's really what I have 
on my system -- waiting for 4.8.4 now).

So if the preemption count is really what this should be protected 
against, we could simply add it as an input dependency in the 
inline asm constraints, as follows:

diff --git a/arch/arm/include/asm/percpu.h b/arch/arm/include/asm/percpu.h
index 968c0a14e0..52f9ca33e7 100644
--- a/arch/arm/include/asm/percpu.h
+++ b/arch/arm/include/asm/percpu.h
@@ -31,7 +31,9 @@ static inline unsigned long __my_cpu_offset(void)
 {
 	unsigned long off;
 	/* Read TPIDRPRW */
-	asm("mrc p15, 0, %0, c13, c0, 4" : "=r" (off) : : "memory");
+	asm("mrc p15, 0, %0, c13, c0, 4"
+	    : "=r" (off)
+	    : "m" (current_thread_info()->preempt_count));
 	return off;
 }
 #define __my_cpu_offset __my_cpu_offset()

The problem with this approach is that current_thread_info() is not a 
static location either, as it is always derived from the stack pointer, 
so its address is needlessly computed even when nothing else in the 
same function uses it.
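[Editor's note: the cost comes from how current_thread_info() is located. A portable sketch of the idea, under the assumptions that stacks are 8 KiB and that __builtin_frame_address(0) stands in for the ARM sp register read the kernel actually uses:]

```c
#include <stdint.h>

#define THREAD_SIZE 8192UL	/* kernel stack size; 8 KiB assumed here */

/* thread_info lives at the base of the current kernel stack, so its
 * address is recomputed by masking the stack pointer on every use --
 * which is why an "m" (...->preempt_count) constraint makes GCC emit
 * extra address-generation instructions. */
static inline uintptr_t thread_info_base(void)
{
	uintptr_t sp = (uintptr_t)__builtin_frame_address(0);
	return sp & ~(THREAD_SIZE - 1);
}
```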


Nicolas


* [PATCH] ARM: pcpu: ensure __my_cpu_offset cannot be re-ordered across barrier()
  2013-06-03 19:33 ` Nicolas Pitre
@ 2013-06-04 10:39   ` Will Deacon
  2013-06-04 17:15     ` Will Deacon
  0 siblings, 1 reply; 6+ messages in thread
From: Will Deacon @ 2013-06-04 10:39 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Nico,

Thanks for helping out with this.

On Mon, Jun 03, 2013 at 08:33:24PM +0100, Nicolas Pitre wrote:
> On Mon, 3 Jun 2013, Will Deacon wrote:
> >  static inline unsigned long __my_cpu_offset(void)
> >  {
> >  	unsigned long off;
> > -	/* Read TPIDRPRW */
> > -	asm("mrc p15, 0, %0, c13, c0, 4" : "=r" (off) : : "memory");
> > +	/*
> > +	 * Read TPIDRPRW.
> > +	 * We want to allow caching the value, so avoid using volatile and
> > +	 * instead use a fake memory access to hazard against barrier().
> > +	 */
> > +	asm("mrc p15, 0, %0, c13, c0, 4" : "=r" (off), "=Qo" (off));
> >  	return off;
> >  }
> 
> This doesn't work with the little test I wrote to see the compiler 
> behavior change.  Here's my test in case it is flawed:

[...]

> int foo(int *a, int *b, char *c)
> {
> 	int x, y, z;
> 
> 	x = a[__my_cpu_offset()];
> 	barrier();
> 	y = b[__my_cpu_offset()];
> 	z = c[__my_cpu_offset()];
> 	barrier();
> 	return  x + y + z + a[__my_cpu_offset()];
> }
> 
> With the original memory clobber (commented out above) the asm output 
> is:
> 
> foo:
>         mrc     p15, 0, r3, c13, c0, 4
>         ldr     ip, [r0, r3, asl #2]
>         ldr     r1, [r1, r3, asl #2]
>         ldrb    r2, [r2, r3]    @ zero_extendqisi2
>         ldr     r3, [r0, r3, asl #2]
>         add     r0, ip, r1
>         add     r0, r0, r2
>         add     r0, r0, r3
>         bx      lr
> 
> So this confirms that the memory barrier is not respected in any way.
> 
> With your change we get this instead:
> 
> foo:
>         sub     sp, sp, #8
>         mrc     p15, 0, r3, c13, c0, 4
>         str     r3, [sp, #4]
>         ldr     ip, [r0, r3, asl #2]
>         ldr     r1, [r1, r3, asl #2]
>         str     r3, [sp, #4]
>         ldrb    r2, [r2, r3]    @ zero_extendqisi2
>         ldr     r3, [r0, r3, asl #2]
>         add     r0, ip, r1
>         add     r0, r0, r2
>         add     r0, r0, r3
>         add     sp, sp, #8
>         bx      lr

Ah crap, that sucks. Thanks for testing it -- I've just been looking at slub
with my toolchain, but your test case also falls to bits here.

> So not only is the barrier completely ineffective at forcing a reload of the 
> CPU offset, but it also introduces a useless store onto the stack.  Same behavior 
> with gcc versions 4.7.3, 4.6.2 and 4.5.1 (yes, that's really what I have 
> on my system -- waiting for 4.8.4 now).

Just a heads up: I've had nothing but problems with 4.8-based toolchains
(including those provided by Linaro). Seriously, I had the thing ICE on 5 of
the first 7 files building the kernel!

> So if the preemption count is really what this should be protected 
> against, we could simply add it as an input dependency in the 
> inline asm constraints, as follows:
> 
> diff --git a/arch/arm/include/asm/percpu.h b/arch/arm/include/asm/percpu.h
> index 968c0a14e0..52f9ca33e7 100644
> --- a/arch/arm/include/asm/percpu.h
> +++ b/arch/arm/include/asm/percpu.h
> @@ -31,7 +31,9 @@ static inline unsigned long __my_cpu_offset(void)
>  {
>  	unsigned long off;
>  	/* Read TPIDRPRW */
> -	asm("mrc p15, 0, %0, c13, c0, 4" : "=r" (off) : : "memory");
> +	asm("mrc p15, 0, %0, c13, c0, 4"
> +	    : "=r" (off)
> +	    : "m" (current_thread_info()->preempt_count));
>  	return off;
>  }
>  #define __my_cpu_offset __my_cpu_offset()
> 
> The problem with this approach is that current_thread_info() is not a 
> static location either, as it is always derived from the stack pointer, 
> so its address is needlessly computed even when nothing else in the 
> same function uses it.

Yes, that's going to generate fairly horrible code :(

Actually, I just had a play around and managed to come up with something a
bit better. It seems as though passing the memory location as an *input*
operand is what GCC really wants, but this sucks because you end up
generating prologue code to set up an addressing mode which you never
use. For example, you can pretend you have a dependency on *(unsigned
long *)0 and your test case becomes:

c001cd48:       e52d4004        push    {r4}            ; (str r4, [sp, #-4]!)
c001cd4c:       e3a03000        mov     r3, #0
c001cd50:       ee1dcf90        mrc     15, 0, ip, cr13, cr0, {4}
c001cd54:       e790410c        ldr     r4, [r0, ip, lsl #2]
c001cd58:       ee1dcf90        mrc     15, 0, ip, cr13, cr0, {4}
c001cd5c:       e791110c        ldr     r1, [r1, ip, lsl #2]
c001cd60:       e792210c        ldr     r2, [r2, ip, lsl #2]
c001cd64:       ee1d3f90        mrc     15, 0, r3, cr13, cr0, {4}
c001cd68:       e7903103        ldr     r3, [r0, r3, lsl #2]
c001cd6c:       e0840001        add     r0, r4, r1
c001cd70:       e0800002        add     r0, r0, r2
c001cd74:       e0800003        add     r0, r0, r3
c001cd78:       e8bd0010        pop     {r4}
c001cd7c:       e12fff1e        bx      lr

However, if you instead add a dependency on something that already lives in a
register, like __builtin_return_address, you get:

c001cd44:       e92d4010        push    {r4, lr}
c001cd48:       ee1dcf90        mrc     15, 0, ip, cr13, cr0, {4}
c001cd4c:       e790410c        ldr     r4, [r0, ip, lsl #2]
c001cd50:       ee1dcf90        mrc     15, 0, ip, cr13, cr0, {4}
c001cd54:       e791110c        ldr     r1, [r1, ip, lsl #2]
c001cd58:       e792210c        ldr     r2, [r2, ip, lsl #2]
c001cd5c:       ee1d3f90        mrc     15, 0, r3, cr13, cr0, {4}
c001cd60:       e7903103        ldr     r3, [r0, r3, lsl #2]
c001cd64:       e0840001        add     r0, r4, r1
c001cd68:       e0800002        add     r0, r0, r2
c001cd6c:       e0800003        add     r0, r0, r3
c001cd70:       e8bd8010        pop     {r4, pc}

Unfortunately, that builtin has clobber semantics on lr (no idea why) so we
push an extra register. I'm going to have a chat with our tools guys this
afternoon to see if there's anything better we can do, but for the time
being my diff is below.

Cheers,

Will

--->8

diff --git a/arch/arm/include/asm/percpu.h b/arch/arm/include/asm/percpu.h
index 968c0a1..6780990 100644
--- a/arch/arm/include/asm/percpu.h
+++ b/arch/arm/include/asm/percpu.h
@@ -30,8 +30,16 @@ static inline void set_my_cpu_offset(unsigned long off)
 static inline unsigned long __my_cpu_offset(void)
 {
        unsigned long off;
-       /* Read TPIDRPRW */
-       asm("mrc p15, 0, %0, c13, c0, 4" : "=r" (off) : : "memory");
+
+       /*
+        * Read TPIDRPRW.
+        * We want to allow caching the value, so avoid using volatile and
+        * instead use a fake memory access to hazard against barrier().
+        */
+       asm("mrc p15, 0, %0, c13, c0, 4"
+           : "=r" (off)
+           : "Q" (*(unsigned long *)__builtin_return_address(0)));
+
        return off;
 }
 #define __my_cpu_offset __my_cpu_offset()


* [PATCH] ARM: pcpu: ensure __my_cpu_offset cannot be re-ordered across barrier()
  2013-06-04 10:39   ` Will Deacon
@ 2013-06-04 17:15     ` Will Deacon
  2013-06-04 17:39       ` Nicolas Pitre
  0 siblings, 1 reply; 6+ messages in thread
From: Will Deacon @ 2013-06-04 17:15 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Jun 04, 2013 at 11:39:53AM +0100, Will Deacon wrote:
> However, if you instead add a dependency on something that already lives in a
> register, like __builtin_return_address, you get:
> 
> c001cd44:       e92d4010        push    {r4, lr}
> c001cd48:       ee1dcf90        mrc     15, 0, ip, cr13, cr0, {4}
> c001cd4c:       e790410c        ldr     r4, [r0, ip, lsl #2]
> c001cd50:       ee1dcf90        mrc     15, 0, ip, cr13, cr0, {4}
> c001cd54:       e791110c        ldr     r1, [r1, ip, lsl #2]
> c001cd58:       e792210c        ldr     r2, [r2, ip, lsl #2]
> c001cd5c:       ee1d3f90        mrc     15, 0, r3, cr13, cr0, {4}
> c001cd60:       e7903103        ldr     r3, [r0, r3, lsl #2]
> c001cd64:       e0840001        add     r0, r4, r1
> c001cd68:       e0800002        add     r0, r0, r2
> c001cd6c:       e0800003        add     r0, r0, r3
> c001cd70:       e8bd8010        pop     {r4, pc}

Actually, I've managed to improve this further using sp instead (see diff
below), which also prevents GCC from moving the lr into other registers for
no reason. The code generated for your test is:

c001cda4:       ee1d3f90        mrc     15, 0, r3, cr13, cr0, {4}
c001cda8:       e790c103        ldr     ip, [r0, r3, lsl #2]
c001cdac:       ee1d3f90        mrc     15, 0, r3, cr13, cr0, {4}
c001cdb0:       e7911103        ldr     r1, [r1, r3, lsl #2]
c001cdb4:       e7922103        ldr     r2, [r2, r3, lsl #2]
c001cdb8:       ee1d3f90        mrc     15, 0, r3, cr13, cr0, {4}
c001cdbc:       e7903103        ldr     r3, [r0, r3, lsl #2]
c001cdc0:       e08c0001        add     r0, ip, r1
c001cdc4:       e0800002        add     r0, r0, r2
c001cdc8:       e0800003        add     r0, r0, r3
c001cdcc:       e12fff1e        bx      lr

I spoke to the tools guys and they'll look into why the original patch
didn't work, since that could have wider implications on our use of
barrier().

Will

--->8

diff --git a/arch/arm/include/asm/percpu.h b/arch/arm/include/asm/percpu.h
index 968c0a1..209e650 100644
--- a/arch/arm/include/asm/percpu.h
+++ b/arch/arm/include/asm/percpu.h
@@ -30,8 +30,15 @@ static inline void set_my_cpu_offset(unsigned long off)
 static inline unsigned long __my_cpu_offset(void)
 {
        unsigned long off;
-       /* Read TPIDRPRW */
-       asm("mrc p15, 0, %0, c13, c0, 4" : "=r" (off) : : "memory");
+       register unsigned long *sp asm ("sp");
+
+       /*
+        * Read TPIDRPRW.
+        * We want to allow caching the value, so avoid using volatile and
+        * instead use a fake stack read to hazard against barrier().
+        */
+       asm("mrc p15, 0, %0, c13, c0, 4" : "=r" (off) : "Q" (*sp));
+
        return off;
 }
 #define __my_cpu_offset __my_cpu_offset()


* [PATCH] ARM: pcpu: ensure __my_cpu_offset cannot be re-ordered across barrier()
  2013-06-04 17:15     ` Will Deacon
@ 2013-06-04 17:39       ` Nicolas Pitre
  2013-06-05  9:59         ` Will Deacon
  0 siblings, 1 reply; 6+ messages in thread
From: Nicolas Pitre @ 2013-06-04 17:39 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, 4 Jun 2013, Will Deacon wrote:

> On Tue, Jun 04, 2013 at 11:39:53AM +0100, Will Deacon wrote:
> > However, if you instead add a dependency on something that already lives in a
> > register, like __builtin_return_address, you get:
> > 
> > c001cd44:       e92d4010        push    {r4, lr}
> > c001cd48:       ee1dcf90        mrc     15, 0, ip, cr13, cr0, {4}
> > c001cd4c:       e790410c        ldr     r4, [r0, ip, lsl #2]
> > c001cd50:       ee1dcf90        mrc     15, 0, ip, cr13, cr0, {4}
> > c001cd54:       e791110c        ldr     r1, [r1, ip, lsl #2]
> > c001cd58:       e792210c        ldr     r2, [r2, ip, lsl #2]
> > c001cd5c:       ee1d3f90        mrc     15, 0, r3, cr13, cr0, {4}
> > c001cd60:       e7903103        ldr     r3, [r0, r3, lsl #2]
> > c001cd64:       e0840001        add     r0, r4, r1
> > c001cd68:       e0800002        add     r0, r0, r2
> > c001cd6c:       e0800003        add     r0, r0, r3
> > c001cd70:       e8bd8010        pop     {r4, pc}
> 
> Actually, I've managed to improve this further using sp instead (see diff
> below), which also prevents GCC from moving the lr into other registers for
> no reason. The code generated for your test is:
> 
> c001cda4:       ee1d3f90        mrc     15, 0, r3, cr13, cr0, {4}
> c001cda8:       e790c103        ldr     ip, [r0, r3, lsl #2]
> c001cdac:       ee1d3f90        mrc     15, 0, r3, cr13, cr0, {4}
> c001cdb0:       e7911103        ldr     r1, [r1, r3, lsl #2]
> c001cdb4:       e7922103        ldr     r2, [r2, r3, lsl #2]
> c001cdb8:       ee1d3f90        mrc     15, 0, r3, cr13, cr0, {4}
> c001cdbc:       e7903103        ldr     r3, [r0, r3, lsl #2]
> c001cdc0:       e08c0001        add     r0, ip, r1
> c001cdc4:       e0800002        add     r0, r0, r2
> c001cdc8:       e0800003        add     r0, r0, r3
> c001cdcc:       e12fff1e        bx      lr

Excellent.

> I spoke to the tools guys and they'll look into why the original patch
> didn't work, since that could have wider implications on our use of
> barrier().

Good.  However, given the wide range of gcc versions exhibiting the same 
behavior, I think your patch is the best way forward even if this turns 
out to be a gcc bug that is eventually fixed.

Reviewed-by: Nicolas Pitre <nico@linaro.org>

I'd suggest queuing it for stable as well.

> --->8
> 
> diff --git a/arch/arm/include/asm/percpu.h b/arch/arm/include/asm/percpu.h
> index 968c0a1..209e650 100644
> --- a/arch/arm/include/asm/percpu.h
> +++ b/arch/arm/include/asm/percpu.h
> @@ -30,8 +30,15 @@ static inline void set_my_cpu_offset(unsigned long off)
>  static inline unsigned long __my_cpu_offset(void)
>  {
>         unsigned long off;
> -       /* Read TPIDRPRW */
> -       asm("mrc p15, 0, %0, c13, c0, 4" : "=r" (off) : : "memory");
> +       register unsigned long *sp asm ("sp");
> +
> +       /*
> +        * Read TPIDRPRW.
> +        * We want to allow caching the value, so avoid using volatile and
> +        * instead use a fake stack read to hazard against barrier().
> +        */
> +       asm("mrc p15, 0, %0, c13, c0, 4" : "=r" (off) : "Q" (*sp));
> +
>         return off;
>  }
>  #define __my_cpu_offset __my_cpu_offset()
> 


* [PATCH] ARM: pcpu: ensure __my_cpu_offset cannot be re-ordered across barrier()
  2013-06-04 17:39       ` Nicolas Pitre
@ 2013-06-05  9:59         ` Will Deacon
  0 siblings, 0 replies; 6+ messages in thread
From: Will Deacon @ 2013-06-05  9:59 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Jun 04, 2013 at 06:39:42PM +0100, Nicolas Pitre wrote:
> On Tue, 4 Jun 2013, Will Deacon wrote:
> > I spoke to the tools guys and they'll look into why the original patch
> > didn't work, since that could have wider implications on our use of
> > barrier().
> 
> Good.  However, given the wide range of gcc versions exhibiting the same 
> behavior, I think your patch is the best way forward even if this turns 
> out to be a gcc bug that is eventually fixed.
> 
> Reviewed-by: Nicolas Pitre <nico@linaro.org>

Thanks Nicolas.

> I'd suggest queuing it for stable as well.

Will do.

Will

