[PATCH 1/24] make atomic_read() behave consistently on alpha

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH 1/24] make atomic_read() behave consistently on alpha
@ 2007-08-09 13:24 Chris Snook
  2007-08-09 14:32 ` Paul E. McKenney
  0 siblings, 1 reply; 36+ messages in thread
From: Chris Snook @ 2007-08-09 13:24 UTC (permalink / raw)
  To: linux-kernel, linux-arch, torvalds
  Cc: netdev, akpm, ak, heiko.carstens, davem, schwidefsky, wensong,
	horms, wjiang, cfriesen, zlynx, rpjday, jesper.juhl

From: Chris Snook <csnook@redhat.com>

Purify volatile use for atomic[64]_t on alpha.

Signed-off-by: Chris Snook <csnook@redhat.com>

--- linux-2.6.23-rc2-orig/include/asm-alpha/atomic.h	2007-07-08 19:32:17.000000000 -0400
+++ linux-2.6.23-rc2/include/asm-alpha/atomic.h	2007-08-09 09:19:00.000000000 -0400
@@ -14,18 +14,22 @@
 
 
 /*
- * Counter is volatile to make sure gcc doesn't try to be clever
- * and move things around on us. We need to use _exactly_ the address
- * the user gave us, not some alias that contains the same information.
+ * Make sure gcc doesn't try to be clever and move things around
+ * on us. We need to use _exactly_ the address the user gave us,
+ * not some alias that contains the same information.
  */
-typedef struct { volatile int counter; } atomic_t;
-typedef struct { volatile long counter; } atomic64_t;
+typedef struct { int counter; } atomic_t;
+typedef struct { long counter; } atomic64_t;
 
 #define ATOMIC_INIT(i)		( (atomic_t) { (i) } )
 #define ATOMIC64_INIT(i)	( (atomic64_t) { (i) } )
 
-#define atomic_read(v)		((v)->counter + 0)
-#define atomic64_read(v)	((v)->counter + 0)
+/*
+ * Casting to volatile here minimizes the need for barriers,
+ * without having to declare the type itself as volatile.
+ */
+#define atomic_read(v)		(*(volatile int *)&(v)->counter + 0)
+#define atomic64_read(v)	(*(volatile long *)&(v)->counter + 0)
 
 #define atomic_set(v,i)		((v)->counter = (i))
 #define atomic64_set(v,i)	((v)->counter = (i))

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/24] make atomic_read() behave consistently on alpha
  2007-08-09 13:24 [PATCH 1/24] make atomic_read() behave consistently on alpha Chris Snook
@ 2007-08-09 14:32 ` Paul E. McKenney
  2007-08-09 14:53   ` Chris Snook
  0 siblings, 1 reply; 36+ messages in thread
From: Paul E. McKenney @ 2007-08-09 14:32 UTC (permalink / raw)
  To: Chris Snook
  Cc: linux-kernel, linux-arch, torvalds, netdev, akpm, ak,
	heiko.carstens, davem, schwidefsky, wensong, horms, wjiang,
	cfriesen, zlynx, rpjday, jesper.juhl

On Thu, Aug 09, 2007 at 09:24:42AM -0400, Chris Snook wrote:
> From: Chris Snook <csnook@redhat.com>
> 
> Purify volatile use for atomic[64]_t on alpha.

Why not the same access-once semantics for atomic_set() as
for atomic_read()?  As this patch stands, it might introduce
architecture-specific compiler-induced bugs due to the fact that
atomic_set() used to imply volatile behavior but no longer does.

See below for one approach to fixing this.

> Signed-off-by: Chris Snook <csnook@redhat.com>
> 
> --- linux-2.6.23-rc2-orig/include/asm-alpha/atomic.h	2007-07-08 19:32:17.000000000 -0400
> +++ linux-2.6.23-rc2/include/asm-alpha/atomic.h	2007-08-09 09:19:00.000000000 -0400
> @@ -14,18 +14,22 @@
> 
> 
>  /*
> - * Counter is volatile to make sure gcc doesn't try to be clever
> - * and move things around on us. We need to use _exactly_ the address
> - * the user gave us, not some alias that contains the same information.
> + * Make sure gcc doesn't try to be clever and move things around
> + * on us. We need to use _exactly_ the address the user gave us,
> + * not some alias that contains the same information.
>   */
> -typedef struct { volatile int counter; } atomic_t;
> -typedef struct { volatile long counter; } atomic64_t;
> +typedef struct { int counter; } atomic_t;
> +typedef struct { long counter; } atomic64_t;
> 
>  #define ATOMIC_INIT(i)		( (atomic_t) { (i) } )
>  #define ATOMIC64_INIT(i)	( (atomic64_t) { (i) } )
> 
> -#define atomic_read(v)		((v)->counter + 0)
> -#define atomic64_read(v)	((v)->counter + 0)
> +/*
> + * Casting to volatile here minimizes the need for barriers,
> + * without having to declare the type itself as volatile.
> + */
> +#define atomic_read(v)		(*(volatile int *)&(v)->counter + 0)
> +#define atomic64_read(v)	(*(volatile long *)&(v)->counter + 0)
> 
>  #define atomic_set(v,i)		((v)->counter = (i))
>  #define atomic64_set(v,i)	((v)->counter = (i))

How about:

#define atomic_set(v,i)		((*(volatile int *)&(v)->counter) = (i))
#define atomic64_set(v,i)	((*(volatile long *)&(v)->counter) = (i))

						Thanx, Paul

> -
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/24] make atomic_read() behave consistently on alpha
  2007-08-09 14:32 ` Paul E. McKenney
@ 2007-08-09 14:53   ` Chris Snook
  2007-08-09 15:04     ` Paul E. McKenney
  0 siblings, 1 reply; 36+ messages in thread
From: Chris Snook @ 2007-08-09 14:53 UTC (permalink / raw)
  To: paulmck
  Cc: linux-kernel, linux-arch, torvalds, netdev, akpm, ak,
	heiko.carstens, davem, schwidefsky, wensong, horms, wjiang,
	cfriesen, zlynx, rpjday, jesper.juhl

Paul E. McKenney wrote:
> Why not the same access-once semantics for atomic_set() as
> for atomic_read()?  As this patch stands, it might introduce
> architecture-specific compiler-induced bugs due to the fact that
> atomic_set() used to imply volatile behavior but no longer does.

When we make the volatile cast in atomic_read(), we're casting an rvalue to 
volatile.  This unambiguously tells the compiler that we want to re-load that 
register from memory.  What's "volatile behavior" for an lvalue?  A write to an 
lvalue already implies an eventual write to memory, so this would be a no-op. 
Maybe you'll write to the register a few times before flushing it to memory, but 
it will happen eventually.  With an rvalue, there's no guarantee that it will 
*ever* load from memory, which is what volatile fixes.

I think what you have in mind is LOCK_PREFIX behavior, which is not the purpose 
of atomic_set.  We use LOCK_PREFIX in the inline assembly for the atomic_* 
operations that read, modify, and write a value, only because it is necessary to 
perform that entire transaction atomically.

	-- Chris

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/24] make atomic_read() behave consistently on alpha
  2007-08-09 14:53   ` Chris Snook
@ 2007-08-09 15:04     ` Paul E. McKenney
  2007-08-09 15:24       ` Chris Snook
  0 siblings, 1 reply; 36+ messages in thread
From: Paul E. McKenney @ 2007-08-09 15:04 UTC (permalink / raw)
  To: Chris Snook
  Cc: linux-kernel, linux-arch, torvalds, netdev, akpm, ak,
	heiko.carstens, davem, schwidefsky, wensong, horms, wjiang,
	cfriesen, zlynx, rpjday, jesper.juhl

On Thu, Aug 09, 2007 at 10:53:14AM -0400, Chris Snook wrote:
> Paul E. McKenney wrote:
> >Why not the same access-once semantics for atomic_set() as
> >for atomic_read()?  As this patch stands, it might introduce
> >architecture-specific compiler-induced bugs due to the fact that
> >atomic_set() used to imply volatile behavior but no longer does.
> 
> When we make the volatile cast in atomic_read(), we're casting an rvalue to 
> volatile.  This unambiguously tells the compiler that we want to re-load 
> that register from memory.  What's "volatile behavior" for an lvalue?

I was absolutely -not- suggesting volatile behavior for lvalues.

Instead, I am asking for volatile behavior from an -rvalue-.  In the
case of atomic_read(), it is the atomic_t being read from.  In the case
of atomic_set(), it is the atomic_t being written to.  As suggested in
my previous email:

#define atomic_set(v,i)		((*(volatile int *)&(v)->counter) = (i))
#define atomic64_set(v,i)	((*(volatile long *)&(v)->counter) = (i))

Again, the architectures that used to have their "counter" declared
as volatile will lose volatile semantics on atomic_set() with your
patch, which might result in bugs due to overly imaginative compiler
optimizations.  The above would prevent any such bugs from appearing.

>                                                                       A 
> write to an lvalue already implies an eventual write to memory, so this 
> would be a no-op. Maybe you'll write to the register a few times before 
> flushing it to memory, but it will happen eventually.  With an rvalue, 
> there's no guarantee that it will *ever* load from memory, which is what 
> volatile fixes.
> 
> I think what you have in mind is LOCK_PREFIX behavior, which is not the 
> purpose of atomic_set.  We use LOCK_PREFIX in the inline assembly for the 
> atomic_* operations that read, modify, and write a value, only because it 
> is necessary to perform that entire transaction atomically.

No LOCK_PREFIX, thank you!!!  I just want to make sure that the compiler
doesn't push the store down out of a loop, split the store, allow the
store to happen twice (e.g., to allow different code paths to be merged),
and all the other tricks that the C standard permits compilers to pull.

						Thanx, Paul

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/24] make atomic_read() behave consistently on alpha
  2007-08-09 15:04     ` Paul E. McKenney
@ 2007-08-09 15:24       ` Chris Snook
  2007-08-09 15:50         ` Segher Boessenkool
  2007-08-09 16:10         ` Paul E. McKenney
  0 siblings, 2 replies; 36+ messages in thread
From: Chris Snook @ 2007-08-09 15:24 UTC (permalink / raw)
  To: paulmck
  Cc: linux-kernel, linux-arch, torvalds, netdev, akpm, ak,
	heiko.carstens, davem, schwidefsky, wensong, horms, wjiang,
	cfriesen, zlynx, rpjday, jesper.juhl

Paul E. McKenney wrote:
> On Thu, Aug 09, 2007 at 10:53:14AM -0400, Chris Snook wrote:
>> Paul E. McKenney wrote:
>>> Why not the same access-once semantics for atomic_set() as
>>> for atomic_read()?  As this patch stands, it might introduce
>>> architecture-specific compiler-induced bugs due to the fact that
>>> atomic_set() used to imply volatile behavior but no longer does.
>> When we make the volatile cast in atomic_read(), we're casting an rvalue to 
>> volatile.  This unambiguously tells the compiler that we want to re-load 
>> that register from memory.  What's "volatile behavior" for an lvalue?
> 
> I was absolutely -not- suggesting volatile behavior for lvalues.
> 
> Instead, I am asking for volatile behavior from an -rvalue-.  In the
> case of atomic_read(), it is the atomic_t being read from.  In the case
> of atomic_set(), it is the atomic_t being written to.  As suggested in
> my previous email:
> 
> #define atomic_set(v,i)		((*(volatile int *)&(v)->counter) = (i))
> #define atomic64_set(v,i)	((*(volatile long *)&(v)->counter) = (i))

That looks like a volatile lvalue to me.  I confess I didn't exactly ace 
compilers.  Care to explain this?

> Again, the architectures that used to have their "counter" declared
> as volatile will lose volatile semantics on atomic_set() with your
> patch, which might result in bugs due to overly imaginative compiler
> optimizations.  The above would prevent any such bugs from appearing.
> 
>>                                                                       A 
>> write to an lvalue already implies an eventual write to memory, so this 
>> would be a no-op. Maybe you'll write to the register a few times before 
>> flushing it to memory, but it will happen eventually.  With an rvalue, 
>> there's no guarantee that it will *ever* load from memory, which is what 
>> volatile fixes.
>>
>> I think what you have in mind is LOCK_PREFIX behavior, which is not the 
>> purpose of atomic_set.  We use LOCK_PREFIX in the inline assembly for the 
>> atomic_* operations that read, modify, and write a value, only because it 
>> is necessary to perform that entire transaction atomically.
> 
> No LOCK_PREFIX, thank you!!!  I just want to make sure that the compiler
> doesn't push the store down out of a loop, split the store, allow the
> store to happen twice (e.g., to allow different code paths to be merged),
> and all the other tricks that the C standard permits compilers to pull.

We can't have split stores because we don't use atomic64_t on 32-bit 
architectures.  volatile won't save you from pushing stores out of loops unless 
you're also doing reads.  This is why we use reads to flush writes to mmio 
registers.  In this case, an atomic_read() with volatile in it will suffice. 
Storing twice is perfectly legal here, though it's unlikely that an optimizing 
compiler smart enough to create the problem we're addressing would ever do that.

	-- Chris

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/24] make atomic_read() behave consistently on alpha
  2007-08-09 15:24       ` Chris Snook
@ 2007-08-09 15:50         ` Segher Boessenkool
  2007-08-09 16:20           ` Chris Snook
  2007-08-09 16:10         ` Paul E. McKenney
  1 sibling, 1 reply; 36+ messages in thread
From: Segher Boessenkool @ 2007-08-09 15:50 UTC (permalink / raw)
  To: Chris Snook
  Cc: wjiang, rpjday, wensong, heiko.carstens, linux-kernel, ak, netdev,
	paulmck, horms, akpm, linux-arch, jesper.juhl, torvalds,
	schwidefsky, davem, cfriesen, zlynx

> We can't have split stores because we don't use atomic64_t on 32-bit 
> architectures.

That's not true; the compiler is free to split all stores
(and reads) from memory however it wants.  It is debatable
whether "volatile" would prevent this as well, certainly
it is unsafe if you want to be portable.  GCC will do its
best to not split volatile memory accesses, but bugs in
this area do happen a lot (because the compiler code for
volatile isn't as well exercised as most other compiler
code, and because it is simply a hard subject; and there
is no real formalised model for what GCC should do).

The only safe way to get atomic accesses is to write
assembler code.  Are there any downsides to that?  I don't
see any.

Segher

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/24] make atomic_read() behave consistently on alpha
  2007-08-09 15:50         ` Segher Boessenkool
@ 2007-08-09 16:20           ` Chris Snook
  2007-08-09 18:38             ` Segher Boessenkool
  0 siblings, 1 reply; 36+ messages in thread
From: Chris Snook @ 2007-08-09 16:20 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: wjiang, rpjday, wensong, heiko.carstens, linux-kernel, ak, netdev,
	paulmck, horms, akpm, linux-arch, jesper.juhl, torvalds,
	schwidefsky, davem, cfriesen, zlynx

Segher Boessenkool wrote:
>> We can't have split stores because we don't use atomic64_t on 32-bit 
>> architectures.
> 
> That's not true; the compiler is free to split all stores
> (and reads) from memory however it wants.  It is debatable
> whether "volatile" would prevent this as well, certainly
> it is unsafe if you want to be portable.  GCC will do its
> best to not split volatile memory accesses, but bugs in
> this area do happen a lot (because the compiler code for
> volatile isn't as well exercised as most other compiler
> code, and because it is simply a hard subject; and there
> is no real formalised model for what GCC should do).
> 
> The only safe way to get atomic accesses is to write
> assembler code.  Are there any downsides to that?  I don't
> see any.

The assumption that aligned word reads and writes are atomic, and that words are 
aligned unless explicitly packed otherwise, is endemic in the kernel.  No sane 
compiler violates this assumption.  It's true that we're not portable to insane 
compilers after this patch, but we never were in the first place.

	-- Chris

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/24] make atomic_read() behave consistently on alpha
  2007-08-09 16:20           ` Chris Snook
@ 2007-08-09 18:38             ` Segher Boessenkool
  2007-08-09 19:05               ` Chris Snook
  0 siblings, 1 reply; 36+ messages in thread
From: Segher Boessenkool @ 2007-08-09 18:38 UTC (permalink / raw)
  To: Chris Snook
  Cc: wjiang, cfriesen, wensong, heiko.carstens, linux-kernel, ak,
	netdev, paulmck, horms, akpm, linux-arch, jesper.juhl, torvalds,
	zlynx, rpjday, schwidefsky, davem

>> The only safe way to get atomic accesses is to write
>> assembler code.  Are there any downsides to that?  I don't
>> see any.
>
> The assumption that aligned word reads and writes are atomic, and that 
> words are aligned unless explicitly packed otherwise, is endemic in 
> the kernel.  No sane compiler violates this assumption.  It's true 
> that we're not portable to insane compilers after this patch, but we 
> never were in the first place.

You didn't answer my question: are there any downsides to using
explicit coded-in-assembler accesses for atomic accesses?  You
can handwave all you want that it should "just work" with
volatile accesses, but volatility != atomicity, volatile in C
is really badly defined, GCC never officially gave stronger
guarantees, and we have a bugzilla full of PRs to show what a
minefield it is.

So, why not use the well-defined alternative?


Segher


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/24] make atomic_read() behave consistently on alpha
  2007-08-09 18:38             ` Segher Boessenkool
@ 2007-08-09 19:05               ` Chris Snook
  2007-08-09 19:19                 ` Segher Boessenkool
  2007-08-09 19:25                 ` Geert Uytterhoeven
  0 siblings, 2 replies; 36+ messages in thread
From: Chris Snook @ 2007-08-09 19:05 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: wjiang, cfriesen, wensong, heiko.carstens, linux-kernel, ak,
	netdev, paulmck, horms, akpm, linux-arch, jesper.juhl, torvalds,
	zlynx, rpjday, schwidefsky, davem

Segher Boessenkool wrote:
>>> The only safe way to get atomic accesses is to write
>>> assembler code.  Are there any downsides to that?  I don't
>>> see any.
>>
>> The assumption that aligned word reads and writes are atomic, and that 
>> words are aligned unless explicitly packed otherwise, is endemic in 
>> the kernel.  No sane compiler violates this assumption.  It's true 
>> that we're not portable to insane compilers after this patch, but we 
>> never were in the first place.
> 
> You didn't answer my question: are there any downsides to using
> explicit coded-in-assembler accesses for atomic accesses?  You
> can handwave all you want that it should "just work" with
> volatile accesses, but volatility != atomicity, volatile in C
> is really badly defined, GCC never officially gave stronger
> guarantees, and we have a bugzilla full of PRs to show what a
> minefield it is.
> 
> So, why not use the well-defined alternative?

Because we don't need to, and it hurts performance.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/24] make atomic_read() behave consistently on alpha
  2007-08-09 19:05               ` Chris Snook
@ 2007-08-09 19:19                 ` Segher Boessenkool
  2007-08-09 19:25                 ` Geert Uytterhoeven
  1 sibling, 0 replies; 36+ messages in thread
From: Segher Boessenkool @ 2007-08-09 19:19 UTC (permalink / raw)
  To: Chris Snook
  Cc: wjiang, schwidefsky, wensong, heiko.carstens, linux-kernel, ak,
	cfriesen, netdev, paulmck, horms, akpm, linux-arch, jesper.juhl,
	davem, torvalds, zlynx, rpjday

>>>> The only safe way to get atomic accesses is to write
>>>> assembler code.  Are there any downsides to that?  I don't
>>>> see any.
>>>
>>> The assumption that aligned word reads and writes are atomic, and 
>>> that words are aligned unless explicitly packed otherwise, is 
>>> endemic in the kernel.  No sane compiler violates this assumption.  
>>> It's true that we're not portable to insane compilers after this 
>>> patch, but we never were in the first place.
>> You didn't answer my question: are there any downsides to using
>> explicit coded-in-assembler accesses for atomic accesses?  You
>> can handwave all you want that it should "just work" with
>> volatile accesses, but volatility != atomicity, volatile in C
>> is really badly defined, GCC never officially gave stronger
>> guarantees, and we have a bugzilla full of PRs to show what a
>> minefield it is.
>> So, why not use the well-defined alternative?
>
> Because we don't need to,

You don't need to use volatile objects, or accesses through
valatile-cast pointers, either.

> and it hurts performance.

Please show how it does this -- one load is one load either way,
and one store is one store.


Segher


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/24] make atomic_read() behave consistently on alpha
  2007-08-09 19:05               ` Chris Snook
  2007-08-09 19:19                 ` Segher Boessenkool
@ 2007-08-09 19:25                 ` Geert Uytterhoeven
  2007-08-09 19:47                   ` Chris Snook
  1 sibling, 1 reply; 36+ messages in thread
From: Geert Uytterhoeven @ 2007-08-09 19:25 UTC (permalink / raw)
  To: Chris Snook
  Cc: Segher Boessenkool, wjiang, cfriesen, wensong, heiko.carstens,
	linux-kernel, ak, netdev, paulmck, horms, akpm, linux-arch,
	jesper.juhl, torvalds, zlynx, rpjday, schwidefsky, davem

On Thu, 9 Aug 2007, Chris Snook wrote:
> Segher Boessenkool wrote:
> > > > The only safe way to get atomic accesses is to write
> > > > assembler code.  Are there any downsides to that?  I don't
> > > > see any.
> > > 
> > > The assumption that aligned word reads and writes are atomic, and that
> > > words are aligned unless explicitly packed otherwise, is endemic in the
> > > kernel.  No sane compiler violates this assumption.  It's true that we're
> > > not portable to insane compilers after this patch, but we never were in
> > > the first place.
> > 
> > You didn't answer my question: are there any downsides to using
> > explicit coded-in-assembler accesses for atomic accesses?  You
> > can handwave all you want that it should "just work" with
> > volatile accesses, but volatility != atomicity, volatile in C
> > is really badly defined, GCC never officially gave stronger
> > guarantees, and we have a bugzilla full of PRs to show what a
> > minefield it is.
> > 
> > So, why not use the well-defined alternative?
> 
> Because we don't need to, and it hurts performance.

It hurts performance by implementing 32-bit atomic reads in assembler?

Gr{oetje,eeting}s,

						Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
							    -- Linus Torvalds

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/24] make atomic_read() behave consistently on alpha
  2007-08-09 19:25                 ` Geert Uytterhoeven
@ 2007-08-09 19:47                   ` Chris Snook
  2007-08-09 23:02                     ` Segher Boessenkool
  0 siblings, 1 reply; 36+ messages in thread
From: Chris Snook @ 2007-08-09 19:47 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Segher Boessenkool, wjiang, cfriesen, wensong, heiko.carstens,
	linux-kernel, ak, netdev, paulmck, horms, akpm, linux-arch,
	jesper.juhl, torvalds, zlynx, rpjday, schwidefsky, davem

Geert Uytterhoeven wrote:
> On Thu, 9 Aug 2007, Chris Snook wrote:
>> Segher Boessenkool wrote:
>>>>> The only safe way to get atomic accesses is to write
>>>>> assembler code.  Are there any downsides to that?  I don't
>>>>> see any.
>>>> The assumption that aligned word reads and writes are atomic, and that
>>>> words are aligned unless explicitly packed otherwise, is endemic in the
>>>> kernel.  No sane compiler violates this assumption.  It's true that we're
>>>> not portable to insane compilers after this patch, but we never were in
>>>> the first place.
>>> You didn't answer my question: are there any downsides to using
>>> explicit coded-in-assembler accesses for atomic accesses?  You
>>> can handwave all you want that it should "just work" with
>>> volatile accesses, but volatility != atomicity, volatile in C
>>> is really badly defined, GCC never officially gave stronger
>>> guarantees, and we have a bugzilla full of PRs to show what a
>>> minefield it is.
>>>
>>> So, why not use the well-defined alternative?
>> Because we don't need to, and it hurts performance.
> 
> It hurts performance by implementing 32-bit atomic reads in assembler?

No, I misunderstood the question.  Implementing 32-bit atomic reads in assembler 
is redundant, because any sane compiler, *particularly* and optimizing compiler 
(and we're only in this mess because of optimizing compilers) will give us that 
automatically without the assembler.  Yes, it is legal for a compiler to violate 
this assumption.  It is also legal for us to refuse to maintain compatibility 
with compilers that suck this badly.  That decision was made a very long time 
ago, and I consider it the correct decision.

	-- Chris

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/24] make atomic_read() behave consistently on alpha
  2007-08-09 19:47                   ` Chris Snook
@ 2007-08-09 23:02                     ` Segher Boessenkool
  0 siblings, 0 replies; 36+ messages in thread
From: Segher Boessenkool @ 2007-08-09 23:02 UTC (permalink / raw)
  To: Chris Snook
  Cc: paulmck, heiko.carstens, horms, linux-kernel, rpjday, ak, netdev,
	cfriesen, akpm, torvalds, jesper.juhl, Geert Uytterhoeven,
	linux-arch, zlynx, schwidefsky, davem, wensong, wjiang

>>>> So, why not use the well-defined alternative?
>>> Because we don't need to, and it hurts performance.
>> It hurts performance by implementing 32-bit atomic reads in assembler?
>
> No, I misunderstood the question.  Implementing 32-bit atomic reads in 
> assembler is redundant, because any sane compiler, *particularly* and 
> optimizing compiler (and we're only in this mess because of optimizing 
> compilers)

Oh please, don't tell me you don't want an optimising compiler.
And if you _do_ want one, well you're in this mess because you
chose C as implementation language and C has some pretty strange
rules.  Trying to use not-all-that-well-defined-and-completely-
misunderstood features of the language doesn't make things easier;
trying to use something that isn't even part of the language and
that your particular compiler originally supported by accident,
and that isn't yet an officially supported feature, and that on
top of it all has a track record of problems -- well it makes me
wonder if you're in this game for fun or what.

>  will give us that automatically without the assembler.

No, it does *not* give it to you automatically; you have to do
either the asm() thing, or the not-defined-at-all *(volatile *)&
thing.

> Yes, it is legal for a compiler to violate this assumption.  It is 
> also legal for us to refuse to maintain compatibility with compilers 
> that suck this badly.

So that's rm include/linux/compiler-gcc*.h then.  Good luck with
the intel compiler, maybe it works more to your liking.

Segher

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/24] make atomic_read() behave consistently on alpha
  2007-08-09 15:24       ` Chris Snook
  2007-08-09 15:50         ` Segher Boessenkool
@ 2007-08-09 16:10         ` Paul E. McKenney
  2007-08-09 16:36           ` Chris Snook
  2007-08-10  8:21           ` Herbert Xu
  1 sibling, 2 replies; 36+ messages in thread
From: Paul E. McKenney @ 2007-08-09 16:10 UTC (permalink / raw)
  To: Chris Snook
  Cc: linux-kernel, linux-arch, torvalds, netdev, akpm, ak,
	heiko.carstens, davem, schwidefsky, wensong, horms, wjiang,
	cfriesen, zlynx, rpjday, jesper.juhl

On Thu, Aug 09, 2007 at 11:24:22AM -0400, Chris Snook wrote:
> Paul E. McKenney wrote:
> >On Thu, Aug 09, 2007 at 10:53:14AM -0400, Chris Snook wrote:
> >>Paul E. McKenney wrote:
> >>>Why not the same access-once semantics for atomic_set() as
> >>>for atomic_read()?  As this patch stands, it might introduce
> >>>architecture-specific compiler-induced bugs due to the fact that
> >>>atomic_set() used to imply volatile behavior but no longer does.
> >>When we make the volatile cast in atomic_read(), we're casting an rvalue 
> >>to volatile.  This unambiguously tells the compiler that we want to 
> >>re-load that register from memory.  What's "volatile behavior" for an 
> >>lvalue?
> >
> >I was absolutely -not- suggesting volatile behavior for lvalues.
> >
> >Instead, I am asking for volatile behavior from an -rvalue-.  In the
> >case of atomic_read(), it is the atomic_t being read from.  In the case
> >of atomic_set(), it is the atomic_t being written to.  As suggested in
> >my previous email:
> >
> >#define atomic_set(v,i)		((*(volatile int *)&(v)->counter) = 
> >(i))
> >#define atomic64_set(v,i)	((*(volatile long *)&(v)->counter) = (i))
> 
> That looks like a volatile lvalue to me.  I confess I didn't exactly ace 
> compilers.  Care to explain this?

OK, so I am dylexic this morning.  Never could tell left from right...

This is indeed an lvalue, as is any non-function expression that you
can apply the "&" prefix operator to.  Including the expression in
your proposed definitions of atomic_set() and atomic_set64().

An lvalue is any expression that -could- appear on the left-hand side
of an assignment operator, regardless of where it actually appears.
More precisely, an lvalue is an expression that refers to a variable in
such a way that the variable might be both loaded from and stored to,
ignoring things like "const" for the moment.

Because "(v)->counter" could be either loaded from or stored to, it
is an lvalue, which is why the compiler didn't scream bloody murder
at you when you took the address of it.

> >Again, the architectures that used to have their "counter" declared
> >as volatile will lose volatile semantics on atomic_set() with your
> >patch, which might result in bugs due to overly imaginative compiler
> >optimizations.  The above would prevent any such bugs from appearing.
> >
> >>                                                                      A 
> >>write to an lvalue already implies an eventual write to memory, so this 
> >>would be a no-op. Maybe you'll write to the register a few times before 
> >>flushing it to memory, but it will happen eventually.  With an rvalue, 
> >>there's no guarantee that it will *ever* load from memory, which is what 
> >>volatile fixes.
> >>
> >>I think what you have in mind is LOCK_PREFIX behavior, which is not the 
> >>purpose of atomic_set.  We use LOCK_PREFIX in the inline assembly for the 
> >>atomic_* operations that read, modify, and write a value, only because it 
> >>is necessary to perform that entire transaction atomically.
> >
> >No LOCK_PREFIX, thank you!!!  I just want to make sure that the compiler
> >doesn't push the store down out of a loop, split the store, allow the
> >store to happen twice (e.g., to allow different code paths to be merged),
> >and all the other tricks that the C standard permits compilers to pull.
> 
> We can't have split stores because we don't use atomic64_t on 32-bit 
> architectures.  volatile won't save you from pushing stores out of loops 
> unless you're also doing reads.  This is why we use reads to flush writes 
> to mmio registers.  In this case, an atomic_read() with volatile in it will 
> suffice. Storing twice is perfectly legal here, though it's unlikely that 
> an optimizing compiler smart enough to create the problem we're addressing 
> would ever do that.

The compiler is within its rights to read a 32-bit quantity 16 bits at
at time, even on a 32-bit machine.  I would be glad to help pummel any
compiler writer that pulls such a dirty trick, but the C standard really
does permit this.

Use of volatile does in fact save you from the compiler pushing stores out
of loops regardless of whether you are also doing reads.  The C standard
has the notion of sequence points, which occur at various places including
the ends of statements and the control expressions for "if" and "while"
statements.  The compiler is not permitted to move volatile references
across a sequence point.  Therefore, the compiler is not allowed to
push a volatile store out of a loop.  Now the CPU might well do such a
reordering, but that is a separate issue to be dealt with via memory
barriers.  Note that it is the CPU and I/O system, not the compiler,
that is forcing you to use reads to flush writes to MMIO registers.

And you would be amazed at what compiler writers will do in order to
get an additional fraction of a percent out of SpecCPU...

In short, please retain atomic_set()'s volatility, especially on those
architectures that declared the atomic_t's counter to be volatile.

						Thanx, Paul

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/24] make atomic_read() behave consistently on alpha
  2007-08-09 16:10         ` Paul E. McKenney
@ 2007-08-09 16:36           ` Chris Snook
  2007-08-09 16:58             ` Paul E. McKenney
  2007-08-09 18:51             ` Segher Boessenkool
  2007-08-10  8:21           ` Herbert Xu
  1 sibling, 2 replies; 36+ messages in thread
From: Chris Snook @ 2007-08-09 16:36 UTC (permalink / raw)
  To: paulmck
  Cc: linux-kernel, linux-arch, torvalds, netdev, akpm, ak,
	heiko.carstens, davem, schwidefsky, wensong, horms, wjiang,
	cfriesen, zlynx, rpjday, jesper.juhl

Paul E. McKenney wrote:
> The compiler is within its rights to read a 32-bit quantity 16 bits at
> at time, even on a 32-bit machine.  I would be glad to help pummel any
> compiler writer that pulls such a dirty trick, but the C standard really
> does permit this.

Yes, but we don't write code for these compilers.  There are countless pieces of 
kernel code which would break in this condition, and there doesn't seem to be 
any interest in fixing this.

> Use of volatile does in fact save you from the compiler pushing stores out
> of loops regardless of whether you are also doing reads.  The C standard
> has the notion of sequence points, which occur at various places including
> the ends of statements and the control expressions for "if" and "while"
> statements.  The compiler is not permitted to move volatile references
> across a sequence point.  Therefore, the compiler is not allowed to
> push a volatile store out of a loop.  Now the CPU might well do such a
> reordering, but that is a separate issue to be dealt with via memory
> barriers.  Note that it is the CPU and I/O system, not the compiler,
> that is forcing you to use reads to flush writes to MMIO registers.

Sequence points enforce read-after-write ordering, not write-after-write.  We 
flush writes with reads for MMIO because of this effect as well as the CPU/bus 
effects.

> And you would be amazed at what compiler writers will do in order to
> get an additional fraction of a percent out of SpecCPU...

Probably not :)

> In short, please retain atomic_set()'s volatility, especially on those
> architectures that declared the atomic_t's counter to be volatile.

Like i386 and x86_64?  These used to have volatile in the atomic_t declaration. 
  We removed it, and the sky did not fall.

	-- Chris

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/24] make atomic_read() behave consistently on alpha
  2007-08-09 16:36           ` Chris Snook
@ 2007-08-09 16:58             ` Paul E. McKenney
  2007-08-09 17:14               ` Chris Snook
  2007-08-09 18:51             ` Segher Boessenkool
  1 sibling, 1 reply; 36+ messages in thread
From: Paul E. McKenney @ 2007-08-09 16:58 UTC (permalink / raw)
  To: Chris Snook
  Cc: linux-kernel, linux-arch, torvalds, netdev, akpm, ak,
	heiko.carstens, davem, schwidefsky, wensong, horms, wjiang,
	cfriesen, zlynx, rpjday, jesper.juhl

On Thu, Aug 09, 2007 at 12:36:17PM -0400, Chris Snook wrote:
> Paul E. McKenney wrote:
> >The compiler is within its rights to read a 32-bit quantity 16 bits at
> >at time, even on a 32-bit machine.  I would be glad to help pummel any
> >compiler writer that pulls such a dirty trick, but the C standard really
> >does permit this.
> 
> Yes, but we don't write code for these compilers.  There are countless 
> pieces of kernel code which would break in this condition, and there 
> doesn't seem to be any interest in fixing this.
> 
> >Use of volatile does in fact save you from the compiler pushing stores out
> >of loops regardless of whether you are also doing reads.  The C standard
> >has the notion of sequence points, which occur at various places including
> >the ends of statements and the control expressions for "if" and "while"
> >statements.  The compiler is not permitted to move volatile references
> >across a sequence point.  Therefore, the compiler is not allowed to
> >push a volatile store out of a loop.  Now the CPU might well do such a
> >reordering, but that is a separate issue to be dealt with via memory
> >barriers.  Note that it is the CPU and I/O system, not the compiler,
> >that is forcing you to use reads to flush writes to MMIO registers.
> 
> Sequence points enforce read-after-write ordering, not write-after-write.  
> We flush writes with reads for MMIO because of this effect as well as the 
> CPU/bus effects.

Neither volatile reads nor volatile writes may be moved across sequence
points.

> >And you would be amazed at what compiler writers will do in order to
> >get an additional fraction of a percent out of SpecCPU...
> 
> Probably not :)
> 
> >In short, please retain atomic_set()'s volatility, especially on those
> >architectures that declared the atomic_t's counter to be volatile.
> 
> Like i386 and x86_64?  These used to have volatile in the atomic_t 
> declaration. We removed it, and the sky did not fall.

Interesting.  You tested all possible configs on all possible hardware?

						Thanx, Paul

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/24] make atomic_read() behave consistently on alpha
  2007-08-09 16:58             ` Paul E. McKenney
@ 2007-08-09 17:14               ` Chris Snook
  2007-08-09 17:41                 ` Paul E. McKenney
  2007-08-09 19:17                 ` Segher Boessenkool
  0 siblings, 2 replies; 36+ messages in thread
From: Chris Snook @ 2007-08-09 17:14 UTC (permalink / raw)
  To: paulmck
  Cc: linux-kernel, linux-arch, torvalds, netdev, akpm, ak,
	heiko.carstens, davem, schwidefsky, wensong, horms, wjiang,
	cfriesen, zlynx, rpjday, jesper.juhl

Paul E. McKenney wrote:
> On Thu, Aug 09, 2007 at 12:36:17PM -0400, Chris Snook wrote:
>> Paul E. McKenney wrote:
>>> The compiler is within its rights to read a 32-bit quantity 16 bits at
>>> at time, even on a 32-bit machine.  I would be glad to help pummel any
>>> compiler writer that pulls such a dirty trick, but the C standard really
>>> does permit this.
>> Yes, but we don't write code for these compilers.  There are countless 
>> pieces of kernel code which would break in this condition, and there 
>> doesn't seem to be any interest in fixing this.
>>
>>> Use of volatile does in fact save you from the compiler pushing stores out
>>> of loops regardless of whether you are also doing reads.  The C standard
>>> has the notion of sequence points, which occur at various places including
>>> the ends of statements and the control expressions for "if" and "while"
>>> statements.  The compiler is not permitted to move volatile references
>>> across a sequence point.  Therefore, the compiler is not allowed to
>>> push a volatile store out of a loop.  Now the CPU might well do such a
>>> reordering, but that is a separate issue to be dealt with via memory
>>> barriers.  Note that it is the CPU and I/O system, not the compiler,
>>> that is forcing you to use reads to flush writes to MMIO registers.
>> Sequence points enforce read-after-write ordering, not write-after-write.  
>> We flush writes with reads for MMIO because of this effect as well as the 
>> CPU/bus effects.
> 
> Neither volatile reads nor volatile writes may be moved across sequence
> points.

By the compiler, or by the CPU?  If you're depending on volatile writes being 
visible to other CPUs, you're screwed either way, because the CPU can hold that 
data in cache as long as it wants before it writes it to memory.  When this 
finally does happen, it will happen atomically, which is all that atomic_set 
guarantees.  If you need to guarantee that the value is written to memory at a 
particular time in your execution sequence, you either have to read it from 
memory to force the compiler to store it first (and a volatile cast in 
atomic_read will suffice for this) or you have to use LOCK_PREFIX instructions 
which will invalidate remote cache lines containing the same variable.  This 
patch doesn't change either of these cases.

>>> And you would be amazed at what compiler writers will do in order to
>>> get an additional fraction of a percent out of SpecCPU...
>> Probably not :)
>>
>>> In short, please retain atomic_set()'s volatility, especially on those
>>> architectures that declared the atomic_t's counter to be volatile.
>> Like i386 and x86_64?  These used to have volatile in the atomic_t 
>> declaration. We removed it, and the sky did not fall.
> 
> Interesting.  You tested all possible configs on all possible hardware?

No, but I can reason about it and be confident that this won't break anything 
that isn't already broken.  At worst, this patch will make any existing subtly 
incorrect uses of atomic_t much more obvious and easier to track down.

	-- Chris

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/24] make atomic_read() behave consistently on alpha
  2007-08-09 17:14               ` Chris Snook
@ 2007-08-09 17:41                 ` Paul E. McKenney
  2007-08-09 18:13                   ` Chris Snook
  2007-08-09 19:17                 ` Segher Boessenkool
  1 sibling, 1 reply; 36+ messages in thread
From: Paul E. McKenney @ 2007-08-09 17:41 UTC (permalink / raw)
  To: Chris Snook
  Cc: linux-kernel, linux-arch, torvalds, netdev, akpm, ak,
	heiko.carstens, davem, schwidefsky, wensong, horms, wjiang,
	cfriesen, zlynx, rpjday, jesper.juhl

On Thu, Aug 09, 2007 at 01:14:35PM -0400, Chris Snook wrote:
> Paul E. McKenney wrote:
> >On Thu, Aug 09, 2007 at 12:36:17PM -0400, Chris Snook wrote:
> >>Paul E. McKenney wrote:
> >>>The compiler is within its rights to read a 32-bit quantity 16 bits at
> >>>at time, even on a 32-bit machine.  I would be glad to help pummel any
> >>>compiler writer that pulls such a dirty trick, but the C standard really
> >>>does permit this.
> >>Yes, but we don't write code for these compilers.  There are countless 
> >>pieces of kernel code which would break in this condition, and there 
> >>doesn't seem to be any interest in fixing this.
> >>
> >>>Use of volatile does in fact save you from the compiler pushing stores 
> >>>out
> >>>of loops regardless of whether you are also doing reads.  The C standard
> >>>has the notion of sequence points, which occur at various places 
> >>>including
> >>>the ends of statements and the control expressions for "if" and "while"
> >>>statements.  The compiler is not permitted to move volatile references
> >>>across a sequence point.  Therefore, the compiler is not allowed to
> >>>push a volatile store out of a loop.  Now the CPU might well do such a
> >>>reordering, but that is a separate issue to be dealt with via memory
> >>>barriers.  Note that it is the CPU and I/O system, not the compiler,
> >>>that is forcing you to use reads to flush writes to MMIO registers.
> >>Sequence points enforce read-after-write ordering, not write-after-write. 
> >>We flush writes with reads for MMIO because of this effect as well as the 
> >>CPU/bus effects.
> >
> >Neither volatile reads nor volatile writes may be moved across sequence
> >points.
> 
> By the compiler, or by the CPU?

As mentioned in earlier emails, by the compiler.  The CPU knows nothing
of C sequence points.

>                                 If you're depending on volatile writes 
> being visible to other CPUs, you're screwed either way, because the CPU can 
> hold that data in cache as long as it wants before it writes it to memory.  
> When this finally does happen, it will happen atomically, which is all that 
> atomic_set guarantees.  If you need to guarantee that the value is written 
> to memory at a particular time in your execution sequence, you either have 
> to read it from memory to force the compiler to store it first (and a 
> volatile cast in atomic_read will suffice for this) or you have to use 
> LOCK_PREFIX instructions which will invalidate remote cache lines 
> containing the same variable.  This patch doesn't change either of these 
> cases.

The case that it -can- change is interactions with interrupt handlers.
And NMI/SMI handlers, for that matter.

> >>>And you would be amazed at what compiler writers will do in order to
> >>>get an additional fraction of a percent out of SpecCPU...
> >>Probably not :)
> >>
> >>>In short, please retain atomic_set()'s volatility, especially on those
> >>>architectures that declared the atomic_t's counter to be volatile.
> >>Like i386 and x86_64?  These used to have volatile in the atomic_t 
> >>declaration. We removed it, and the sky did not fall.
> >
> >Interesting.  You tested all possible configs on all possible hardware?
> 
> No, but I can reason about it and be confident that this won't break 
> anything that isn't already broken.  At worst, this patch will make any 
> existing subtly incorrect uses of atomic_t much more obvious and easier to 
> track down.

You took interrupt and NMI/SMI handlers into account?

							Thanx, Paul

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/24] make atomic_read() behave consistently on alpha
  2007-08-09 17:41                 ` Paul E. McKenney
@ 2007-08-09 18:13                   ` Chris Snook
  2007-08-09 18:45                     ` Paul E. McKenney
  0 siblings, 1 reply; 36+ messages in thread
From: Chris Snook @ 2007-08-09 18:13 UTC (permalink / raw)
  To: paulmck
  Cc: linux-kernel, linux-arch, torvalds, netdev, akpm, ak,
	heiko.carstens, davem, schwidefsky, wensong, horms, wjiang,
	cfriesen, zlynx, rpjday, jesper.juhl

Paul E. McKenney wrote:
> On Thu, Aug 09, 2007 at 01:14:35PM -0400, Chris Snook wrote:
>>                                 If you're depending on volatile writes 
>> being visible to other CPUs, you're screwed either way, because the CPU can 
>> hold that data in cache as long as it wants before it writes it to memory.  
>> When this finally does happen, it will happen atomically, which is all that 
>> atomic_set guarantees.  If you need to guarantee that the value is written 
>> to memory at a particular time in your execution sequence, you either have 
>> to read it from memory to force the compiler to store it first (and a 
>> volatile cast in atomic_read will suffice for this) or you have to use 
>> LOCK_PREFIX instructions which will invalidate remote cache lines 
>> containing the same variable.  This patch doesn't change either of these 
>> cases.
> 
> The case that it -can- change is interactions with interrupt handlers.
> And NMI/SMI handlers, for that matter.

You have a point here, but only if you can guarantee that the interrupt handler 
is running on a processor sharing the cache that has the not-yet-written 
volatile value.  That implies a strictly non-SMP architecture.  At the moment, 
none of those have volatile in their declaration of atomic_t, so this patch 
can't break any of them.

	-- Chris

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/24] make atomic_read() behave consistently on alpha
  2007-08-09 18:13                   ` Chris Snook
@ 2007-08-09 18:45                     ` Paul E. McKenney
  2007-08-09 19:24                       ` Chris Snook
  0 siblings, 1 reply; 36+ messages in thread
From: Paul E. McKenney @ 2007-08-09 18:45 UTC (permalink / raw)
  To: Chris Snook
  Cc: linux-kernel, linux-arch, torvalds, netdev, akpm, ak,
	heiko.carstens, davem, schwidefsky, wensong, horms, wjiang,
	cfriesen, zlynx, rpjday, jesper.juhl

On Thu, Aug 09, 2007 at 02:13:52PM -0400, Chris Snook wrote:
> Paul E. McKenney wrote:
> >On Thu, Aug 09, 2007 at 01:14:35PM -0400, Chris Snook wrote:
> >>                                If you're depending on volatile writes 
> >>being visible to other CPUs, you're screwed either way, because the CPU 
> >>can hold that data in cache as long as it wants before it writes it to 
> >>memory.  When this finally does happen, it will happen atomically, which 
> >>is all that atomic_set guarantees.  If you need to guarantee that the 
> >>value is written to memory at a particular time in your execution 
> >>sequence, you either have to read it from memory to force the compiler to 
> >>store it first (and a volatile cast in atomic_read will suffice for this) 
> >>or you have to use LOCK_PREFIX instructions which will invalidate remote 
> >>cache lines containing the same variable.  This patch doesn't change 
> >>either of these cases.
> >
> >The case that it -can- change is interactions with interrupt handlers.
> >And NMI/SMI handlers, for that matter.
> 
> You have a point here, but only if you can guarantee that the interrupt 
> handler is running on a processor sharing the cache that has the 
> not-yet-written volatile value.  That implies a strictly non-SMP 
> architecture.  At the moment, none of those have volatile in their 
> declaration of atomic_t, so this patch can't break any of them.

This can also happen when using per-CPU variables.  And there are a
number of per-CPU variables that are either atomic themselves or are
structures containing atomic fields.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/24] make atomic_read() behave consistently on alpha
  2007-08-09 18:45                     ` Paul E. McKenney
@ 2007-08-09 19:24                       ` Chris Snook
  2007-08-10  1:28                         ` Paul E. McKenney
  0 siblings, 1 reply; 36+ messages in thread
From: Chris Snook @ 2007-08-09 19:24 UTC (permalink / raw)
  To: paulmck
  Cc: linux-kernel, linux-arch, torvalds, netdev, akpm, ak,
	heiko.carstens, davem, schwidefsky, wensong, horms, wjiang,
	cfriesen, zlynx, rpjday, jesper.juhl

Paul E. McKenney wrote:
> On Thu, Aug 09, 2007 at 02:13:52PM -0400, Chris Snook wrote:
>> Paul E. McKenney wrote:
>>> On Thu, Aug 09, 2007 at 01:14:35PM -0400, Chris Snook wrote:
>>>>                                If you're depending on volatile writes 
>>>> being visible to other CPUs, you're screwed either way, because the CPU 
>>>> can hold that data in cache as long as it wants before it writes it to 
>>>> memory.  When this finally does happen, it will happen atomically, which 
>>>> is all that atomic_set guarantees.  If you need to guarantee that the 
>>>> value is written to memory at a particular time in your execution 
>>>> sequence, you either have to read it from memory to force the compiler to 
>>>> store it first (and a volatile cast in atomic_read will suffice for this) 
>>>> or you have to use LOCK_PREFIX instructions which will invalidate remote 
>>>> cache lines containing the same variable.  This patch doesn't change 
>>>> either of these cases.
>>> The case that it -can- change is interactions with interrupt handlers.
>>> And NMI/SMI handlers, for that matter.
>> You have a point here, but only if you can guarantee that the interrupt 
>> handler is running on a processor sharing the cache that has the 
>> not-yet-written volatile value.  That implies a strictly non-SMP 
>> architecture.  At the moment, none of those have volatile in their 
>> declaration of atomic_t, so this patch can't break any of them.
> 
> This can also happen when using per-CPU variables.  And there are a
> number of per-CPU variables that are either atomic themselves or are
> structures containing atomic fields.

Accessing per-CPU variables in this fashion reliably already requires a suitable 
smp/non-smp read/write memory barrier.  I maintain that if we break anything 
with this change, it was really already broken, if less obviously.  Can you give 
a real or synthetic example of legitimate code that could break?

	-- Chris

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/24] make atomic_read() behave consistently on alpha
  2007-08-09 19:24                       ` Chris Snook
@ 2007-08-10  1:28                         ` Paul E. McKenney
  2007-08-10 19:49                           ` Chris Snook
  0 siblings, 1 reply; 36+ messages in thread
From: Paul E. McKenney @ 2007-08-10  1:28 UTC (permalink / raw)
  To: Chris Snook
  Cc: linux-kernel, linux-arch, torvalds, netdev, akpm, ak,
	heiko.carstens, davem, schwidefsky, wensong, horms, wjiang,
	cfriesen, zlynx, rpjday, jesper.juhl

On Thu, Aug 09, 2007 at 03:24:40PM -0400, Chris Snook wrote:
> Paul E. McKenney wrote:
> >On Thu, Aug 09, 2007 at 02:13:52PM -0400, Chris Snook wrote:
> >>Paul E. McKenney wrote:
> >>>On Thu, Aug 09, 2007 at 01:14:35PM -0400, Chris Snook wrote:
> >>>>                               If you're depending on volatile writes 
> >>>>being visible to other CPUs, you're screwed either way, because the CPU 
> >>>>can hold that data in cache as long as it wants before it writes it to 
> >>>>memory.  When this finally does happen, it will happen atomically, 
> >>>>which is all that atomic_set guarantees.  If you need to guarantee that 
> >>>>the value is written to memory at a particular time in your execution 
> >>>>sequence, you either have to read it from memory to force the compiler 
> >>>>to store it first (and a volatile cast in atomic_read will suffice for 
> >>>>this) or you have to use LOCK_PREFIX instructions which will invalidate 
> >>>>remote cache lines containing the same variable.  This patch doesn't 
> >>>>change either of these cases.
> >>>The case that it -can- change is interactions with interrupt handlers.
> >>>And NMI/SMI handlers, for that matter.
> >>You have a point here, but only if you can guarantee that the interrupt 
> >>handler is running on a processor sharing the cache that has the 
> >>not-yet-written volatile value.  That implies a strictly non-SMP 
> >>architecture.  At the moment, none of those have volatile in their 
> >>declaration of atomic_t, so this patch can't break any of them.
> >
> >This can also happen when using per-CPU variables.  And there are a
> >number of per-CPU variables that are either atomic themselves or are
> >structures containing atomic fields.
> 
> Accessing per-CPU variables in this fashion reliably already requires a 
> suitable smp/non-smp read/write memory barrier.  I maintain that if we 
> break anything with this change, it was really already broken, if less 
> obviously.  Can you give a real or synthetic example of legitimate code 
> that could break?

My main concern is actually the lack of symmetry -- I would expect
that an atomic_set() would have the same properties as atomic_read().
It is easy and cheap to provide them with similar properties, so why not?
Debugging even a single problem would consume far more time than simply
giving them corresponding semantics.

But you asked for examples.  These are synthetic, and of course legitimacy
is in the eye of the beholder.

1.  Watchdog variable.

	atomic_t watchdog = ATOMIC_INIT(0);

	...

	int i;
	while (!done) {

		/* Do so stuff that doesn't take more than a few us. */
		/* Could do atomic increment, but throughput penalty. */

		i++;
		atomic_set(&watchdog, i);
	}
	do_something_with(&watchdog);


	/* Every so often on some other CPU... */

	if ((new_watchdog = atomic_read(&watchdog)) == old_watchdog)
		die_horribly();
	old_watchdog = new_watchdog;


	If atomic_set() did not have volatile semantics, the compiler
	would be within its rights optimizing it to simply get the
	final value of "i" after exit from the loop.  This would cause
	the watchdog check to fail spuriously.  Memory barriers are
	not required in this case, because the CPU cannot hang onto
	the value for very long -- we don't care about the exact value,
	or about exact synchronization, but rather about whether or
	not the value is changing.

	In this (toy) example, one might replace the atomic_set() with
	an atomic increment (though that might be too expensive in some
	cases) or with something like:

		atomic_set(&watchdog, atomic_read(&watchdog) + 1);

	However, other cases might not permit this transformation,
	for example, an existing heavily used API might take int rather
	than atomic_t.

	Some will no doubt argue that this example should use a
	macro or an asm similar to the "forget()" asm put forward
	elsewhere in this thread.

2.  Communicating both with interrupt handler and with other CPUs.
    For example, data elements that are built up in a location visible
    to interrupts and NMIs, and then added as a unit to a data structure
    visible to other CPUs.  This more-realistic example is abbreviated
    to the point of pointlessness as follows:

	struct foo {
		atomic_t a;
		atomic_t b;
	};

	DEFINE_PER_CPU(struct foo *, staging) = NULL;

	/* Create element in staging area. */

	__get_cpu_var(staging) = kzalloc(sizeof(*p), GFP_WHATEVER);
	if (__get_cpu_var(staging) == NULL)
		die_horribly();
	/* allocate an element of some per-CPU array, get the result in "i" */
	atomic_set(__get_cpu_var(staging).a, i);
	/* allocate another element of a per-CPU array, with result in "i" */
	atomic_set(__get_cpu_var(staging).b, i);
	rcu_assign_pointer(some_global_place, __get_cpu_var(staging));

	If atomic_set() didn't have volatile semantics, then an interrupt
	or NMI handler could see the atomic_set() to .a and .b out of
	order due to compiler optimizations.

Remember, you -did- ask for these!!!  ;-)

							Thanx, Paul

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/24] make atomic_read() behave consistently on alpha
  2007-08-10  1:28                         ` Paul E. McKenney
@ 2007-08-10 19:49                           ` Chris Snook
  2007-08-10 20:26                             ` Paul E. McKenney
  0 siblings, 1 reply; 36+ messages in thread
From: Chris Snook @ 2007-08-10 19:49 UTC (permalink / raw)
  To: paulmck
  Cc: linux-kernel, linux-arch, torvalds, netdev, akpm, ak,
	heiko.carstens, davem, schwidefsky, wensong, horms, wjiang,
	cfriesen, zlynx, rpjday, jesper.juhl

Paul E. McKenney wrote:
> On Thu, Aug 09, 2007 at 03:24:40PM -0400, Chris Snook wrote:
>> Paul E. McKenney wrote:
>>> On Thu, Aug 09, 2007 at 02:13:52PM -0400, Chris Snook wrote:
>>>> Paul E. McKenney wrote:
>>>>> On Thu, Aug 09, 2007 at 01:14:35PM -0400, Chris Snook wrote:
>>>>>>                               If you're depending on volatile writes 
>>>>>> being visible to other CPUs, you're screwed either way, because the CPU 
>>>>>> can hold that data in cache as long as it wants before it writes it to 
>>>>>> memory.  When this finally does happen, it will happen atomically, 
>>>>>> which is all that atomic_set guarantees.  If you need to guarantee that 
>>>>>> the value is written to memory at a particular time in your execution 
>>>>>> sequence, you either have to read it from memory to force the compiler 
>>>>>> to store it first (and a volatile cast in atomic_read will suffice for 
>>>>>> this) or you have to use LOCK_PREFIX instructions which will invalidate 
>>>>>> remote cache lines containing the same variable.  This patch doesn't 
>>>>>> change either of these cases.
>>>>> The case that it -can- change is interactions with interrupt handlers.
>>>>> And NMI/SMI handlers, for that matter.
>>>> You have a point here, but only if you can guarantee that the interrupt 
>>>> handler is running on a processor sharing the cache that has the 
>>>> not-yet-written volatile value.  That implies a strictly non-SMP 
>>>> architecture.  At the moment, none of those have volatile in their 
>>>> declaration of atomic_t, so this patch can't break any of them.
>>> This can also happen when using per-CPU variables.  And there are a
>>> number of per-CPU variables that are either atomic themselves or are
>>> structures containing atomic fields.
>> Accessing per-CPU variables in this fashion reliably already requires a 
>> suitable smp/non-smp read/write memory barrier.  I maintain that if we 
>> break anything with this change, it was really already broken, if less 
>> obviously.  Can you give a real or synthetic example of legitimate code 
>> that could break?
> 
> My main concern is actually the lack of symmetry -- I would expect
> that an atomic_set() would have the same properties as atomic_read().
> It is easy and cheap to provide them with similar properties, so why not?
> Debugging even a single problem would consume far more time than simply
> giving them corresponding semantics.
> 
> But you asked for examples.  These are synthetic, and of course legitimacy
> is in the eye of the beholder.
> 
> 1.  Watchdog variable.
> 
> 	atomic_t watchdog = ATOMIC_INIT(0);
> 
> 	...
> 
> 	int i;
> 	while (!done) {
> 
> 		/* Do so stuff that doesn't take more than a few us. */
> 		/* Could do atomic increment, but throughput penalty. */
> 
> 		i++;
> 		atomic_set(&watchdog, i);
> 	}
> 	do_something_with(&watchdog);
> 
> 
> 	/* Every so often on some other CPU... */
> 
> 	if ((new_watchdog = atomic_read(&watchdog)) == old_watchdog)
> 		die_horribly();
> 	old_watchdog = new_watchdog;
> 
> 
> 	If atomic_set() did not have volatile semantics, the compiler
> 	would be within its rights optimizing it to simply get the
> 	final value of "i" after exit from the loop.  This would cause
> 	the watchdog check to fail spuriously.  Memory barriers are
> 	not required in this case, because the CPU cannot hang onto
> 	the value for very long -- we don't care about the exact value,
> 	or about exact synchronization, but rather about whether or
> 	not the value is changing.
> 
> 	In this (toy) example, one might replace the atomic_set() with
> 	an atomic increment (though that might be too expensive in some
> 	cases) or with something like:
> 
> 		atomic_set(&watchdog, atomic_read(&watchdog) + 1);
> 
> 	However, other cases might not permit this transformation,
> 	for example, an existing heavily used API might take int rather
> 	than atomic_t.
> 
> 	Some will no doubt argue that this example should use a
> 	macro or an asm similar to the "forget()" asm put forward
> 	elsewhere in this thread.
> 
> 2.  Communicating both with interrupt handler and with other CPUs.
>     For example, data elements that are built up in a location visible
>     to interrupts and NMIs, and then added as a unit to a data structure
>     visible to other CPUs.  This more-realistic example is abbreviated
>     to the point of pointlessness as follows:
> 
> 	struct foo {
> 		atomic_t a;
> 		atomic_t b;
> 	};
> 
> 	DEFINE_PER_CPU(struct foo *, staging) = NULL;
> 
> 	/* Create element in staging area. */
> 
> 	__get_cpu_var(staging) = kzalloc(sizeof(*p), GFP_WHATEVER);
> 	if (__get_cpu_var(staging) == NULL)
> 		die_horribly();
> 	/* allocate an element of some per-CPU array, get the result in "i" */
> 	atomic_set(__get_cpu_var(staging).a, i);
> 	/* allocate another element of a per-CPU array, with result in "i" */
> 	atomic_set(__get_cpu_var(staging).b, i);
> 	rcu_assign_pointer(some_global_place, __get_cpu_var(staging));
> 
> 	If atomic_set() didn't have volatile semantics, then an interrupt
> 	or NMI handler could see the atomic_set() to .a and .b out of
> 	order due to compiler optimizations.
> 
> Remember, you -did- ask for these!!!  ;-)

Ok, I'm convinced.  Part of the motivation here is to avoid heisenbugs, 
so if people expect volatile atomic_set behavior, I'm inclined to give 
it to them.  I don't really feel like indulging the compiler bug 
paranoiacs, but developer expectations are a legitimate motivation, and 
a major part of why I posted this in the first place.  I'll resubmit the 
patchset with a volatile cast in atomic_set.  Before I do, is there 
anything *else* that desperately needs such a cast?  As far as I can 
tell, all the other functions are implemented with __asm__ __volatile__, 
or with spinlocks that use that under the hood.

	-- Chris

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/24] make atomic_read() behave consistently on alpha
  2007-08-10 19:49                           ` Chris Snook
@ 2007-08-10 20:26                             ` Paul E. McKenney
  0 siblings, 0 replies; 36+ messages in thread
From: Paul E. McKenney @ 2007-08-10 20:26 UTC (permalink / raw)
  To: Chris Snook
  Cc: linux-kernel, linux-arch, torvalds, netdev, akpm, ak,
	heiko.carstens, davem, schwidefsky, wensong, horms, wjiang,
	cfriesen, zlynx, rpjday, jesper.juhl

On Fri, Aug 10, 2007 at 03:49:03PM -0400, Chris Snook wrote:
> Paul E. McKenney wrote:
> >On Thu, Aug 09, 2007 at 03:24:40PM -0400, Chris Snook wrote:
> >>Paul E. McKenney wrote:
> >>>On Thu, Aug 09, 2007 at 02:13:52PM -0400, Chris Snook wrote:
> >>>>Paul E. McKenney wrote:
> >>>>>On Thu, Aug 09, 2007 at 01:14:35PM -0400, Chris Snook wrote:
> >>>>>>                              If you're depending on volatile writes 
> >>>>>>being visible to other CPUs, you're screwed either way, because the 
> >>>>>>CPU can hold that data in cache as long as it wants before it writes 
> >>>>>>it to memory.  When this finally does happen, it will happen 
> >>>>>>atomically, which is all that atomic_set guarantees.  If you need to 
> >>>>>>guarantee that the value is written to memory at a particular time in 
> >>>>>>your execution sequence, you either have to read it from memory to 
> >>>>>>force the compiler to store it first (and a volatile cast in 
> >>>>>>atomic_read will suffice for this) or you have to use LOCK_PREFIX 
> >>>>>>instructions which will invalidate remote cache lines containing the 
> >>>>>>same variable.  This patch doesn't change either of these cases.
> >>>>>The case that it -can- change is interactions with interrupt handlers.
> >>>>>And NMI/SMI handlers, for that matter.
> >>>>You have a point here, but only if you can guarantee that the interrupt 
> >>>>handler is running on a processor sharing the cache that has the 
> >>>>not-yet-written volatile value.  That implies a strictly non-SMP 
> >>>>architecture.  At the moment, none of those have volatile in their 
> >>>>declaration of atomic_t, so this patch can't break any of them.
> >>>This can also happen when using per-CPU variables.  And there are a
> >>>number of per-CPU variables that are either atomic themselves or are
> >>>structures containing atomic fields.
> >>Accessing per-CPU variables in this fashion reliably already requires a 
> >>suitable smp/non-smp read/write memory barrier.  I maintain that if we 
> >>break anything with this change, it was really already broken, if less 
> >>obviously.  Can you give a real or synthetic example of legitimate code 
> >>that could break?
> >
> >My main concern is actually the lack of symmetry -- I would expect
> >that an atomic_set() would have the same properties as atomic_read().
> >It is easy and cheap to provide them with similar properties, so why not?
> >Debugging even a single problem would consume far more time than simply
> >giving them corresponding semantics.
> >
> >But you asked for examples.  These are synthetic, and of course legitimacy
> >is in the eye of the beholder.
> >
> >1.  Watchdog variable.
> >
> >	atomic_t watchdog = ATOMIC_INIT(0);
> >
> >	...
> >
> >	int i;
> >	while (!done) {
> >
> >		/* Do so stuff that doesn't take more than a few us. */
> >		/* Could do atomic increment, but throughput penalty. */
> >
> >		i++;
> >		atomic_set(&watchdog, i);
> >	}
> >	do_something_with(&watchdog);
> >
> >
> >	/* Every so often on some other CPU... */
> >
> >	if ((new_watchdog = atomic_read(&watchdog)) == old_watchdog)
> >		die_horribly();
> >	old_watchdog = new_watchdog;
> >
> >
> >	If atomic_set() did not have volatile semantics, the compiler
> >	would be within its rights optimizing it to simply get the
> >	final value of "i" after exit from the loop.  This would cause
> >	the watchdog check to fail spuriously.  Memory barriers are
> >	not required in this case, because the CPU cannot hang onto
> >	the value for very long -- we don't care about the exact value,
> >	or about exact synchronization, but rather about whether or
> >	not the value is changing.
> >
> >	In this (toy) example, one might replace the atomic_set() with
> >	an atomic increment (though that might be too expensive in some
> >	cases) or with something like:
> >
> >		atomic_set(&watchdog, atomic_read(&watchdog) + 1);
> >
> >	However, other cases might not permit this transformation,
> >	for example, an existing heavily used API might take int rather
> >	than atomic_t.
> >
> >	Some will no doubt argue that this example should use a
> >	macro or an asm similar to the "forget()" asm put forward
> >	elsewhere in this thread.
> >
> >2.  Communicating both with interrupt handler and with other CPUs.
> >    For example, data elements that are built up in a location visible
> >    to interrupts and NMIs, and then added as a unit to a data structure
> >    visible to other CPUs.  This more-realistic example is abbreviated
> >    to the point of pointlessness as follows:
> >
> >	struct foo {
> >		atomic_t a;
> >		atomic_t b;
> >	};
> >
> >	DEFINE_PER_CPU(struct foo *, staging) = NULL;
> >
> >	/* Create element in staging area. */
> >
> >	__get_cpu_var(staging) = kzalloc(sizeof(*p), GFP_WHATEVER);
> >	if (__get_cpu_var(staging) == NULL)
> >		die_horribly();
> >	/* allocate an element of some per-CPU array, get the result in "i" 
> >	*/
> >	atomic_set(__get_cpu_var(staging).a, i);
> >	/* allocate another element of a per-CPU array, with result in "i" */
> >	atomic_set(__get_cpu_var(staging).b, i);
> >	rcu_assign_pointer(some_global_place, __get_cpu_var(staging));
> >
> >	If atomic_set() didn't have volatile semantics, then an interrupt
> >	or NMI handler could see the atomic_set() to .a and .b out of
> >	order due to compiler optimizations.
> >
> >Remember, you -did- ask for these!!!  ;-)
> 
> Ok, I'm convinced.  Part of the motivation here is to avoid heisenbugs, 
> so if people expect volatile atomic_set behavior, I'm inclined to give 
> it to them.  I don't really feel like indulging the compiler bug 
> paranoiacs, but developer expectations are a legitimate motivation, and 
> a major part of why I posted this in the first place.  I'll resubmit the 
> patchset with a volatile cast in atomic_set.  Before I do, is there 
> anything *else* that desperately needs such a cast?  As far as I can 
> tell, all the other functions are implemented with __asm__ __volatile__, 
> or with spinlocks that use that under the hood.

Sounds good!!!

The only other API that I am aware of needing volatile semantics is
rcu_dereference(), but I already sent a patch in for it.  So as far
as I know, atomic_read() and atomic_set() should cover it.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/24] make atomic_read() behave consistently on alpha
  2007-08-09 17:14               ` Chris Snook
  2007-08-09 17:41                 ` Paul E. McKenney
@ 2007-08-09 19:17                 ` Segher Boessenkool
  1 sibling, 0 replies; 36+ messages in thread
From: Segher Boessenkool @ 2007-08-09 19:17 UTC (permalink / raw)
  To: Chris Snook
  Cc: wjiang, rpjday, wensong, heiko.carstens, linux-kernel, ak, netdev,
	paulmck, horms, akpm, linux-arch, jesper.juhl, torvalds,
	schwidefsky, davem, cfriesen, zlynx

> If you need to guarantee that the value is written to memory at a 
> particular time in your execution sequence, you either have to read it 
> from memory to force the compiler to store it first

That isn't enough.  The CPU will happily read the datum back from
its own store queue before it ever hit memory or cache or any
external bus.  If you need the value to be globally visible before
another store, you use wmb(); if you need it before some read,
you use mb().

These concepts are completely separate from both atomicity and
volatility (which aren't the same themselves).

> No, but I can reason about it and be confident that this won't break 
> anything that isn't already broken.  At worst, this patch will make 
> any existing subtly incorrect uses of atomic_t much more obvious and 
> easier to track down.

PR22278, PR29753 -- both P2 bugs, and both about basic *(volatile *)
usage.  Yeah, it's definitely not problematic, esp. not if you want
to support older compilers than 4.2 too.</sarcasm>

Segher

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/24] make atomic_read() behave consistently on alpha
  2007-08-09 16:36           ` Chris Snook
  2007-08-09 16:58             ` Paul E. McKenney
@ 2007-08-09 18:51             ` Segher Boessenkool
  2007-08-09 19:30               ` Chris Snook
  1 sibling, 1 reply; 36+ messages in thread
From: Segher Boessenkool @ 2007-08-09 18:51 UTC (permalink / raw)
  To: Chris Snook
  Cc: wjiang, rpjday, wensong, heiko.carstens, linux-kernel, ak, netdev,
	paulmck, horms, akpm, linux-arch, jesper.juhl, torvalds,
	schwidefsky, davem, cfriesen, zlynx

>> The compiler is within its rights to read a 32-bit quantity 16 bits at
>> at time, even on a 32-bit machine.  I would be glad to help pummel any
>> compiler writer that pulls such a dirty trick, but the C standard 
>> really
>> does permit this.
>
> Yes, but we don't write code for these compilers.  There are countless 
> pieces of kernel code which would break in this condition, and there 
> doesn't seem to be any interest in fixing this.

"Other things are broken too".  Great argument :-)

> Sequence points enforce read-after-write ordering, not 
> write-after-write.

Sequence points order *all* side effects; sequence points exist in the
domain of the abstract sequential model of the C language only.  The
compiler translates that to machine code that is equivalent to that C
code under the "as-if" rule; but this is still in that abstract model,
which doesn't include things such as SMP, visibility by I/O devices,
store queues, etc. etc.

> We flush writes with reads for MMIO because of this effect as well as 
> the CPU/bus effects.

You cannot flush all MMIO writes with reads; this is a PCI-specific
thing.  And even then, you need more than just the read itself: you
have to make sure the read completed and returned data.

>> In short, please retain atomic_set()'s volatility, especially on those
>> architectures that declared the atomic_t's counter to be volatile.
>
> Like i386 and x86_64?  These used to have volatile in the atomic_t 
> declaration.  We removed it, and the sky did not fall.

And this proves what?  Lots of stuff "works" by accident.

Segher

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/24] make atomic_read() behave consistently on alpha
  2007-08-09 18:51             ` Segher Boessenkool
@ 2007-08-09 19:30               ` Chris Snook
  0 siblings, 0 replies; 36+ messages in thread
From: Chris Snook @ 2007-08-09 19:30 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: wjiang, rpjday, wensong, heiko.carstens, linux-kernel, ak, netdev,
	paulmck, horms, akpm, linux-arch, jesper.juhl, torvalds,
	schwidefsky, davem, cfriesen, zlynx

Segher Boessenkool wrote:
>>> The compiler is within its rights to read a 32-bit quantity 16 bits at
>>> at time, even on a 32-bit machine.  I would be glad to help pummel any
>>> compiler writer that pulls such a dirty trick, but the C standard really
>>> does permit this.
>>
>> Yes, but we don't write code for these compilers.  There are countless 
>> pieces of kernel code which would break in this condition, and there 
>> doesn't seem to be any interest in fixing this.
> 
> "Other things are broken too".  Great argument :-)

We make plenty of practical assumptions in the kernel, and declare incorrect 
things which violate them, even in cases where there's no commandment from the 
heavens forbidding them.  Since the whole point of this exercise is to prevent 
badness with *optimizing* compilers, it's quite reasonable to declare broken any 
so-called optimizer which violates these trivial assumptions.

>>> In short, please retain atomic_set()'s volatility, especially on those
>>> architectures that declared the atomic_t's counter to be volatile.
>>
>> Like i386 and x86_64?  These used to have volatile in the atomic_t 
>> declaration.  We removed it, and the sky did not fall.
> 
> And this proves what?  Lots of stuff "works" by accident.

If something breaks because of this, it was already broken, but hidden a lot 
better.  I don't see much of a downside to exposing and fixing those bugs.

	-- Chris

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/24] make atomic_read() behave consistently on alpha
  2007-08-09 16:10         ` Paul E. McKenney
  2007-08-09 16:36           ` Chris Snook
@ 2007-08-10  8:21           ` Herbert Xu
  2007-08-10  9:08             ` Andi Kleen
  2007-08-10 20:07             ` Segher Boessenkool
  1 sibling, 2 replies; 36+ messages in thread
From: Herbert Xu @ 2007-08-10  8:21 UTC (permalink / raw)
  To: paulmck
  Cc: csnook, linux-kernel, linux-arch, torvalds, netdev, akpm, ak,
	heiko.carstens, davem, schwidefsky, wensong, horms, wjiang,
	cfriesen, zlynx, rpjday, jesper.juhl

Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote:
>
> The compiler is within its rights to read a 32-bit quantity 16 bits at
> at time, even on a 32-bit machine.  I would be glad to help pummel any
> compiler writer that pulls such a dirty trick, but the C standard really
> does permit this.

Code all over the kernel assumes that 32-bit reads/writes
are atomic so while such a compiler might be legal it certainly
can't compile Linux.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/24] make atomic_read() behave consistently on alpha
  2007-08-10  8:21           ` Herbert Xu
@ 2007-08-10  9:08             ` Andi Kleen
  2007-08-10 15:02               ` Paul E. McKenney
  2007-08-10 20:07             ` Segher Boessenkool
  1 sibling, 1 reply; 36+ messages in thread
From: Andi Kleen @ 2007-08-10  9:08 UTC (permalink / raw)
  To: Herbert Xu
  Cc: paulmck, csnook, linux-kernel, linux-arch, torvalds, netdev, akpm,
	heiko.carstens, davem, schwidefsky, wensong, horms, wjiang,
	cfriesen, zlynx, rpjday, jesper.juhl

On Friday 10 August 2007 10:21:46 Herbert Xu wrote:
> Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote:
> >
> > The compiler is within its rights to read a 32-bit quantity 16 bits at
> > at time, even on a 32-bit machine.  I would be glad to help pummel any
> > compiler writer that pulls such a dirty trick, but the C standard really
> > does permit this.
> 
> Code all over the kernel assumes that 32-bit reads/writes
> are atomic so while such a compiler might be legal it certainly
> can't compile Linux.

Yes, the kernel requirements are much stricter than ISO-C. And besides
it is a heavy user of C extensions anyways. On the other hand some of the
C99 extensions are not allowed. And then there is sparse, which enforces
a language which sometimes is quite far from standard C. You could say it is 
written in Linux-C, not ISO C. 

-Andi

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/24] make atomic_read() behave consistently on alpha
  2007-08-10  9:08             ` Andi Kleen
@ 2007-08-10 15:02               ` Paul E. McKenney
  0 siblings, 0 replies; 36+ messages in thread
From: Paul E. McKenney @ 2007-08-10 15:02 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Herbert Xu, csnook, linux-kernel, linux-arch, torvalds, netdev,
	akpm, heiko.carstens, davem, schwidefsky, wensong, horms, wjiang,
	cfriesen, zlynx, rpjday, jesper.juhl

On Fri, Aug 10, 2007 at 11:08:20AM +0200, Andi Kleen wrote:
> On Friday 10 August 2007 10:21:46 Herbert Xu wrote:
> > Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote:
> > >
> > > The compiler is within its rights to read a 32-bit quantity 16 bits at
> > > at time, even on a 32-bit machine.  I would be glad to help pummel any
> > > compiler writer that pulls such a dirty trick, but the C standard really
> > > does permit this.
> > 
> > Code all over the kernel assumes that 32-bit reads/writes
> > are atomic so while such a compiler might be legal it certainly
> > can't compile Linux.
> 
> Yes, the kernel requirements are much stricter than ISO-C. And besides
> it is a heavy user of C extensions anyways. On the other hand some of the
> C99 extensions are not allowed. And then there is sparse, which enforces
> a language which sometimes is quite far from standard C. You could say it is 
> written in Linux-C, not ISO C. 

Understood.  My question is "why do we want the semantics of atomic_read()
and atomic_set() to differ?"

						Thanx, Paul

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/24] make atomic_read() behave consistently on alpha
  2007-08-10  8:21           ` Herbert Xu
  2007-08-10  9:08             ` Andi Kleen
@ 2007-08-10 20:07             ` Segher Boessenkool
  2007-08-11  0:00               ` Herbert Xu
  1 sibling, 1 reply; 36+ messages in thread
From: Segher Boessenkool @ 2007-08-10 20:07 UTC (permalink / raw)
  To: Herbert Xu
  Cc: paulmck, heiko.carstens, horms, linux-kernel, csnook, rpjday,
	netdev, ak, cfriesen, akpm, torvalds, jesper.juhl, linux-arch,
	zlynx, schwidefsky, davem, wensong, wjiang

>> The compiler is within its rights to read a 32-bit quantity 16 bits at
>> at time, even on a 32-bit machine.  I would be glad to help pummel any
>> compiler writer that pulls such a dirty trick, but the C standard 
>> really
>> does permit this.
>
> Code all over the kernel assumes that 32-bit reads/writes
> are atomic so while such a compiler might be legal it certainly
> can't compile Linux.

That means GCC cannot compile Linux; it already optimises
some accesses to scalars to smaller accesses when it knows
it is allowed to.  Not often though, since it hardly ever
helps in the cost model it employs.


Segher


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/24] make atomic_read() behave consistently on alpha
  2007-08-10 20:07             ` Segher Boessenkool
@ 2007-08-11  0:00               ` Herbert Xu
  2007-08-11  0:38                 ` Segher Boessenkool
  0 siblings, 1 reply; 36+ messages in thread
From: Herbert Xu @ 2007-08-11  0:00 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: paulmck, heiko.carstens, horms, linux-kernel, csnook, rpjday,
	netdev, ak, cfriesen, akpm, torvalds, jesper.juhl, linux-arch,
	zlynx, schwidefsky, davem, wensong, wjiang

On Fri, Aug 10, 2007 at 10:07:27PM +0200, Segher Boessenkool wrote:
> 
> That means GCC cannot compile Linux; it already optimises
> some accesses to scalars to smaller accesses when it knows
> it is allowed to.  Not often though, since it hardly ever
> helps in the cost model it employs.

Please give an example code snippet + gcc version + arch
to back this up.

Thanks,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/24] make atomic_read() behave consistently on alpha
  2007-08-11  0:00               ` Herbert Xu
@ 2007-08-11  0:38                 ` Segher Boessenkool
  2007-08-11  0:43                   ` Herbert Xu
  2007-08-11  4:38                   ` Valdis.Kletnieks
  0 siblings, 2 replies; 36+ messages in thread
From: Segher Boessenkool @ 2007-08-11  0:38 UTC (permalink / raw)
  To: Herbert Xu
  Cc: paulmck, heiko.carstens, horms, linux-kernel, csnook, rpjday,
	netdev, ak, cfriesen, akpm, torvalds, jesper.juhl, linux-arch,
	zlynx, schwidefsky, davem, wensong, wjiang

>> That means GCC cannot compile Linux; it already optimises
>> some accesses to scalars to smaller accesses when it knows
>> it is allowed to.  Not often though, since it hardly ever
>> helps in the cost model it employs.
>
> Please give an example code snippet + gcc version + arch
> to back this up.

	unsigned char f(unsigned long *p)
	{
	        return *p & 1;
	}

with both

	powerpc64-linux-gcc (GCC) 4.3.0 20070731 (experimental)

and

	powerpc64-linux-gcc-4.2.0 (GCC) 4.2.0

(sorry, I don't have anything newer or older right now; if you
really care, I can test with those too)

generate (in 64-bit mode):

	.L.f:
		lbz 3,7(3)
		rldicl 3,3,0,63
		blr

and in 32-bit mode:

	f:
		stwu 1,-16(1)
		nop
		nop
		lbz 3,3(3)
		addi 1,1,16
		rlwinm 3,3,0,31,31
		blr

(the nops are because I use --with-cpu=970).


But perhaps you do not care for PowerPC, in which case:

	i686-linux-gcc (GCC) 4.2.0 20060410 (experimental)

(sorry for the old version, I don't build x86 compilers
all that often; also I don't have a 64-bit version right
now):

	f:
		pushl   %ebp
		movl    %esp, %ebp
		movl    8(%ebp), %eax
		popl    %ebp
		movzbl  (%eax), %eax
		andl    $1, %eax
		ret


If you want testing with any other versions, and/or for
any other target architecture, I can do that; it takes a
few minutes to build a compiler.

It is quite hard to build a testcase that reads more than
one part of the "long", since for small testcases the
compiler will almost always be smart enough to do one
bigger read instead; but it certainly isn't inconceivable,
and anyway the compiler would be fully in its right to do
reads non-atomically if not instructed otherwise.


Segher


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/24] make atomic_read() behave consistently on alpha
  2007-08-11  0:38                 ` Segher Boessenkool
@ 2007-08-11  0:43                   ` Herbert Xu
  2007-08-11  0:50                     ` Segher Boessenkool
  2007-08-11  4:38                   ` Valdis.Kletnieks
  1 sibling, 1 reply; 36+ messages in thread
From: Herbert Xu @ 2007-08-11  0:43 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: paulmck, heiko.carstens, horms, linux-kernel, csnook, rpjday,
	netdev, ak, cfriesen, akpm, torvalds, jesper.juhl, linux-arch,
	zlynx, schwidefsky, davem, wensong, wjiang

On Sat, Aug 11, 2007 at 02:38:40AM +0200, Segher Boessenkool wrote:
> >>That means GCC cannot compile Linux; it already optimises
> >>some accesses to scalars to smaller accesses when it knows
> >>it is allowed to.  Not often though, since it hardly ever
> >>helps in the cost model it employs.
> >
> >Please give an example code snippet + gcc version + arch
> >to back this up.
> 
> 	unsigned char f(unsigned long *p)
> 	{
> 	        return *p & 1;
> 	}

This doesn't really matter since we only care about the LSB.
Do you have an example where gcc reads it non-atmoically and
we care about all parts?

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/24] make atomic_read() behave consistently on alpha
  2007-08-11  0:43                   ` Herbert Xu
@ 2007-08-11  0:50                     ` Segher Boessenkool
  0 siblings, 0 replies; 36+ messages in thread
From: Segher Boessenkool @ 2007-08-11  0:50 UTC (permalink / raw)
  To: Herbert Xu
  Cc: paulmck, heiko.carstens, horms, linux-kernel, csnook, rpjday,
	netdev, ak, cfriesen, akpm, torvalds, jesper.juhl, linux-arch,
	zlynx, schwidefsky, davem, wensong, wjiang

>>>> That means GCC cannot compile Linux; it already optimises
>>>> some accesses to scalars to smaller accesses when it knows
>>>> it is allowed to.  Not often though, since it hardly ever
>>>> helps in the cost model it employs.
>>>
>>> Please give an example code snippet + gcc version + arch
>>> to back this up.
>>
>> 	unsigned char f(unsigned long *p)
>> 	{
>> 	        return *p & 1;
>> 	}
>
> This doesn't really matter since we only care about the LSB.

It is exactly what I claimed, and what you asked proof of.

> Do you have an example where gcc reads it non-atmoically and
> we care about all parts?

Like I explained in the original mail; no, I suspect such
a testcase will be really hard to construct, esp. as a small
testcase.  I have no reason to believe it is impossible to
do so though -- maybe someone else can write trickier code
than I can, in which case, please do so.


Segher


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/24] make atomic_read() behave consistently on alpha
  2007-08-11  0:38                 ` Segher Boessenkool
  2007-08-11  0:43                   ` Herbert Xu
@ 2007-08-11  4:38                   ` Valdis.Kletnieks
  1 sibling, 0 replies; 36+ messages in thread
From: Valdis.Kletnieks @ 2007-08-11  4:38 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Herbert Xu, paulmck, heiko.carstens, horms, linux-kernel, csnook,
	rpjday, netdev, ak, cfriesen, akpm, torvalds, jesper.juhl,
	linux-arch, zlynx, schwidefsky, davem, wensong, wjiang

[-- Attachment #1: Type: text/plain, Size: 686 bytes --]

On Sat, 11 Aug 2007 02:38:40 +0200, Segher Boessenkool said:
> >> That means GCC cannot compile Linux; it already optimises
> >> some accesses to scalars to smaller accesses when it knows
> >> it is allowed to.  Not often though, since it hardly ever
> >> helps in the cost model it employs.
> >
> > Please give an example code snippet + gcc version + arch
> > to back this up.
> 
> 	unsigned char f(unsigned long *p)
> 	{
> 	        return *p & 1;
> 	}

Not really valid, because it's still able to do one atomic access to
compute the result.

Now, if you had found an example where it converts a 32-bit atomic access into
2 separate 16-bit accesses that weren't atomic as a whole....

[-- Attachment #2: Type: application/pgp-signature, Size: 226 bytes --]

^ permalink raw reply	[flat|nested] 36+ messages in thread

end of thread, other threads:[~2007-08-11  4:38 UTC | newest]

Thread overview: 36+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-08-09 13:24 [PATCH 1/24] make atomic_read() behave consistently on alpha Chris Snook
2007-08-09 14:32 ` Paul E. McKenney
2007-08-09 14:53   ` Chris Snook
2007-08-09 15:04     ` Paul E. McKenney
2007-08-09 15:24       ` Chris Snook
2007-08-09 15:50         ` Segher Boessenkool
2007-08-09 16:20           ` Chris Snook
2007-08-09 18:38             ` Segher Boessenkool
2007-08-09 19:05               ` Chris Snook
2007-08-09 19:19                 ` Segher Boessenkool
2007-08-09 19:25                 ` Geert Uytterhoeven
2007-08-09 19:47                   ` Chris Snook
2007-08-09 23:02                     ` Segher Boessenkool
2007-08-09 16:10         ` Paul E. McKenney
2007-08-09 16:36           ` Chris Snook
2007-08-09 16:58             ` Paul E. McKenney
2007-08-09 17:14               ` Chris Snook
2007-08-09 17:41                 ` Paul E. McKenney
2007-08-09 18:13                   ` Chris Snook
2007-08-09 18:45                     ` Paul E. McKenney
2007-08-09 19:24                       ` Chris Snook
2007-08-10  1:28                         ` Paul E. McKenney
2007-08-10 19:49                           ` Chris Snook
2007-08-10 20:26                             ` Paul E. McKenney
2007-08-09 19:17                 ` Segher Boessenkool
2007-08-09 18:51             ` Segher Boessenkool
2007-08-09 19:30               ` Chris Snook
2007-08-10  8:21           ` Herbert Xu
2007-08-10  9:08             ` Andi Kleen
2007-08-10 15:02               ` Paul E. McKenney
2007-08-10 20:07             ` Segher Boessenkool
2007-08-11  0:00               ` Herbert Xu
2007-08-11  0:38                 ` Segher Boessenkool
2007-08-11  0:43                   ` Herbert Xu
2007-08-11  0:50                     ` Segher Boessenkool
2007-08-11  4:38                   ` Valdis.Kletnieks

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).