* [PATCH 1/24] make atomic_read() behave consistently on alpha @ 2007-08-09 13:24 Chris Snook 2007-08-09 14:32 ` Paul E. McKenney 0 siblings, 1 reply; 36+ messages in thread From: Chris Snook @ 2007-08-09 13:24 UTC (permalink / raw) To: linux-kernel, linux-arch, torvalds Cc: netdev, akpm, ak, heiko.carstens, davem, schwidefsky, wensong, horms, wjiang, cfriesen, zlynx, rpjday, jesper.juhl From: Chris Snook <csnook@redhat.com> Purify volatile use for atomic[64]_t on alpha. Signed-off-by: Chris Snook <csnook@redhat.com> --- linux-2.6.23-rc2-orig/include/asm-alpha/atomic.h 2007-07-08 19:32:17.000000000 -0400 +++ linux-2.6.23-rc2/include/asm-alpha/atomic.h 2007-08-09 09:19:00.000000000 -0400 @@ -14,18 +14,22 @@ /* - * Counter is volatile to make sure gcc doesn't try to be clever - * and move things around on us. We need to use _exactly_ the address - * the user gave us, not some alias that contains the same information. + * Make sure gcc doesn't try to be clever and move things around + * on us. We need to use _exactly_ the address the user gave us, + * not some alias that contains the same information. */ -typedef struct { volatile int counter; } atomic_t; -typedef struct { volatile long counter; } atomic64_t; +typedef struct { int counter; } atomic_t; +typedef struct { long counter; } atomic64_t; #define ATOMIC_INIT(i) ( (atomic_t) { (i) } ) #define ATOMIC64_INIT(i) ( (atomic64_t) { (i) } ) -#define atomic_read(v) ((v)->counter + 0) -#define atomic64_read(v) ((v)->counter + 0) +/* + * Casting to volatile here minimizes the need for barriers, + * without having to declare the type itself as volatile. + */ +#define atomic_read(v) (*(volatile int *)&(v)->counter + 0) +#define atomic64_read(v) (*(volatile long *)&(v)->counter + 0) #define atomic_set(v,i) ((v)->counter = (i)) #define atomic64_set(v,i) ((v)->counter = (i)) ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 1/24] make atomic_read() behave consistently on alpha 2007-08-09 13:24 [PATCH 1/24] make atomic_read() behave consistently on alpha Chris Snook @ 2007-08-09 14:32 ` Paul E. McKenney 2007-08-09 14:53 ` Chris Snook 0 siblings, 1 reply; 36+ messages in thread From: Paul E. McKenney @ 2007-08-09 14:32 UTC (permalink / raw) To: Chris Snook Cc: linux-kernel, linux-arch, torvalds, netdev, akpm, ak, heiko.carstens, davem, schwidefsky, wensong, horms, wjiang, cfriesen, zlynx, rpjday, jesper.juhl On Thu, Aug 09, 2007 at 09:24:42AM -0400, Chris Snook wrote: > From: Chris Snook <csnook@redhat.com> > > Purify volatile use for atomic[64]_t on alpha. Why not the same access-once semantics for atomic_set() as for atomic_read()? As this patch stands, it might introduce architecture-specific compiler-induced bugs due to the fact that atomic_set() used to imply volatile behavior but no longer does. See below for one approach to fixing this. > Signed-off-by: Chris Snook <csnook@redhat.com> > > --- linux-2.6.23-rc2-orig/include/asm-alpha/atomic.h 2007-07-08 19:32:17.000000000 -0400 > +++ linux-2.6.23-rc2/include/asm-alpha/atomic.h 2007-08-09 09:19:00.000000000 -0400 > @@ -14,18 +14,22 @@ > > > /* > - * Counter is volatile to make sure gcc doesn't try to be clever > - * and move things around on us. We need to use _exactly_ the address > - * the user gave us, not some alias that contains the same information. > + * Make sure gcc doesn't try to be clever and move things around > + * on us. We need to use _exactly_ the address the user gave us, > + * not some alias that contains the same information. > */ > -typedef struct { volatile int counter; } atomic_t; > -typedef struct { volatile long counter; } atomic64_t; > +typedef struct { int counter; } atomic_t; > +typedef struct { long counter; } atomic64_t; > > #define ATOMIC_INIT(i) ( (atomic_t) { (i) } ) > #define ATOMIC64_INIT(i) ( (atomic64_t) { (i) } ) > > -#define atomic_read(v) ((v)->counter + 0) > -#define atomic64_read(v) ((v)->counter + 0) > +/* > + * Casting to volatile here minimizes the need for barriers, > + * without having to declare the type itself as volatile. > + */ > +#define atomic_read(v) (*(volatile int *)&(v)->counter + 0) > +#define atomic64_read(v) (*(volatile long *)&(v)->counter + 0) > > #define atomic_set(v,i) ((v)->counter = (i)) > #define atomic64_set(v,i) ((v)->counter = (i)) How about: #define atomic_set(v,i) ((*(volatile int *)&(v)->counter) = (i)) #define atomic64_set(v,i) ((*(volatile long *)&(v)->counter) = (i)) Thanx, Paul > - > To unsubscribe from this list: send the line "unsubscribe netdev" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 1/24] make atomic_read() behave consistently on alpha 2007-08-09 14:32 ` Paul E. McKenney @ 2007-08-09 14:53 ` Chris Snook 2007-08-09 15:04 ` Paul E. McKenney 0 siblings, 1 reply; 36+ messages in thread From: Chris Snook @ 2007-08-09 14:53 UTC (permalink / raw) To: paulmck Cc: linux-kernel, linux-arch, torvalds, netdev, akpm, ak, heiko.carstens, davem, schwidefsky, wensong, horms, wjiang, cfriesen, zlynx, rpjday, jesper.juhl Paul E. McKenney wrote: > Why not the same access-once semantics for atomic_set() as > for atomic_read()? As this patch stands, it might introduce > architecture-specific compiler-induced bugs due to the fact that > atomic_set() used to imply volatile behavior but no longer does. When we make the volatile cast in atomic_read(), we're casting an rvalue to volatile. This unambiguously tells the compiler that we want to re-load that register from memory. What's "volatile behavior" for an lvalue? A write to an lvalue already implies an eventual write to memory, so this would be a no-op. Maybe you'll write to the register a few times before flushing it to memory, but it will happen eventually. With an rvalue, there's no guarantee that it will *ever* load from memory, which is what volatile fixes. I think what you have in mind is LOCK_PREFIX behavior, which is not the purpose of atomic_set. We use LOCK_PREFIX in the inline assembly for the atomic_* operations that read, modify, and write a value, only because it is necessary to perform that entire transaction atomically. -- Chris ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 1/24] make atomic_read() behave consistently on alpha 2007-08-09 14:53 ` Chris Snook @ 2007-08-09 15:04 ` Paul E. McKenney 2007-08-09 15:24 ` Chris Snook 0 siblings, 1 reply; 36+ messages in thread From: Paul E. McKenney @ 2007-08-09 15:04 UTC (permalink / raw) To: Chris Snook Cc: linux-kernel, linux-arch, torvalds, netdev, akpm, ak, heiko.carstens, davem, schwidefsky, wensong, horms, wjiang, cfriesen, zlynx, rpjday, jesper.juhl On Thu, Aug 09, 2007 at 10:53:14AM -0400, Chris Snook wrote: > Paul E. McKenney wrote: > >Why not the same access-once semantics for atomic_set() as > >for atomic_read()? As this patch stands, it might introduce > >architecture-specific compiler-induced bugs due to the fact that > >atomic_set() used to imply volatile behavior but no longer does. > > When we make the volatile cast in atomic_read(), we're casting an rvalue to > volatile. This unambiguously tells the compiler that we want to re-load > that register from memory. What's "volatile behavior" for an lvalue? I was absolutely -not- suggesting volatile behavior for lvalues. Instead, I am asking for volatile behavior from an -rvalue-. In the case of atomic_read(), it is the atomic_t being read from. In the case of atomic_set(), it is the atomic_t being written to. As suggested in my previous email: #define atomic_set(v,i) ((*(volatile int *)&(v)->counter) = (i)) #define atomic64_set(v,i) ((*(volatile long *)&(v)->counter) = (i)) Again, the architectures that used to have their "counter" declared as volatile will lose volatile semantics on atomic_set() with your patch, which might result in bugs due to overly imaginative compiler optimizations. The above would prevent any such bugs from appearing. > A > write to an lvalue already implies an eventual write to memory, so this > would be a no-op. Maybe you'll write to the register a few times before > flushing it to memory, but it will happen eventually. With an rvalue, > there's no guarantee that it will *ever* load from memory, which is what > volatile fixes. > > I think what you have in mind is LOCK_PREFIX behavior, which is not the > purpose of atomic_set. We use LOCK_PREFIX in the inline assembly for the > atomic_* operations that read, modify, and write a value, only because it > is necessary to perform that entire transaction atomically. No LOCK_PREFIX, thank you!!! I just want to make sure that the compiler doesn't push the store down out of a loop, split the store, allow the store to happen twice (e.g., to allow different code paths to be merged), and all the other tricks that the C standard permits compilers to pull. Thanx, Paul ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 1/24] make atomic_read() behave consistently on alpha 2007-08-09 15:04 ` Paul E. McKenney @ 2007-08-09 15:24 ` Chris Snook 2007-08-09 15:50 ` Segher Boessenkool 2007-08-09 16:10 ` Paul E. McKenney 0 siblings, 2 replies; 36+ messages in thread From: Chris Snook @ 2007-08-09 15:24 UTC (permalink / raw) To: paulmck Cc: linux-kernel, linux-arch, torvalds, netdev, akpm, ak, heiko.carstens, davem, schwidefsky, wensong, horms, wjiang, cfriesen, zlynx, rpjday, jesper.juhl Paul E. McKenney wrote: > On Thu, Aug 09, 2007 at 10:53:14AM -0400, Chris Snook wrote: >> Paul E. McKenney wrote: >>> Why not the same access-once semantics for atomic_set() as >>> for atomic_read()? As this patch stands, it might introduce >>> architecture-specific compiler-induced bugs due to the fact that >>> atomic_set() used to imply volatile behavior but no longer does. >> When we make the volatile cast in atomic_read(), we're casting an rvalue to >> volatile. This unambiguously tells the compiler that we want to re-load >> that register from memory. What's "volatile behavior" for an lvalue? > > I was absolutely -not- suggesting volatile behavior for lvalues. > > Instead, I am asking for volatile behavior from an -rvalue-. In the > case of atomic_read(), it is the atomic_t being read from. In the case > of atomic_set(), it is the atomic_t being written to. As suggested in > my previous email: > > #define atomic_set(v,i) ((*(volatile int *)&(v)->counter) = (i)) > #define atomic64_set(v,i) ((*(volatile long *)&(v)->counter) = (i)) That looks like a volatile lvalue to me. I confess I didn't exactly ace compilers. Care to explain this? > Again, the architectures that used to have their "counter" declared > as volatile will lose volatile semantics on atomic_set() with your > patch, which might result in bugs due to overly imaginative compiler > optimizations. The above would prevent any such bugs from appearing. > >> A >> write to an lvalue already implies an eventual write to memory, so this >> would be a no-op. Maybe you'll write to the register a few times before >> flushing it to memory, but it will happen eventually. With an rvalue, >> there's no guarantee that it will *ever* load from memory, which is what >> volatile fixes. >> >> I think what you have in mind is LOCK_PREFIX behavior, which is not the >> purpose of atomic_set. We use LOCK_PREFIX in the inline assembly for the >> atomic_* operations that read, modify, and write a value, only because it >> is necessary to perform that entire transaction atomically. > > No LOCK_PREFIX, thank you!!! I just want to make sure that the compiler > doesn't push the store down out of a loop, split the store, allow the > store to happen twice (e.g., to allow different code paths to be merged), > and all the other tricks that the C standard permits compilers to pull. We can't have split stores because we don't use atomic64_t on 32-bit architectures. volatile won't save you from pushing stores out of loops unless you're also doing reads. This is why we use reads to flush writes to mmio registers. In this case, an atomic_read() with volatile in it will suffice. Storing twice is perfectly legal here, though it's unlikely that an optimizing compiler smart enough to create the problem we're addressing would ever do that. -- Chris ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 1/24] make atomic_read() behave consistently on alpha 2007-08-09 15:24 ` Chris Snook @ 2007-08-09 15:50 ` Segher Boessenkool 2007-08-09 16:20 ` Chris Snook 2007-08-09 16:10 ` Paul E. McKenney 1 sibling, 1 reply; 36+ messages in thread From: Segher Boessenkool @ 2007-08-09 15:50 UTC (permalink / raw) To: Chris Snook Cc: wjiang, rpjday, wensong, heiko.carstens, linux-kernel, ak, netdev, paulmck, horms, akpm, linux-arch, jesper.juhl, torvalds, schwidefsky, davem, cfriesen, zlynx > We can't have split stores because we don't use atomic64_t on 32-bit > architectures. That's not true; the compiler is free to split all stores (and reads) from memory however it wants. It is debatable whether "volatile" would prevent this as well, certainly it is unsafe if you want to be portable. GCC will do its best to not split volatile memory accesses, but bugs in this area do happen a lot (because the compiler code for volatile isn't as well exercised as most other compiler code, and because it is simply a hard subject; and there is no real formalised model for what GCC should do). The only safe way to get atomic accesses is to write assembler code. Are there any downsides to that? I don't see any. Segher ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 1/24] make atomic_read() behave consistently on alpha 2007-08-09 15:50 ` Segher Boessenkool @ 2007-08-09 16:20 ` Chris Snook 2007-08-09 18:38 ` Segher Boessenkool 0 siblings, 1 reply; 36+ messages in thread From: Chris Snook @ 2007-08-09 16:20 UTC (permalink / raw) To: Segher Boessenkool Cc: wjiang, rpjday, wensong, heiko.carstens, linux-kernel, ak, netdev, paulmck, horms, akpm, linux-arch, jesper.juhl, torvalds, schwidefsky, davem, cfriesen, zlynx Segher Boessenkool wrote: >> We can't have split stores because we don't use atomic64_t on 32-bit >> architectures. > > That's not true; the compiler is free to split all stores > (and reads) from memory however it wants. It is debatable > whether "volatile" would prevent this as well, certainly > it is unsafe if you want to be portable. GCC will do its > best to not split volatile memory accesses, but bugs in > this area do happen a lot (because the compiler code for > volatile isn't as well exercised as most other compiler > code, and because it is simply a hard subject; and there > is no real formalised model for what GCC should do). > > The only safe way to get atomic accesses is to write > assembler code. Are there any downsides to that? I don't > see any. The assumption that aligned word reads and writes are atomic, and that words are aligned unless explicitly packed otherwise, is endemic in the kernel. No sane compiler violates this assumption. It's true that we're not portable to insane compilers after this patch, but we never were in the first place. -- Chris ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 1/24] make atomic_read() behave consistently on alpha 2007-08-09 16:20 ` Chris Snook @ 2007-08-09 18:38 ` Segher Boessenkool 2007-08-09 19:05 ` Chris Snook 0 siblings, 1 reply; 36+ messages in thread From: Segher Boessenkool @ 2007-08-09 18:38 UTC (permalink / raw) To: Chris Snook Cc: wjiang, cfriesen, wensong, heiko.carstens, linux-kernel, ak, netdev, paulmck, horms, akpm, linux-arch, jesper.juhl, torvalds, zlynx, rpjday, schwidefsky, davem >> The only safe way to get atomic accesses is to write >> assembler code. Are there any downsides to that? I don't >> see any. > > The assumption that aligned word reads and writes are atomic, and that > words are aligned unless explicitly packed otherwise, is endemic in > the kernel. No sane compiler violates this assumption. It's true > that we're not portable to insane compilers after this patch, but we > never were in the first place. You didn't answer my question: are there any downsides to using explicit coded-in-assembler accesses for atomic accesses? You can handwave all you want that it should "just work" with volatile accesses, but volatility != atomicity, volatile in C is really badly defined, GCC never officially gave stronger guarantees, and we have a bugzilla full of PRs to show what a minefield it is. So, why not use the well-defined alternative? Segher ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 1/24] make atomic_read() behave consistently on alpha 2007-08-09 18:38 ` Segher Boessenkool @ 2007-08-09 19:05 ` Chris Snook 2007-08-09 19:19 ` Segher Boessenkool 2007-08-09 19:25 ` Geert Uytterhoeven 0 siblings, 2 replies; 36+ messages in thread From: Chris Snook @ 2007-08-09 19:05 UTC (permalink / raw) To: Segher Boessenkool Cc: wjiang, cfriesen, wensong, heiko.carstens, linux-kernel, ak, netdev, paulmck, horms, akpm, linux-arch, jesper.juhl, torvalds, zlynx, rpjday, schwidefsky, davem Segher Boessenkool wrote: >>> The only safe way to get atomic accesses is to write >>> assembler code. Are there any downsides to that? I don't >>> see any. >> >> The assumption that aligned word reads and writes are atomic, and that >> words are aligned unless explicitly packed otherwise, is endemic in >> the kernel. No sane compiler violates this assumption. It's true >> that we're not portable to insane compilers after this patch, but we >> never were in the first place. > > You didn't answer my question: are there any downsides to using > explicit coded-in-assembler accesses for atomic accesses? You > can handwave all you want that it should "just work" with > volatile accesses, but volatility != atomicity, volatile in C > is really badly defined, GCC never officially gave stronger > guarantees, and we have a bugzilla full of PRs to show what a > minefield it is. > > So, why not use the well-defined alternative? Because we don't need to, and it hurts performance. ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 1/24] make atomic_read() behave consistently on alpha 2007-08-09 19:05 ` Chris Snook @ 2007-08-09 19:19 ` Segher Boessenkool 2007-08-09 19:25 ` Geert Uytterhoeven 1 sibling, 0 replies; 36+ messages in thread From: Segher Boessenkool @ 2007-08-09 19:19 UTC (permalink / raw) To: Chris Snook Cc: wjiang, schwidefsky, wensong, heiko.carstens, linux-kernel, ak, cfriesen, netdev, paulmck, horms, akpm, linux-arch, jesper.juhl, davem, torvalds, zlynx, rpjday >>>> The only safe way to get atomic accesses is to write >>>> assembler code. Are there any downsides to that? I don't >>>> see any. >>> >>> The assumption that aligned word reads and writes are atomic, and >>> that words are aligned unless explicitly packed otherwise, is >>> endemic in the kernel. No sane compiler violates this assumption. >>> It's true that we're not portable to insane compilers after this >>> patch, but we never were in the first place. >> You didn't answer my question: are there any downsides to using >> explicit coded-in-assembler accesses for atomic accesses? You >> can handwave all you want that it should "just work" with >> volatile accesses, but volatility != atomicity, volatile in C >> is really badly defined, GCC never officially gave stronger >> guarantees, and we have a bugzilla full of PRs to show what a >> minefield it is. >> So, why not use the well-defined alternative? > > Because we don't need to, You don't need to use volatile objects, or accesses through valatile-cast pointers, either. > and it hurts performance. Please show how it does this -- one load is one load either way, and one store is one store. Segher ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 1/24] make atomic_read() behave consistently on alpha 2007-08-09 19:05 ` Chris Snook 2007-08-09 19:19 ` Segher Boessenkool @ 2007-08-09 19:25 ` Geert Uytterhoeven 2007-08-09 19:47 ` Chris Snook 1 sibling, 1 reply; 36+ messages in thread From: Geert Uytterhoeven @ 2007-08-09 19:25 UTC (permalink / raw) To: Chris Snook Cc: Segher Boessenkool, wjiang, cfriesen, wensong, heiko.carstens, linux-kernel, ak, netdev, paulmck, horms, akpm, linux-arch, jesper.juhl, torvalds, zlynx, rpjday, schwidefsky, davem On Thu, 9 Aug 2007, Chris Snook wrote: > Segher Boessenkool wrote: > > > > The only safe way to get atomic accesses is to write > > > > assembler code. Are there any downsides to that? I don't > > > > see any. > > > > > > The assumption that aligned word reads and writes are atomic, and that > > > words are aligned unless explicitly packed otherwise, is endemic in the > > > kernel. No sane compiler violates this assumption. It's true that we're > > > not portable to insane compilers after this patch, but we never were in > > > the first place. > > > > You didn't answer my question: are there any downsides to using > > explicit coded-in-assembler accesses for atomic accesses? You > > can handwave all you want that it should "just work" with > > volatile accesses, but volatility != atomicity, volatile in C > > is really badly defined, GCC never officially gave stronger > > guarantees, and we have a bugzilla full of PRs to show what a > > minefield it is. > > > > So, why not use the well-defined alternative? > > Because we don't need to, and it hurts performance. It hurts performance by implementing 32-bit atomic reads in assembler? Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 1/24] make atomic_read() behave consistently on alpha 2007-08-09 19:25 ` Geert Uytterhoeven @ 2007-08-09 19:47 ` Chris Snook 2007-08-09 23:02 ` Segher Boessenkool 0 siblings, 1 reply; 36+ messages in thread From: Chris Snook @ 2007-08-09 19:47 UTC (permalink / raw) To: Geert Uytterhoeven Cc: Segher Boessenkool, wjiang, cfriesen, wensong, heiko.carstens, linux-kernel, ak, netdev, paulmck, horms, akpm, linux-arch, jesper.juhl, torvalds, zlynx, rpjday, schwidefsky, davem Geert Uytterhoeven wrote: > On Thu, 9 Aug 2007, Chris Snook wrote: >> Segher Boessenkool wrote: >>>>> The only safe way to get atomic accesses is to write >>>>> assembler code. Are there any downsides to that? I don't >>>>> see any. >>>> The assumption that aligned word reads and writes are atomic, and that >>>> words are aligned unless explicitly packed otherwise, is endemic in the >>>> kernel. No sane compiler violates this assumption. It's true that we're >>>> not portable to insane compilers after this patch, but we never were in >>>> the first place. >>> You didn't answer my question: are there any downsides to using >>> explicit coded-in-assembler accesses for atomic accesses? You >>> can handwave all you want that it should "just work" with >>> volatile accesses, but volatility != atomicity, volatile in C >>> is really badly defined, GCC never officially gave stronger >>> guarantees, and we have a bugzilla full of PRs to show what a >>> minefield it is. >>> >>> So, why not use the well-defined alternative? >> Because we don't need to, and it hurts performance. > > It hurts performance by implementing 32-bit atomic reads in assembler? No, I misunderstood the question. Implementing 32-bit atomic reads in assembler is redundant, because any sane compiler, *particularly* and optimizing compiler (and we're only in this mess because of optimizing compilers) will give us that automatically without the assembler. Yes, it is legal for a compiler to violate this assumption. It is also legal for us to refuse to maintain compatibility with compilers that suck this badly. That decision was made a very long time ago, and I consider it the correct decision. -- Chris ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 1/24] make atomic_read() behave consistently on alpha 2007-08-09 19:47 ` Chris Snook @ 2007-08-09 23:02 ` Segher Boessenkool 0 siblings, 0 replies; 36+ messages in thread From: Segher Boessenkool @ 2007-08-09 23:02 UTC (permalink / raw) To: Chris Snook Cc: paulmck, heiko.carstens, horms, linux-kernel, rpjday, ak, netdev, cfriesen, akpm, torvalds, jesper.juhl, Geert Uytterhoeven, linux-arch, zlynx, schwidefsky, davem, wensong, wjiang >>>> So, why not use the well-defined alternative? >>> Because we don't need to, and it hurts performance. >> It hurts performance by implementing 32-bit atomic reads in assembler? > > No, I misunderstood the question. Implementing 32-bit atomic reads in > assembler is redundant, because any sane compiler, *particularly* and > optimizing compiler (and we're only in this mess because of optimizing > compilers) Oh please, don't tell me you don't want an optimising compiler. And if you _do_ want one, well you're in this mess because you chose C as implementation language and C has some pretty strange rules. Trying to use not-all-that-well-defined-and-completely- misunderstood features of the language doesn't make things easier; trying to use something that isn't even part of the language and that your particular compiler originally supported by accident, and that isn't yet an officially supported feature, and that on top of it all has a track record of problems -- well it makes me wonder if you're in this game for fun or what. > will give us that automatically without the assembler. No, it does *not* give it to you automatically; you have to do either the asm() thing, or the not-defined-at-all *(volatile *)& thing. > Yes, it is legal for a compiler to violate this assumption. It is > also legal for us to refuse to maintain compatibility with compilers > that suck this badly. So that's rm include/linux/compiler-gcc*.h then. Good luck with the intel compiler, maybe it works more to your liking. Segher ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 1/24] make atomic_read() behave consistently on alpha 2007-08-09 15:24 ` Chris Snook 2007-08-09 15:50 ` Segher Boessenkool @ 2007-08-09 16:10 ` Paul E. McKenney 2007-08-09 16:36 ` Chris Snook 2007-08-10 8:21 ` Herbert Xu 1 sibling, 2 replies; 36+ messages in thread From: Paul E. McKenney @ 2007-08-09 16:10 UTC (permalink / raw) To: Chris Snook Cc: linux-kernel, linux-arch, torvalds, netdev, akpm, ak, heiko.carstens, davem, schwidefsky, wensong, horms, wjiang, cfriesen, zlynx, rpjday, jesper.juhl On Thu, Aug 09, 2007 at 11:24:22AM -0400, Chris Snook wrote: > Paul E. McKenney wrote: > >On Thu, Aug 09, 2007 at 10:53:14AM -0400, Chris Snook wrote: > >>Paul E. McKenney wrote: > >>>Why not the same access-once semantics for atomic_set() as > >>>for atomic_read()? As this patch stands, it might introduce > >>>architecture-specific compiler-induced bugs due to the fact that > >>>atomic_set() used to imply volatile behavior but no longer does. > >>When we make the volatile cast in atomic_read(), we're casting an rvalue > >>to volatile. This unambiguously tells the compiler that we want to > >>re-load that register from memory. What's "volatile behavior" for an > >>lvalue? > > > >I was absolutely -not- suggesting volatile behavior for lvalues. > > > >Instead, I am asking for volatile behavior from an -rvalue-. In the > >case of atomic_read(), it is the atomic_t being read from. In the case > >of atomic_set(), it is the atomic_t being written to. As suggested in > >my previous email: > > > >#define atomic_set(v,i) ((*(volatile int *)&(v)->counter) = > >(i)) > >#define atomic64_set(v,i) ((*(volatile long *)&(v)->counter) = (i)) > > That looks like a volatile lvalue to me. I confess I didn't exactly ace > compilers. Care to explain this? OK, so I am dylexic this morning. Never could tell left from right... This is indeed an lvalue, as is any non-function expression that you can apply the "&" prefix operator to. Including the expression in your proposed definitions of atomic_set() and atomic_set64(). An lvalue is any expression that -could- appear on the left-hand side of an assignment operator, regardless of where it actually appears. More precisely, an lvalue is an expression that refers to a variable in such a way that the variable might be both loaded from and stored to, ignoring things like "const" for the moment. Because "(v)->counter" could be either loaded from or stored to, it is an lvalue, which is why the compiler didn't scream bloody murder at you when you took the address of it. > >Again, the architectures that used to have their "counter" declared > >as volatile will lose volatile semantics on atomic_set() with your > >patch, which might result in bugs due to overly imaginative compiler > >optimizations. The above would prevent any such bugs from appearing. > > > >> A > >>write to an lvalue already implies an eventual write to memory, so this > >>would be a no-op. Maybe you'll write to the register a few times before > >>flushing it to memory, but it will happen eventually. With an rvalue, > >>there's no guarantee that it will *ever* load from memory, which is what > >>volatile fixes. > >> > >>I think what you have in mind is LOCK_PREFIX behavior, which is not the > >>purpose of atomic_set. We use LOCK_PREFIX in the inline assembly for the > >>atomic_* operations that read, modify, and write a value, only because it > >>is necessary to perform that entire transaction atomically. > > > >No LOCK_PREFIX, thank you!!! I just want to make sure that the compiler > >doesn't push the store down out of a loop, split the store, allow the > >store to happen twice (e.g., to allow different code paths to be merged), > >and all the other tricks that the C standard permits compilers to pull. > > We can't have split stores because we don't use atomic64_t on 32-bit > architectures. volatile won't save you from pushing stores out of loops > unless you're also doing reads. This is why we use reads to flush writes > to mmio registers. In this case, an atomic_read() with volatile in it will > suffice. Storing twice is perfectly legal here, though it's unlikely that > an optimizing compiler smart enough to create the problem we're addressing > would ever do that. The compiler is within its rights to read a 32-bit quantity 16 bits at at time, even on a 32-bit machine. I would be glad to help pummel any compiler writer that pulls such a dirty trick, but the C standard really does permit this. Use of volatile does in fact save you from the compiler pushing stores out of loops regardless of whether you are also doing reads. The C standard has the notion of sequence points, which occur at various places including the ends of statements and the control expressions for "if" and "while" statements. The compiler is not permitted to move volatile references across a sequence point. Therefore, the compiler is not allowed to push a volatile store out of a loop. Now the CPU might well do such a reordering, but that is a separate issue to be dealt with via memory barriers. Note that it is the CPU and I/O system, not the compiler, that is forcing you to use reads to flush writes to MMIO registers. And you would be amazed at what compiler writers will do in order to get an additional fraction of a percent out of SpecCPU... In short, please retain atomic_set()'s volatility, especially on those architectures that declared the atomic_t's counter to be volatile. Thanx, Paul ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 1/24] make atomic_read() behave consistently on alpha 2007-08-09 16:10 ` Paul E. McKenney @ 2007-08-09 16:36 ` Chris Snook 2007-08-09 16:58 ` Paul E. McKenney 2007-08-09 18:51 ` Segher Boessenkool 2007-08-10 8:21 ` Herbert Xu 1 sibling, 2 replies; 36+ messages in thread From: Chris Snook @ 2007-08-09 16:36 UTC (permalink / raw) To: paulmck Cc: linux-kernel, linux-arch, torvalds, netdev, akpm, ak, heiko.carstens, davem, schwidefsky, wensong, horms, wjiang, cfriesen, zlynx, rpjday, jesper.juhl Paul E. McKenney wrote: > The compiler is within its rights to read a 32-bit quantity 16 bits at > at time, even on a 32-bit machine. I would be glad to help pummel any > compiler writer that pulls such a dirty trick, but the C standard really > does permit this. Yes, but we don't write code for these compilers. There are countless pieces of kernel code which would break in this condition, and there doesn't seem to be any interest in fixing this. > Use of volatile does in fact save you from the compiler pushing stores out > of loops regardless of whether you are also doing reads. The C standard > has the notion of sequence points, which occur at various places including > the ends of statements and the control expressions for "if" and "while" > statements. The compiler is not permitted to move volatile references > across a sequence point. Therefore, the compiler is not allowed to > push a volatile store out of a loop. Now the CPU might well do such a > reordering, but that is a separate issue to be dealt with via memory > barriers. Note that it is the CPU and I/O system, not the compiler, > that is forcing you to use reads to flush writes to MMIO registers. Sequence points enforce read-after-write ordering, not write-after-write. We flush writes with reads for MMIO because of this effect as well as the CPU/bus effects. > And you would be amazed at what compiler writers will do in order to > get an additional fraction of a percent out of SpecCPU... Probably not :) > In short, please retain atomic_set()'s volatility, especially on those > architectures that declared the atomic_t's counter to be volatile. Like i386 and x86_64? These used to have volatile in the atomic_t declaration. We removed it, and the sky did not fall. -- Chris ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 1/24] make atomic_read() behave consistently on alpha 2007-08-09 16:36 ` Chris Snook @ 2007-08-09 16:58 ` Paul E. McKenney 2007-08-09 17:14 ` Chris Snook 2007-08-09 18:51 ` Segher Boessenkool 1 sibling, 1 reply; 36+ messages in thread From: Paul E. McKenney @ 2007-08-09 16:58 UTC (permalink / raw) To: Chris Snook Cc: linux-kernel, linux-arch, torvalds, netdev, akpm, ak, heiko.carstens, davem, schwidefsky, wensong, horms, wjiang, cfriesen, zlynx, rpjday, jesper.juhl On Thu, Aug 09, 2007 at 12:36:17PM -0400, Chris Snook wrote: > Paul E. McKenney wrote: > >The compiler is within its rights to read a 32-bit quantity 16 bits at > >at time, even on a 32-bit machine. I would be glad to help pummel any > >compiler writer that pulls such a dirty trick, but the C standard really > >does permit this. > > Yes, but we don't write code for these compilers. There are countless > pieces of kernel code which would break in this condition, and there > doesn't seem to be any interest in fixing this. > > >Use of volatile does in fact save you from the compiler pushing stores out > >of loops regardless of whether you are also doing reads. The C standard > >has the notion of sequence points, which occur at various places including > >the ends of statements and the control expressions for "if" and "while" > >statements. The compiler is not permitted to move volatile references > >across a sequence point. Therefore, the compiler is not allowed to > >push a volatile store out of a loop. Now the CPU might well do such a > >reordering, but that is a separate issue to be dealt with via memory > >barriers. Note that it is the CPU and I/O system, not the compiler, > >that is forcing you to use reads to flush writes to MMIO registers. > > Sequence points enforce read-after-write ordering, not write-after-write. > We flush writes with reads for MMIO because of this effect as well as the > CPU/bus effects. Neither volatile reads nor volatile writes may be moved across sequence points. > >And you would be amazed at what compiler writers will do in order to > >get an additional fraction of a percent out of SpecCPU... > > Probably not :) > > >In short, please retain atomic_set()'s volatility, especially on those > >architectures that declared the atomic_t's counter to be volatile. > > Like i386 and x86_64? These used to have volatile in the atomic_t > declaration. We removed it, and the sky did not fall. Interesting. You tested all possible configs on all possible hardware? Thanx, Paul ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 1/24] make atomic_read() behave consistently on alpha 2007-08-09 16:58 ` Paul E. McKenney @ 2007-08-09 17:14 ` Chris Snook 2007-08-09 17:41 ` Paul E. McKenney 2007-08-09 19:17 ` Segher Boessenkool 0 siblings, 2 replies; 36+ messages in thread From: Chris Snook @ 2007-08-09 17:14 UTC (permalink / raw) To: paulmck Cc: linux-kernel, linux-arch, torvalds, netdev, akpm, ak, heiko.carstens, davem, schwidefsky, wensong, horms, wjiang, cfriesen, zlynx, rpjday, jesper.juhl Paul E. McKenney wrote: > On Thu, Aug 09, 2007 at 12:36:17PM -0400, Chris Snook wrote: >> Paul E. McKenney wrote: >>> The compiler is within its rights to read a 32-bit quantity 16 bits at >>> at time, even on a 32-bit machine. I would be glad to help pummel any >>> compiler writer that pulls such a dirty trick, but the C standard really >>> does permit this. >> Yes, but we don't write code for these compilers. There are countless >> pieces of kernel code which would break in this condition, and there >> doesn't seem to be any interest in fixing this. >> >>> Use of volatile does in fact save you from the compiler pushing stores out >>> of loops regardless of whether you are also doing reads. The C standard >>> has the notion of sequence points, which occur at various places including >>> the ends of statements and the control expressions for "if" and "while" >>> statements. The compiler is not permitted to move volatile references >>> across a sequence point. Therefore, the compiler is not allowed to >>> push a volatile store out of a loop. Now the CPU might well do such a >>> reordering, but that is a separate issue to be dealt with via memory >>> barriers. Note that it is the CPU and I/O system, not the compiler, >>> that is forcing you to use reads to flush writes to MMIO registers. >> Sequence points enforce read-after-write ordering, not write-after-write. >> We flush writes with reads for MMIO because of this effect as well as the >> CPU/bus effects. > > Neither volatile reads nor volatile writes may be moved across sequence > points. By the compiler, or by the CPU? If you're depending on volatile writes being visible to other CPUs, you're screwed either way, because the CPU can hold that data in cache as long as it wants before it writes it to memory. When this finally does happen, it will happen atomically, which is all that atomic_set guarantees. If you need to guarantee that the value is written to memory at a particular time in your execution sequence, you either have to read it from memory to force the compiler to store it first (and a volatile cast in atomic_read will suffice for this) or you have to use LOCK_PREFIX instructions which will invalidate remote cache lines containing the same variable. This patch doesn't change either of these cases. >>> And you would be amazed at what compiler writers will do in order to >>> get an additional fraction of a percent out of SpecCPU... >> Probably not :) >> >>> In short, please retain atomic_set()'s volatility, especially on those >>> architectures that declared the atomic_t's counter to be volatile. >> Like i386 and x86_64? These used to have volatile in the atomic_t >> declaration. We removed it, and the sky did not fall. > > Interesting. You tested all possible configs on all possible hardware? No, but I can reason about it and be confident that this won't break anything that isn't already broken. At worst, this patch will make any existing subtly incorrect uses of atomic_t much more obvious and easier to track down. -- Chris ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 1/24] make atomic_read() behave consistently on alpha 2007-08-09 17:14 ` Chris Snook @ 2007-08-09 17:41 ` Paul E. McKenney 2007-08-09 18:13 ` Chris Snook 2007-08-09 19:17 ` Segher Boessenkool 1 sibling, 1 reply; 36+ messages in thread From: Paul E. McKenney @ 2007-08-09 17:41 UTC (permalink / raw) To: Chris Snook Cc: linux-kernel, linux-arch, torvalds, netdev, akpm, ak, heiko.carstens, davem, schwidefsky, wensong, horms, wjiang, cfriesen, zlynx, rpjday, jesper.juhl On Thu, Aug 09, 2007 at 01:14:35PM -0400, Chris Snook wrote: > Paul E. McKenney wrote: > >On Thu, Aug 09, 2007 at 12:36:17PM -0400, Chris Snook wrote: > >>Paul E. McKenney wrote: > >>>The compiler is within its rights to read a 32-bit quantity 16 bits at > >>>at time, even on a 32-bit machine. I would be glad to help pummel any > >>>compiler writer that pulls such a dirty trick, but the C standard really > >>>does permit this. > >>Yes, but we don't write code for these compilers. There are countless > >>pieces of kernel code which would break in this condition, and there > >>doesn't seem to be any interest in fixing this. > >> > >>>Use of volatile does in fact save you from the compiler pushing stores > >>>out > >>>of loops regardless of whether you are also doing reads. The C standard > >>>has the notion of sequence points, which occur at various places > >>>including > >>>the ends of statements and the control expressions for "if" and "while" > >>>statements. The compiler is not permitted to move volatile references > >>>across a sequence point. Therefore, the compiler is not allowed to > >>>push a volatile store out of a loop. Now the CPU might well do such a > >>>reordering, but that is a separate issue to be dealt with via memory > >>>barriers. Note that it is the CPU and I/O system, not the compiler, > >>>that is forcing you to use reads to flush writes to MMIO registers. > >>Sequence points enforce read-after-write ordering, not write-after-write. > >>We flush writes with reads for MMIO because of this effect as well as the > >>CPU/bus effects. > > > >Neither volatile reads nor volatile writes may be moved across sequence > >points. > > By the compiler, or by the CPU? As mentioned in earlier emails, by the compiler. The CPU knows nothing of C sequence points. > If you're depending on volatile writes > being visible to other CPUs, you're screwed either way, because the CPU can > hold that data in cache as long as it wants before it writes it to memory. > When this finally does happen, it will happen atomically, which is all that > atomic_set guarantees. If you need to guarantee that the value is written > to memory at a particular time in your execution sequence, you either have > to read it from memory to force the compiler to store it first (and a > volatile cast in atomic_read will suffice for this) or you have to use > LOCK_PREFIX instructions which will invalidate remote cache lines > containing the same variable. This patch doesn't change either of these > cases. The case that it -can- change is interactions with interrupt handlers. And NMI/SMI handlers, for that matter. > >>>And you would be amazed at what compiler writers will do in order to > >>>get an additional fraction of a percent out of SpecCPU... > >>Probably not :) > >> > >>>In short, please retain atomic_set()'s volatility, especially on those > >>>architectures that declared the atomic_t's counter to be volatile. > >>Like i386 and x86_64? These used to have volatile in the atomic_t > >>declaration. We removed it, and the sky did not fall. > > > >Interesting. You tested all possible configs on all possible hardware? > > No, but I can reason about it and be confident that this won't break > anything that isn't already broken. At worst, this patch will make any > existing subtly incorrect uses of atomic_t much more obvious and easier to > track down. You took interrupt and NMI/SMI handlers into account? Thanx, Paul ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 1/24] make atomic_read() behave consistently on alpha 2007-08-09 17:41 ` Paul E. McKenney @ 2007-08-09 18:13 ` Chris Snook 2007-08-09 18:45 ` Paul E. McKenney 0 siblings, 1 reply; 36+ messages in thread From: Chris Snook @ 2007-08-09 18:13 UTC (permalink / raw) To: paulmck Cc: linux-kernel, linux-arch, torvalds, netdev, akpm, ak, heiko.carstens, davem, schwidefsky, wensong, horms, wjiang, cfriesen, zlynx, rpjday, jesper.juhl Paul E. McKenney wrote: > On Thu, Aug 09, 2007 at 01:14:35PM -0400, Chris Snook wrote: >> If you're depending on volatile writes >> being visible to other CPUs, you're screwed either way, because the CPU can >> hold that data in cache as long as it wants before it writes it to memory. >> When this finally does happen, it will happen atomically, which is all that >> atomic_set guarantees. If you need to guarantee that the value is written >> to memory at a particular time in your execution sequence, you either have >> to read it from memory to force the compiler to store it first (and a >> volatile cast in atomic_read will suffice for this) or you have to use >> LOCK_PREFIX instructions which will invalidate remote cache lines >> containing the same variable. This patch doesn't change either of these >> cases. > > The case that it -can- change is interactions with interrupt handlers. > And NMI/SMI handlers, for that matter. You have a point here, but only if you can guarantee that the interrupt handler is running on a processor sharing the cache that has the not-yet-written volatile value. That implies a strictly non-SMP architecture. At the moment, none of those have volatile in their declaration of atomic_t, so this patch can't break any of them. -- Chris ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 1/24] make atomic_read() behave consistently on alpha 2007-08-09 18:13 ` Chris Snook @ 2007-08-09 18:45 ` Paul E. McKenney 2007-08-09 19:24 ` Chris Snook 0 siblings, 1 reply; 36+ messages in thread From: Paul E. McKenney @ 2007-08-09 18:45 UTC (permalink / raw) To: Chris Snook Cc: linux-kernel, linux-arch, torvalds, netdev, akpm, ak, heiko.carstens, davem, schwidefsky, wensong, horms, wjiang, cfriesen, zlynx, rpjday, jesper.juhl On Thu, Aug 09, 2007 at 02:13:52PM -0400, Chris Snook wrote: > Paul E. McKenney wrote: > >On Thu, Aug 09, 2007 at 01:14:35PM -0400, Chris Snook wrote: > >> If you're depending on volatile writes > >>being visible to other CPUs, you're screwed either way, because the CPU > >>can hold that data in cache as long as it wants before it writes it to > >>memory. When this finally does happen, it will happen atomically, which > >>is all that atomic_set guarantees. If you need to guarantee that the > >>value is written to memory at a particular time in your execution > >>sequence, you either have to read it from memory to force the compiler to > >>store it first (and a volatile cast in atomic_read will suffice for this) > >>or you have to use LOCK_PREFIX instructions which will invalidate remote > >>cache lines containing the same variable. This patch doesn't change > >>either of these cases. > > > >The case that it -can- change is interactions with interrupt handlers. > >And NMI/SMI handlers, for that matter. > > You have a point here, but only if you can guarantee that the interrupt > handler is running on a processor sharing the cache that has the > not-yet-written volatile value. That implies a strictly non-SMP > architecture. At the moment, none of those have volatile in their > declaration of atomic_t, so this patch can't break any of them. This can also happen when using per-CPU variables. And there are a number of per-CPU variables that are either atomic themselves or are structures containing atomic fields. Thanx, Paul ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 1/24] make atomic_read() behave consistently on alpha 2007-08-09 18:45 ` Paul E. McKenney @ 2007-08-09 19:24 ` Chris Snook 2007-08-10 1:28 ` Paul E. McKenney 0 siblings, 1 reply; 36+ messages in thread From: Chris Snook @ 2007-08-09 19:24 UTC (permalink / raw) To: paulmck Cc: linux-kernel, linux-arch, torvalds, netdev, akpm, ak, heiko.carstens, davem, schwidefsky, wensong, horms, wjiang, cfriesen, zlynx, rpjday, jesper.juhl Paul E. McKenney wrote: > On Thu, Aug 09, 2007 at 02:13:52PM -0400, Chris Snook wrote: >> Paul E. McKenney wrote: >>> On Thu, Aug 09, 2007 at 01:14:35PM -0400, Chris Snook wrote: >>>> If you're depending on volatile writes >>>> being visible to other CPUs, you're screwed either way, because the CPU >>>> can hold that data in cache as long as it wants before it writes it to >>>> memory. When this finally does happen, it will happen atomically, which >>>> is all that atomic_set guarantees. If you need to guarantee that the >>>> value is written to memory at a particular time in your execution >>>> sequence, you either have to read it from memory to force the compiler to >>>> store it first (and a volatile cast in atomic_read will suffice for this) >>>> or you have to use LOCK_PREFIX instructions which will invalidate remote >>>> cache lines containing the same variable. This patch doesn't change >>>> either of these cases. >>> The case that it -can- change is interactions with interrupt handlers. >>> And NMI/SMI handlers, for that matter. >> You have a point here, but only if you can guarantee that the interrupt >> handler is running on a processor sharing the cache that has the >> not-yet-written volatile value. That implies a strictly non-SMP >> architecture. At the moment, none of those have volatile in their >> declaration of atomic_t, so this patch can't break any of them. > > This can also happen when using per-CPU variables. And there are a > number of per-CPU variables that are either atomic themselves or are > structures containing atomic fields. Accessing per-CPU variables in this fashion reliably already requires a suitable smp/non-smp read/write memory barrier. I maintain that if we break anything with this change, it was really already broken, if less obviously. Can you give a real or synthetic example of legitimate code that could break? -- Chris ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 1/24] make atomic_read() behave consistently on alpha 2007-08-09 19:24 ` Chris Snook @ 2007-08-10 1:28 ` Paul E. McKenney 2007-08-10 19:49 ` Chris Snook 0 siblings, 1 reply; 36+ messages in thread From: Paul E. McKenney @ 2007-08-10 1:28 UTC (permalink / raw) To: Chris Snook Cc: linux-kernel, linux-arch, torvalds, netdev, akpm, ak, heiko.carstens, davem, schwidefsky, wensong, horms, wjiang, cfriesen, zlynx, rpjday, jesper.juhl On Thu, Aug 09, 2007 at 03:24:40PM -0400, Chris Snook wrote: > Paul E. McKenney wrote: > >On Thu, Aug 09, 2007 at 02:13:52PM -0400, Chris Snook wrote: > >>Paul E. McKenney wrote: > >>>On Thu, Aug 09, 2007 at 01:14:35PM -0400, Chris Snook wrote: > >>>> If you're depending on volatile writes > >>>>being visible to other CPUs, you're screwed either way, because the CPU > >>>>can hold that data in cache as long as it wants before it writes it to > >>>>memory. When this finally does happen, it will happen atomically, > >>>>which is all that atomic_set guarantees. If you need to guarantee that > >>>>the value is written to memory at a particular time in your execution > >>>>sequence, you either have to read it from memory to force the compiler > >>>>to store it first (and a volatile cast in atomic_read will suffice for > >>>>this) or you have to use LOCK_PREFIX instructions which will invalidate > >>>>remote cache lines containing the same variable. This patch doesn't > >>>>change either of these cases. > >>>The case that it -can- change is interactions with interrupt handlers. > >>>And NMI/SMI handlers, for that matter. > >>You have a point here, but only if you can guarantee that the interrupt > >>handler is running on a processor sharing the cache that has the > >>not-yet-written volatile value. That implies a strictly non-SMP > >>architecture. At the moment, none of those have volatile in their > >>declaration of atomic_t, so this patch can't break any of them. > > > >This can also happen when using per-CPU variables. And there are a > >number of per-CPU variables that are either atomic themselves or are > >structures containing atomic fields. > > Accessing per-CPU variables in this fashion reliably already requires a > suitable smp/non-smp read/write memory barrier. I maintain that if we > break anything with this change, it was really already broken, if less > obviously. Can you give a real or synthetic example of legitimate code > that could break? My main concern is actually the lack of symmetry -- I would expect that an atomic_set() would have the same properties as atomic_read(). It is easy and cheap to provide them with similar properties, so why not? Debugging even a single problem would consume far more time than simply giving them corresponding semantics. But you asked for examples. These are synthetic, and of course legitimacy is in the eye of the beholder. 1. Watchdog variable. atomic_t watchdog = ATOMIC_INIT(0); ... int i; while (!done) { /* Do so stuff that doesn't take more than a few us. */ /* Could do atomic increment, but throughput penalty. */ i++; atomic_set(&watchdog, i); } do_something_with(&watchdog); /* Every so often on some other CPU... */ if ((new_watchdog = atomic_read(&watchdog)) == old_watchdog) die_horribly(); old_watchdog = new_watchdog; If atomic_set() did not have volatile semantics, the compiler would be within its rights optimizing it to simply get the final value of "i" after exit from the loop. This would cause the watchdog check to fail spuriously. Memory barriers are not required in this case, because the CPU cannot hang onto the value for very long -- we don't care about the exact value, or about exact synchronization, but rather about whether or not the value is changing. In this (toy) example, one might replace the atomic_set() with an atomic increment (though that might be too expensive in some cases) or with something like: atomic_set(&watchdog, atomic_read(&watchdog) + 1); However, other cases might not permit this transformation, for example, an existing heavily used API might take int rather than atomic_t. Some will no doubt argue that this example should use a macro or an asm similar to the "forget()" asm put forward elsewhere in this thread. 2. Communicating both with interrupt handler and with other CPUs. For example, data elements that are built up in a location visible to interrupts and NMIs, and then added as a unit to a data structure visible to other CPUs. This more-realistic example is abbreviated to the point of pointlessness as follows: struct foo { atomic_t a; atomic_t b; }; DEFINE_PER_CPU(struct foo *, staging) = NULL; /* Create element in staging area. */ __get_cpu_var(staging) = kzalloc(sizeof(*p), GFP_WHATEVER); if (__get_cpu_var(staging) == NULL) die_horribly(); /* allocate an element of some per-CPU array, get the result in "i" */ atomic_set(__get_cpu_var(staging).a, i); /* allocate another element of a per-CPU array, with result in "i" */ atomic_set(__get_cpu_var(staging).b, i); rcu_assign_pointer(some_global_place, __get_cpu_var(staging)); If atomic_set() didn't have volatile semantics, then an interrupt or NMI handler could see the atomic_set() to .a and .b out of order due to compiler optimizations. Remember, you -did- ask for these!!! ;-) Thanx, Paul ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 1/24] make atomic_read() behave consistently on alpha 2007-08-10 1:28 ` Paul E. McKenney @ 2007-08-10 19:49 ` Chris Snook 2007-08-10 20:26 ` Paul E. McKenney 0 siblings, 1 reply; 36+ messages in thread From: Chris Snook @ 2007-08-10 19:49 UTC (permalink / raw) To: paulmck Cc: linux-kernel, linux-arch, torvalds, netdev, akpm, ak, heiko.carstens, davem, schwidefsky, wensong, horms, wjiang, cfriesen, zlynx, rpjday, jesper.juhl Paul E. McKenney wrote: > On Thu, Aug 09, 2007 at 03:24:40PM -0400, Chris Snook wrote: >> Paul E. McKenney wrote: >>> On Thu, Aug 09, 2007 at 02:13:52PM -0400, Chris Snook wrote: >>>> Paul E. McKenney wrote: >>>>> On Thu, Aug 09, 2007 at 01:14:35PM -0400, Chris Snook wrote: >>>>>> If you're depending on volatile writes >>>>>> being visible to other CPUs, you're screwed either way, because the CPU >>>>>> can hold that data in cache as long as it wants before it writes it to >>>>>> memory. When this finally does happen, it will happen atomically, >>>>>> which is all that atomic_set guarantees. If you need to guarantee that >>>>>> the value is written to memory at a particular time in your execution >>>>>> sequence, you either have to read it from memory to force the compiler >>>>>> to store it first (and a volatile cast in atomic_read will suffice for >>>>>> this) or you have to use LOCK_PREFIX instructions which will invalidate >>>>>> remote cache lines containing the same variable. This patch doesn't >>>>>> change either of these cases. >>>>> The case that it -can- change is interactions with interrupt handlers. >>>>> And NMI/SMI handlers, for that matter. >>>> You have a point here, but only if you can guarantee that the interrupt >>>> handler is running on a processor sharing the cache that has the >>>> not-yet-written volatile value. That implies a strictly non-SMP >>>> architecture. At the moment, none of those have volatile in their >>>> declaration of atomic_t, so this patch can't break any of them. >>> This can also happen when using per-CPU variables. And there are a >>> number of per-CPU variables that are either atomic themselves or are >>> structures containing atomic fields. >> Accessing per-CPU variables in this fashion reliably already requires a >> suitable smp/non-smp read/write memory barrier. I maintain that if we >> break anything with this change, it was really already broken, if less >> obviously. Can you give a real or synthetic example of legitimate code >> that could break? > > My main concern is actually the lack of symmetry -- I would expect > that an atomic_set() would have the same properties as atomic_read(). > It is easy and cheap to provide them with similar properties, so why not? > Debugging even a single problem would consume far more time than simply > giving them corresponding semantics. > > But you asked for examples. These are synthetic, and of course legitimacy > is in the eye of the beholder. > > 1. Watchdog variable. > > atomic_t watchdog = ATOMIC_INIT(0); > > ... > > int i; > while (!done) { > > /* Do so stuff that doesn't take more than a few us. */ > /* Could do atomic increment, but throughput penalty. */ > > i++; > atomic_set(&watchdog, i); > } > do_something_with(&watchdog); > > > /* Every so often on some other CPU... */ > > if ((new_watchdog = atomic_read(&watchdog)) == old_watchdog) > die_horribly(); > old_watchdog = new_watchdog; > > > If atomic_set() did not have volatile semantics, the compiler > would be within its rights optimizing it to simply get the > final value of "i" after exit from the loop. This would cause > the watchdog check to fail spuriously. Memory barriers are > not required in this case, because the CPU cannot hang onto > the value for very long -- we don't care about the exact value, > or about exact synchronization, but rather about whether or > not the value is changing. > > In this (toy) example, one might replace the atomic_set() with > an atomic increment (though that might be too expensive in some > cases) or with something like: > > atomic_set(&watchdog, atomic_read(&watchdog) + 1); > > However, other cases might not permit this transformation, > for example, an existing heavily used API might take int rather > than atomic_t. > > Some will no doubt argue that this example should use a > macro or an asm similar to the "forget()" asm put forward > elsewhere in this thread. > > 2. Communicating both with interrupt handler and with other CPUs. > For example, data elements that are built up in a location visible > to interrupts and NMIs, and then added as a unit to a data structure > visible to other CPUs. This more-realistic example is abbreviated > to the point of pointlessness as follows: > > struct foo { > atomic_t a; > atomic_t b; > }; > > DEFINE_PER_CPU(struct foo *, staging) = NULL; > > /* Create element in staging area. */ > > __get_cpu_var(staging) = kzalloc(sizeof(*p), GFP_WHATEVER); > if (__get_cpu_var(staging) == NULL) > die_horribly(); > /* allocate an element of some per-CPU array, get the result in "i" */ > atomic_set(__get_cpu_var(staging).a, i); > /* allocate another element of a per-CPU array, with result in "i" */ > atomic_set(__get_cpu_var(staging).b, i); > rcu_assign_pointer(some_global_place, __get_cpu_var(staging)); > > If atomic_set() didn't have volatile semantics, then an interrupt > or NMI handler could see the atomic_set() to .a and .b out of > order due to compiler optimizations. > > Remember, you -did- ask for these!!! ;-) Ok, I'm convinced. Part of the motivation here is to avoid heisenbugs, so if people expect volatile atomic_set behavior, I'm inclined to give it to them. I don't really feel like indulging the compiler bug paranoiacs, but developer expectations are a legitimate motivation, and a major part of why I posted this in the first place. I'll resubmit the patchset with a volatile cast in atomic_set. Before I do, is there anything *else* that desperately needs such a cast? As far as I can tell, all the other functions are implemented with __asm__ __volatile__, or with spinlocks that use that under the hood. -- Chris ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 1/24] make atomic_read() behave consistently on alpha 2007-08-10 19:49 ` Chris Snook @ 2007-08-10 20:26 ` Paul E. McKenney 0 siblings, 0 replies; 36+ messages in thread From: Paul E. McKenney @ 2007-08-10 20:26 UTC (permalink / raw) To: Chris Snook Cc: linux-kernel, linux-arch, torvalds, netdev, akpm, ak, heiko.carstens, davem, schwidefsky, wensong, horms, wjiang, cfriesen, zlynx, rpjday, jesper.juhl On Fri, Aug 10, 2007 at 03:49:03PM -0400, Chris Snook wrote: > Paul E. McKenney wrote: > >On Thu, Aug 09, 2007 at 03:24:40PM -0400, Chris Snook wrote: > >>Paul E. McKenney wrote: > >>>On Thu, Aug 09, 2007 at 02:13:52PM -0400, Chris Snook wrote: > >>>>Paul E. McKenney wrote: > >>>>>On Thu, Aug 09, 2007 at 01:14:35PM -0400, Chris Snook wrote: > >>>>>> If you're depending on volatile writes > >>>>>>being visible to other CPUs, you're screwed either way, because the > >>>>>>CPU can hold that data in cache as long as it wants before it writes > >>>>>>it to memory. When this finally does happen, it will happen > >>>>>>atomically, which is all that atomic_set guarantees. If you need to > >>>>>>guarantee that the value is written to memory at a particular time in > >>>>>>your execution sequence, you either have to read it from memory to > >>>>>>force the compiler to store it first (and a volatile cast in > >>>>>>atomic_read will suffice for this) or you have to use LOCK_PREFIX > >>>>>>instructions which will invalidate remote cache lines containing the > >>>>>>same variable. This patch doesn't change either of these cases. > >>>>>The case that it -can- change is interactions with interrupt handlers. > >>>>>And NMI/SMI handlers, for that matter. > >>>>You have a point here, but only if you can guarantee that the interrupt > >>>>handler is running on a processor sharing the cache that has the > >>>>not-yet-written volatile value. That implies a strictly non-SMP > >>>>architecture. At the moment, none of those have volatile in their > >>>>declaration of atomic_t, so this patch can't break any of them. > >>>This can also happen when using per-CPU variables. And there are a > >>>number of per-CPU variables that are either atomic themselves or are > >>>structures containing atomic fields. > >>Accessing per-CPU variables in this fashion reliably already requires a > >>suitable smp/non-smp read/write memory barrier. I maintain that if we > >>break anything with this change, it was really already broken, if less > >>obviously. Can you give a real or synthetic example of legitimate code > >>that could break? > > > >My main concern is actually the lack of symmetry -- I would expect > >that an atomic_set() would have the same properties as atomic_read(). > >It is easy and cheap to provide them with similar properties, so why not? > >Debugging even a single problem would consume far more time than simply > >giving them corresponding semantics. > > > >But you asked for examples. These are synthetic, and of course legitimacy > >is in the eye of the beholder. > > > >1. Watchdog variable. > > > > atomic_t watchdog = ATOMIC_INIT(0); > > > > ... > > > > int i; > > while (!done) { > > > > /* Do so stuff that doesn't take more than a few us. */ > > /* Could do atomic increment, but throughput penalty. */ > > > > i++; > > atomic_set(&watchdog, i); > > } > > do_something_with(&watchdog); > > > > > > /* Every so often on some other CPU... */ > > > > if ((new_watchdog = atomic_read(&watchdog)) == old_watchdog) > > die_horribly(); > > old_watchdog = new_watchdog; > > > > > > If atomic_set() did not have volatile semantics, the compiler > > would be within its rights optimizing it to simply get the > > final value of "i" after exit from the loop. This would cause > > the watchdog check to fail spuriously. Memory barriers are > > not required in this case, because the CPU cannot hang onto > > the value for very long -- we don't care about the exact value, > > or about exact synchronization, but rather about whether or > > not the value is changing. > > > > In this (toy) example, one might replace the atomic_set() with > > an atomic increment (though that might be too expensive in some > > cases) or with something like: > > > > atomic_set(&watchdog, atomic_read(&watchdog) + 1); > > > > However, other cases might not permit this transformation, > > for example, an existing heavily used API might take int rather > > than atomic_t. > > > > Some will no doubt argue that this example should use a > > macro or an asm similar to the "forget()" asm put forward > > elsewhere in this thread. > > > >2. Communicating both with interrupt handler and with other CPUs. > > For example, data elements that are built up in a location visible > > to interrupts and NMIs, and then added as a unit to a data structure > > visible to other CPUs. This more-realistic example is abbreviated > > to the point of pointlessness as follows: > > > > struct foo { > > atomic_t a; > > atomic_t b; > > }; > > > > DEFINE_PER_CPU(struct foo *, staging) = NULL; > > > > /* Create element in staging area. */ > > > > __get_cpu_var(staging) = kzalloc(sizeof(*p), GFP_WHATEVER); > > if (__get_cpu_var(staging) == NULL) > > die_horribly(); > > /* allocate an element of some per-CPU array, get the result in "i" > > */ > > atomic_set(__get_cpu_var(staging).a, i); > > /* allocate another element of a per-CPU array, with result in "i" */ > > atomic_set(__get_cpu_var(staging).b, i); > > rcu_assign_pointer(some_global_place, __get_cpu_var(staging)); > > > > If atomic_set() didn't have volatile semantics, then an interrupt > > or NMI handler could see the atomic_set() to .a and .b out of > > order due to compiler optimizations. > > > >Remember, you -did- ask for these!!! ;-) > > Ok, I'm convinced. Part of the motivation here is to avoid heisenbugs, > so if people expect volatile atomic_set behavior, I'm inclined to give > it to them. I don't really feel like indulging the compiler bug > paranoiacs, but developer expectations are a legitimate motivation, and > a major part of why I posted this in the first place. I'll resubmit the > patchset with a volatile cast in atomic_set. Before I do, is there > anything *else* that desperately needs such a cast? As far as I can > tell, all the other functions are implemented with __asm__ __volatile__, > or with spinlocks that use that under the hood. Sounds good!!! The only other API that I am aware of needing volatile semantics is rcu_dereference(), but I already sent a patch in for it. So as far as I know, atomic_read() and atomic_set() should cover it. Thanx, Paul ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 1/24] make atomic_read() behave consistently on alpha 2007-08-09 17:14 ` Chris Snook 2007-08-09 17:41 ` Paul E. McKenney @ 2007-08-09 19:17 ` Segher Boessenkool 1 sibling, 0 replies; 36+ messages in thread From: Segher Boessenkool @ 2007-08-09 19:17 UTC (permalink / raw) To: Chris Snook Cc: wjiang, rpjday, wensong, heiko.carstens, linux-kernel, ak, netdev, paulmck, horms, akpm, linux-arch, jesper.juhl, torvalds, schwidefsky, davem, cfriesen, zlynx > If you need to guarantee that the value is written to memory at a > particular time in your execution sequence, you either have to read it > from memory to force the compiler to store it first That isn't enough. The CPU will happily read the datum back from its own store queue before it ever hit memory or cache or any external bus. If you need the value to be globally visible before another store, you use wmb(); if you need it before some read, you use mb(). These concepts are completely separate from both atomicity and volatility (which aren't the same themselves). > No, but I can reason about it and be confident that this won't break > anything that isn't already broken. At worst, this patch will make > any existing subtly incorrect uses of atomic_t much more obvious and > easier to track down. PR22278, PR29753 -- both P2 bugs, and both about basic *(volatile *) usage. Yeah, it's definitely not problematic, esp. not if you want to support older compilers than 4.2 too.</sarcasm> Segher ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 1/24] make atomic_read() behave consistently on alpha 2007-08-09 16:36 ` Chris Snook 2007-08-09 16:58 ` Paul E. McKenney @ 2007-08-09 18:51 ` Segher Boessenkool 2007-08-09 19:30 ` Chris Snook 1 sibling, 1 reply; 36+ messages in thread From: Segher Boessenkool @ 2007-08-09 18:51 UTC (permalink / raw) To: Chris Snook Cc: wjiang, rpjday, wensong, heiko.carstens, linux-kernel, ak, netdev, paulmck, horms, akpm, linux-arch, jesper.juhl, torvalds, schwidefsky, davem, cfriesen, zlynx >> The compiler is within its rights to read a 32-bit quantity 16 bits at >> at time, even on a 32-bit machine. I would be glad to help pummel any >> compiler writer that pulls such a dirty trick, but the C standard >> really >> does permit this. > > Yes, but we don't write code for these compilers. There are countless > pieces of kernel code which would break in this condition, and there > doesn't seem to be any interest in fixing this. "Other things are broken too". Great argument :-) > Sequence points enforce read-after-write ordering, not > write-after-write. Sequence points order *all* side effects; sequence points exist in the domain of the abstract sequential model of the C language only. The compiler translates that to machine code that is equivalent to that C code under the "as-if" rule; but this is still in that abstract model, which doesn't include things such as SMP, visibility by I/O devices, store queues, etc. etc. > We flush writes with reads for MMIO because of this effect as well as > the CPU/bus effects. You cannot flush all MMIO writes with reads; this is a PCI-specific thing. And even then, you need more than just the read itself: you have to make sure the read completed and returned data. >> In short, please retain atomic_set()'s volatility, especially on those >> architectures that declared the atomic_t's counter to be volatile. > > Like i386 and x86_64? These used to have volatile in the atomic_t > declaration. We removed it, and the sky did not fall. And this proves what? Lots of stuff "works" by accident. Segher ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 1/24] make atomic_read() behave consistently on alpha 2007-08-09 18:51 ` Segher Boessenkool @ 2007-08-09 19:30 ` Chris Snook 0 siblings, 0 replies; 36+ messages in thread From: Chris Snook @ 2007-08-09 19:30 UTC (permalink / raw) To: Segher Boessenkool Cc: wjiang, rpjday, wensong, heiko.carstens, linux-kernel, ak, netdev, paulmck, horms, akpm, linux-arch, jesper.juhl, torvalds, schwidefsky, davem, cfriesen, zlynx Segher Boessenkool wrote: >>> The compiler is within its rights to read a 32-bit quantity 16 bits at >>> at time, even on a 32-bit machine. I would be glad to help pummel any >>> compiler writer that pulls such a dirty trick, but the C standard really >>> does permit this. >> >> Yes, but we don't write code for these compilers. There are countless >> pieces of kernel code which would break in this condition, and there >> doesn't seem to be any interest in fixing this. > > "Other things are broken too". Great argument :-) We make plenty of practical assumptions in the kernel, and declare incorrect things which violate them, even in cases where there's no commandment from the heavens forbidding them. Since the whole point of this exercise is to prevent badness with *optimizing* compilers, it's quite reasonable to declare broken any so-called optimizer which violates these trivial assumptions. >>> In short, please retain atomic_set()'s volatility, especially on those >>> architectures that declared the atomic_t's counter to be volatile. >> >> Like i386 and x86_64? These used to have volatile in the atomic_t >> declaration. We removed it, and the sky did not fall. > > And this proves what? Lots of stuff "works" by accident. If something breaks because of this, it was already broken, but hidden a lot better. I don't see much of a downside to exposing and fixing those bugs. -- Chris ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 1/24] make atomic_read() behave consistently on alpha 2007-08-09 16:10 ` Paul E. McKenney 2007-08-09 16:36 ` Chris Snook @ 2007-08-10 8:21 ` Herbert Xu 2007-08-10 9:08 ` Andi Kleen 2007-08-10 20:07 ` Segher Boessenkool 1 sibling, 2 replies; 36+ messages in thread From: Herbert Xu @ 2007-08-10 8:21 UTC (permalink / raw) To: paulmck Cc: csnook, linux-kernel, linux-arch, torvalds, netdev, akpm, ak, heiko.carstens, davem, schwidefsky, wensong, horms, wjiang, cfriesen, zlynx, rpjday, jesper.juhl Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote: > > The compiler is within its rights to read a 32-bit quantity 16 bits at > at time, even on a 32-bit machine. I would be glad to help pummel any > compiler writer that pulls such a dirty trick, but the C standard really > does permit this. Code all over the kernel assumes that 32-bit reads/writes are atomic so while such a compiler might be legal it certainly can't compile Linux. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 1/24] make atomic_read() behave consistently on alpha 2007-08-10 8:21 ` Herbert Xu @ 2007-08-10 9:08 ` Andi Kleen 2007-08-10 15:02 ` Paul E. McKenney 2007-08-10 20:07 ` Segher Boessenkool 1 sibling, 1 reply; 36+ messages in thread From: Andi Kleen @ 2007-08-10 9:08 UTC (permalink / raw) To: Herbert Xu Cc: paulmck, csnook, linux-kernel, linux-arch, torvalds, netdev, akpm, heiko.carstens, davem, schwidefsky, wensong, horms, wjiang, cfriesen, zlynx, rpjday, jesper.juhl On Friday 10 August 2007 10:21:46 Herbert Xu wrote: > Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote: > > > > The compiler is within its rights to read a 32-bit quantity 16 bits at > > at time, even on a 32-bit machine. I would be glad to help pummel any > > compiler writer that pulls such a dirty trick, but the C standard really > > does permit this. > > Code all over the kernel assumes that 32-bit reads/writes > are atomic so while such a compiler might be legal it certainly > can't compile Linux. Yes, the kernel requirements are much stricter than ISO-C. And besides it is a heavy user of C extensions anyways. On the other hand some of the C99 extensions are not allowed. And then there is sparse, which enforces a language which sometimes is quite far from standard C. You could say it is written in Linux-C, not ISO C. -Andi ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 1/24] make atomic_read() behave consistently on alpha 2007-08-10 9:08 ` Andi Kleen @ 2007-08-10 15:02 ` Paul E. McKenney 0 siblings, 0 replies; 36+ messages in thread From: Paul E. McKenney @ 2007-08-10 15:02 UTC (permalink / raw) To: Andi Kleen Cc: Herbert Xu, csnook, linux-kernel, linux-arch, torvalds, netdev, akpm, heiko.carstens, davem, schwidefsky, wensong, horms, wjiang, cfriesen, zlynx, rpjday, jesper.juhl On Fri, Aug 10, 2007 at 11:08:20AM +0200, Andi Kleen wrote: > On Friday 10 August 2007 10:21:46 Herbert Xu wrote: > > Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote: > > > > > > The compiler is within its rights to read a 32-bit quantity 16 bits at > > > at time, even on a 32-bit machine. I would be glad to help pummel any > > > compiler writer that pulls such a dirty trick, but the C standard really > > > does permit this. > > > > Code all over the kernel assumes that 32-bit reads/writes > > are atomic so while such a compiler might be legal it certainly > > can't compile Linux. > > Yes, the kernel requirements are much stricter than ISO-C. And besides > it is a heavy user of C extensions anyways. On the other hand some of the > C99 extensions are not allowed. And then there is sparse, which enforces > a language which sometimes is quite far from standard C. You could say it is > written in Linux-C, not ISO C. Understood. My question is "why do we want the semantics of atomic_read() and atomic_set() to differ?" Thanx, Paul ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 1/24] make atomic_read() behave consistently on alpha 2007-08-10 8:21 ` Herbert Xu 2007-08-10 9:08 ` Andi Kleen @ 2007-08-10 20:07 ` Segher Boessenkool 2007-08-11 0:00 ` Herbert Xu 1 sibling, 1 reply; 36+ messages in thread From: Segher Boessenkool @ 2007-08-10 20:07 UTC (permalink / raw) To: Herbert Xu Cc: paulmck, heiko.carstens, horms, linux-kernel, csnook, rpjday, netdev, ak, cfriesen, akpm, torvalds, jesper.juhl, linux-arch, zlynx, schwidefsky, davem, wensong, wjiang >> The compiler is within its rights to read a 32-bit quantity 16 bits at >> at time, even on a 32-bit machine. I would be glad to help pummel any >> compiler writer that pulls such a dirty trick, but the C standard >> really >> does permit this. > > Code all over the kernel assumes that 32-bit reads/writes > are atomic so while such a compiler might be legal it certainly > can't compile Linux. That means GCC cannot compile Linux; it already optimises some accesses to scalars to smaller accesses when it knows it is allowed to. Not often though, since it hardly ever helps in the cost model it employs. Segher ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 1/24] make atomic_read() behave consistently on alpha 2007-08-10 20:07 ` Segher Boessenkool @ 2007-08-11 0:00 ` Herbert Xu 2007-08-11 0:38 ` Segher Boessenkool 0 siblings, 1 reply; 36+ messages in thread From: Herbert Xu @ 2007-08-11 0:00 UTC (permalink / raw) To: Segher Boessenkool Cc: paulmck, heiko.carstens, horms, linux-kernel, csnook, rpjday, netdev, ak, cfriesen, akpm, torvalds, jesper.juhl, linux-arch, zlynx, schwidefsky, davem, wensong, wjiang On Fri, Aug 10, 2007 at 10:07:27PM +0200, Segher Boessenkool wrote: > > That means GCC cannot compile Linux; it already optimises > some accesses to scalars to smaller accesses when it knows > it is allowed to. Not often though, since it hardly ever > helps in the cost model it employs. Please give an example code snippet + gcc version + arch to back this up. Thanks, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 1/24] make atomic_read() behave consistently on alpha 2007-08-11 0:00 ` Herbert Xu @ 2007-08-11 0:38 ` Segher Boessenkool 2007-08-11 0:43 ` Herbert Xu 2007-08-11 4:38 ` Valdis.Kletnieks 0 siblings, 2 replies; 36+ messages in thread From: Segher Boessenkool @ 2007-08-11 0:38 UTC (permalink / raw) To: Herbert Xu Cc: paulmck, heiko.carstens, horms, linux-kernel, csnook, rpjday, netdev, ak, cfriesen, akpm, torvalds, jesper.juhl, linux-arch, zlynx, schwidefsky, davem, wensong, wjiang >> That means GCC cannot compile Linux; it already optimises >> some accesses to scalars to smaller accesses when it knows >> it is allowed to. Not often though, since it hardly ever >> helps in the cost model it employs. > > Please give an example code snippet + gcc version + arch > to back this up. unsigned char f(unsigned long *p) { return *p & 1; } with both powerpc64-linux-gcc (GCC) 4.3.0 20070731 (experimental) and powerpc64-linux-gcc-4.2.0 (GCC) 4.2.0 (sorry, I don't have anything newer or older right now; if you really care, I can test with those too) generate (in 64-bit mode): .L.f: lbz 3,7(3) rldicl 3,3,0,63 blr and in 32-bit mode: f: stwu 1,-16(1) nop nop lbz 3,3(3) addi 1,1,16 rlwinm 3,3,0,31,31 blr (the nops are because I use --with-cpu=970). But perhaps you do not care for PowerPC, in which case: i686-linux-gcc (GCC) 4.2.0 20060410 (experimental) (sorry for the old version, I don't build x86 compilers all that often; also I don't have a 64-bit version right now): f: pushl %ebp movl %esp, %ebp movl 8(%ebp), %eax popl %ebp movzbl (%eax), %eax andl $1, %eax ret If you want testing with any other versions, and/or for any other target architecture, I can do that; it takes a few minutes to build a compiler. It is quite hard to build a testcase that reads more than one part of the "long", since for small testcases the compiler will almost always be smart enough to do one bigger read instead; but it certainly isn't inconceivable, and anyway the compiler would be fully in its right to do reads non-atomically if not instructed otherwise. Segher ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 1/24] make atomic_read() behave consistently on alpha 2007-08-11 0:38 ` Segher Boessenkool @ 2007-08-11 0:43 ` Herbert Xu 2007-08-11 0:50 ` Segher Boessenkool 2007-08-11 4:38 ` Valdis.Kletnieks 1 sibling, 1 reply; 36+ messages in thread From: Herbert Xu @ 2007-08-11 0:43 UTC (permalink / raw) To: Segher Boessenkool Cc: paulmck, heiko.carstens, horms, linux-kernel, csnook, rpjday, netdev, ak, cfriesen, akpm, torvalds, jesper.juhl, linux-arch, zlynx, schwidefsky, davem, wensong, wjiang On Sat, Aug 11, 2007 at 02:38:40AM +0200, Segher Boessenkool wrote: > >>That means GCC cannot compile Linux; it already optimises > >>some accesses to scalars to smaller accesses when it knows > >>it is allowed to. Not often though, since it hardly ever > >>helps in the cost model it employs. > > > >Please give an example code snippet + gcc version + arch > >to back this up. > > unsigned char f(unsigned long *p) > { > return *p & 1; > } This doesn't really matter since we only care about the LSB. Do you have an example where gcc reads it non-atmoically and we care about all parts? Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 1/24] make atomic_read() behave consistently on alpha 2007-08-11 0:43 ` Herbert Xu @ 2007-08-11 0:50 ` Segher Boessenkool 0 siblings, 0 replies; 36+ messages in thread From: Segher Boessenkool @ 2007-08-11 0:50 UTC (permalink / raw) To: Herbert Xu Cc: paulmck, heiko.carstens, horms, linux-kernel, csnook, rpjday, netdev, ak, cfriesen, akpm, torvalds, jesper.juhl, linux-arch, zlynx, schwidefsky, davem, wensong, wjiang >>>> That means GCC cannot compile Linux; it already optimises >>>> some accesses to scalars to smaller accesses when it knows >>>> it is allowed to. Not often though, since it hardly ever >>>> helps in the cost model it employs. >>> >>> Please give an example code snippet + gcc version + arch >>> to back this up. >> >> unsigned char f(unsigned long *p) >> { >> return *p & 1; >> } > > This doesn't really matter since we only care about the LSB. It is exactly what I claimed, and what you asked proof of. > Do you have an example where gcc reads it non-atmoically and > we care about all parts? Like I explained in the original mail; no, I suspect such a testcase will be really hard to construct, esp. as a small testcase. I have no reason to believe it is impossible to do so though -- maybe someone else can write trickier code than I can, in which case, please do so. Segher ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 1/24] make atomic_read() behave consistently on alpha 2007-08-11 0:38 ` Segher Boessenkool 2007-08-11 0:43 ` Herbert Xu @ 2007-08-11 4:38 ` Valdis.Kletnieks 1 sibling, 0 replies; 36+ messages in thread From: Valdis.Kletnieks @ 2007-08-11 4:38 UTC (permalink / raw) To: Segher Boessenkool Cc: Herbert Xu, paulmck, heiko.carstens, horms, linux-kernel, csnook, rpjday, netdev, ak, cfriesen, akpm, torvalds, jesper.juhl, linux-arch, zlynx, schwidefsky, davem, wensong, wjiang [-- Attachment #1: Type: text/plain, Size: 686 bytes --] On Sat, 11 Aug 2007 02:38:40 +0200, Segher Boessenkool said: > >> That means GCC cannot compile Linux; it already optimises > >> some accesses to scalars to smaller accesses when it knows > >> it is allowed to. Not often though, since it hardly ever > >> helps in the cost model it employs. > > > > Please give an example code snippet + gcc version + arch > > to back this up. > > unsigned char f(unsigned long *p) > { > return *p & 1; > } Not really valid, because it's still able to do one atomic access to compute the result. Now, if you had found an example where it converts a 32-bit atomic access into 2 separate 16-bit accesses that weren't atomic as a whole.... [-- Attachment #2: Type: application/pgp-signature, Size: 226 bytes --] ^ permalink raw reply [flat|nested] 36+ messages in thread
end of thread, other threads:[~2007-08-11 4:38 UTC | newest] Thread overview: 36+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2007-08-09 13:24 [PATCH 1/24] make atomic_read() behave consistently on alpha Chris Snook 2007-08-09 14:32 ` Paul E. McKenney 2007-08-09 14:53 ` Chris Snook 2007-08-09 15:04 ` Paul E. McKenney 2007-08-09 15:24 ` Chris Snook 2007-08-09 15:50 ` Segher Boessenkool 2007-08-09 16:20 ` Chris Snook 2007-08-09 18:38 ` Segher Boessenkool 2007-08-09 19:05 ` Chris Snook 2007-08-09 19:19 ` Segher Boessenkool 2007-08-09 19:25 ` Geert Uytterhoeven 2007-08-09 19:47 ` Chris Snook 2007-08-09 23:02 ` Segher Boessenkool 2007-08-09 16:10 ` Paul E. McKenney 2007-08-09 16:36 ` Chris Snook 2007-08-09 16:58 ` Paul E. McKenney 2007-08-09 17:14 ` Chris Snook 2007-08-09 17:41 ` Paul E. McKenney 2007-08-09 18:13 ` Chris Snook 2007-08-09 18:45 ` Paul E. McKenney 2007-08-09 19:24 ` Chris Snook 2007-08-10 1:28 ` Paul E. McKenney 2007-08-10 19:49 ` Chris Snook 2007-08-10 20:26 ` Paul E. McKenney 2007-08-09 19:17 ` Segher Boessenkool 2007-08-09 18:51 ` Segher Boessenkool 2007-08-09 19:30 ` Chris Snook 2007-08-10 8:21 ` Herbert Xu 2007-08-10 9:08 ` Andi Kleen 2007-08-10 15:02 ` Paul E. McKenney 2007-08-10 20:07 ` Segher Boessenkool 2007-08-11 0:00 ` Herbert Xu 2007-08-11 0:38 ` Segher Boessenkool 2007-08-11 0:43 ` Herbert Xu 2007-08-11 0:50 ` Segher Boessenkool 2007-08-11 4:38 ` Valdis.Kletnieks
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).