* Re: [Lse-tech] Re: RFC: patch to allow lock-free traversal of lists with insertion @ 2001-10-10 15:24 Paul McKenney 2001-10-10 16:58 ` Andrea Arcangeli 2001-10-12 5:43 ` [Lse-tech] Re: RFC: patch to allow lock-free traversal of lists with Albert D. Cahalan 0 siblings, 2 replies; 23+ messages in thread From: Paul McKenney @ 2001-10-10 15:24 UTC (permalink / raw) To: Andrea Arcangeli Cc: Peter Rival, Ivan Kokshaysky, Jay.Estabrook, linux-kernel, lse-tech, Richard Henderson, cardoza, woodward > > true for older alphas, especially because they are strictly in-order > > machines, unlike ev6. > > Yes, it sounds strange. However, according to Paul this would not be the > cpu but a cache coherency issue. rmb() would enforce the cache coherency > etc... so maybe the issue is related to old SMP motherboard etc... not > even to the cpus ... dunno. But as said it sounded very strange that > also new chips and new boards have such a weird reordering trouble. It sounded strange to me, too. ;-) And my first reading of the Alpha AXP Architecture RM didn't help me much. It was indeed a cache coherency issue. The architect I talked to felt that it was a feature rather than a bug. I have an email in to him. In the meantime, Compaq's patent #6,108,737 leads me to believe that others in DEC/Compaq also believe it to be a feature. The paragraph starting at column 2 line 20 of the body of this patent states: In a weakly-ordered system, an order is imposed between selected sets of memory reference operations, while other operations are considered unordered. One or more MB instructions are used to indicate the required order. In the case of an MB instruction defined by the Alpha (R) 21264 processor instruction set, the MB denotes that all memory reference instructions above the MB (i.e., pre-MB instructions) are ordered before all reference instructions after the MB (i.e., post-MB instructions). However, no order is required between reference instructions that are not separated by an MB. 
(The patent talks about the WMB instruction later on.) In other words, if there is no MB, the CPU is not required to maintain ordering. Regardless of data dependencies or anything else. There is also an application note at http://www.openvms.compaq.com/wizard/wiz_2637.html which states: For instance, your producer must issue a "memory barrier" instruction after writing the data to shared memory and before inserting it on the queue; likewise, your consumer must issue a memory barrier instruction after removing an item from the queue and before reading from its memory. Otherwise, you risk seeing stale data, since, while the Alpha processor does provide coherent memory, it does not provide implicit ordering of reads and writes. (That is, the write of the producer's data might reach memory after the write of the queue, such that the consumer might read the new item from the queue but get the previous values from the item's memory. Note that they require a memory barrier (rmb()) between the time the item is removed from the queue and the time that the data in the item is referenced, despite the fact that there is a data dependency between the dequeueing and the dereferencing. So, again, data dependency does -not- substitute for an MB on Alpha. Comments from DEC/Compaq people who know Alpha architecture details? Thanx, Paul > > I suspect some confusion here - probably that architect meant loads > > to independent addresses. Of course, in this case mb() is required > > to assure ordering. > > > > Ivan. > > > Andrea > ^ permalink raw reply [flat|nested] 23+ messages in thread
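The producer/consumer rule from the application note quoted above can be sketched in C. This is a hypothetical illustration, not code from the thread: the wmb()/rmb() macros below are stubbed out as GCC compiler barriers so the snippet builds in user space on any architecture, whereas on Alpha they would have to expand to real "wmb"/"mb" instructions for the guarantee to hold.

```c
#include <stddef.h>

/* Stand-in barriers (compiler barriers only). On Alpha these would
 * have to emit real "wmb"/"mb" instructions. */
#define wmb() __asm__ __volatile__("" ::: "memory")
#define rmb() __asm__ __volatile__("" ::: "memory")

struct item { int data; };

static struct item slot;
static struct item *queue_head;  /* NULL means "queue empty" */

/* Producer: fill in the item, then publish the pointer. */
static void enqueue(int v)
{
	slot.data = v;
	wmb();               /* order the data write before the pointer write */
	queue_head = &slot;
}

/* Consumer: dequeue the pointer, then read through it. On Alpha the
 * rmb() is mandatory even though the second load depends on the first. */
static int dequeue(void)
{
	struct item *p = queue_head;

	if (p == NULL)
		return -1;
	rmb();               /* a data dependency alone is NOT enough on Alpha */
	return p->data;
}
```

Without the rmb() in dequeue(), an Alpha reader could observe the new queue_head yet read stale item contents, which is exactly the failure mode the application note warns about.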
* Re: [Lse-tech] Re: RFC: patch to allow lock-free traversal of lists with insertion 2001-10-10 15:24 [Lse-tech] Re: RFC: patch to allow lock-free traversal of lists with insertion Paul McKenney @ 2001-10-10 16:58 ` Andrea Arcangeli 2001-10-10 17:25 ` Linus Torvalds 2001-10-12 5:43 ` [Lse-tech] Re: RFC: patch to allow lock-free traversal of lists with Albert D. Cahalan 1 sibling, 1 reply; 23+ messages in thread From: Andrea Arcangeli @ 2001-10-10 16:58 UTC (permalink / raw) To: Paul McKenney Cc: Peter Rival, Ivan Kokshaysky, Jay.Estabrook, linux-kernel, lse-tech, Richard Henderson, cardoza, woodward On Wed, Oct 10, 2001 at 08:24:11AM -0700, Paul McKenney wrote: > which states: > > For instance, your producer must issue a "memory barrier" instruction > after writing the data to shared memory and before inserting it on > the queue; likewise, your consumer must issue a memory barrier > instruction after removing an item from the queue and before reading > from its memory. Otherwise, you risk seeing stale data, since, > while the Alpha processor does provide coherent memory, it does > not provide implicit ordering of reads and writes. (That is, the > write of the producer's data might reach memory after the write of > the queue, such that the consumer might read the new item from the > queue but get the previous values from the item's memory. > > Note that they require a memory barrier (rmb()) between the time the > item is removed from the queue and the time that the data in the item > is referenced, despite the fact that there is a data dependency between > the dequeueing and the dereferencing. So, again, data dependency does > -not- substitute for an MB on Alpha. This is very explicit indeed. In short I guess what happens is that the reader may have old data in its cache, and the rmb() makes sure the cache used after the rmb() is not older than the cache used before the rmb(). 
However, the more I think about it the more I suspect we'd better use rmb() in all readers in the common code, rather than defining rmbdd with IPI on alpha, maybe alpha after all isn't the only one that needs the rmb() but it's the only one defining the behaviour in detail? I can very well imagine other archs also not invalidating all caches of all other cpus after a write followed by wmb, the synchronization can be delayed safely if the other cpus are readers, and they didn't issue an explicit rmb. So what alpha is doing can really be seen as a "feature" that allows delaying synchronization of caches after writes and wmb, unless an explicit order is requested by the reader via rmb (or better mb on alpha). The fact is that if there's old data in cache, a data dependency isn't different from a non-data dependency if the cpu caches aren't synchronized or invalidated during wmb run by the producer. Andrea ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [Lse-tech] Re: RFC: patch to allow lock-free traversal of lists with insertion 2001-10-10 16:58 ` Andrea Arcangeli @ 2001-10-10 17:25 ` Linus Torvalds 2001-10-12 5:06 ` Rusty Russell 0 siblings, 1 reply; 23+ messages in thread From: Linus Torvalds @ 2001-10-10 17:25 UTC (permalink / raw) To: linux-kernel In article <20011010185848.D726@athlon.random>, Andrea Arcangeli <andrea@suse.de> wrote: > >However the more I think about it the more I suspect we'd better use >rmb() in all readers in the common code Absolutely. It's not that expensive an operation on sane hardware. And it's definitely conceptually the only right thing to do - we're saying that we're doing a read that depends on a previous read having seen previous memory. Ergo, "rmb()". Of course, right now Linux only exports a subset of the potential memory barriers, and maybe we should export a fuller set - allowing CPUs that have stricter ordering to possibly make it a no-op. But thinking about even something like x86, I don't see where Intel would guarantee that two reads (data-dependent or not) would have some implicit memory ordering. Re-ordering reads with data dependencies is hard, but it happens quite naturally in a CPU that does address speculation. I don't know of anybody who does that, but I bet _somebody_ will. Maybe even the P4? Linus ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [Lse-tech] Re: RFC: patch to allow lock-free traversal of lists with insertion 2001-10-10 17:25 ` Linus Torvalds @ 2001-10-12 5:06 ` Rusty Russell 2001-10-12 16:28 ` Linus Torvalds 0 siblings, 1 reply; 23+ messages in thread From: Rusty Russell @ 2001-10-12 5:06 UTC (permalink / raw) To: Linus Torvalds; +Cc: linux-kernel On Wed, 10 Oct 2001 17:25:22 +0000 (UTC) torvalds@transmeta.com (Linus Torvalds) wrote: > Absolutely. It's not that expensive an operation on sane hardware. And > it's definitely conceptually the only right thing to do - we're saying > that we're doing a read that depends on a previous read having seen > previous memory. Ergo, "rmb()". Accessing pointer contents after you dereference the pointer is "obvious": we've been trying to get Richard to understand the problem for FIVE MONTHS, and he's not stupid! The PPC manual (thanks Paul M) clearly indicates rmbdd() is not necessary. That they mention it explicitly suggests it's going to happen on more architectures: you are correct, we should sprinkle rmbdd() everywhere (rmb() is heavy on current PPC) and I'll update the Kernel Locking Guide now the rules have changed.[1] Rusty. [1] Aren't we lucky our documentation is so sparse no one can accuse us of being inconsistent? 8) ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [Lse-tech] Re: RFC: patch to allow lock-free traversal of lists with insertion 2001-10-12 5:06 ` Rusty Russell @ 2001-10-12 16:28 ` Linus Torvalds 2001-10-12 19:50 ` Al Dunsmuir ` (2 more replies) 0 siblings, 3 replies; 23+ messages in thread From: Linus Torvalds @ 2001-10-12 16:28 UTC (permalink / raw) To: Rusty Russell; +Cc: linux-kernel On Fri, 12 Oct 2001, Rusty Russell wrote: > > The PPC manual (thanks Paul M) clearly indicates rmbdd() is not necessary. > That they mention it explicitly suggests it's going to happen on more > architectures: you are correct, we should sprinkle rmbdd() everywhere > (rmb() is heavy on current PPC) and I'll update the Kernel Locking Guide now > the rules have changed.[1] I would suggest re-naming "rmbdd()". I _assume_ that "dd" stands for "data dependent", but quite frankly, "rmbdd" looks like the standard IBM "we lost every vowel ever invented" kind of assembly language to me. I'm sure that having programmed PPC assembly language, you find it very natural (IBM motto: "We found five vowels hiding in a corner, and we used them _all_ for the 'eieio' instruction so that we wouldn't have to use them anywhere else"). But for us normal people, contractions that have more than 3 letters are hard to remember. I wouldn't mind making the other memory barriers more descriptive either, but with something like "mb()" at least you don't have to look five times to make sure you got all the letters in the right order.. (IBM motto: "If you can't read our assembly language, you must be borderline dyslexic, and we don't want you to mess with it anyway"). So how about just going all the way and calling it what it is: "data_dependent_read_barrier()", with a notice in the PPC docs about how the PPC cannot speculate reads anyway, so it's a no-op. (IBM motto: "TEN vowels? 
Don't you know vowels are scrd?") And hey, if you want to, feel free to create the regular #define read_barrier() rmb() #define write_barrier() wmb() #define memory_barrier() mb() too. It's not as if we should ever have that many of them, and when we _do_ have them, they might as well stand out to make it clear that we're doing subtle things.. Although that "data-dependent read barrier" is a lot more subtle than most. Linus ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [Lse-tech] Re: RFC: patch to allow lock-free traversal of lists with insertion 2001-10-12 16:28 ` Linus Torvalds @ 2001-10-12 19:50 ` Al Dunsmuir 2001-10-13 1:07 ` Paul Mackerras 2001-10-13 7:38 ` [Lse-tech] Re: RFC: patch to allow lock-free traversal of lists with insertion Rusty Russell 2 siblings, 0 replies; 23+ messages in thread From: Al Dunsmuir @ 2001-10-12 19:50 UTC (permalink / raw) To: Linus Torvalds; +Cc: linux-kernel@vger.kernel.org On Fri, 12 Oct 2001 09:28:07 -0700 (PDT), Linus Torvalds wrote: >I'm sure that having programmed PPC assembly language, you find it very >natural (IBM motto: "We found five vowels hiding in a corner, and we used >them _all_ for the 'eieio' instruction so that we wouldn't have to use >them anywhere else"). More likely a bad pun on the children's song "Old McDonald had a farm". I have no insider knowledge, but I've worked for IBM (on S/390) for 22 years, _am_ dyslexic, and enjoy bad puns <GRIN> Al Dunsmuir ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [Lse-tech] Re: RFC: patch to allow lock-free traversal of lists with insertion 2001-10-12 16:28 ` Linus Torvalds 2001-10-12 19:50 ` Al Dunsmuir @ 2001-10-13 1:07 ` Paul Mackerras 2001-10-13 1:54 ` Davide Libenzi 2001-10-13 2:00 ` [Lse-tech] Re: RFC: patch to allow lock-free traversal of lists with insertion Linus Torvalds 2001-10-13 7:38 ` [Lse-tech] Re: RFC: patch to allow lock-free traversal of lists with insertion Rusty Russell 2 siblings, 2 replies; 23+ messages in thread From: Paul Mackerras @ 2001-10-13 1:07 UTC (permalink / raw) To: Linus Torvalds; +Cc: linux-kernel Linus Torvalds writes: > So how about just going all the way and calling it what it is: > "data_dependent_read_barrier()", with a notice in the PPC docs about how > the PPC cannot speculate reads anyway, so it's a no-op. To set the record straight, the PPC architecture spec says that implementations *can* speculate reads, but they have to make it look as if dependent loads have a read barrier between them. And it isn't speculated reads that are the problem in the alpha case, it's the fact that the cache can reorder invalidations that are received from the bus. That's why you can read the new value of p but the old value of *p on one processor after another processor has just done something like a = 1; wmb(); p = &a. My impression from what Paul McKenney was saying was that on most modern architectures other than alpha, dependent loads act as if they have a read barrier between them. What we need to know is which architectures specify that behaviour in their architecture spec, as against those which do that today but which might not do it tomorrow. I just looked at the SPARC V9 specification; it has a formal definition of the memory model which precludes reordering dependent loads (once again this is an "as if" rule). So on ppc and sparc64 we have an assurance that we won't need an rmb() between dependent loads in the future. 
As for intel x86, is there an architecture spec that talks about things like memory ordering? My impression is that the x86 architecture is pretty much defined by its implementations but I could well be wrong. > too. It's not as if we should ever have that many of them, and when we > _do_ have them, they might as well stand out to make it clear that we're > doing subtle things.. Although that "data-dependent read barrier" is a lot > more subtle than most. Indeed... getting the new p but the old *p is quite counter-intuitive IMHO. Paul. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [Lse-tech] Re: RFC: patch to allow lock-free traversal of lists with insertion 2001-10-13 1:07 ` Paul Mackerras @ 2001-10-13 1:54 ` Davide Libenzi 2001-10-13 2:04 ` Linus Torvalds 2001-10-13 14:11 ` Re: RFC: patch to allow lock-free traversal of lists Paul E. McKenney 2001-10-13 2:00 ` [Lse-tech] Re: RFC: patch to allow lock-free traversal of lists with insertion Linus Torvalds 1 sibling, 2 replies; 23+ messages in thread From: Davide Libenzi @ 2001-10-13 1:54 UTC (permalink / raw) To: Paul Mackerras; +Cc: Linus Torvalds, linux-kernel On Sat, 13 Oct 2001, Paul Mackerras wrote: > Linus Torvalds writes: > > > So how about just going all the way and calling it what it is: > > "data_dependent_read_barrier()", with a notice in the PPC docs about how > > the PPC cannot speculate reads anyway, so it's a no-op. > > To set the record straight, the PPC architecture spec says that > implementations *can* speculate reads, but they have to make it look > as if dependent loads have a read barrier between them. > > And it isn't speculated reads that are the problem in the alpha case, > it's the fact that the cache can reorder invalidations that are > received from the bus. That's why you can read the new value of p but > the old value of *p on one processor after another processor has just > done something like a = 1; wmb(); p = &a. I think that a necessary condition to have a reordering of bus-sent invalidations is to have a partitioned cache architecture. I don't see any valid reason for a cache controller of a linear cache architecture to reorder an invalidation stream coming from a single cpu. If the invalidation sequence for cpu1 ( in a linear cache architecture ) is 1a, 1b, 1c, ... this case can happen : 1a, 2a, 1b, 2b, 2c, 1c, ... | | | ......................... but not this : 1a, 2a, 2b, 1c, 2c, 1b, ... | | | ...............><........ 
> My impression from what Paul McKenney was saying was that on most > modern architectures other than alpha, dependent loads act as if they > have a read barrier between them. What we need to know is which > architectures specify that behaviour in their architecture spec, as > against those which do that today but which might not do it tomorrow. Uhmm, an architecture that with a = *p; schedules the load of *p before the load of p sounds screwy to me. The problem is that even if cpu1 schedules the load of p before the load of *p and cpu2 does a = 1; wmb(); p = &a; , it could happen that even if from cpu2 the invalidation stream exits in order, cpu1 could see the value of p before the value of *p due to a reordering done by the cache controller delivering the stream to cpu1. - Davide ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [Lse-tech] Re: RFC: patch to allow lock-free traversal of lists with insertion 2001-10-13 1:54 ` Davide Libenzi @ 2001-10-13 2:04 ` Linus Torvalds 2001-10-13 2:31 ` Davide Libenzi 2001-10-13 2:49 ` Paul Mackerras 2001-10-13 14:11 ` Re: RFC: patch to allow lock-free traversal of lists Paul E. McKenney 1 sibling, 2 replies; 23+ messages in thread From: Linus Torvalds @ 2001-10-13 2:04 UTC (permalink / raw) To: Davide Libenzi; +Cc: Paul Mackerras, linux-kernel On Fri, 12 Oct 2001, Davide Libenzi wrote: > > The problem is that even if cpu1 schedule the load of p before the > load of *p and cpu2 does a = 1; wmb(); p = &a; , it could happen that > even if from cpu2 the invalidation stream exit in order, cpu1 could see > the value of p before the value of *p due a reordering done by the > cache controller delivering the stream to cpu1. Umm - if that happens, your cache controller isn't honouring the wmb(), and you have problems quite regardless of any load ordering on _any_ CPU. Ehh? Linus ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [Lse-tech] Re: RFC: patch to allow lock-free traversal of lists with insertion 2001-10-13 2:04 ` Linus Torvalds @ 2001-10-13 2:31 ` Davide Libenzi 2001-10-13 2:46 ` Davide Libenzi 2001-10-13 3:30 ` Linus Torvalds 1 sibling, 2 replies; 23+ messages in thread From: Davide Libenzi @ 2001-10-13 2:31 UTC (permalink / raw) To: Linus Torvalds; +Cc: Davide Libenzi, Paul Mackerras, linux-kernel On Fri, 12 Oct 2001, Linus Torvalds wrote: > > On Fri, 12 Oct 2001, Davide Libenzi wrote: > > > > The problem is that even if cpu1 schedule the load of p before the > > load of *p and cpu2 does a = 1; wmb(); p = &a; , it could happen that > > even if from cpu2 the invalidation stream exit in order, cpu1 could see > > the value of p before the value of *p due a reordering done by the > > cache controller delivering the stream to cpu1. > > Umm - if that happens, your cache controller isn't honouring the wmb(), > and you have problems quite regardless of any load ordering on _any_ CPU. > > Ehh? I'm searching the hp-parisc doc about it but it seems that even Paul McKenney pointed out this. Suppose that p and *p are on two different cache partitions and the invalidation order that comes from the wmb() cpu is 1) *p 2) p Suppose that the partition where *p lies is damn busy and the one where p lies is free. The reader cpu could pick up the value of p before the value of *p by reading the old value of a The barrier on the reader side should establish a checkpoint that enforces the commit of *p before p - Davide ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [Lse-tech] Re: RFC: patch to allow lock-free traversal of lists with insertion 2001-10-13 2:31 ` Davide Libenzi @ 2001-10-13 2:46 ` Davide Libenzi 2001-10-13 3:30 ` Linus Torvalds 1 sibling, 0 replies; 23+ messages in thread From: Davide Libenzi @ 2001-10-13 2:46 UTC (permalink / raw) To: Linus Torvalds; +Cc: Paul Mackerras, lkml On Fri, 12 Oct 2001, Davide Libenzi wrote: > On Fri, 12 Oct 2001, Linus Torvalds wrote: > > > > > On Fri, 12 Oct 2001, Davide Libenzi wrote: > > > > > > The problem is that even if cpu1 schedule the load of p before the > > > load of *p and cpu2 does a = 1; wmb(); p = &a; , it could happen that > > > even if from cpu2 the invalidation stream exit in order, cpu1 could see > > > the value of p before the value of *p due a reordering done by the > > > cache controller delivering the stream to cpu1. > > > > Umm - if that happens, your cache controller isn't honouring the wmb(), > > and you have problems quite regardless of any load ordering on _any_ CPU. > > > > Ehh? > > I'm searching the hp-parisc doc about it but it seems that even Paul > McKenney pointed out this. ops, alpha - Davide ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [Lse-tech] Re: RFC: patch to allow lock-free traversal of lists with insertion 2001-10-13 2:31 ` Davide Libenzi 2001-10-13 2:46 ` Davide Libenzi @ 2001-10-13 3:30 ` Linus Torvalds 1 sibling, 0 replies; 23+ messages in thread From: Linus Torvalds @ 2001-10-13 3:30 UTC (permalink / raw) To: Davide Libenzi; +Cc: Paul Mackerras, linux-kernel On Fri, 12 Oct 2001, Davide Libenzi wrote: > > Suppose that p and *p are on two different cache partitions and the > invalidation order that comes from the wmb() cpu is 1) *p 2) p > Suppose that the partition when *p lies is damn busy and the one where > p lies is free. > The reader cpu could pickup the value of p before the value of *p by > reading the old value of a Ahh.. I misunderstood. You are arguing for the rmb() even if the CPU doesn't speculate addresses for loads. Yes, I agree. Linus ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [Lse-tech] Re: RFC: patch to allow lock-free traversal of lists with insertion 2001-10-13 2:04 ` Linus Torvalds 2001-10-13 2:31 ` Davide Libenzi @ 2001-10-13 2:49 ` Paul Mackerras 1 sibling, 0 replies; 23+ messages in thread From: Paul Mackerras @ 2001-10-13 2:49 UTC (permalink / raw) To: Linus Torvalds; +Cc: linux-kernel Linus Torvalds writes: > On Fri, 12 Oct 2001, Davide Libenzi wrote: > > > > The problem is that even if cpu1 schedule the load of p before the > > load of *p and cpu2 does a = 1; wmb(); p = &a; , it could happen that > > even if from cpu2 the invalidation stream exit in order, cpu1 could see > > the value of p before the value of *p due a reordering done by the > > cache controller delivering the stream to cpu1. > > Umm - if that happens, your cache controller isn't honouring the wmb(), > and you have problems quite regardless of any load ordering on _any_ CPU. Well yes. But this is what happens on alpha apparently. On alpha, it seems that the wmb only has an effect on the cache of the processor doing the writes, it doesn't affect any other caches. The wmb ensures that the invalidates from the two writes go out onto the bus in the right order, but the wmb itself doesn't go on the bus. Thus the invalidates can get processed in the reverse order by a receiving cache. I presume that an rmb() on alpha must wait for all outstanding invalidates to be processed by the cache. But I'm not an expert on the alpha by any means. Paul. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Re: RFC: patch to allow lock-free traversal of lists 2001-10-13 1:54 ` Davide Libenzi 2001-10-13 2:04 ` Linus Torvalds @ 2001-10-13 14:11 ` Paul E. McKenney 1 sibling, 0 replies; 23+ messages in thread From: Paul E. McKenney @ 2001-10-13 14:11 UTC (permalink / raw) To: Davide Libenzi; +Cc: Paul Mackerras, Linus Torvalds, linux-kernel >> To set the record straight, the PPC architecture spec says that >> implementations *can* speculate reads, but they have to make it look >> as if dependent loads have a read barrier between them. >> >> And it isn't speculated reads that are the problem in the alpha case, >> it's the fact that the cache can reorder invalidations that are >> received from the bus. That's why you can read the new value of p but >> the old value of *p on one processor after another processor has just >> done something like a = 1; wmb(); p = &a. > >I think that necessary condition to have a reordering of bus sent >invalidations is to have a partitioned cache architecture. >I don't see any valid reason for a cache controller of a linear cache >architecture to reorder an invalidation stream coming from a single cpu. >If the invalidation sequence for cpu1 ( in a linear cache architecture ) >is 1a, 1b, 1c, ... this case can happen : > >1a, 2a, 1b, 2b, 2c, 1c, ... >| | | >......................... > >but not this : > >1a, 2a, 2b, 1c, 2c, 1b, ... >| | | >...............><........ A split cache is certainly one way to get the updates reordered. It is not the only one. As Linus pointed out earlier, value speculation can have the same effect (though I do not believe that value speculation is in the cards for the i386 architecture). >> My impression from what Paul McKenney was saying was that on most >> modern architectures other than alpha, dependent loads act as if they >> have a read barrier between them. 
What we need to know is which >> architectures specify that behaviour in their architecture spec, as >> against those which do that today but which might not do it tomorrow. > >Uhmm, an architecture that with a = *p; schedule the load of *p before >the load of p sounds screwy to me. Agreed, but value speculation could do just that. However, value speculation would be handled correctly by rmbdd() and by wmbdd(), though the latter case relies on precise interrupts. >The problem is that even if cpu1 schedule the load of p before the >load of *p and cpu2 does a = 1; wmb(); p = &a; , it could happen that >even if from cpu2 the invalidation stream exit in order, cpu1 could see >the value of p before the value of *p due a reordering done by the >cache controller delivering the stream to cpu1. Yep! I did not create Alpha, I am just trying to help everyone understand how to live with it. ;-) Thanx, Paul ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [Lse-tech] Re: RFC: patch to allow lock-free traversal of lists with insertion 2001-10-13 1:07 ` Paul Mackerras 2001-10-13 1:54 ` Davide Libenzi @ 2001-10-13 2:00 ` Linus Torvalds 2001-10-13 13:54 ` [Lse-tech] Re: RFC: patch to allow lock-free traversal of lists Alan Cox 1 sibling, 1 reply; 23+ messages in thread From: Linus Torvalds @ 2001-10-13 2:00 UTC (permalink / raw) To: Paul Mackerras; +Cc: linux-kernel On Sat, 13 Oct 2001, Paul Mackerras wrote: > > As for intel x86, is there an architecture spec that talks about things > like memory ordering? My impression is that the x86 architecture is > pretty much defined by its implementations but I could well be wrong. It is discussed in the multi-processor management section, under "memory ordering", and it does say that "reads can be carried out speculatively and in any order". HOWEVER, it does have what Intel calls "processor order", and it claims that "writes by a single processor are observed in the same order by all processors." Which implies that if the _same_ CPU wrote *p and p, there is apparently an ordering constraint there. But I don't think that's the case here. So the x86 apparently needs a rmb(). (But Intel has redefined the memory ordering so many times that they might redefine it in the future too and say that dependent loads are ok. I suspect most of the definitions are of the type "Oh, it used to be ok in the implementation even though it wasn't defined, and it turns out that Windows doesn't work if we change it, so we'll define darkness to be the new standard"..) > Indeed... getting the new p but the old *p is quite > counter-intuitive IMHO. I disagree. I think the alpha has it right - it says that if you care about ordering, you have to say so. No ifs, no buts, no special cases ("Oh, dependent loads are special"). Of course, I used to absolutely _hate_ the x86 ordering rules just because they are so ad-hoc. 
But I'm getting more and more used to them, and the Intel "write ordered with store-buffer forwarding" model ends up being really nice for locking ;) Linus ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [Lse-tech] Re: RFC: patch to allow lock-free traversal of lists 2001-10-13 2:00 ` [Lse-tech] Re: RFC: patch to allow lock-free traversal of lists with insertion Linus Torvalds @ 2001-10-13 13:54 ` Alan Cox 0 siblings, 0 replies; 23+ messages in thread From: Alan Cox @ 2001-10-13 13:54 UTC (permalink / raw) To: Linus Torvalds; +Cc: Paul Mackerras, linux-kernel > It is discussed in the multi-processor management section, under "memory > ordering", and it does say that "reads can be carried out speculatively > and in any order". > > HOWEVER, it does have what Intel calls "processor order", and it claims > that "writes by a single processor are observed in the same order by all > processors." The other thing on the intel side is that you have to read the errata documentation. There is an interesting collection of misordering errata. > (But Intel has redefined the memory ordering so many times that they might > redefine it in the future too and say that dependent loads are ok. I > suspect most of the definitions are of the type "Oh, it used to be ok in > the implementation even though it wasn't defined, and it turns out that > Windows doesn't work if we change it, so we'll define darkness to be the > new standard"..) It's notable that the folks who did looser ordering x86 clones had MTRRs to enable the performance boost Alan ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [Lse-tech] Re: RFC: patch to allow lock-free traversal of lists with insertion 2001-10-12 16:28 ` Linus Torvalds 2001-10-12 19:50 ` Al Dunsmuir 2001-10-13 1:07 ` Paul Mackerras @ 2001-10-13 7:38 ` Rusty Russell 2001-10-13 16:28 ` Re: RFC: patch to allow lock-free traversal of lists with insertion Paul E. McKenney 2 siblings, 1 reply; 23+ messages in thread From: Rusty Russell @ 2001-10-13 7:38 UTC (permalink / raw) To: Linus Torvalds; +Cc: dipankar, linux-kernel, paul.mckenney In message <Pine.LNX.4.33.0110120919130.31677-100000@penguin.transmeta.com> you write: > And hey, if you want to, feel free to create the regular > > #define read_barrier() rmb() > #define write_barrier() wmb() > #define memory_barrier() mb() I agree... read_barrier_depends() then? Rusty. -- Premature optmztion is rt of all evl. --DK ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Re: RFC: patch to allow lock-free traversal of lists with insertion 2001-10-13 7:38 ` [Lse-tech] Re: RFC: patch to allow lock-free traversal of lists with insertion Rusty Russell @ 2001-10-13 16:28 ` Paul E. McKenney 2001-10-13 21:23 ` Rusty Russell 0 siblings, 1 reply; 23+ messages in thread From: Paul E. McKenney @ 2001-10-13 16:28 UTC (permalink / raw) To: Rusty Russell; +Cc: Linus Torvalds, dipankar, linux-kernel, paul.mckenney >In message <Pine.LNX.4.33.0110120919130.31677-100000@penguin.transmeta.com> you > write: >> And hey, if you want to, feel free to create the regular >> >> #define read_barrier() rmb() >> #define write_barrier() wmb() >> #define memory_barrier() mb() > >I agree... read_barrier_depends() then? > >Rusty. >-- >Premature optmztion is rt of all evl. --DK OK, here is an RFC patch with the read_barrier_depends(). (I know that the indentation is messed up, will fix when I add the read_barrier() and friends). Thoughts? Especially from people familiar with MIPS and PA-RISC? 
Thanx, Paul

diff -urN -X /home/mckenney/dontdiff linux-2.4.10/include/asm-alpha/system.h linux-2.4.10.read_barrier_depends/include/asm-alpha/system.h
--- linux-2.4.10/include/asm-alpha/system.h	Sun Aug 12 10:38:47 2001
+++ linux-2.4.10.read_barrier_depends/include/asm-alpha/system.h	Sat Oct 13 08:40:34 2001
@@ -148,16 +148,21 @@
 #define rmb() \
 __asm__ __volatile__("mb": : :"memory")
+#define read_barrier_depends() \
+__asm__ __volatile__("mb": : :"memory")
+
 #define wmb() \
 __asm__ __volatile__("wmb": : :"memory")
 #ifdef CONFIG_SMP
 #define smp_mb() mb()
 #define smp_rmb() rmb()
+#define smp_read_barrier_depends() read_barrier_depends()
 #define smp_wmb() wmb()
 #else
 #define smp_mb() barrier()
 #define smp_rmb() barrier()
+#define smp_read_barrier_depends() barrier()
 #define smp_wmb() barrier()
 #endif
diff -urN -X /home/mckenney/dontdiff linux-2.4.10/include/asm-arm/system.h linux-2.4.10.read_barrier_depends/include/asm-arm/system.h
--- linux-2.4.10/include/asm-arm/system.h	Mon Nov 27 17:07:59 2000
+++ linux-2.4.10.read_barrier_depends/include/asm-arm/system.h	Sat Oct 13 08:40:40 2001
@@ -38,6 +38,7 @@
 #define mb() __asm__ __volatile__ ("" : : : "memory")
 #define rmb() mb()
+#define read_barrier_depends() do { } while(0)
 #define wmb() mb()
 #define nop() __asm__ __volatile__("mov\tr0,r0\t@ nop\n\t");
@@ -67,12 +68,14 @@
 #define smp_mb() mb()
 #define smp_rmb() rmb()
+#define smp_read_barrier_depends() read_barrier_depends()
 #define smp_wmb() wmb()
 #else
 #define smp_mb() barrier()
 #define smp_rmb() barrier()
+#define smp_read_barrier_depends() do { } while(0)
 #define smp_wmb() barrier()
 #define cli() __cli()
diff -urN -X /home/mckenney/dontdiff linux-2.4.10/include/asm-cris/system.h linux-2.4.10.read_barrier_depends/include/asm-cris/system.h
--- linux-2.4.10/include/asm-cris/system.h	Tue May 1 16:05:00 2001
+++ linux-2.4.10.read_barrier_depends/include/asm-cris/system.h	Sat Oct 13 08:40:48 2001
@@ -143,15 +143,18 @@
 #define mb() __asm__ __volatile__ ("" : : : "memory")
 #define rmb() mb()
+#define read_barrier_depends() do { } while(0)
 #define wmb() mb()
 #ifdef CONFIG_SMP
 #define smp_mb() mb()
 #define smp_rmb() rmb()
+#define smp_read_barrier_depends() read_barrier_depends()
 #define smp_wmb() wmb()
 #else
 #define smp_mb() barrier()
 #define smp_rmb() barrier()
+#define smp_read_barrier_depends() do { } while(0)
 #define smp_wmb() barrier()
 #endif
diff -urN -X /home/mckenney/dontdiff linux-2.4.10/include/asm-i386/system.h linux-2.4.10.read_barrier_depends/include/asm-i386/system.h
--- linux-2.4.10/include/asm-i386/system.h	Sun Sep 23 10:31:01 2001
+++ linux-2.4.10.read_barrier_depends/include/asm-i386/system.h	Sat Oct 13 08:40:54 2001
@@ -284,15 +284,18 @@
  */
 #define mb() __asm__ __volatile__ ("lock; addl $0,0(%%esp)": : :"memory")
 #define rmb() mb()
+#define read_barrier_depends() do { } while(0)
 #define wmb() __asm__ __volatile__ ("": : :"memory")
 #ifdef CONFIG_SMP
 #define smp_mb() mb()
 #define smp_rmb() rmb()
+#define smp_read_barrier_depends() read_barrier_depends()
 #define smp_wmb() wmb()
 #else
 #define smp_mb() barrier()
 #define smp_rmb() barrier()
+#define smp_read_barrier_depends() do { } while(0)
 #define smp_wmb() barrier()
 #endif
diff -urN -X /home/mckenney/dontdiff linux-2.4.10/include/asm-ia64/system.h linux-2.4.10.read_barrier_depends/include/asm-ia64/system.h
--- linux-2.4.10/include/asm-ia64/system.h	Tue Jul 31 10:30:09 2001
+++ linux-2.4.10.read_barrier_depends/include/asm-ia64/system.h	Sat Oct 13 08:55:13 2001
@@ -85,6 +85,9 @@
  *   stores and that all following stores will be
  *   visible only after all previous stores.
  * rmb(): Like wmb(), but for reads.
+ * read_barrier_depends(): Like rmb(), but only for pairs
+ *   of loads where the second load depends on the
+ *   value loaded by the first.
  * mb(): wmb()/rmb() combo, i.e., all previous memory
  *   accesses are visible before all subsequent
  *   accesses and vice versa.  This is also known as
@@ -98,15 +101,18 @@
  */
 #define mb() __asm__ __volatile__ ("mf" ::: "memory")
 #define rmb() mb()
+#define read_barrier_depends() do { } while(0)
 #define wmb() mb()
 #ifdef CONFIG_SMP
 # define smp_mb() mb()
 # define smp_rmb() rmb()
+# define smp_read_barrier_depends() read_barrier_depends()
 # define smp_wmb() wmb()
 #else
 # define smp_mb() barrier()
 # define smp_rmb() barrier()
+# define smp_read_barrier_depends() do { } while(0)
 # define smp_wmb() barrier()
 #endif
diff -urN -X /home/mckenney/dontdiff linux-2.4.10/include/asm-m68k/system.h linux-2.4.10.read_barrier_depends/include/asm-m68k/system.h
--- linux-2.4.10/include/asm-m68k/system.h	Mon Jun 11 19:15:27 2001
+++ linux-2.4.10.read_barrier_depends/include/asm-m68k/system.h	Sat Oct 13 08:41:08 2001
@@ -80,12 +80,14 @@
 #define nop() do { asm volatile ("nop"); barrier(); } while (0)
 #define mb() barrier()
 #define rmb() barrier()
+#define read_barrier_depends() do { } while(0)
 #define wmb() barrier()
 #define set_mb(var, value) do { xchg(&var, value); } while (0)
 #define set_wmb(var, value) do { var = value; wmb(); } while (0)
 #define smp_mb() barrier()
 #define smp_rmb() barrier()
+#define smp_read_barrier_depends() do { } while(0)
 #define smp_wmb() barrier()
diff -urN -X /home/mckenney/dontdiff linux-2.4.10/include/asm-mips/system.h linux-2.4.10.read_barrier_depends/include/asm-mips/system.h
--- linux-2.4.10/include/asm-mips/system.h	Sun Sep 9 10:43:01 2001
+++ linux-2.4.10.read_barrier_depends/include/asm-mips/system.h	Sat Oct 13 08:41:14 2001
@@ -150,6 +150,7 @@
 #include <asm/wbflush.h>
 #define rmb() do { } while(0)
+#define read_barrier_depends() do { } while(0)
 #define wmb() wbflush()
 #define mb() wbflush()
@@ -166,6 +167,7 @@
 	: /* no input */ \
 	: "memory")
 #define rmb() mb()
+#define read_barrier_depends() do { } while(0)
 #define wmb() mb()
 #endif /* CONFIG_CPU_HAS_WB */
@@ -173,10 +175,12 @@
 #ifdef CONFIG_SMP
 #define smp_mb() mb()
 #define smp_rmb() rmb()
+#define smp_read_barrier_depends() read_barrier_depends()
 #define smp_wmb() wmb()
 #else
 #define smp_mb() barrier()
 #define smp_rmb() barrier()
+#define smp_read_barrier_depends() do { } while(0)
 #define smp_wmb() barrier()
 #endif
diff -urN -X /home/mckenney/dontdiff linux-2.4.10/include/asm-mips64/system.h linux-2.4.10.read_barrier_depends/include/asm-mips64/system.h
--- linux-2.4.10/include/asm-mips64/system.h	Wed Jul 4 11:50:39 2001
+++ linux-2.4.10.read_barrier_depends/include/asm-mips64/system.h	Sat Oct 13 08:41:22 2001
@@ -147,15 +147,18 @@
 	: /* no input */ \
 	: "memory")
 #define rmb() mb()
+#define read_barrier_depends() do { } while(0)
 #define wmb() mb()
 #ifdef CONFIG_SMP
 #define smp_mb() mb()
 #define smp_rmb() rmb()
+#define smp_read_barrier_depends() read_barrier_depends()
 #define smp_wmb() wmb()
 #else
 #define smp_mb() barrier()
 #define smp_rmb() barrier()
+#define smp_read_barrier_depends() do { } while(0)
 #define smp_wmb() barrier()
 #endif
diff -urN -X /home/mckenney/dontdiff linux-2.4.10/include/asm-parisc/system.h linux-2.4.10.read_barrier_depends/include/asm-parisc/system.h
--- linux-2.4.10/include/asm-parisc/system.h	Wed Dec 6 11:46:39 2000
+++ linux-2.4.10.read_barrier_depends/include/asm-parisc/system.h	Sat Oct 13 08:41:28 2001
@@ -50,6 +50,7 @@
 #ifdef CONFIG_SMP
 #define smp_mb() mb()
 #define smp_rmb() rmb()
+#define smp_read_barrier_depends() do { } while(0)
 #define smp_wmb() wmb()
 #else
 /* This is simply the barrier() macro from linux/kernel.h but when serial.c
@@ -58,6 +59,7 @@
  */
 #define smp_mb() __asm__ __volatile__("":::"memory");
 #define smp_rmb() __asm__ __volatile__("":::"memory");
+#define smp_read_barrier_depends() do { } while(0)
 #define smp_wmb() __asm__ __volatile__("":::"memory");
 #endif
@@ -122,6 +124,7 @@
 #define mb() __asm__ __volatile__ ("sync" : : :"memory")
 #define wmb() mb()
+#define read_barrier_depends() do { } while(0)
 extern unsigned long __xchg(unsigned long, unsigned long *, int);
diff -urN -X /home/mckenney/dontdiff linux-2.4.10/include/asm-ppc/system.h linux-2.4.10.read_barrier_depends/include/asm-ppc/system.h
--- linux-2.4.10/include/asm-ppc/system.h	Tue Aug 28 06:58:33 2001
+++ linux-2.4.10.read_barrier_depends/include/asm-ppc/system.h	Sat Oct 13 08:41:56 2001
@@ -24,6 +24,8 @@
  *
  * mb() prevents loads and stores being reordered across this point.
  * rmb() prevents loads being reordered across this point.
+ * read_barrier_depends() prevents data-dependant loads being reordered
+ *	across this point (nop on PPC).
  * wmb() prevents stores being reordered across this point.
  *
  * We can use the eieio instruction for wmb, but since it doesn't
@@ -32,6 +34,7 @@
  */
 #define mb() __asm__ __volatile__ ("sync" : : : "memory")
 #define rmb() __asm__ __volatile__ ("sync" : : : "memory")
+#define read_barrier_depends() do { } while(0)
 #define wmb() __asm__ __volatile__ ("eieio" : : : "memory")
 #define set_mb(var, value) do { var = value; mb(); } while (0)
@@ -40,10 +43,12 @@
 #ifdef CONFIG_SMP
 #define smp_mb() mb()
 #define smp_rmb() rmb()
+#define smp_read_barrier_depends() read_barrier_depends()
 #define smp_wmb() wmb()
 #else
 #define smp_mb() __asm__ __volatile__("": : :"memory")
 #define smp_rmb() __asm__ __volatile__("": : :"memory")
+#define smp_read_barrier_depends() do { } while(0)
 #define smp_wmb() __asm__ __volatile__("": : :"memory")
 #endif /* CONFIG_SMP */
diff -urN -X /home/mckenney/dontdiff linux-2.4.10/include/asm-s390/system.h linux-2.4.10.read_barrier_depends/include/asm-s390/system.h
--- linux-2.4.10/include/asm-s390/system.h	Wed Jul 25 14:12:02 2001
+++ linux-2.4.10.read_barrier_depends/include/asm-s390/system.h	Sat Oct 13 08:42:08 2001
@@ -117,9 +117,11 @@
 # define SYNC_OTHER_CORES(x) eieio()
 #define mb() eieio()
 #define rmb() eieio()
+#define read_barrier_depends() do { } while(0)
 #define wmb() eieio()
 #define smp_mb() mb()
 #define smp_rmb() rmb()
+#define smp_read_barrier_depends() read_barrier_depends()
 #define smp_wmb() wmb()
 #define smp_mb__before_clear_bit() smp_mb()
 #define smp_mb__after_clear_bit() smp_mb()
diff -urN -X /home/mckenney/dontdiff linux-2.4.10/include/asm-s390x/system.h linux-2.4.10.read_barrier_depends/include/asm-s390x/system.h
--- linux-2.4.10/include/asm-s390x/system.h	Wed Jul 25 14:12:03 2001
+++ linux-2.4.10.read_barrier_depends/include/asm-s390x/system.h	Sat Oct 13 08:42:14 2001
@@ -130,9 +130,11 @@
 # define SYNC_OTHER_CORES(x) eieio()
 #define mb() eieio()
 #define rmb() eieio()
+#define read_barrier_depends() do { } while(0)
 #define wmb() eieio()
 #define smp_mb() mb()
 #define smp_rmb() rmb()
+#define smp_read_barrier_depends() read_barrier_depends()
 #define smp_wmb() wmb()
 #define smp_mb__before_clear_bit() smp_mb()
 #define smp_mb__after_clear_bit() smp_mb()
diff -urN -X /home/mckenney/dontdiff linux-2.4.10/include/asm-sh/system.h linux-2.4.10.read_barrier_depends/include/asm-sh/system.h
--- linux-2.4.10/include/asm-sh/system.h	Sat Sep 8 12:29:09 2001
+++ linux-2.4.10.read_barrier_depends/include/asm-sh/system.h	Sat Oct 13 08:43:04 2001
@@ -88,15 +88,18 @@
 #define mb() __asm__ __volatile__ ("": : :"memory")
 #define rmb() mb()
+#define read_barrier_depends() do { } while(0)
 #define wmb() __asm__ __volatile__ ("": : :"memory")
 #ifdef CONFIG_SMP
 #define smp_mb() mb()
 #define smp_rmb() rmb()
+#define smp_read_barrier_depends() read_barrier_depends()
 #define smp_wmb() wmb()
 #else
 #define smp_mb() barrier()
 #define smp_rmb() barrier()
+#define smp_read_barrier_depends() do { } while(0)
 #define smp_wmb() barrier()
 #endif
diff -urN -X /home/mckenney/dontdiff linux-2.4.10/include/asm-sparc/system.h linux-2.4.10.read_barrier_depends/include/asm-sparc/system.h
--- linux-2.4.10/include/asm-sparc/system.h	Tue Oct 3 09:24:41 2000
+++ linux-2.4.10.read_barrier_depends/include/asm-sparc/system.h	Sat Oct 13 08:43:09 2001
@@ -277,11 +277,13 @@
 /* XXX Change this if we ever use a PSO mode kernel. */
 #define mb() __asm__ __volatile__ ("" : : : "memory")
 #define rmb() mb()
+#define read_barrier_depends() do { } while(0)
 #define wmb() mb()
 #define set_mb(__var, __value) do { __var = __value; mb(); } while(0)
 #define set_wmb(__var, __value) set_mb(__var, __value)
 #define smp_mb() __asm__ __volatile__("":::"memory");
 #define smp_rmb() __asm__ __volatile__("":::"memory");
+#define smp_read_barrier_depends() do { } while(0)
 #define smp_wmb() __asm__ __volatile__("":::"memory");
 #define nop() __asm__ __volatile__ ("nop");
diff -urN -X /home/mckenney/dontdiff linux-2.4.10/include/asm-sparc64/system.h linux-2.4.10.read_barrier_depends/include/asm-sparc64/system.h
--- linux-2.4.10/include/asm-sparc64/system.h	Fri Sep 7 11:01:20 2001
+++ linux-2.4.10.read_barrier_depends/include/asm-sparc64/system.h	Sat Oct 13 08:43:14 2001
@@ -99,6 +99,7 @@
 #define mb() \
 	membar("#LoadLoad | #LoadStore | #StoreStore | #StoreLoad");
 #define rmb() membar("#LoadLoad")
+#define read_barrier_depends() do { } while(0)
 #define wmb() membar("#StoreStore")
 #define set_mb(__var, __value) \
 	do { __var = __value; membar("#StoreLoad | #StoreStore"); } while(0)
@@ -108,10 +109,12 @@
 #ifdef CONFIG_SMP
 #define smp_mb() mb()
 #define smp_rmb() rmb()
+#define smp_read_barrier_depends() read_barrier_depends()
 #define smp_wmb() wmb()
 #else
 #define smp_mb() __asm__ __volatile__("":::"memory");
 #define smp_rmb() __asm__ __volatile__("":::"memory");
+#define smp_read_barrier_depends() do { } while(0)
 #define smp_wmb() __asm__ __volatile__("":::"memory");
 #endif

^ permalink raw reply	[flat|nested] 23+ messages in thread
* Re: RFC: patch to allow lock-free traversal of lists with insertion
  2001-10-13 16:28 ` Re: RFC: patch to allow lock-free traversal of lists with insertion Paul E. McKenney
@ 2001-10-13 21:23 ` Rusty Russell
  0 siblings, 0 replies; 23+ messages in thread
From: Rusty Russell @ 2001-10-13 21:23 UTC (permalink / raw)
To: Paul E. McKenney; +Cc: Linus Torvalds, dipankar, linux-kernel

In message <200110131628.f9DGSGW09534@eng4.beaverton.ibm.com> you write:
> OK, here is an RFC patch with the read_barrier_depends(). (I know that
> the indentation is messed up, will fix when I add the read_barrier()
> and friends).

Yep, this is a 2.5 thing: we should probably change the barriers so
read_barrier et al. only have effect on SMP, and then have
io_read_barrier for IO (these can be usefully differentiated on PPC
IIUC). The stillborn smp_mb() etc. can then be abandoned.

Rusty.
--
Premature optmztion is rt of all evl. --DK

^ permalink raw reply	[flat|nested] 23+ messages in thread
* Re: [Lse-tech] Re: RFC: patch to allow lock-free traversal of lists with
  2001-10-10 15:24 [Lse-tech] Re: RFC: patch to allow lock-free traversal of lists with insertion Paul McKenney
  2001-10-10 16:58 ` Andrea Arcangeli
@ 2001-10-12  5:43 ` Albert D. Cahalan
  2001-10-12  6:26 ` Paul Mackerras
  2001-10-12  8:28 ` Alex Bligh - linux-kernel
  1 sibling, 2 replies; 23+ messages in thread
From: Albert D. Cahalan @ 2001-10-12 5:43 UTC (permalink / raw)
To: Paul McKenney
Cc: Andrea Arcangeli, Peter Rival, Ivan Kokshaysky, Jay.Estabrook,
    linux-kernel, lse-tech, Richard Henderson, cardoza, woodward

Paul McKenney writes:

> In the meantime, Compaq's patent #6,108,737 leads me to believe that
> others in DEC/Compaq also believe it to be a feature. The paragraph
> starting at column 2 line 20 of the body of this patent states:
>
>     In a weakly-ordered system, an order is imposed between selected
>     sets of memory reference operations, while other operations are
>     considered unordered. One or more MB instructions are used to
>     indicate the required order. In the case of an MB instruction
>     defined by the Alpha (R) 21264 processor instruction set, the MB
>     denotes that all memory reference instructions above the MB (i.e.,
>     pre-MB instructions) are ordered before all reference instructions
>     after the MB (i.e., post-MB instructions). However, no order
>     is required between reference instructions that are not separated
>     by an MB.
>
> (The patent talks about the WMB instruction later on.)
>
> In other words, if there is no MB, the CPU is not required to maintain
> ordering. Regardless of data dependencies or anything else.
>
> There is also an application note at
>
>     http://www.openvms.compaq.com/wizard/wiz_2637.html
>
> which states:
>
>     For instance, your producer must issue a "memory barrier" instruction
>     after writing the data to shared memory and before inserting it on
>     the queue; likewise, your consumer must issue a memory barrier
>     instruction after removing an item from the queue and before reading
>     from its memory. Otherwise, you risk seeing stale data, since,
>     while the Alpha processor does provide coherent memory, it does
>     not provide implicit ordering of reads and writes. (That is, the
>     write of the producer's data might reach memory after the write of
>     the queue, such that the consumer might read the new item from the
>     queue but get the previous values from the item's memory.)
>
> Note that they require a memory barrier (rmb()) between the time the
> item is removed from the queue and the time that the data in the item
> is referenced, despite the fact that there is a data dependency between
> the dequeueing and the dereferencing. So, again, data dependency does
> -not- substitute for an MB on Alpha.

This looks an awful lot like the PowerPC architecture.

In an SMP system, one would most likely mark pages as requiring
coherency. This means that stores to a memory location from multiple
processors will give sane results. Ordering is undefined when multiple
memory locations are involved.

There is a memory barrier instruction called "eieio". This is commonly
used for IO, but is also useful for RAM. Two separate sets of memory
operations are simultaneously and independently affected by eieio:

-- set one, generally memory-mapped IO space --
loads to uncached + guarded memory
stores to uncached + guarded memory
stores to write-through-required memory

-- set two, generally RAM on an SMP box --
stores to cached + write-back + coherent

"The eieio instruction is intended for use in managing shared data
structures ... the shared data structure and the lock that protects it
must be altered only by stores that are in the same set"
-- from the 32-bit ppc arch book

^ permalink raw reply	[flat|nested] 23+ messages in thread
* Re: [Lse-tech] Re: RFC: patch to allow lock-free traversal of lists with
  2001-10-12  5:43 ` [Lse-tech] Re: RFC: patch to allow lock-free traversal of lists with Albert D. Cahalan
@ 2001-10-12  6:26 ` Paul Mackerras
  2001-10-12  8:28 ` Alex Bligh - linux-kernel
  1 sibling, 0 replies; 23+ messages in thread
From: Paul Mackerras @ 2001-10-12 6:26 UTC (permalink / raw)
To: Albert D. Cahalan
Cc: Paul McKenney, Andrea Arcangeli, Peter Rival, Ivan Kokshaysky,
    Jay.Estabrook, linux-kernel, lse-tech, Richard Henderson, cardoza, woodward

Albert D. Cahalan writes:

> This looks an awful lot like the PowerPC architecture.
>
> In an SMP system, one would most likely mark pages as
> requiring coherency. This means that stores to a memory
> location from multiple processors will give sane results.
> Ordering is undefined when multiple memory locations are
> involved.

The current PowerPC Architecture spec actually has a paragraph that says
that where a processor does two loads and the second load depends on the
first (i.e. the result from the first load is used in computing the
address for the second load), they have to be performed in program order
with respect to other processors. In other cases you do need a barrier
as you say.

This constraint has evidently been added since the original PPC
architecture book was published. I strongly doubt that any of the older
PPC implementations would violate that constraint though.

Paul.

^ permalink raw reply	[flat|nested] 23+ messages in thread
* Re: [Lse-tech] Re: RFC: patch to allow lock-free traversal of lists with 2001-10-12 5:43 ` [Lse-tech] Re: RFC: patch to allow lock-free traversal of lists with Albert D. Cahalan 2001-10-12 6:26 ` Paul Mackerras @ 2001-10-12 8:28 ` Alex Bligh - linux-kernel 2001-10-12 8:51 ` Jonathan Lundell 1 sibling, 1 reply; 23+ messages in thread From: Alex Bligh - linux-kernel @ 2001-10-12 8:28 UTC (permalink / raw) To: Albert D. Cahalan, Paul McKenney Cc: Andrea Arcangeli, Peter Rival, Ivan Kokshaysky, Jay.Estabrook, linux-kernel, lse-tech, Richard Henderson, cardoza, woodward, Alex Bligh - linux-kernel --On Friday, 12 October, 2001 1:43 AM -0400 "Albert D. Cahalan" <acahalan@cs.uml.edu> wrote: > There is a memory barrier instruction called "eieio" This must be April 1 by some calendar. With a read/write here, a read/write there... (etc.) Guess the engineer was called Mr/Ms McDonald. -- Alex Bligh ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [Lse-tech] Re: RFC: patch to allow lock-free traversal of lists with
  2001-10-12  8:28 ` Alex Bligh - linux-kernel
@ 2001-10-12  8:51 ` Jonathan Lundell
  0 siblings, 0 replies; 23+ messages in thread
From: Jonathan Lundell @ 2001-10-12 8:51 UTC (permalink / raw)
To: Alex Bligh - linux-kernel, Albert D. Cahalan, Paul McKenney
Cc: Andrea Arcangeli, Peter Rival, Ivan Kokshaysky, Jay.Estabrook,
    linux-kernel, lse-tech, Richard Henderson, cardoza, woodward,
    Alex Bligh - linux-kernel

At 9:28 AM +0100 10/12/01, Alex Bligh - linux-kernel wrote:
>> There is a memory barrier instruction called "eieio"
>
> This must be April 1 by some calendar.

PPC "Enforce In-Order Execution of I/O"
--
/Jonathan Lundell.

^ permalink raw reply	[flat|nested] 23+ messages in thread
end of thread, other threads:[~2001-10-13 21:27 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2001-10-10 15:24 [Lse-tech] Re: RFC: patch to allow lock-free traversal of lists with insertion Paul McKenney
2001-10-10 16:58 ` Andrea Arcangeli
2001-10-10 17:25 ` Linus Torvalds
2001-10-12  5:06 ` Rusty Russell
2001-10-12 16:28 ` Linus Torvalds
2001-10-12 19:50 ` Al Dunsmuir
2001-10-13  1:07 ` Paul Mackerras
2001-10-13  1:54 ` Davide Libenzi
2001-10-13  2:04 ` Linus Torvalds
2001-10-13  2:31 ` Davide Libenzi
2001-10-13  2:46 ` Davide Libenzi
2001-10-13  3:30 ` Linus Torvalds
2001-10-13  2:49 ` Paul Mackerras
2001-10-13 14:11 ` Re: RFC: patch to allow lock-free traversal of lists Paul E. McKenney
2001-10-13  2:00 ` [Lse-tech] Re: RFC: patch to allow lock-free traversal of lists with insertion Linus Torvalds
2001-10-13 13:54 ` [Lse-tech] Re: RFC: patch to allow lock-free traversal of lists Alan Cox
2001-10-13  7:38 ` [Lse-tech] Re: RFC: patch to allow lock-free traversal of lists with insertion Rusty Russell
2001-10-13 16:28 ` Re: RFC: patch to allow lock-free traversal of lists with insertion Paul E. McKenney
2001-10-13 21:23 ` Rusty Russell
2001-10-12  5:43 ` [Lse-tech] Re: RFC: patch to allow lock-free traversal of lists with Albert D. Cahalan
2001-10-12  6:26 ` Paul Mackerras
2001-10-12  8:28 ` Alex Bligh - linux-kernel
2001-10-12  8:51 ` Jonathan Lundell