* eieio rule-of-thumb?
From: Allen Curtis @ 2002-05-22 4:43 UTC
To: linuxppc-dev

I was wondering if anyone had a rule for the use of eieio or sync
instructions? After reviewing some of the newer code, it appears that eieio
is used fairly often (at least in some code), sync is hardly ever used. Is a
consideration for use whether data is being manipulated on different buses?
(that would be my assumption) I would assume that operations on the same bus
would occur in the proper sequence.

TIA

** Sent via the linuxppc-dev mail list. See http://lists.linuxppc.org/

* Re: eieio rule-of-thumb?
From: Paul Mackerras @ 2002-05-22 6:01 UTC
To: acurtis; +Cc: linuxppc-dev

Allen Curtis writes:

> I was wondering if anyone had a rule for the use of eieio or sync
> instructions? After reviewing some of the newer code, it appears that eieio
> is used fairly often (at least in some code), sync is hardly ever used. Is a
> consideration for use whether data is being manipulated on different buses?
> (that would be my assumption) I would assume that operations on the same bus
> would occur in the proper sequence.

Nope. Loads can (and sometimes do) go ahead of stores. Stores can go
ahead of loads as well in some circumstances, for example if the load
is non-cacheable and the store hits in the cache. In principle loads
can get reordered too. Reordering can happen in the processor's
load/store unit and/or in the PCI host bridge.

I assume you are mostly concerned with loads and stores to
noncacheable addresses (i.e. I/O devices). Non-cacheable stores don't
get reordered, and eieio acts as a barrier to make sure that all
noncacheable accesses before the eieio are done before any of the
noncacheable accesses after the eieio.

If you are concerned with cacheable accesses as well you start needing
to use sync - it orders all accesses. Unfortunately it is quite an
expensive operation since it synchronizes the execution pipeline as
well.

Paul.
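
[Editorial sketch, not part of the original message.] The distinction Paul draws between eieio (orders noncacheable accesses) and sync (orders all accesses, at higher cost) might be expressed in C roughly as below. The names sketch_eieio, sketch_sync and sketch_write_then_read are hypothetical and are not the kernel's own definitions. On non-PowerPC hosts the sketch assumes the GCC/Clang builtin __sync_synchronize() as a stand-in so that it still compiles.

```c
#include <stdint.h>

#if defined(__powerpc__) || defined(__PPC__)
/* eieio: orders noncacheable (I/O) accesses against each other. */
static inline void sketch_eieio(void)
{
    __asm__ __volatile__("eieio" : : : "memory");
}
/* sync: orders all accesses; more expensive than eieio. */
static inline void sketch_sync(void)
{
    __asm__ __volatile__("sync" : : : "memory");
}
#else
/* Assumption: a full fence stands in for both on other architectures. */
static inline void sketch_eieio(void) { __sync_synchronize(); }
static inline void sketch_sync(void)  { __sync_synchronize(); }
#endif

/*
 * Store to one device register, then load from another.
 * Without the barrier the load could be performed ahead of the store.
 */
static inline uint32_t sketch_write_then_read(volatile uint32_t *cmd_reg,
                                              volatile uint32_t *status_reg,
                                              uint32_t cmd)
{
    *cmd_reg = cmd;      /* noncacheable store */
    sketch_eieio();      /* keep the following load behind the store */
    return *status_reg;  /* noncacheable load */
}
```

Both pointers are assumed to refer to noncacheable, ioremap-style mappings; for ordering against cacheable memory the sketch_sync variant would be the one to reach for.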

* RE: eieio rule-of-thumb?
From: Allen Curtis @ 2002-05-23 2:25 UTC
To: Paul Mackerras; +Cc: linuxppc-dev

>Nope. Loads can (and sometimes do) go ahead of stores. Stores can go
>ahead of loads as well in some circumstances, for example if the load
>is non-cacheable and the store hits in the cache. In principle loads
>can get reordered too. Reordering can happen in the processor's
>load/store unit and/or in the PCI host bridge.
>
>I assume you are mostly concerned with loads and stores to
>noncacheable addresses (i.e. I/O devices). Non-cacheable stores don't
>get reordered, and eieio acts as a barrier to make sure that all
>noncacheable accesses before the eieio are done before any of the
>noncacheable accesses after the eieio.

Could you provide an example of when you should insert this instruction into
your code? From your description it is a wonder that "var += 2;" works if I
am accessing dual-port on an 8260.

* RE: eieio rule-of-thumb?
From: Paul Mackerras @ 2002-05-23 4:26 UTC
To: acurtis; +Cc: linuxppc-dev

Allen Curtis writes:

> Could you provide an example of when you should insert this instruction into
> your code. From your description it is a wonder that "var += 2;" works if I
> am accessing dual-port on an 8260.

That works because the value stored has a data dependency on the value
loaded, i.e. the processor can't do the store until it has the value
back from the load.

In general it is sufficient to do an eieio before each non-cacheable
load, and between two non-cacheable stores if you need to prevent any
possibility of the two stores being combined into a single transaction
(e.g. if you have PCI and a PCI host bridge that can do write
combining). At present the I/O macros do an eieio after each
non-cacheable load and store, which works but is overkill.

In device driver code I much prefer not to see explicit "eieio()"
calls unless it is absolutely necessary. There are sets of macros
provided for access to I/O devices and it is almost always cleaner to
use them instead of explicitly putting "eieio()" in your code. It
cuts down on the "Old Macdonald" jokes too. :)

For access to PCI devices, use:

	{in,out}{b,w,l}		access to PCI I/O space (little endian)
	{read,write}{b,w,l}	access to PCI memory space (little endian)

For access to non-PCI devices on PPC platforms, use:

	{in,out}_8
	{in,out}_{le,be}{16,32}

These all take a kernel virtual address, i.e. the result of an
ioremap. The little-endian variants use the byte-reversed load and
store instructions.

Paul.
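
[Editorial sketch, not part of the original message.] To illustrate the idea of hiding the barrier inside accessor functions, as Paul recommends, here is a minimal C sketch. The names sketch_in_8, sketch_out_8 and sketch_add_to_reg are hypothetical and only modelled on the {in,out}_8 naming above; they are not the kernel's implementation. The barrier placement (after each access) follows Paul's description of what the macros did at the time, and the non-PowerPC fallback via __sync_synchronize() is an assumption so the code builds elsewhere.

```c
#include <stdint.h>

/* Hypothetical barrier helper: eieio on PowerPC, a full fence elsewhere. */
static inline void sketch_io_barrier(void)
{
#if defined(__powerpc__) || defined(__PPC__)
    __asm__ __volatile__("eieio" : : : "memory");
#else
    __sync_synchronize();
#endif
}

/* 8-bit load from a device register, followed by a barrier. */
static inline uint8_t sketch_in_8(const volatile uint8_t *addr)
{
    uint8_t val = *addr;
    sketch_io_barrier();
    return val;
}

/* 8-bit store to a device register, followed by a barrier. */
static inline void sketch_out_8(volatile uint8_t *addr, uint8_t val)
{
    *addr = val;
    sketch_io_barrier();
}

/*
 * The "var += 2" case: the stored value depends on the loaded value,
 * so the store cannot be performed before the load has returned.
 */
static inline void sketch_add_to_reg(volatile uint8_t *addr, uint8_t delta)
{
    sketch_out_8(addr, (uint8_t)(sketch_in_8(addr) + delta));
}
```

Driver code written against such accessors never mentions eieio directly, which is the style Paul argues for; addr is assumed to be a kernel virtual address obtained from ioremap.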

* RE: eieio rule-of-thumb?
From: Allen Curtis @ 2002-05-23 13:38 UTC
To: Paul Mackerras; +Cc: linuxppc-dev

> For access to PCI devices, use:
>
> 	{in,out}{b,w,l}		access to PCI I/O space (little endian)
> 	{read,write}{b,w,l}	access to PCI memory space (little endian)
>
> For access to non-PCI devices on PPC platforms, use:
>
> 	{in,out}_8
> 	{in,out}_{le,be}{16,32}
>

All of these make sense, but what about the Internal Memory Map? Are you
suggesting that these macros should be used to access internal control
structures, buffer descriptors, etc and ignore the structures defined in
cpm_8260.h and imap_8260.h? In a typical system, this is where most of the
non-cacheable I/O will occur. In many cases you probably do not care what
order things happen at the micro level, in some cases you do. If you
understand the problem you can optimize your solution, otherwise put the fix
everywhere out of paranoia. :)

* Re: eieio rule-of-thumb?
From: Dan Brennan @ 2002-05-23 13:54 UTC
To: acurtis; +Cc: Paul Mackerras, linuxppc-dev

If you are using the 8260 the eieio does nothing. From the 603e User's
Manual:

	The Enforce In-Order Execution of I/O (eieio) instruction is used to
	ensure memory reordering of noncacheable memory access. Since the
	603e does not reorder noncacheable memory accesses, the eieio
	instruction is treated as a no-op.

Allen Curtis wrote:
>
> > For access to PCI devices, use:
> >
> > 	{in,out}{b,w,l}		access to PCI I/O space (little endian)
> > 	{read,write}{b,w,l}	access to PCI memory space (little endian)
> >
> > For access to non-PCI devices on PPC platforms, use:
> >
> > 	{in,out}_8
> > 	{in,out}_{le,be}{16,32}
> >
>
> All of these make sense, but what about the Internal Memory Map? Are you
> suggesting that these macros should be used to access internal control
> structures, buffer descriptors, etc and ignore the structures defined in
> cpm_8260.h and imap_8260.h? In a typical system, this is where most of the
> non-cacheable I/O will occur. In many cases you probably do not care what
> order things happen at the micro level, in some cases you do. If you
> understand the problem you can optimize your solution, otherwise put the fix
> everywhere out of paranoia. :)
>

* RE: eieio rule-of-thumb?
From: Allen Curtis @ 2002-05-23 14:42 UTC
To: Dan Brennan; +Cc: Paul Mackerras, linuxppc-dev

> The Enforce In-Order Execution of I/O (eieio) instruction is used to
> ensure memory
> reordering of noncacheable memory access. Since the 603e does not
> reorder
> noncacheable memory accesses, the eieio instruction is treated as a
> no-op.

That makes me feel better for my processor anyway....

* Re: eieio rule-of-thumb?
From: Dan Malek @ 2002-05-23 17:28 UTC
To: acurtis; +Cc: Paul Mackerras, linuxppc-dev

Allen Curtis wrote:

> All of these make sense,....

What doesn't make sense is why we use eieio at all......All of the
mapped I/O space is marked uncached 'guarded' in the PTE, which enforces
in-order load/store operations. This should also prevent store gathering in
bridges since they shouldn't see a burst write from a processor store
operation.

If you want higher performance programmed I/O access, then you should cache
some of the space, and at that time you must use eieio if there are cached
areas subject to out of order access problems.

On the 8xx and 8260 family, all of the I/O (including the internal memory
space) is mapped uncached and guarded. I've never used eieio nor seen
any reason it was necessary.

Where you will see problems, especially on 4xx and potentially on 8xx,
is the use of "regular" memory for control structures and special registers
for other control. You can write to memory, which gets stuck in pipelines,
then whack a DCR (which seems to have some magical fast path update) causing
the peripheral to start up before the pipelined writes have made it to
memory. I'm wondering if we aren't just lucky with the eieio side effects
when a 'sync' would be the logically correct operator.

	-- Dan

* RE: eieio rule-of-thumb?
From: Chris Thomson @ 2002-05-23 17:45 UTC
To: 'Dan Malek', acurtis; +Cc: 'Paul Mackerras', linuxppc-dev

Eieio instructions are needed for CPUs based on the 745x core.
These are the 7445, 7450, 7451 and 7455.

Earlier PPC implementations never reordered guarded, or even
uncached, accesses. Motorola made a big deal about this in its
embedded sales. All you had to do was set the WIMG bits right.

Evidently Motorola decided that getting the extra performance
of reordering was needed to keep the Apple account.

-----Original Message-----
From: owner-linuxppc-dev@lists.linuxppc.org
[mailto:owner-linuxppc-dev@lists.linuxppc.org] On Behalf Of Dan Malek
Sent: Thursday, May 23, 2002 10:28 AM
To: acurtis@onz.com
Cc: Paul Mackerras; linuxppc-dev@lists.linuxppc.org
Subject: Re: eieio rule-of-thumb?

Allen Curtis wrote:

> All of these make sense,....

What doesn't make sense is why we use eieio at all......All of the
mapped I/O space is marked uncached 'guarded' in the PTE, which enforces
in-order load/store operations. This should also prevent store gathering in
bridges since they shouldn't see a burst write from a processor store
operation.

If you want higher performance programmed I/O access, then you should cache
some of the space, and at that time you must use eieio if there are cached
areas subject to out of order access problems.

On the 8xx and 8260 family, all of the I/O (including the internal memory
space) is mapped uncached and guarded. I've never used eieio nor seen
any reason it was necessary.

Where you will see problems, especially on 4xx and potentially on 8xx,
is the use of "regular" memory for control structures and special registers
for other control. You can write to memory, which gets stuck in pipelines,
then whack a DCR (which seems to have some magical fast path update) causing
the peripheral to start up before the pipelined writes have made it to
memory. I'm wondering if we aren't just lucky with the eieio side effects
when a 'sync' would be the logically correct operator.

	-- Dan

* Re: eieio rule-of-thumb?
From: Dan Malek @ 2002-05-23 19:02 UTC
To: Chris Thomson; +Cc: acurtis, 'Paul Mackerras', linuxppc-dev

Chris Thomson wrote:

> Eieio instructions are needed for CPUs based on the 745x core.
> These are the 7445, 7450, 7451 and 7455.

The eieio is needed on many of the processors, I just got lucky on some
of them :-)

> Earlier PPC implementations never reordered guarded, or even
> uncached, accesses. Motorola made a big deal about this in its
> embedded sales. All you had to do was set the WIMG bits right.

The 74xx is a little confusing, but I don't think it really changed any
semantics. Loads from uncached and guarded spaces are still performed
in order. Writes to such spaces are not gathered and are performed in
order. I think there was some fine tuning of pipelines that may cause
writes to hang around in a buffer longer than they did on previous
processors (and there is this funny description of a cache line load if
all of them would be executed). The only confusing part is how the cache
inhibit or guarded attribute controls this behavior.

The eieio is still necessary to ensure a load doesn't cross a pending
store. The eieio further performs a broadcast bus operation which can be
used by external bridges to prevent them from performing a store gathering
(write combining as Ben said :-) if necessary.

I know there is something a little different with the 74xx, because when
I received one of the first 7450s for my Sandpoint the PCI I/O didn't work
exactly correct. A couple of minor bug fixes suitable to all processors
cured it.

> Evidently Motorola decided that getting the extra performance
> of reordering was needed to keep the Apple account.

You could probably say that of Altivec, but I don't see any difference
with the reordering behavior of the processors. Anything we see is likely
to be an effect of significantly improving the performance of the normal
superscalar mode of operation.

Thanks.

	-- Dan

* RE: eieio rule-of-thumb?
From: Paul Mackerras @ 2002-05-23 22:36 UTC
To: Chris Thomson; +Cc: 'Dan Malek', acurtis, linuxppc-dev

Chris Thomson writes:

> Eieio instructions are needed for CPUs based on the 745x core.
> These are the 7445, 7450, 7451 and 7455.
>
> Earlier PPC implementations never reordered guarded, or even
> uncached, accesses. Motorola made a big deal about this in its
> embedded sales. All you had to do was set the WIMG bits right.

Not true. I had an actual example on a 604 processor where I had a
loop for outputting characters to a serial port. The loop did the
usual thing of polling the status register until it said the transmit
buffer was empty, then poke a character into the transmit buffer, and
loop for the next character. Without an eieio in there, it would drop
characters because the load to poll the status register would go ahead
of the store to the transmit buffer.

Paul.
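
[Editorial sketch, not part of the original message.] The serial-port loop Paul describes could look roughly like the following in C. This is a reconstruction under stated assumptions, not his actual code: the register layout, the SKETCH_TX_EMPTY bit and the function names are all hypothetical, and the fallback to __sync_synchronize() on non-PowerPC hosts is there only so the sketch compiles elsewhere.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical status bit: set when the transmit buffer is empty. */
#define SKETCH_TX_EMPTY 0x01u

/* Hypothetical barrier helper: eieio on PowerPC, a full fence elsewhere. */
static inline void sketch_io_barrier(void)
{
#if defined(__powerpc__) || defined(__PPC__)
    __asm__ __volatile__("eieio" : : : "memory");
#else
    __sync_synchronize();
#endif
}

/*
 * Send a NUL-terminated string one character at a time.
 * Returns the number of characters written.
 */
static inline size_t sketch_uart_write(volatile uint8_t *status,
                                       volatile uint8_t *tx,
                                       const char *s)
{
    size_t n = 0;

    while (s[n] != '\0') {
        /* Poll until the device reports the transmit buffer empty. */
        while ((*status & SKETCH_TX_EMPTY) == 0) {
            /* spin */
        }
        *tx = (uint8_t)s[n];  /* store the character */
        /*
         * Without this barrier, the next iteration's status load could
         * be performed ahead of the store above, see a stale "empty"
         * state, and overwrite the character: the dropped characters
         * described in the message.
         */
        sketch_io_barrier();
        n++;
    }
    return n;
}
```

In a real driver the status and transmit registers would be device addresses from an ioremap mapping; here they are just pointers so the ordering logic can be read in isolation.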

* Re: eieio rule-of-thumb?
From: benh @ 2002-05-23 18:44 UTC
To: Dan Malek, acurtis; +Cc: Paul Mackerras, linuxppc-dev

>
>What doesn't make sense is why we use eieio at all......All of the
>mapped I/O space is marked uncached 'guarded' in the PTE, which enforces
>in-order load/store operations. This should also prevent store gathering in
>bridges since they shouldn't see a burst write from a processor store
>operation.

Regarding eieio on uncached, Paul or Anton can tell you more about it,
I think there is still a case where guarded doesn't prevent a load
from moving across a store.

Regarding the bridge, the whole point of store gathering in bridges is
to make bursts when the CPU doesn't. That's what most bridges call
'write combine'. It's done, for example, by UniNorth rev >= 1.5 on
macs when targeting the AGP bus.

>If you want higher performance programmed I/O access, then you should cache
>some of the space, and at that time you must use eieio if there are cached
>areas subject to out of order access problems.

And beware that eieio won't be a barrier between cacheable and non
cacheable space. So if you need your cacheable stores to be complete
before you write to non-cacheable space (a register for example), you
need to use sync.

>On the 8xx and 8260 family, all of the I/O (including the internal memory
>space) is mapped uncached and guarded. I've never used eieio nor seen
>any reason it was necessary.

I think it is on desktop CPUs, but again, here, Paul and Anton have more
knowledge than I do.

>Where you will see problems, especially on 4xx and potentially on 8xx,
>is the use of "regular" memory for control structures and special registers
>for other control. You can write to memory, which gets stuck in pipelines,
>then whack a DCR (which seems to have some magical fast path update) causing
>the peripheral to start up before the pipelined writes have made it to
>memory. I'm wondering if we aren't just lucky with the eieio side effects
>when a 'sync' would be the logically correct operator.
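
[Editorial sketch, not part of the original message.] The case benh raises, cacheable stores that must complete before a store to a noncacheable register, is the usual "fill in a descriptor, then ring the doorbell" pattern. A minimal C sketch follows. The descriptor layout, the doorbell register and every name are hypothetical, a plain memory-mapped register stands in for the DCR Dan Malek mentions, and the non-PowerPC fallback to __sync_synchronize() is an assumption so the code builds elsewhere.

```c
#include <stdint.h>

/* Hypothetical full barrier: sync on PowerPC, a full fence elsewhere. */
static inline void sketch_sync(void)
{
#if defined(__powerpc__) || defined(__PPC__)
    __asm__ __volatile__("sync" : : : "memory");
#else
    __sync_synchronize();
#endif
}

/* Hypothetical control structure living in ordinary (cacheable) memory. */
struct sketch_desc {
    uint32_t addr;   /* bus address of the data buffer */
    uint32_t len;    /* buffer length in bytes */
    uint32_t flags;  /* SKETCH_DESC_READY once the fields are valid */
};

#define SKETCH_DESC_READY 0x1u

/*
 * Fill in the descriptor, then tell the device to start.
 * eieio would not order the cacheable descriptor stores against the
 * noncacheable doorbell store, so sync is used here.
 */
static inline void sketch_start_transfer(struct sketch_desc *d,
                                         volatile uint32_t *doorbell,
                                         uint32_t buf_addr, uint32_t len)
{
    d->addr = buf_addr;
    d->len = len;
    d->flags = SKETCH_DESC_READY;

    sketch_sync();   /* descriptor stores are done before the device starts */

    *doorbell = 1;   /* noncacheable register write that starts the device */
}
```

Whether the device then sees the descriptor also depends on cache coherency for that memory, which this sketch does not address.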

* Re: eieio rule-of-thumb?
From: Dan Malek @ 2002-05-23 18:02 UTC
To: benh; +Cc: acurtis, Paul Mackerras, linuxppc-dev

benh@kernel.crashing.org wrote:

> Regarding eieio on uncached, Paul or Anton can tell you more about it,
> I think there is still a case where guarded doesn't prevent a load
> from moving across a store.

Oh right....You get guarded to prevent out of order loads, and I just
get lucky that the uncached store doesn't cross the load.

> Regarding the bridge,...

OK. I thought that was a PCI master, memory controller thing, not
something that would happen from a CPU programmed I/O.

> I think it is on desktop CPUs, but again, here, Paul and Anton have more
> knowledge than I do.

Right. I know I found a couple of things we had to update for the 74xx
processors.

Thanks.

	-- Dan

* Re: eieio rule-of-thumb?
From: Paul Mackerras @ 2002-05-23 22:58 UTC
To: Dan Malek; +Cc: benh, acurtis, linuxppc-dev

Dan Malek writes:

> Oh right....You get guarded to prevent out of order loads, and I just
> get lucky that the uncached store doesn't cross the load.

Careful... "out of order" is a somewhat ambiguous term. The obvious
meaning (at least to me) is "done in a different order from that
specified in the program", i.e. reordered. But that isn't what the
term means in the CPU users' manuals and the PEM. There it means
"done before you know that the program requires it to be done",
i.e. speculative.

The guarded bit prevents "out of order" loads and stores in the second
sense, that is, the processor can't perform a load or a store to a
guarded page until it has resolved all previous branches and made sure
that there are no pending exceptions from previous instructions.

The original PowerPC architecture didn't put any constraints on which
order the loads and stores done by a program are seen by memory or an
I/O device. However, none of the implementations so far reorder
stores to noncacheable, guarded space, and IBM has decided that future
64-bit PowerPCs will preserve this behaviour. I don't recall what
Book E specifies in this area though.

It would be legal for two loads to get reordered, and I can imagine a
situation where this might happen:

	lis	r9,variable@ha
	lwz	r3,variable@l(r9)
	lwzx	r4,r3,r10
	lwz	r5,0(r10)

Imagine that r10 points to a noncacheable guarded page, and that the
first load misses in the cache (or is from a noncacheable page). The
subsequent loads can't be performed until we know that the first load
isn't going to generate an exception, but that only takes a TLB
lookup. The third load could well go ahead while the second load is
waiting for the value of the first load.

Paul.

end of thread, other threads:[~2002-05-23 22:58 UTC | newest]

Thread overview: 14+ messages
2002-05-22  4:43 eieio rule-of-thumb? Allen Curtis
2002-05-22  6:01 ` Paul Mackerras
2002-05-23  2:25   ` Allen Curtis
2002-05-23  4:26     ` Paul Mackerras
2002-05-23 13:38       ` Allen Curtis
2002-05-23 13:54         ` Dan Brennan
2002-05-23 14:42           ` Allen Curtis
2002-05-23 17:28         ` Dan Malek
2002-05-23 17:45           ` Chris Thomson
2002-05-23 19:02             ` Dan Malek
2002-05-23 22:36             ` Paul Mackerras
2002-05-23 18:44           ` benh
2002-05-23 18:02             ` Dan Malek
2002-05-23 22:58               ` Paul Mackerras