eieio rule-of-thumb?

linuxppc-dev.lists.ozlabs.org archive mirror

* eieio rule-of-thumb?
@ 2002-05-22  4:43 Allen Curtis
  2002-05-22  6:01 ` Paul Mackerras
  0 siblings, 1 reply; 14+ messages in thread
From: Allen Curtis @ 2002-05-22  4:43 UTC (permalink / raw)
  To: linuxppc-dev

I was wondering if anyone had a rule for the use of the eieio or sync
instructions? After reviewing some of the newer code, it appears that eieio
is used fairly often (at least in some code), while sync is hardly ever used.
Does the choice depend on whether data is being manipulated on different
buses? (That would be my assumption.) I would assume that operations on the
same bus would occur in the proper sequence.

TIA

** Sent via the linuxppc-dev mail list. See http://lists.linuxppc.org/


* Re: eieio rule-of-thumb?
  2002-05-22  4:43 eieio rule-of-thumb? Allen Curtis
@ 2002-05-22  6:01 ` Paul Mackerras
  2002-05-23  2:25   ` Allen Curtis
  0 siblings, 1 reply; 14+ messages in thread
From: Paul Mackerras @ 2002-05-22  6:01 UTC (permalink / raw)
  To: acurtis; +Cc: linuxppc-dev

Allen Curtis writes:

> I was wondering if anyone had a rule for the use of the eieio or sync
> instructions? After reviewing some of the newer code, it appears that eieio
> is used fairly often (at least in some code), while sync is hardly ever used.
> Does the choice depend on whether data is being manipulated on different
> buses? (That would be my assumption.) I would assume that operations on the
> same bus would occur in the proper sequence.

Nope.  Loads can (and sometimes do) go ahead of stores.  Stores can go
ahead of loads as well in some circumstances, for example if the load
is non-cacheable and the store hits in the cache.  In principle loads
can get reordered too.  Reordering can happen in the processor's
load/store unit and/or in the PCI host bridge.

I assume you are mostly concerned with loads and stores to
noncacheable addresses (i.e. I/O devices).  Non-cacheable stores don't
get reordered, and eieio acts as a barrier to make sure that all
noncacheable accesses before the eieio are done before any of the
noncacheable accesses after the eieio.

If you are concerned with cacheable accesses as well you start needing
to use sync - it orders all accesses.  Unfortunately it is quite an
expensive operation since it synchronizes the execution pipeline as
well.
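
As a rough illustration of the two barriers described above, the sketch
below wraps them in C macros and uses eieio between a noncacheable store
and a following noncacheable load. It assumes GCC or Clang inline assembly.
The names io_barrier, full_barrier and command_then_status are made up for
this example and are not the kernel's definitions. On a non-PowerPC host
the macros fall back to compiler-only barriers so the sketch still builds.

```c
#include <stdint.h>

/* Hypothetical wrappers for illustration only. */
#if defined(__powerpc__) || defined(__PPC__)
#define io_barrier()   __asm__ __volatile__("eieio" : : : "memory")
#define full_barrier() __asm__ __volatile__("sync"  : : : "memory")
#else
/* Fallback for other hosts: stops compiler reordering, nothing more. */
#define io_barrier()   __asm__ __volatile__("" : : : "memory")
#define full_barrier() __asm__ __volatile__("" : : : "memory")
#endif

/*
 * Write a command register, then read a status register.
 * Both pointers are assumed to refer to noncacheable device space.
 */
uint32_t command_then_status(volatile uint32_t *cmd,
                             volatile uint32_t *status,
                             uint32_t value)
{
    *cmd = value;
    io_barrier();       /* keep the load below from passing the store above */
    return *status;
}
```

If cacheable memory were involved as well, full_barrier() would be the one
to reach for, at the cost mentioned above.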

Paul.


* RE: eieio rule-of-thumb?
  2002-05-22  6:01 ` Paul Mackerras
@ 2002-05-23  2:25   ` Allen Curtis
  2002-05-23  4:26     ` Paul Mackerras
  0 siblings, 1 reply; 14+ messages in thread
From: Allen Curtis @ 2002-05-23  2:25 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: linuxppc-dev


>Nope.  Loads can (and sometimes do) go ahead of stores.  Stores can go
>ahead of loads as well in some circumstances, for example if the load
>is non-cacheable and the store hits in the cache.  In principle loads
>can get reordered too.  Reordering can happen in the processor's
>load/store unit and/or in the PCI host bridge.
>
>I assume you are mostly concerned with loads and stores to
>noncacheable addresses (i.e. I/O devices).  Non-cacheable stores don't
>get reordered, and eieio acts as a barrier to make sure that all
>noncacheable accesses before the eieio are done before any of the
>noncacheable accesses after the eieio.

Could you provide an example of when you should insert this instruction into
your code? From your description it is a wonder that "var += 2;" works if I
am accessing dual-port RAM on an 8260.



* RE: eieio rule-of-thumb?
  2002-05-23  2:25   ` Allen Curtis
@ 2002-05-23  4:26     ` Paul Mackerras
  2002-05-23 13:38       ` Allen Curtis
  0 siblings, 1 reply; 14+ messages in thread
From: Paul Mackerras @ 2002-05-23  4:26 UTC (permalink / raw)
  To: acurtis; +Cc: linuxppc-dev

Allen Curtis writes:

> Could you provide an example of when you should insert this instruction into
> your code? From your description it is a wonder that "var += 2;" works if I
> am accessing dual-port RAM on an 8260.

That works because the value stored has a data dependency on the value
loaded, i.e. the processor can't do the store until it has the value
back from the load.
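
To make the dependency concrete, here is roughly what "var += 2" amounts
to when written out step by step in C. The function name add_two is made up
for this sketch. The only point is that the value being stored is computed
from the value loaded, so the store has to wait for the load.

```c
#include <stdint.h>

/* Hypothetical helper: the three steps hidden inside "var += 2". */
uint32_t add_two(volatile uint32_t *var)
{
    uint32_t tmp = *var;    /* load from the device or dual-port RAM */
    tmp += 2;               /* the add needs the loaded value */
    *var = tmp;             /* the store needs the result of the add */
    return tmp;
}
```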

In general it is sufficient to do an eieio before each non-cacheable
load, and between two non-cacheable stores if you need to prevent any
possibility of the two stores being combined into a single transaction
(e.g. if you have PCI and a PCI host bridge that can do write
combining).  At present the I/O macros do an eieio after each
non-cacheable load and store, which works but is overkill.
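
The rule in the paragraph above could be sketched as a pair of accessors:
a barrier before every non-cacheable load, and a barrier between two stores
only when they must not be merged. This is an illustration, not the kernel's
I/O macros. The names reg_load, reg_store, reg_store_pair and io_barrier are
hypothetical, GCC or Clang inline assembly is assumed, and on a non-PowerPC
host the barrier degrades to a compiler barrier.

```c
#include <stdint.h>

#if defined(__powerpc__) || defined(__PPC__)
#define io_barrier() __asm__ __volatile__("eieio" : : : "memory")
#else
#define io_barrier() __asm__ __volatile__("" : : : "memory")
#endif

/* Load: barrier first, so the load cannot pass an earlier store. */
uint32_t reg_load(const volatile uint32_t *reg)
{
    io_barrier();
    return *reg;
}

/* A single store needs no barrier of its own under this rule. */
void reg_store(volatile uint32_t *reg, uint32_t val)
{
    *reg = val;
}

/* Two stores that must reach the device as separate transactions. */
void reg_store_pair(volatile uint32_t *first, uint32_t a,
                    volatile uint32_t *second, uint32_t b)
{
    *first = a;
    io_barrier();       /* keeps a bridge from combining the two writes */
    *second = b;
}
```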

In device driver code I much prefer not to see explicit "eieio()"
calls unless it is absolutely necessary.  There are sets of macros
provided for access to I/O devices and it is almost always cleaner to
use them instead of explicitly putting "eieio()" in your code.  It
cuts down on the "Old Macdonald" jokes too. :)

For access to PCI devices, use:

{in,out}{b,w,l}		access to PCI I/O space (little endian)
{read,write}{b,w,l}	access to PCI memory space (little endian)

For access to non-PCI devices on PPC platforms, use:

{in,out}_8
{in,out}_{le,be}{16,32}

These all take a kernel virtual address, i.e. the result of an
ioremap.  The little-endian variants use the byte-reversed load and
store instructions.
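
To show what the little-endian variants compute, here is a portable sketch
that produces the same value on any host byte order. The names
sketch_in_le32 and sketch_out_le32 are made up. The real accessors issue a
single byte-reversed access and include the barrier, whereas this sketch
touches the location one byte at a time, so it illustrates only the
resulting value and is not suitable for a real device register.

```c
#include <stdint.h>

/* Hypothetical stand-in for a little-endian 32-bit read. */
uint32_t sketch_in_le32(const volatile uint8_t *addr)
{
    return (uint32_t)addr[0]
         | ((uint32_t)addr[1] << 8)
         | ((uint32_t)addr[2] << 16)
         | ((uint32_t)addr[3] << 24);
}

/* Hypothetical stand-in for a little-endian 32-bit write. */
void sketch_out_le32(volatile uint8_t *addr, uint32_t val)
{
    addr[0] = (uint8_t)(val);
    addr[1] = (uint8_t)(val >> 8);
    addr[2] = (uint8_t)(val >> 16);
    addr[3] = (uint8_t)(val >> 24);
}
```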

Paul.


* RE: eieio rule-of-thumb?
  2002-05-23  4:26     ` Paul Mackerras
@ 2002-05-23 13:38       ` Allen Curtis
  2002-05-23 13:54         ` Dan Brennan
  2002-05-23 17:28         ` Dan Malek
  0 siblings, 2 replies; 14+ messages in thread
From: Allen Curtis @ 2002-05-23 13:38 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: linuxppc-dev

> For access to PCI devices, use:
>
> {in,out}{b,w,l}		access to PCI I/O space (little endian)
> {read,write}{b,w,l}	access to PCI memory space (little endian)
>
> For access to non-PCI devices on PPC platforms, use:
>
> {in,out}_8
> {in,out}_{le,be}{16,32}
>

All of these make sense, but what about the Internal Memory Map? Are you
suggesting that these macros should be used to access internal control
structures, buffer descriptors, etc., and that we should ignore the structures
defined in cpm_8260.h and imap_8260.h? In a typical system, this is where most
of the non-cacheable I/O will occur. In many cases you probably do not care
what order things happen in at the micro level; in some cases you do. If you
understand the problem you can optimize your solution; otherwise you put the
fix everywhere out of paranoia.  :)


* Re: eieio rule-of-thumb?
  2002-05-23 13:38       ` Allen Curtis
@ 2002-05-23 13:54         ` Dan Brennan
  2002-05-23 14:42           ` Allen Curtis
  2002-05-23 17:28         ` Dan Malek
  1 sibling, 1 reply; 14+ messages in thread
From: Dan Brennan @ 2002-05-23 13:54 UTC (permalink / raw)
  To: acurtis; +Cc: Paul Mackerras, linuxppc-dev


If you are using the 8260, the eieio does nothing. From the 603e User's
Manual:

The Enforce In-Order Execution of I/O (eieio) instruction is used to
ensure memory reordering of noncacheable memory access. Since the 603e
does not reorder noncacheable memory accesses, the eieio instruction is
treated as a no-op.



* RE: eieio rule-of-thumb?
  2002-05-23 13:54         ` Dan Brennan
@ 2002-05-23 14:42           ` Allen Curtis
  0 siblings, 0 replies; 14+ messages in thread
From: Allen Curtis @ 2002-05-23 14:42 UTC (permalink / raw)
  To: Dan Brennan; +Cc: Paul Mackerras, linuxppc-dev


> The Enforce In-Order Execution of I/O (eieio) instruction is used to
> ensure memory reordering of noncacheable memory access. Since the 603e
> does not reorder noncacheable memory accesses, the eieio instruction is
> treated as a no-op.

That makes me feel better about my processor anyway...


* Re: eieio rule-of-thumb?
  2002-05-23 13:38       ` Allen Curtis
  2002-05-23 13:54         ` Dan Brennan
@ 2002-05-23 17:28         ` Dan Malek
  2002-05-23 17:45           ` Chris Thomson
  2002-05-23 18:44           ` benh
  1 sibling, 2 replies; 14+ messages in thread
From: Dan Malek @ 2002-05-23 17:28 UTC (permalink / raw)
  To: acurtis; +Cc: Paul Mackerras, linuxppc-dev

Allen Curtis wrote:

> All of these make sense,....

What doesn't make sense is why we use eieio at all... All of the
mapped I/O space is marked uncached 'guarded' in the PTE, which enforces
in-order load/store operations.  This should also prevent store gathering in
bridges since they shouldn't see a burst write from a processor store operation.

If you want higher performance programmed I/O access, then you should cache
some of the space, and at that time you must use eieio if there are cached
areas subject to out of order access problems.

On the 8xx and 8260 family, all of the I/O (including the internal memory
space) is mapped uncached and guarded.  I've never used eieio nor seen
any reason it was necessary.

Where you will see problems, especially on 4xx and potentially on 8xx,
is the use of "regular" memory for control structures and special registers
for other control.  You can write to memory, which gets stuck in pipelines,
then whack a DCR (which seems to have some magical fast path update) causing
the peripheral to start up before the pipelined writes have made it to
memory.  I'm wondering if we aren't just lucky with the eieio side effects
when a 'sync' would be the logically correct operator.

	-- Dan


* RE: eieio rule-of-thumb?
  2002-05-23 17:28         ` Dan Malek
@ 2002-05-23 17:45           ` Chris Thomson
  2002-05-23 19:02             ` Dan Malek
  2002-05-23 22:36             ` Paul Mackerras
  2002-05-23 18:44           ` benh
  1 sibling, 2 replies; 14+ messages in thread
From: Chris Thomson @ 2002-05-23 17:45 UTC (permalink / raw)
  To: 'Dan Malek', acurtis; +Cc: 'Paul Mackerras', linuxppc-dev

Eieio instructions are needed for CPUs based on the 745x core.
These are the 7445, 7450, 7451 and 7455.

Earlier PPC implementations never reordered guarded, or even
uncached, accesses.  Motorola made a big deal about this in its
embedded sales.  All you had to do was set the WIMG bits right.

Evidently Motorola decided that getting the extra performance
of reordering was needed to keep the Apple account.



* Re: eieio rule-of-thumb?
  2002-05-23 17:45           ` Chris Thomson
@ 2002-05-23 19:02             ` Dan Malek
  2002-05-23 22:36             ` Paul Mackerras
  1 sibling, 0 replies; 14+ messages in thread
From: Dan Malek @ 2002-05-23 19:02 UTC (permalink / raw)
  To: Chris Thomson; +Cc: acurtis, 'Paul Mackerras', linuxppc-dev

Chris Thomson wrote:

> Eieio instructions are needed for CPUs based on the 745x core.
> These are the 7445, 7450, 7451 and 7455.

The eieio is needed on many of the processors, I just got lucky
on some of them :-)

> Earlier PPC implementations never reordered guarded, or even
> uncached, accesses.  Motorola made a big deal about this in its
> embedded sales.  All you had to do was set the WIMG bits right.

The 74xx is a little confusing, but I don't think it really changed
any semantics.  Loads from uncached and guarded spaces are still
performed in order.  Writes to such spaces are not gathered and are
performed in order.  I think there was some fine tuning of pipelines
that may cause writes to hang around in a buffer longer than they
did on previous processors (and there is this funny description of
a cache line load if all of them would be executed).  The only
confusing part is how the cache inhibit or guarded attribute controls
this behavior.  The eieio is still necessary to ensure a load doesn't
cross a pending store.

The eieio further performs a broadcast bus operation which can be
used by external bridges to prevent them from performing store
gathering (write combining as Ben said :-) if necessary.

I know there is something a little different with the 74xx, because
when I received one of the first 7450s for my Sandpoint the PCI I/O
didn't work exactly correctly.  A couple of minor bug fixes suitable for
all processors cured it.

> Evidently Motorola decided that getting the extra performance
> of reordering was needed to keep the Apple account.

You could probably say that of Altivec, but I don't see any difference
with the reordering behavior of the processors.  Anything we see is likely
to be an effect of significantly improving the performance of the normal
superscalar mode of operation.

Thanks.

	-- Dan


* RE: eieio rule-of-thumb?
  2002-05-23 17:45           ` Chris Thomson
  2002-05-23 19:02             ` Dan Malek
@ 2002-05-23 22:36             ` Paul Mackerras
  1 sibling, 0 replies; 14+ messages in thread
From: Paul Mackerras @ 2002-05-23 22:36 UTC (permalink / raw)
  To: Chris Thomson; +Cc: 'Dan Malek', acurtis, linuxppc-dev

Chris Thomson writes:

> Eieio instructions are needed for CPUs based on the 745x core.
> These are the 7445, 7450, 7451 and 7455.
>
> Earlier PPC implementations never reordered guarded, or even
> uncached, accesses.  Motorola made a big deal about this in its
> embedded sales.  All you had to do was set the WIMG bits right.

Not true.  I had an actual example on a 604 processor where I had a
loop for outputting characters to a serial port.  The loop did the
usual thing: poll the status register until it said the transmit
buffer was empty, poke a character into the transmit buffer, and
loop for the next character.  Without an eieio in there, it would drop
characters because the load to poll the status register would go ahead
of the store to the transmit buffer.
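
A loop of the kind described above might look like the following sketch.
The register layout (struct fake_uart), the TX_EMPTY bit and the function
name uart_write are all invented for illustration and do not correspond to
any particular serial chip. GCC or Clang inline assembly is assumed, and
on a non-PowerPC host the barrier is only a compiler barrier.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__powerpc__) || defined(__PPC__)
#define io_barrier() __asm__ __volatile__("eieio" : : : "memory")
#else
#define io_barrier() __asm__ __volatile__("" : : : "memory")
#endif

/* Hypothetical register layout; a real UART will differ. */
struct fake_uart {
    volatile uint8_t status;    /* bit 0 set: transmit buffer empty */
    volatile uint8_t tx;        /* transmit buffer */
};
#define TX_EMPTY 0x01u

/* Send len characters, returning how many were written. */
size_t uart_write(struct fake_uart *u, const char *s, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++) {
        while ((u->status & TX_EMPTY) == 0)
            ;                   /* poll until the device can take a byte */
        u->tx = (uint8_t)s[i];
        io_barrier();           /* next status load must not pass this store */
    }
    return i;
}
```

Without the barrier, the status load of the next iteration could be
performed before the store, see the buffer as still empty, and let the
following character overwrite the one in flight.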

Paul.


* Re: eieio rule-of-thumb?
  2002-05-23 17:28         ` Dan Malek
  2002-05-23 17:45           ` Chris Thomson
@ 2002-05-23 18:44           ` benh
  2002-05-23 18:02             ` Dan Malek
  1 sibling, 1 reply; 14+ messages in thread
From: benh @ 2002-05-23 18:44 UTC (permalink / raw)
  To: Dan Malek, acurtis; +Cc: Paul Mackerras, linuxppc-dev

>
>What doesn't make sense is why we use eieio at all......All of the
>mapped I/O space is marked uncached 'guarded' in the PTE, which enforces
>in-order load/store operations.  This should also prevent store gathering in
>bridges since they shouldn't see a burst write from a processor store
>operation.

Regarding eieio on uncached, Paul or Anton can tell you more about it,
but I think there is still a case where guarded doesn't prevent a load
from moving across a store.

Regarding the bridge, the whole point of store gathering in bridges
is to make bursts when the CPU doesn't. That's what most bridges call
'write combine'. It's done, for example, by UniNorth rev >= 1.5 on
Macs when targeting the AGP bus.

>If you want higher performance programmed I/O access, then you should cache
>some of the space, and at that time you must use eieio if there are cached
>areas subject to out of order access problems.

And beware that eieio won't be a barrier between cacheable and non-cacheable
space. So if you need your cacheable stores to be complete before you write
to non-cacheable space (a register for example), you need to use sync.
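
As a sketch of that case: a descriptor is filled in ordinary cacheable
memory, then a device register is written to start the transfer, with a
sync in between. The struct desc layout, the doorbell register and the names
start_transfer and full_barrier are hypothetical. GCC or Clang inline
assembly is assumed, with a compiler-only fallback on other hosts. A device
that is not cache-coherent would also need the cache lines flushed, which
this sketch leaves out.

```c
#include <stdint.h>

#if defined(__powerpc__) || defined(__PPC__)
#define full_barrier() __asm__ __volatile__("sync" : : : "memory")
#else
#define full_barrier() __asm__ __volatile__("" : : : "memory")
#endif

/* Hypothetical buffer descriptor living in ordinary cacheable memory. */
struct desc {
    uint32_t addr;
    uint32_t len;
};

/* Fill the descriptor, then ring a (hypothetical) doorbell register. */
void start_transfer(struct desc *d, uint32_t addr, uint32_t len,
                    volatile uint32_t *doorbell)
{
    d->addr = addr;     /* cacheable stores */
    d->len = len;
    full_barrier();     /* eieio would not order these against the next store */
    *doorbell = 1;      /* non-cacheable store that starts the device */
}
```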

>On the 8xx and 8260 family, all of the I/O (including the internal memory
>space) is mapped uncached and guarded.  I've never used eieio nor seen
>any reason it was necessary.

I think it is on desktop CPUs, but again, here, Paul and Anton have more
knowledge than I do.

>Where you will see problems, especially on 4xx and potentially on 8xx,
>is the use of "regular" memory for control structures and special registers
>for other control.  You can write to memory, which gets stuck in pipelines,
>then whack a DCR (which seems to have some magical fast path update) causing
>the peripheral to start up before the pipelined writes have made it to
>memory.  I'm wondering if we aren't just lucky with the eieio side effects
>when a 'sync' would be the logically correct operator.


* Re: eieio rule-of-thumb?
  2002-05-23 18:44           ` benh
@ 2002-05-23 18:02             ` Dan Malek
  2002-05-23 22:58               ` Paul Mackerras
  0 siblings, 1 reply; 14+ messages in thread
From: Dan Malek @ 2002-05-23 18:02 UTC (permalink / raw)
  To: benh; +Cc: acurtis, Paul Mackerras, linuxppc-dev

benh@kernel.crashing.org wrote:

> Regarding eieio on uncached, Paul or Anton can tell you more about it,
> I think there is still a case where guarded doesn't prevent a load
> from moving across a store.

Oh right....You get guarded to prevent out of order loads, and I just
get lucky that the uncached store doesn't cross the load.

> Regarding the bridge,...

OK.  I thought that was a PCI master, memory controller thing, not
something that would happen from a CPU programmed I/O.

> I think it is on desktop CPUs, but again, here, Paul and Anton have more
> knowledge than I do.

Right.  I know I found a couple of things we had to update for the 74xx
processors.

Thanks.

	-- Dan


* Re: eieio rule-of-thumb?
  2002-05-23 18:02             ` Dan Malek
@ 2002-05-23 22:58               ` Paul Mackerras
  0 siblings, 0 replies; 14+ messages in thread
From: Paul Mackerras @ 2002-05-23 22:58 UTC (permalink / raw)
  To: Dan Malek; +Cc: benh, acurtis, linuxppc-dev

Dan Malek writes:

> Oh right....You get guarded to prevent out of order loads, and I just
> get lucky that the uncached store doesn't cross the load.

Careful... "out of order" is a somewhat ambiguous term.  The obvious
meaning (at least to me) is "done in a different order from that
specified in the program", i.e. reordered.  But that isn't what the
term means in the CPU users' manuals and the PEM.  There it means
"done before you know that the program requires it to be done",
i.e. speculative.  The guarded bit prevents "out of order" loads and
stores in the second sense, that is, the processor can't perform a
load or a store to a guarded page until it has resolved all previous
branches and made sure that there are no pending exceptions from
previous instructions.

The original PowerPC architecture didn't put any constraints on which
order the loads and stores done by a program are seen by memory or an
I/O device.  However, none of the implementations so far reorder
stores to noncacheable, guarded space, and IBM has decided that future
64-bit PowerPCs will preserve this behaviour.  I don't recall what
Book E specifies in this area though.

It would be legal for two loads to get reordered, and I can imagine a
situation where this might happen:

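	lis	r9,variable@ha		# r9 = high half of the address of variable
	lwz	r3,variable@l(r9)	# first load: r3 = variable
	lwzx	r4,r3,r10		# second load: from r3 + r10, needs r3
	lwz	r5,0(r10)		# third load: from r10, independent of r3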

Imagine that r10 points to a noncacheable guarded page, and that the
first load misses in the cache (or is from a noncacheable page).  The
subsequent loads can't be performed until we know that the first load
isn't going to generate an exception, but that only takes a TLB
lookup.  The third load could well go ahead while the second load is
waiting for the value of the first load.
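
A C rendering of the same three loads, with a barrier added where ordering
of the last two matters, might look like the sketch below. The function name
read_in_order and its parameters are invented for illustration, and dev is
assumed to point at noncacheable guarded space with the offset a multiple of
four. GCC or Clang inline assembly is assumed, with a compiler-only fallback
on non-PowerPC hosts.

```c
#include <stdint.h>

#if defined(__powerpc__) || defined(__PPC__)
#define io_barrier() __asm__ __volatile__("eieio" : : : "memory")
#else
#define io_barrier() __asm__ __volatile__("" : : : "memory")
#endif

/* Hypothetical C analogue of the lwz / lwzx / lwz sequence. */
void read_in_order(const uint32_t *variable, const volatile uint8_t *dev,
                   uint32_t *r4, uint32_t *r5)
{
    uint32_t off = *variable;                           /* first load */

    *r4 = *(const volatile uint32_t *)(dev + off);      /* second load, needs off */
    io_barrier();                                       /* third load stays behind the second */
    *r5 = *(const volatile uint32_t *)dev;              /* third load */
}
```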

Paul.
