linuxppc-dev.lists.ozlabs.org archive mirror
* readl() and friends and eieio on PPC
@ 1999-08-09  8:17 Geert Uytterhoeven
  1999-08-09 17:19 ` David A. Gatwood
  1999-08-10  1:00 ` Paul Mackerras
  0 siblings, 2 replies; 41+ messages in thread
From: Geert Uytterhoeven @ 1999-08-09  8:17 UTC (permalink / raw)
  To: Linux/PPC Development; +Cc: Linux Frame Buffer Device Development



Jes Sørensen pointed out to me that readl() and friends should not use eieio on
PPC. On other architectures (e.g. AXP) this isn't done either.

Currently we have[*]:

#define readl(addr) in_le32((volatile unsigned *)(addr))
#define inl(port)               in_le32((unsigned *)((port)+_IO_BASE))
#define inl_p(port)             in_le32((unsigned *)((port)+_IO_BASE))

extern inline unsigned in_le32(volatile unsigned *addr){
        unsigned ret;

        __asm__ __volatile__("lwbrx %0,0,%1; eieio" : "=r" (ret) :
                             "r" (addr), "m" (*addr));
        return ret;
}

[*] Except on APUS, where readl() uses native endianness.

Hence both inl() and readl() protect against reordering. This is not necessary
for readl(). Drivers that need to protect against reordering should use
wmb()/rmb()/mb() themselves.
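
A minimal sketch of the split proposed here: an accessor that only does the
little-endian load, and a separate barrier that the driver calls where ordering
matters. The names my_readl_nobarrier, my_io_barrier and read_status_then_data
are made up for illustration, and the non-PPC branch uses a GCC/Clang fence
builtin as a stand-in so the snippet builds on other hosts.

```c
#include <stdint.h>

/* Hypothetical names throughout; not taken from the kernel tree. */

/* Ordering barrier: eieio on PPC, a generic fence elsewhere (assumption). */
#if defined(__powerpc__) || defined(__powerpc64__)
#define my_io_barrier() __asm__ __volatile__("eieio" : : : "memory")
#else
#define my_io_barrier() __atomic_thread_fence(__ATOMIC_SEQ_CST)
#endif

/* Little-endian 32-bit load with no barrier attached. */
static inline uint32_t my_readl_nobarrier(const volatile uint32_t *addr)
{
	uint32_t v = *addr;	/* one 32-bit access */

#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
	v = __builtin_bswap32(v);	/* PPC would use lwbrx instead */
#endif
	return v;
}

/* A driver that cares about ordering asks for it explicitly. */
static inline uint32_t read_status_then_data(const volatile uint32_t *status,
					     const volatile uint32_t *data,
					     uint32_t *data_out)
{
	uint32_t st = my_readl_nobarrier(status);

	my_io_barrier();	/* status load is ordered before the data load */
	*data_out = my_readl_nobarrier(data);
	return st;
}
```

Code that only shovels pixels around a frame buffer would call
my_readl_nobarrier() alone and never pay for the barrier.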

If readl() and friends don't do eieio, the fbcon-* routines won't be slowed
down by using readl() and friends (but we're still having the byte swapping
then).

And atyfb should use readl()/writel() instead of aty_{ld,st}_le32(), so we can
get rid of the inline assembler. Note that this will probably break on Atari,
since on m68k readl() doesn't do byte swapping. But that can be circumvented
with one #ifdef.

Greetings,

						Geert

--
Geert Uytterhoeven                     Geert.Uytterhoeven@cs.kuleuven.ac.be
Wavelets, Linux/{m68k~Amiga,PPC~CHRP}  http://www.cs.kuleuven.ac.be/~geert/
Department of Computer Science -- Katholieke Universiteit Leuven -- Belgium


[[ This message was sent via the linuxppc-dev mailing list.  Replies are ]]
[[ not  forced  back  to the list, so be sure to Cc linuxppc-dev if your ]]
[[ reply is of general interest. Please check http://lists.linuxppc.org/ ]]
[[ and http://www.linuxppc.org/ for useful information before posting.   ]]


* Re: readl() and friends and eieio on PPC
  1999-08-09  8:17 readl() and friends and eieio on PPC Geert Uytterhoeven
@ 1999-08-09 17:19 ` David A. Gatwood
  1999-08-10  1:00 ` Paul Mackerras
  1 sibling, 0 replies; 41+ messages in thread
From: David A. Gatwood @ 1999-08-09 17:19 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Linux/PPC Development, Linux Frame Buffer Device Development


On Mon, 9 Aug 1999, Geert Uytterhoeven wrote:

> Jes Sørensen pointed out to me that readl() and friends should not use
> eieio on PPC. On other architectures (e.g. AXP) this isn't done either.
> 
> Currently we have[*]:
> 
> #define readl(addr) in_le32((volatile unsigned *)(addr))
> #define inl(port)               in_le32((unsigned *)((port)+_IO_BASE))
> #define inl_p(port)             in_le32((unsigned *)((port)+_IO_BASE))
> 
> extern inline unsigned in_le32(volatile unsigned *addr){
>         unsigned ret;
> 
>         __asm__ __volatile__("lwbrx %0,0,%1; eieio" : "=r" (ret) :
>                              "r" (addr), "m" (*addr));
>         return ret;
> }
> 
> [*] Except on APUS, where readl() uses native endianness.
> 
> Hence both inl() and readl() protect against reordering. This is not necessary
> for readl(). Drivers that need to protect against reordering should use
> wmb()/rmb()/mb() themselves.

Further, eieio should never be used by itself as an assembly instruction
like this -- not in _any_ macro.  If you ever hope to support all of the
x100 PowerMacs, you'll have to have a macro just for eieio, as several
instructions are required before and after eieio, sync, and isync (or at
least two of them, and I forget which) to avoid a hardware buglet on certain
machines.


Later,
David



* Re: readl() and friends and eieio on PPC
  1999-08-09  8:17 readl() and friends and eieio on PPC Geert Uytterhoeven
  1999-08-09 17:19 ` David A. Gatwood
@ 1999-08-10  1:00 ` Paul Mackerras
  1999-08-10  7:18   ` [linux-fbdev] " Jes Sorensen
  1 sibling, 1 reply; 41+ messages in thread
From: Paul Mackerras @ 1999-08-10  1:00 UTC (permalink / raw)
  To: Geert.Uytterhoeven; +Cc: linuxppc-dev, linux-fbdev



Geert Uytterhoeven <Geert.Uytterhoeven@cs.kuleuven.ac.be> wrote:

> Jes Sørensen pointed out to me that readl() and friends should not use eieio on
> PPC. On other architectures (e.g. AXP) this isn't done either.

Readl/writel etc. are intended for "memory" space, but this could be
either memory-mapped device registers or plain ordinary memory.  The
intel folks don't make the distinction because ia32 doesn't allow
reordering of memory accesses AFAIK.

> Hence both inl() and readl() protect against reordering. This is not necessary
> for readl(). Drivers that need to protect against reordering should use
> wmb()/rmb()/mb() themselves.

Linus made the point in a recent post to linux-kernel that people
shouldn't necessarily expect inb/outb/readb/writeb etc. to be usable
on every kind of bus - it's quite reasonable to define other access
methods on other cpus or buses.

> If readl() and friends don't do eieio, the fbcon-* routines won't be slowed
> down by using readl() and friends (but we're still having the byte swapping
> then).

Do you have any numbers to show how much the eieios slow you down?

If you take out the eieios, you will break other drivers, starting
with the OHCI USB host driver.  Can we think of another way around the
problem?  You could use le32_to_cpup for loading from the frame
buffer, but there isn't currently an equivalent for stores,
unfortunately (one could be invented, though).
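
As a rough sketch of that idea: a barrier-free little-endian load that does
the same job as le32_to_cpup, plus the store counterpart that doesn't exist
yet. The names fb_load_le32 and fb_store_le32 are invented here, and the
endianness test relies on the GCC/Clang __BYTE_ORDER__ macro (a host without
it is assumed to be little-endian).

```c
#include <stdint.h>

/* Swap on big-endian hosts, do nothing on little-endian ones. */
static inline uint32_t swap_if_big_endian(uint32_t v)
{
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
	return __builtin_bswap32(v);
#else
	return v;
#endif
}

/* Load a little-endian word from frame buffer memory, no eieio. */
static inline uint32_t fb_load_le32(const uint32_t *p)
{
	return swap_if_big_endian(*p);
}

/* Hypothetical store counterpart: CPU order in, little-endian out. */
static inline void fb_store_le32(uint32_t v, uint32_t *p)
{
	*p = swap_if_big_endian(v);
}
```

Neither helper orders anything, so they would only be suitable for plain
frame buffer memory, not for device registers.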

> And atyfb should use readl()/writel() instead of aty_{ld,st}_le32(), so we can
> get rid of the inline assembler. Note that this will probably break on Atari,

I thought the point of the aty_ld/st* routines was to avoid one add
instruction each time by using the PPC indexed addressing mode.
Anyway, IMO the aty_ld/st* routines *should* include the eieio.  That
would mean you wouldn't need the explicit eieio() calls scattered
through the rest of the driver.  I guess it's just luck that it works
where you do a sequence of aty_st_le32's to set up some drawing
command and then call wait_for_fifo (or wait_for_idle) which does an
aty_ld_le32.  Or doesn't it matter if the load gets done before all of
the stores have completed?
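
For illustration, register accessors with the eieio built in might look
roughly like this. It is a sketch, not the atyfb code: my_st_le32, my_ld_le32
and my_eieio are made-up names, the byte-offset-from-base interface is an
assumption, and on non-PPC hosts a GCC/Clang fence builtin stands in for eieio.

```c
#include <stdint.h>

#if defined(__powerpc__) || defined(__powerpc64__)
#define my_eieio() __asm__ __volatile__("eieio" : : : "memory")
#else
#define my_eieio() __atomic_thread_fence(__ATOMIC_SEQ_CST)	/* stand-in */
#endif

/* Convert between CPU order and little-endian (same operation both ways). */
static inline uint32_t my_le32_swap(uint32_t v)
{
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
	return __builtin_bswap32(v);
#else
	return v;
#endif
}

/* Store to the register at byte offset regoff, then order it. */
static inline void my_st_le32(volatile uint8_t *base, unsigned int regoff,
			      uint32_t val)
{
	*(volatile uint32_t *)(base + regoff) = my_le32_swap(val);
	my_eieio();	/* later accesses cannot pass this store */
}

/* Load from the register at byte offset regoff, then order it. */
static inline uint32_t my_ld_le32(volatile uint8_t *base, unsigned int regoff)
{
	uint32_t v = my_le32_swap(*(volatile uint32_t *)(base + regoff));

	my_eieio();
	return v;
}
```

With accessors like these, a run of stores followed by a status load is
ordered without any explicit eieio() calls in the caller.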

Paul.


* Re: [linux-fbdev] Re: readl() and friends and eieio on PPC
  1999-08-10  1:00 ` Paul Mackerras
@ 1999-08-10  7:18   ` Jes Sorensen
  1999-08-11  0:23     ` Paul Mackerras
  0 siblings, 1 reply; 41+ messages in thread
From: Jes Sorensen @ 1999-08-10  7:18 UTC (permalink / raw)
  To: Paul.Mackerras; +Cc: Geert.Uytterhoeven, linuxppc-dev, linux-fbdev


>>>>> "Paul" == Paul Mackerras <paulus@cs.anu.edu.au> writes:

Paul> If you take out the eieios, you will break other drivers,
Paul> starting with the OHCI USB host driver.  Can we think of another
Paul> way around the problem?  You could use le32_to_cpup for loading
Paul> from the frame buffer, but there isn't currently an equivalent
Paul> for stores, unfortunately (one could be invented, though).

This is quite easily solved by putting in mb()'s in the right
places. This is how it is done for other drivers that are supposed to
work on the Alpha.

Paul> I thought the point of the aty_ld/st* routines was to avoid one
Paul> add instruction each time by using the PPC indexed addressing
Paul> mode.  Anyway, IMO the aty_ld/st* routines *should* include the
Paul> eieio.  That would mean you wouldn't need the explicit eieio()
Paul> calls scattered through the rest of the driver.  I guess it's
Paul> just luck that it works where you do a sequence of aty_st_le32's
Paul> to set up some drawing command and then call wait_for_fifo (or
Paul> wait_for_idle) which does an aty_ld_le32.  Or doesn't it matter
Paul> if the load gets done before all of the stores have completed?

Having mb()'s explicitly put into the driver in the right places also
makes sure that a driver will work on other architectures. Right now a
driver that is written for the PPC is likely not to work on the Alpha
if the author expects readl/writel to guarantee write ordering.

Jes


* Re: [linux-fbdev] Re: readl() and friends and eieio on PPC
  1999-08-10  7:18   ` [linux-fbdev] " Jes Sorensen
@ 1999-08-11  0:23     ` Paul Mackerras
  1999-08-11  7:23       ` Jes Sorensen
  0 siblings, 1 reply; 41+ messages in thread
From: Paul Mackerras @ 1999-08-11  0:23 UTC (permalink / raw)
  To: Jes.Sorensen; +Cc: Geert.Uytterhoeven, linuxppc-dev, linux-fbdev


Jes Sorensen <Jes.Sorensen@cern.ch> wrote:

> This is quite easily solved by putting in mb()'s in the right
> places. This is how it is done for other drivers that are supposed to
> work on the Alpha.

No, this is not an acceptable solution.

On ultrasparc at least, there is a "side-effect" bit in each PTE.  If
that bit is set, it tells the cpu not to reorder accesses to that
page.  I don't know whether alpha has the same facility, do you?

Anyway, it's hard enough educating device driver writers about the
need for byte-swapping on data in memory that is accessed by DMA.
Trying to get people to scatter mb()'s around their drivers would be a
herculean task (a bit like cleaning out the Augean stables, actually
:-).

Finally, mb() is actually a much stronger constraint than we need in a
device driver, and will slow things down unnecessarily.  mb() implies
a strong ordering on all loads and stores to all memory.  On the PPC,
mb() translates into the sync instruction, which is much slower than
eieio.  For a sync, the cpu actually has to stop and wait for all bus
activity to complete, whereas for an eieio, it just puts a special
kind of entry in the stream of accesses going out to the memory bus.
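
The distinction can be sketched as two macros. This is only an illustration
under assumptions: sketch_mb, sketch_wmb, publish and store_then_load are
made-up names, the PPC branch uses the sync and eieio instructions named above,
and the non-PPC branch falls back to GCC/Clang fence builtins so the snippet
builds elsewhere.

```c
#include <stdint.h>

#if defined(__powerpc__) || defined(__powerpc64__)
/* Full barrier: the cpu waits for outstanding accesses to complete. */
#define sketch_mb()  __asm__ __volatile__("sync"  : : : "memory")
/* Lighter barrier: an ordering marker in the stream of accesses. */
#define sketch_wmb() __asm__ __volatile__("eieio" : : : "memory")
#else
/* Stand-ins for other hosts (assumption, not the kernel definitions). */
#define sketch_mb()  __atomic_thread_fence(__ATOMIC_SEQ_CST)
#define sketch_wmb() __atomic_thread_fence(__ATOMIC_RELEASE)
#endif

/* Two stores that must reach the device in order: wmb is enough. */
static inline void publish(volatile uint32_t *payload,
			   volatile uint32_t *ready, uint32_t value)
{
	*payload = value;
	sketch_wmb();
	*ready = 1;
}

/* A store followed by a load, ordered with the heavier full barrier. */
static inline uint32_t store_then_load(volatile uint32_t *out,
				       const volatile uint32_t *in,
				       uint32_t value)
{
	*out = value;
	sketch_mb();
	return *in;
}
```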

> Having mb()'s explicitly put into the driver in the right places also
> makes sure that a driver will work on other architectures. Right now a
> driver that is written for the PPC is likely not to work on the Alpha
> if the author expects readl/writel to guarantee write ordering.

Well, if alpha is actually like that, then IMO it is broken.

I did some experiments this morning to test whether having eieio in
readl/writel is actually going to slow you down.  The bottom line is
that the eieio introduces *no* measurable reduction in performance.  I
used the little program that I have appended below (mtest.c and
mtm.S).

I ran it on my 7600 like this:

mtest 94000000 b420 e1480 200 400 2304 100
mtestn 94000000 b420 e1480 200 400 2304 100

This was with the screen at 1152x870, 16bpp.  mtestn is just a symlink
to mtest.  The results for 10 runs were:

   with eieio:	       mean 2.825s, s.d. 0.007s
   without eieio:      mean 2.824s, s.d. 0.027s

I also tried it on my iMac (81000000 a000 b8350 200 400 2048 100) and
got 4.76s both with and without eieio.

So, unless and until you can show me some numbers that show an actual
performance degradation from having the eieio in readl/writel, the
eieio stays.

Paul.

mtest.c:

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>

extern void move_eieio(int *src, int *dst, int nx, int ny, int pitch);
extern void move_no_eieio(int *src, int *dst, int nx, int ny, int pitch);

int main(int ac, char **av)
{
	int fd;
	unsigned long base, sof, dof;
	int nx, ny, pitch;
	char *ptr;
	int nrpt;
	int use_eieio;

	if (ac < 7) {
		fprintf(stderr, "Usage: %s base sof dof nx ny pitch [nrpt]\n",
			av[0]);
		exit(1);
	}
	/* base, sof and dof are hex; nx, ny, pitch and nrpt are decimal */
	base = strtoul(av[1], 0, 16);
	sof = strtoul(av[2], 0, 16);
	dof = strtoul(av[3], 0, 16);
	nx = atoi(av[4]);
	ny = atoi(av[5]);
	pitch = atoi(av[6]);
	nrpt = (ac > 7)? atoi(av[7]): 1;
	if ((fd = open("/dev/mem", O_RDWR)) < 0) {
		perror("/dev/mem");
		exit(1);
	}
	/* an 'n' in the program name (e.g. mtestn) selects the no-eieio loop */
	use_eieio = strchr(av[0], 'n') == 0;
	printf("%seieio\n", use_eieio? "": "no ");
	ptr = mmap(0, 0x200000, PROT_READ|PROT_WRITE, MAP_SHARED, fd, base);
	if (ptr == MAP_FAILED) {
		perror("mmap");
		exit(1);
	}
	if (use_eieio) {
		do {
			move_eieio((int *)(ptr + sof), (int *)(ptr + dof),
				   nx, ny, pitch);
			dof += 4;
		} while (--nrpt > 0);
	} else {
		do {
			move_no_eieio((int *)(ptr + sof), (int *)(ptr + dof),
				      nx, ny, pitch);
			dof += 4;
		} while (--nrpt > 0);
	}
	return 0;
}

mtm.S:

/* move_eieio(int *src, int *dst, int nx, int ny, int pitch) */
	.globl	move_eieio
move_eieio:
	mtctr	5
	li	8,0
2:	lwbrx	0,3,8
	eieio
	stwbrx	0,4,8
	eieio
	addi	8,8,4
	bdnz	2b
	addic.	6,6,-1
	blelr
	add	3,3,7
	add	4,4,7
	b	move_eieio

/* move_no_eieio(int *src, int *dst, int nx, int ny, int pitch) */
	.globl	move_no_eieio
move_no_eieio:
	mtctr	5
	li	8,0
2:	lwbrx	0,3,8
	stwbrx	0,4,8
	addi	8,8,4
	bdnz	2b
	addic.	6,6,-1
	blelr
	add	3,3,7
	add	4,4,7
	b	move_no_eieio


* Re: [linux-fbdev] Re: readl() and friends and eieio on PPC
  1999-08-11  0:23     ` Paul Mackerras
@ 1999-08-11  7:23       ` Jes Sorensen
  1999-08-11  7:38         ` Richard Henderson
  1999-08-11 23:52         ` Paul Mackerras
  0 siblings, 2 replies; 41+ messages in thread
From: Jes Sorensen @ 1999-08-11  7:23 UTC (permalink / raw)
  To: Paul.Mackerras; +Cc: Geert.Uytterhoeven, linuxppc-dev, linux-fbdev, rth


>>>>> "Paul" == Paul Mackerras <paulus@cs.anu.edu.au> writes:

Paul> Jes Sorensen <Jes.Sorensen@cern.ch> wrote:
>> This is quite easily solved by putting in mb()'s in the right
>> places. This is how it is done for other drivers that are supposed
>> to work on the Alpha.

Paul> No, this is not an acceptable solution.

Paul> On ultrasparc at least, there is a "side-effect" bit in each
Paul> PTE.  If that bit is set, it tells the cpu not to reorder
Paul> accesses to that page.  I don't know whether alpha has the same
Paul> facility, do you?

No idea but I bet Richard Henderson can answer that question. I also
checked with him after posting this message yesterday and the answer
was readl/writel are not supposed to guarantee strict ordering.

Paul> Anyway, it's hard enough educating device driver writers about
Paul> the need for byte-swapping on data in memory that is accessed by
Paul> DMA.  Trying to get people to scatter mb()'s around their
Paul> drivers would be a herculean task (a bit like cleaning out the
Paul> Augean stables, actually :-).

There are quite a few issues device driver authors need to deal with,
this is just one of them. I actually made quite an effort to explain
the problem in my tutorial at Linux Expo. Besides, people still have to
deal with it when writing drivers for devices that are not mapped in
PCI space but directly mapped. Having readl/writel guarantee ordering
is inconsistent.

Paul> Finally, mb() is actually a much stronger constraint than we
Paul> need in a device driver, and will slow things down
Paul> unnecessarily.  mb() implies a strong ordering on all loads and
Paul> stores to all memory.  On the PPC, mb() translates into the sync
Paul> instruction, which is much slower than eieio.  For a sync, the
Paul> cpu actually has to stop and wait for all bus activity to
Paul> complete, whereas for an eieio, it just puts a special kind of
Paul> entry in the stream of accesses going out to the memory bus.

I don't know enough about the PPC architecture to comment on this,
however I can see that wmb() translates into an eieio. wmb() is more
fine grained and it would make sense to promote it over plain mb() in
the places where it makes sense.

>> Having mb()'s explicitly put into the driver in the right places
>> also makes sure that a driver will work on other
>> architectures. Right now a driver that is written for the PPC is
>> likely not to work on the Alpha if the author expects readl/writel
>> to guarantee write ordering.

Paul> Well, if alpha is actually like that, then IMO it is broken.

I will have to disagree with you on this one, I consider the PPC
implementation to be very broken in this regard.

Paul> So, unless and until you can show me some numbers that show an
Paul> actual performance degradation from having the eieio in
Paul> readl/writel, the eieio stays.

So will the education of people telling them to use mb() after
writel() if they want to be sure of the result.

Jes


* Re: [linux-fbdev] Re: readl() and friends and eieio on PPC
  1999-08-11  7:23       ` Jes Sorensen
@ 1999-08-11  7:38         ` Richard Henderson
  1999-08-12  0:13           ` Paul Mackerras
  1999-08-12  0:17           ` Paul Mackerras
  1999-08-11 23:52         ` Paul Mackerras
  1 sibling, 2 replies; 41+ messages in thread
From: Richard Henderson @ 1999-08-11  7:38 UTC (permalink / raw)
  To: Jes Sorensen
  Cc: Paul.Mackerras, Geert.Uytterhoeven, linuxppc-dev, linux-fbdev,
	Richard Henderson


On Wed, Aug 11, 1999 at 09:23:29AM +0200, Jes Sorensen wrote:
> Paul> On ultrasparc at least, there is a "side-effect" bit in each
> Paul> PTE.  If that bit is set, it tells the cpu not to reorder
> Paul> accesses to that page.  I don't know whether alpha has the same
> Paul> facility, do you?

No, it doesn't.

> I don't know enough about the PPC architecture to comment on this,
> however I can see that wmb() translates into an eieio. wmb() is more
> fine grained and it would make sense to promote it over plain mb() in
> the places where it makes sense.

Definitely.  Alpha's mb and wmb are very similar to ppc's sync and eieio.

> Paul> Well, if alpha is actually like that, then IMO it is broken.

IMO it is Most Correct.

Memory barriers on alpha are a fact of life.  It's not just I/O that
requires it, though that is where it shows up most often with drivers.

There are a great many cards that do memory mapped i/o that don't care
about the ordering and write combining of the data setup, only that the
data setup all be done before receiving the "go code".  In these drivers,
we need only one wmb insn, not one between each and every writel.

This benefit is marked enough that there is zero chance you can convince
me to add wmb() to writel().  The driver writer is the only one that
knows whether this barrier is necessary.
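
A small sketch of that pattern, under stated assumptions: struct fake_card and
its register names are invented, sketch_writel is a barrier-free store (byte
swapping left out for brevity), and wmb_count exists only to show that a single
barrier is issued. On non-PPC hosts a GCC/Clang fence builtin stands in for
the hardware instruction.

```c
#include <stdint.h>

/* Hypothetical memory-mapped card: three setup registers and a go code. */
struct fake_card {
	volatile uint32_t src;
	volatile uint32_t dst;
	volatile uint32_t len;
	volatile uint32_t go;
};

static unsigned int wmb_count;	/* instrumentation for this sketch only */

#if defined(__powerpc__) || defined(__powerpc64__)
#define hw_wmb() __asm__ __volatile__("eieio" : : : "memory")
#else
#define hw_wmb() __atomic_thread_fence(__ATOMIC_RELEASE)	/* stand-in */
#endif
#define sketch_wmb() do { hw_wmb(); wmb_count++; } while (0)

/* Store with no barrier attached. */
static inline void sketch_writel(uint32_t v, volatile uint32_t *reg)
{
	*reg = v;
}

static void start_transfer(struct fake_card *c, uint32_t src, uint32_t dst,
			   uint32_t len)
{
	/* The setup writes may be reordered or combined among themselves. */
	sketch_writel(src, &c->src);
	sketch_writel(dst, &c->dst);
	sketch_writel(len, &c->len);

	sketch_wmb();	/* the only barrier: all setup before the go code */
	sketch_writel(1, &c->go);
}
```

If writel() carried its own barrier, the same sequence would issue four of
them instead of one.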



r~


* Re: [linux-fbdev] Re: readl() and friends and eieio on PPC
  1999-08-11  7:23       ` Jes Sorensen
  1999-08-11  7:38         ` Richard Henderson
@ 1999-08-11 23:52         ` Paul Mackerras
  1999-08-12  7:38           ` Jes Sorensen
  1999-08-12 19:00           ` David A. Gatwood
  1 sibling, 2 replies; 41+ messages in thread
From: Paul Mackerras @ 1999-08-11 23:52 UTC (permalink / raw)
  To: Jes.Sorensen; +Cc: Geert.Uytterhoeven, linuxppc-dev, linux-fbdev, rth


Jes Sorensen <Jes.Sorensen@cern.ch> wrote:

> I will have to disagree with you on this one, I consider the PPC
> implementation to be very broken in this regard.

"Very broken" - because drivers work and there is no measurable
performance impact?? !!! ??

The only possible argument for *not* having the eieio in readl/writel
is that it hurts performance (actually and measurably, not just
potentially).

Paul.



* Re: [linux-fbdev] Re: readl() and friends and eieio on PPC
  1999-08-11  7:38         ` Richard Henderson
@ 1999-08-12  0:13           ` Paul Mackerras
  1999-08-12  1:39             ` Peter Chang
  1999-08-12  0:17           ` Paul Mackerras
  1 sibling, 1 reply; 41+ messages in thread
From: Paul Mackerras @ 1999-08-12  0:13 UTC (permalink / raw)
  To: rth; +Cc: Jes.Sorensen, Geert.Uytterhoeven, linuxppc-dev, linux-fbdev, rth


Richard Henderson <rth@cygnus.com> wrote:

> There are a great many cards that do memory mapped i/o that don't care
> about the ordering and write combining of the data setup, only that the
> data setup all be done before receiving the "go code".  In these drivers,
> we need only one wmb insn, not one between each and every writel.
> 
> This benefit is marked enough that there is zero chance you can convince

I'm curious to see the numbers.  What sort of driver do you see this
much of an effect in?

For most things, the CPU spends an absolutely insignificant fraction
of its time doing accesses to I/O device registers.  The only
exceptions I can think of would be 3D graphics cards (and possibly
also gigabit ethernet cards, although they should be doing most stuff
by DMA).

Paul.


* Re: [linux-fbdev] Re: readl() and friends and eieio on PPC
  1999-08-11  7:38         ` Richard Henderson
  1999-08-12  0:13           ` Paul Mackerras
@ 1999-08-12  0:17           ` Paul Mackerras
  1999-08-12  4:40             ` Richard Henderson
  1 sibling, 1 reply; 41+ messages in thread
From: Paul Mackerras @ 1999-08-12  0:17 UTC (permalink / raw)
  To: rth; +Cc: Jes.Sorensen, Geert.Uytterhoeven, linuxppc-dev, linux-fbdev, rth


Richard Henderson <rth@cygnus.com> wrote:

> Definitely.  Alpha's mb and wmb are very similar to ppc's sync and eieio.

Sync and eieio are different in that for sync, the cpu actually stops
and waits for all memory accesses to complete, whereas for eieio the
cpu doesn't have to stop and wait for anything.  Do alpha's mb and wmb
work the same way?

My position is that if you can provide the ordering at essentially
zero cost, then it is an advantage to have it since more drivers will
work that way.

Paul.


* Re: [linux-fbdev] Re: readl() and friends and eieio on PPC
  1999-08-12  0:13           ` Paul Mackerras
@ 1999-08-12  1:39             ` Peter Chang
  1999-08-12  4:52               ` Paul Mackerras
  0 siblings, 1 reply; 41+ messages in thread
From: Peter Chang @ 1999-08-12  1:39 UTC (permalink / raw)
  To: linuxppc-dev, rth


I may have sent this too early during editing. Sorry if you've seen 
an incomplete one already.

At 10:13 +1000 08.12.1999, Paul Mackerras wrote:
>Richard Henderson <rth@cygnus.com> wrote:
>
> > There are a great many cards that do memory mapped i/o that don't care
> > about the ordering and write combining of the data setup, only that the
> > data setup all be done before receiving the "go code".  In these drivers,
> > we need only one wmb insn, not one between each and every writel.
> >
> > This benefit is marked enough that there is zero chance you can convince
>
>I'm curious to see the numbers.  What sort of driver do you see this
>much of an effect in?

When I did glide for the mac it definitely helped not to do an eieio
after every pci write. The current generations of 3dfx hw use a sw
managed fifo, and an eieio was only necessary when the sw layer
needed to do things to insert a 'barrier' in the fifo for later
accounting.

>The only exceptions I can think of would be 3D graphics cards (and possibly
>also gigabit ethernet cards, although they should be doing most stuff
>by DMA).

Hmmm.... well the fifo in the 3dfx case lives on the board so there 
is a tradeoff of doing a lot of bus io and trying to make the 
rasterization responsive. Also the hw did not do dma, so this was 
sort of beside the point. :-)

\p

---
Underneath this flabby exterior is an enormous lack of character.
                     -- Oscar Levant


* Re: [linux-fbdev] Re: readl() and friends and eieio on PPC
  1999-08-12  0:17           ` Paul Mackerras
@ 1999-08-12  4:40             ` Richard Henderson
  1999-08-12  5:00               ` Paul Mackerras
  1999-08-12  5:16               ` David Edelsohn
  0 siblings, 2 replies; 41+ messages in thread
From: Richard Henderson @ 1999-08-12  4:40 UTC (permalink / raw)
  To: Paul.Mackerras
  Cc: Jes.Sorensen, Geert.Uytterhoeven, linuxppc-dev, linux-fbdev


On Thu, Aug 12, 1999 at 10:17:34AM +1000, Paul Mackerras wrote:
> Sync and eieio are different in that for sync, the cpu actually stops
> and waits for all memory accesses to complete, whereas for eieio the
> cpu doesn't have to stop and wait for anything.  Do alpha's mb and wmb
> work the same way?

Yes.  (Except for EV4, in which wmb == mb, but we don't care about that.)

> My position is that if you can provide the ordering at essentially
> zero cost, then it is an advantage to have it since more drivers will
> work that way.

But it isn't zero cost.  It's not high cost, but that's not the same thing.


r~


* Re: [linux-fbdev] Re: readl() and friends and eieio on PPC
  1999-08-12  1:39             ` Peter Chang
@ 1999-08-12  4:52               ` Paul Mackerras
  1999-08-12  6:17                 ` Peter Chang
  0 siblings, 1 reply; 41+ messages in thread
From: Paul Mackerras @ 1999-08-12  4:52 UTC (permalink / raw)
  To: weasel; +Cc: linuxppc-dev, rth


Peter Chang <weasel@cs.stanford.edu> wrote:

> When I did glide for the mac it definitely helped not to do an eieio
> after every pci write. The current generations of 3dfx hw use a sw
> managed fifo, and an eieio was only necessary when the sw layer
> needed to do things to insert a 'barrier' in the fifo for later
> accounting.

Interesting.  What was the magnitude of the effect?  Are we talking
about 1%, 10%, or 100% faster?

This would have been from user level, right?  Driving a 3D card
through a kernel device driver would seem to be a bit painful.

Would it have been possible to use double-precision floating loads and
stores to transfer 8 bytes at a time?  That can double the available
bandwidth to PCI devices under some conditions.

Thinking about it, it seems to me that if your device needs to be fed
so fast that the eieio makes a difference, you *should* be feeding it
from user level rather than the kernel anyway, so then the behaviour
of readl/writel is irrelevant.

Paul.


* Re: [linux-fbdev] Re: readl() and friends and eieio on PPC
  1999-08-12  4:40             ` Richard Henderson
@ 1999-08-12  5:00               ` Paul Mackerras
  1999-08-12  5:43                 ` Richard Henderson
  1999-08-12  5:16               ` David Edelsohn
  1 sibling, 1 reply; 41+ messages in thread
From: Paul Mackerras @ 1999-08-12  5:00 UTC (permalink / raw)
  To: rth; +Cc: Jes.Sorensen, Geert.Uytterhoeven, linuxppc-dev, linux-fbdev


Richard Henderson <rth@cygnus.com> wrote:

> But it isn't zero cost.  It's not high cost, but that's not the same
> thing.

Show us the numbers?

I'm starting to sound like Larry McVoy, I know. :-)

The measurements I did showed no measurable difference in performance
for copying stuff around a framebuffer on PPC (Richard, I guess you
may not have seen that post).  As far as PPC is concerned, I am
unwilling to break drivers for the sake of an infinitesimal
performance gain.  I don't believe the frame-buffer guys will actually
see any measurable improvement in performance from taking out the
eieio from readl/writel on PPC.

Of course, it may be different on alpha, and I would be very
interested to know how big the effect is there.

Paul.


* Re: [linux-fbdev] Re: readl() and friends and eieio on PPC
  1999-08-12  4:40             ` Richard Henderson
  1999-08-12  5:00               ` Paul Mackerras
@ 1999-08-12  5:16               ` David Edelsohn
  1999-08-12  5:27                 ` Paul Mackerras
                                   ` (2 more replies)
  1 sibling, 3 replies; 41+ messages in thread
From: David Edelsohn @ 1999-08-12  5:16 UTC (permalink / raw)
  To: Richard Henderson
  Cc: Paul.Mackerras, Jes.Sorensen, Geert.Uytterhoeven, linuxppc-dev,
	linux-fbdev


>>>>> Richard Henderson writes:

>> My position is that if you can provide the ordering at essentially
>> zero cost, then it is an advantage to have it since more drivers will
>> work that way.

Richard> But it isn't zero cost.  It's not high cost, but that's not the same thing.

	Is your assumption that you want to provide the infrastructure to
write high-performance device drivers or to write device drivers that
don't require as much expertise and knowledge to produce correct results?
There are conflicting goals in this design providing different benefits to
Linux. 

David


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [linux-fbdev] Re: readl() and friends and eieio on PPC
  1999-08-12  5:16               ` David Edelsohn
@ 1999-08-12  5:27                 ` Paul Mackerras
  1999-08-12  5:52                 ` Richard Henderson
  1999-08-12  7:32                 ` Jes Sorensen
  2 siblings, 0 replies; 41+ messages in thread
From: Paul Mackerras @ 1999-08-12  5:27 UTC (permalink / raw)
  To: dje; +Cc: rth, Jes.Sorensen, Geert.Uytterhoeven, linuxppc-dev, linux-fbdev


David Edelsohn <dje@watson.ibm.com> wrote:

> 	Is your assumption that you want to provide the infrastructure to
> write high-performance device drivers or to write device drivers that
> don't require as much expertise and knowledge to produce correct results?

Interesting question.

I guess I would be trying both to make it easy to write device drivers
that work, and possible to write very high-performance device drivers.
Particularly since the vast majority of drivers in Linux have been
written for the i386 platform, which doesn't do pesky (;-) things like
reordering reads and writes.

In any case, as far as the question of using readl/writel in
framebuffer code goes, and whether readl/writel should include the
eieio, the measurements I did showed zero performance impact of having
the eieio, in frame-buffer copy code at least.

Paul.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [linux-fbdev] Re: readl() and friends and eieio on PPC
  1999-08-12  5:00               ` Paul Mackerras
@ 1999-08-12  5:43                 ` Richard Henderson
  1999-08-12  7:07                   ` Paul Mackerras
  0 siblings, 1 reply; 41+ messages in thread
From: Richard Henderson @ 1999-08-12  5:43 UTC (permalink / raw)
  To: Paul.Mackerras
  Cc: Jes.Sorensen, Geert.Uytterhoeven, linuxppc-dev, linux-fbdev

[-- Attachment #1: Type: text/plain, Size: 883 bytes --]

On Thu, Aug 12, 1999 at 03:00:46PM +1000, Paul Mackerras wrote:
> Show us the numbers?

Attached is a quick userland study wrt main memory.  How much
more accurate to a real device do I need to get to convince you
that the test is valid enough?

As I see it, testing against main memory should be the lower
bound of the numbers, since it's the quickest to respond.  A
real device will take longer to respond, so any enforced delays
(or failures to write-combine) will only exaggerate the difference.

Anyway, the results (in cycles) from my 533MHz sx164 are:

10
10
10
10
10
223
94
94
94
94

So the cost of 8 store+wmb pairs, versus 8 stores with one wmb,
is over 9:1 (94 cycles against 10).

> I don't believe the frame-buffer guys will actually
> see any measurable improvement in performance from taking out the
> eieio from readl/writel on PPC.

For grins, will you try the same test on your ppc?


r~

[-- Attachment #2: z.c --]
[-- Type: text/plain, Size: 652 bytes --]

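#include <stdio.h>

/* Alpha only.  Times 8 stores with a single wmb, then 8 store+wmb
   pairs, using the rpcc cycle counter.  Each asm line is its own
   string literal, since modern GCC rejects multi-line literals.  */
int main(void)
{
  int i;
  unsigned s, e;
  unsigned long mem;

  /* 8 stores, one wmb before the last store.  */
  for (i = 0; i < 5; ++i)
    {
      asm volatile("rpcc %0\n\t"
	   "stq $31,%2\n\t"
	   "stq $31,%2\n\t"
	   "stq $31,%2\n\t"
	   "stq $31,%2\n\t"
	   "stq $31,%2\n\t"
	   "stq $31,%2\n\t"
	   "stq $31,%2\n\t"
	   "wmb\n\t"
	   "stq $31,%2\n\t"
	   "rpcc %1"
	: "=&r"(s), "=r"(e), "=m"(mem));
      printf("%u\n", e-s);
    }

  /* 8 stores, each followed by a wmb.  */
  for (i = 0; i < 5; ++i)
    {
      asm volatile("rpcc %0\n\t"
	   "stq $31,%2\n\t"
	   "wmb\n\t"
	   "stq $31,%2\n\t"
	   "wmb\n\t"
	   "stq $31,%2\n\t"
	   "wmb\n\t"
	   "stq $31,%2\n\t"
	   "wmb\n\t"
	   "stq $31,%2\n\t"
	   "wmb\n\t"
	   "stq $31,%2\n\t"
	   "wmb\n\t"
	   "stq $31,%2\n\t"
	   "wmb\n\t"
	   "stq $31,%2\n\t"
	   "wmb\n\t"
	   "rpcc %1"
	: "=&r"(s), "=r"(e), "=m"(mem));
      printf("%u\n", e-s);
    }
  return 0;
}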

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [linux-fbdev] Re: readl() and friends and eieio on PPC
  1999-08-12  5:16               ` David Edelsohn
  1999-08-12  5:27                 ` Paul Mackerras
@ 1999-08-12  5:52                 ` Richard Henderson
  1999-08-12  7:11                   ` Paul Mackerras
  1999-08-12  7:32                 ` Jes Sorensen
  2 siblings, 1 reply; 41+ messages in thread
From: Richard Henderson @ 1999-08-12  5:52 UTC (permalink / raw)
  To: David Edelsohn
  Cc: Paul.Mackerras, Jes.Sorensen, Geert.Uytterhoeven, linuxppc-dev,
	linux-fbdev, Richard Henderson


On Thu, Aug 12, 1999 at 01:16:14AM -0400, David Edelsohn wrote:
> 	Is your assumption that you want to provide the infrastructure to
> write high-performance device drivers or to write device drivers that
> don't require as much expertise and knowledge to produce correct results?

I prefer high-performance drivers. 

There are enough other things (virt_to_bus, ioremap, et al) that are
non-optional that driver writers must learn about for non-peecee
driver programming that proper use of mb/wmb doesn't seem that big a
deal to me.
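
As a rough illustration of what explicit barrier use looks like in a driver, here is a sketch that fills a command FIFO with plain stores and issues a single barrier before ringing a doorbell. Everything in it is hypothetical: the fake_dev layout, the function names, and the use of a C11 release fence as a userland stand-in for wmb() (a fence like this is not sufficient for real MMIO).

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical device layout, for illustration only. */
struct fake_dev {
    volatile uint32_t fifo[8];   /* command slots */
    volatile uint32_t doorbell;  /* number of valid slots */
};

/* Userland stand-in for the kernel's wmb(); an assumption, not the
 * kernel implementation. */
static inline void sketch_wmb(void)
{
    atomic_thread_fence(memory_order_release);
}

/* Queue up to 8 commands with plain stores, then issue ONE barrier
 * so the FIFO contents are ordered before the doorbell write. */
static void submit_batched(struct fake_dev *dev,
                           const uint32_t *cmds, size_t n)
{
    size_t i;

    if (n > 8)
        n = 8;
    for (i = 0; i < n; i++)
        dev->fifo[i] = cmds[i];   /* no barrier per store */
    sketch_wmb();                 /* single ordering point */
    dev->doorbell = (uint32_t)n;
}
```

The point of the shape is that the ordering requirement sits between the batch and the doorbell, not between the individual FIFO stores, so one barrier per submission is enough in this sketch.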

I guess I personally can afford to be somewhat idealistic in this,
because I only use about 4 drivers -- ncr, aic7xxx, tulip, epic100 --
and the authors of all these drivers have clue.  But the thought of
coddling folks that can't be bothered to do things Right gives
me hives.



r~

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [linux-fbdev] Re: readl() and friends and eieio on PPC
  1999-08-12  4:52               ` Paul Mackerras
@ 1999-08-12  6:17                 ` Peter Chang
  0 siblings, 0 replies; 41+ messages in thread
From: Peter Chang @ 1999-08-12  6:17 UTC (permalink / raw)
  To: Paul.Mackerras; +Cc: linuxppc-dev, rth


At 14:52 +1000 08.12.1999, Paul Mackerras wrote:
>Peter Chang <weasel@cs.stanford.edu> wrote:
>
> > When I did glide for the mac it definitely helped not to do an eieio
> > after every pci write. The current generations of 3dfx hw use a sw
> > managed fifo, and an eieio was only necessary when the sw layer
> > needed to do things to insert a 'barrier' in the fifo for later
> > accounting.
>
>Interesting.  What was the magnitude of the effect?  Are we talking
>about 1%, 10%, or 100% faster?

It depended on the actual benchmark. A synthetic benchmark for a 
specific thing (flat triangles, gouraud triangles, texture download, 
etc) showed the biggest hits (~1% - 20% if memory serves). Actual 
game tests varied depending on their actual scene complexity, but had 
a similar effect.

>This would have been from user level, right?

Glide is always user level. (Well, on win32 there is a little driver 
level thing that does the mapping etc).

>Would it have been possible to use double-precision floating loads and
>stores to transfer 8 bytes at a time?  That can double the available
>bandwidth to PCI devices under some conditions.

I did not do this, but I know that Ken (the actual mac guy at 3dfx) 
did this and got some impressive improvements. I did this for texture 
downloads and stuff for 3DNow! machines (amd k6 and k7), and got 
really impressive results. (More so on the k6 because of the lack of 
write combining).

>Thinking about it, it seems to me that if your device needs to be fed
>so fast that the eieio makes a difference, you *should* be feeding it
>from user level rather than the kernel anyway, so then the behaviour
>of readl/writel is irrelevant.

That's true since the level switch will probably kill you anyway. I 
was just piping in w/ my $0.02 (a little off topic, you're right) 
that the eieio is not w/o its costs.

\p

---
Underneath this flabby exterior is an enormous lack of character.
                     -- Oscar Levant

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [linux-fbdev] Re: readl() and friends and eieio on PPC
  1999-08-12  5:43                 ` Richard Henderson
@ 1999-08-12  7:07                   ` Paul Mackerras
  1999-08-12  7:33                     ` Richard Henderson
                                       ` (2 more replies)
  0 siblings, 3 replies; 41+ messages in thread
From: Paul Mackerras @ 1999-08-12  7:07 UTC (permalink / raw)
  To: rth; +Cc: Jes.Sorensen, Geert.Uytterhoeven, linuxppc-dev, linux-fbdev


Richard Henderson <rth@cygnus.com> wrote:

> As I see it, testing against main memory should be the lower
> bound of the numbers, since it's the quickest to respond.  A
> real device will take longer to respond, so any enforced delays
> (or failures to write-combine) will only exaggerate the difference.

Hmmm, no, doesn't it go the other way around?

Going to L1 cache will mean that we can isolate the overhead of the
wmb, and will exaggerate the ratio between the two cases.

A real device that takes longer to respond will make the overhead of
the wmb a smaller fraction of the total time.  And you would hope that
the cpu could overlap the wmb, or at least the time to decode and
issue it, with the time waiting for the device to respond.

> Anyway, the results (in cycles) from my 533MHz sx164 are:
> 
> 10

One-cycle access to L1 cache, I guess?

> 10
> 10
> 10
> 10
> 223

Because of i-cache misses, presumably

> 94
> 94
> 94
> 94
> 
> So the cost of wmb for 8 store+wmb, versus 8 stores with one wmb,
> is over 9:1.

Interesting.  Sounds like each wmb takes about 12 cycles ((94-10)/7),
which sounds a bit like it is going all the way out to the memory bus
and back before the cpu does the next instruction.

(Ob. nitpicking: if a wmb takes 12 cycles, how come we can do a wmb
and 8 stores in 10 cycles? :-)

> For grins, will you try the same test on your ppc?

Sure, happy to.

I think I have correctly understood the alpha assembly syntax.  My PPC
version is below.  I've added a couple of things.  First, PPC has a
`timebase' register which counts at 1/4 of the bus clock, which means
once every 16 cycles on my G3 desktop at work.  For this reason I have
put a loop around the sets of stores to do them 16 times.  The
overhead of the loop should be zero (the branch is pretty easily
predictable :-).  The numbers should thus be cycles per iteration.

Secondly, I added stuff to mmap a framebuffer and do the stores to a
word in it, just for grins.

The results tended to vary quite a lot from run to run, but here's a
typical set:

17 10 9 9 9
24 17 16 16 16
732 731 736 786 727
666 755 840 774 801

So the eieio doesn't look to be nearly as expensive on PPC as wmb is
on alpha.  (16 - 9) / 7 = 1 cycle for the eieio, which is going to be
insignificant in the context of an access to a device register, which
can easily take ~ 50 to 100 cycles.

The average of the 3rd line is 742, and of the 4th line is 767.  But
given the spread of the numbers, I don't think that the difference is
statistically significant.  This is going to the framebuffer on an ATI
Rage chip.  760 cycles per iteration of 8 stores is 95 cpu cycles per
access, or about 350ns.  I guess ATI chips expect you to use the
drawing engine if you are doing any significant amount of stuff. :-)
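
The per-barrier figures in this exchange all come from the same subtraction: the time for 8 stores with 8 barriers, minus the time for 8 stores with 1 barrier, divided by the 7 extra barriers. A minimal C sketch of that arithmetic follows; the function names are made up for illustration, and the inputs are the numbers quoted in the thread.

```c
/* Extra cycles attributable to each additional barrier. */
static double per_barrier_cost(double with_all_barriers,
                               double with_one_barrier,
                               int extra_barriers)
{
    return (with_all_barriers - with_one_barrier) / extra_barriers;
}

/* Arithmetic mean of n samples. */
static double mean(const unsigned *v, int n)
{
    unsigned long sum = 0;
    int i;

    for (i = 0; i < n; i++)
        sum += v[i];
    return (double)sum / n;
}

/* Framebuffer rows from the G3 run quoted above. */
static const unsigned fb_one_eieio[5]  = { 732, 731, 736, 786, 727 };
static const unsigned fb_each_eieio[5] = { 666, 755, 840, 774, 801 };
```

With the figures above, per_barrier_cost(16, 9, 7) gives 1 cycle per eieio on the G3, per_barrier_cost(94, 10, 7) gives 12 cycles per wmb on the sx164, and the two framebuffer rows average 742.4 and 767.2.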

What numbers do you get on alpha if you point it at a framebuffer,
just for interest?

Paul.

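#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <sys/mman.h>

/* Time 16 passes over 8 stores to *ptr: first with one eieio per pass,
   then with an eieio after every store.  Prints timebase ticks.  */
void test(unsigned long *ptr)
{
  int i;
  unsigned s, e;

  for (i = 0; i < 5; ++i)
    {
      /* %0 is written before %3 is read, hence the early-clobber;
	 mtctr/bdnz modify the count register, hence the "ctr" clobber.  */
      asm volatile("mftb %0\n\t"
	   "mtctr %3\n"
	"1: stw 16,%2\n\t"
	   "stw 16,%2\n\t"
	   "stw 16,%2\n\t"
	   "stw 16,%2\n\t"
	   "stw 16,%2\n\t"
	   "stw 16,%2\n\t"
	   "stw 16,%2\n\t"
	   "eieio\n\t"
	   "stw 16,%2\n\t"
	   "bdnz 1b\n\t"
	   "mftb %1"
	: "=&r"(s), "=r"(e), "=m"(*ptr)
	: "r"(16)
	: "ctr");
      printf("%u ", e-s);
    }
  printf("\n");

  for (i = 0; i < 5; ++i)
    {
      asm volatile("mftb %0\n\t"
	   "mtctr %3\n"
	"1: stw 16,%2\n\t"
	   "eieio\n\t"
	   "stw 16,%2\n\t"
	   "eieio\n\t"
	   "stw 16,%2\n\t"
	   "eieio\n\t"
	   "stw 16,%2\n\t"
	   "eieio\n\t"
	   "stw 16,%2\n\t"
	   "eieio\n\t"
	   "stw 16,%2\n\t"
	   "eieio\n\t"
	   "stw 16,%2\n\t"
	   "eieio\n\t"
	   "stw 16,%2\n\t"
	   "eieio\n\t"
	   "bdnz 1b\n\t"
	   "mftb %1"
	: "=&r"(s), "=r"(e), "=m"(*ptr)
	: "r"(16)
	: "ctr");
      printf("%u ", e-s);
    }
  printf("\n");
}

#define PAGESIZE	0x1000

/* Usage: a.out [hex physical address]; always tests main memory first.  */
int main(int ac, char **av)
{
	unsigned long base, offset;
	int fd;
	unsigned long mem;
	unsigned long *ptr;

	test(&mem);
	if (ac > 1) {
		base = strtoul(av[1], 0, 16);
		offset = (base & (PAGESIZE - 1)) / sizeof(unsigned long);
		base &= -PAGESIZE;
		if ((fd = open("/dev/mem", O_RDWR)) < 0) {
			perror("/dev/mem");
			exit(1);
		}
		ptr = (unsigned long *)
			mmap(0, PAGESIZE, PROT_READ|PROT_WRITE, MAP_SHARED, fd, base);
		if (ptr == MAP_FAILED) {
			perror("mmap");
			exit(1);
		}
		test(ptr + offset);
	}
	return 0;
}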

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [linux-fbdev] Re: readl() and friends and eieio on PPC
  1999-08-12  5:52                 ` Richard Henderson
@ 1999-08-12  7:11                   ` Paul Mackerras
  0 siblings, 0 replies; 41+ messages in thread
From: Paul Mackerras @ 1999-08-12  7:11 UTC (permalink / raw)
  To: rth; +Cc: dje, Jes.Sorensen, Geert.Uytterhoeven, linuxppc-dev, linux-fbdev,
	rth


Richard Henderson <rth@cygnus.com> wrote:

> I prefer high-performance drivers. 

Sure, so do I.  But when I can get safety as well, for the cost of one
extra cpu cycle per device access, which can probably be overlapped
with the device access anyway, I think it's a good deal.

On alpha, does wmb() stop a subsequent load from being moved ahead of
a previous store?  Or do you have to use mb() to get that effect?

Paul.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [linux-fbdev] Re: readl() and friends and eieio on PPC
  1999-08-12  5:16               ` David Edelsohn
  1999-08-12  5:27                 ` Paul Mackerras
  1999-08-12  5:52                 ` Richard Henderson
@ 1999-08-12  7:32                 ` Jes Sorensen
  2 siblings, 0 replies; 41+ messages in thread
From: Jes Sorensen @ 1999-08-12  7:32 UTC (permalink / raw)
  To: David Edelsohn
  Cc: Richard Henderson, Paul.Mackerras, linuxppc-dev, linux-fbdev


>>>>> "David" == David Edelsohn <dje@watson.ibm.com> writes:

>>>>> Richard Henderson writes:
Richard> But it isn't zero cost.  It's not high cost, but that's not
Richard> the same thing.

David> 	Is your assumption that you want to provide the infrastructure
David> to write high-performance device drivers or to write device
David> drivers that don't require as much expertise and knowledge to
David> produce correct results?  There are conflicting goals in this
David> design providing different benefits to Linux.

I am certainly up for high performance device drivers. Even with
writel doing the syncing, there are enough other pitfalls for people to
take into account. Some of these are much harder to understand than
dealing with write ordering and as such, trying to make things
invisible is only giving us a false guarantee. I.e., if we want to use
spin locks in the kernel we need to teach people how to use them
correctly.

Jes

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [linux-fbdev] Re: readl() and friends and eieio on PPC
  1999-08-12  7:07                   ` Paul Mackerras
@ 1999-08-12  7:33                     ` Richard Henderson
  1999-08-12  9:58                       ` Paul Mackerras
  1999-08-12 12:31                     ` Geert Uytterhoeven
  1999-08-13 18:33                     ` Richard Henderson
  2 siblings, 1 reply; 41+ messages in thread
From: Richard Henderson @ 1999-08-12  7:33 UTC (permalink / raw)
  To: Paul.Mackerras
  Cc: Jes.Sorensen, Geert.Uytterhoeven, linuxppc-dev, linux-fbdev,
	Richard Henderson


On Thu, Aug 12, 1999 at 05:07:02PM +1000, Paul Mackerras wrote:
> > 10
> 
> One-cycle access to L1 cache, I guess?

No, 2 Cycles to L1 cache.  One cycle to execute the store,
which merely adds an entry to the store buffer.

> > 223
> 
> Because of i-cache misses, presumably

Presumably.  The 10 and 94 numbers are all that's interesting.

> Interesting.  Sounds like each wmb takes about 12 cycles ((94-10)/7),
> which sounds a bit like it is going all the way out to the memory bus
> and back before the cpu does the next instruction.
> 
> (Ob. nitpicking: if a wmb takes 12 cycles, how come we can do a wmb
> and 8 stores in 10 cycles? :-)

Because it doesn't work like that.  wmb adds a magic token to the
store buffer that prevents write combining and other such hw
optimizations.  Timing

	stq $31,addr
	stq $31,addr+8
vs
	stq $31,addr
	wmb
	stq $31,addr+8

shows only 1 cycle difference between the two.  I'm not quite sure
how the 12 works out.  I do know that L2 cache is 12 cycles away,
but that may just be coincidence.
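
To make the "magic token" description concrete, here is a deliberately simplified toy model, not a description of any real Alpha store buffer. It assumes back-to-back stores to the same address merge into one pending entry, and that a wmb token closes that entry so later stores cannot merge into it. All names are invented for the sketch.

```c
#include <stddef.h>

enum op_kind { OP_STORE, OP_WMB };

struct op {
    enum op_kind kind;
    unsigned long addr;   /* ignored for OP_WMB */
};

/* Count the writes that leave the toy store buffer. */
static unsigned bus_writes(const struct op *ops, size_t n)
{
    unsigned writes = 0;
    int pending = 0;
    unsigned long pending_addr = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        if (ops[i].kind == OP_WMB) {
            /* The token closes the pending entry: nothing merges past it. */
            if (pending) {
                writes++;
                pending = 0;
            }
        } else if (!pending || ops[i].addr != pending_addr) {
            if (pending)
                writes++;
            pending = 1;
            pending_addr = ops[i].addr;
        }
        /* Otherwise the store merges into the pending entry. */
    }
    return writes + (pending ? 1u : 0u);
}

/* 7 stores, one wmb, 1 store: shaped like the first loop of z.c. */
static unsigned writes_one_barrier(void)
{
    struct op ops[9];
    size_t i;

    for (i = 0; i < 9; i++) {
        ops[i].kind = OP_STORE;
        ops[i].addr = 0x1000;
    }
    ops[7].kind = OP_WMB;
    return bus_writes(ops, 9);
}

/* 8 store+wmb pairs: shaped like the second loop of z.c. */
static unsigned writes_barrier_each(void)
{
    struct op ops[16];
    size_t i;

    for (i = 0; i < 16; i++) {
        ops[i].kind = (i % 2 == 0) ? OP_STORE : OP_WMB;
        ops[i].addr = 0x1000;
    }
    return bus_writes(ops, 16);
}
```

In this model the first sequence drains as 2 writes and the second as 8, which is one way a barrier that is cheap to issue can still be expensive in aggregate. The real merging rules and latencies are not given in the thread, so the model says nothing about where the 12 cycles come from.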

Going all the way out to the memory bus would take a whole lot 
longer than 12 cycles.  More like 36.

> What numbers do you get on alpha if you point it at a framebuffer,
> just for interest?

I'll give that a try tomorrow.


r~

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [linux-fbdev] Re: readl() and friends and eieio on PPC
  1999-08-11 23:52         ` Paul Mackerras
@ 1999-08-12  7:38           ` Jes Sorensen
  1999-08-12 19:00           ` David A. Gatwood
  1 sibling, 0 replies; 41+ messages in thread
From: Jes Sorensen @ 1999-08-12  7:38 UTC (permalink / raw)
  To: Paul.Mackerras; +Cc: Geert.Uytterhoeven, linuxppc-dev, linux-fbdev, rth


>>>>> "Paul" == Paul Mackerras <paulus@cs.anu.edu.au> writes:

Paul> Jes Sorensen <Jes.Sorensen@cern.ch> wrote:
>> I will have to disagree with you on this one, I consider the PPC
>> implementation to be very broken in this regard.

Paul> "Very broken" - because drivers work and there is no measurable
Paul> performance impact?? !!! ??

Paul> The only possible argument for *not* having the eieio in
Paul> readl/writel is that it hurts performance (actually and
Paul> measurably, not just potentially).

Ok strong wording maybe.

I am just quite displeased when people try to hide the real world from
programmers because most code was written for the x86 by people
without a clue. In the long term I think that sort of approach is
going to bite us since code will not get fixed where it should.

Jes

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [linux-fbdev] Re: readl() and friends and eieio on PPC
  1999-08-12  7:33                     ` Richard Henderson
@ 1999-08-12  9:58                       ` Paul Mackerras
  0 siblings, 0 replies; 41+ messages in thread
From: Paul Mackerras @ 1999-08-12  9:58 UTC (permalink / raw)
  To: rth; +Cc: Jes.Sorensen, Geert.Uytterhoeven, linuxppc-dev, linux-fbdev, rth


Richard Henderson <rth@cygnus.com> wrote:

> No, 2 Cycles to L1 cache.  One cycle to execute the store,
> which merely adds an entry to the store buffer.

Yes, of course, silly me.  Same on PPC.

> > (Ob. nitpicking: if a wmb takes 12 cycles, how come we can do a wmb
> > and 8 stores in 10 cycles? :-)
> 
> Because it doesn't work like that.  wmb adds a magic token to the
> store buffer that prevents write combining and other such hw
> optimizations.  Timing

Then why is there such a big performance impact from the wmb's?

Paul.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [linux-fbdev] Re: readl() and friends and eieio on PPC
  1999-08-12  7:07                   ` Paul Mackerras
  1999-08-12  7:33                     ` Richard Henderson
@ 1999-08-12 12:31                     ` Geert Uytterhoeven
  1999-08-13 12:18                       ` Paul Mackerras
  1999-08-18 11:02                       ` Gabriel Paubert
  1999-08-13 18:33                     ` Richard Henderson
  2 siblings, 2 replies; 41+ messages in thread
From: Geert Uytterhoeven @ 1999-08-12 12:31 UTC (permalink / raw)
  To: Paul.Mackerras; +Cc: rth, Jes.Sorensen, linuxppc-dev, linux-fbdev


On Thu, 12 Aug 1999, Paul Mackerras wrote:
> Richard Henderson <rth@cygnus.com> wrote:
> The results tended to vary quite a lot from run to run, but here's a
> typical set:
> 
> 17 10 9 9 9
> 24 17 16 16 16
> 732 731 736 786 727
> 666 755 840 774 801
> 
> So the eieio doesn't look to be nearly as expensive on PPC as wmb is
> on alpha.  (16 - 9) / 7 = 1 cycle for the eieio, which is going to be

I'm seeing different things (results don't tend to vary a lot):

| [14:27:01]/tmp# ./a.out 0xc2800000
| 35 29 30 31 28 
| 261 251 247 248 248 
| 429 332 358 374 348 
| 541 532 529 531 529 
| [14:27:05]/tmp# 

Hence eieio() is quite expensive on memory.

This is on an IBM LongTrail (CHRP), with 604e at 200 MHz, 512 KB L2 cache,
66 MHz SDRAM bus, and 33 MHz PCI to an ATI RAGE II+.

> insignificant in the context of an access to a device register, which
> can easily take ~ 50 to 100 cycles.

For ISA (through PCI/ISA bridge). Isn't real PCI faster?

Greetings,

						Geert

--
Geert Uytterhoeven                     Geert.Uytterhoeven@cs.kuleuven.ac.be
Wavelets, Linux/{m68k~Amiga,PPC~CHRP}  http://www.cs.kuleuven.ac.be/~geert/
Department of Computer Science -- Katholieke Universiteit Leuven -- Belgium


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [linux-fbdev] Re: readl() and friends and eieio on PPC
  1999-08-11 23:52         ` Paul Mackerras
  1999-08-12  7:38           ` Jes Sorensen
@ 1999-08-12 19:00           ` David A. Gatwood
  1999-08-13  1:51             ` Paul Mackerras
  1 sibling, 1 reply; 41+ messages in thread
From: David A. Gatwood @ 1999-08-12 19:00 UTC (permalink / raw)
  To: Paul.Mackerras
  Cc: Jes.Sorensen, Geert.Uytterhoeven, linuxppc-dev, linux-fbdev, rth


On Thu, 12 Aug 1999, Paul Mackerras wrote:

> Jes Sorensen <Jes.Sorensen@cern.ch> wrote:
> 
> > I will have to disagree with you on this one, I consider the PPC
> > implementation to be very broken in this regard.
> 
> "Very broken" - because drivers work and there is no measurable
> performance impact?? !!! ??
> 
> The only possible argument for *not* having the eieio in readl/writel
> is that it hurts performance (actually and measurably, not just
> potentially).

No, that's not the only argument.  eieio and... isync, I think... cause
the PPC 601 to shift one of its registers a few bits and send out an
address-only transaction using the address that results from that.  I
can't remember which register off the top of my head.  MkLinux ran into
this in the late Pre-DR3 stage and it nearly cost us a large percentage of
x100 support due to a hardware bug that can cause the machine to hang if an
address-only transaction is done into certain parts of the address space.

The workaround is a really nasty bunch of code that creates a sizable
performance hit by forcing that register to be cleared before the eieio
and restored afterwards.  As a result, putting eieio in those macros will
cause a _very_ major performance hit if you ever start supporting x100
PowerMacs.  It will also require lots of really nasty #ifdef structures in
the readl and writel code that can be avoided just by making a macro
eieio() and using it only where needed.  That approach will also greatly
decrease the headaches x100 folks have in their efforts to find all the
eieios and figure out why their machines crash randomly.  :-)


David


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [linux-fbdev] Re: readl() and friends and eieio on PPC
  1999-08-12 19:00           ` David A. Gatwood
@ 1999-08-13  1:51             ` Paul Mackerras
  0 siblings, 0 replies; 41+ messages in thread
From: Paul Mackerras @ 1999-08-13  1:51 UTC (permalink / raw)
  To: dgatwood; +Cc: Jes.Sorensen, Geert.Uytterhoeven, linuxppc-dev, linux-fbdev, rth


David A. Gatwood <dgatwood@mvista.com> wrote:

> No, that's not the only argument.  eieio and... isync, I think... cause
> the PPC 601 to shift one of its registers a few bits and send out an
> address-only transaction using the address that results from that.  I

I can understand eieio causing an address-only transaction, but the
address should just be ignored.

> can't remember which register off the top of my head.  MkLinux ran into
> this in the late Pre-DR3 stage and it nearly cost us a large percentage of
> x100 support due to a hardware bug that can cause the machine to hang if an
> address-only transaction is done into certain parts of the address space.

Hmmm, I didn't see any such problems with the 7200 and 7500 powermacs,
which have a 601 cpu.  It's a hardware bug in the x100's memory
controller or nubus bridge, right?  I guess it's lucky you actually
have some control over the address that gets put out. :-)

> The workaround is a really nasty bunch of code that creates a sizable
> performance hit by forcing that register to be cleared before the eieio
> and restored afterwards.  As a result, putting eieio in those macros will
> cause a _very_ major performance hit if you ever start supporting x100

Surely it should only take a couple of cycles to move a register to
another and clear it?  I agree it's a pain though.  Actually, with gcc
the asm statement that uses the eieio could just specify the register
(which one is it?) as an input and give it the value 0.

I think a resolution of this issue is going to have to involve Linus
and the whole Linux community.  We may need two forms of the bus
access macros, one with the eieio's and one without.  I think the
`ordinary' form should have the eieio's, though.

Paul.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [linux-fbdev] Re: readl() and friends and eieio on PPC
  1999-08-12 12:31                     ` Geert Uytterhoeven
@ 1999-08-13 12:18                       ` Paul Mackerras
  1999-08-18 11:02                       ` Gabriel Paubert
  1 sibling, 0 replies; 41+ messages in thread
From: Paul Mackerras @ 1999-08-13 12:18 UTC (permalink / raw)
  To: Geert.Uytterhoeven; +Cc: rth, Jes.Sorensen, linuxppc-dev, linux-fbdev


Geert Uytterhoeven <Geert.Uytterhoeven@cs.kuleuven.ac.be> wrote:

> I'm seeing different things (results don't tend to vary a lot):
> 
> | [14:27:01]/tmp# ./a.out 0xc2800000
> | 35 29 30 31 28 
> | 261 251 247 248 248 
> | 429 332 358 374 348 
> | 541 532 529 531 529 
> | [14:27:05]/tmp# 
> 
> Hence eieio() is quite expensive on memory.
> 
> This is on an IBM LongTrail (CHRP), with 604e at 200 MHz, 512 KB L2 cache,
> 66 MHz SDRAM bus, and 33 MHz PCI to an ATI RAGE II+.

I tried it on my longtrail, with a 300MHz 604 machV.  I changed the
loop count to 18 since that is the ratio of cpu clock to timebase
clock on this machine.  (You should probably use 12 on your machine.)
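
The loop counts follow from the timebase running at 1/4 of the bus clock, so one timebase tick spans cpu_clock / (bus_clock / 4) cpu cycles. A small sketch of that calculation, assuming a nominal 66 MHz bus on both LongTrail machines (only the 200 MHz one is stated to have it) and using an invented function name:

```c
/* CPU cycles per timebase tick, assuming the timebase counts at
 * bus_hz / 4.  Rounded to the nearest whole cycle. */
static unsigned cycles_per_tb_tick(unsigned long cpu_hz,
                                   unsigned long bus_hz)
{
    unsigned long tb_hz = bus_hz / 4;

    return (unsigned)((cpu_hz + tb_hz / 2) / tb_hz);
}
```

With those assumptions, cycles_per_tb_tick(300000000UL, 66000000UL) is 18 and cycles_per_tb_tick(200000000UL, 66000000UL) is 12, matching the loop counts suggested above. Using that value as the loop count makes each printed number read as cycles per iteration.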

I got results much like yours:

23 23 20 20 21  av=21.4
180 175 175 175 175  av=176.0
288 358 275 359 309  av=317.8
375 400 351 423 351  av=380.0

So yes, in this case adding the eieios costs about 22 cycles each when
going to main memory, or 9 cycles each when going to the framebuffer.
I guess that when going to the framebuffer, much of the latency of the
eieio gets hidden.

It would be interesting to try a mix of loads and stores to the
framebuffer, perhaps 4 loads followed by 4 stores to get the effect of
a bitblt routine.  I tried my framebuffer-copy test on my 7600, which
has 200MHz 604e cpus, and I didn't see any difference in overall time
for the test, whether there were eieio's in or not.

This morning I read something in the PPC750 manual which implied that
the G3 doesn't reorder stores, and doesn't reorder non-cacheable
accesses.  That would mean eieio could be a no-op, which could help
explain why it only takes 1 cycle on a G3. :-)

(Not reordering non-cacheable accesses actually makes a lot of sense
to me.)

I think that probably the best thing is to have safe and fast variants
of readl/writel etc.  For the sake of not having to change a whole
heap of drivers (whose maintainers use x86 cpus :-() I would urge that
readl/writel include the eieio, and that we have readl_fast,
writel_fast etc. which don't include the eieio.
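
A minimal sketch of what such a pair of variants might look like follows. It is only an illustration under stated assumptions: the sketch_ names are invented, GCC-style inline asm is assumed, the little-endian byte swap that the real PPC readl() performs is left out, and on non-PPC targets the barrier degrades to a compiler-only barrier.

```c
#include <stdint.h>

/* Ordering barrier: eieio on PPC, compiler barrier elsewhere
 * (the fallback is an assumption made so the sketch builds anywhere). */
#if defined(__powerpc__) || defined(__PPC__)
#define sketch_io_barrier() __asm__ __volatile__("eieio" : : : "memory")
#else
#define sketch_io_barrier() __asm__ __volatile__("" : : : "memory")
#endif

/* "Fast" variants: plain access, caller supplies any ordering needed. */
static inline uint32_t sketch_readl_fast(const volatile uint32_t *addr)
{
    return *addr;
}

static inline void sketch_writel_fast(uint32_t val, volatile uint32_t *addr)
{
    *addr = val;
}

/* "Safe" variants: same access followed by the barrier. */
static inline uint32_t sketch_readl(const volatile uint32_t *addr)
{
    uint32_t val = *addr;

    sketch_io_barrier();
    return val;
}

static inline void sketch_writel(uint32_t val, volatile uint32_t *addr)
{
    *addr = val;
    sketch_io_barrier();
}
```

A driver copying a block to a framebuffer could then use sketch_writel_fast() in the inner loop and a single sketch_io_barrier() at the end, while register-poking code written with x86 assumptions keeps the ordered form by default.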

I would still be interested to see overall timings for frame-buffer
operations with and without the eieios.

Paul.


* Re: [linux-fbdev] Re: readl() and friends and eieio on PPC
  1999-08-12  7:07                   ` Paul Mackerras
  1999-08-12  7:33                     ` Richard Henderson
  1999-08-12 12:31                     ` Geert Uytterhoeven
@ 1999-08-13 18:33                     ` Richard Henderson
  2 siblings, 0 replies; 41+ messages in thread
From: Richard Henderson @ 1999-08-13 18:33 UTC (permalink / raw)
  To: Paul.Mackerras
  Cc: Jes.Sorensen, Geert.Uytterhoeven, linuxppc-dev, linux-fbdev,
	Richard Henderson


On Thu, Aug 12, 1999 at 05:07:02PM +1000, Paul Mackerras wrote:
> What numbers do you get on alpha if you point it at a framebuffer,
> just for interest?

With some additional numbers for mb vs wmb --

Memory:
none       15    11    11    11    11
1 wmb      10    10    10    10    10
1 mb      140   129    62    59    59
8 wmb     171   157   101   101   101
8 mb      346   270   270   267   267

Millennium II fb:
none     2599    11    11    11    11
1 wmb      10    10    10    10    10
1 mb      220   130   139   139   139
8 wmb     192   178   192   192   192
8 mb      538   423   423   423   423



r~


* Re: [linux-fbdev] Re: readl() and friends and eieio on PPC
       [not found] ` <d3so5mdyta.fsf@lxp03.cern.ch>
@ 1999-08-14 18:34   ` Geert Uytterhoeven
  1999-08-14 18:36   ` David A. Gatwood
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 41+ messages in thread
From: Geert Uytterhoeven @ 1999-08-14 18:34 UTC (permalink / raw)
  To: Jes Sorensen
  Cc: David A. Gatwood, Paul.Mackerras, linuxppc-dev, linux-fbdev, rth


On 14 Aug 1999, Jes Sorensen wrote:
> >>>>> "David" == David A Gatwood <dgatwood@mvista.com> writes:
> David> On Fri, 13 Aug 1999, Paul Mackerras wrote:
> >> Surely it should only take a couple of cycles to move a register to
> >> another and clear it?  I agree it's a pain though.  Actually, with
> >> gcc the asm statement that uses the eieio could just specify the
> >> register (which one is it?) as an input and give it the value 0.
> 
> David> For some reason, just loading the value isn't enough.  The code
> David> that Gilbert put in as a workaround shortly before DR3 looks
> David> like this:
> 
> David> #define eieio() __asm__ volatile("li 0,0; cmpwi 0,0; bne+ 0f;
> David> eieio; 0:" : : : "0")
> 
> Defining a C function with the name of a PPC specific assembler
> function is pretty stupid. To the best of my knowledge wmb() is the

Well, David was talking about replacing the eieio() macro (which Linux/PPC has
had for ages) by something that works around bugs in the hardware.

> generic name for the thing you are looking for.

Time to start grepping in include/asm-alpha/system.h. Huh, Alpha AXP has no
`rmb' mnemonic ;-)

Greetings,

						Geert

--
Geert Uytterhoeven                     Geert.Uytterhoeven@cs.kuleuven.ac.be
Wavelets, Linux/{m68k~Amiga,PPC~CHRP}  http://www.cs.kuleuven.ac.be/~geert/
Department of Computer Science -- Katholieke Universiteit Leuven -- Belgium




* Re: [linux-fbdev] Re: readl() and friends and eieio on PPC
       [not found] ` <d3so5mdyta.fsf@lxp03.cern.ch>
  1999-08-14 18:34   ` Geert Uytterhoeven
@ 1999-08-14 18:36   ` David A. Gatwood
  1999-08-14 19:48     ` Jes Sorensen
  1999-08-14 21:39   ` Richard Henderson
  1999-08-15 23:16   ` Paul Mackerras
  3 siblings, 1 reply; 41+ messages in thread
From: David A. Gatwood @ 1999-08-14 18:36 UTC (permalink / raw)
  To: Jes Sorensen
  Cc: Paul.Mackerras, Geert.Uytterhoeven, linuxppc-dev, linux-fbdev,
	rth


On 14 Aug 1999, Jes Sorensen wrote:

> David> #define eieio() __asm__ volatile("li 0,0; cmpwi 0,0; bne+ 0f;
> David> eieio; 0:" : : : "0")
> 
> Defining a C function with the name of a PPC specific assembler
> function is pretty stupid. To the best of my knowledge wmb() is the
> generic name for the thing you are looking for.

Keep in mind, I'm talking about MkLinux, _not_ LinuxPPC.  wmb() is a
linux-specific term, as far as I know.  The above is in mach.


David



* Re: [linux-fbdev] Re: readl() and friends and eieio on PPC
  1999-08-14 18:36   ` David A. Gatwood
@ 1999-08-14 19:48     ` Jes Sorensen
  1999-08-15  1:28       ` David A. Gatwood
  0 siblings, 1 reply; 41+ messages in thread
From: Jes Sorensen @ 1999-08-14 19:48 UTC (permalink / raw)
  To: David A. Gatwood
  Cc: Paul.Mackerras, Geert.Uytterhoeven, linuxppc-dev, linux-fbdev,
	rth


>>>>> "David" == David A Gatwood <dgatwood@mvista.com> writes:

David> On 14 Aug 1999, Jes Sorensen wrote:
David> #define eieio() __asm__ volatile("li 0,0; cmpwi 0,0; bne+ 0f;
David> eieio; 0:" : : : "0")
>>  Defining a C function with the name of a PPC specific assembler
>> function is pretty stupid. To the best of my knowledge wmb() is the
>> generic name for the thing you are looking for.

David> Keep in mind, I'm talking about MkLinux, _not_ LinuxPPC.  wmb()
David> is a linux-specific term, as far as I know.  The above is in
David> mach.

Urgh

Ok we were discussing the normal kernel here.

Jes


* Re: [linux-fbdev] Re: readl() and friends and eieio on PPC
       [not found] ` <d3so5mdyta.fsf@lxp03.cern.ch>
  1999-08-14 18:34   ` Geert Uytterhoeven
  1999-08-14 18:36   ` David A. Gatwood
@ 1999-08-14 21:39   ` Richard Henderson
  1999-08-15 23:16   ` Paul Mackerras
  3 siblings, 0 replies; 41+ messages in thread
From: Richard Henderson @ 1999-08-14 21:39 UTC (permalink / raw)
  To: Jes Sorensen
  Cc: David A. Gatwood, Paul.Mackerras, Geert.Uytterhoeven,
	linuxppc-dev, linux-fbdev


On Sat, Aug 14, 1999 at 08:03:45PM +0200, Jes Sorensen wrote:
> Defining a C function with the name of a PPC specific assembler
> function is pretty stupid. To the best of my knowledge wmb() is the
> generic name for the thing you are looking for.

Keep in mind that wmb is the name of an Alpha specific assembler insn.
Blame Linus.  ;-)


r~


* Re: [linux-fbdev] Re: readl() and friends and eieio on PPC
  1999-08-14 19:48     ` Jes Sorensen
@ 1999-08-15  1:28       ` David A. Gatwood
  0 siblings, 0 replies; 41+ messages in thread
From: David A. Gatwood @ 1999-08-15  1:28 UTC (permalink / raw)
  To: Jes Sorensen
  Cc: Paul.Mackerras, Geert.Uytterhoeven, linuxppc-dev, linux-fbdev,
	rth


On 14 Aug 1999, Jes Sorensen wrote:

> >>>>> "David" == David A Gatwood <dgatwood@mvista.com> writes:
> 
> David> On 14 Aug 1999, Jes Sorensen wrote:
> David> #define eieio() __asm__ volatile("li 0,0; cmpwi 0,0; bne+ 0f;
> David> eieio; 0:" : : : "0")
> >>  Defining a C function with the name of a PPC specific assembler
> >> function is pretty stupid. To the best of my knowledge wmb() is the
> >> generic name for the thing you are looking for.
> 
> David> Keep in mind, I'm talking about MkLinux, _not_ LinuxPPC.  wmb()
> David> is a linux-specific term, as far as I know.  The above is in
> David> mach.
> 
> Urgh
> 
> Ok we were discussing the normal kernel here.

Breakdown in communication again.  What I'm talking about is what
MkLinux's Mach Kernel did to get support for certain machines.  That's why
it was eieio().  It's the _equivalent_ of what LinuxPPC would have to do
to support the same machines, though the details of the names and stuff
would be different.  :-)


David



* Re: [linux-fbdev] Re: readl() and friends and eieio on PPC
       [not found] <m3672hkxri.fsf@soma.andreas.org>
@ 1999-08-15 13:39 ` James Simmons
  0 siblings, 0 replies; 41+ messages in thread
From: James Simmons @ 1999-08-15 13:39 UTC (permalink / raw)
  To: Andreas Bogk; +Cc: linuxppc-dev, linux-fbdev



> Actually the name of the instruction is a joke by some unnamed IBM
> engineer (you know that children's song, "Old MacDonald had a farm,
> eieio..."),

I kind of figured that. It could have been Enhanced Interface Extended
I/O, but now I know for sure.



* Re: [linux-fbdev] Re: readl() and friends and eieio on PPC
       [not found] <d3pv0p72yr.fsf@lxp03.cern.ch>
@ 1999-08-15 19:43 ` David A. Gatwood
  0 siblings, 0 replies; 41+ messages in thread
From: David A. Gatwood @ 1999-08-15 19:43 UTC (permalink / raw)
  To: Jes Sorensen
  Cc: Richard Henderson, Paul.Mackerras, Geert.Uytterhoeven,
	linuxppc-dev, linux-fbdev


On 15 Aug 1999, Jes Sorensen wrote:

> Richard> Keep in mind that wmb is the name of an Alpha specific
> Richard> assembler insn.  Blame Linus.  ;-)
> 
> Yeah I know that, but it is still a hell of a lot more explaining than
> something that when pronounced sounds like someone having certain
> vital parts of his anatomy cut off.

No, it's pronounced as in "Old MacDonald had a farm, e-i-e-i-o."  :-)
'Course, it's also short for Enforce In-order Execution of I/O, probably
about as descriptive as you can get.  ;-)


David



* Re: [linux-fbdev] Re: readl() and friends and eieio on PPC
       [not found] ` <d3so5mdyta.fsf@lxp03.cern.ch>
                     ` (2 preceding siblings ...)
  1999-08-14 21:39   ` Richard Henderson
@ 1999-08-15 23:16   ` Paul Mackerras
  1999-08-16  0:29     ` Richard Henderson
  1999-08-16  7:11     ` Jes Sorensen
  3 siblings, 2 replies; 41+ messages in thread
From: Paul Mackerras @ 1999-08-15 23:16 UTC (permalink / raw)
  To: Jes.Sorensen; +Cc: dgatwood, Geert.Uytterhoeven, linuxppc-dev, linux-fbdev, rth


Jes Sorensen <Jes.Sorensen@cern.ch> wrote:

> Defining a C function with the name of a PPC specific assembler
> function is pretty stupid. To the best of my knowledge wmb() is the
> generic name for the thing you are looking for.

Not exactly, in fact iobarrier() is probably better.

The eieio instruction has two separate effects:

1. as a write barrier for writes to cacheable memory (hence wmb)

2. as a read/write barrier for reads and writes to non-cacheable
   memory (hence iobarrier)
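
To make the distinction concrete, here is a rough sketch in which both
names map to the same instruction. The demo_ names and the index/data
device are hypothetical, and the fallback fence only stands in for eieio so
that the sketch builds on other machines.

```c
#include <stdint.h>

#if defined(__powerpc__) || defined(__PPC__)
#define demo_eieio() __asm__ __volatile__("eieio" : : : "memory")
#else
#define demo_eieio() __sync_synchronize()   /* stand-in when not on PPC */
#endif

/* Effect 1: order earlier stores to cacheable memory before later ones. */
#define demo_wmb()       demo_eieio()
/* Effect 2: order loads and stores to non-cacheable (device) memory. */
#define demo_iobarrier() demo_eieio()

/* Hypothetical device with an index/data register pair. */
struct demo_dev {
	volatile uint32_t index;
	volatile uint32_t data;
};

/* Select a register, then read it: the load must not pass the store. */
static uint32_t demo_read_indexed(struct demo_dev *dev, uint32_t idx)
{
	dev->index = idx;
	demo_iobarrier();
	return dev->data;
}
```

A plain write barrier would not be enough for demo_read_indexed(), since
the access that has to stay behind the store is a load.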

Paul.


* Re: [linux-fbdev] Re: readl() and friends and eieio on PPC
  1999-08-15 23:16   ` Paul Mackerras
@ 1999-08-16  0:29     ` Richard Henderson
  1999-08-16  7:11     ` Jes Sorensen
  1 sibling, 0 replies; 41+ messages in thread
From: Richard Henderson @ 1999-08-16  0:29 UTC (permalink / raw)
  To: Paul.Mackerras
  Cc: Jes.Sorensen, dgatwood, Geert.Uytterhoeven, linuxppc-dev,
	linux-fbdev


On Mon, Aug 16, 1999 at 09:16:47AM +1000, Paul Mackerras wrote:
> The eieio instruction has two separate effects:
> 
> 1. as a write barrier for writes to cacheable memory (hence wmb)
> 
> 2. as a read/write barrier for reads and writes to non-cacheable
>    memory (hence iobarrier)

Ah.  The Alpha wmb instruction does _not_ have this second effect.


r~


* Re: [linux-fbdev] Re: readl() and friends and eieio on PPC
  1999-08-15 23:16   ` Paul Mackerras
  1999-08-16  0:29     ` Richard Henderson
@ 1999-08-16  7:11     ` Jes Sorensen
  1 sibling, 0 replies; 41+ messages in thread
From: Jes Sorensen @ 1999-08-16  7:11 UTC (permalink / raw)
  To: Paul.Mackerras
  Cc: dgatwood, Geert.Uytterhoeven, linuxppc-dev, linux-fbdev, rth


>>>>> "Paul" == Paul Mackerras <paulus@cs.anu.edu.au> writes:

Paul> Jes Sorensen <Jes.Sorensen@cern.ch> wrote:
>> Defining a C function with the name of a PPC specific assembler
>> function is pretty stupid. To the best of my knowledge wmb() is the
>> generic name for the thing you are looking for.

Paul> Not exactly, in fact iobarrier() is probably better.

Hmmmm

Probably right, as long as it is made clear that iobarrier() means
memory mapped I/O and not I/O mapped I/O. The name is easy to
misunderstand IMHO.

Other than that I agree; the question is whether we need to add more
functions. It seems we are already overly confused.

Jes


* Re: [linux-fbdev] Re: readl() and friends and eieio on PPC
  1999-08-12 12:31                     ` Geert Uytterhoeven
  1999-08-13 12:18                       ` Paul Mackerras
@ 1999-08-18 11:02                       ` Gabriel Paubert
  1 sibling, 0 replies; 41+ messages in thread
From: Gabriel Paubert @ 1999-08-18 11:02 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Paul.Mackerras, rth, Jes.Sorensen, linuxppc-dev, linux-fbdev




On Thu, 12 Aug 1999, Geert Uytterhoeven wrote:

> 
> On Thu, 12 Aug 1999, Paul Mackerras wrote:
> > Richard Henderson <rth@cygnus.com> wrote:
> > The results tended to vary quite a lot from run to run, but here's a
> > typical set:
> > 
> > 17 10 9 9 9
> > 24 17 16 16 16
> > 732 731 736 786 727
> > 666 755 840 774 801
> > 
> > So the eieio doesn't look to be nearly as expensive on PPC as wmb is
> > on alpha.  (16 - 9) / 7 = 1 cycle for the eieio, which is going to be
> 
> I'm seeing different things (results don't tend to vary a lot):
> 
> | [14:27:01]/tmp# ./a.out 0xc2800000
> | 35 29 30 31 28 
> | 261 251 247 248 248 
> | 429 332 358 374 348 
> | 541 532 529 531 529 
> | [14:27:05]/tmp# 
> 
> Hence eieio() is quite expensive on memory.
> 
> This is on an IBM LongTrail (CHRP), with 604e at 200 MHz, 512 KB L2 cache,
> 66 MHz SDRAM bus, and 33 MHz PCI to an ATI RAGE II+.

Not surprising: on 603 and G3, eieio is an internal operation (it
prevents some forms of write combining on the G3). On 604 (and
601 AFAIR) every eieio translates into an actual bus cycle, which takes
time. Don't ask me exactly why (probably SMP issues).

However, expect the cost of always inserting an eieio to become huge
on a G4, if it ever comes out: it has longer memory queues and should
perform more aggressive combining of memory operations to adjacent
addresses.

Also a smart host bridge can merge writes from a processor into a burst
PCI transaction, the eieio cycle tells where it has to break the burst. 
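
As an illustration only, a framebuffer fill that leaves the bridge free to
burst might look like the sketch below. The burst_ names are made up,
whether a given bridge actually merges the stores depends on the hardware,
and the fallback fence merely stands in for eieio on other machines.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__powerpc__) || defined(__PPC__)
#define burst_barrier() __asm__ __volatile__("eieio" : : : "memory")
#else
#define burst_barrier() __sync_synchronize()   /* stand-in when not on PPC */
#endif

/* Fill n words with no barrier between the stores, so that a bridge
 * may merge them, then issue one barrier so that the whole fill is
 * ordered before whatever access comes next. */
static void burst_fill(volatile uint32_t *fb, size_t n, uint32_t pixel)
{
	size_t i;

	for (i = 0; i < n; i++)
		fb[i] = pixel;
	burst_barrier();
}
```

Putting the barrier inside the loop instead would mark a burst boundary
after every single store.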

> > insignificant in the context of an access to a device register, which
> > can easily take ~ 50 to 100 cycles.
> 
> For ISA (through PCI/ISA bridge). Isn't real PCI faster?

Depends on your processor clock and on whether you are speaking of reads
or writes. With posted writes, which effectively stop at the host bridge,
this figure does sound exaggerated: with a core / bus ratio between 3 and 6
and around 4 processor bus clocks for a single beat cycle, a write costs
roughly 12 to 24 core cycles.

OTOH, when filling a framebuffer, the buffers in the host bridge are
rapidly filled, write posting does not help and the figure might be
reasonable.

	Greetings,
	Gabriel.



Thread overview: 41+ messages
1999-08-09  8:17 readl() and friends and eieio on PPC Geert Uytterhoeven
1999-08-09 17:19 ` David A. Gatwood
1999-08-10  1:00 ` Paul Mackerras
1999-08-10  7:18   ` [linux-fbdev] " Jes Sorensen
1999-08-11  0:23     ` Paul Mackerras
1999-08-11  7:23       ` Jes Sorensen
1999-08-11  7:38         ` Richard Henderson
1999-08-12  0:13           ` Paul Mackerras
1999-08-12  1:39             ` Peter Chang
1999-08-12  4:52               ` Paul Mackerras
1999-08-12  6:17                 ` Peter Chang
1999-08-12  0:17           ` Paul Mackerras
1999-08-12  4:40             ` Richard Henderson
1999-08-12  5:00               ` Paul Mackerras
1999-08-12  5:43                 ` Richard Henderson
1999-08-12  7:07                   ` Paul Mackerras
1999-08-12  7:33                     ` Richard Henderson
1999-08-12  9:58                       ` Paul Mackerras
1999-08-12 12:31                     ` Geert Uytterhoeven
1999-08-13 12:18                       ` Paul Mackerras
1999-08-18 11:02                       ` Gabriel Paubert
1999-08-13 18:33                     ` Richard Henderson
1999-08-12  5:16               ` David Edelsohn
1999-08-12  5:27                 ` Paul Mackerras
1999-08-12  5:52                 ` Richard Henderson
1999-08-12  7:11                   ` Paul Mackerras
1999-08-12  7:32                 ` Jes Sorensen
1999-08-11 23:52         ` Paul Mackerras
1999-08-12  7:38           ` Jes Sorensen
1999-08-12 19:00           ` David A. Gatwood
1999-08-13  1:51             ` Paul Mackerras
     [not found] <m3672hkxri.fsf@soma.andreas.org>
1999-08-15 13:39 ` James Simmons
     [not found] <d3pv0p72yr.fsf@lxp03.cern.ch>
1999-08-15 19:43 ` David A. Gatwood
     [not found] <Pine.LNX.3.96.990813143741.27557B-100000@mvista.com>
     [not found] ` <d3so5mdyta.fsf@lxp03.cern.ch>
1999-08-14 18:34   ` Geert Uytterhoeven
1999-08-14 18:36   ` David A. Gatwood
1999-08-14 19:48     ` Jes Sorensen
1999-08-15  1:28       ` David A. Gatwood
1999-08-14 21:39   ` Richard Henderson
1999-08-15 23:16   ` Paul Mackerras
1999-08-16  0:29     ` Richard Henderson
1999-08-16  7:11     ` Jes Sorensen
