* readl() and friends and eieio on PPC
From: Geert Uytterhoeven @ 1999-08-09 8:17 UTC
To: Linux/PPC Development; +Cc: Linux Frame Buffer Device Development
Jes Sørensen pointed out to me that readl() and friends should not use eieio on
PPC. On other architectures (e.g. AXP) this isn't done either.
Currently we have[*]:
    #define readl(addr) in_le32((volatile unsigned *)(addr))
    #define inl(port) in_le32((unsigned *)((port)+_IO_BASE))
    #define inl_p(port) in_le32((unsigned *)((port)+_IO_BASE))

    extern inline unsigned in_le32(volatile unsigned *addr){
        unsigned ret;

        __asm__ __volatile__("lwbrx %0,0,%1; eieio" : "=r" (ret) :
                             "r" (addr), "m" (*addr));
        return ret;
    }

[*] Except on APUS, where readl() uses native endianness.
Hence both inl() and readl() protect against reordering. This is not necessary
for readl(). Drivers that need to protect against reordering should use
wmb()/rmb()/mb() themselves.
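
A minimal sketch of that split, under stated assumptions: the names
sketch_readl(), sketch_rmb() and sketch_poll_then_read() are hypothetical,
the barrier is only a GCC/Clang compiler barrier standing in for wherever
a real PPC build would issue eieio or sync, and the byte swap is left out
to keep the ordering question in focus.

```c
#include <stdint.h>

/* Hypothetical stand-in for rmb(): a compiler-only barrier (GCC/Clang
 * syntax). Real hardware ordering would need eieio or sync here. */
#define sketch_rmb() __asm__ __volatile__("" : : : "memory")

/* Accessor with no ordering guarantee of its own (native endianness,
 * byte swap omitted for brevity). */
static inline uint32_t sketch_readl(const volatile uint32_t *addr)
{
    return *addr;
}

/* Driver-side pattern: the driver, not the accessor, decides where
 * ordering matters. regs[0] is an assumed status word, regs[1] data. */
static int sketch_poll_then_read(const volatile uint32_t *regs, uint32_t *out)
{
    if (!(sketch_readl(&regs[0]) & 1u))
        return 0;              /* device not ready, nothing read */
    sketch_rmb();              /* status read must precede the data read */
    *out = sketch_readl(&regs[1]);
    return 1;
}
```

A driver that does not care about ordering for a given access would
simply call sketch_readl() and pay for no barrier at all.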
If readl() and friends don't do eieio, the fbcon-* routines won't be slowed
down by using readl() and friends (although we would still have the byte
swapping).
And atyfb should use readl()/writel() instead of aty_{ld,st}_le32(), so we can
get rid of the inline assembler. Note that this will probably break on Atari,
since on m68k readl() doesn't do byte swapping. But that can be circumvented
with one #ifdef.
Greetings,
Geert
--
Geert Uytterhoeven Geert.Uytterhoeven@cs.kuleuven.ac.be
Wavelets, Linux/{m68k~Amiga,PPC~CHRP} http://www.cs.kuleuven.ac.be/~geert/
Department of Computer Science -- Katholieke Universiteit Leuven -- Belgium
[[ This message was sent via the linuxppc-dev mailing list. Replies are ]]
[[ not forced back to the list, so be sure to Cc linuxppc-dev if your ]]
[[ reply is of general interest. Please check http://lists.linuxppc.org/ ]]
[[ and http://www.linuxppc.org/ for useful information before posting. ]]

* Re: readl() and friends and eieio on PPC
From: David A. Gatwood @ 1999-08-09 17:19 UTC
To: Geert Uytterhoeven
Cc: Linux/PPC Development, Linux Frame Buffer Device Development

On Mon, 9 Aug 1999, Geert Uytterhoeven wrote:

> Jes Sørensen pointed out to me that readl() and friends should not use
> eieio on PPC. On other architectures (e.g. AXP) this isn't done either.
>
> Currently we have[*]:
>
>     #define readl(addr) in_le32((volatile unsigned *)(addr))
>     #define inl(port) in_le32((unsigned *)((port)+_IO_BASE))
>     #define inl_p(port) in_le32((unsigned *)((port)+_IO_BASE))
>
>     extern inline unsigned in_le32(volatile unsigned *addr){
>         unsigned ret;
>
>         __asm__ __volatile__("lwbrx %0,0,%1; eieio" : "=r" (ret) :
>                              "r" (addr), "m" (*addr));
>         return ret;
>     }
>
> [*] Except on APUS, where readl() uses native endianness.
>
> Hence both inl() and readl() protect against reordering. This is not
> necessary for readl(). Drivers that need to protect against reordering
> should use wmb()/rmb()/mb() themselves.

Further, eieio should never be used by itself as an assembly instruction
like this -- not in _any_ macro. If you ever hope to support all of the
x100 PowerMacs, you'll have to have a macro just for eieio, as several
instructions are required before and after eieio, sync, and isync (or at
least two of them, and I forget which) to avoid a hardware buglet on
certain machines.

Later,
David

* Re: readl() and friends and eieio on PPC
From: Paul Mackerras @ 1999-08-10 1:00 UTC
To: Geert.Uytterhoeven
Cc: linuxppc-dev, linux-fbdev

Geert Uytterhoeven <Geert.Uytterhoeven@cs.kuleuven.ac.be> wrote:

> Jes Sørensen pointed out to me that readl() and friends should not use
> eieio on PPC. On other architectures (e.g. AXP) this isn't done either.

Readl/writel etc. are intended for "memory" space, but this could be
either memory-mapped device registers or plain ordinary memory. The
intel folks don't make the distinction because ia32 doesn't allow
reordering of memory accesses AFAIK.

> Hence both inl() and readl() protect against reordering. This is not
> necessary for readl(). Drivers that need to protect against reordering
> should use wmb()/rmb()/mb() themselves.

Linus made the point in a recent post to linux-kernel that people
shouldn't necessarily expect inb/outb/readb/writeb etc. to be usable on
every kind of bus - it's quite reasonable to define other access methods
on other cpus or buses.

> If readl() and friends don't do eieio, the fbcon-* routines won't be
> slowed down by using readl() and friends (but we're still having the
> byte swapping then).

Do you have any numbers to show how much the eieios slow you down?

If you take out the eieios, you will break other drivers, starting with
the OHCI USB host driver. Can we think of another way around the
problem? You could use le32_to_cpup for loading from the frame buffer,
but there isn't currently an equivalent for stores, unfortunately (one
could be invented, though).

> And atyfb should use readl()/writel() instead of aty_{ld,st}_le32(),
> so we can get rid of the inline assembler. Note that this will
> probably break on Atari,

I thought the point of the aty_ld/st* routines was to avoid one add
instruction each time by using the PPC indexed addressing mode. Anyway,
IMO the aty_ld/st* routines *should* include the eieio. That would mean
you wouldn't need the explicit eieio() calls scattered through the rest
of the driver. I guess it's just luck that it works where you do a
sequence of aty_st_le32's to set up some drawing command and then call
wait_for_fifo (or wait_for_idle) which does an aty_ld_le32. Or doesn't
it matter if the load gets done before all of the stores have completed?

Paul.
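
One possible shape for the store helper mentioned above ("one could be
invented"), as a hedged sketch only. sketch_le32_load() and
sketch_le32_store() are hypothetical names, not existing kernel
functions, and they work bytewise on ordinary memory so that they behave
the same on big- and little-endian hosts. A real PPC version would more
likely be a single lwbrx/stwbrx, and neither helper implies any
ordering.

```c
#include <stdint.h>

/* Load a 32-bit little-endian value, independent of host endianness.
 * Plays the role le32_to_cpup() plays for loads; no barrier implied. */
static inline uint32_t sketch_le32_load(const volatile void *p)
{
    const volatile uint8_t *b = p;

    return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
           ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}

/* Hypothetical store counterpart: write v in little-endian byte order.
 * Again no barrier; the caller orders accesses where it matters. */
static inline void sketch_le32_store(volatile void *p, uint32_t v)
{
    volatile uint8_t *b = p;

    b[0] = (uint8_t)v;
    b[1] = (uint8_t)(v >> 8);
    b[2] = (uint8_t)(v >> 16);
    b[3] = (uint8_t)(v >> 24);
}
```

The bytewise form turns one 32-bit access into four 8-bit ones, which is
fine for plain memory but would not be appropriate for device registers
that require full-width accesses.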

* Re: [linux-fbdev] Re: readl() and friends and eieio on PPC
From: Jes Sorensen @ 1999-08-10 7:18 UTC
To: Paul.Mackerras
Cc: Geert.Uytterhoeven, linuxppc-dev, linux-fbdev

>>>>> "Paul" == Paul Mackerras <paulus@cs.anu.edu.au> writes:

Paul> If you take out the eieios, you will break other drivers,
Paul> starting with the OHCI USB host driver. Can we think of another
Paul> way around the problem? You could use le32_to_cpup for loading
Paul> from the frame buffer, but there isn't currently an equivalent
Paul> for stores, unfortunately (one could be invented, though).

This is quite easily solved by putting in mb()'s in the right
places. This is how it is done for other drivers that are supposed to
work on the Alpha.

Paul> I thought the point of the aty_ld/st* routines was to avoid one
Paul> add instruction each time by using the PPC indexed addressing
Paul> mode. Anyway, IMO the aty_ld/st* routines *should* include the
Paul> eieio. That would mean you wouldn't need the explicit eieio()
Paul> calls scattered through the rest of the driver. I guess it's
Paul> just luck that it works where you do a sequence of aty_st_le32's
Paul> to set up some drawing command and then call wait_for_fifo (or
Paul> wait_for_idle) which does an aty_ld_le32. Or doesn't it matter
Paul> if the load gets done before all of the stores have completed?

Having mb()'s explicitly put into the driver in the right places also
makes sure that a driver will work on other architectures. Right now a
driver that is written for the PPC is likely not to work on the Alpha
if the author expects readl/writel to guarantee write ordering.

Jes

* Re: [linux-fbdev] Re: readl() and friends and eieio on PPC
From: Paul Mackerras @ 1999-08-11 0:23 UTC
To: Jes.Sorensen
Cc: Geert.Uytterhoeven, linuxppc-dev, linux-fbdev

Jes Sorensen <Jes.Sorensen@cern.ch> wrote:

> This is quite easily solved by putting in mb()'s in the right
> places. This is how it is done for other drivers that are supposed to
> work on the Alpha.

No, this is not an acceptable solution.

On ultrasparc at least, there is a "side-effect" bit in each PTE. If
that bit is set, it tells the cpu not to reorder accesses to that page.
I don't know whether alpha has the same facility, do you?

Anyway, it's hard enough educating device driver writers about the need
for byte-swapping on data in memory that is accessed by DMA. Trying to
get people to scatter mb()'s around their drivers would be a herculean
task (a bit like cleaning out the Augean stables, actually :-).

Finally, mb() is actually a much stronger constraint than we need in a
device driver, and will slow things down unnecessarily. mb() implies a
strong ordering on all loads and stores to all memory. On the PPC, mb()
translates into the sync instruction, which is much slower than eieio.
For a sync, the cpu actually has to stop and wait for all bus activity
to complete, whereas for an eieio, it just puts a special kind of entry
in the stream of accesses going out to the memory bus.

> Having mb()'s explicitly put into the driver in the right places also
> makes sure that a driver will work on other architectures. Right now a
> driver that is written for the PPC is likely not to work on the Alpha
> if the author expects readl/writel to guarantee write ordering.

Well, if alpha is actually like that, then IMO it is broken.

I did some experiments this morning to test whether having eieio in
readl/writel is actually going to slow you down. The bottom line is
that the eieio introduces *no* measurable reduction in performance.

I used the little program that I have appended below (mtest.c and
mtm.S). I ran it on my 7600 like this:

    mtest 94000000 b420 e1480 200 400 2304 100
    mtestn 94000000 b420 e1480 200 400 2304 100

This was with the screen at 1152x870, 16bpp. mtestn is just a symlink
to mtest. The results for 10 runs were:

    with eieio:     mean 2.825s, s.d. 0.007s
    without eieio:  mean 2.824s, s.d. 0.027s

I also tried it on my iMac (81000000 a000 b8350 200 400 2048 100) and
got 4.76s both with and without eieio.

So, unless and until you can show me some numbers that show an actual
performance degradation from having the eieio in readl/writel, the
eieio stays.

Paul.

mtest.c:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <fcntl.h>
    #include <sys/mman.h>

    extern void move_eieio(int *src, int *dst, int nx, int ny, int pitch);
    extern void move_no_eieio(int *src, int *dst, int nx, int ny, int pitch);

    int main(int ac, char **av)
    {
        int fd;
        unsigned long base, sof, dof;
        int nx, ny, pitch;
        char *ptr;
        int nrpt;
        int use_eieio;

        if (ac < 7) {
            fprintf(stderr, "Usage: %s base sof dof nx ny pitch [nrpt]\n", av[0]);
            exit(1);
        }
        /* base and offsets are hex; sizes, pitch and repeat count are decimal */
        base = strtoul(av[1], 0, 16);
        sof = strtoul(av[2], 0, 16);
        dof = strtoul(av[3], 0, 16);
        nx = atoi(av[4]);
        ny = atoi(av[5]);
        pitch = atoi(av[6]);
        nrpt = (ac > 7)? atoi(av[7]): 1;
        if ((fd = open("/dev/mem", O_RDWR)) < 0) {
            perror("/dev/mem");
            exit(1);
        }
        /* no eieio if the program name contains an 'n' (e.g. mtestn) */
        use_eieio = strchr(av[0], 'n') == 0;
        printf("%seieio\n", use_eieio? "": "no ");
        ptr = mmap(0, 0x200000, PROT_READ|PROT_WRITE, MAP_SHARED, fd, base);
        if (ptr == MAP_FAILED) {
            perror("mmap");
            exit(1);
        }
        if (use_eieio) {
            do {
                move_eieio((int *)(ptr + sof), (int *)(ptr + dof), nx, ny, pitch);
                dof += 4;
            } while (--nrpt > 0);
        } else {
            do {
                move_no_eieio((int *)(ptr + sof), (int *)(ptr + dof), nx, ny, pitch);
                dof += 4;
            } while (--nrpt > 0);
        }
        exit(0);
    }

mtm.S:

    /* move_eieio(int *src, int *dst, int nx, int ny, int pitch) */
        .globl move_eieio
    move_eieio:
        mtctr 5
        li 8,0
    2:  lwbrx 0,3,8
        eieio
        stwbrx 0,4,8
        eieio
        addi 8,8,4
        bdnz 2b
        addic. 6,6,-1
        blelr
        add 3,3,7
        add 4,4,7
        b move_no_eieio     /* as posted: later rows take the no-eieio loop */

    /* move_no_eieio(int *src, int *dst, int nx, int ny, int pitch) */
        .globl move_no_eieio
    move_no_eieio:
        mtctr 5
        li 8,0
    2:  lwbrx 0,3,8
        stwbrx 0,4,8
        addi 8,8,4
        bdnz 2b
        addic. 6,6,-1
        blelr
        add 3,3,7
        add 4,4,7
        b move_no_eieio

* Re: [linux-fbdev] Re: readl() and friends and eieio on PPC
From: Jes Sorensen @ 1999-08-11 7:23 UTC
To: Paul.Mackerras
Cc: Geert.Uytterhoeven, linuxppc-dev, linux-fbdev, rth

>>>>> "Paul" == Paul Mackerras <paulus@cs.anu.edu.au> writes:

Paul> Jes Sorensen <Jes.Sorensen@cern.ch> wrote:
>> This is quite easily solved by putting in mb()'s in the right
>> places. This is how it is done for other drivers that are supposed
>> to work on the Alpha.

Paul> No, this is not an acceptable solution.

Paul> On ultrasparc at least, there is a "side-effect" bit in each
Paul> PTE. If that bit is set, it tells the cpu not to reorder
Paul> accesses to that page. I don't know whether alpha has the same
Paul> facility, do you?

No idea but I bet Richard Henderson can answer that question. I also
checked with him after posting this message yesterday and the answer
was readl/writel are not supposed to guarantee strict ordering.

Paul> Anyway, it's hard enough educating device driver writers about
Paul> the need for byte-swapping on data in memory that is accessed by
Paul> DMA. Trying to get people to scatter mb()'s around their
Paul> drivers would be a herculean task (a bit like cleaning out the
Paul> Augean stables, actually :-).

There are quite a few issues device driver authors need to deal with,
this is just one of them. I actually made quite an effort to explain
the problem in my tutorial at Linux Expo. Besides people still have to
deal with it when writing drivers for devices that are not mapped in
PCI space but directly mapped. Having readl/writel guarantee ordering
is inconsistent.

Paul> Finally, mb() is actually a much stronger constraint than we
Paul> need in a device driver, and will slow things down
Paul> unnecessarily. mb() implies a strong ordering on all loads and
Paul> stores to all memory. On the PPC, mb() translates into the sync
Paul> instruction, which is much slower than eieio. For a sync, the
Paul> cpu actually has to stop and wait for all bus activity to
Paul> complete, whereas for an eieio, it just puts a special kind of
Paul> entry in the stream of accesses going out to the memory bus.

I don't know enough about the PPC architecture to comment on this,
however I can see that wmb() translates into an eieio. wmb() is more
fine grained and it would make sense to promote it over plain mb() in
the places where it makes sense.

>> Having mb()'s explicitly put into the driver in the right places
>> also makes sure that a driver will work on other
>> architectures. Right now a driver that is written for the PPC is
>> likely not to work on the Alpha if the author expects readl/writel
>> to guarantee write ordering.

Paul> Well, if alpha is actually like that, then IMO it is broken.

I will have to disagree with you on this one, I consider the PPC
implementation to be very broken in this regard.

Paul> So, unless and until you can show me some numbers that show an
Paul> actual performance degradation from having the eieio in
Paul> readl/writel, the eieio stays.

So will the education of people telling them to use mb() after
writel() if they want to be sure of the result.

Jes

* Re: [linux-fbdev] Re: readl() and friends and eieio on PPC
From: Richard Henderson @ 1999-08-11 7:38 UTC
To: Jes Sorensen
Cc: Paul.Mackerras, Geert.Uytterhoeven, linuxppc-dev, linux-fbdev,
    Richard Henderson

On Wed, Aug 11, 1999 at 09:23:29AM +0200, Jes Sorensen wrote:

> Paul> On ultrasparc at least, there is a "side-effect" bit in each
> Paul> PTE. If that bit is set, it tells the cpu not to reorder
> Paul> accesses to that page. I don't know whether alpha has the same
> Paul> facility, do you?

No, it doesn't.

> I don't know enough about the PPC architecture to comment on this,
> however I can see that wmb() translates into an eieio. wmb() is more
> fine grained and it would make sense to promote it over plain mb() in
> the places where it makes sense.

Definitely. Alpha's mb and wmb are very similar to ppc's sync and eieio.

> Paul> Well, if alpha is actually like that, then IMO it is broken.

IMO it is Most Correct.

Memory barriers on alpha are a fact of life. It's not just I/O that
requires it, though that is where it shows up most often with drivers.

There are a great many cards that do memory mapped i/o that don't care
about the ordering and write combining of the data setup, only that the
data setup all be done before receiving the "go code". In these drivers,
we need only one wmb insn, not one between each and every writel.

This benefit is marked enough that there is zero chance you can convince
me to add wmb() to writel(). The driver writer is the only one that
knows whether this barrier is necessary.

r~
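
The "one wmb before the go code" pattern can be sketched as below. This
is an illustration under assumptions, not code from any driver in this
thread: the register layout (eight setup registers followed by a "go"
register), the names sketch_writel(), sketch_wmb(), submit_ordered() and
submit_batched(), and the barrier_count instrumentation are all
hypothetical, and the barrier is a GCC/Clang compiler barrier rather
than a real wmb or eieio.

```c
#include <stdint.h>

enum { SKETCH_NREGS = 8, SKETCH_GO_REG = 8 };   /* assumed layout */

static unsigned barrier_count;   /* instrumentation for this sketch only */

/* Stand-in for wmb(): counts calls and stops compiler reordering. */
static void sketch_wmb(void)
{
    barrier_count++;
    __asm__ __volatile__("" : : : "memory");
}

/* writel() without any implied barrier (native endianness). */
static void sketch_writel(uint32_t v, volatile uint32_t *reg)
{
    *reg = v;
}

/* Variant 1: a barrier after every setup store. */
static void submit_ordered(volatile uint32_t *regs, const uint32_t *args)
{
    for (int i = 0; i < SKETCH_NREGS; i++) {
        sketch_writel(args[i], &regs[i]);
        sketch_wmb();
    }
    sketch_writel(1, &regs[SKETCH_GO_REG]);
}

/* Variant 2: setup stores may combine or reorder among themselves;
 * a single barrier separates them from the "go" store. */
static void submit_batched(volatile uint32_t *regs, const uint32_t *args)
{
    for (int i = 0; i < SKETCH_NREGS; i++)
        sketch_writel(args[i], &regs[i]);
    sketch_wmb();
    sketch_writel(1, &regs[SKETCH_GO_REG]);
}
```

Both variants leave the registers in the same final state; the first
issues eight barriers per submission and the second issues one, which is
the difference being argued about.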

* Re: [linux-fbdev] Re: readl() and friends and eieio on PPC
From: Paul Mackerras @ 1999-08-12 0:13 UTC
To: rth
Cc: Jes.Sorensen, Geert.Uytterhoeven, linuxppc-dev, linux-fbdev, rth

Richard Henderson <rth@cygnus.com> wrote:

> There are a great many cards that do memory mapped i/o that don't care
> about the ordering and write combining of the data setup, only that the
> data setup all be done before receiving the "go code". In these drivers,
> we need only one wmb insn, not one between each and every writel.
>
> This benefit is marked enough that there is zero chance you can convince

I'm curious to see the numbers. What sort of driver do you see this
much of an effect in?

For most things, the CPU spends an absolutely insignificant fraction of
its time doing accesses to I/O device registers. The only exceptions I
can think of would be 3D graphics cards (and possibly also gigabit
ethernet cards, although they should be doing most stuff by DMA).

Paul.

* Re: [linux-fbdev] Re: readl() and friends and eieio on PPC
From: Peter Chang @ 1999-08-12 1:39 UTC
To: linuxppc-dev, rth

I may have sent this too early during editing. Sorry if you've seen an
incomplete one already.

At 10:13 +1000 08.12.1999, Paul Mackerras wrote:
>Richard Henderson <rth@cygnus.com> wrote:
>
> > There are a great many cards that do memory mapped i/o that don't care
> > about the ordering and write combining of the data setup, only that the
> > data setup all be done before receiving the "go code". In these drivers,
> > we need only one wmb insn, not one between each and every writel.
> >
> > This benefit is marked enough that there is zero chance you can convince
>
>I'm curious to see the numbers. What sort of driver do you see this
>much of an effect in?

When I did glide for the mac it definitely helped not to do an eieio
after every pci write. The current generations of 3dfx hw use a sw
managed fifo, and an eieio was only necessary when the sw layer needed
to do things to insert a 'barrier' in the fifo for later accounting.

>The only exceptions I can think of would be 3D graphics cards (and possibly
>also gigabit ethernet cards, although they should be doing most stuff
>by DMA).

Hmmm.... well the fifo in the 3dfx case lives on the board so there is
a tradeoff of doing a lot of bus io and trying to make the
rasterization responsive. Also the hw did not do dma, so this was sort
of beside the point. :-)

\p

---
Underneath this flabby exterior is an enormous lack of character.
-- Oscar Levant

* Re: [linux-fbdev] Re: readl() and friends and eieio on PPC
From: Paul Mackerras @ 1999-08-12 4:52 UTC
To: weasel
Cc: linuxppc-dev, rth

Peter Chang <weasel@cs.stanford.edu> wrote:

> When I did glide for the mac it definitely helped not to do an eieio
> after every pci write. The current generations of 3dfx hw use a sw
> managed fifo, and an eieio was only necessary when the sw layer
> needed to do things to insert a 'barrier' in the fifo for later
> accounting.

Interesting. What was the magnitude of the effect? Are we talking
about 1%, 10%, or 100% faster?

This would have been from user level, right? Driving a 3D card through
a kernel device driver would seem to be a bit painful.

Would it have been possible to use double-precision floating loads and
stores to transfer 8 bytes at a time? That can double the available
bandwidth to PCI devices under some conditions.

Thinking about it, it seems to me that if your device needs to be fed
so fast that the eieio makes a difference, you *should* be feeding it
from user level rather than the kernel anyway, so then the behaviour
of readl/writel is irrelevant.

Paul.

* Re: [linux-fbdev] Re: readl() and friends and eieio on PPC
From: Peter Chang @ 1999-08-12 6:17 UTC
To: Paul.Mackerras
Cc: linuxppc-dev, rth

At 14:52 +1000 08.12.1999, Paul Mackerras wrote:
>Peter Chang <weasel@cs.stanford.edu> wrote:
>
> > When I did glide for the mac it definitely helped not to do an eieio
> > after every pci write. The current generations of 3dfx hw use a sw
> > managed fifo, and an eieio was only necessary when the sw layer
> > needed to do things to insert a 'barrier' in the fifo for later
> > accounting.
>
>Interesting. What was the magnitude of the effect? Are we talking
>about 1%, 10%, or 100% faster?

It depended on the actual benchmark. A synthetic benchmark for a
specific thing (flat triangles, Gouraud triangles, texture download,
etc) showed the biggest hits (~1% - 20% if memory serves). Actual game
tests varied depending on their actual scene complexity, but had a
similar effect.

>This would have been from user level, right?

Glide is always user level. (Well, on win32 there is a little driver
level thing that does the mapping etc).

>Would it have been possible to use double-precision floating loads and
>stores to transfer 8 bytes at a time? That can double the available
>bandwidth to PCI devices under some conditions.

I did not do this, but I know that Ken (the actual mac guy at 3dfx) did
this and got some impressive improvements. I did this for texture
downloads and stuff for 3DNow! machines (amd k6 and k7), and got really
impressive results. (More so on the k6 because of the lack of write
combining).

>Thinking about it, it seems to me that if your device needs to be fed
>so fast that the eieio makes a difference, you *should* be feeding it
>from user level rather than the kernel anyway, so then the behaviour
>of readl/writel is irrelevant.

That's true since the level switch will probably kill you anyway. I was
just piping in w/ my $0.02 (a little off topic, you're right) that the
eieio is not w/o its costs.

\p

---
Underneath this flabby exterior is an enormous lack of character.
-- Oscar Levant

* Re: [linux-fbdev] Re: readl() and friends and eieio on PPC
From: Paul Mackerras @ 1999-08-12 0:17 UTC
To: rth
Cc: Jes.Sorensen, Geert.Uytterhoeven, linuxppc-dev, linux-fbdev, rth

Richard Henderson <rth@cygnus.com> wrote:

> Definitely. Alpha's mb and wmb are very similar to ppc's sync and eieio.

Sync and eieio are different in that for sync, the cpu actually stops
and waits for all memory accesses to complete, whereas for eieio the
cpu doesn't have to stop and wait for anything. Do alpha's mb and wmb
work the same way?

My position is that if you can provide the ordering at essentially
zero cost, then it is an advantage to have it since more drivers will
work that way.

Paul.

* Re: [linux-fbdev] Re: readl() and friends and eieio on PPC
From: Richard Henderson @ 1999-08-12 4:40 UTC
To: Paul.Mackerras
Cc: Jes.Sorensen, Geert.Uytterhoeven, linuxppc-dev, linux-fbdev

On Thu, Aug 12, 1999 at 10:17:34AM +1000, Paul Mackerras wrote:

> Sync and eieio are different in that for sync, the cpu actually stops
> and waits for all memory accesses to complete, whereas for eieio the
> cpu doesn't have to stop and wait for anything. Do alpha's mb and wmb
> work the same way?

Yes. (Except for EV4, in which wmb == mb, but we don't care about that.)

> My position is that if you can provide the ordering at essentially
> zero cost, then it is an advantage to have it since more drivers will
> work that way.

But it isn't zero cost. It's not high cost, but that's not the same
thing.

r~

* Re: [linux-fbdev] Re: readl() and friends and eieio on PPC
From: Paul Mackerras @ 1999-08-12 5:00 UTC
To: rth
Cc: Jes.Sorensen, Geert.Uytterhoeven, linuxppc-dev, linux-fbdev

Richard Henderson <rth@cygnus.com> wrote:

> But it isn't zero cost. It's not high cost, but that's not the same
> thing.

Show us the numbers? I'm starting to sound like Larry McVoy, I know. :-)

The measurements I did showed no measurable difference in performance
for copying stuff around a framebuffer on PPC (Richard, I guess you may
not have seen that post).

As far as PPC is concerned, I am unwilling to break drivers for the
sake of an infinitesimal performance gain. I don't believe the
frame-buffer guys will actually see any measurable improvement in
performance from taking out the eieio from readl/writel on PPC.

Of course, it may be different on alpha, and I would be very interested
to know how big the effect is there.

Paul.

* Re: [linux-fbdev] Re: readl() and friends and eieio on PPC
From: Richard Henderson @ 1999-08-12 5:43 UTC
To: Paul.Mackerras
Cc: Jes.Sorensen, Geert.Uytterhoeven, linuxppc-dev, linux-fbdev

On Thu, Aug 12, 1999 at 03:00:46PM +1000, Paul Mackerras wrote:

> Show us the numbers?

Attached is a quick userland study wrt main memory. How much more
accurate to a real device do I need to get to convince you that the
test is valid enough?

As I see it, testing against main memory should be the lower bound of
the numbers, since it's the quickest to respond. A real device will
take longer to respond, so any enforced delays (or failures to
write-combine) will only exaggerate the difference.

Anyway, the results (in cycles) from my 533MHz sx164 are:

    10
    10
    10
    10
    10
    223
    94
    94
    94
    94

So the cost of wmb for 8 store+wmb, versus 8 stores with one wmb, is
over 9:1.

> I don't believe the frame-buffer guys will actually
> see any measurable improvement in performance from taking out the
> eieio from readl/writel on PPC.

For grins, will you try the same test on your ppc?

r~

z.c:

    #include <stdio.h>

    int main(void)
    {
        int i;
        unsigned s, e;
        unsigned long mem;

        /* eight stores, with a single wmb before the last one */
        for (i = 0; i < 5; ++i) {
            asm volatile("rpcc %0\n\t"
                "stq $31,%2\n\t"
                "stq $31,%2\n\t"
                "stq $31,%2\n\t"
                "stq $31,%2\n\t"
                "stq $31,%2\n\t"
                "stq $31,%2\n\t"
                "stq $31,%2\n\t"
                "wmb\n\t"
                "stq $31,%2\n\t"
                "rpcc %1"
                : "=r"(s), "=r"(e), "=m"(mem));
            printf("%u\n", e-s);
        }

        /* eight stores, each followed by a wmb */
        for (i = 0; i < 5; ++i) {
            asm volatile("rpcc %0\n\t"
                "stq $31,%2\n\twmb\n\t"
                "stq $31,%2\n\twmb\n\t"
                "stq $31,%2\n\twmb\n\t"
                "stq $31,%2\n\twmb\n\t"
                "stq $31,%2\n\twmb\n\t"
                "stq $31,%2\n\twmb\n\t"
                "stq $31,%2\n\twmb\n\t"
                "stq $31,%2\n\twmb\n\t"
                "rpcc %1"
                : "=r"(s), "=r"(e), "=m"(mem));
            printf("%u\n", e-s);
        }
        return 0;
    }
* Re: [linux-fbdev] Re: readl() and friends and eieio on PPC
1999-08-12 5:43 ` Richard Henderson
@ 1999-08-12 7:07 ` Paul Mackerras
1999-08-12 7:33 ` Richard Henderson
` (2 more replies)
0 siblings, 3 replies; 41+ messages in thread
From: Paul Mackerras @ 1999-08-12 7:07 UTC (permalink / raw)
To: rth; +Cc: Jes.Sorensen, Geert.Uytterhoeven, linuxppc-dev, linux-fbdev

Richard Henderson <rth@cygnus.com> wrote:

> As I see it, testing against main memory should be the lower
> bound of the numbers, since it's the quickest to respond. A
> real device will take longer to respond, so any enforced delays
> (or failures to write-combine) will only exaggerate the difference.

Hmmm, no, doesn't it go the other way around? Going to L1 cache will
mean that we can isolate the overhead of the wmb, and will exaggerate
the ratio between the two cases. A real device that takes longer to
respond will make the overhead of the wmb a smaller fraction of the
total time. And you would hope that the cpu could overlap the wmb, or
at least the time to decode and issue it, with the time waiting for the
device to respond.

> Anyway, the results (in cycles) from my 533MHz sx164 are:
>
> 10

One-cycle access to L1 cache, I guess?

> 10
> 10
> 10
> 10
> 223

Because of i-cache misses, presumably

> 94
> 94
> 94
> 94
>
> So the cost of wmb for 8 store+wmb, versus 8 stores with one wmb,
> is over 9:1.

Interesting. Sounds like each wmb takes about 12 cycles ((94-10)/7),
which sounds a bit like it is going all the way out to the memory bus
and back before the cpu does the next instruction.

(Ob. nitpicking: if a wmb takes 12 cycles, how come we can do a wmb
and 8 stores in 10 cycles? :-)

> For grins, will you try the same test on your ppc?

Sure, happy to. I think I have correctly understood the alpha assembly
syntax. My PPC version is below.

I've added a couple of things. First, PPC has a `timebase' register
which counts at 1/4 of the bus clock, which means once every 16 cycles
on my G3 desktop at work. For this reason I have put a loop around the
sets of stores to do them 16 times. The overhead of the loop should be
zero (the branch is pretty easily predictable :-). The numbers should
thus be cycles per iteration.

Secondly, I added stuff to mmap a framebuffer and do the stores to a
word in it, just for grins.

The results tended to vary quite a lot from run to run, but here's a
typical set:

17 10 9 9 9
24 17 16 16 16
732 731 736 786 727
666 755 840 774 801

So the eieio doesn't look to be nearly as expensive on PPC as wmb is
on alpha. (16 - 9) / 7 = 1 cycle for the eieio, which is going to be
insignificant in the context of an access to a device register, which
can easily take ~ 50 to 100 cycles.

The average of the 3rd line is 742, and of the 4th line is 767. But
given the spread of the numbers, I don't think that the difference is
statistically significant.

This is going to the framebuffer on an ATI Rage chip. 760 cycles is
95 cpu cycles per access, or about 350ns. I guess ATI chips expect you
to use the drawing engine if you are doing any significant amount of
stuff. :-)

What numbers do you get on alpha if you point it at a framebuffer,
just for interest?

Paul.
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>

/* Time 16 iterations of 8 stores to *ptr: first with one eieio per
   iteration, then with an eieio after every store. */
void test(unsigned long *ptr)
{
    int i;
    unsigned s, e;

    for (i = 0; i < 5; ++i) {
        /* %0 is written before %3 is read, hence the early clobber. */
        __asm__ __volatile__("mftb %0\n\t"
            "mtctr %3\n"
            "1:\tstw 16,%2\n\t"
            "stw 16,%2\n\t"
            "stw 16,%2\n\t"
            "stw 16,%2\n\t"
            "stw 16,%2\n\t"
            "stw 16,%2\n\t"
            "stw 16,%2\n\t"
            "eieio\n\t"
            "stw 16,%2\n\t"
            "bdnz 1b\n\t"
            "mftb %1"
            : "=&r"(s), "=r"(e), "=m"(*ptr) : "r"(16) : "ctr");
        printf("%u ", e-s);
    }
    printf("\n");

    for (i = 0; i < 5; ++i) {
        __asm__ __volatile__("mftb %0\n\t"
            "mtctr %3\n"
            "1:\tstw 16,%2\n\teieio\n\t"
            "stw 16,%2\n\teieio\n\t"
            "stw 16,%2\n\teieio\n\t"
            "stw 16,%2\n\teieio\n\t"
            "stw 16,%2\n\teieio\n\t"
            "stw 16,%2\n\teieio\n\t"
            "stw 16,%2\n\teieio\n\t"
            "stw 16,%2\n\teieio\n\t"
            "bdnz 1b\n\t"
            "mftb %1"
            : "=&r"(s), "=r"(e), "=m"(*ptr) : "r"(16) : "ctr");
        printf("%u ", e-s);
    }
    printf("\n");
}

#define PAGESIZE 0x1000

int main(int ac, char **av)
{
    unsigned long base, offset;
    int fd;
    unsigned long mem;
    unsigned long *ptr;

    /* First run against ordinary memory on the stack. */
    test(&mem);

    /* Optionally run against a physical address (hex) given in argv[1]. */
    if (ac > 1) {
        base = strtoul(av[1], 0, 16);
        offset = (base & (PAGESIZE - 1)) / sizeof(unsigned long);
        base &= -PAGESIZE;
        if ((fd = open("/dev/mem", O_RDWR)) < 0) {
            perror("/dev/mem");
            exit(1);
        }
        ptr = (unsigned long *) mmap(0, PAGESIZE, PROT_READ|PROT_WRITE,
                                     MAP_SHARED, fd, base);
        if (ptr == MAP_FAILED) {
            perror("mmap");
            exit(1);
        }
        test(ptr + offset);
    }
    exit(0);
}
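The per-barrier arithmetic used in the message above, (94-10)/7 for the
alpha and (16-9)/7 for the G3, can be written down as a few helpers. This
is only a sketch of that arithmetic; the function names are made up for
illustration and are not part of either test program.

```c
/* Extra cycles attributable to each added barrier. The slow run contains
 * `extra_barriers` more barriers than the fast run (7 in the tests above:
 * 8 barriers versus 1). */
double cycles_per_barrier(double with_all, double with_one, int extra_barriers)
{
    return (with_all - with_one) / extra_barriers;
}

/* Average cycles per store when `stores` stores take `total` cycles. */
double cycles_per_access(double total, int stores)
{
    return total / stores;
}

/* Convert a cycle count to nanoseconds for a core clock given in MHz. */
double cycles_to_ns(double cycles, double cpu_mhz)
{
    return cycles * 1000.0 / cpu_mhz;
}
```

cycles_per_barrier(94, 10, 7) gives 12 and cycles_per_barrier(16, 9, 7)
gives 1, matching the figures quoted. cycles_per_access(760, 8) gives 95;
turning that into the "about 350ns" figure needs the core clock, which the
message does not state, so any value passed as cpu_mhz (for example 266)
is an assumption.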
* Re: [linux-fbdev] Re: readl() and friends and eieio on PPC
1999-08-12 7:07 ` Paul Mackerras
@ 1999-08-12 7:33 ` Richard Henderson
1999-08-12 9:58 ` Paul Mackerras
1999-08-12 12:31 ` Geert Uytterhoeven
1999-08-13 18:33 ` Richard Henderson
2 siblings, 1 reply; 41+ messages in thread
From: Richard Henderson @ 1999-08-12 7:33 UTC (permalink / raw)
To: Paul.Mackerras
Cc: Jes.Sorensen, Geert.Uytterhoeven, linuxppc-dev, linux-fbdev, Richard Henderson

On Thu, Aug 12, 1999 at 05:07:02PM +1000, Paul Mackerras wrote:
> > 10
>
> One-cycle access to L1 cache, I guess?

No, 2 Cycles to L1 cache. One cycle to execute the store,
which merely adds an entry to the store buffer.

> > 223
>
> Because of i-cache misses, presumably

Presumably. The 10 and 94 numbers are all that's interesting.

> Interesting. Sounds like each wmb takes about 12 cycles ((94-10)/7),
> which sounds a bit like it is going all the way out to the memory bus
> and back before the cpu does the next instruction.
>
> (Ob. nitpicking: if a wmb takes 12 cycles, how come we can do a wmb
> and 8 stores in 10 cycles? :-)

Because it doesn't work like that. wmb adds a magic token to the
store buffer that prevents write combining and other such hw
optimizations. Timing

    stq $31,addr
    stq $31,addr+8
vs
    stq $31,addr
    wmb
    stq $31,addr+8

shows only 1 cycle difference between the two.

I'm not quite sure how the 12 works out. I do know that L2 cache
is 12 cycles away, but that may just be coincidence. Going all the
way out to the memory bus would take a whole lot longer than 12 cycles.
More like 36.

> What numbers do you get on alpha if you point it at a framebuffer,
> just for interest?

I'll give that a try tomorrow.

r~
* Re: [linux-fbdev] Re: readl() and friends and eieio on PPC
1999-08-12 7:33 ` Richard Henderson
@ 1999-08-12 9:58 ` Paul Mackerras
0 siblings, 0 replies; 41+ messages in thread
From: Paul Mackerras @ 1999-08-12 9:58 UTC (permalink / raw)
To: rth; +Cc: Jes.Sorensen, Geert.Uytterhoeven, linuxppc-dev, linux-fbdev, rth

Richard Henderson <rth@cygnus.com> wrote:

> No, 2 Cycles to L1 cache. One cycle to execute the store,
> which merely adds an entry to the store buffer.

Yes, of course, silly me. Same on PPC.

> > (Ob. nitpicking: if a wmb takes 12 cycles, how come we can do a wmb
> > and 8 stores in 10 cycles? :-)
>
> Because it doesn't work like that. wmb adds a magic token to the
> store buffer that prevents write combining and other such hw
> optimizations. Timing

Then why is there such a big performance impact from the wmb's?

Paul.
* Re: [linux-fbdev] Re: readl() and friends and eieio on PPC
1999-08-12 7:07 ` Paul Mackerras
1999-08-12 7:33 ` Richard Henderson
@ 1999-08-12 12:31 ` Geert Uytterhoeven
1999-08-13 12:18 ` Paul Mackerras
1999-08-18 11:02 ` Gabriel Paubert
1999-08-13 18:33 ` Richard Henderson
2 siblings, 2 replies; 41+ messages in thread
From: Geert Uytterhoeven @ 1999-08-12 12:31 UTC (permalink / raw)
To: Paul.Mackerras; +Cc: rth, Jes.Sorensen, linuxppc-dev, linux-fbdev

On Thu, 12 Aug 1999, Paul Mackerras wrote:
> Richard Henderson <rth@cygnus.com> wrote:
> The results tended to vary quite a lot from run to run, but here's a
> typical set:
>
> 17 10 9 9 9
> 24 17 16 16 16
> 732 731 736 786 727
> 666 755 840 774 801
>
> So the eieio doesn't look to be nearly as expensive on PPC as wmb is
> on alpha. (16 - 9) / 7 = 1 cycle for the eieio, which is going to be

I'm seeing different things (results don't tend to vary a lot):

| [14:27:01]/tmp# ./a.out 0xc2800000
| 35 29 30 31 28
| 261 251 247 248 248
| 429 332 358 374 348
| 541 532 529 531 529
| [14:27:05]/tmp#

Hence eieio() is quite expensive on memory.

This is on an IBM LongTrail (CHRP), with 604e at 200 MHz, 512 KB L2 cache,
66 MHz SDRAM bus, and 33 MHz PCI to an ATI RAGE II+.

> insignificant in the context of an access to a device register, which
> can easily take ~ 50 to 100 cycles.

For ISA (through PCI/ISA bridge). Isn't real PCI faster?

Greetings,

Geert

--
Geert Uytterhoeven Geert.Uytterhoeven@cs.kuleuven.ac.be
Wavelets, Linux/{m68k~Amiga,PPC~CHRP} http://www.cs.kuleuven.ac.be/~geert/
Department of Computer Science -- Katholieke Universiteit Leuven -- Belgium
* Re: [linux-fbdev] Re: readl() and friends and eieio on PPC
1999-08-12 12:31 ` Geert Uytterhoeven
@ 1999-08-13 12:18 ` Paul Mackerras
1999-08-18 11:02 ` Gabriel Paubert
1 sibling, 0 replies; 41+ messages in thread
From: Paul Mackerras @ 1999-08-13 12:18 UTC (permalink / raw)
To: Geert.Uytterhoeven; +Cc: rth, Jes.Sorensen, linuxppc-dev, linux-fbdev

Geert Uytterhoeven <Geert.Uytterhoeven@cs.kuleuven.ac.be> wrote:

> I'm seeing different things (results don't tend to vary a lot):
>
> | [14:27:01]/tmp# ./a.out 0xc2800000
> | 35 29 30 31 28
> | 261 251 247 248 248
> | 429 332 358 374 348
> | 541 532 529 531 529
> | [14:27:05]/tmp#
>
> Hence eieio() is quite expensive on memory.
>
> This is on an IBM LongTrail (CHRP), with 604e at 200 MHz, 512 KB L2 cache,
> 66 MHz SDRAM bus, and 33 MHz PCI to an ATI RAGE II+.

I tried it on my longtrail, with a 300MHz 604 machV. I changed the
loop count to 18 since that is the ratio of cpu clock to timebase
clock on this machine. (You should probably use 12 on your machine.)
I got results much like yours:

23 23 20 20 21 av=21.4
180 175 175 175 175 av=176.0
288 358 275 359 309 av=317.8
375 400 351 423 351 av=380.0

So yes, in this case adding the eieios costs about 22 cycles each when
going to main memory, or 9 cycles each when going to the framebuffer.
I guess that when going to the framebuffer, much of the latency of the
eieio gets hidden.

It would be interesting to try a mix of loads and stores to the
framebuffer, perhaps 4 loads followed by 4 stores to get the effect of
a bitblt routine. I tried my framebuffer-copy test on my 7600, which
has 200MHz 604e cpus, and I didn't see any difference in overall time
for the test, whether there were eieio's in or not.

This morning I read something in the PPC750 manual which implied that
the G3 doesn't reorder stores, and doesn't reorder non-cacheable
accesses. That would mean eieio could be a no-op, which could help
explain why it only takes 1 cycle on a G3. :-) (Not reordering
non-cacheable accesses actually makes a lot of sense to me.)

I think that probably the best thing is to have safe and fast variants
of readl/writel etc. For the sake of not having to change a whole heap
of drivers (whose maintainers use x86 cpus :-() I would urge that
readl/writel include the eieio, and that we have readl_fast,
writel_fast etc. which don't include the eieio.

I would still be interested to see overall timings for frame-buffer
operations with and without the eieios.

Paul.
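The safe/fast split proposed above might look roughly like the sketch
below. Only readl_fast and writel_fast are names taken from the message.
The _safe suffix stands in for the plain readl/writel so the sketch does
not clash with real headers, io_barrier() is a made-up name, and the
__sync_synchronize() fallback is an assumption so that it builds on
non-PPC machines. The byte swapping that the real PPC accessors do (lwbrx)
is left out.

```c
#include <stdint.h>

/* Ordering barrier: eieio on PowerPC. Elsewhere a full barrier builtin
 * stands in (an assumption, only so the sketch compiles anywhere). */
#if defined(__powerpc__) || defined(__powerpc64__)
#define io_barrier() __asm__ __volatile__("eieio" : : : "memory")
#else
#define io_barrier() __sync_synchronize()
#endif

/* Fast variants: plain volatile access, no ordering guarantee. */
static inline uint32_t readl_fast(const volatile uint32_t *addr)
{
    return *addr;
}

static inline void writel_fast(uint32_t val, volatile uint32_t *addr)
{
    *addr = val;
}

/* Safe variants: the same access followed by the barrier. */
static inline uint32_t readl_safe(const volatile uint32_t *addr)
{
    uint32_t val = *addr;
    io_barrier();
    return val;
}

static inline void writel_safe(uint32_t val, volatile uint32_t *addr)
{
    *addr = val;
    io_barrier();
}
```

A frame-buffer copy loop would then use the _fast forms, while a driver
poking device registers keeps the ordered forms and needs no changes.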
* Re: [linux-fbdev] Re: readl() and friends and eieio on PPC
1999-08-12 12:31 ` Geert Uytterhoeven
1999-08-13 12:18 ` Paul Mackerras
@ 1999-08-18 11:02 ` Gabriel Paubert
1 sibling, 0 replies; 41+ messages in thread
From: Gabriel Paubert @ 1999-08-18 11:02 UTC (permalink / raw)
To: Geert Uytterhoeven
Cc: Paul.Mackerras, rth, Jes.Sorensen, linuxppc-dev, linux-fbdev

On Thu, 12 Aug 1999, Geert Uytterhoeven wrote:

>
> On Thu, 12 Aug 1999, Paul Mackerras wrote:
> > Richard Henderson <rth@cygnus.com> wrote:
> > The results tended to vary quite a lot from run to run, but here's a
> > typical set:
> >
> > 17 10 9 9 9
> > 24 17 16 16 16
> > 732 731 736 786 727
> > 666 755 840 774 801
> >
> > So the eieio doesn't look to be nearly as expensive on PPC as wmb is
> > on alpha. (16 - 9) / 7 = 1 cycle for the eieio, which is going to be
>
> I'm seeing different things (results don't tend to vary a lot):
>
> | [14:27:01]/tmp# ./a.out 0xc2800000
> | 35 29 30 31 28
> | 261 251 247 248 248
> | 429 332 358 374 348
> | 541 532 529 531 529
> | [14:27:05]/tmp#
>
> Hence eieio() is quite expensive on memory.
>
> This is on an IBM LongTrail (CHRP), with 604e at 200 MHz, 512 KB L2 cache,
> 66 MHz SDRAM bus, and 33 MHz PCI to an ATI RAGE II+.

Not surprising, on 603 and G3, eieio is an internal operation (it prevents
some forms of write combining on the G3). On 604 (and 601 AFAIR) every
eieio translates into an actual bus cycle, which takes time. Don't ask me
exactly why (probably SMP issues).

However, expect the cost of always inserting an eieio to become huge on a
G4 if it ever comes out: it has longer memory queues and should perform
more aggressive combinations of memory operations from adjacent addresses.
Also a smart host bridge can merge writes from a processor into a burst
PCI transaction, the eieio cycle tells where it has to break the burst.

> > insignificant in the context of an access to a device register, which
> > can easily take ~ 50 to 100 cycles.
>
> For ISA (through PCI/ISA bridge). Isn't real PCI faster?

Depends on what your processor clock is and whether you are speaking of
reads or writes. With posted writes which effectively stop at the host
bridge, this figure sounds exaggerated indeed (core / bus ratio between 3
and 6, around 4 processor bus clocks for a single beat cycle). OTOH, when
filling a framebuffer, the buffers in the host bridge are rapidly filled,
write posting does not help and the figure might be reasonable.

Greetings,
Gabriel.
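The point about a host bridge merging writes into bursts, with each eieio
marking where a burst has to end, can be illustrated with a toy model.
This is not a description of any real bridge: the types, the 4-byte word
size and the merge rule (a store joins the open burst only if it targets
the next word and no barrier came in between) are all assumptions made for
the sketch, and the barrier itself is not counted as a bus cycle.

```c
#include <stddef.h>
#include <stdint.h>

enum op_kind { OP_STORE, OP_BARRIER };

struct bus_op {
    enum op_kind kind;
    uint32_t addr;      /* byte address for OP_STORE; unused for OP_BARRIER */
};

/* Count the bursts needed to carry out `n` operations in order. */
size_t count_bursts(const struct bus_op *ops, size_t n)
{
    size_t bursts = 0;
    int open = 0;        /* is there a burst that can still be extended? */
    uint32_t next = 0;   /* address that would extend the open burst */
    size_t i;

    for (i = 0; i < n; i++) {
        if (ops[i].kind == OP_BARRIER) {
            open = 0;    /* a barrier closes the current burst */
            continue;
        }
        if (!open || ops[i].addr != next) {
            bursts++;    /* cannot merge: start a new burst */
            open = 1;
        }
        next = ops[i].addr + 4;
    }
    return bursts;
}
```

In this model four stores to consecutive words go out as one burst, while
the same four stores each followed by a barrier go out as four.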
* Re: [linux-fbdev] Re: readl() and friends and eieio on PPC
1999-08-12 7:07 ` Paul Mackerras
1999-08-12 7:33 ` Richard Henderson
1999-08-12 12:31 ` Geert Uytterhoeven
@ 1999-08-13 18:33 ` Richard Henderson
2 siblings, 0 replies; 41+ messages in thread
From: Richard Henderson @ 1999-08-13 18:33 UTC (permalink / raw)
To: Paul.Mackerras
Cc: Jes.Sorensen, Geert.Uytterhoeven, linuxppc-dev, linux-fbdev, Richard Henderson

On Thu, Aug 12, 1999 at 05:07:02PM +1000, Paul Mackerras wrote:
> What numbers do you get on alpha if you point it at a framebuffer,
> just for interest?

With some additional numbers for mb vs wmb --

Memory:
none    15 11 11 11 11
1 wmb   10 10 10 10 10
1 mb    140 129 62 59 59
8 wmb   171 157 101 101 101
8 mb    346 270 270 267 267

Millennium2 fb:
none    2599 11 11 11 11
1 wmb   10 10 10 10 10
1 mb    220 130 139 139 139
8 wmb   192 178 192 192 192
8 mb    538 423 423 423 423

r~
* Re: [linux-fbdev] Re: readl() and friends and eieio on PPC
1999-08-12 4:40 ` Richard Henderson
1999-08-12 5:00 ` Paul Mackerras
@ 1999-08-12 5:16 ` David Edelsohn
1999-08-12 5:27 ` Paul Mackerras
` (2 more replies)
1 sibling, 3 replies; 41+ messages in thread
From: David Edelsohn @ 1999-08-12 5:16 UTC (permalink / raw)
To: Richard Henderson
Cc: Paul.Mackerras, Jes.Sorensen, Geert.Uytterhoeven, linuxppc-dev, linux-fbdev

>>>>> Richard Henderson writes:

>> My position is that if you can provide the ordering at essentially
>> zero cost, then it is an advantage to have it since more drivers will
>> work that way.

Richard> But it isn't zero cost. It's not high cost, but that's not the same thing.

Is your assumption that you want to provide the infrastructure to
write high-performance device drivers or to write device drivers that
don't require as much expertise and knowledge to produce correct results?
There are conflicting goals in this design providing different benefits to
Linux.

David
* Re: [linux-fbdev] Re: readl() and friends and eieio on PPC
1999-08-12 5:16 ` David Edelsohn
@ 1999-08-12 5:27 ` Paul Mackerras
1999-08-12 5:52 ` Richard Henderson
1999-08-12 7:32 ` Jes Sorensen
2 siblings, 0 replies; 41+ messages in thread
From: Paul Mackerras @ 1999-08-12 5:27 UTC (permalink / raw)
To: dje; +Cc: rth, Jes.Sorensen, Geert.Uytterhoeven, linuxppc-dev, linux-fbdev

David Edelsohn <dje@watson.ibm.com> wrote:

> Is your assumption that you want to provide the infrastructure to
> write high-performance device drivers or to write device drivers that
> don't require as much expertise and knowledge to produce correct results?

Interesting question. I guess I would be trying both to make it easy
to write device drivers that work, and possible to write very
high-performance device drivers. Particularly since the vast majority
of drivers in Linux have been written for the i386 platform, which
doesn't do pesky (;-) things like reordering reads and writes.

In any case, as far as the question of using readl/writel in
framebuffer code goes, and whether readl/writel should include the
eieio, the measurements I did showed zero performance impact of having
the eieio, in frame-buffer copy code at least.

Paul.
* Re: [linux-fbdev] Re: readl() and friends and eieio on PPC
1999-08-12 5:16 ` David Edelsohn
1999-08-12 5:27 ` Paul Mackerras
@ 1999-08-12 5:52 ` Richard Henderson
1999-08-12 7:11 ` Paul Mackerras
1999-08-12 7:32 ` Jes Sorensen
2 siblings, 1 reply; 41+ messages in thread
From: Richard Henderson @ 1999-08-12 5:52 UTC (permalink / raw)
To: David Edelsohn
Cc: Paul.Mackerras, Jes.Sorensen, Geert.Uytterhoeven, linuxppc-dev, linux-fbdev, Richard Henderson

On Thu, Aug 12, 1999 at 01:16:14AM -0400, David Edelsohn wrote:
> Is your assumption that you want to provide the infrastructure to
> write high-performance device drivers or to write device drivers that
> don't require as much expertise and knowledge to produce correct results?

I prefer high-performance drivers. There are enough other things
(virt_to_bus, ioremap, et al) that are non-optional that driver writers
must learn about for non-peecee driver programming that proper use of
mb/wmb doesn't seem that big a deal to me.

I guess I personally can afford to be somewhat idealistic in this,
because I only use about 4 drivers -- ncr, aic7xxx, tulip, epic100 --
and the authors of all these drivers have clue. But the thought of
coddling to folks that can't be bothered to do things Right gives me
hives.

r~
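What "proper use of mb/wmb" in a driver amounts to can be shown with a
small sketch: the accessors stay barrier-free, and the driver places one
wmb() at the single point where ordering matters. Everything here is
hypothetical. struct fake_dev and fake_dev_submit() are invented for
illustration, and wmb() is defined locally with a compiler builtin as a
stand-in for the kernel's own macro.

```c
#include <stdint.h>

/* Stand-in for the kernel's wmb(); an assumption so the sketch builds
 * in userland on any architecture. */
#define wmb() __sync_synchronize()

/* Hypothetical device: a descriptor is filled in, then a doorbell
 * register is written to tell the device to go and look at it. */
struct fake_dev {
    volatile uint32_t desc_addr;
    volatile uint32_t desc_len;
    volatile uint32_t doorbell;
};

void fake_dev_submit(struct fake_dev *dev, uint32_t addr, uint32_t len)
{
    /* These two stores may be combined or reordered with each other. */
    dev->desc_addr = addr;
    dev->desc_len = len;

    /* The one place ordering matters: the descriptor must be complete
     * before the doorbell write reaches the device. */
    wmb();
    dev->doorbell = 1;
}
```

With barriers built into every accessor the same routine would pay for
three of them where only one is needed.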
* Re: [linux-fbdev] Re: readl() and friends and eieio on PPC
1999-08-12 5:52 ` Richard Henderson
@ 1999-08-12 7:11 ` Paul Mackerras
0 siblings, 0 replies; 41+ messages in thread
From: Paul Mackerras @ 1999-08-12 7:11 UTC (permalink / raw)
To: rth; +Cc: dje, Jes.Sorensen, Geert.Uytterhoeven, linuxppc-dev, linux-fbdev, rth

Richard Henderson <rth@cygnus.com> wrote:

> I prefer high-performance drivers.

Sure, so do I. But when I can get safety as well, for the cost of one
extra cpu cycle per device access, which can probably be overlapped
with the device access anyway, I think it's a good deal.

On alpha, does wmb() stop a subsequent load from being moved ahead of
a previous store? Or do you have to use mb() to get that effect?

Paul.
* Re: [linux-fbdev] Re: readl() and friends and eieio on PPC
1999-08-12 5:16 ` David Edelsohn
1999-08-12 5:27 ` Paul Mackerras
1999-08-12 5:52 ` Richard Henderson
@ 1999-08-12 7:32 ` Jes Sorensen
2 siblings, 0 replies; 41+ messages in thread
From: Jes Sorensen @ 1999-08-12 7:32 UTC (permalink / raw)
To: David Edelsohn
Cc: Richard Henderson, Paul.Mackerras, linuxppc-dev, linux-fbdev

>>>>> "David" == David Edelsohn <dje@watson.ibm.com> writes:

>>>>> Richard Henderson writes:
Richard> But it isn't zero cost. It's not high cost, but that's not
Richard> the same thing.

David> Is your assumption that you want to provide the infrastructure
David> to write high-performance device drivers or to write device
David> drivers that don't require as much expertise and knowledge to
David> produce correct results? There are conflicting goals in this
David> design providing different benefits to Linux.

I am certainly up for high performance device drivers. Even having
writel do the syncing there are enough other pitfalls for people to
take into account. Some of these are much harder to understand than
dealing with write ordering and as such, trying to make things
invisible is only giving us a false guarantee.

I.e. if we want to use spin locks in the kernel we need to teach people
how to use them correctly.

Jes
* Re: [linux-fbdev] Re: readl() and friends and eieio on PPC
1999-08-11 7:23 ` Jes Sorensen
1999-08-11 7:38 ` Richard Henderson
@ 1999-08-11 23:52 ` Paul Mackerras
1999-08-12 7:38 ` Jes Sorensen
1999-08-12 19:00 ` David A. Gatwood
1 sibling, 2 replies; 41+ messages in thread
From: Paul Mackerras @ 1999-08-11 23:52 UTC (permalink / raw)
To: Jes.Sorensen; +Cc: Geert.Uytterhoeven, linuxppc-dev, linux-fbdev, rth

Jes Sorensen <Jes.Sorensen@cern.ch> wrote:

> I will have to disagree with you on this one, I consider the PPC
> implementation to be very broken in this regard.

"Very broken" - because drivers work and there is no measurable
performance impact?? !!! ??

The only possible argument for *not* having the eieio in readl/writel
is that it hurts performance (actually and measurably, not just
potentially).

Paul.
* Re: [linux-fbdev] Re: readl() and friends and eieio on PPC
1999-08-11 23:52 ` Paul Mackerras
@ 1999-08-12 7:38 ` Jes Sorensen
1999-08-12 19:00 ` David A. Gatwood
1 sibling, 0 replies; 41+ messages in thread
From: Jes Sorensen @ 1999-08-12 7:38 UTC (permalink / raw)
To: Paul.Mackerras; +Cc: Geert.Uytterhoeven, linuxppc-dev, linux-fbdev, rth

>>>>> "Paul" == Paul Mackerras <paulus@cs.anu.edu.au> writes:

Paul> Jes Sorensen <Jes.Sorensen@cern.ch> wrote:
>> I will have to disagree with you on this one, I consider the PPC
>> implementation to be very broken in this regard.

Paul> "Very broken" - because drivers work and there is no measurable
Paul> performance impact?? !!! ??

Paul> The only possible argument for *not* having the eieio in
Paul> readl/writel is that it hurts performance (actually and
Paul> measurably, not just potentially).

Ok strong wording maybe. I am just quite displeased when people try to
hide the real world from programmers because most code was written for
the x86 by people without a clue. In the long term I think that sort of
approach is going to bite us since code will not get fixed where it
should.

Jes
* Re: [linux-fbdev] Re: readl() and friends and eieio on PPC
1999-08-11 23:52 ` Paul Mackerras
1999-08-12 7:38 ` Jes Sorensen
@ 1999-08-12 19:00 ` David A. Gatwood
1999-08-13 1:51 ` Paul Mackerras
1 sibling, 1 reply; 41+ messages in thread
From: David A. Gatwood @ 1999-08-12 19:00 UTC (permalink / raw)
To: Paul.Mackerras
Cc: Jes.Sorensen, Geert.Uytterhoeven, linuxppc-dev, linux-fbdev, rth

On Thu, 12 Aug 1999, Paul Mackerras wrote:

> Jes Sorensen <Jes.Sorensen@cern.ch> wrote:
>
> > I will have to disagree with you on this one, I consider the PPC
> > implementation to be very broken in this regard.
>
> "Very broken" - because drivers work and there is no measurable
> performance impact?? !!! ??
>
> The only possible argument for *not* having the eieio in readl/writel
> is that it hurts performance (actually and measurably, not just
> potentially).

No, that's not the only argument. eieio and... isync, I think... causes
the PPC 601 to shift one of its registers a few bits and send out an
address only transaction using the address that results from that. I
can't remember which register off the top of my head. MkLinux ran into
this in the late Pre-DR3 stage and it nearly cost us a large percentage of
x100 support due to a hardware bug that can cause the machine to hang if an
address only transaction is done into certain parts of the address space.

The workaround is a really nasty bunch of code that creates a sizable
performance hit by forcing that register to be cleared before the eieio
and restored afterwards. As a result, putting eieio in those macros will
have a _very_ major performance hit if you ever start supporting x100
PowerMacs. It will also require lots of really nasty #ifdef structures in
the readl and writel code that can be avoided just by making a macro
eieio() and using it only where needed. It will also greatly decrease the
headaches x100 folks have in their efforts to find all the eieios and
figure out why their machines crash randomly. :-)

David
* Re: [linux-fbdev] Re: readl() and friends and eieio on PPC
1999-08-12 19:00 ` David A. Gatwood
@ 1999-08-13 1:51 ` Paul Mackerras
0 siblings, 0 replies; 41+ messages in thread
From: Paul Mackerras @ 1999-08-13 1:51 UTC (permalink / raw)
To: dgatwood; +Cc: Jes.Sorensen, Geert.Uytterhoeven, linuxppc-dev, linux-fbdev, rth

David A. Gatwood <dgatwood@mvista.com> wrote:

> No, that's not the only argument. eieio and... isync, I think... causes
> the PPC 601 to shift one of its registers a few bits and send out an
> address only transaction using the address that results from that. I

I can understand eieio causing an address-only transaction, but the
address should just be ignored.

> can't remember which register off the top of my head. MkLinux ran into
> this in the late Pre-DR3 stage and it nearly cost us a large percentage of
> x100 support due to a hardware bug that can cause the machine to hang if an
> address only transaction is done into certain parts of the address space.

Hmmm, I didn't see any such problems with the 7200 and 7500 powermacs,
which have a 601 cpu. It's a hardware bug in the x100's memory
controller or nubus bridge, right? I guess it's lucky you actually
have some control over the address that gets put out. :-)

> The workaround is a really nasty bunch of code that creates a sizable
> performance hit by forcing that register to be cleared before the eieio
> and restored afterwards. As a result, putting eieio in those macros will
> have a _very_ major performance hit if you ever start supporting x100

Surely it should only take a couple of cycles to move a register to
another and clear it? I agree it's a pain though. Actually, with gcc
the asm statement that uses the eieio could just specify the register
(which one is it?) as an input and give it the value 0.

I think a resolution of this issue is going to have to involve Linus
and the whole Linux community. We may need two forms of the bus
access macros, one with the eieio's and one without. I think the
`ordinary' form should have the eieio's, though.

Paul.
[parent not found: <m3672hkxri.fsf@soma.andreas.org>]
* Re: [linux-fbdev] Re: readl() and friends and eieio on PPC
[not found] <m3672hkxri.fsf@soma.andreas.org>
@ 1999-08-15 13:39 ` James Simmons
0 siblings, 0 replies; 41+ messages in thread
From: James Simmons @ 1999-08-15 13:39 UTC (permalink / raw)
To: Andreas Bogk; +Cc: linuxppc-dev, linux-fbdev

> Actually the name of the instruction is a joke by some unnamed IBM
> engineer (you know that children's song, "Old MacDonald had a farm,
> eieio..."),

I kind of figured that. It could have been Enhanced Interface Extended
IO, but I know for sure now.
[parent not found: <d3pv0p72yr.fsf@lxp03.cern.ch>]
* Re: [linux-fbdev] Re: readl() and friends and eieio on PPC
[not found] <d3pv0p72yr.fsf@lxp03.cern.ch>
@ 1999-08-15 19:43 ` David A. Gatwood
0 siblings, 0 replies; 41+ messages in thread
From: David A. Gatwood @ 1999-08-15 19:43 UTC (permalink / raw)
To: Jes Sorensen
Cc: Richard Henderson, Paul.Mackerras, Geert.Uytterhoeven, linuxppc-dev, linux-fbdev

On 15 Aug 1999, Jes Sorensen wrote:

> Richard> Keep in mind that wmb is the name of an Alpha specific
> Richard> assembler insn. Blame Linus. ;-)
>
> Yeah I know that, but it is still a hell of a lot more explaining than
> something that when pronounced sounds like someone having certain
> vital parts of his anatomy cut off.

No, it's pronounced as in "Old MacDonald had a farm, e-i-e-i-o." :-)

'course it's also short for "enforce in-order execution of I/O",
probably about as descriptive as you can get. ;-)

David
[parent not found: <Pine.LNX.3.96.990813143741.27557B-100000@mvista.com>]
[parent not found: <d3so5mdyta.fsf@lxp03.cern.ch>]
* Re: [linux-fbdev] Re: readl() and friends and eieio on PPC
[not found] ` <d3so5mdyta.fsf@lxp03.cern.ch>
@ 1999-08-14 18:34 ` Geert Uytterhoeven
1999-08-14 18:36 ` David A. Gatwood
` (2 subsequent siblings)
3 siblings, 0 replies; 41+ messages in thread
From: Geert Uytterhoeven @ 1999-08-14 18:34 UTC (permalink / raw)
To: Jes Sorensen
Cc: David A. Gatwood, Paul.Mackerras, linuxppc-dev, linux-fbdev, rth

On 14 Aug 1999, Jes Sorensen wrote:
> >>>>> "David" == David A Gatwood <dgatwood@mvista.com> writes:
> David> On Fri, 13 Aug 1999, Paul Mackerras wrote:
> >> Surely it should only take a couple of cycles to move a register to
> >> another and clear it? I agree it's a pain though. Actually, with
> >> gcc the asm statement that uses the eieio could just specify the
> >> register (which one is it?) as an input and give it the value 0.
>
> David> For some reason, just loading the value isn't enough. The code
> David> that Gilbert put in as a workaround shortly before DR3 looks
> David> like this:
>
> David> #define eieio() __asm__ volatile("li 0,0; cmpwi 0,0; bne+ 0f;
> David> eieio; 0:" : : : "0")
>
> Defining a C function with the name of a PPC specific assembler
> function is pretty stupid. To the best of my knowledge wmb() is the

Well, David was talking about replacing the eieio() macro (which Linux/PPC
has had for ages) by something that works around bugs in the hardware.

> generic name for the thing you are looking for.

Time to start grepping in include/asm-alpha/system.h. Huh, Alpha AXP has no
`rmb' mnemonic ;-)

Greetings,

Geert

--
Geert Uytterhoeven Geert.Uytterhoeven@cs.kuleuven.ac.be
Wavelets, Linux/{m68k~Amiga,PPC~CHRP} http://www.cs.kuleuven.ac.be/~geert/
Department of Computer Science -- Katholieke Universiteit Leuven -- Belgium
* Re: [linux-fbdev] Re: readl() and friends and eieio on PPC
[not found] ` <d3so5mdyta.fsf@lxp03.cern.ch>
1999-08-14 18:34 ` Geert Uytterhoeven
@ 1999-08-14 18:36 ` David A. Gatwood
1999-08-14 19:48 ` Jes Sorensen
1999-08-14 21:39 ` Richard Henderson
1999-08-15 23:16 ` Paul Mackerras
3 siblings, 1 reply; 41+ messages in thread
From: David A. Gatwood @ 1999-08-14 18:36 UTC (permalink / raw)
To: Jes Sorensen
Cc: Paul.Mackerras, Geert.Uytterhoeven, linuxppc-dev, linux-fbdev, rth

On 14 Aug 1999, Jes Sorensen wrote:

> David> #define eieio() __asm__ volatile("li 0,0; cmpwi 0,0; bne+ 0f;
> David> eieio; 0:" : : : "0")
>
> Defining a C function with the name of a PPC specific assembler
> function is pretty stupid. To the best of my knowledge wmb() is the
> generic name for the thing you are looking for.

Keep in mind, I'm talking about MkLinux, _not_ LinuxPPC. wmb() is a
Linux-specific term, as far as I know. The above is in Mach.

David
* Re: [linux-fbdev] Re: readl() and friends and eieio on PPC
1999-08-14 18:36 ` David A. Gatwood
@ 1999-08-14 19:48 ` Jes Sorensen
1999-08-15 1:28 ` David A. Gatwood
0 siblings, 1 reply; 41+ messages in thread
From: Jes Sorensen @ 1999-08-14 19:48 UTC (permalink / raw)
To: David A. Gatwood
Cc: Paul.Mackerras, Geert.Uytterhoeven, linuxppc-dev, linux-fbdev, rth

>>>>> "David" == David A Gatwood <dgatwood@mvista.com> writes:

David> On 14 Aug 1999, Jes Sorensen wrote:
David> #define eieio() __asm__ volatile("li 0,0; cmpwi 0,0; bne+ 0f;
David> eieio; 0:" : : : "0")
>> Defining a C function with the name of a PPC specific assembler
>> function is pretty stupid. To the best of my knowledge wmb() is the
>> generic name for the thing you are looking for.

David> Keep in mind, I'm talking about MkLinux, _not_ LinuxPPC. wmb()
David> is a linux-specific term, as far as I know. The above is in
David> mach.

Urgh

OK, we were discussing the normal kernel here.

Jes
* Re: [linux-fbdev] Re: readl() and friends and eieio on PPC
1999-08-14 19:48 ` Jes Sorensen
@ 1999-08-15 1:28 ` David A. Gatwood
0 siblings, 0 replies; 41+ messages in thread
From: David A. Gatwood @ 1999-08-15 1:28 UTC (permalink / raw)
To: Jes Sorensen
Cc: Paul.Mackerras, Geert.Uytterhoeven, linuxppc-dev, linux-fbdev, rth

On 14 Aug 1999, Jes Sorensen wrote:

> >>>>> "David" == David A Gatwood <dgatwood@mvista.com> writes:
>
> David> On 14 Aug 1999, Jes Sorensen wrote:
> David> #define eieio() __asm__ volatile("li 0,0; cmpwi 0,0; bne+ 0f;
> David> eieio; 0:" : : : "0")
> >> Defining a C function with the name of a PPC specific assembler
> >> function is pretty stupid. To the best of my knowledge wmb() is the
> >> generic name for the thing you are looking for.
>
> David> Keep in mind, I'm talking about MkLinux, _not_ LinuxPPC. wmb()
> David> is a linux-specific term, as far as I know. The above is in
> David> mach.
>
> Urgh
>
> Ok we were discussing the normal kernel here.

Breakdown in communication again. What I'm talking about is what MkLinux's
Mach kernel did to get support for certain machines. That's why it was
eieio(). It's the _equivalent_ of what LinuxPPC would have to do to support
the same machines, though the details of the names and so on would be
different. :-)

David
* Re: [linux-fbdev] Re: readl() and friends and eieio on PPC
[not found] ` <d3so5mdyta.fsf@lxp03.cern.ch>
1999-08-14 18:34 ` Geert Uytterhoeven
1999-08-14 18:36 ` David A. Gatwood
@ 1999-08-14 21:39 ` Richard Henderson
1999-08-15 23:16 ` Paul Mackerras
3 siblings, 0 replies; 41+ messages in thread
From: Richard Henderson @ 1999-08-14 21:39 UTC (permalink / raw)
To: Jes Sorensen
Cc: David A. Gatwood, Paul.Mackerras, Geert.Uytterhoeven, linuxppc-dev, linux-fbdev

On Sat, Aug 14, 1999 at 08:03:45PM +0200, Jes Sorensen wrote:
> Defining a C function with the name of a PPC specific assembler
> function is pretty stupid. To the best of my knowledge wmb() is the
> generic name for the thing you are looking for.

Keep in mind that wmb is the name of an Alpha specific
assembler insn. Blame Linus. ;-)

r~
* Re: [linux-fbdev] Re: readl() and friends and eieio on PPC
[not found] ` <d3so5mdyta.fsf@lxp03.cern.ch>
` (2 preceding siblings ...)
1999-08-14 21:39 ` Richard Henderson
@ 1999-08-15 23:16 ` Paul Mackerras
1999-08-16 0:29 ` Richard Henderson
1999-08-16 7:11 ` Jes Sorensen
3 siblings, 2 replies; 41+ messages in thread
From: Paul Mackerras @ 1999-08-15 23:16 UTC (permalink / raw)
To: Jes.Sorensen; +Cc: dgatwood, Geert.Uytterhoeven, linuxppc-dev, linux-fbdev, rth

Jes Sorensen <Jes.Sorensen@cern.ch> wrote:

> Defining a C function with the name of a PPC specific assembler
> function is pretty stupid. To the best of my knowledge wmb() is the
> generic name for the thing you are looking for.

Not exactly, in fact iobarrier() is probably better.

The eieio instruction has two separate effects:

1. as a write barrier for writes to cacheable memory (hence wmb)

2. as a read/write barrier for reads and writes to non-cacheable
memory (hence iobarrier)

Paul.
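Editorial sketch, not part of the thread: the two effects Paul lists can be
illustrated with one hypothetical barrier macro used in two different
situations. The macro name echoes the iobarrier() he mentions, but every
identifier ending in _sketch, along with both structures, is invented for
illustration. The code assumes GCC or Clang; on non-PowerPC hosts the macro
is only a compiler barrier, so the sketch builds but enforces no hardware
ordering there.

```c
#include <stdint.h>

/* Hypothetical macro: eieio on PowerPC, a plain compiler barrier elsewhere
 * so that the sketch compiles (GCC/Clang syntax). */
#if defined(__powerpc__) || defined(__PPC__)
#define iobarrier_sketch() __asm__ __volatile__("eieio" : : : "memory")
#else
#define iobarrier_sketch() __asm__ __volatile__("" : : : "memory")
#endif

/* Hypothetical descriptor living in ordinary cacheable memory. */
struct desc_sketch {
    uint32_t buf_addr;
    uint32_t len;
    volatile uint32_t ready;
};

/* Effect 1 (write barrier, cacheable memory): the payload stores are
 * ordered before the store that publishes the descriptor. */
static void publish_desc_sketch(struct desc_sketch *d, uint32_t addr,
                                uint32_t len)
{
    d->buf_addr = addr;
    d->len = len;
    iobarrier_sketch();   /* plays the role of wmb() here */
    d->ready = 1;
}

/* Hypothetical device with an index/data register pair, assumed to be
 * mapped non-cacheable. */
struct indexed_dev_sketch {
    volatile uint32_t index;
    volatile uint32_t data;
};

/* Effect 2 (read/write barrier, non-cacheable memory): the store that
 * selects a register must be performed before the load that reads it. */
static uint32_t read_indexed_sketch(struct indexed_dev_sketch *dev,
                                    uint32_t idx)
{
    dev->index = idx;
    iobarrier_sketch();
    return dev->data;
}
```

A barrier that only orders writes would cover the first function but not the
store-then-load sequence in the second, which appears to be the distinction
Paul is drawing between the names wmb and iobarrier.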
* Re: [linux-fbdev] Re: readl() and friends and eieio on PPC
1999-08-15 23:16 ` Paul Mackerras
@ 1999-08-16 0:29 ` Richard Henderson
1999-08-16 7:11 ` Jes Sorensen
1 sibling, 0 replies; 41+ messages in thread
From: Richard Henderson @ 1999-08-16 0:29 UTC (permalink / raw)
To: Paul.Mackerras
Cc: Jes.Sorensen, dgatwood, Geert.Uytterhoeven, linuxppc-dev, linux-fbdev

On Mon, Aug 16, 1999 at 09:16:47AM +1000, Paul Mackerras wrote:
> The eieio instruction has two separate effects:
>
> 1. as a write barrier for writes to cacheable memory (hence wmb)
>
> 2. as a read/write barrier for reads and writes to non-cacheable
> memory (hence iobarrier)

Ah. The Alpha wmb instruction does _not_ have this second effect.

r~
* Re: [linux-fbdev] Re: readl() and friends and eieio on PPC
1999-08-15 23:16 ` Paul Mackerras
1999-08-16 0:29 ` Richard Henderson
@ 1999-08-16 7:11 ` Jes Sorensen
1 sibling, 0 replies; 41+ messages in thread
From: Jes Sorensen @ 1999-08-16 7:11 UTC (permalink / raw)
To: Paul.Mackerras
Cc: dgatwood, Geert.Uytterhoeven, linuxppc-dev, linux-fbdev, rth

>>>>> "Paul" == Paul Mackerras <paulus@cs.anu.edu.au> writes:

Paul> Jes Sorensen <Jes.Sorensen@cern.ch> wrote:
>> Defining a C function with the name of a PPC specific assembler
>> function is pretty stupid. To the best of my knowledge wmb() is the
>> generic name for the thing you are looking for.

Paul> Not exactly, in fact iobarrier() is probably better.

Hmmmm

Probably right, as long as it is made clear that iobarrier() means
memory mapped I/O and not I/O mapped I/O. The name is easy to
misunderstand IMHO.

Other than that I agree; the question is whether we need to add more
functions. It seems we are already overly confused.

Jes
end of thread, other threads:[~1999-08-18 11:02 UTC | newest]
Thread overview: 41+ messages
1999-08-09 8:17 readl() and friends and eieio on PPC Geert Uytterhoeven
1999-08-09 17:19 ` David A. Gatwood
1999-08-10 1:00 ` Paul Mackerras
1999-08-10 7:18 ` [linux-fbdev] " Jes Sorensen
1999-08-11 0:23 ` Paul Mackerras
1999-08-11 7:23 ` Jes Sorensen
1999-08-11 7:38 ` Richard Henderson
1999-08-12 0:13 ` Paul Mackerras
1999-08-12 1:39 ` Peter Chang
1999-08-12 4:52 ` Paul Mackerras
1999-08-12 6:17 ` Peter Chang
1999-08-12 0:17 ` Paul Mackerras
1999-08-12 4:40 ` Richard Henderson
1999-08-12 5:00 ` Paul Mackerras
1999-08-12 5:43 ` Richard Henderson
1999-08-12 7:07 ` Paul Mackerras
1999-08-12 7:33 ` Richard Henderson
1999-08-12 9:58 ` Paul Mackerras
1999-08-12 12:31 ` Geert Uytterhoeven
1999-08-13 12:18 ` Paul Mackerras
1999-08-18 11:02 ` Gabriel Paubert
1999-08-13 18:33 ` Richard Henderson
1999-08-12 5:16 ` David Edelsohn
1999-08-12 5:27 ` Paul Mackerras
1999-08-12 5:52 ` Richard Henderson
1999-08-12 7:11 ` Paul Mackerras
1999-08-12 7:32 ` Jes Sorensen
1999-08-11 23:52 ` Paul Mackerras
1999-08-12 7:38 ` Jes Sorensen
1999-08-12 19:00 ` David A. Gatwood
1999-08-13 1:51 ` Paul Mackerras
[not found] <m3672hkxri.fsf@soma.andreas.org>
1999-08-15 13:39 ` James Simmons
[not found] <d3pv0p72yr.fsf@lxp03.cern.ch>
1999-08-15 19:43 ` David A. Gatwood
[not found] <Pine.LNX.3.96.990813143741.27557B-100000@mvista.com>
[not found] ` <d3so5mdyta.fsf@lxp03.cern.ch>
1999-08-14 18:34 ` Geert Uytterhoeven
1999-08-14 18:36 ` David A. Gatwood
1999-08-14 19:48 ` Jes Sorensen
1999-08-15 1:28 ` David A. Gatwood
1999-08-14 21:39 ` Richard Henderson
1999-08-15 23:16 ` Paul Mackerras
1999-08-16 0:29 ` Richard Henderson
1999-08-16 7:11 ` Jes Sorensen