From: Paul Mackerras <paulus@cs.anu.edu.au>
To: rth@cygnus.com
Cc: Jes.Sorensen@cern.ch, Geert.Uytterhoeven@cs.kuleuven.ac.be,
linuxppc-dev@lists.linuxppc.org, linux-fbdev@vuser.vu.union.edu
Subject: Re: [linux-fbdev] Re: readl() and friends and eieio on PPC
Date: Thu, 12 Aug 1999 17:07:02 +1000
Message-ID: <199908120707.RAA30438@tango.anu.edu.au>
In-Reply-To: <19990811224344.A14713@cygnus.com> (message from Richard Henderson on Wed, 11 Aug 1999 22:43:44 -0700)
Richard Henderson <rth@cygnus.com> wrote:
> As I see it, testing against main memory should be the lower
> bound of the numbers, since it's the quickest to respond. A
> real device will take longer to respond, so any enforced delays
> (or failures to write-combine) will only exaggerate the difference.
Hmmm, no, doesn't it go the other way around?
Going to L1 cache will mean that we can isolate the overhead of the
wmb, and will exaggerate the ratio between the two cases.
A real device that takes longer to respond will make the overhead of
the wmb a smaller fraction of the total time. And you would hope that
the cpu could overlap the wmb, or at least the time to decode and
issue it, with the time waiting for the device to respond.
> Anyway, the results (in cycles) from my 533MHz sx164 are:
>
> 10
One-cycle access to L1 cache, I guess?
> 10
> 10
> 10
> 10
> 223
Because of i-cache misses, presumably
> 94
> 94
> 94
> 94
>
> So the cost of wmb for 8 store+wmb, versus 8 stores with one wmb,
> is over 9:1.
Interesting. Sounds like each wmb takes about 12 cycles ((94-10)/7),
which sounds a bit like it is going all the way out to the memory bus
and back before the cpu does the next instruction.
(Ob. nitpicking: if a wmb takes 12 cycles, how come we can do a wmb
and 8 stores in 10 cycles? :-)
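The arithmetic above can be written out as a small C helper. This is only a sketch: the name `per_barrier_cost` is made up for illustration, and it assumes the two timings differ in nothing but the number of barriers issued.

```c
/* Hypothetical helper, not part of the test program in this message.
 * Estimates the cost of one barrier from two timings of the same
 * sequence of stores: one with a barrier after every store
 * (n_stores barriers in total) and one with a single barrier. */
double per_barrier_cost(double cycles_barrier_each,
                        double cycles_barrier_once,
                        int n_stores)
{
    /* The two sequences differ by (n_stores - 1) barriers. */
    return (cycles_barrier_each - cycles_barrier_once) / (n_stores - 1);
}
```

With the sx164 numbers, per_barrier_cost(94, 10, 8) comes out at 12 cycles per wmb.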
> For grins, will you try the same test on your ppc?
Sure, happy to.
I think I have correctly understood the alpha assembly syntax. My PPC
version is below. I've added a couple of things. First, PPC has a
`timebase' register which counts at 1/4 of the bus clock, which means
once every 16 cycles on my G3 desktop at work. For this reason I have
put a loop around the sets of stores to do them 16 times. The
overhead of the loop should be zero (the branch is pretty easily
predictable :-). The numbers should thus be cycles per iteration.
Secondly, I added stuff to mmap a framebuffer and do the stores to a
word in it, just for grins.
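As a sketch of why looping 16 times makes the printed numbers come out directly in cycles per iteration, here is the conversion in C. The helper name and its parameters are hypothetical; the value 16 for cycles per timebase tick applies to the G3 described above and is an assumption for any other machine.

```c
/* Hypothetical helper: convert two timebase readings into cpu cycles
 * per loop iteration.  cycles_per_tick is 16 on the G3 described in
 * this message; other machines will differ. */
unsigned cycles_per_iteration(unsigned tb_start, unsigned tb_end,
                              unsigned cycles_per_tick,
                              unsigned iterations)
{
    /* Unsigned subtraction gives the right delta even if the 32-bit
     * timebase value wrapped between the two reads. */
    unsigned ticks = tb_end - tb_start;

    return ticks * cycles_per_tick / iterations;
}
```

When cycles_per_tick and iterations are both 16 they cancel, so the timebase delta is already the number of cpu cycles per iteration.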
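The results tended to vary quite a lot from run to run, but here's a
typical set. The first two lines are for main memory and the last two
for the framebuffer; in each pair the first line is 8 stores with one
eieio and the second is 8 stores each followed by an eieio:
17 10 9 9 9
24 17 16 16 16
732 731 736 786 727
666 755 840 774 801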
So the eieio doesn't look to be nearly as expensive on PPC as wmb is
on alpha. (16 - 9) / 7 = 1 cycle for the eieio, which is going to be
insignificant in the context of an access to a device register, which
can easily take ~ 50 to 100 cycles.
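To put a rough number on "insignificant", the following C sketch computes the share of the time per access that goes to the barrier. The helper name is hypothetical, and it assumes the barrier and the device access do not overlap at all, which is the worst case for the barrier.

```c
/* Hypothetical helper: fraction of the total time per access that is
 * spent in the barrier, assuming no overlap between the barrier and
 * the access itself. */
double barrier_overhead(double barrier_cycles, double access_cycles)
{
    return barrier_cycles / (barrier_cycles + access_cycles);
}
```

For a 1-cycle eieio and a device access of 50 to 100 cycles this gives a bit under 2% and a bit under 1% respectively.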
The average of the 3rd line is 742, and of the 4th line is 767. But
given the spread of the numbers, I don't think that the difference is
statistically significant. This is going to the framebuffer on an ATI
Rage chip. About 760 cycles for 8 stores is 95 cpu cycles per access,
or about 350ns. I guess ATI chips expect you to use the drawing engine
if you are doing any significant amount of stuff. :-)
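The averages, the spread and the conversion to nanoseconds can be checked with a few lines of C. This is a sketch only: the helper names are made up, and the cpu clock passed to `cycles_to_ns` is an assumption, since the clock speed is not stated in this message.

```c
/* Hypothetical helpers for summarising the framebuffer timings. */

/* Arithmetic mean of n samples. */
double mean(const unsigned *v, int n)
{
    double sum = 0.0;
    int i;

    for (i = 0; i < n; ++i)
        sum += v[i];
    return sum / n;
}

/* Spread of the samples: largest minus smallest. */
unsigned spread(const unsigned *v, int n)
{
    unsigned lo = v[0], hi = v[0];
    int i;

    for (i = 1; i < n; ++i) {
        if (v[i] < lo)
            lo = v[i];
        if (v[i] > hi)
            hi = v[i];
    }
    return hi - lo;
}

/* Convert cpu cycles to nanoseconds for a clock given in MHz. */
double cycles_to_ns(double cycles, double cpu_mhz)
{
    return cycles * 1000.0 / cpu_mhz;
}
```

For the two framebuffer lines the means are 742.4 and 767.2, a difference of about 25, while the spreads within each line are 59 and 174. With an assumed 266 MHz clock, cycles_to_ns(95, 266) is about 357, in line with the "about 350ns" figure.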
What numbers do you get on alpha if you point it at a framebuffer,
just for interest?
Paul.
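#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>

/*
 * Time 8 stores to *ptr, repeated 16 times, first with a single eieio
 * per 8 stores and then with an eieio after every store.  The timebase
 * ticks once per 16 cpu cycles here, so with 16 iterations each number
 * printed is cpu cycles per iteration.  Each case is run 5 times.
 * The stores write whatever is in r16; the value doesn't matter.
 */
void test(volatile unsigned long *ptr)
{
	int i;
	unsigned s, e;

	for (i = 0; i < 5; ++i) {
		/* 8 stores, one eieio.  s is early-clobber because it is
		   written before the inputs have been read. */
		asm volatile("mftb %0\n\t"
			     "mtctr %3\n"
			     "1:\tstw 16,0(%2)\n\t"
			     "stw 16,0(%2)\n\t"
			     "stw 16,0(%2)\n\t"
			     "stw 16,0(%2)\n\t"
			     "stw 16,0(%2)\n\t"
			     "stw 16,0(%2)\n\t"
			     "stw 16,0(%2)\n\t"
			     "eieio\n\t"
			     "stw 16,0(%2)\n\t"
			     "bdnz 1b\n\t"
			     "mftb %1"
			     : "=&r"(s), "=r"(e)
			     : "b"(ptr), "r"(16)
			     : "ctr", "memory");
		printf("%u ", e - s);
	}
	printf("\n");

	for (i = 0; i < 5; ++i) {
		/* 8 stores, each followed by an eieio */
		asm volatile("mftb %0\n\t"
			     "mtctr %3\n"
			     "1:\tstw 16,0(%2)\n\t"
			     "eieio\n\t"
			     "stw 16,0(%2)\n\t"
			     "eieio\n\t"
			     "stw 16,0(%2)\n\t"
			     "eieio\n\t"
			     "stw 16,0(%2)\n\t"
			     "eieio\n\t"
			     "stw 16,0(%2)\n\t"
			     "eieio\n\t"
			     "stw 16,0(%2)\n\t"
			     "eieio\n\t"
			     "stw 16,0(%2)\n\t"
			     "eieio\n\t"
			     "stw 16,0(%2)\n\t"
			     "eieio\n\t"
			     "bdnz 1b\n\t"
			     "mftb %1"
			     : "=&r"(s), "=r"(e)
			     : "b"(ptr), "r"(16)
			     : "ctr", "memory");
		printf("%u ", e - s);
	}
	printf("\n");
}

#define PAGESIZE 0x1000UL

int main(int ac, char **av)
{
	unsigned long base, offset;
	int fd;
	unsigned long mem;
	unsigned long *ptr;

	/* first time the stores to ordinary (cached) memory */
	test(&mem);

	/* then, given a physical address in hex, to that word via /dev/mem */
	if (ac > 1) {
		base = strtoul(av[1], 0, 16);
		offset = (base & (PAGESIZE - 1)) / sizeof(unsigned long);
		base &= ~(PAGESIZE - 1);
		if ((fd = open("/dev/mem", O_RDWR)) < 0) {
			perror("/dev/mem");
			exit(1);
		}
		ptr = mmap(0, PAGESIZE, PROT_READ|PROT_WRITE, MAP_SHARED,
			   fd, base);
		if (ptr == MAP_FAILED) {
			perror("mmap");
			exit(1);
		}
		test(ptr + offset);
	}
	return 0;
}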