Fwd: Re: still no accelerated X ($#!$*)

linuxppc-dev.lists.ozlabs.org archive mirror

* Fwd: Re: still no accelerated X ($#!$*)
@ 2000-01-20 18:12 ` Kevin Hendricks
  2000-01-20 18:26   ` David Edelsohn
  2000-01-20 18:46   ` Franz Sirl
  0 siblings, 2 replies; 28+ messages in thread
From: Kevin Hendricks @ 2000-01-20 18:12 UTC (permalink / raw)
  To: linuxppc-dev

[-- Attachment #1: Type: text/plain, Size: 2408 bytes --]

Hi,

Can anyone explain this to me?

> Finally I got it!

>   asm("stwbrx %0,%1,%2": : "r"(regdata), "r"(regindex), "r"(base_addr));

>   asm("lwbrx %0,%1,%2": "=r"(val):"r"(regindex), "r"(base_addr));

>   asm("stwbrx %0,%1,%2": : "r"(regdata), "b"(regindex), "r"(base_addr));

>   asm("lwbrx %0,%1,%2": "=r"(val):"b"(regindex), "r"(base_addr));


> Don't know if this is correct (no clue about ppc assembly), but it works...

Well I did the following with the attached sample program:

gcc -O0 -S testit.c

then I looked at testit.s  (the assembler).

old_regw:
        stwu 1,-32(1)
        stw 31,28(1)
        mr 31,1
        stw 3,8(31)
        mr 0,4
        lis 11,mach64MemReg@ha
        lwz 9,mach64MemReg@l(11)
        lwz 11,8(31)
        stwbrx 0,11,9
.L2:
        lwz 11,0(1)
        lwz 31,-4(11)
        mr 1,11
        blr
.Lfe2:
        .size    old_regw,.Lfe2-old_regw
        .align 2
        .type    old_regr,@function
old_regr:
        stwu 1,-32(1)
        stw 31,28(1)
        mr 31,1
        stw 3,8(31)
        lis 9,mach64MemReg@ha
        lwz 0,mach64MemReg@l(9)
        lwz 11,8(31)
        lwbrx 9,11,0
        mr 3,9
        b .L3
.L3:
        lwz 11,0(1)
        lwz 31,-4(11)
        mr 1,11
        blr
.Lfe3:
        .size    old_regr,.Lfe3-old_regr
        .align 2
regw:
        stwu 1,-32(1)
        stw 31,28(1)
        mr 31,1
        stw 3,8(31)
        mr 0,4
        lis 11,mach64MemReg@ha
        lwz 9,mach64MemReg@l(11)
        lwz 11,8(31)
        stwbrx 0,11,9
.L4:
        lwz 11,0(1)
        lwz 31,-4(11)
        mr 1,11
        blr
.Lfe4:
        .size    regw,.Lfe4-regw
        .align 2
        .type    regr,@function
regr:
        stwu 1,-32(1)
        stw 31,28(1)
        mr 31,1
        stw 3,8(31)
        lis 9,mach64MemReg@ha
        lwz 0,mach64MemReg@l(9)
        lwz 11,8(31)
        lwbrx 9,11,0
        mr 3,9
        b .L5
.L5:
        lwz 11,0(1)
        lwz 31,-4(11)
        mr 1,11
        blr

And I simply cannot see any difference in the code generated for each pair
of asm statements, which leads me to believe that something else is going
on here.

I would love to know exactly what.

Will you please try compiling the code I attached to get the assembler output,
compare old_regr with regr and old_regw with regw, and see if you find any
differences?


Are you sure you haven't changed *anything* else?

Thanks,

Kevin


[-- Attachment #2: testit.c --]
[-- Type: text/plain, Size: 1144 bytes --]

#if defined (__powerpc__)

unsigned int mach64MemReg = 0xDEADBEAF;

static inline void old_regw(volatile unsigned long regindex, unsigned long regdata)
{
  register unsigned long base_addr = (unsigned long)mach64MemReg;

  asm("stwbrx %0,%1,%2": : "r"(regdata), "r"(regindex), "r"(base_addr));
}

static inline unsigned long old_regr(volatile unsigned long regindex)
{
  register unsigned long base_addr = (unsigned long)mach64MemReg, val;

  asm("lwbrx %0,%1,%2": "=r"(val):"r"(regindex), "r"(base_addr));
  return(val);
}

static inline void regw(volatile unsigned long regindex, unsigned long regdata)
{
  register unsigned long base_addr = (unsigned long)mach64MemReg;

  asm("stwbrx %0,%1,%2": : "r"(regdata), "b"(regindex), "r"(base_addr));
}

static inline unsigned long regr(volatile unsigned long regindex)
{
  register unsigned long base_addr = (unsigned long)mach64MemReg, val;

  asm("lwbrx %0,%1,%2": "=r"(val):"b"(regindex), "r"(base_addr));
  return(val);
}
#endif

int main() {
 int offset=10;
 int data=12;
 int input, input2;
 
 input = regr(offset);
 input = old_regr(offset);

 regw(offset,data);
 old_regw(offset,data);

}


* Re: Fwd: Re: still no accelerated X ($#!$*)
  2000-01-20 18:12 ` Fwd: Re: still no accelerated X ($#!$*) Kevin Hendricks
@ 2000-01-20 18:26   ` David Edelsohn
  2000-01-20 18:45     ` Benjamin Herrenschmidt
  2000-01-20 18:52     ` Franz Sirl
  2000-01-20 18:46   ` Franz Sirl
  1 sibling, 2 replies; 28+ messages in thread
From: David Edelsohn @ 2000-01-20 18:26 UTC (permalink / raw)
  To: khendricks; +Cc: linuxppc-dev


	The "b" constraint should be associated with "base_addr", not with
"regindex":

	asm("stwbrx %0,%1,%2": : "r"(regdata), "r"(regindex), "b"(base_addr));

David

** Sent via the linuxppc-dev mail list. See http://lists.linuxppc.org/


* Re: Fwd: Re: still no accelerated X ($#!$*)
  2000-01-20 18:26   ` David Edelsohn
@ 2000-01-20 18:45     ` Benjamin Herrenschmidt
  2000-01-20 18:51       ` David Edelsohn
  2000-01-20 18:52     ` Franz Sirl
  1 sibling, 1 reply; 28+ messages in thread
From: Benjamin Herrenschmidt @ 2000-01-20 18:45 UTC (permalink / raw)
  To: David Edelsohn, linuxppc-dev

On Thu, Jan 20, 2000, David Edelsohn <dje@watson.ibm.com> wrote:

>	The "b" constraint should be associated with "base_addr", not with
>"regindex":
>
>	asm("stwbrx %0,%1,%2": : "r"(regdata), "r"(regindex), "b"(base_addr));

Hum... I still have to check what the gcc/ppc specific constraints are. But
in this specific case, it's the index that should not be assigned to r0.
Both base and regdata can be r0. So either I'm missing something, or the
"b" constraint is actually wrong semantically, or we need yet another
constraint for the index.


* Re: Fwd: Re: still no accelerated X ($#!$*)
  2000-01-20 18:12 ` Fwd: Re: still no accelerated X ($#!$*) Kevin Hendricks
  2000-01-20 18:26   ` David Edelsohn
@ 2000-01-20 18:46   ` Franz Sirl
  1 sibling, 0 replies; 28+ messages in thread
From: Franz Sirl @ 2000-01-20 18:46 UTC (permalink / raw)
  To: khendricks; +Cc: linuxppc-dev


At 19:12 20.01.00 , Kevin Hendricks wrote:
>Hi,
>
>Can anyone explain this to me?
>
> > Finally I got it!
>
> >   asm("stwbrx %0,%1,%2": : "r"(regdata), "r"(regindex), "r"(base_addr));
>
> >   asm("lwbrx %0,%1,%2": "=r"(val):"r"(regindex), "r"(base_addr));
>
> >   asm("stwbrx %0,%1,%2": : "r"(regdata), "b"(regindex), "r"(base_addr));
>
> >   asm("lwbrx %0,%1,%2": "=r"(val):"b"(regindex), "r"(base_addr));
>
>
> > Don't know if this is correct (no clue about ppc assembly), but it works...
>
>Well I did the following with the attached sample program:
>
>gcc -O0 -S testit.c
>
>then I looked at testit.s  (the assembler).
>
>old_regw:
>         stwu 1,-32(1)
>         stw 31,28(1)
>         mr 31,1
>         stw 3,8(31)
>         mr 0,4
>         lis 11,mach64MemReg@ha
>         lwz 9,mach64MemReg@l(11)
>         lwz 11,8(31)
>         stwbrx 0,11,9
>.L2:
>         lwz 11,0(1)
>         lwz 31,-4(11)
>         mr 1,11
>         blr
>.Lfe2:
>         .size    old_regw,.Lfe2-old_regw
>         .align 2
>         .type    old_regr,@function
>old_regr:
>         stwu 1,-32(1)
>         stw 31,28(1)
>         mr 31,1
>         stw 3,8(31)
>         lis 9,mach64MemReg@ha
>         lwz 0,mach64MemReg@l(9)
>         lwz 11,8(31)
>         lwbrx 9,11,0
>         mr 3,9
>         b .L3
>.L3:
>         lwz 11,0(1)
>         lwz 31,-4(11)
>         mr 1,11
>         blr
>.Lfe3:
>         .size    old_regr,.Lfe3-old_regr
>         .align 2
>regw:
>         stwu 1,-32(1)
>         stw 31,28(1)
>         mr 31,1
>         stw 3,8(31)
>         mr 0,4
>         lis 11,mach64MemReg@ha
>         lwz 9,mach64MemReg@l(11)
>         lwz 11,8(31)
>         stwbrx 0,11,9
>.L4:
>         lwz 11,0(1)
>         lwz 31,-4(11)
>         mr 1,11
>         blr
>.Lfe4:
>         .size    regw,.Lfe4-regw
>         .align 2
>         .type    regr,@function
>regr:
>         stwu 1,-32(1)
>         stw 31,28(1)
>         mr 31,1
>         stw 3,8(31)
>         lis 9,mach64MemReg@ha
>         lwz 0,mach64MemReg@l(9)
>         lwz 11,8(31)
>         lwbrx 9,11,0
>         mr 3,9
>         b .L5
>.L5:
>         lwz 11,0(1)
>         lwz 31,-4(11)
>         mr 1,11
>         blr
>
>And I simply can not see any difference in the actual code produced by each
>bunch of asm statements which leads me to believe that there is something else
>going on here.
>
>I would love to know exactly what.
>
>Will you please try compiling the code I attached to get the assembler output,
>compare old_regr with regr and old_regw with regw, and see if you find any
>differences?

Kevin,
the fix is correct, you cannot use "r" (allow r0-r31) as a base register
constraint, you have to use "b" (allow r1-r31). This is not easy to
reproduce with a small testprogram, because it will only fail if r0 is
assigned for the inline assembly by the compiler, which depends on a lot of
factors. In practice the outdated egcs-1.1.2 seems to have a lower
probability to trigger this bug than the current gcc-2.95.2, probably due
to the better optimizers in gcc-2.95.2 producing higher register pressure.
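
To illustrate why r0 matters here, a minimal sketch follows. It is not from
the thread: it is a toy C model of how an indexed load/store such as
"stwbrx RS,RA,RB" forms its effective address, and the names ppc_regs and
indexed_ea are made up for the example. In the RA slot, register number 0
stands for the constant 0 rather than the contents of r0, which is what the
"b" constraint guards against.

```c
#include <stdint.h>

/* Toy model of the 32 general-purpose registers (illustrative only). */
typedef struct {
    uint32_t gpr[32];
} ppc_regs;

/*
 * Effective address of an indexed access such as "stwbrx RS,RA,RB":
 * EA = (RA|0) + (RB).  If the RA field names register 0, the hardware
 * uses the constant 0 instead of the contents of r0.
 */
uint32_t indexed_ea(const ppc_regs *cpu, unsigned ra, unsigned rb)
{
    uint32_t a = (ra == 0) ? 0u : cpu->gpr[ra];
    return a + cpu->gpr[rb];
}
```

With regindex in r11 and the base in r9, as in the listing above, the address
is base + index. Had the compiler picked r0 for regindex under the "r"
constraint, the index would silently drop out of the address.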

Additionally, with inline assembly you should always be as explicit as
possible, so for anything that may rely on ordering you should add
'volatile', and a "memory" clobber for writes:

asm volatile ("stwbrx %0,%1,%2" : :"r"(regdata), "b"(regindex),
"r"(base_addr) : "memory");
asm volatile ("lwbrx %0,%1,%2" : "=r"(val) : "b"(regindex), "r"(base_addr));

Franz.



* Re: Fwd: Re: still no accelerated X ($#!$*)
  2000-01-20 18:45     ` Benjamin Herrenschmidt
@ 2000-01-20 18:51       ` David Edelsohn
  0 siblings, 0 replies; 28+ messages in thread
From: David Edelsohn @ 2000-01-20 18:51 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Kevin Hendricks
  Cc: Moritz Thomas, linuxppc-dev, linuxppc-user


> Hum... I still have to check what the gcc/ppc specific constraints are. But
> in this specific case, it's the index that should not be assigned to r0.
> Both base and regdata can be r0. So either I'm missing something, or the
> "b" constraint is actually wrong semantically, or we need yet another
> constraint for the index.

	Sorry, you are right.  I am multitasking on too many things.  The
way that you have written the inlined assembly, regindex should be the one
with the "b" constraint.

	Also, Geert's comment was correct that there is no difference in
the example function simply because of luck.

Sorry, David
===============================================================================
David Edelsohn                                      T.J. Watson Research Center
dje@watson.ibm.com                                  P.O. Box 218
+1 914 945 4364 (TL 862)                            Yorktown Heights, NY 10598


* Re: Fwd: Re: still no accelerated X ($#!$*)
  2000-01-20 18:26   ` David Edelsohn
  2000-01-20 18:45     ` Benjamin Herrenschmidt
@ 2000-01-20 18:52     ` Franz Sirl
  2000-01-20 19:31       ` Gabriel Paubert
  1 sibling, 1 reply; 28+ messages in thread
From: Franz Sirl @ 2000-01-20 18:52 UTC (permalink / raw)
  To: David Edelsohn; +Cc: khendricks, linuxppc-dev


At 19:26 20.01.00 , David Edelsohn wrote:

>         The "b" constraint should be associated with "base_addr", not with
>"regindex":
>
>         asm("stwbrx %0,%1,%2": : "r"(regdata), "r"(regindex),
> "b"(base_addr));

Uhm, David, that's wrong. "b" has to be assigned to %1!

Franz.



* Re: Fwd: Re: still no accelerated X ($#!$*)
  2000-01-20 18:52     ` Franz Sirl
@ 2000-01-20 19:31       ` Gabriel Paubert
  2000-01-20 19:36         ` Kevin Hendricks
  0 siblings, 1 reply; 28+ messages in thread
From: Gabriel Paubert @ 2000-01-20 19:31 UTC (permalink / raw)
  To: Franz Sirl; +Cc: David Edelsohn, khendricks, linuxppc-dev

On Thu, 20 Jan 2000, Franz Sirl wrote:

>
> At 19:26 20.01.00 , David Edelsohn wrote:
>
> >         The "b" constraint should be associated with "base_addr", not with
> >"regindex":
> >
> >         asm("stwbrx %0,%1,%2": : "r"(regdata), "r"(regindex),
> > "b"(base_addr));
>
> Uhm, David, that's wrong. "b" has to be assigned to %1!

Actually if base_addr can be reused by the compiler for other accesses
to the same area (byte or big endian), it should be written as:

"stwbrx %0,%1,%2": : "r" (regdata), "b" (base_addr), "r" (regindex)

with a volatile qualifier on the asm statement, but I disagree on the
"memory" clobber if this does not access areas the compiler will ever
touch and does not have side effects.

There are already too many memory clobbers out there; they are bad because
they basically tell the compiler that it cannot keep a single variable
in a register. Have a look at the effect of the in and out instructions
in the Linux kernel, which reload isa_io_base repeatedly when there is no
need for it; that's not exactly a speed issue but an icache footprint
and bloat issue.

This is an effect which is important on all machines which have a
reasonable number of registers (all modern ones in practice and perhaps
even the 68k).

	Gabriel.


* Re: Fwd: Re: still no accelerated X ($#!$*)
  2000-01-20 19:31       ` Gabriel Paubert
@ 2000-01-20 19:36         ` Kevin Hendricks
  2000-01-20 19:51           ` Geert Uytterhoeven
  2000-01-20 19:59           ` Gabriel Paubert
  0 siblings, 2 replies; 28+ messages in thread
From: Kevin Hendricks @ 2000-01-20 19:36 UTC (permalink / raw)
  To: Gabriel Paubert, Franz Sirl; +Cc: David Edelsohn, khendricks, linuxppc-dev


Hi,

> Actually if base_addr can be reused by the compiler for other accesses
> to the same area (byte or big endian), it should be written as:
>
> "stwbrx %0,%1,%2": : "r" (regdata), "b" (base_addr), "r" (regindex)
>
> with a volatile qualifier on the asm statement, but I disagree on the
> "memory" clobber if this does not access areas the compiler will ever
> touch and does not have side effects.
>
> There are already too many memory clobbers out there, they are bad because
> they basically tell the compiler that it can not keep a single variable
> in a register.

In this particular case, the base address can change (but very, very rarely,
such as when writing to one aperture or another on the Rage 128 card) and all
of the writes are made to either the card's memory-mapped I/O or the frame
buffer itself.

Should I not include the : "memory" clobber in this case?  Will it hurt
performance much?

Thanks,

Kevin


* Re: Fwd: Re: still no accelerated X ($#!$*)
  2000-01-20 19:36         ` Kevin Hendricks
@ 2000-01-20 19:51           ` Geert Uytterhoeven
  2000-01-20 19:59           ` Gabriel Paubert
  1 sibling, 0 replies; 28+ messages in thread
From: Geert Uytterhoeven @ 2000-01-20 19:51 UTC (permalink / raw)
  To: Kevin Hendricks; +Cc: Gabriel Paubert, Franz Sirl, David Edelsohn, linuxppc-dev


On Thu, 20 Jan 2000, Kevin Hendricks wrote:
> > Actually if base_addr can be reused by the compiler for other accesses
> > to the same area (byte or big endian), it should be written as:
> >
> > "stwbrx %0,%1,%2": : "r" (regdata), "b" (base_addr), "r" (regindex)
> >
> > with a volatile qualifier on the asm statement, but I disagree on the
> > "memory" clobber if this does not access areas the compiler will ever
> > touch and does not have side effects.
> >
> > There are already too many memory clobbers out there, they are bad because
> > they basically tell the compiler that it can not keep a single variable
> > in a register.
>
> In this particular case, the base address can change (but very, very rarely,
> such as when writing to one aperture or another on the Rage 128 card) and all
> of the writes are made to either the card's memory-mapped I/O or the frame
> buffer itself.

The base_addr may change in between calls, but not while executing the asm
code. The memory clobber means that memory contents may change while
executing the asm code without gcc knowing about it.
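
To make the effect of the clobber concrete, here is a minimal sketch (not
from the thread; aperture_base and both function names are made up for
illustration, and __asm__ is used so that it also builds in strict ISO mode).
Both functions return the same value; the difference only shows up in the
generated code, where the second one has to reload the global after the asm
statement.

```c
#include <stdint.h>

/* Hypothetical global standing in for something like the aperture base. */
uint32_t aperture_base = 0x1000u;

/* No clobber: the compiler may keep aperture_base in a register and
 * reuse the value it already loaded for the second read. */
uint32_t read_twice_plain(void)
{
    uint32_t first = aperture_base;
    __asm__ volatile ("");
    uint32_t second = aperture_base;
    return first + second;
}

/* "memory" clobber: the compiler must assume any memory may have changed
 * during the asm statement, so aperture_base is loaded again afterwards. */
uint32_t read_twice_clobbered(void)
{
    uint32_t first = aperture_base;
    __asm__ volatile ("" : : : "memory");
    uint32_t second = aperture_base;
    return first + second;
}
```

Compiling this with optimization and -S and comparing the two functions is
one way to see the extra load that the clobber costs.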

Gr{oetje,eeting}s,
--
Geert Uytterhoeven -- Linux/{m68k~Amiga,PPC~CHRP} -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
							    -- Linus Torvalds



* Re: Fwd: Re: still no accelerated X ($#!$*)
  2000-01-20 19:36         ` Kevin Hendricks
  2000-01-20 19:51           ` Geert Uytterhoeven
@ 2000-01-20 19:59           ` Gabriel Paubert
  2000-01-20 20:08             ` David Edelsohn
  2000-01-20 22:34             ` Franz Sirl
  1 sibling, 2 replies; 28+ messages in thread
From: Gabriel Paubert @ 2000-01-20 19:59 UTC (permalink / raw)
  To: Kevin Hendricks; +Cc: Franz Sirl, David Edelsohn, linuxppc-dev

	Hi,

> In this particular case, the base address can change (but very, very rarely,
> such as when writing to one aperture or another on the Rage 128 card) and all
> of the writes are made to either the card's memory-mapped I/O or the frame
> buffer itself.

Then the memory clobber would force the compiler to reload base_addr
between 2 writes to the frame buffer.

> Should I not include the : "memory" clobber in this case?  Will it hurt
> performance much?

I think that it is not necessary: the best thing with a compiler which
performs alias analysis might be to tell the truth

asm ("stwbrx %1,%2,%3"
     : "=m" (*(volatile unsigned *)(base_addr+regindex))
     : "r" (regdata), "b" (base_addr), "r" (regindex));

Note we don't use %0, and it won't produce any additional code. You may
want to check what the compiler would have generated as the addressing mode
by appending " # %0" at the end of the code string.

I truly wish there were a constraint for indexed addressing modes (anyway
this will be necessary for proper Altivec support).
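
Wrapped up as a function, the idea above might look like the following
sketch. The function name mmio_write_le32, the pointer-typed base and the
portable fallback branch are assumptions added for illustration, not code
from the driver; the fallback only exists so the example builds and behaves
the same on a non-PowerPC host.

```c
#include <stdint.h>

/* Store a 32-bit value byte-reversed (little-endian) at base + offset. */
void mmio_write_le32(volatile void *base, uintptr_t offset, uint32_t value)
{
#if defined(__powerpc__) && defined(__BYTE_ORDER__) && \
    (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
    /* %0 is never used in the template; it only tells the compiler which
     * location is written.  "b" keeps r0 out of the RA slot (%2). */
    __asm__ volatile ("stwbrx %1,%2,%3"
        : "=m" (*(volatile uint32_t *)((volatile uint8_t *)base + offset))
        : "r" (value), "b" (base), "r" (offset));
#else
    /* Portable fallback: write the bytes in little-endian order by hand. */
    volatile uint8_t *p = (volatile uint8_t *)base + offset;
    p[0] = (uint8_t)(value & 0xffu);
    p[1] = (uint8_t)((value >> 8) & 0xffu);
    p[2] = (uint8_t)((value >> 16) & 0xffu);
    p[3] = (uint8_t)((value >> 24) & 0xffu);
#endif
}
```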

	Gabriel.


* Re: Fwd: Re: still no accelerated X ($#!$*)
  2000-01-20 19:59           ` Gabriel Paubert
@ 2000-01-20 20:08             ` David Edelsohn
  2000-01-20 22:34             ` Franz Sirl
  1 sibling, 0 replies; 28+ messages in thread
From: David Edelsohn @ 2000-01-20 20:08 UTC (permalink / raw)
  To: Gabriel Paubert; +Cc: Kevin Hendricks, linuxppc-dev


>>>>> Gabriel Paubert writes:

Gabriel> I think that it is not necessary: the best thing with a compiler which
Gabriel> performs alias analysis might be to tell the truth

Gabriel> asm ("stwbrx %1,%2,%3"
Gabriel> : "=m" (*(volatile unsigned *)(base_addr+regindex))
Gabriel> : "r" (regdata), "b" (base_addr), "r" (regindex));

Gabriel> Note we don't use %0, and it won't produce any additional code. You may
Gabriel> want to check what the compiler would have generated as the addressing mode
Gabriel> by appending " # %0" at the end of the code string.

	Yes, this type of inlined assembly describing the actual memory
operation as an output constraint is better than clobbering all of memory,
if possible.

David


* Re: Fwd: Re: still no accelerated X ($#!$*)
  2000-01-20 19:59           ` Gabriel Paubert
  2000-01-20 20:08             ` David Edelsohn
@ 2000-01-20 22:34             ` Franz Sirl
  2000-01-21  0:05               ` Gabriel Paubert
  1 sibling, 1 reply; 28+ messages in thread
From: Franz Sirl @ 2000-01-20 22:34 UTC (permalink / raw)
  To: Gabriel Paubert, Kevin Hendricks; +Cc: David Edelsohn, linuxppc-dev


Am Don, 20 Jan 2000 schrieb Gabriel Paubert:
>Hi,
>
>> In this particular case, the base address can change (but very, very rarely,
>> such as when writing to one aperture or another on the Rage 128 card) and all
>> of the writes are made to either the card's memory-mapped I/O or the frame
>> buffer itself.
>
>Then the memory clobber would force the compiler to reload base_addr
>between 2 writes to the frame buffer.
>
>> Should I not include the : "memory" clobber in this case?  Will it hurt
>> performance much?
>
>I think that it is not necessary: the best thing with a compiler which
>performs alias analysis might be to tell the truth
>
>asm ("stwbrx %1,%2,%3"
>     : "=m" (*(volatile unsigned *)(base_addr+regindex))
>     : "r" (regdata), "b" (base_addr), "r" (regindex));
>
>Note we don't use %0, and it won't produce any additional code. You may
>want to check what the compiler would have generated as the addressing mode
>by appending " # %0" at the end of the code string.

Whether the memory clobber (either global or local) is needed depends a
little on how the asm statements are used. If you use them for reads/writes
to HW registers that need ordering (which is very likely here, since we are
talking about graphics HW), the compiler can only decide how to order the
inlines from the memory usage declared by the clobbers/memory operands
(volatile has no effect on this).

Actually the load instructions need a memory input too:

asm volatile ("lwbrx %0,%1,%2" : "=r"(val) : "b"(regindex), "r"(base_addr),
"m" (*(volatile unsigned *)(base_addr+regindex)));

And to ensure ordering on processor level you still need the eieio (with a
memory clobber) as usual.
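
The read side could be sketched the same way. Again this is only an
illustration: the name mmio_read_le32 and the fallback branch are
assumptions, and the operands follow Gabriel's ordering, with the base in
the "b" slot.

```c
#include <stdint.h>

/* Load a 32-bit little-endian value from base + offset. */
uint32_t mmio_read_le32(const volatile void *base, uintptr_t offset)
{
    uint32_t value;
#if defined(__powerpc__) && defined(__BYTE_ORDER__) && \
    (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
    /* The "m" input is not referenced in the template; it tells the
     * compiler that this asm reads that location, so it can be ordered
     * against stores to the same place. */
    __asm__ volatile ("lwbrx %0,%1,%2"
        : "=r" (value)
        : "b" (base), "r" (offset),
          "m" (*(const volatile uint32_t *)
                   ((const volatile uint8_t *)base + offset)));
#else
    /* Portable fallback: assemble the value from little-endian bytes. */
    const volatile uint8_t *p = (const volatile uint8_t *)base + offset;
    value = (uint32_t)p[0]
          | ((uint32_t)p[1] << 8)
          | ((uint32_t)p[2] << 16)
          | ((uint32_t)p[3] << 24);
#endif
    return value;
}
```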

Franz.


* Re: Fwd: Re: still no accelerated X ($#!$*)
  2000-01-20 22:34             ` Franz Sirl
@ 2000-01-21  0:05               ` Gabriel Paubert
  2000-01-21  0:35                 ` Kevin Hendricks
  2000-01-21 15:47                 ` Franz Sirl
  0 siblings, 2 replies; 28+ messages in thread
From: Gabriel Paubert @ 2000-01-21  0:05 UTC (permalink / raw)
  To: Franz Sirl; +Cc: Kevin Hendricks, David Edelsohn, linuxppc-dev

On Thu, 20 Jan 2000, Franz Sirl wrote:

> Whether the memory clobber (either global or local) is needed depends a
> little on how the asm statements are used. If you use them for reads/writes
> to HW registers that need ordering (which is very likely here, since we are
> talking about graphics HW), the compiler can only decide how to order the
> inlines from the memory usage declared by the clobbers/memory operands
> (volatile has no effect on this).

Do you mean that a volatile memory reference can be reordered with an asm
volatile statement? I thought that this was not possible.

From GCC's documentation:

   "You can prevent an `asm' instruction from being deleted, moved
significantly, or combined, by writing the keyword `volatile' after the
`asm'."

You might disagree, but I consider that moving across a volatile memory
reference is a _significant_ move, a very significant one.

> Actually the load instructions need a memory input too:
>
> asm volatile ("lwbrx %0,%1,%2" : "=r"(val) : "b"(regindex), "r"(base_addr),
> "m" (*(volatile unsigned *)(base_addr+regindex)));

Indeed, perhaps even more than the store, since asm statements without output
operands are actually assumed to have side effects.

>
> And to ensure ordering on processor level you still need the eieio (with a
> memory clobber) as usual.

The best thing to do is probably to put the eieio inside the asm
statement, perhaps with 2 macros, one that includes the eieio and one that
does not include it.

Memory clobbers should be used extremely sparingly, I view them as the
last resort when _nothing_ else would work. They are used far too often
for my taste in the kernel, probably because they don't affect
significantly machines with very few registers like Intel which have to
permanently spill registers to memory or stack.

	Gabriel.


* Re: Fwd: Re: still no accelerated X ($#!$*)
  2000-01-21  0:05               ` Gabriel Paubert
@ 2000-01-21  0:35                 ` Kevin Hendricks
  2000-01-21  1:53                   ` Gabriel Paubert
  2000-01-21 11:54                   ` Benjamin Herrenschmidt
  2000-01-21 15:47                 ` Franz Sirl
  1 sibling, 2 replies; 28+ messages in thread
From: Kevin Hendricks @ 2000-01-21  0:35 UTC (permalink / raw)
  To: Gabriel Paubert, Franz Sirl; +Cc: David Edelsohn, linuxppc-dev

Hi,

I don't want to be dense here, but why two macros (one with eieio and one without
eieio)?  Can two processors in an SMP setting be trying to drive the same
video card at the same time?   I didn't think that was possible.  I saw the
eieio usage in the kernel versions and in aty128fb.c but thought that multiple
processors might use those macros at the same time and multiple processors
might have different fbdev drivers running in multi-head applications.  But I
thought only one processor could drive a video hardware card.  Is this a bad
assumption?

So exactly what is the best way to write these macros for Xpmac?  Using
the output constraints approach with eieio following it, or is all of this
overkill?

From the various posts (given the operand ordering done in the original post),
here is what I have tried to piece together.

asm volatile ("stwbrx %1,%2,%3; eieio" : "=m" (*(volatile unsigned
*)(base_addr+regindex))       : "r" (regdata), "b" (regindex), "r" (base_addr));

asm volatile ("lwbrx %0,%1,%2; eieio" : "=r"(val) : "b"(regindex),
"r"(base_addr), "m" (*(volatile unsigned *)(base_addr+regindex)));

Please let me know how to change the above so that I get it right this time.

Thanks,

Kevin


* Re: Fwd: Re: still no accelerated X ($#!$*)
  2000-01-21  0:35                 ` Kevin Hendricks
@ 2000-01-21  1:53                   ` Gabriel Paubert
  2000-01-21  2:19                     ` Kevin Hendricks
  2000-01-21 11:54                   ` Benjamin Herrenschmidt
  1 sibling, 1 reply; 28+ messages in thread
From: Gabriel Paubert @ 2000-01-21  1:53 UTC (permalink / raw)
  To: Kevin Hendricks; +Cc: Franz Sirl, David Edelsohn, linuxppc-dev

	Hi,

> I don't want to be dense here but why two macros (one with eieio and one without
> eieio).  Can two processors in an SMP setting be trying to drive the same
> video card at the same time?   I didn't think that was possible.  I saw the
> eieio usage in the kernel versions and in aty128fb.c but thought that multiple
> processors might use those macros at the same time and multiple processors
> might have different fbdev drivers running in multi-head applications.  But I
> thought only one processor could drive a video hardware card.  Is this a bad
> assumption?

Multiple processors attempting to access the video hardware should be
serialized one way or another (semaphores or whatever).

> So exactly what is the best way to write these macros for Xpmac?  Using
> the output constraints approach with eieio following it or is all of this
> overkill.

eieio may be useful and even necessary when accessing the registers, but
in the frame buffer section it prevents the processor from performing
store merging to non-guarded memory and other optimizations (such as the
bridge merging the writes and transforming them into bursts to the
framebuffer).

eieio is also necessary to prevent loads from being moved before stores when
accessing I/O registers: if your intent is to write to register A and then
read register B, and the first write actually changes the contents of
register B (for example, you trigger an operation with the write and wait
for it to finish by reading register B, which contains a busy bit), then
the eieio is necessary (it is not necessary if the registers are the
same).
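
A rough sketch of that write-A-then-poll-B pattern is below. The register
layout (REG_TRIGGER, REG_STATUS, STATUS_BUSY) and the function names are
hypothetical, and the non-PowerPC branch substitutes a GCC/Clang fence
builtin purely so the example runs elsewhere; it is not a claim about what
any real driver does.

```c
#include <stdint.h>

#define REG_TRIGGER 0     /* hypothetical "register A": starts an operation */
#define REG_STATUS  1     /* hypothetical "register B": holds the busy bit  */
#define STATUS_BUSY 0x1u

/* Keep the store to A ahead of the following load from B. */
void io_order_barrier(void)
{
#if defined(__powerpc__)
    __asm__ volatile ("eieio" : : : "memory");
#else
    __atomic_thread_fence(__ATOMIC_SEQ_CST);  /* stand-in on other hosts */
#endif
}

/* Returns 0 once the busy bit clears, -1 if it is still set after
 * max_polls reads of the status register. */
int trigger_and_wait(volatile uint32_t *regs, unsigned max_polls)
{
    unsigned i;

    regs[REG_TRIGGER] = 1u;   /* write register A */
    io_order_barrier();       /* the load below must not pass the store */
    for (i = 0; i < max_polls; i++) {
        if ((regs[REG_STATUS] & STATUS_BUSY) == 0u)
            return 0;
    }
    return -1;
}
```

Plain frame buffer writes would use an accessor without the barrier, which
is the reason for having two variants.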

> From the various posts (given the operand ordering done in the original post),
> here is what I have tried to piece together.
>
> asm volatile ("stwbrx %1,%2,%3; eieio" : "=m" (*(volatile unsigned
> *)(base_addr+regindex))       : "r" (regdata), "b" (regindex), "r" (base_addr));
>
> asm volatile ("lwbrx %0,%1,%2; eieio" : "=r"(val) : "b"(regindex),
> "r"(base_addr), "m" (*(volatile unsigned *)(base_addr+regindex)));

I actually doubt that the eieio are necessary but then I'm not a
specialist on this kind of hardware. Every eieio is a bus broadcast
operation (except on 603, on G3 it is IIRC an option controlled by a bit
in HID0) and actually has a cost comparable to a write posted I/O access
but the other consequences (preventing bursts on the I/O bus) may actually
cause a significant performance hit.  So it should be used only when
necessary...

> Please let me know how to change the above so that I get it right this time.

Try to determine first whether the eieio are necessary; for access to the
frame buffer I'm almost sure that they are superfluous and potentially
very costly in terms of performance. For the MMIO I suspect that they may
be necessary at some places, but adding them systematically will have less
impact.

	Gabriel.


* Re: Fwd: Re: still no accelerated X ($#!$*)
  2000-01-21  1:53                   ` Gabriel Paubert
@ 2000-01-21  2:19                     ` Kevin Hendricks
  2000-01-21  7:58                       ` Geert Uytterhoeven
  2000-01-21 14:15                       ` Benjamin Herrenschmidt
  0 siblings, 2 replies; 28+ messages in thread
From: Kevin Hendricks @ 2000-01-21  2:19 UTC (permalink / raw)
  To: Gabriel Paubert; +Cc: Franz Sirl, David Edelsohn, linuxppc-dev

Hi,

> I actually doubt that the eieio are necessary but then I'm not a
> specialist on this kind of hardware. Every eieio is a bus broadcast
> operation (except on 603, on G3 it is IIRC an option controlled by a bit
> in HID0) and actually has a cost comparable to a write posted I/O access
> but the other consequences (preventing bursts on the I/O bus) may actually
> cause a significant performance hit.  So it should be used only when
> necessary...
>
> > Please let me know how to change the above so that I get it right this time.
>
> Try to determine first whether the eieio are necessary; for access to the
> frame buffer I'm almost sure that they are superfluous and potentially
> very costly in terms of performance. For the MMIO I suspect that they may
> be necessary at some places, but adding them systematically will have less
> impact.

Okay, I went and looked at the latest aty128fb.c code and it does not use eieio
anywhere.  I looked at earlier versions of this file and at one time it had eieio
calls, but they have since been removed.

I also looked, and the endian conversion routines do not use the output
constraint approach you took but do include the memory clobber on the writes.

I think I will go with the output constraint version given above without the
eieio until or unless the kernel driver begins to use them too.

Thanks for all of your help with this everyone.  I have learned a lot.

Kevin


* Re: Fwd: Re: still no accelerated X ($#!$*)
  2000-01-21  2:19                     ` Kevin Hendricks
@ 2000-01-21  7:58                       ` Geert Uytterhoeven
  2000-01-21 14:15                       ` Benjamin Herrenschmidt
  1 sibling, 0 replies; 28+ messages in thread
From: Geert Uytterhoeven @ 2000-01-21  7:58 UTC (permalink / raw)
  To: Kevin Hendricks; +Cc: Gabriel Paubert, Franz Sirl, David Edelsohn, linuxppc-dev


On Thu, 20 Jan 2000, Kevin Hendricks wrote:
> > I actually doubt that the eieio are necessary but then I'm not a
> > specialist on this kind of hardware. Every eieio is a bus broadcast
> > operation (except on 603, on G3 it is IIRC an option controlled by a bit
> > in HID0) and actually has a cost comparable to a write posted I/O access
> > but the other consequences (preventing bursts on the I/O bus) may actually
> > cause a significant performance hit.  So it should be used only when
> > necessary...
> >
> > > Please let me know how to change the above so that I get it right this time.
> >
> > Try to determine first whether the eieio are necessary; for access to the
> > frame buffer I'm almost sure that they are superfluous and potentially
> > very costly in terms of performance. For the MMIO I suspect that they may
> > be necessary at some places, but adding them systematically will have less
> > impact.
>
> Okay, I went and looked at the latest aty128fb.c code and it does not use eieio
> anywhere.  I looked at earlier versions of this file and it at one time had eieio
> but they have since been removed.

Perhaps they were replaced by the platform independent wmb()?

Gr{oetje,eeting}s,
--
Geert Uytterhoeven -- Linux/{m68k~Amiga,PPC~CHRP} -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
							    -- Linus Torvalds



* Re: Fwd: Re: still no accelerated X ($#!$*)
  2000-01-21  0:35                 ` Kevin Hendricks
  2000-01-21  1:53                   ` Gabriel Paubert
@ 2000-01-21 11:54                   ` Benjamin Herrenschmidt
  2000-01-21 13:34                     ` Gabriel Paubert
  1 sibling, 1 reply; 28+ messages in thread
From: Benjamin Herrenschmidt @ 2000-01-21 11:54 UTC (permalink / raw)
  To: khendricks, linuxppc-dev

On Thu, Jan 20, 2000, Kevin Hendricks <khendricks@ivey.uwo.ca> wrote:

>From the various posts (given the operand ordering done in the original
>post), here is what I have tried to piece together.
>
>asm volatile ("stwbrx %1,%2,%3; eieio"
>              : "=m" (*(volatile unsigned *)(base_addr+regindex))
>              : "r" (regdata), "b" (regindex), "r" (base_addr));
>
>asm volatile ("lwbrx %0,%1,%2; eieio"
>              : "=r" (val)
>              : "b" (regindex), "r" (base_addr),
>                "m" (*(volatile unsigned *)(base_addr+regindex)));

Hi Kevin !

A good rule is to use eieio() when accessing a register (that means doing
an access that actually performs an action and whose ordering is
important relative to other accesses of the same type) and not use it
when filling the framebuffer. There are usually few enough register
accesses for this to work. It may be optimal to skip eieio's when writing
to a bunch of "parameter" registers where ordering is not important; in
this case you just need to put an eieio() between those and the register
write that triggers the engine operation that will use those parameters.
However, when doing that, the PCI bridge is allowed to combine your
register writes in a burst, and I know some cards that don't handle burst
access to MMIO registers very well.

Basically, eieio() will make sure that all previous memory accesses will
have been finished before memory accesses after the eieio are done.

It may be important to make sure that the last bit of framebuffer has
been written before "starting" an engine operation. So one eieio between
frame buffer filling and engine register access may be useful in the case
where you use eieio after the write in the asm.

I personally tend to prefer doing the eieio _before_ the read/write in
the asm code, but there are some rare cases where you mix eieio and
non-eieio accesses (like with your framebuffer) where special care must
be taken and may require both eieio before and after the register access.

Another thing to take care of is PCI write posting: Basically, when you
write, let's say, an MMIO register, you are not guaranteed that this write
has actually been done unless you do a read from the same io space. For
example: If you write an interrupt mask register to disable an interrupt
followed by critical code in which this interrupt _must not_ happen, you
absolutely need to do a read (typically to re-read the mask you just
wrote to) after the write, and before the critical code.


* Re: Fwd: Re: still no accelerated X ($#!$*)
  2000-01-21 11:54                   ` Benjamin Herrenschmidt
@ 2000-01-21 13:34                     ` Gabriel Paubert
  2000-01-21 14:06                       ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 28+ messages in thread
From: Gabriel Paubert @ 2000-01-21 13:34 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: khendricks, linuxppc-dev




On Fri, 21 Jan 2000, Benjamin Herrenschmidt wrote:

> Hi Kevin !
>
> A good rule is to use eieio() when accessing a register (that means doing
> an access that actually performs an action and whose ordering is
> important relative to other accesses of the same type) and not use it
> when filling the framebuffer. There are usually few enough register
> accesses for this to work. It may be optimal to skip eieio's when writing
> to a bunch of "parameter" registers where ordering is not important; in
> this case you just need to put an eieio() between those and the register
> write that triggers the engine operation that will use those parameters.

Indeed, that's what I do in my VME driver. I did not like the fact that
readl/writel always insert an eieio. This is safe but almost doubles the
number of bus cycles. Very often it translates into:

- a bunch of register accesses,
- eieio,
- the access which triggers the operation (which was a read in some cases
  for the VME driver),
- eieio again, so you don't have problems with setting up the next
  operation...
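
The sequence above can be sketched in C as follows. This is a minimal
illustration, not code from the VME driver: io_barrier, start_operation and
the PARAM_BASE/TRIGGER register indices are hypothetical, byte order is
ignored, and on non-PowerPC hosts the barrier is assumed to degrade to a
compiler-only barrier so the snippet still builds.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__powerpc__) || defined(__PPC__)
/* The "memory" clobber keeps gcc from moving accesses across the eieio. */
#define io_barrier() __asm__ volatile ("eieio" : : : "memory")
#else
/* Stand-in for other hosts (assumption): compiler-only barrier. */
#define io_barrier() __asm__ volatile ("" : : : "memory")
#endif

/* Hypothetical register layout, expressed as 32-bit word indices. */
enum { PARAM_BASE = 0, TRIGGER = 8 };

static void start_operation(volatile uint32_t *regs,
                            const uint32_t *params, size_t n, uint32_t cmd)
{
    size_t i;

    /* Parameter writes: their order relative to each other is free. */
    for (i = 0; i < n; i++)
        regs[PARAM_BASE + i] = params[i];

    io_barrier();           /* parameters must land before the trigger */
    regs[TRIGGER] = cmd;    /* the access that starts the operation */
    io_barrier();           /* keep the next setup behind the trigger */
}
```

Compared with a barrier on every access, this issues two barriers per
operation regardless of how many parameter registers are written.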

> However, when doing that, the PCI bridge is allowed to combine your
> register writes in a burst, and I know some cards that don't handle burst
> access to MMIO registers very well.

Broken HW exists but in a device specific driver, you should know whether
it is necessary or not from the errata...

> Basically, eieio() will make sure that all previous memory accesses will
> have been finished before memory accesses after the eieio are done.
>
> It may be important to make sure that the last bit of framebuffer has
> been written before "starting" an engine operation. So one eieio between
> frame buffer filling and engine register access may be useful in the case
> where you use eieio after the write in the asm.

If there is one before the register access that triggers the engine
operation, it should be enough.

>
> I personally tend to prefer doing the eieio _before_ the read/write in
> the asm code, but there are some rare cases where you mix eieio and
> non-eieio accesses (like with your framebuffer) where special care must
> be taken and may require both eieio before and after the register access.

That's a good question: does anybody have a final conclusion about
whether it is better to ensure ordering before or after the access?

> Another thing to take care of is PCI write posting: Basically, when you
> write, let's say, an MMIO register, you are not guaranteed that this write
> has actually been done unless you do a read from the same io space. For
> example: If you write an interrupt mask register to disable an interrupt
> followed by critical code in which this interrupt _must not_ happen, you
> absolutely need to do a read (typically to re-read the mask you just
> wrote to) after the write, and before the critical code.

In this case, I think that you need the following:

- write,
- eieio,
- read,
- isync to make sure that the read has reached the registers and is not in
  a load pending queue or whatever, which can be quite deep, especially if
  the processor never needs the result of the read...
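
As a rough sketch of that four-step sequence (the helper name write_flushed is
hypothetical, byte order is ignored, and the barriers are assumed to fall back
to compiler-only barriers on non-PowerPC hosts):

```c
#include <stdint.h>

#if defined(__powerpc__) || defined(__PPC__)
#define eieio_barrier() __asm__ volatile ("eieio" : : : "memory")
#define isync_barrier() __asm__ volatile ("isync" : : : "memory")
#else
/* Stand-ins for other hosts (assumption): compiler-only barriers. */
#define eieio_barrier() __asm__ volatile ("" : : : "memory")
#define isync_barrier() __asm__ volatile ("" : : : "memory")
#endif

/* Hypothetical helper: write a register, then read it back so that a
 * posted write is pushed out to the device before the caller continues. */
static uint32_t write_flushed(volatile uint32_t *reg, uint32_t val)
{
    uint32_t readback;

    *reg = val;          /* 1. write (may be posted by the bridge) */
    eieio_barrier();     /* 2. keep the read behind the write */
    readback = *reg;     /* 3. read from the same I/O space */
    isync_barrier();     /* 4. intended to hold later code until the
                          *    read is done, as suggested above */
    return readback;
}
```

Whether isync alone is sufficient for step 4 on every CPU is not something
this sketch establishes; it only mirrors the order of operations proposed.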

	Gabriel.



* Re: Fwd: Re: still no accelerated X ($#!$*)
  2000-01-21 13:34                     ` Gabriel Paubert
@ 2000-01-21 14:06                       ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 28+ messages in thread
From: Benjamin Herrenschmidt @ 2000-01-21 14:06 UTC (permalink / raw)
  To: Gabriel Paubert; +Cc: linuxppc-dev

On Fri, Jan 21, 2000, Gabriel Paubert <paubert@iram.es> wrote:

>- write,
>- eieio,
>- read,
>- isync to make sure that the read has reached the registers and is not in
>  a load pending queue or whatever, which can be quite deep, especially if
>  the processor never needs the result of the read...

Indeed. Some time ago, I fixed the pmac-pic mask/unmask routines this way:

 - set up new mask in cached_mask (variable)
 - write_mask(cached_mask)
 - do {
 -   sync();
 - } while (read_mask() != cached_mask)

Note that both read_mask and write_mask will do eieio.
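
Spelled out as C, the steps listed above could look something like this. It is
only a sketch under stated assumptions: the real pmac-pic code is not shown in
this thread, so set_irq_mask and the explicit register pointer argument are
hypothetical, byte order is ignored, and the barriers fall back to
compiler-only barriers on non-PowerPC hosts.

```c
#include <stdint.h>

#if defined(__powerpc__) || defined(__PPC__)
#define eieio_barrier() __asm__ volatile ("eieio" : : : "memory")
#define sync_barrier()  __asm__ volatile ("sync" : : : "memory")
#else
/* Stand-ins for other hosts (assumption): compiler-only barriers. */
#define eieio_barrier() __asm__ volatile ("" : : : "memory")
#define sync_barrier()  __asm__ volatile ("" : : : "memory")
#endif

static uint32_t cached_mask;   /* software copy of the mask register */

/* Both accessors issue an eieio, as noted in the message. */
static void write_mask(volatile uint32_t *reg, uint32_t v)
{
    *reg = v;
    eieio_barrier();
}

static uint32_t read_mask(volatile uint32_t *reg)
{
    uint32_t v = *reg;
    eieio_barrier();
    return v;
}

/* Hypothetical wrapper: update the mask and wait until the device shows it. */
static void set_irq_mask(volatile uint32_t *reg, uint32_t new_mask)
{
    cached_mask = new_mask;
    write_mask(reg, cached_mask);
    do {
        sync_barrier();
    } while (read_mask(reg) != cached_mask);
}
```

The loop has no timeout, matching the steps as posted; a device that never
reflects the new mask would make it spin forever.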

I believe the sync could be replaced by an isync, I'm just not 100% sure
of the SMP behaviour but the mask/unmask routines should be fully
synchronized anyway. (And they are called with EE off).

Without this fix, we occasionally had bogus interrupts coming from the
IDE and possibly other rare problems.

I added a similar fix to the openpic code in my recent kernels too.


* Re: Fwd: Re: still no accelerated X ($#!$*)
  2000-01-21  2:19                     ` Kevin Hendricks
  2000-01-21  7:58                       ` Geert Uytterhoeven
@ 2000-01-21 14:15                       ` Benjamin Herrenschmidt
  2000-01-22 20:54                         ` [linux-fbdev] " anthony tong
  1 sibling, 1 reply; 28+ messages in thread
From: Benjamin Herrenschmidt @ 2000-01-21 14:15 UTC (permalink / raw)
  To: khendricks; +Cc: linuxppc-dev, linux-fbdev

On Thu, Jan 20, 2000, Kevin Hendricks <khendricks@ivey.uwo.ca> wrote:

>Okay, I went and looked at the latest aty128fb.c code and it does not use
>eieio anywhere.  I looked at earlier versions of this file and it at one
>time had eieio but they have since been removed.
>
>I also looked and the endian conversion routines do not use the output
>constraint approach you took but do include the memory clobber on the writes.

I just looked at atyfb.c and aty128fb.c in my source tree (atyfb is
2.2.14 one and aty128fb is the latest backport done by atong) and neither
uses eieio nor mb(), wmb(), ...

This looks bogus to me. I've spotted a few cases where those calls should
be in.

We can either put the eieio back in the access functions (less optimal,
but we can also fix the constraints to get rid of the memory clobber as
discussed previously), or we can fill the code with carefully placed mb()
and wmb() but this requires more knowledge of the chipset than I actually
have.

I'll put back eieio() in the access macros for my kernels until a
definitive answer pops up on this issue.


* Re: Fwd: Re: still no accelerated X ($#!$*)
  2000-01-21  0:05               ` Gabriel Paubert
  2000-01-21  0:35                 ` Kevin Hendricks
@ 2000-01-21 15:47                 ` Franz Sirl
  2000-01-21 19:08                   ` Gabriel Paubert
  1 sibling, 1 reply; 28+ messages in thread
From: Franz Sirl @ 2000-01-21 15:47 UTC (permalink / raw)
  To: Gabriel Paubert; +Cc: Kevin Hendricks, David Edelsohn, linuxppc-dev


At 01:05 21.01.00 , Gabriel Paubert wrote:



>On Thu, 20 Jan 2000, Franz Sirl wrote:
>
> > It depends a little bit on the usage of the asm's if the memory (either
> > global or local) clobber is needed or not. If you use them for read/writes
> > to HW registers needing ordering (which is very likely here since we talk
> > about graphics HW), the compiler can only decide on the memory usage
> > defined by the clobbers/memory inputs on how to order the inlines
> > (volatile has no effect on this).
>
>Do you mean that a volatile memory reference can be reordered with an asm
>volatile statement? I thought that this was not possible.
>
>From GCC's documentation:
>
>    "You can prevent an `asm' instruction from being deleted, moved
>significantly, or combined, by writing the keyword `volatile' after the
>`asm'."
>
>You might disagree, but I consider that moving across a volatile memory
>reference is a _significant_ move, a very significant one.

asm volatile doesn't declare any volatile memory references! What makes you
think it does?
There was a lengthy discussion about this with Richard Henderson on the
gcc/egcs lists a while ago. The result was a rewrite of
linux/include/asm-ppc/io.h in the kernel to its current state.

Franz.



* Re: [linux-fbdev] Re: Fwd: Re: still no accelerated X ($#!$*)
@ 2000-01-21 16:22 Brad Douglas
  0 siblings, 0 replies; 28+ messages in thread
From: Brad Douglas @ 2000-01-21 16:22 UTC (permalink / raw)
  To: khendricks, Benjamin Herrenschmidt, anthony tong
  Cc: linuxppc-dev, linux-fbdev


-----Original Message-----
From: Benjamin Herrenschmidt <bh40@calva.net>

>
>>Okay, I went and looked at the latest aty128fb.c code and it does not use
>>eieio anywhere.  I looked at earlier versions of this file and it at one
>>time had eieio but they have since been removed.
>>
>>I also looked and the endian conversion routines do not use the output
>>constraint approach you took but do include the memory clobber on the
>>writes.
>
>I just looked at atyfb.c and aty128fb.c in my source tree (atyfb is
>2.2.14 one and aty128fb is the latest backport done by atong) and neither
>uses eieio nor mb(), wmb(), ...
>
>This looks bogus to me. I've spotted a few cases where those calls should
>be in.
>
>We can either put the eieio back in the access functions (less optimal,
>but we can also fix the constraints to get rid of the memory clobber as
>discussed previously), or we can fill the code with carefully placed mb()
>and wmb() but this requires more knowledge of the chipset than I actually
>have.
>
>I'll put back eieio() in the access macros for my kernels until a
>definitive answer pops up on this issue.


I'm forwarding this to Anthony...

Thanks,

Brad Douglas
brad@neruo.com



* Re: Fwd: Re: still no accelerated X ($#!$*)
  2000-01-21 15:47                 ` Franz Sirl
@ 2000-01-21 19:08                   ` Gabriel Paubert
  0 siblings, 0 replies; 28+ messages in thread
From: Gabriel Paubert @ 2000-01-21 19:08 UTC (permalink / raw)
  To: Franz Sirl; +Cc: Kevin Hendricks, David Edelsohn, linuxppc-dev

On Fri, 21 Jan 2000, Franz Sirl wrote:

> >    "You can prevent an `asm' instruction from being deleted, moved
> >significantly, or combined, by writing the keyword `volatile' after the
> >`asm'."
> >
> >You might disagree, but I consider that moving across a volatile memory
> >reference is a _significant_ move, a very significant one.
>
> asm volatile doesn't declare any volatile memory references! What makes you
> think it does?

No it does not declare any volatile memory reference, but it is
"guaranteed not to be moved significantly". What exactly does this mean?

Anyway, asm volatile statements are guaranteed not to be reordered with
respect to each other. So let us illustrate it with examples; if you write:

asm volatile("stwbrx...", ...);

asm volatile("eieio");

asm volatile("lwbrx...", ...);

the compiler should respect the order even without nasty memory clobbers.
The funny thing is that if the accesses are big endian or byte, then you
don't need the asm and a reordering might perhaps take place. In this
case:

*(volatile unsigned *)output_reg = data;

asm volatile("eieio");

result = *(volatile unsigned *) input_reg;

then the eieio might move to become the first or last statement according to
what you say. I consider that this does not reflect the documentation
since moving across a volatile memory reference is a significant move
in my book.

Note that the documentation also states clearly that such an asm without
any parameters is considered as having indeterminate side effects so it
might actually not be moved, but it might be as bad as a memory clobber.

> There was a lengthy discussion about this with Richard Henderson on the
> gcc/egcs lists a while ago. The result was a rewrite of
> linux/include/asm-ppc/io.h in the kernel to its current state.

I remember, but my goal is a memory-clobber-free kernel, although there
are many other efficiency issues right now in Linux/PPC...

	Gabriel
	(desperately trying to build gcc_latest_snapshot)


* Re: [linux-fbdev] Re: Fwd: Re: still no accelerated X ($#!$*)
  2000-01-21 14:15                       ` Benjamin Herrenschmidt
@ 2000-01-22 20:54                         ` anthony tong
  2000-01-23  2:44                           ` Kevin Hendricks
  0 siblings, 1 reply; 28+ messages in thread
From: anthony tong @ 2000-01-22 20:54 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: khendricks, linuxppc-dev, linux-fbdev


Benjamin Herrenschmidt (Fri, Jan 21, 2000 at 03:15:11PM +0100):
> On Thu, Jan 20, 2000, Kevin Hendricks <khendricks@ivey.uwo.ca> wrote:
> >Okay, I went and looked at the latest aty128fb.c code and it does not use
> >eieio anywhere.  I looked at earlier versions of this file and it at one
> >time had eieio but they have since been removed.
> >
> >I also looked and the endian conversion routines do not use the output
> >constraint approach you took but do include the memory clobber on the writes.
>
> I just looked at atyfb.c and aty128fb.c in my source tree (atyfb is
> 2.2.14 one and aty128fb is the latest backport done by atong) and neither
> uses eieio nor mb(), wmb(), ...
>
> This looks bogus to me. I've spotted a few cases where those calls should
> be in.
>
> We can either put the eieio back in the access functions (less optimal,
> but we can also fix the constraints to get rid of the memory clobber as
> discussed previously), or we can fill the code with carefully placed mb()
> and wmb() but this requires more knowledge of the chipset than I actually
> have.
>
> I'll put back eieio() in the access macros for my kernels until a
> definitive answer pops up on this issue.

I must have missed when these were taken out; does anyone know the reason
why it was done?



* Re: [linux-fbdev] Re: Fwd: Re: still no accelerated X ($#!$*)
  2000-01-22 20:54                         ` [linux-fbdev] " anthony tong
@ 2000-01-23  2:44                           ` Kevin Hendricks
  0 siblings, 0 replies; 28+ messages in thread
From: Kevin Hendricks @ 2000-01-23  2:44 UTC (permalink / raw)
  To: anthony tong, Benjamin Herrenschmidt
  Cc: khendricks, linuxppc-dev, linux-fbdev

Hi Anthony,

> > I'll put back eieio() in the access macros for my kernels until a
> > definitive answer pops up on this issue.
>
> I must have missed when these were taken out; does anyone know the reason
> why it was done?

When the eieio's in aty128fb.c have been added back in to the version that was
backported to 2.2.14 please cc a copy of the file you send to Ben to me also.
I am still having trouble debugging some final nits in the xf 3.9.17 r128
module and would love to have a "fixed" driver.

Thanks,

Kevin


* Re: [linux-fbdev] Re: Fwd: Re: still no accelerated X ($#!$*)
       [not found] <Pine.GSO.4.05.10001261006570.12458-100000@callisto.acsu.buffalo.edu>
@ 2000-01-26 16:20 ` Geert Uytterhoeven
  2000-01-26 17:23   ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 28+ messages in thread
From: Geert Uytterhoeven @ 2000-01-26 16:20 UTC (permalink / raw)
  To: James A Simmons
  Cc: Benjamin Herrenschmidt, khendricks, linuxppc-dev, linux-fbdev

On Wed, 26 Jan 2000, James A Simmons wrote:
>   I have seen a lot of talk about wmb() and eieio(). Where can I find docs
> on these functions and how to program drivers correctly using them? Thank
> you.

Very simple, cfr. include/asm-ppc/system.h:

/*
 * Memory barrier.
 * The sync instruction guarantees that all memory accesses initiated
 * by this processor have been performed (with respect to all other
 * mechanisms that access memory).  The eieio instruction is a barrier
 * providing an ordering (separately) for (a) cacheable stores and (b)
 * loads and stores to non-cacheable memory (e.g. I/O devices).
 *
 * mb() prevents loads and stores being reordered across this point.
 * rmb() prevents loads being reordered across this point.
 * wmb() prevents stores being reordered across this point.
 *
 * We can use the eieio instruction for wmb, but since it doesn't
 * give any ordering guarantees about loads, we have to use the
 * stronger but slower sync instruction for mb and rmb.
 */
#define mb()  __asm__ __volatile__ ("sync" : : : "memory")
#define rmb()  __asm__ __volatile__ ("sync" : : : "memory")
#define wmb()  __asm__ __volatile__ ("eieio" : : : "memory")

The *mb() macros are portable across the different architectures supported by
Linux. Yes, *mb are Alpha mnemonics :-)

Gr{oetje,eeting}s,
--
Geert Uytterhoeven -- Linux/{m68k~Amiga,PPC~CHRP} -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
							    -- Linus Torvalds


* Re: [linux-fbdev] Re: Fwd: Re: still no accelerated X ($#!$*)
  2000-01-26 16:20 ` Geert Uytterhoeven
@ 2000-01-26 17:23   ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 28+ messages in thread
From: Benjamin Herrenschmidt @ 2000-01-26 17:23 UTC (permalink / raw)
  To: Geert Uytterhoeven; +Cc: linuxppc-dev, linux-fbdev

On Wed, Jan 26, 2000, Geert Uytterhoeven <geert@linux-m68k.org> wrote:

>Very simple, cfr. include/asm-ppc/system.h:
>
>/*
> * Memory barrier.
> * The sync instruction guarantees that all memory accesses initiated
> * by this processor have been performed (with respect to all other
> * mechanisms that access memory).  The eieio instruction is a barrier
> * providing an ordering (separately) for (a) cacheable stores and (b)
> * loads and stores to non-cacheable memory (e.g. I/O devices).
> *
> * mb() prevents loads and stores being reordered across this point.
> * rmb() prevents loads being reordered across this point.
> * wmb() prevents stores being reordered across this point.
> *
> * We can use the eieio instruction for wmb, but since it doesn't
> * give any ordering guarantees about loads, we have to use the
> * stronger but slower sync instruction for mb and rmb.
> */
>#define mb()  __asm__ __volatile__ ("sync" : : : "memory")
>#define rmb()  __asm__ __volatile__ ("sync" : : : "memory")
>#define wmb()  __asm__ __volatile__ ("eieio" : : : "memory")
>
>The *mb() macros are portable across the different architectures
>supported by Linux. Yes, *mb are Alpha mnemonics :-)

The semantics used here are a bit strange. Definitely sync is evil and
should be avoided except in a few rare cases. It's not just a memory
barrier; it does a lot more, since it ensures synchronisation on the bus,
inter-CPU synchronisation, and interrupt synchronisation. It eats a lot of
cycles on SMP configs and with recent G3/G4 CPUs.

eieio is used for wmb, but it actually works for both reads and writes
provided that you actually want a _memory access_ barrier and not an
interrupt barrier or other means of syncing CPUs on the bus.

Note that include/asm/io.h contains iobarrier macros which are better
suited for io operations.


end of thread, other threads:[~2000-01-26 17:23 UTC | newest]

Thread overview: 28+ messages
     [not found] <Message from Kevin Hendricks <khendricks@ivey.uwo.ca>
2000-01-20 18:12 ` Fwd: Re: still no accelerated X ($#!$*) Kevin Hendricks
2000-01-20 18:26   ` David Edelsohn
2000-01-20 18:45     ` Benjamin Herrenschmidt
2000-01-20 18:51       ` David Edelsohn
2000-01-20 18:52     ` Franz Sirl
2000-01-20 19:31       ` Gabriel Paubert
2000-01-20 19:36         ` Kevin Hendricks
2000-01-20 19:51           ` Geert Uytterhoeven
2000-01-20 19:59           ` Gabriel Paubert
2000-01-20 20:08             ` David Edelsohn
2000-01-20 22:34             ` Franz Sirl
2000-01-21  0:05               ` Gabriel Paubert
2000-01-21  0:35                 ` Kevin Hendricks
2000-01-21  1:53                   ` Gabriel Paubert
2000-01-21  2:19                     ` Kevin Hendricks
2000-01-21  7:58                       ` Geert Uytterhoeven
2000-01-21 14:15                       ` Benjamin Herrenschmidt
2000-01-22 20:54                         ` [linux-fbdev] " anthony tong
2000-01-23  2:44                           ` Kevin Hendricks
2000-01-21 11:54                   ` Benjamin Herrenschmidt
2000-01-21 13:34                     ` Gabriel Paubert
2000-01-21 14:06                       ` Benjamin Herrenschmidt
2000-01-21 15:47                 ` Franz Sirl
2000-01-21 19:08                   ` Gabriel Paubert
2000-01-20 18:46   ` Franz Sirl
2000-01-21 16:22 [linux-fbdev] " Brad Douglas
     [not found] <Pine.GSO.4.05.10001261006570.12458-100000@callisto.acsu.buffalo.edu>
2000-01-26 16:20 ` Geert Uytterhoeven
2000-01-26 17:23   ` Benjamin Herrenschmidt
