linuxppc-dev.lists.ozlabs.org archive mirror
* question about altivec registers
@ 1999-10-25 20:51 Jim Terman
  1999-10-25 22:27 ` Claude Robitaille
  1999-10-26  4:42 ` Kumar Gala
  0 siblings, 2 replies; 23+ messages in thread
From: Jim Terman @ 1999-10-25 20:51 UTC (permalink / raw)
  To: linuxppc-dev


I have a question about using the altivec registers on a G4 Macintosh
running linuxppc.  Will there be any conflicts?  I don't expect any
kernel support, but can I be confident that the kernel will not touch
any of the altivec registers?  Any info will be greatly appreciated.

-- 
______________________________________________________________________________
Jim Terman       |   323 Vintage Park Dr.   |   Voice: (650) 356-5446
terman@ddi.com   |   Foster City, CA        |   Fax:   (650) 356-5490
Diab-SDS, Inc.   |   94404                  |   web site - http://www.ddi.com

** Sent via the linuxppc-dev mail list. See http://lists.linuxppc.org/


* Re: question about altivec registers
  1999-10-25 20:51 question about altivec registers Jim Terman
@ 1999-10-25 22:27 ` Claude Robitaille
  1999-10-25 22:31   ` Jim Terman
  1999-10-26  4:42 ` Kumar Gala
  1 sibling, 1 reply; 23+ messages in thread
From: Claude Robitaille @ 1999-10-25 22:27 UTC (permalink / raw)
  To: Jim Terman; +Cc: linuxppc-dev


There is a flag for that purpose. You should look into the Altivec
environment on Motorola's Web site, or at www.altivec.org

Claude



* Re: question about altivec registers
  1999-10-25 22:27 ` Claude Robitaille
@ 1999-10-25 22:31   ` Jim Terman
  1999-10-25 22:44     ` erik cameron
                       ` (2 more replies)
  0 siblings, 3 replies; 23+ messages in thread
From: Jim Terman @ 1999-10-25 22:31 UTC (permalink / raw)
  To: Claude Robitaille; +Cc: Jim Terman, linuxppc-dev


I understand how to get it working from the processor point of view.  I 
just want to make sure that the linuxppc kernel will not interfere if I 
compile a program that uses these registers.

The bottom line is that I am not looking for any help with the altivec
registers from the kernel.  I just want to be sure that there will not
be any interference.  If not, great.  I'd just like that assurance.
Thanks for any clarification you can give me.

-- 
______________________________________________________________________________
Jim Terman       |   323 Vintage Park Dr.   |   Voice: (650) 356-5446
terman@ddi.com   |   Foster City, CA        |   Fax:   (650) 356-5490
Diab-SDS, Inc.   |   94404                  |   web site - http://www.ddi.com


* Re: question about altivec registers
  1999-10-25 22:31   ` Jim Terman
@ 1999-10-25 22:44     ` erik cameron
  1999-10-25 23:28     ` Claude Robitaille
       [not found]     ` <Pine.LNX.4.10.9910251916060.5902-100000@modemcable220.93-200-24.mtl.mc.videotron.net>
  2 siblings, 0 replies; 23+ messages in thread
From: erik cameron @ 1999-10-25 22:44 UTC (permalink / raw)
  To: Jim Terman; +Cc: Claude Robitaille, linuxppc-dev



perhaps this is a stupid question, but aren't the values of the altivec
registers just saved as part of the process's hardware context and swapped
in and out as part of a context switch?  in which case it wouldn't matter
if the kernel (or any other process) changed the values while your job
was sleeping.

-- 
erik cameron  unix systems administrator
jfi/mrsec @ the university of chicago
e-cameron@uchicago.edu


* Re: question about altivec registers
  1999-10-25 22:31   ` Jim Terman
  1999-10-25 22:44     ` erik cameron
@ 1999-10-25 23:28     ` Claude Robitaille
       [not found]     ` <Pine.LNX.4.10.9910251916060.5902-100000@modemcable220.93-200-24.mtl.mc.videotron.net>
  2 siblings, 0 replies; 23+ messages in thread
From: Claude Robitaille @ 1999-10-25 23:28 UTC (permalink / raw)
  To: Jim Terman; +Cc: linuxppc-dev


The flag is to actually tell the kernel that your application is using
the Altivec registers, so that it can save time by not saving and
restoring them when the interrupted or swapped application is not using
them. It is supposed to be dynamic so only routines actually using Altivec
should set (and clear) it. I am not sure of the details so look into the
manual for the hardware support. I think the kernel should enforce the
use of this flag since moving the full Altivec register set is time
consuming (16 bytes X 32 registers = 1/2 KB).

Claude


* Re: question about altivec registers
       [not found]     ` <Pine.LNX.4.10.9910251916060.5902-100000@modemcable220.93-200-24.mtl.mc.videotron.net>
@ 1999-10-25 23:53       ` Rob Barris
  1999-10-26 18:22         ` Geert Uytterhoeven
  1999-10-26 22:03         ` Tom Vier
  0 siblings, 2 replies; 23+ messages in thread
From: Rob Barris @ 1999-10-25 23:53 UTC (permalink / raw)
  To: linuxppc-dev


>The flag is to actually tell the kernel that your application is using
>the Altivec registers, so that it can save time by not saving and
>restoring them when the interrupted or swapped application is not using
>them. It is supposed to be dynamic so only routines actually using Altivec
>should set (and clear) it. I am not sure of the details so look into the
>manual for the hardware support. I think the kernel should enforce the
>use of this flag since moving the full Altivec register set is time
>consuming (16 bytes X 32 registers = 1/2 KB).

   I worked this out once, the extra 512 bytes of register context,
multiplied by (say) a thousand context switches per second only add up to
about a MB of memory traffic per second - a fraction of a percent of the
available memory bandwidth in a G4 machine.  Most of that will sit in cache
anyway depending on the working set size of the processes involved.

   Wiggling the mouse probably causes more memory traffic than that (code
fetches and ISR handling).

   And this is with a very high hypothetical context switch rate which I
suspect may never be seen in real life use.
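
   The estimate above can be reproduced with a few lines of C.  This is
only a sketch of the arithmetic: the function name is made up for
illustration, and it assumes that each context switch costs one full save
plus one full restore of the vector state, which is what brings 512 bytes
at a thousand switches per second to about a MB per second.

```c
#include <stdint.h>

/* AltiVec register file: 32 vector registers of 16 bytes each. */
enum { VEC_REG_COUNT = 32, VEC_REG_BYTES = 16 };

/*
 * Extra memory traffic in bytes per second.
 * Assumption (not stated in the thread): every context switch performs
 * one save and one restore of the whole vector register file.
 */
static uint64_t vec_switch_traffic(uint64_t switches_per_sec)
{
    uint64_t state = (uint64_t)VEC_REG_COUNT * VEC_REG_BYTES; /* 512 bytes */
    return 2 * state * switches_per_sec;
}
```

   Calling vec_switch_traffic(1000) gives 1,024,000 bytes per second.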

--
Rob Barris       Quicksilver Software Inc.      rbarris@quicksilver.com



* Re: question about altivec registers
  1999-10-25 20:51 question about altivec registers Jim Terman
  1999-10-25 22:27 ` Claude Robitaille
@ 1999-10-26  4:42 ` Kumar Gala
  1999-10-26 21:52   ` Jim Terman
  1 sibling, 1 reply; 23+ messages in thread
From: Kumar Gala @ 1999-10-26  4:42 UTC (permalink / raw)
  To: linuxppc-dev


The linux kernel as is will not affect the AltiVec registers in any way.
However, there is a minor change to the kernel that will be required.  You
will need to enable the MSR VEC bit (bit 6 in the MSR) to tell the
processor that the AltiVec unit is available (this is similar to the MSR
FP bit).  If the bit is not set, the processor will generate an AltiVec
Unavailable exception, which will be trapped (incorrectly) as an unknown
0xf00 exception.

The 0xf00 exception is for the performance monitor,
and 0xf20 is the AltiVec Unavailable exception.
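
As a rough illustration of the bit numbering, here is a small C sketch.
The helper and macro names are hypothetical (they are not taken from the
kernel source); the only facts used are that PowerPC manuals count bits
from the most significant end, that VEC is bit 6 of the 32-bit MSR, that
FP is bit 18, and the two exception vectors mentioned above.

```c
#include <stdint.h>

/* PowerPC manuals number bits from the most significant end, so
 * "bit 6" of a 32-bit register is the mask 1 << (31 - 6). */
static uint32_t ppc_bit32(unsigned ibm_bit)
{
    return UINT32_C(1) << (31u - ibm_bit);
}

/* Hypothetical names, for illustration only. */
#define MSR_VEC_MASK ppc_bit32(6)   /* AltiVec unit available */
#define MSR_FP_MASK  ppc_bit32(18)  /* floating-point unit available */

enum {
    VECTOR_PERF_MONITOR    = 0xf00, /* performance monitor exception */
    VECTOR_ALTIVEC_UNAVAIL = 0xf20  /* AltiVec Unavailable exception */
};

/* Return an MSR image with the AltiVec unit enabled. */
static uint32_t msr_enable_vec(uint32_t msr)
{
    return msr | MSR_VEC_MASK;
}

/* Nonzero if vector instructions would execute without trapping. */
static int msr_vec_enabled(uint32_t msr)
{
    return (msr & MSR_VEC_MASK) != 0;
}
```

With this numbering the VEC mask comes out as 0x02000000 and the FP mask
as 0x00002000.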

All of these details are documented in the AltiVec Programming Environments
Manual (available from the Motorola website).

If you need any help getting a simple kernel up and running for
single AltiVec-enabled processes, let me know.

- kumar gala


ignorance is bliss.



* Re: question about altivec registers
  1999-10-25 23:53       ` Rob Barris
@ 1999-10-26 18:22         ` Geert Uytterhoeven
  1999-10-26 22:13           ` Rob Barris
  1999-10-26 22:38           ` Tom Vier
  1999-10-26 22:03         ` Tom Vier
  1 sibling, 2 replies; 23+ messages in thread
From: Geert Uytterhoeven @ 1999-10-26 18:22 UTC (permalink / raw)
  To: Rob Barris; +Cc: linuxppc-dev


On Mon, 25 Oct 1999, Rob Barris wrote:
> >The flag is to actually tell the kernel that your application is using
> >the Altivec registers, so that it can save time by not saving and
> >restoring them when the interrupted or swapped application is not using
> >them. It is supposed to be dynamic so only routines actually using Altivec
> >should set (and clear) it. I am not sure of the details so look into the
> >manual for the hardware support. I think the kernel should enforce the
> >use of this flag since moving the full Altivec register set is time
> >consuming (16 bytes X 32 registers = 1/2 KB).
> 
>    I worked this out once, the extra 512 bytes of register context,
> multiplied by (say) a thousand context switches per second only add up to
> about a MB of memory traffic per second - a fraction of a percent of the
> available memory bandwidth in a G4 machine.  Most of that will sit in cache
> anyway depending on the working set size of the processes involved.

Moving around blocks of 512 bytes quickly thrashes the L1 cache, unless the
loads/stores are done using cache-bypassing instructions (cfr. MOVE16 on '040).
Don't know whether PPC has these (still no PPC guru :-(

Gr{oetje,eeting}s,
--
Geert Uytterhoeven -- Linux/{m68k~Amiga,PPC~CHRP} -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
							    -- Linus Torvalds



* Re: question about altivec registers
  1999-10-26  4:42 ` Kumar Gala
@ 1999-10-26 21:52   ` Jim Terman
  1999-10-26 22:43     ` Kumar Gala
  0 siblings, 1 reply; 23+ messages in thread
From: Jim Terman @ 1999-10-26 21:52 UTC (permalink / raw)
  To: Kumar Gala; +Cc: linuxppc-dev


Will the linuxppc kernel as it is right now save the AltiVec registers
if we enable the MSR VEC bit?  I've been trying to follow the other messages
on this subject, but I'm not clear on this.

-- 
______________________________________________________________________________
Jim Terman       |   323 Vintage Park Dr.   |   Voice: (650) 356-5446
terman@ddi.com   |   Foster City, CA        |   Fax:   (650) 356-5490
Diab-SDS, Inc.   |   94404                  |   web site - http://www.ddi.com


* Re: question about altivec registers
  1999-10-25 23:53       ` Rob Barris
  1999-10-26 18:22         ` Geert Uytterhoeven
@ 1999-10-26 22:03         ` Tom Vier
  1 sibling, 0 replies; 23+ messages in thread
From: Tom Vier @ 1999-10-26 22:03 UTC (permalink / raw)
  To: Rob Barris; +Cc: linuxppc-dev


On Mon, Oct 25, 1999 at 04:53:52PM -0700, Rob Barris wrote:

>    I worked this out once, the extra 512 bytes of register context,
> multiplied by (say) a thousand context switches per second only add up to
> about a MB of memory traffic per second - a fraction of a percent of the
> available memory bandwidth in a G4 machine.  Most of that will sit in cache
> anyway depending on the working set size of the processes involved.

couldn't you just do lazy context saves? ie, disable the vector ops by 
default; when a proc tries to use a vector op catch the exception, mark the
proc as vector using and enable vectors. when a context switch occurs,
mark the proc as vector enable, disable vectors, continue (and re-enable
when you switch the proc's context back in). it's a little more complicated
when more than one proc wants vectors. in that case, before you re-enable
vectors, check to see if the vector regs need their context switched.

or maybe that complexity isn't worth the bandwidth/latency it saves.
does linux/ppc do lazy FPU context saves this way?

if you don't do lazy vector saves, i would think it would raise context
switch times a sizable amount. there's four times as much data in those
128bit regs as there are in the 32bit GPRs.
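
the lazy scheme described above can be modelled in plain C, as a sketch
only. nothing here is kernel code: the structs, the counters and the
function names are invented for illustration, the "cpu" is just a struct,
and the trap is an ordinary function call. a task that exits while it
still owns the registers is not handled.

```c
#include <string.h>

#define NVEC      32
#define VEC_BYTES 16

struct task {
    unsigned char vregs[NVEC][VEC_BYTES]; /* saved vector context */
    int used_vec;                         /* set on first vector use */
};

struct cpu {
    unsigned char vregs[NVEC][VEC_BYTES]; /* live "hardware" registers */
    int vec_enabled;                      /* models the MSR VEC bit */
    struct task *vec_owner;               /* task whose state is live */
    unsigned long saves, loads;           /* how often state was moved */
};

/* Context switch: only disable the vector unit, move no vector state. */
static void switch_to(struct cpu *c, struct task *next)
{
    (void)next;
    c->vec_enabled = 0;
}

/* Models the "unavailable" trap taken on a vector op while disabled. */
static void vec_unavailable(struct cpu *c, struct task *cur)
{
    if (c->vec_owner != cur) {
        if (c->vec_owner) {               /* save the previous owner */
            memcpy(c->vec_owner->vregs, c->vregs, sizeof c->vregs);
            c->saves++;
        }
        if (cur->used_vec)                /* reload our own context */
            memcpy(c->vregs, cur->vregs, sizeof c->vregs);
        else                              /* first use: start clean */
            memset(c->vregs, 0, sizeof c->vregs);
        c->loads++;
        c->vec_owner = cur;
    }
    cur->used_vec = 1;
    c->vec_enabled = 1;
}

/* A task executes a vector op; here it just writes one byte of v0. */
static void vector_op(struct cpu *c, struct task *cur, unsigned char value)
{
    if (!c->vec_enabled)
        vec_unavailable(c, cur);
    c->vregs[0][0] = value;
}
```

in this model a task that never touches the vector unit costs nothing,
and a vector task that is switched out and back in with no other vector
user in between only pays for the trap, not for a copy.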

-- 
Tom Vier - 0x27371A1C
thomassr@erols.com
http://users.erols.com/thomassr/zero/

DSA Key fingerprint:
42D4 82D6 6DF5 77EC 1251  30D2 D9E7 E858 2737 1A2C


* Re: question about altivec registers
  1999-10-26 18:22         ` Geert Uytterhoeven
@ 1999-10-26 22:13           ` Rob Barris
  1999-10-26 22:38           ` Tom Vier
  1 sibling, 0 replies; 23+ messages in thread
From: Rob Barris @ 1999-10-26 22:13 UTC (permalink / raw)
  To: linuxppc-dev


>On Mon, 25 Oct 1999, Rob Barris wrote:
>> >The flag is to actually tell the kernel that your application is using
>> >the Altivec registers, so that it can save time by not saving and
>> >restoring them when the interrupted or swapped application is not using
>> >them. It is supposed to be dynamic so only routines actually using Altivec
>> >should set (and clear) it. I am not sure of the details so look into the
>> >manual for the hardware support. I think the kernel should enforce the
>> >use of this flag since moving the full Altivec register set is time
>> >consuming (16 bytes X 32 registers = 1/2 KB).
>>
>>    I worked this out once, the extra 512 bytes of register context,
>> multiplied by (say) a thousand context switches per second only add up to
>> about a MB of memory traffic per second - a fraction of a percent of the
>> available memory bandwidth in a G4 machine.  Most of that will sit in cache
>> anyway depending on the working set size of the processes involved.
>
>Moving around blocks of 512 bytes quickly thrashes the L1 cache, unless the
>loads/stores are done using cache-bypassing instructions (cfr. MOVE16 on '040).
>Don't know whether PPC has these (still no PPC guru :-(


   Well, doing anything useful will cause traffic in and out of L1. That's
just a fact of life.  "Thrash" is a strong word, considering we're talking
about 512 bytes of data, that's 512/32768 == 1/64 of the typical PPC 750
data cache.

   Further, the PPC register state was already large (at least 384 bytes
for all 32 int and 32 fp regs) - no one seemed to be noticing context
switch time as a problem before, this further supports my assertion. 2.5
times "tiny" is still "tiny".
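
   For anyone who wants to check the sizes, here is the arithmetic as a
short C sketch.  The enum and function names are mine; the register
counts and widths and the 32 KB data cache are the figures used in this
message.

```c
#include <stddef.h>

enum {
    GPR_BYTES       = 32 * 4,    /* 32 integer registers, 32 bits each */
    FPR_BYTES       = 32 * 8,    /* 32 fp registers, 64 bits each */
    VR_BYTES        = 32 * 16,   /* 32 AltiVec registers, 128 bits each */
    L1_DCACHE_BYTES = 32 * 1024  /* 32 KB data cache, as quoted above */
};

/* Register state before AltiVec: integer plus floating point. */
static size_t base_state(void)
{
    return GPR_BYTES + FPR_BYTES;
}

/* Register state with the vector registers added. */
static size_t full_state(void)
{
    return base_state() + VR_BYTES;
}
```

   base_state() is 384 bytes and full_state() is 896 bytes, so the ratio
works out to roughly 2.3, in the same range as the "2.5 times" figure, and
the vector registers alone are 1/64 of the data cache.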

   Now, copying a 16K or 32K block from point A to point B will indeed
cause a complete cache replacement.  But that's not what's going on here.
In fact, for a few processes being switched between rapidly, it may well be
the case that those register state blocks may park in the L1 or L2 and not
go back out to main memory at all.   But my estimate was based on a worst
case again, and assuming that anything leaving L1 has to go to RAM and not
the L2.

   The point I was trying to make is that even in a hypothetical worst case
scenario, the added traffic is modest and possibly below the threshold of
noticeability.

   Moving things in and out of L1 is not bad in itself. The net impact is
what matters, that's what my calculation was trying to show.  If for
example, memory was infinitely fast, traffic to and from L1 would have no
impact.  OK so that's not true, the question then becomes "so how much time
does in fact get spent servicing that traffic, given real memory speeds".
At a hypothetical switch rate of 1KHz (extremely high) the overhead is
still quite small.

--
Rob Barris       Quicksilver Software Inc.      rbarris@quicksilver.com



* Re: question about altivec registers
  1999-10-26 18:22         ` Geert Uytterhoeven
  1999-10-26 22:13           ` Rob Barris
@ 1999-10-26 22:38           ` Tom Vier
  1 sibling, 0 replies; 23+ messages in thread
From: Tom Vier @ 1999-10-26 22:38 UTC (permalink / raw)
  To: Geert Uytterhoeven; +Cc: linuxppc-dev


On Tue, Oct 26, 1999 at 08:22:06PM +0200, Geert Uytterhoeven wrote:

> Moving around blocks of 512 bytes quickly thrashes the L1 cache, unless the
> loads/stores are done using cache-bypassing instructions (cfr. MOVE16 on '040).
> Don't know whether PPC has these (still no PPC guru :-(

from what i've read, you can disable cache for the altivec regs. this was
intended for doing infrequent vector ops between frequent vector ops
(loops) without disturbing the cache.

-- 
Tom Vier - 0x27371A1C
thomassr@erols.com
http://users.erols.com/thomassr/zero/

DSA Key fingerprint:
42D4 82D6 6DF5 77EC 1251  30D2 D9E7 E858 2737 1A2C


* Re: question about altivec registers
  1999-10-26 21:52   ` Jim Terman
@ 1999-10-26 22:43     ` Kumar Gala
  1999-10-27  8:58       ` Adrian Cox
  0 siblings, 1 reply; 23+ messages in thread
From: Kumar Gala @ 1999-10-26 22:43 UTC (permalink / raw)
  To: Jim Terman; +Cc: linuxppc-dev


>
> Will the linuxppc kernal as it is right now save the AltiVec registers
> if we enable the MSR VEC bit.  I've been trying to follow the other messages
> on this subject, but I'm not clear.
> 

No, the kernel does not know anything about the AltiVec registers.
Unlike on the x86 platform, only two registers are typically saved and
restored on interrupts: SRR0 and SRR1.  SRR0 holds the PC to return to
and SRR1 holds the MSR settings.  On an rfi, values are copied out of
these registers.

The AltiVec registers have to be saved and restored explicitly.  If you look
at /arch/ppc/kernel/head.S and look for load_up_fp, you will see how the
floating point unit is handled on exceptions.  Essentially, some checks
are done, and a pointer is kept to the last process that used the FP unit
(last_task_used_fp).  If needed, the FP regs are saved into that
process's context and the FP regs for the incoming process are
restored.

- kumar



ignorance is bliss.



* Re: question about altivec registers
  1999-10-26 22:43     ` Kumar Gala
@ 1999-10-27  8:58       ` Adrian Cox
  1999-10-27 13:21         ` Gabriel Paubert
  0 siblings, 1 reply; 23+ messages in thread
From: Adrian Cox @ 1999-10-27  8:58 UTC (permalink / raw)
  To: Kumar Gala; +Cc: Jim Terman, linuxppc-dev


Kumar Gala wrote:

> The AltiVec registers have to be saved and restored explicitly.  If you look
> at /arch/ppc/kernel/head.S and look for load_up_fp, you will see how the
> floating point unit is handled on exceptions.  Essentially, some checks
> are done, and a pointer is kept to the last process that used the FP unit
> (last_task_used_fp).  If needed, the FP regs are saved into that
> process's context and the FP regs for the incoming process are
> restored.

Linux on PowerPC should end up doing a classic lazy save/restore for the
vector context, as it already does for the floating point registers. On
SMP systems this simple approach isn't possible, but a quick
approximation is to detect the first time a process uses AltiVec, and
mark it to always save and restore vector context from then on.
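
The SMP approximation can be sketched in C as follows. This is a model
under stated assumptions, not kernel code: the struct and function names
are invented, each CPU is a plain struct, and the first-use trap is an
ordinary function call. The point it shows is that once the flag is set,
the state always travels with the task, so the task can resume on a
different CPU.

```c
#include <string.h>

#define VEC_STATE_BYTES 512   /* 32 registers x 16 bytes */

struct vtask {
    unsigned char saved[VEC_STATE_BYTES];
    int uses_vec;             /* sticky: set on first use, never cleared */
};

struct vcpu {
    unsigned char live[VEC_STATE_BYTES];
    int vec_enabled;          /* models the MSR VEC bit */
};

/* The first vector instruction traps because the unit is disabled. */
static void first_use_trap(struct vcpu *c, struct vtask *t)
{
    t->uses_vec = 1;
    memset(c->live, 0, sizeof c->live);  /* start from clean registers */
    c->vec_enabled = 1;
}

/* Marked tasks always have their vector state saved when they stop. */
static void switch_out(const struct vcpu *c, struct vtask *prev)
{
    if (prev->uses_vec)
        memcpy(prev->saved, c->live, sizeof prev->saved);
}

/* Marked tasks always get their state back, on whichever CPU runs them. */
static void switch_in(struct vcpu *c, const struct vtask *next)
{
    if (next->uses_vec) {
        memcpy(c->live, next->saved, sizeof c->live);
        c->vec_enabled = 1;
    } else {
        c->vec_enabled = 0;   /* a later vector op will trap */
    }
}
```

Tasks that never use the vector unit still pay nothing; tasks that have
used it once pay for a full save and restore on every switch.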

I'd recommend that compiler writers use the vrsave register to mark
which vector registers they use, as a precaution against future kernels
which may look at this. Note that the G4 is extremely fast at linear
sequences of cacheable stores (store miss merging), and it is probably
cheaper for the kernel to ignore vrsave and avoid branches in the save
and restore sequence.  Of course, it is correct to simply set every bit
in vrsave at the start of your application, and never change it again.
It may be non-optimal on future systems, but it should remain correct.

As for the cache thrashing effect, remember that 512 bytes going in and
out of the L2 cache is not very expensive, and that there is probably 1
or 2MB of L2 fitted.

- Adrian Cox, AG Electronics


* Re: question about altivec registers
  1999-10-27  8:58       ` Adrian Cox
@ 1999-10-27 13:21         ` Gabriel Paubert
  1999-10-27 16:05           ` Geert Uytterhoeven
                             ` (2 more replies)
  0 siblings, 3 replies; 23+ messages in thread
From: Gabriel Paubert @ 1999-10-27 13:21 UTC (permalink / raw)
  To: Adrian Cox; +Cc: Kumar Gala, Jim Terman, linuxppc-dev




On Wed, 27 Oct 1999, Adrian Cox wrote:

> Linux on PowerPC should end up doing a classic lazy save/restore for the
> vector context, as it already does for the floating point registers. On
> SMP systems this simple approach isn't possible, but a quick
> approximation is to detect the first time a process uses AltiVec, and
> mark it to always save and restore vector context from then on.

Agreed.

> I'd recommend that compiler writers use the vrsave register to mark
> which vector registers they use, as a precaution against future kernels
> which may look at this. Note that the G4 is extremely fast at linear
> sequences of cacheable stores (store miss merging), and it is probably
> cheaper for the kernel to ignore vrsave and avoid branches in the save
> and restore sequence.  Of course, it is correct to simply set every bit
> in vrsave at the start of your application, and never change it again.
> It may be non-optimal on future systems, but it should remain correct.

Don't forget nevertheless a worthwhile optimization: that VRSAVE=0 means
that the program has no active Altivec registers at the time so that the
save can be skipped altogether (except for vrsave and the control/status
register).

And why would you want to use a bitmap ? This seems braindead to me, put a
value between 0 and 32 in vrsave. Since all registers are identical
in use and purpose, save registers 0 to n. Disclaimer: I've not seen if
the ABI specifies how and which Altivec registers are saved and restored
across calls. 

Paranoid point of view: the restore must reload all altivec registers
(or clear the ones which are not specified as used by VRSAVE), otherwise
you might leak the contents of the Altivec registers of another process.
I'm not a security expert, but I don't like this possibility at all. 
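
To make the bitmap variant and the paranoid restore concrete, here is a
C model of both. It is only a sketch: the function names are made up, the
register file is an ordinary array, and it assumes (as the assembly below
also does) that the least significant bit of vrsave stands for v31 and
the most significant bit for v0.

```c
#include <stdint.h>
#include <string.h>

#define NVEC      32
#define VEC_BYTES 16

/* Bit n of vrsave, counted from the most significant bit, marks vn. */
static int vrsave_marks(uint32_t vrsave, unsigned n)
{
    return (int)((vrsave >> (31u - n)) & 1u);
}

/* Save only the marked registers; returns how many were stored.
 * vrsave == 0 means nothing is live, so the save is skipped entirely. */
static unsigned save_marked(uint32_t vrsave,
                            unsigned char live[NVEC][VEC_BYTES],
                            unsigned char area[NVEC][VEC_BYTES])
{
    unsigned n, stored = 0;

    if (vrsave == 0)
        return 0;
    for (n = 0; n < NVEC; n++) {
        if (vrsave_marks(vrsave, n)) {
            memcpy(area[n], live[n], VEC_BYTES);
            stored++;
        }
    }
    return stored;
}

/* Reload the marked registers and clear the rest, so nothing left
 * behind by another process can be read. */
static void restore_marked(uint32_t vrsave,
                           unsigned char live[NVEC][VEC_BYTES],
                           unsigned char area[NVEC][VEC_BYTES])
{
    unsigned n;

    for (n = 0; n < NVEC; n++) {
        if (vrsave_marks(vrsave, n))
            memcpy(live[n], area[n], VEC_BYTES);
        else
            memset(live[n], 0, VEC_BYTES);
    }
}
```

With vrsave = 0x80000001 only v0 and v31 are stored, and after a restore
every other register reads as zero whatever the previous process left
there.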

Code bloat concerns: actually to save or restore a single altivec
register, you need 2 instructions given the available addressing modes:  
this makes 512 bytes of code for 32 register save + 32 register restore
(there are ways to slightly reduce it but there is also the overhead of
setting up several integer registers, saving vrsave and the control/status
register...). Count 12 bytes/register if you use a bit in vrsave to check
every register. But the branches are not that expensive if the cr bits are
set enough in advance: assuming vrsave has been copied to r0:

	cmpwi	r0,0
	beq-	done		# vrsave == 0: nothing live, skip the save
	mtcrf	0x1,r0
	la 	r3,vregsavearea+448
	li	r4,16
	li	r5,32
	li	r6,48
	bf	31,30f
	stvx	v31,r6,r3
30:	mtcrf	0x2,r0
	bf	30,29f
	stvx	v30,r5,r3
29:	srwi	r0,r0,8
	bf	29,28f
	stvx	v29,r4,r3
28:	bf	28,27f
	stvx	v28,0,r3
27:	addi	r3,r3,-64	
	bf	27,26f
	stvx	v27,r6,r3
26:	mtcrf	0x1,r0		
	bf	26,25f
	stvx	v26,r5,r3
25:	bf	25,24f
	stvx	v25,r4,r3
24:	bf	24,23f
	stvx	v24,0,r3
23:	addi	r3,r3,-64
	bf	31,22f
	stvx	v23,r6,r3
22:	mtcrf	0x2,r0		# Cycle since 30: repeats here
	bf	30,21f
	stvx	v22,r5,r3
21:	srwi	r0,r0,8
	bf	29,20f
	...
0:	bf	24,done
	stvx	v0,0,r3
done:	  			# now save the control/status register...
	
in this code the bits to test are always set 3 or more branches ahead of
the test by interleaving 2 cr fields set up by mtcrf according to vrsave
bits.  But the code is significantly larger than using a count and
branching at the right place in the save routine.

> As for the cache thrashing effect, remember that 512 bytes going in and
> out of the L2 cache is not very expensive, and that there is probably 1
> or 2MB of L2 fitted.

My feeling is that it is unlikely that the code is in the L1 cache, this
code is not a tight loop which is executed 1000 times in a row, and it is
probably saturating L2 cache bandwidth. If you need 8 bytes of code and 16
bytes of data for each register save/load on average, it's 3 L2 data beats
or 6 clocks in the most common scenario (L2 at 1/2 core frequency).
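
The estimate can be written out as a few lines of C. This is a sketch of
the arithmetic only; the names are made up, and the 8-byte L2 data beat
is an assumption inferred from "24 bytes is 3 beats" rather than a figure
checked against a particular board.

```c
enum {
    CODE_BYTES_PER_REG   = 8,  /* 2 instructions of 4 bytes each */
    DATA_BYTES_PER_REG   = 16, /* one 128-bit vector register */
    L2_BEAT_BYTES        = 8,  /* assumed 64-bit L2 data bus */
    CORE_CLOCKS_PER_BEAT = 2   /* L2 running at 1/2 core frequency */
};

/* Core clocks of L2 traffic for saving or loading one register,
 * assuming both the code and the data come from the L2. */
static unsigned clocks_per_register(void)
{
    unsigned bytes = CODE_BYTES_PER_REG + DATA_BYTES_PER_REG;
    unsigned beats = (bytes + L2_BEAT_BYTES - 1) / L2_BEAT_BYTES;

    return beats * CORE_CLOCKS_PER_BEAT;
}

/* Same estimate for a run of nregs registers. */
static unsigned clocks_for_registers(unsigned nregs)
{
    return nregs * clocks_per_register();
}
```

Under these assumptions one register costs 6 core clocks and a full set
of 32 costs 192, for the save or the restore taken separately.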

	Gabriel.



* Re: question about altivec registers
  1999-10-27 13:21         ` Gabriel Paubert
@ 1999-10-27 16:05           ` Geert Uytterhoeven
  1999-10-27 18:23           ` Kumar Gala
  1999-10-27 22:39           ` Tony Mantler
  2 siblings, 0 replies; 23+ messages in thread
From: Geert Uytterhoeven @ 1999-10-27 16:05 UTC (permalink / raw)
  To: Gabriel Paubert; +Cc: Adrian Cox, Kumar Gala, Jim Terman, linuxppc-dev


On Wed, 27 Oct 1999, Gabriel Paubert wrote:
> On Wed, 27 Oct 1999, Adrian Cox wrote:
> > As for the cache thrashing effect, remember that 512 bytes going in and
> > out of the L2 cache is not very expensive, and that there is probably 1
> > or 2MB of L2 fitted.
> 
> My feeling is that it is unlikely that the code is in the L1 cache, this
> code is not a tight loop which is executed 1000 times in a row, and it is
> probably saturating L2 cache bandwidth. If you need 8 bytes of code and 16
> bytes of data for each register save/load on average, it's 3 L2 data beats
> or 6 clocks in the most common scenario (L2 at 1/2 core frequency).

I wasn't primarily concerned about code taking space in the L1 cache, but the
saved AltiVec registers pushing valuable data out of the L1 cache on each save.

Gr{oetje,eeting}s,
--
Geert Uytterhoeven -- Linux/{m68k~Amiga,PPC~CHRP} -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
							    -- Linus Torvalds


* Re: question about altivec registers
  1999-10-27 13:21         ` Gabriel Paubert
  1999-10-27 16:05           ` Geert Uytterhoeven
@ 1999-10-27 18:23           ` Kumar Gala
  1999-10-27 22:39           ` Tony Mantler
  2 siblings, 0 replies; 23+ messages in thread
From: Kumar Gala @ 1999-10-27 18:23 UTC (permalink / raw)
  To: Gabriel Paubert; +Cc: Adrian Cox, Jim Terman, linuxppc-dev


The main purpose of VRSAVE was for Apple.  They allocate registers in order,
which makes it efficient to use VRSAVE to decide which AltiVec registers to
save and restore, since you only need to determine up to which register to
save.

The System V ABI allocates vector registers much as it does GPRs, in that
vr3, vr4, vr5 are used to pass arguments and so on.  This is documented in
the Motorola docs (the AltiVec Programming Environments Manual on their
web site).  The problem then is that, because of the ABI, it is more costly
to determine which vector registers were used and which were not if
VRSAVE is used as a bitmap.

Also, the current ABI does not specify the use of VRSAVE at all, and therefore
the non-Apple compilers do not even include it in generated code that uses
AltiVec.

Also, based on how FP is implemented, if the process currently running
does not use AltiVec in its current time slice the registers are not saved
and restored.
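
A toy C model of that lazy scheme, purely as a sketch. The struct task layout, the two hook names and the full_saves counter are all hypothetical; the assumption is that the kernel gets a trap on a task's first vector instruction, the way it does for FP, and only then marks the task as an AltiVec user.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef struct { unsigned char b[16]; } vr_t;   /* one 128-bit register */

struct task {
    vr_t saved_vr[32];   /* per-task save area, 512 bytes */
    bool used_vec;       /* task touched AltiVec in this time slice */
};

static vr_t cpu_vr[32];       /* stands in for the live register file */
static unsigned full_saves;   /* how often the 512-byte copy actually ran */

/* Hypothetical hook: runs when the task executes its first vector
 * instruction in a time slice. */
static void on_first_vec_use(struct task *t)
{
    assert(t != NULL);
    memcpy(cpu_vr, t->saved_vr, sizeof cpu_vr);
    t->used_vec = true;
}

/* Hypothetical hook: runs when the task is switched out. */
static void on_switch_out(struct task *t)
{
    assert(t != NULL);
    if (!t->used_vec)
        return;               /* never touched the unit: nothing to save */
    memcpy(t->saved_vr, cpu_vr, sizeof cpu_vr);
    t->used_vec = false;
    full_saves++;
}
```

A task that never issues a vector instruction never pays for the copy, whatever VRSAVE says.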

As for the cache issues with saving and restoring all the VR registers, I
am looking into this further.

 - kumar


ignorance is bliss.


* Re: question about altivec registers
  1999-10-27 13:21         ` Gabriel Paubert
  1999-10-27 16:05           ` Geert Uytterhoeven
  1999-10-27 18:23           ` Kumar Gala
@ 1999-10-27 22:39           ` Tony Mantler
  1999-10-28 11:01             ` Gabriel Paubert
  2 siblings, 1 reply; 23+ messages in thread
From: Tony Mantler @ 1999-10-27 22:39 UTC (permalink / raw)
  To: Gabriel Paubert; +Cc: linuxppc-dev


At 8:21 AM -0500 10/27/99, Gabriel Paubert wrote:
[...]
>And why would you want to use a bitmap ? This seems braindead to me, put a
>value between 0 and 32 in vrsave. Since all registers are identical
>in use and purpose, save registers 0 to n. Disclaimer: I've not seen if
>the ABI specifies how and which Altivec registers are saved restored
>across calls.

It would seem to me that using a bitmap to mark used registers would allow
more flexibility on the compiler side to play with register usage without
incurring longer context switch times. I wouldn't try to guess the relative
tradeoff values though.


>Paranoid point of view: the restore must reload all altivec registers
>(or clear the ones which are not specified as used by VRSAVE), otherwise
>you might leak the contents of the Altivec registers of another process.
>I'm not a security expert, but I don't like this possibility at all.
[...]

Beyond security, clearing the registers would also serve to enforce strict
usage of whatever is defined as the VRSAVE format, and avoid the
possibility of a mouth-breathing code-typist releasing a binary that
doesn't mark its registers, which in theory would only break once a
different application touches the altivec registers, resulting in a
situation of either A: the kernel being forced to save all altivec
registers (bad) or B: allowing those binaries to be broken and upsetting
its users (slightly less bad). Obviously pre-breaking those binaries would
be the preferable solution, so they never need see the light of day.


That's my 2c.

Cheers - Tony :)


--
Tony Mantler         Renaissance Nerd Extraordinaire         eek@escape.ca
Winnipeg, Manitoba, Canada                       http://www.escape.ca/~eek


* Re: question about altivec registers
  1999-10-27 22:39           ` Tony Mantler
@ 1999-10-28 11:01             ` Gabriel Paubert
  1999-10-28 21:20               ` Tony Mantler
  0 siblings, 1 reply; 23+ messages in thread
From: Gabriel Paubert @ 1999-10-28 11:01 UTC (permalink / raw)
  To: Tony Mantler; +Cc: linuxppc-dev




On Wed, 27 Oct 1999, Tony Mantler wrote:

> At 8:21 AM -0500 10/27/99, Gabriel Paubert wrote:
> [...]
> >And why would you want to use a bitmap ? This seems braindead to me, put a
> >value between 0 and 32 in vrsave. Since all registers are identical
> >in use and purpose, save registers 0 to n. Disclaimer: I've not seen if
> >the ABI specifies how and which Altivec registers are saved restored
> >across calls.
> 
> It would seem to me that using a bitmap to mark used registers would allow
> more flexibility on the compiler side to play with register usage without
> incurring longer context switch times. I wouldn't try to guess the relative
> tradeoff values though.

Since all 32 registers have exactly the same capabilities, using only a
consecutive set of registers is not a problem for the compiler. That's how
it works for integer registers and FP registers 14 to 31 (for integer
actually register 0 is special). Taking into account the comments on the
ABI, the only strategy which seems implementable is to distinguish 2 cases
only vrsave=0 and vrsave!=0. Using a bitmap means a lot of conditional
branches, which can obviously be folded but nevertheless bloat the code.
(For save+restore: 64 conditional branches + 16 mtcrf= 320 bytes). 
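
To make the cost concrete, here is what a bitmap-driven save does, as a minimal C sketch. It assumes the usual numbering where VRSAVE bit n (bit 0 being the most significant) marks vrN; vr_t, VR_BIT and save_by_bitmap are illustrative names, and the copy stands in for one stvx per register.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct { uint8_t b[16]; } vr_t;   /* one 128-bit vector register */

/* Assumption: VRSAVE bit n, counted from the most significant bit,
 * marks vrN as in use. */
#define VR_BIT(n) (0x80000000u >> (n))

/* Copy only the marked registers into the save area; returns how many
 * registers were saved. */
static unsigned save_by_bitmap(uint32_t vrsave, const vr_t live[32],
                               vr_t area[32])
{
    unsigned n, saved = 0;

    assert(live != NULL && area != NULL);
    for (n = 0; n < 32; n++) {
        if (vrsave & VR_BIT(n)) {     /* one test-and-branch per register */
            memcpy(&area[n], &live[n], sizeof(vr_t));
            saved++;
        }
    }
    return saved;
}
```

Unrolled in assembly, that is 32 conditional branches for the save and 32 more for the restore; together with the 16 mtcrf, that is 80 instructions of 4 bytes each, which is where the 320 bytes come from.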

There is also a problem of knowing how to update the bitmap in nested
subroutines: to keep it correct the called subroutine must save the
current vrsave and then or it with the bitmask of the registers it uses.
Then on exit it has to restore caller's vrsave. Do we want such a complex
strategy ? I don't mean that it is impossible to implement, but that it
looks complex. 

OTOH, if the register usage is designed similarly to integer and FP, the
bitmask might look like 111...1100...0011...11 (i.e. with at most 2
transitions between 0 and 1 in the bit string). It might be worth
optimizing the save/restore routine for this case, saving/restoring more
registers than necessary when vrsave does not have such a canonical form.
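
Checking for that canonical form is cheap. A minimal C sketch, taking "at most 2 transitions" literally; transitions and is_canonical are made-up names:

```c
#include <assert.h>
#include <stdint.h>

/* Number of places where two adjacent bits of v differ. */
static unsigned transitions(uint32_t v)
{
    /* Bit i of x is set when bits i and i+1 of v differ; the top bit has
     * no neighbour above it, so it is masked off. */
    uint32_t x = (v ^ (v >> 1)) & 0x7fffffffu;
    unsigned count = 0;

    while (x) {
        x &= x - 1;          /* clear the lowest set bit */
        count++;
    }
    assert(count <= 31);     /* only 31 adjacent pairs exist */
    return count;
}

/* Non-zero when vrsave has at most two 0/1 transitions. */
static int is_canonical(uint32_t vrsave)
{
    return transitions(vrsave) <= 2;
}
```

Note that this literal test also accepts a single run of ones in the middle of the word (0011...1100), which a real save routine would have to handle or reject separately.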


> 
> 
> >Paranoid point of view: the restore must reload all altivec registers
> >(or clear the ones which are not specified as used by VRSAVE), otherwise
> >you might leak the contents of the Altivec registers of another process.
> >I'm not a security expert, but I don't like this possibility at all.
> [...]
> 
> Beyond security, clearing the registers would also serve to enforce strict
> usage of whatever is defined as the VRSAVE format, and avoid the
> possibility of a mouth-breathing code-typist releasing a binary that
> doesn't mark its registers, which in theory would only break once a
> different application touches the altivec registers, resulting in a
> situation of either A: the kernel being forced to save all altivec
> registers (bad) or B: allowing those binaries to be broken and upsetting
> its users (slightly less bad). Obviously pre-breaking those binaries would
> be the preferable solution, so they never need see the light of day.

Indeed, I had not considered this problem. Note that conditional clearing
of most registers can probably be done without conditional branches. Just
put a copy of vrsave in one vr and then find a smart way to transform
these bits in masks to clear the registers (probably you'll have to splat
it first). It won't work for the last register(s) because you need
some workspace, however. 

Anyway, having a special fast path for the case vrsave=0 is probably the
most important optimization IMHO. 

	Gabriel.


* Re: question about altivec registers
  1999-10-28 11:01             ` Gabriel Paubert
@ 1999-10-28 21:20               ` Tony Mantler
  1999-10-29 11:58                 ` Benjamin Herrenschmidt
  1999-10-29 12:49                 ` Gabriel Paubert
  0 siblings, 2 replies; 23+ messages in thread
From: Tony Mantler @ 1999-10-28 21:20 UTC (permalink / raw)
  To: Gabriel Paubert; +Cc: linuxppc-dev


At 6:01 AM -0500 10/28/99, Gabriel Paubert wrote:
>On Wed, 27 Oct 1999, Tony Mantler wrote:
>
[...]
>> It would seem to me that using a bitmap to mark used registers would allow
>> more flexibility on the compiler side to play with register usage without
>> incurring longer context switch times. I wouldn't try to guess the relative
>> tradeoff values though.
>
>Since all 32 registers have exactly the same capabilities, using only a
>consecutive set of registers is not a problem for the compiler. That's how
>it works for integer registers and FP registers 14 to 31 (for integer
>actually register 0 is special).

I suppose I'm a bit too used to 68k stuff, where sorting register usage
takes a back seat to efficient register re-use. However, with the size of
the data in the Altivec registers, I would expect a bit of optimization to
slant away from cases where the registers can be easily sorted and packed.


>Taking into account the comments on the
>ABI, the only strategy which seems implementable is to distinguish 2 cases
>only vrsave=0 and vrsave!=0. Using a bitmap means a lot of conditional
>branches, which can obviously be folded but nevertheless bloat the code.
>(For save+restore: 64 conditional branches + 16 mtcrf= 320 bytes).

Quite true.


>There is also a problem of knowing how to update the bitmap in nested
>subroutines: to keep it correct the called subroutine must save the
>current vrsave and then or it with the bitmask of the registers it uses.
>Then on exit it has to restore caller's vrsave. Do we want such a complex
>strategy ? I don't mean that it is impossible to implement, but that it
>looks complex.

I think saving registers in a subroutine is a pain no matter how it's
implemented. If the VRSAVE is used as a count, the subroutine still has to
save the old value, save the overwritten registers, calculate what the
proper new value is (think new < old = oops!) then restore the overwritten
registers and old VRSAVE value when it exits.


>OTOH, if the register usage is designed similarly to integer and FP, the
>bitmask might look like 111...1100...0011...11 (i.e. with at most 2
>transitions between 0 and 1 in the bit string). It might be worth
>optimizing the save/restore routine for this case, saving/restoring more
>registers than necessary when vrsave does not have such a canonical form.

Hmm, count bits in from the left and right, mask and check for missed bits,
then branch to either a full save or a left+right save.

Doing it that way would also somewhat optimize VRSAVE=0, since both the
leftmost and rightmost bits are 0, it would pass right through the
left-save and right-save half of the optimized register save.
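
Roughly what I have in mind, as a C sketch rather than real kernel code. struct save_plan and the helper names are invented for the example, and the loops stand in for whatever cntlzw tricks the assembly would use.

```c
#include <assert.h>
#include <stdint.h>

struct save_plan {
    unsigned left;    /* save vr0 .. vr(left-1) */
    unsigned right;   /* save vr(32-right) .. vr31 */
    int full;         /* bits were missed: fall back to saving everything */
};

/* Length of the run of ones starting at the most significant bit. */
static unsigned leading_ones(uint32_t v)
{
    unsigned n = 0;
    while (n < 32 && (v & (0x80000000u >> n)))
        n++;
    return n;
}

/* Length of the run of ones starting at the least significant bit. */
static unsigned trailing_ones(uint32_t v)
{
    unsigned n = 0;
    while (n < 32 && (v & (1u << n)))
        n++;
    return n;
}

static struct save_plan plan_save(uint32_t vrsave)
{
    struct save_plan p;
    uint32_t covered;

    p.left  = leading_ones(vrsave);
    p.right = (p.left == 32) ? 0 : trailing_ones(vrsave);
    assert(p.left + p.right <= 32);

    /* Mask of the bits the two runs account for. */
    covered  = (p.left == 32) ? 0xffffffffu : ~(0xffffffffu >> p.left);
    covered |= (1u << p.right) - 1u;       /* p.right is at most 31 here */

    p.full = (vrsave & ~covered) != 0;     /* any marked bit in between? */
    return p;
}
```

With VRSAVE=0 both runs are empty and nothing is flagged as missed, so the plan saves no registers at all.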

Perhaps a little longer than "if (VRSAVE==0) return;", but it's quick
enough for me.


[.. enforce proper VRSAVE usage ..]
>
>Indeed, I had not considered this problem. Note that conditional clearing
>of most registers can probably be done without conditional branches. Just
>put a copy of vrsave in one vr and then find a smart way to transform
>these bits in masks to clear the registers (probably you'll have to splat
>it first). It won't work for the last register(s) because you need
>some workspace, however.

Mmm, the joys of a bitwise AND.


>Anyway, having a special fast path for the case vrsave=0 is probably the
>most important optimization IMHO.

Indeed.


--
Tony Mantler         Renaissance Nerd Extraordinaire         eek@escape.ca
Winnipeg, Manitoba, Canada                       http://www.escape.ca/~eek


* Re: question about altivec registers
  1999-10-28 21:20               ` Tony Mantler
@ 1999-10-29 11:58                 ` Benjamin Herrenschmidt
  1999-10-29 12:49                 ` Gabriel Paubert
  1 sibling, 0 replies; 23+ messages in thread
From: Benjamin Herrenschmidt @ 1999-10-29 11:58 UTC (permalink / raw)
  To: Tony Mantler, linuxppc-dev


On Thu, Oct 28, 1999, Tony Mantler <eek@escape.ca> wrote:

>I think saving registers in a subroutine is a pain no matter how it's
>implemented. If the VRSAVE is used as a count, the subroutine still has to
>save the old value, save the overwritten registers, calculate what the
>proper new value is (think new < old = oops!) then restore the overwritten
>registers and old VRSAVE value when it exits.

In their latest Mac compiler, Metrowerks added a pragma for manually
optimising this when you know you'll use a bunch of those registers:

<<
 #pragma altivec_vrsave allon is now supported. It sets VRsave assuming
  that all altivec registers are in use, best used with
  "#pragma altivec_vrsave off" so only the parent routine updates the
  vrsave register.
>>


* Re: question about altivec registers
  1999-10-28 21:20               ` Tony Mantler
  1999-10-29 11:58                 ` Benjamin Herrenschmidt
@ 1999-10-29 12:49                 ` Gabriel Paubert
  1999-10-30  4:14                   ` Tony Mantler
  1 sibling, 1 reply; 23+ messages in thread
From: Gabriel Paubert @ 1999-10-29 12:49 UTC (permalink / raw)
  To: Tony Mantler; +Cc: linuxppc-dev




On Thu, 28 Oct 1999, Tony Mantler wrote:

> I suppose I'm a bit too used to 68k stuff, where sorting register usage
> takes a back seat to efficient register re-use. However, with the size of
> the data in the Altivec registers, I would expect a bit of optimization to
> slant away from cases where the registers can be easily sorted and packed.

Things are different when all registers are identical and instructions
have separate operands for inputs and the output. I've programmed 68k too
and it's often painful (Intel is worse, to be fair). 

> >There is also a problem of knowing how to update the bitmap in nested
> >subroutines: to keep it correct the called subroutine must save the
> >current vrsave and then or it with the bitmask of the registers it uses.
> >Then on exit it has to restore caller's vrsave. Do we want such a complex
> >strategy ? I don't mean that it is impossible to implement, but that it
> >looks complex.
> 
> I think saving registers in a subroutine is a pain no matter how it's
> implemented. If the VRSAVE is used as a count, the subroutine still has to
> save the old value, save the overwritten registers, calculate what the
> proper new value is (think new < old = oops!) then restore the overwritten
> registers and old VRSAVE value when it exits.

In the end a bitmap seems the best, since the code can be free of
conditionals and fairly compact:
- at start of routine (register numbers chosen randomly):
	mfspr	r12,vrsave 
	oris	r0,r12,0x....	# mask of used bits
	ori	r0,r0,0x....	# mask of used bits (only if using vr16-vr31)
	stw	r12,somewhere on the stack
	mtspr	vrsave,r0

and the end:
	lwz	r12,somewhere on the stack
	mtspr	vrsave,r12
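
The same protocol modelled in C, as a sketch only: the global vrsave variable stands in for the SPR, and vrsave_enter/vrsave_leave are hypothetical names for what the prologue and epilogue above do.

```c
#include <assert.h>
#include <stdint.h>

static uint32_t vrsave;   /* stands in for the VRSAVE special register */

/* Prologue: remember the caller's VRSAVE and add the registers this
 * routine uses.  The returned value plays the role of the stack slot. */
static uint32_t vrsave_enter(uint32_t used_mask)
{
    uint32_t old = vrsave;       /* mfspr r12,vrsave */
    vrsave = old | used_mask;    /* oris/ori, then mtspr vrsave,r0 */
    return old;                  /* stw r12,<stack slot> */
}

/* Epilogue: put the caller's VRSAVE back. */
static void vrsave_leave(uint32_t old)
{
    /* A callee only ever adds bits, so the caller's bits must still be set. */
    assert((vrsave & old) == old);
    vrsave = old;                /* lwz r12,<stack slot>; mtspr vrsave,r12 */
}
```

Nesting works because each level only ORs its own bits in and restores exactly what it found, with no conditional anywhere.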

> >OTOH, if the register usage is designed similarly to integer and FP, the
> >bitmask might look like 111...1100...0011...11 (i.e. with at most 2
> >transitions between 0 and 1 in the bit string). It might be worth
> >optimizing the save/restore routine for this case, saving/restoring more
> >registers than necessary when vrsave does not have such a canonical form.
> 
> Hmm, count bits in from the left and right, mask and check for missed bits,
> then branch to either a full save or a left+right save.

Yes, cntlzw on a vrsave copy (after a few simple manipulations) is your
friend. Besides this the ABI separates two ranges: R0 to R13 and R14
to R31 (I could be off by one). Optimize for the common case, find the
first set bit with index >=14 and last set bit with index <=13 and save
only these 2 ranges. Optimizing for more complex cases is not worth the
trouble, just ensure that they work properly. 

> Doing it that way would also somewhat optimize VRSAVE=0, since both the
> leftmost and rightmost bits are 0, it would pass right through the
> left-save and right-save half of the optimized register save.

I would also optimize specifically for vrsave=0: a compare and a
conditional branch are not that costly, especially if the branch is done
well after the compare, with all the bitmap manipulation in between: 

	mfspr	r3,vrsave
	cmpwi	cr1,r3,0
	andis.	r4,r3,0xfffc
	rlwinm	r5,r3,0,0x0003ffff
	neg	r6,r4
	cntlzw	r5,r5		# first register of r14..r31 to save
	and	r4,r4,r6
	cntlzw	r4,r4		# last register of r0..r13 to save
	beq	cr1,nothing_to_save	

It's not finished: you still have to set up registers to address the save
area and compute a branch inside the save routine to actually perform the
save (backwards for r0..r13, forwards for r14..r31). 
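
For readers less fluent in PowerPC assembly, the fragment computes the following, shown here as a minimal C sketch. cntlzw is emulated with a loop, and struct vr_ranges and find_ranges are names made up for this example; the 13/14 split is the one assumed above and could be off by one.

```c
#include <assert.h>
#include <stdint.h>

/* Count leading zeros of a 32-bit word; returns 32 for 0, like cntlzw. */
static unsigned cntlzw(uint32_t x)
{
    unsigned n = 0;
    while (n < 32 && !(x & (0x80000000u >> n)))
        n++;
    return n;
}

struct vr_ranges {
    unsigned last_low;    /* highest marked register in 0..13, 32 if none */
    unsigned first_high;  /* lowest marked register in 14..31, 32 if none */
};

static struct vr_ranges find_ranges(uint32_t vrsave)
{
    uint32_t low  = vrsave & 0xfffc0000u;   /* andis.: bits for 0..13 */
    uint32_t high = vrsave & 0x0003ffffu;   /* rlwinm: bits for 14..31 */
    struct vr_ranges r;

    r.first_high = cntlzw(high);
    /* low & -low isolates the lowest set bit, i.e. the last register. */
    r.last_low = cntlzw(low & (0u - low));

    assert(r.first_high == 32 || r.first_high >= 14);
    assert(r.last_low == 32 || r.last_low <= 13);
    return r;
}
```

For vrsave = 0xE0007000 this yields last_low = 2 and first_high = 17, so the save would cover registers 0..2 and 17..31.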

> Perhaps a little longer than "if (VRSAVE==0) return;", but it's quick
> enough for me.

Probably close to optimal, lazy enough without trying to be too smart
and executing tons of code as the result. Never forget that this code
is unlikely to be found in L1 icache. 

> >Indeed, I had not considered this problem. Note that conditional clearing
> >of most registers can probably be done without conditional branches. Just
> >put a copy of vrsave in one vr and then find a smart way to transform
> >these bits in masks to clear the registers (probably you'll have to splat
> >it first). It won't work for the last register(s) because you need
> >some workspace, however.
> 
> Mmm, the joys of a bitwise AND.

Well, after a more detailed look at AltiVec, what I miss is a shift by an
immediate amount in bits, which would make the code as compact as possible.
There are probably tricks to work around this; I might have started with the
wrong idea of how to implement this...

	Gabriel.


* Re: question about altivec registers
  1999-10-29 12:49                 ` Gabriel Paubert
@ 1999-10-30  4:14                   ` Tony Mantler
  0 siblings, 0 replies; 23+ messages in thread
From: Tony Mantler @ 1999-10-30  4:14 UTC (permalink / raw)
  To: Gabriel Paubert; +Cc: linuxppc-dev


At 7:49 AM -0500 10/29/99, Gabriel Paubert wrote:
>On Thu, 28 Oct 1999, Tony Mantler wrote:
>
>> I suppose I'm a bit too used to 68k stuff, where sorting register usage
>> takes a back seat to efficient register re-use. However, with the size of
>> the data in the Altivec registers, I would expect a bit of optimization to
>> slant away from cases where the registers can be easily sorted and packed.
>
>Things are different when all registers are identical and instructions
>have separate operands for inputs and the output. I've programmed 68k too
>and it's often painful (Intel is worse, to be fair).

I haven't found it too bad. It's rather sensibly designed for its intended
applications, and considering that it was originally laid out way back in
the early 80's (iirc), it's stood the test of time rather well.


[...]
>> I think saving registers in a subroutine is a pain no matter how it's
>> implemented. If the VRSAVE is used as a count, the subroutine still has to
>> save the old value, save the overwritten registers, calculate what the
>> proper new value is (think new < old = oops!) then restore the overwritten
>> registers and old VRSAVE value when it exits.
>
>In the end a bitmap seems the best, since the code can be free of
>conditionals and fairly compact:
>- at start of routine (register numbers chosen randomly):
>	mfspr	r12,vrsave
>	oris	r0,r12,0x....	# mask of used bits
>	ori	r0,r0,0x....	# mask of used bits (only if using vr16-vr31)
>	stw	r12,somewhere on the stack
>	mtspr	vrsave,r0
>
>and the end:
>	lwz	r12,somewhere on the stack
>	mtspr	vrsave,r12

Looks clean enough to me.


[...]
>Yes, cntlzw on a vrsave copy (after a few simple manipulations) is your
>friend. Besides this the ABI separates two ranges: R0 to R13 and R14
>to R31 (I could be off by one). Optimize for the common case, find the
>first set bit with index >=14 and last set bit with index <=13 and save
>only these 2 ranges. Optimizing for more complex cases is not worth the
>trouble, just ensure that they work properly.

Indeed.


>> Doing it that way would also somewhat optimize VRSAVE=0, since both the
>> leftmost and rightmost bits are 0, it would pass right through the
>> left-save and right-save half of the optimized register save.
>
>I would also optimize specifically for vrsave=0: a compare and a
>conditional branch are not that costly, especially if the branch is done
>well after the compare, with all the bitmap manipulation in between:
>
>	mfspr	r3,vrsave
>	cmpwi	cr1,r3,0
>	andis.	r4,r3,0xfffc
>	rlwinm	r5,r3,0,0x0003ffff
>	neg	r6,r4
>	cntlzw	r5,r5		# first register of r14..r31 to save
>	and	r4,r4,r6
>	cntlzw	r4,r4		# last register of r0..r13 to save
>	beq	cr1,nothing_to_save
>
>It's not finished: you still have to set up registers to address the save
>area and compute a branch inside the save routine to actually perform the
>save (backwards for r0..r13, forwards for r14..r31).

Yeah, one extra branch certainly won't kill anyone.


[.. clearing unused registers ..]
>Well, after a more detailed look at AltiVec, what I miss is a shift by an
>immediate amount in bits, which would make the code as compact as possible.
>There are probably tricks to work around this; I might have started with the
>wrong idea of how to implement this...

Hmm, I just re-read the altivec spec sheet and, though I wouldn't call
myself an expert on PPC, it would seem that there are three ways to clear
the registers.

The first way would be to use a bunch of branch conditionals, which we
probably want to avoid.

The second way would be to calculate a 0 or -1 entirely within the vector
unit, which would both use a bunch of vector registers, and probably be
rather messy, as it's not really what the vector unit is designed for.

The third would be to calculate a 0 or -1 in the GPRs, then copy and splat
it into a vector register. Unfortunately it would appear that copying a
value from a GPR to a Vector register can only be done by writing the value
to memory, then reading it back in again, which isn't very pretty at all.
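
The GPR half of that third option is at least easy. A small C sketch of the idea, assuming VRSAVE bit n (counted from the most significant bit) marks vrN; keep_mask and clear_unmarked are made-up names, and ANDing the same word into all four lanes stands in for the splat plus vand.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint32_t w[4]; } vr_t;   /* one 128-bit register, 4 words */

/* 0xffffffff if vrN is marked in vrsave, 0 otherwise, with no branch. */
static uint32_t keep_mask(uint32_t vrsave, unsigned n)
{
    assert(n < 32);
    return 0u - ((vrsave >> (31 - n)) & 1u);
}

/* Zero every register that VRSAVE does not mark.  The loops are just
 * counted loops; no decision is ever taken per register. */
static void clear_unmarked(uint32_t vrsave, vr_t vr[32])
{
    unsigned n, i;

    for (n = 0; n < 32; n++) {
        uint32_t m = keep_mask(vrsave, n);
        for (i = 0; i < 4; i++)
            vr[n].w[i] &= m;
    }
}
```

The expensive part on real hardware would still be getting each mask from the GPR side into a vector register through memory.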


Oh well, time to watch Southpark, filmed in hella-cool ((( Spooooky-vision
))) ;)


--
Tony Mantler         Renaissance Nerd Extraordinaire         eek@escape.ca
Winnipeg, Manitoba, Canada                       http://www.escape.ca/~eek


end of thread, other threads:[~1999-10-30  4:14 UTC | newest]

Thread overview: 23+ messages
1999-10-25 20:51 question about altivec registers Jim Terman
1999-10-25 22:27 ` Claude Robitaille
1999-10-25 22:31   ` Jim Terman
1999-10-25 22:44     ` erik cameron
1999-10-25 23:28     ` Claude Robitaille
     [not found]     ` <Pine.LNX.4.10.9910251916060.5902-100000@modemcable220.93-200-24.mtl.mc.videotron.net>
1999-10-25 23:53       ` Rob Barris
1999-10-26 18:22         ` Geert Uytterhoeven
1999-10-26 22:13           ` Rob Barris
1999-10-26 22:38           ` Tom Vier
1999-10-26 22:03         ` Tom Vier
1999-10-26  4:42 ` Kumar Gala
1999-10-26 21:52   ` Jim Terman
1999-10-26 22:43     ` Kumar Gala
1999-10-27  8:58       ` Adrian Cox
1999-10-27 13:21         ` Gabriel Paubert
1999-10-27 16:05           ` Geert Uytterhoeven
1999-10-27 18:23           ` Kumar Gala
1999-10-27 22:39           ` Tony Mantler
1999-10-28 11:01             ` Gabriel Paubert
1999-10-28 21:20               ` Tony Mantler
1999-10-29 11:58                 ` Benjamin Herrenschmidt
1999-10-29 12:49                 ` Gabriel Paubert
1999-10-30  4:14                   ` Tony Mantler
