linuxppc-dev.lists.ozlabs.org archive mirror
* AltiVec in the kernel
@ 2006-07-18 12:48 Matt Sealey
  2006-07-18 13:53 ` Kumar Gala
  2006-07-18 17:43 ` Paul Mackerras
  0 siblings, 2 replies; 31+ messages in thread
From: Matt Sealey @ 2006-07-18 12:48 UTC (permalink / raw)
  To: linuxppc-dev


Once upon a time we were all told this wouldn't work for some reason,
but a lot of documentation now hints that it does actually work and
for instance there is a RAID5/6 driver (for G5) which uses AltiVec
in a kernel context.

But I didn't find any definitive documentation on how one goes about
it. The largest clue I found was in Documentation/cpu_features.txt:

	#ifdef CONFIG_ALTIVEC
	BEGIN_FTR_SECTION
		mfspr	r22,SPRN_VRSAVE		/* if G4, save vrsave register value */
		stw	r22,THREAD_VRSAVE(r23)
	END_FTR_SECTION_IFSET(CPU_FTR_ALTIVEC)
	#endif /* CONFIG_ALTIVEC */

So we can use AltiVec by implementing this kind of wrapper around
kernel functions which may use AltiVec?

In the code above is there ANY significance of r22 and r23 other
than that they are fairly high up and probably marked as "will
be trashed" by all the relevant ABIs and so?

Just curious, as I would like to look into writing some docs on
this, at least (in article form), to go with PPCZone, Libfreevec
and so on. I think the problem is simply that developers who may
be interested in writing this kind of optimized code do not know
where to start. (We are also thinking in terms of teaching sessions,
like the one we did at FTF Frankfurt 2004: after we teach people what
AltiVec is, we demonstrate application AND kernel functionality and
the quirks associated with it.)

-- 
Matt Sealey <matt@genesi-usa.com>
Manager, Genesi, Developer Relations

* Re: AltiVec in the kernel
  2006-07-18 12:48 AltiVec in the kernel Matt Sealey
@ 2006-07-18 13:53 ` Kumar Gala
  2006-07-18 15:10   ` Matt Sealey
  2006-07-18 17:43 ` Paul Mackerras
  1 sibling, 1 reply; 31+ messages in thread
From: Kumar Gala @ 2006-07-18 13:53 UTC (permalink / raw)
  To: matt; +Cc: linuxppc-dev list, Paul Mackerras


On Jul 18, 2006, at 7:48 AM, Matt Sealey wrote:

>
> Once upon a time we were all told this wouldn't work for some reason,
> but a lot of documentation now hints that it does actually work and
> for instance there is a RAID5/6 driver (for G5) which uses AltiVec
> in a kernel context.

Using AltiVec generally in the kernel is still something that is not
recommended.  The key to using it is disabling preemption; this
ensures that when the code is done, the AltiVec register state is back
to how the kernel found it.

	preempt_disable();
	enable_kernel_altivec();

	/* $# is the unroll factor, filled in when the raid6 template is expanded */
	raid6_altivec$#_gen_syndrome_real(disks, bytes, ptrs);

	preempt_enable();

> But I didn't find any definitive documentation on how one goes about
> it. The largest clue I found was in Documentation/cpu_features.txt:
>
> 	#ifdef CONFIG_ALTIVEC
> 	BEGIN_FTR_SECTION
> 		mfspr	r22,SPRN_VRSAVE		/* if G4, save vrsave register value */
> 		stw	r22,THREAD_VRSAVE(r23)
> 	END_FTR_SECTION_IFSET(CPU_FTR_ALTIVEC)
> 	#endif /* CONFIG_ALTIVEC */
>
> So we can use AltiVec by implementing this kind of wrapper around
> kernel functions which may use AltiVec?
>
> In the code above is there ANY significance of r22 and r23 other
> than that they are fairly high up and probably marked as "will
> be trashed" by all the relevant ABIs and so?

I'd guess those were the registers used by the code this was snipped  
from.

> Just curious, as I would like to investigate writing some docs at
> least on this (in article fashion) to go with PPCZone, Libfreevec
> and so on. I think there is a problem here in that simply developers
> who may be interested in doing this kind of optimized code do not
> know where to start (and we are thinking from a point of view of
> also teaching sessions too, like we did at FTF Frankfurt 2004, so
> after we teach them what AltiVec is etc. we demonstrate application
> AND kernel functionality and the quirks associated with it).

I'm pretty sure Paul looked into using AltiVec for memory operations  
in the kernel and didn't see a significant benefit to it.

- kumar

* RE: AltiVec in the kernel
  2006-07-18 13:53 ` Kumar Gala
@ 2006-07-18 15:10   ` Matt Sealey
  2006-07-18 17:56     ` Paul Mackerras
  2006-07-18 18:39     ` Benjamin Herrenschmidt
  0 siblings, 2 replies; 31+ messages in thread
From: Matt Sealey @ 2006-07-18 15:10 UTC (permalink / raw)
  To: 'Kumar Gala'
  Cc: 'linuxppc-dev list', 'Paul Mackerras'

 

> -----Original Message-----
> From: Kumar Gala [mailto:galak@kernel.crashing.org] 
> Sent: Tuesday, July 18, 2006 8:53 AM
> To: matt@genesi-usa.com
> Cc: linuxppc-dev list; Paul Mackerras
> Subject: Re: AltiVec in the kernel
> 
> 
> On Jul 18, 2006, at 7:48 AM, Matt Sealey wrote:
> 
> > for instance there is a RAID5/6 driver (for G5) which uses 
> > AltiVec in a kernel context.
> 
> Using Altivec generally in the kernel is still something that 
> is not recommended.  The key to using it is in disabling 
> preemption, this ensures that when the code is done the 
> Altivec register state is back to how the kernel found it.
> 
> 	preempt_disable();
> 	enable_kernel_altivec();
> 
> 	raid6_altivec$#_gen_syndrome_real(disks, bytes, ptrs);
> 
> 	preempt_enable();

Why isn't it recommended?

For instance on FreeBSD and other operating systems they have
designed the functionality in there as it would be a feature
people would want to use. QNX uses AltiVec to perform the
context switch and message passing and keep latency down.

Restricting AltiVec to userspace code (applications..) really
means you are barely ever using it. Kernel functions and
drivers are called every second of every day.. it's about
making real use of AltiVec rather than having the unit sit twiddling
its thumbs until you REALLY NEED TO DECODE A JPEG VERY FAST.

There are thousands of things it could be doing. One example
could be.. in-kernel compression and encryption subroutines.

> > teach them what AltiVec is etc. we demonstrate application 
> > AND kernel  functionality and the quirks associated with it).
> 
> I'm pretty sure Paul looked into using AltiVec for memory 
> operations in the kernel and didn't see a significant benefit to it.

We had our own guy look at it and he presented some significant
performance improvements. One problem was, though, that the best
improvement in theory came from a function which needed to be
called very early in kernel boot, well before AltiVec was
enabled, and everything else is marginal at best (1.n times
improvement, but it is still 0.n more than 1.0). I am not clear
on this and cannot find my discussion on the subject in my logs
and email backups, so I will leave it for now.

There is also plenty of example code (libmotovec, Freescale
Application Notes) which improve things like TCP checksumming
and so on using AltiVec. These patches are even used in EEMBC
benchmarks to boost the scores.

There are also plenty of examples of userspace code (as before,
checksumming, encryption, compression/decompression) which has
been improved. libfreevec includes some changes to the zlib
window functions. The kernel, for its part, includes MD5, SHA and
zlib compression frameworks, mostly ported userspace code and
standard libraries. Would these not be candidates? The development
and speed improvements are even capable of being tested in
userspace (and this is a GREAT teaching aid also; show how to
improve some userspace app. Then show the differences it needed
to go into the kernel. Benchmark both. Detail result.)

I think there are thousands of places where AltiVec could be
used - even sparingly - to provide good performance improvements.
From your reply I suspect that these would be places which do
not rely on the effects preemption has on performance (i.e.
you trade preemption for AltiVec and gain).

I don't think people investigate it too much because the first
thing they hit is lack of documentation, and then "well we don't
really recommend it". I think this makes Linux the worst OS a
developer would want to run on a G4 and G5, then? :D

-- 
Matt Sealey <matt@genesi-usa.com>
Manager, Genesi, Developer Relations

* Re: AltiVec in the kernel
  2006-07-18 12:48 AltiVec in the kernel Matt Sealey
  2006-07-18 13:53 ` Kumar Gala
@ 2006-07-18 17:43 ` Paul Mackerras
  1 sibling, 0 replies; 31+ messages in thread
From: Paul Mackerras @ 2006-07-18 17:43 UTC (permalink / raw)
  To: matt; +Cc: linuxppc-dev

Matt Sealey writes:

> Once upon a time we were all told this wouldn't work for some reason,
> but a lot of documentation now hints that it does actually work and
> for instance there is a RAID5/6 driver (for G5) which uses AltiVec
> in a kernel context.

It's possible, with some restrictions, basically the same restrictions
as on using floating point in the kernel.

Kernel use of altivec interacts with the lazy altivec context switch
that we do on UP kernels, and the fact that the kernel context switch
doesn't save/restore the altivec state.  That means that before using
altivec in the kernel you may have to save away the altivec state, and
you have to make sure you don't sleep or get preempted while using
altivec.

> But I didn't find any definitive documentation on how one goes about
> it. The largest clue I found was in Documentation/cpu_features.txt:
> 
> 	#ifdef CONFIG_ALTIVEC
> 	BEGIN_FTR_SECTION
> 		mfspr	r22,SPRN_VRSAVE		/* if G4, save vrsave register value */
> 		stw	r22,THREAD_VRSAVE(r23)
> 	END_FTR_SECTION_IFSET(CPU_FTR_ALTIVEC)
> 	#endif /* CONFIG_ALTIVEC */
> 
> So we can use AltiVec by implementing this kind of wrapper around
> kernel functions which may use AltiVec?

No, that's irrelevant; that just has to do with the VRSAVE register,
not the altivec state.  In fact VRSAVE isn't actually even part of the
altivec state.

> In the code above is there ANY significance of r22 and r23 other
> than that they are fairly high up and probably marked as "will
> be trashed" by all the relevant ABIs and so?

I hope we do a bit better than "probably" ... :)  No, there is no
particular significance to the choice of r22 and r23.  If you read the
code you will see that those registers are saved at the beginning of
the context switch routine and restored (from the new process's stack)
at the end.

Paul.

* RE: AltiVec in the kernel
  2006-07-18 15:10   ` Matt Sealey
@ 2006-07-18 17:56     ` Paul Mackerras
  2006-07-19 18:10       ` Linas Vepstas
  2006-07-18 18:39     ` Benjamin Herrenschmidt
  1 sibling, 1 reply; 31+ messages in thread
From: Paul Mackerras @ 2006-07-18 17:56 UTC (permalink / raw)
  To: matt; +Cc: 'linuxppc-dev list'

Matt Sealey writes:

> Why isn't it recommended?

Because the overhead of saving away the user altivec state and
restoring it can easily overwhelm any advantage you get from using
altivec.
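The overhead argument can be put as a break-even calculation. This is a minimal C model, assuming a fixed per-call overhead for saving the user's vector state and reloading it later, plus per-byte costs for a scalar and a vector version of some routine. The function name and every cycle figure are hypothetical inputs, not measurements; only the register count (32 registers of 128 bits, 512 bytes) is architectural.

```c
#include <assert.h>

/* AltiVec architectural state: 32 vector registers of 16 bytes each
 * (VSCR is ignored in this model). */
#define VR_COUNT 32
#define VR_BYTES 16
#define VEC_STATE_BYTES (VR_COUNT * VR_BYTES)   /* 512 bytes */

/*
 * Smallest buffer size n (bytes) for which
 *     n * scalar_cpb > overhead + n * vector_cpb,
 * i.e. the vector routine plus the cost of saving and later reloading
 * the user's vector state beats the scalar routine. Returns 0 if the
 * vector routine is never faster per byte (e.g. both are memory-bound).
 * All inputs are hypothetical cycle counts supplied by the caller.
 */
static unsigned long breakeven_bytes(double overhead_cycles,
                                     double scalar_cpb, double vector_cpb)
{
    double gain = scalar_cpb - vector_cpb;   /* cycles saved per byte */

    assert(overhead_cycles >= 0.0);
    if (gain <= 0.0)
        return 0;
    /* first integer n with n * gain strictly greater than the overhead */
    return (unsigned long)(overhead_cycles / gain) + 1;
}
```

For example, with a made-up overhead of 1024 cycles (one cycle per byte to store the 512-byte state, one to reload it) and a vector routine twice as fast per byte, `breakeven_bytes(1024.0, 1.0, 0.5)` returns 2049, so any call on a smaller buffer loses. When both versions are limited by memory bandwidth the per-byte costs converge and the function returns 0.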

> We had our own guy look at it and he presented some significant
> performance improvements. One problem was, though, that the best
> improvement in theory came from a function which needed to be
> called very early in kernel boot, well before AltiVec was
> enabled, and everything else is marginal at best (1.n times
> improvement, but it is still 0.n more than 1.0). I am not clear
> on this and cannot find my discussion on the subject in my logs
> and email backups, so. I will leave it for now. 

I tried using altivec for memory copies, and while I was able to show
an improvement in speed of copying stuff that was hot in the cache,
there was no overall improvement in the context of everything else the
kernel does.  In other words, the things being copied were generally
not hot in the cache, and the CPU was able to saturate the memory
bandwidth using ordinary loads and stores.

> There is also plenty of example code (libmotovec, Freescale
> Application Notes) which improve things like TCP checksumming
> and so on using AltiVec. These patches are even used in EEMBC
> benchmarks to boost the scores.

TCP checksumming is simple enough that it is limited by memory
bandwidth rather than computation speed.  This is another example
where you can show an improvement on a microbenchmark because the data
is hot in the cache, but the improvement doesn't translate into any
real improvement in a real application.
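The checksum case is easy to see in code. This is a plain-C sketch of the standard ones'-complement Internet checksum from RFC 1071, not the kernel's own optimized checksum routines; the name `inet_checksum` is chosen for the example. Each 16-bit word costs one load and one add, so when the data is not already in the cache the loop waits on memory, and a vector unit would mostly be waiting too.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * RFC 1071 Internet checksum: ones'-complement sum of 16-bit big-endian
 * words, then complemented. One load and one add per two bytes, so on
 * data that is not already in the cache the loop is limited by memory
 * bandwidth, not by the adder.
 */
static uint16_t inet_checksum(const uint8_t *data, size_t len)
{
    uint64_t sum = 0;   /* wide accumulator so carries are never lost */
    size_t i;

    assert(data != NULL || len == 0);
    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)data[i] << 8) | data[i + 1];
    if (len & 1)                        /* odd trailing byte, zero-padded */
        sum += (uint32_t)data[len - 1] << 8;

    while (sum >> 16)                   /* fold carries back into 16 bits */
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}
```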

> There is also plenty of examples of userspace code (as before,
> checksumming, encryption, compression/decompression) which has
> been improved. libfreevec includes some changes to the zlib
> window functions. For example the kernel includes an MD5, SHA,
> zlib compression framework.. mostly ported userspace code and
> standard libraries. Would these not be candidates?

A lot of compression and encryption algorithms, by their very nature,
are very difficult to parallelize enough to get any significant
improvement from altivec.  I looked at SHA1 for instance, and the
sequential dependencies in the computation are such that it is
practically impossible to find a way to do 4 things in parallel.  The
sequential dependencies are of course a critical part of the way that
SHA1 ensures that a small change in any part of the input data results
in substantial changes in every byte of the output.
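The kind of sequential dependency meant here shows up even in a trivial hash. The sketch uses 32-bit FNV-1a, chosen only because it fits in a few lines; it is a stand-in, not SHA-1. Every iteration consumes the state the previous one produced, so the bytes of one stream cannot be handed out four at a time to separate vector lanes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * 32-bit FNV-1a, used only as a stand-in for the serial structure of
 * SHA-1: step i needs the value of h produced by step i-1, so there is
 * no independent work within one stream to spread across lanes.
 */
static uint32_t fnv1a32(const uint8_t *data, size_t len)
{
    uint32_t h = 2166136261u;          /* FNV offset basis */
    size_t i;

    assert(data != NULL || len == 0);
    for (i = 0; i < len; i++) {
        h ^= data[i];                  /* needs the previous h ... */
        h *= 16777619u;                /* ... and so does this (FNV prime) */
    }
    return h;
}
```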

> I think there are thousands of places where AltiVec could be
> used - even sparingly - to provide good performance improvements.

I think that there are actually very few places in the kernel where we
are doing something which is parallelizable, sufficiently
compute-intensive, and not bound by memory bandwidth, to be worth
using altivec.

Paul.

* RE: AltiVec in the kernel
  2006-07-18 15:10   ` Matt Sealey
  2006-07-18 17:56     ` Paul Mackerras
@ 2006-07-18 18:39     ` Benjamin Herrenschmidt
  1 sibling, 0 replies; 31+ messages in thread
From: Benjamin Herrenschmidt @ 2006-07-18 18:39 UTC (permalink / raw)
  To: matt; +Cc: 'linuxppc-dev list', 'Paul Mackerras'


> I don't think people investigate it too much because the first
> thing they hit is lack of documentation, and then "well we don't
> really recommend it". I think this makes Linux the worst OS a
> developer would want to run on a G4 and G5, then? :D

It's not recommended for the same reason the FPU isn't used in the
kernel, and x86 doesn't use SSE / MMX there either, except in a few
places where it does make sense like the RAID code. It's possible that
it might be interesting to do it for some of the crypto modules as well
and we certainly welcome any patch using altivec to improve some other
aspect of the kernel provided that it does indeed... improve
performance :)

Part of the problem is the cost of enabling/disabling it and
saving/restoring the vector registers that get clobbered when using it.

Essentially, kernel entry only saves and restores GPRs: not FPRs,
not VRs. This is done to keep the cost of kernel entry low. It means
that at any given point in time, the altivec and FPU units contain
whatever context was last used by userland. If the kernel wants to use
one of them for its own purposes, it thus needs to flush that context to
the thread struct (which also means that the unit will be disabled on
the way back to userland and re-faulted in when used again). That's
what enable_kernel_altivec() does (and the similar enable_kernel_fp()).
This cannot be done at interrupt time, though, and you shouldn't be
holding locks, so it may be a problem with some of the crypto code, as I
think it can be used in some weird code paths. It's also important that
no scheduling happens until you are done with the unit, which is why you
have to disable preemption, since otherwise the unit could be re-used
by userland behind your back.
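The lazy scheme can be modelled in a few lines of ordinary C. This is a toy userspace model, not kernel code: `hw_vr`, `vec_owner`, `giveup_vec()`, `kernel_enable_vec()` and `user_vec_fault()` are hypothetical names standing in for the live register file, the lazy-owner tracking done on UP kernels, and the fault taken when userland touches a disabled unit. It shows why the flush matters: the kernel may clobber the registers only after the user's values are safe in the owner's save area.

```c
#include <assert.h>
#include <string.h>

#define NVR 32                       /* AltiVec has 32 vector registers */

typedef struct { unsigned char b[16]; } vr_t;   /* one 128-bit register */

struct thread {                      /* hypothetical per-thread save area */
    vr_t vr[NVR];
};

static vr_t hw_vr[NVR];              /* stands in for the live registers */
static struct thread *vec_owner;     /* user thread whose state is live */

/* Save the live user state into its owner's thread struct and forget the
 * owner, so the next user access has to fault the state back in. */
static void giveup_vec(void)
{
    if (vec_owner != NULL)
        memcpy(vec_owner->vr, hw_vr, sizeof hw_vr);
    vec_owner = NULL;
}

/* Kernel wants the unit: flush the user context first. After this the
 * kernel may clobber hw_vr, but must not be preempted until it is done,
 * or a user thread could fault its state in underneath it. */
static void kernel_enable_vec(void)
{
    giveup_vec();
}

/* A user thread touches the (disabled) unit: lazily reload its state. */
static void user_vec_fault(struct thread *t)
{
    assert(t != NULL);
    if (vec_owner != t) {
        giveup_vec();
        memcpy(hw_vr, t->vr, sizeof hw_vr);
        vec_owner = t;
    }
}
```

In use: a user thread faults its registers in, the kernel calls `kernel_enable_vec()` and scribbles on `hw_vr`, and the thread's next fault restores its original values from the save area. Being preempted while the kernel is still using the registers would let a user thread load its state into them, and the two would then clobber each other.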

Another alternative which can work at interrupt time, but requires a bit
of assembly hackery, is to manually enable MSR:VEC (if not already set)
and save and restore all the altivec registers modified by the code.

Ben.

* Re: AltiVec in the kernel
  2006-07-18 17:56     ` Paul Mackerras
@ 2006-07-19 18:10       ` Linas Vepstas
  2006-07-19 18:19         ` Paul Mackerras
  2006-07-20 12:31         ` Matt Sealey
  0 siblings, 2 replies; 31+ messages in thread
From: Linas Vepstas @ 2006-07-19 18:10 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: 'linuxppc-dev list'


On Wed, Jul 19, 2006 at 03:56:10AM +1000, Paul Mackerras wrote:
> A lot of compression and encryption algorithms, by their very nature,
> are very difficult to parallelize enough to get any significant
> improvement from altivec.  I looked at SHA1 for instance, and the
> sequential dependencies in the computation are such that it is
> practically impossible to find a way to do 4 things in parallel.  The
> sequential dependencies are of course a critical part of the way that
> SHA1 ensures that a small change in any part of the input data results
> in substantial changes in every byte of the output.

But perhaps, in principle, couldn't one run four independent streams 
in parallel?  Thus, for example, on an SSL-enabled web server, one 
could service multiple encryption/decryption threads at once. 

In practice, I don't believe the infrastructure for that kind of
parallelism is in place. I'm struggling to find a reason to develop
that kind of infrastructure. Mumble something about Cell. 

> I think that there are actually very few places in the kernel where we
> are doing something which is parallelizable, sufficiently
> compute-intensive, and not bound by memory bandwidth, to be worth
> using altivec.

Yes.

As to non-kernel applications, is there anything for GMP (the 
GNU Multi-Precision library, an arbitrary-precision math library)
on the Altivec? How about the Cell?

--linas

* Re: AltiVec in the kernel
  2006-07-19 18:10       ` Linas Vepstas
@ 2006-07-19 18:19         ` Paul Mackerras
  2006-07-19 18:38           ` Johannes Berg
  2006-07-20 12:31         ` Matt Sealey
  1 sibling, 1 reply; 31+ messages in thread
From: Paul Mackerras @ 2006-07-19 18:19 UTC (permalink / raw)
  To: Linas Vepstas; +Cc: 'linuxppc-dev list'

Linas Vepstas writes:

> But perhaps, in principle, couldn't one run four independent streams 
> in parallel?  Thus, for example, on an SSL-enabled web server, one 
> could service multiple encryption/decryption threads at once. 

Generally that would work.  If one had 4 separate streams to compute a
SHA1 of, one could do all 4 at once with altivec.  It would have to be
4 separate streams though, not 4 parts of a single stream.
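The "four separate streams" case can be sketched as four hash states advanced in lock-step. FNV-1a stands in for SHA-1 purely to keep the example short, and `fnv1a32_x4` is a hypothetical helper written in plain C rather than AltiVec: what matters is that each step applies the same operation to every lane and no lane reads another lane's state. (FNV-1a's 32-bit multiply is only a placeholder; SHA-1 is built from 32-bit adds, rotates and logical operations, which is what would map onto vector instructions.)

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LANES 4

/* Scalar 32-bit FNV-1a, the per-stream reference. */
static uint32_t fnv1a32(const uint8_t *p, size_t len)
{
    uint32_t h = 2166136261u;
    size_t i;
    for (i = 0; i < len; i++) {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

/*
 * Hash LANES independent, equal-length streams together. The inner loop
 * applies one operation to all lanes per step, the pattern a SIMD unit
 * exploits; each lane depends only on its own previous state.
 */
static void fnv1a32_x4(const uint8_t *const streams[LANES], size_t len,
                       uint32_t out[LANES])
{
    uint32_t h[LANES];
    size_t i;
    int l;

    assert(streams != NULL && out != NULL);
    for (l = 0; l < LANES; l++)
        h[l] = 2166136261u;
    for (i = 0; i < len; i++)
        for (l = 0; l < LANES; l++) {   /* no cross-lane dependency */
            h[l] ^= streams[l][i];
            h[l] *= 16777619u;
        }
    for (l = 0; l < LANES; l++)
        out[l] = h[l];
}
```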

> As to non-kernel applications, is there anything for GMP (the 
> Gnu Multi-Precision library, an arbitrary-precision math library)
> on the Altivec? How about the Cell?

I don't really know, sorry.

Paul.

* Re: AltiVec in the kernel
  2006-07-19 18:19         ` Paul Mackerras
@ 2006-07-19 18:38           ` Johannes Berg
  2006-07-19 18:57             ` Linas Vepstas
  0 siblings, 1 reply; 31+ messages in thread
From: Johannes Berg @ 2006-07-19 18:38 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: 'linuxppc-dev list'

On Thu, 2006-07-20 at 04:19 +1000, Paul Mackerras wrote:
> Linas Vepstas writes:
> 
> > But perhaps, in principle, couldn't one run four independent streams 
> > in parallel?  Thus, for example, on an SSL-enabled web server, one 
> > could service multiple encryption/decryption threads at once. 
> 
> Generally that would work.  If one had 4 separate streams to compute a
> SHA1 of, one could do all 4 at once with altivec.  It would have to be
> 4 separate streams though, not 4 parts of a single stream.

I'd think it'd be pretty hard to get a real benefit from this because
the data is going to come from 4 totally different places, hence you
can't just load a single vector register and get data for all 4
streams...

johannes

* Re: AltiVec in the kernel
  2006-07-19 18:38           ` Johannes Berg
@ 2006-07-19 18:57             ` Linas Vepstas
  0 siblings, 0 replies; 31+ messages in thread
From: Linas Vepstas @ 2006-07-19 18:57 UTC (permalink / raw)
  To: Johannes Berg; +Cc: 'linuxppc-dev list', Paul Mackerras

On Wed, Jul 19, 2006 at 08:38:21PM +0200, Johannes Berg wrote:
> On Thu, 2006-07-20 at 04:19 +1000, Paul Mackerras wrote:
> > Linas Vepstas writes:
> > 
> > > But perhaps, in principle, couldn't one run four independent streams 
> > > in parallel?  Thus, for example, on an SSL-enabled web server, one 
> > > could service multiple encryption/decryption threads at once. 
> > 
> > Generally that would work.  If one had 4 separate streams to compute a
> > SHA1 of, one could do all 4 at once with altivec.  It would have to be
> > 4 separate streams though, not 4 parts of a single stream.
> 
> I'd think it'd be pretty hard to get a real benefit from this because
> the data is going to come from 4 totally different places, hence you
> can't just load a single vector register and get data for all 4
> streams...

Dohh. Right. I actually thought that while writing the email, and then
it evaporated from my head before I hit the send button.  One would 
have to copy the incoming data into vectors, and the mem access latency
would probably overwhelm the performance gain (per "hot cache" as Paul
discussed).
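That extra pass can be sketched in C: before a lane-parallel hash can fetch one word from each of four streams with a single 16-byte load, the words have to be interleaved into one buffer. `interleave_x4` is a hypothetical helper; the point is that it reads and writes every byte one more time, and if the four sources are cold in the cache, that traffic is exactly the memory-latency cost described here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LANES 4

/*
 * Interleave 32-bit words from LANES separate buffers so that
 * out[4*i .. 4*i+3] holds word i of streams 0..3: the layout in which
 * one 16-byte vector load would see one word from each stream.
 * Every input word is copied once more before any hashing starts.
 */
static void interleave_x4(const uint32_t *const streams[LANES],
                          size_t nwords, uint32_t *out)
{
    size_t i;
    int l;

    assert(streams != NULL && out != NULL);
    for (i = 0; i < nwords; i++)
        for (l = 0; l < LANES; l++)
            out[i * LANES + l] = streams[l][i];
}
```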

--linas

* RE: AltiVec in the kernel
  2006-07-19 18:10       ` Linas Vepstas
  2006-07-19 18:19         ` Paul Mackerras
@ 2006-07-20 12:31         ` Matt Sealey
  2006-07-20 13:23           ` Kumar Gala
  2006-07-20 17:42           ` Linas Vepstas
  1 sibling, 2 replies; 31+ messages in thread
From: Matt Sealey @ 2006-07-20 12:31 UTC (permalink / raw)
  To: 'Linas Vepstas', 'Paul Mackerras'
  Cc: 'linuxppc-dev list'



> But perhaps, in principle, couldn't one run four independent 
> streams in parallel?  Thus, for example, on an SSL-enabled 
> web server, one could service multiple encryption/decryption 
> threads at once. 
> 
> In practice, I don't believe the infrastructure for that kind 
> of parallelism is in place. I'm struggling to find a reason 
> to develop that kind of infrastructure. Mumble something about Cell. 

If not AltiVec there is potential to use some features which come
with AltiVec like the data stream functionality. Or even the standard
PPC cache control stuff would work.

What's the case in the kernel for the memcpy functions etc., are
they optimized for doing things like longword copies rather than
byte-per-byte etc.? We found glibc sucked for that.

-- 
Matt Sealey <matt@genesi-usa.com>
Manager, Genesi, Developer Relations

* Re: AltiVec in the kernel
  2006-07-20 12:31         ` Matt Sealey
@ 2006-07-20 13:23           ` Kumar Gala
  2006-07-20 13:33             ` Matt Sealey
  2006-07-20 17:42           ` Linas Vepstas
  1 sibling, 1 reply; 31+ messages in thread
From: Kumar Gala @ 2006-07-20 13:23 UTC (permalink / raw)
  To: matt; +Cc: 'Paul Mackerras', 'linuxppc-dev list'


On Jul 20, 2006, at 7:31 AM, Matt Sealey wrote:

>
>
>> But perhaps, in principle, couldn't one run four independent
>> streams in parallel?  Thus, for example, on an SSL-enabled
>> web server, one could service multiple encryption/decryption
>> threads at once.
>>
>> In practice, I don't believe the infrastructure for that kind
>> of parallelism is in place. I'm struggling to find a reason
>> to develop that kind of infrastructure. Mumble something about Cell.
>
> If not AltiVec there is potential to use some features which come
> with AltiVec like the data stream functionality. Or even the standard
> PPC cache control stuff would work.
>
> What's the case in the kernel for the memcpy functions etc., are
> they optimized for doing things like longword copies rather than
> byte-per-byte etc.? We found glibc sucked for that.

Matt, can I ask what exactly you are trying to accomplish?  There is  
a lot of work put into the kernel to ensure things are optimized.   
I'd say far more so than gets put into user space.

- k

* RE: AltiVec in the kernel
  2006-07-20 13:23           ` Kumar Gala
@ 2006-07-20 13:33             ` Matt Sealey
  0 siblings, 0 replies; 31+ messages in thread
From: Matt Sealey @ 2006-07-20 13:33 UTC (permalink / raw)
  To: 'Kumar Gala'
  Cc: 'Paul Mackerras', 'linuxppc-dev list'



> Matt, can I ask what exactly you are trying to accomplish?  There is  
> a lot of work put into the kernel to ensure things are optimized.   
> I'd say far more so than gets put into user space.

Just trying to find out what the general state of play is.

-- 
Matt Sealey <matt@genesi-usa.com>
Manager, Genesi, Developer Relations

* Re: AltiVec in the kernel
  2006-07-20 12:31         ` Matt Sealey
  2006-07-20 13:23           ` Kumar Gala
@ 2006-07-20 17:42           ` Linas Vepstas
  2006-07-20 18:47             ` Brian D. Carlstrom
  1 sibling, 1 reply; 31+ messages in thread
From: Linas Vepstas @ 2006-07-20 17:42 UTC (permalink / raw)
  To: Matt Sealey; +Cc: 'linuxppc-dev list', 'Paul Mackerras'

On Thu, Jul 20, 2006 at 07:31:32AM -0500, Matt Sealey wrote:
> 
> What's the case in the kernel for the memcpy functions etc., are
> they optimized for doing things like longword copies rather than
> byte-per-byte etc.? 

arch/powerpc/lib/copy_32.S
arch/powerpc/lib/memcpy_64.S

Looks pretty darned optimized to me.
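As a rough illustration of what "longword copies rather than byte-per-byte" means, here is a generic C sketch of a word-at-a-time copy. It is not a transcription of copy_32.S or memcpy_64.S, which are hand-written assembly and do considerably more; the name `word_copy` and the choice of `unsigned long` as the word type are assumptions of the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef unsigned long word_t;   /* assumed "machine word" for the sketch */

/*
 * Copy n bytes: single bytes until dst is word-aligned, then whole words,
 * then the remaining tail bytes. Each word is moved with a fixed-size
 * memcpy(), which optimizing compilers normally turn into one load and
 * one store; this keeps the sketch free of alignment and strict-aliasing
 * undefined behaviour when src is not aligned the same way as dst.
 */
static void *word_copy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    assert((dst != NULL && src != NULL) || n == 0);

    while (n > 0 && (uintptr_t)d % sizeof(word_t) != 0) {   /* head */
        *d++ = *s++;
        n--;
    }
    while (n >= sizeof(word_t)) {                             /* body */
        word_t w;
        memcpy(&w, s, sizeof w);
        memcpy(d, &w, sizeof w);
        d += sizeof w;
        s += sizeof w;
        n -= sizeof w;
    }
    while (n-- > 0)                                           /* tail */
        *d++ = *s++;
    return dst;
}
```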

> We found glibc sucked for that.

Only because someone was asleep at the wheel, or there was a bug. 

When glibc gets ported to a new architecture, one of the earliest 
tasks is to create optimized versions of memcpy and the like. 
Presumably, on powerpc, this would have been done more than a 
decade ago; it's hard for me to imagine that there'd be a problem 
there.  Now, I haven't looked at the code, but I just can't imagine 
how this would not have been found and fixed by now. Is there
really a problem with glibc performance on powerpc? I mean,
this is a pretty serious accusation, and something that should 
be fixed asap.

--linas

* Re: AltiVec in the kernel
  2006-07-20 17:42           ` Linas Vepstas
@ 2006-07-20 18:47             ` Brian D. Carlstrom
  2006-07-20 19:05               ` Olof Johansson
  0 siblings, 1 reply; 31+ messages in thread
From: Brian D. Carlstrom @ 2006-07-20 18:47 UTC (permalink / raw)
  To: Linas Vepstas; +Cc: 'Paul Mackerras', 'linuxppc-dev list'

At Thu, 20 Jul 2006 12:42:55 -0500,
Linas Vepstas wrote:
> > We found glibc sucked for that.
> 
> Only because someone was asleep at the wheel, or there was a bug. 
> 
> When glibc gets ported to a new architecture, one of the earliest 
> tasks is to create optimized versions of memcpy and the like. 
> Presumably, on powerpc, this would have been done more than a 
> decade ago; it's hard for me to imagine that there'd be a problem 
> there.  Now, I haven't looked at the code, but I just can't imagine 
> how this would not have been found and fixed by now. Is there
> really a problem with glibc performance on powerpc? I mean,
> this is a pretty serious accusation, and something that should 
> be fixed asap.

In the course of my work, I use powerpc architecture simulators. When
working on Mac OS X with a G5, I had to implement some of the basic
AltiVec instructions specifically for use by its libc memcpy
implementation. A quick grep for memcpy in the recent glibc sources on
my linux/ppc box seems to show nowhere near that level of optimization,
but I admit that I could have missed something. However, I would not be
surprised if glibc avoided AltiVec-specific optimizations, since they
would add to the complexity of supporting various architectures with one
binary. On Mac OS X, libc actually delegated a small number of libc
calls such as memcpy via a kernel-managed page at the end of the address
space that set up which routines to use based on the currently running
architecture.
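The Mac OS X arrangement amounts to choosing a routine per CPU at run time rather than at build time. This is a minimal in-process sketch of that idea only, with hypothetical names (`copy_generic`, `copy_vector`, `select_copy`) and the feature flag passed in rather than detected; neither the Mac OS X kernel-managed page nor glibc's per-platform library directories work literally like this.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef void *(*copy_fn)(void *dst, const void *src, size_t n);

/* Generic fallback: a simple byte loop that runs on any CPU. */
static void *copy_generic(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    assert((dst != NULL && src != NULL) || n == 0);
    while (n--)
        *d++ = *s++;
    return dst;
}

/* Hypothetical vector version; a real one would use AltiVec here. */
static void *copy_vector(void *dst, const void *src, size_t n)
{
    return memcpy(dst, src, n);
}

static copy_fn chosen_copy = copy_generic;

/* Choose an implementation once, at startup. How `has_vector` would be
 * discovered on a real system is left out; here it is a parameter. */
static void select_copy(int has_vector)
{
    chosen_copy = has_vector ? copy_vector : copy_generic;
}
```

Callers then go through `chosen_copy` everywhere, so the check is paid once rather than on every call.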

-bri

* Re: AltiVec in the kernel
  2006-07-20 18:47             ` Brian D. Carlstrom
@ 2006-07-20 19:05               ` Olof Johansson
  2006-07-20 21:56                 ` Brian D. Carlstrom
  0 siblings, 1 reply; 31+ messages in thread
From: Olof Johansson @ 2006-07-20 19:05 UTC (permalink / raw)
  To: Brian D. Carlstrom; +Cc: 'Paul Mackerras', 'linuxppc-dev list'

On Thu, Jul 20, 2006 at 11:47:04AM -0700, Brian D. Carlstrom wrote:

> A quick grep memcpy in the recent glibc sources on my linux/ppc box
> seems to show nowhere near that level of optimization, but I admit
> that I could have missed something.

http://penguinppc.org/dev/glibc/glibc-powerpc-cpu-addon.html


-Olof

* Re: AltiVec in the kernel
  2006-07-20 19:05               ` Olof Johansson
@ 2006-07-20 21:56                 ` Brian D. Carlstrom
  2006-07-20 22:39                   ` Daniel Ostrow
                                     ` (3 more replies)
  0 siblings, 4 replies; 31+ messages in thread
From: Brian D. Carlstrom @ 2006-07-20 21:56 UTC (permalink / raw)
  To: Olof Johansson; +Cc: 'Paul Mackerras', 'linuxppc-dev list'

At Thu, 20 Jul 2006 14:05:23 -0500, Olof Johansson wrote:
> On Thu, Jul 20, 2006 at 11:47:04AM -0700, Brian D. Carlstrom wrote:
> > A quick grep memcpy in the recent glibc sources on my linux/ppc box
> > seems to show nowhere near that level of optimization, but I admit
> > that I could have missed something.
> 
> http://penguinppc.org/dev/glibc/glibc-powerpc-cpu-addon.html

Very interesting. According to that page, the memcpy optimizations seem
to use 64-bit operations, and 128-bit AltiVec operations are
still being solicited.

I was encouraged to see the following: 

    If you need to build generic distributions (supporting several
    <cpu_types>) you can leverage the dl_procinfo support built into
    glibc. This mechanism allows for multiple versions of the core
    libraries (libc, libm, librt, libpthread, libpthread_db) to be
    stored in hardware/platform specific subdirectories under /lib[64].

However, I'm guessing this addon is not something found in common
distributions for PowerPC like Debian, Fedora, Gentoo, Ubuntu, ...

-bri

* Re: AltiVec in the kernel
  2006-07-20 21:56                 ` Brian D. Carlstrom
@ 2006-07-20 22:39                   ` Daniel Ostrow
  2006-07-21  6:35                   ` Olof Johansson
                                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 31+ messages in thread
From: Daniel Ostrow @ 2006-07-20 22:39 UTC (permalink / raw)
  To: Brian D. Carlstrom
  Cc: Olof Johansson, 'linuxppc-dev list',
	'Paul Mackerras'

On Thu, 2006-07-20 at 14:56 -0700, Brian D. Carlstrom wrote:
> At Thu, 20 Jul 2006 14:05:23 -0500, Olof Johansson wrote:
> > On Thu, Jul 20, 2006 at 11:47:04AM -0700, Brian D. Carlstrom wrote:
> > > A quick grep memcpy in the recent glibc sources on my linux/ppc box
> > > seems to show nowhere near that level of optimization, but I admit
> > > that I could have missed something.
> > 
> > http://penguinppc.org/dev/glibc/glibc-powerpc-cpu-addon.html
> 
> Very interesting. According to that page, the memcpy optimizations seem
> to be using 64-bit operations and that 128-bit AltiVec operations are
> still being solicited. 
> 
> I was encouraged to see the following: 
> 
>     If you need to build generic distributions (supporting several
>     <cpu_types>) you can leverage the dl_procinfo support built into
>     glibc. This mechanism allows for multiple versions of the core
>     libraries (libc, libm, librt, libpthread, libpthread_db) to be
>     stored in hardware/platform specific subdirectories under /lib[64].
> 
> However, I'm guessing this addon is not something found in common
> distributions for PowerPC like Debian, Fedora, Gentoo, Ubuntu, ...

It has been part of Gentoo's glibc since 2.4 came out.

--Dan

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: AltiVec in the kernel
  2006-07-20 21:56                 ` Brian D. Carlstrom
  2006-07-20 22:39                   ` Daniel Ostrow
@ 2006-07-21  6:35                   ` Olof Johansson
  2006-07-21 14:42                   ` Matt Sealey
  2006-07-21 22:21                   ` Peter Bergner
  3 siblings, 0 replies; 31+ messages in thread
From: Olof Johansson @ 2006-07-21  6:35 UTC (permalink / raw)
  To: Brian D. Carlstrom; +Cc: 'linuxppc-dev list', 'Paul Mackerras'

On Thu, Jul 20, 2006 at 02:56:33PM -0700, Brian D. Carlstrom wrote:

> However, I'm guessing this addon is not something found in common
> distributions for PowerPC like Debian, Fedora, Gentoo, Ubuntu, ...

There's always lead time to get things into distros; that would be just
as true if you modified glibc itself instead.


-Olof

^ permalink raw reply	[flat|nested] 31+ messages in thread

* RE: AltiVec in the kernel
  2006-07-20 21:56                 ` Brian D. Carlstrom
  2006-07-20 22:39                   ` Daniel Ostrow
  2006-07-21  6:35                   ` Olof Johansson
@ 2006-07-21 14:42                   ` Matt Sealey
  2006-07-21 16:51                     ` Linas Vepstas
  2006-07-21 22:21                   ` Peter Bergner
  3 siblings, 1 reply; 31+ messages in thread
From: Matt Sealey @ 2006-07-21 14:42 UTC (permalink / raw)
  To: 'Brian D. Carlstrom', 'Olof Johansson'
  Cc: 'linuxppc-dev list', 'Paul Mackerras'

 

> -----Original Message-----
> From: linuxppc-dev-bounces+matt=genesi-usa.com@ozlabs.org 
> [mailto:linuxppc-dev-bounces+matt=genesi-usa.com@ozlabs.org] 
> On Behalf Of Brian D. Carlstrom
> Sent: Thursday, July 20, 2006 4:57 PM
> To: Olof Johansson
> Cc: 'Paul Mackerras'; 'linuxppc-dev list'
> Subject: Re: AltiVec in the kernel
> 
> At Thu, 20 Jul 2006 14:05:23 -0500, Olof Johansson wrote:
> > On Thu, Jul 20, 2006 at 11:47:04AM -0700, Brian D. Carlstrom wrote:
> > > A quick grep for memcpy in the recent glibc sources on my linux/ppc box
> > > seems to show nowhere near that level of optimization, but I admit
> > > that I could have missed something.
> > 
> > http://penguinppc.org/dev/glibc/glibc-powerpc-cpu-addon.html
> 
> Very interesting. According to that page, the memcpy optimizations seem
> to use 64-bit operations, and 128-bit AltiVec versions are still being
> solicited.

"Still"?

http://www.freevec.org/ 

Been there for months, before the glibc thing. Most of the functions
are ready. Anyone can bugfix this. The beauty of GPL. The ugly part
is.. we've had this there for months. Nobody has contributed a single
update or bugfix or even a performance test as far as I know.

> However, I'm guessing this addon is not something found in 
> common distributions for PowerPC like Debian, Fedora, Gentoo, 
> Ubuntu, ...

Indeed, it's a cute feature, but we were scared away by the glibc guys
when it came to glibc-ports (perhaps they just considered it not
ready, but we wanted it in there for the first release, which was
the next one). Hence freevec. Konstantinos will be back in a couple of
weeks and will post some updates.

The more interesting code is the MySQL stuff. All of this has been
developed by finding good examples of apps, profiling them and then
optimizing the top few functions that are most used.

-- 
Matt Sealey <matt@genesi-usa.com>
Manager, Genesi, Developer Relations

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: AltiVec in the kernel
  2006-07-21 14:42                   ` Matt Sealey
@ 2006-07-21 16:51                     ` Linas Vepstas
  2006-07-21 18:08                       ` Matt Sealey
                                         ` (2 more replies)
  0 siblings, 3 replies; 31+ messages in thread
From: Linas Vepstas @ 2006-07-21 16:51 UTC (permalink / raw)
  To: Matt Sealey
  Cc: 'Olof Johansson', 'linuxppc-dev list',
	'Paul Mackerras'

On Fri, Jul 21, 2006 at 09:42:32AM -0500, Matt Sealey wrote:
>  
> > > http://penguinppc.org/dev/glibc/glibc-powerpc-cpu-addon.html
> > 
> > 128-bit AltiVec operations are still being solicited. 
> 
> "Still"?
> 
> http://www.freevec.org/ 
> 
> Been there for months, before the glibc thing. Most of the functions
> are ready. Anyone can bugfix this. The beauty of GPL. The ugly part
> is.. we've had this there for months. Nobody has contributed a single
> update or bugfix or even a performance test as far as I know.

Sounds like a problem of advertising and communications.  This is
kind of "under the radar" for most users and developers. It needs to
work out of the box; most people, even those with an interest in
performance, will not even be aware that it is possible to tune this.

It should be folded into glibc. It is up to the altivec product vendor
to nag the glibc folks into folding it in. This task could be as hard
as writing the code in the first place.

> Indeed it's a cute feature but we were scared away by the glibc guys

Many maintainers of core libraries have similar behaviour patterns.
Besides glibc, gcc and gsl come to mind. This is because they get tired out
by naive eager-beavers who walk in with the greatest idea in the world,
make a big fuss about it, and then proceed to demonstrate that they have
absolutely no clue of what they're talking about.  For every ten of
those, there's maybe one legit idea. Worse, many of these "clueless
newbies" come in the surprising shape of PhDs working outside their
specialty, and can convincingly sling jargon and authority for a while
before it's realized they're just... clueless.

If you've got good code, you'll just need to be persistent.

--linas

^ permalink raw reply	[flat|nested] 31+ messages in thread

* RE: AltiVec in the kernel
  2006-07-21 16:51                     ` Linas Vepstas
@ 2006-07-21 18:08                       ` Matt Sealey
  2006-07-22  3:09                         ` Segher Boessenkool
  2006-07-21 18:46                       ` Brian D. Carlstrom
  2006-07-21 21:30                       ` Hollis Blanchard
  2 siblings, 1 reply; 31+ messages in thread
From: Matt Sealey @ 2006-07-21 18:08 UTC (permalink / raw)
  To: 'Linas Vepstas'
  Cc: 'Olof Johansson', 'linuxppc-dev list',
	'Paul Mackerras'



> Sounds like a problem of advertising and communications.  
> This is kind of "under the radar" for most users and 
> developers. It needs to work out of the box; most people, 
> even those with an interest in performance, will not even be 
> aware that it is possible to tune this.

It's listed on every site we have, and on PenguinPPC.org too; if I
recall correctly (hi Hollis), it even got a sticky news item like a lot
of the stuff we do (thanks Hollis :).

Everyone who cares knows about it, I would think. Probably not
enough people care, is the problem.

> It should be folded into glibc. It is up to the altivec 
> product vendor to nag the glibc folks into folding it in. 

You mean Freescale? Or Genesi?

Freevec was being developed as a "perfect opportunity". glibc-ports
came to life and was something that code could be contributed to.
Since it was such a hassle dealing with the glibc guys, it ended up
being a separate library for now.

> This task could be as hard as writing the code in the first place.

I think we could handle it if there were fewer stubborn mules maintaining
the most important software. I can think of one guy in particular.. but
I won't name him.

> Many maintainers of core libraries have similar behaviour patterns.
> Besides glibc, gcc and gsl come to mind. This is because they 
> get tired out by naive eager-beavers who walk in with the 
> greatest idea in the world

I think this kind of behaviour stalls Open Source software,
because it unfairly treats those *with* clues.

<-us-> do you want the AltiVec code or not?
<them> Oh no because I am bored of dealing with people who only had ideas!!

It doesn't make much sense politically or technically.

So, like I said, we could have had this code in glibc when glibc-ports
was first conceptualised and then released, but there were just too
many mules in the way.

(Check the freevec.org whitepapers section.) Konstantinos is not just
"ideas"; he proved out optimizations and then implemented them.

Is it his fault that they're not in glibc, because he's "stupid" or
"clueless"? :D

> If you've got good code, you'll just need to be persistent.

Personally, I am pretty tired (in return) of angry-faced Open
Source developers deciding that "Open Source" is equivalent to
"My Source, Back Off, Your Patch Sucks". It is always the choice
of the lead developer (and/or copyright holder) to refuse
patches, but.. seriously.. a lot of Open Source development is
the wrong kind of dictatorship.

Cynicism aside.. :D

</rant>

-- 
Matt Sealey <matt@genesi-usa.com>
Manager, Genesi, Developer Relations

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: AltiVec in the kernel
  2006-07-21 16:51                     ` Linas Vepstas
  2006-07-21 18:08                       ` Matt Sealey
@ 2006-07-21 18:46                       ` Brian D. Carlstrom
  2006-07-21 21:30                       ` Hollis Blanchard
  2 siblings, 0 replies; 31+ messages in thread
From: Brian D. Carlstrom @ 2006-07-21 18:46 UTC (permalink / raw)
  To: Linas Vepstas
  Cc: 'Olof Johansson', 'Paul Mackerras',
	'linuxppc-dev list'

At Fri, 21 Jul 2006 11:51:30 -0500,
Linas Vepstas wrote:
> If you've got good code, you'll just need to be persistent.

While I agree with most of Matt's rant, I think Linas is right as well.
Hearing that code is already in a distribution like Gentoo makes it
easier to make the case that the code doesn't suck and isn't vaporware.

-bri
disclaimer: a PhD student working outside my specialty :)

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: AltiVec in the kernel
  2006-07-21 16:51                     ` Linas Vepstas
  2006-07-21 18:08                       ` Matt Sealey
  2006-07-21 18:46                       ` Brian D. Carlstrom
@ 2006-07-21 21:30                       ` Hollis Blanchard
  2 siblings, 0 replies; 31+ messages in thread
From: Hollis Blanchard @ 2006-07-21 21:30 UTC (permalink / raw)
  To: Linas Vepstas, Matt Sealey
  Cc: 'Olof Johansson', 'linuxppc-dev list',
	'Paul Mackerras', Konstantinos Margaritis


On Fri, 21 Jul 2006 11:51:30 -0500, "Linas Vepstas"
<linas@austin.ibm.com> said:
> On Fri, Jul 21, 2006 at 09:42:32AM -0500, Matt Sealey wrote:
> > http://www.freevec.org/ 
> > 
> > Been there for months, before the glibc thing. Most of the functions
> > are ready. Anyone can bugfix this. The beauty of GPL. The ugly part
> > is.. we've had this there for months. Nobody has contributed a single
> > update or bugfix or even a performance test as far as I know.
> 
> Sounds like a problem of advertising and communications.  This is
> kind of "under the radar" for most users and developers. It needs to
> work out of the box; most people, even those with an interest in
> performance, will not even be aware that it is possible to tune this.

It is difficult to make sure every OSS developer is notified of all work
they may be interested in...

However, I have noticed a trend where Genesi people seem to think
everybody pays attention to their websites (and the same could be said
for Debian and other subcultures). In this case there actually have been
other people aware of this project, but not very many. Considering all
the traffic about it on ppczone.org, people looking for exposure for
their project may want to look beyond PPCZone.

> It should be folded into glibc. It is up to the altivec product vendor
> to nag the glibc folks into folding it in. This task could be as hard
> as writing the code in the first place.

Konstantinos is aware of Steve's glibc project and has indicated he'll
try to contribute to it.

To be fair, probably not many people have heard of Steve's project
either. I doubt Konstantinos would have heard of it if I hadn't
mentioned it.

-Hollis

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: AltiVec in the kernel
  2006-07-20 21:56                 ` Brian D. Carlstrom
                                     ` (2 preceding siblings ...)
  2006-07-21 14:42                   ` Matt Sealey
@ 2006-07-21 22:21                   ` Peter Bergner
  3 siblings, 0 replies; 31+ messages in thread
From: Peter Bergner @ 2006-07-21 22:21 UTC (permalink / raw)
  To: Brian D. Carlstrom
  Cc: Olof Johansson, 'linuxppc-dev list',
	'Paul Mackerras', Steve Munroe

On Thu, 2006-07-20 at 14:56 -0700, Brian D. Carlstrom wrote:
> At Thu, 20 Jul 2006 14:05:23 -0500, Olof Johansson wrote:
> > http://penguinppc.org/dev/glibc/glibc-powerpc-cpu-addon.html
> 
> Very interesting. According to that page, the memcpy optimizations seem
> to use 64-bit operations, and 128-bit AltiVec versions are still being
> solicited.
> 
> I was encouraged to see the following: 
> 
>     If you need to build generic distributions (supporting several
>     <cpu_types>) you can leverage the dl_procinfo support built into
>     glibc. This mechanism allows for multiple versions of the core
>     libraries (libc, libm, librt, libpthread, libpthread_db) to be
>     stored in hardware/platform specific subdirectories under /lib[64].

Actually, this support is not limited to the core glibc routines or
the system lib directories /lib/ and /usr/lib/.  This works just as well
for third party shipped libraries in their own library trees as the
following example (on a power5 box) shows:

bergner@vervainp1:~/cpu-tuned-libs> pwd
/home/bergner/cpu-tuned-libs

bergner@vervainp1:~/cpu-tuned-libs> ls lib/ lib/power5/
lib/:
libfoo.so  power5/

lib/power5/:
libfoo.so

bergner@vervainp1:~/cpu-tuned-libs> gcc -L/home/bergner/cpu-tuned-libs/lib -R/home/bergner/cpu-tuned-libs/lib main.c -lfoo

bergner@vervainp1:~/cpu-tuned-libs> ldd a.out
        linux-vdso32.so.1 =>  (0x00100000)
        libfoo.so => /home/bergner/cpu-tuned-libs/lib/power5/libfoo.so (0x0ffde000)
        libc.so.6 => /lib/power5/libc.so.6 (0x0fe69000)
        /lib/ld.so.1 (0xf7fe1000)

bergner@vervainp1:~/cpu-tuned-libs> ./a.out
Loaded the optimzed lib

bergner@vervainp1:~/cpu-tuned-libs> rm lib/power5/libfoo.so

bergner@vervainp1:~/cpu-tuned-libs> ldd a.out
        linux-vdso32.so.1 =>  (0x00100000)
        libfoo.so => /home/bergner/cpu-tuned-libs/lib/libfoo.so (0x0ffde000)
        libc.so.6 => /lib/power5/libc.so.6 (0x0fe69000)
        /lib/ld.so.1 (0xf7fe1000)

bergner@vervainp1:~/cpu-tuned-libs> ./a.out
Loaded the unoptimzed lib


The runtime loader magic uses the AT_PLATFORM string value as
the subdirectory to search in under the .../lib/ or .../lib64/
library directory.  To find out what your AT_PLATFORM value is
on your current box, you can do:

bergner@vervainp1:~/cpu-tuned-libs> LD_SHOW_AUXV=1 /bin/true
AT_DCACHEBSIZE:  0x80
AT_ICACHEBSIZE:  0x80
AT_UCACHEBSIZE:  0x0
AT_SYSINFO_EHDR: 0x100000
AT_HWCAP:        power5 mmu fpu ppc64 ppc32
AT_PAGESZ:       4096
AT_CLKTCK:       100
AT_PHDR:         0x10000034
AT_PHENT:        32
AT_PHNUM:        9
AT_BASE:         0xf7fe1000
AT_FLAGS:        0x0
AT_ENTRY:        0x10000980
AT_UID:          1001
AT_EUID:         1001
AT_GID:          100
AT_EGID:         100
AT_SECURE:       0
AT_PLATFORM:     power5
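The same AT_PLATFORM value can be read from C. The following is a minimal sketch, assuming glibc 2.16 or later for getauxval() (which postdates this thread). platform_subdir() is a hypothetical helper that only builds the "<libdir>/<platform>" path described above; it does not reimplement the loader's search or its fallback to the plain library directory.

```c
#include <stddef.h>
#include <stdio.h>
#include <sys/auxv.h>   /* getauxval(), AT_PLATFORM (glibc 2.16+) */

/* Hypothetical helper: write "<libdir>/<AT_PLATFORM>" into buf.
 * Returns 0 on success, or -1 if the kernel supplied no AT_PLATFORM
 * entry or buf is too small for the whole path. */
int platform_subdir(const char *libdir, char *buf, size_t len)
{
    const char *platform = (const char *)getauxval(AT_PLATFORM);

    if (platform == NULL)          /* getauxval() returns 0 if absent */
        return -1;
    int n = snprintf(buf, len, "%s/%s", libdir, platform);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

On the power5 box above, platform_subdir("/lib", buf, sizeof buf) would be expected to yield "/lib/power5", matching the AT_PLATFORM line printed by LD_SHOW_AUXV=1.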


> However, I'm guessing this addon is not something found in common
> distributions for PowerPC like Debian, Fedora, Gentoo, Ubuntu, ...

At last year's GCC Developers' Summit, one of the Ubuntu guys mentioned
he was interested in adding it to Ubuntu.  I haven't heard whether that
has shown up yet, though.  It will be available in upcoming SUSE and
Red Hat enterprise distros.  I don't know about the others.  As Olof
mentioned, it can take some lead time for this to get picked up.
There's also the question of how many and which processors a distro
will ship CPU-optimized libraries for.  Given all of the PowerPC
variants, they obviously can't ship optimized libs for everything.

Peter

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: AltiVec in the kernel
  2006-07-21 18:08                       ` Matt Sealey
@ 2006-07-22  3:09                         ` Segher Boessenkool
  2006-07-23 13:28                           ` Matt Sealey
  0 siblings, 1 reply; 31+ messages in thread
From: Segher Boessenkool @ 2006-07-22  3:09 UTC (permalink / raw)
  To: matt
  Cc: 'Olof Johansson', 'linuxppc-dev list',
	'Paul Mackerras'

> Freevec was being developed as a "perfect opportunity". glibc-ports
> came to life and was something that code could be contributed to.
> Since it was such a hassle dealing with the glibc guys, it ended up
> being a separate library for now.

Do you have a pointer to an archive of that email thread?  I can't
remember it.

You could give Freevec a whole lot more exposure, to people who
might be more interested in it than the average glibc user, by
putting it into uClibc first.  Additional advantage is that you
don't have to care about forward/backward compatibility issues,
or even whether the platform a binary ends up running on actually
has AltiVec or not (uClibc gets tailored to the exact system it
runs on at compile time).  So you can focus on the routines you
want to speed up instead of on all the infrastructure stuff
required for glibc.
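The compile-time versus run-time distinction drawn here can be sketched in C. This is an illustration only: both helper functions are hypothetical and are not taken from glibc or uClibc. The value 0x10000000 is the PPC_FEATURE_HAS_ALTIVEC bit that the powerpc kernel reports in AT_HWCAP (asm/cputable.h). It is redefined here so that the sketch also compiles on other architectures.

```c
#include <stdbool.h>

/* AltiVec bit in the powerpc AT_HWCAP word (kernel asm/cputable.h);
 * redefined only if no system header has provided it. */
#ifndef PPC_FEATURE_HAS_ALTIVEC
#define PPC_FEATURE_HAS_ALTIVEC 0x10000000UL
#endif

/* Run-time choice (one binary for many CPUs): test the hwcap word.
 * On a real system, pass getauxval(AT_HWCAP) from <sys/auxv.h>. */
bool altivec_at_runtime(unsigned long hwcap)
{
    return (hwcap & PPC_FEATURE_HAS_ALTIVEC) != 0;
}

/* Build-time choice (library tailored to one known CPU): fixed by the
 * compiler flags, since GCC defines __ALTIVEC__ under -maltivec. */
bool altivec_at_build_time(void)
{
#ifdef __ALTIVEC__
    return true;
#else
    return false;
#endif
}
```

The build-time variant needs no dispatch code at all, which is the infrastructure saving described above. The price is that the resulting binary is only correct on CPUs that match the build flags.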

You'll have to update uClibc's PowerPC port first though (mostly
just copying stuff from recent glibc) -- it seems the libc AltiVec
support (for handling setjmp() etc.) isn't in there yet.

>> This task could be as hard as writing the code in the first place.

Not as hard.  Way, way harder instead.  Part of that is that the
code probably really isn't good enough yet, sorry.  And then there's
all the compatibility stuff, and symbol versioning, etc.  And the
communication issue, of course.


Segher

^ permalink raw reply	[flat|nested] 31+ messages in thread

* RE: AltiVec in the kernel
  2006-07-22  3:09                         ` Segher Boessenkool
@ 2006-07-23 13:28                           ` Matt Sealey
  2006-07-23 21:37                             ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 31+ messages in thread
From: Matt Sealey @ 2006-07-23 13:28 UTC (permalink / raw)
  To: 'Segher Boessenkool'
  Cc: 'Olof Johansson', 'linuxppc-dev list',
	'Paul Mackerras'


> You could give Freevec a whole lot more exposure, to people 
> who might be more interested in it than the average glibc 
> user, by putting it into uClibc first.

[snip]

> You'll have to update uClibc's PowerPC port first though 
> (mostly just copying stuff from recent glibc) -- it seems the 
> libc AltiVec support (for handling setjmp() etc.) isn't in there yet.

I remember a discussion from one of the Gentoo guys wanting to do this
with libfreevec.

Getting into Gentoo, though, is not difficult. The problem is that
Gentoo is only one Linux distribution. I would be more impressed if code
was in Debian or Ubuntu, considering their exhausting lead times on
producing new package trees and accepting new code :D

-- 
Matt Sealey <matt@genesi-usa.com>
Manager, Genesi, Developer Relations

^ permalink raw reply	[flat|nested] 31+ messages in thread

* RE: AltiVec in the kernel
  2006-07-23 13:28                           ` Matt Sealey
@ 2006-07-23 21:37                             ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 31+ messages in thread
From: Benjamin Herrenschmidt @ 2006-07-23 21:37 UTC (permalink / raw)
  To: matt
  Cc: 'Olof Johansson', 'linuxppc-dev list',
	'Paul Mackerras'


> I remember a discussion from one of the Gentoo guys wanting to do this
> with libfreevec.
> 
> Getting into Gentoo, though, is not difficult. The problem with this
> is Gentoo is one Linux distribution. I would be more impressed if code
> was in Debian or Ubuntu considering their exhausting lead times on
> producing new package trees and accepting new code :D

It seems to me that the "problem" just doesn't exist at the moment...
libfreevec is nice, but it's unfinished, and the author is away for now
and thus not able to complete nor work on a port to glibc or others.

Once he's back, of course, it would be nice to have him complete the
work (and maybe get some outside help).

I'd also like to verify his methodology for measuring the performance
improvements. I'm not saying it's wrong; I want to make sure the
overhead of enabling AltiVec has been properly measured for various
usage patterns, so that the optimisations can be restricted to the
patterns where they matter, for example by only using AltiVec for large
memcpy()s.
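The "only for large copies" idea amounts to a size-threshold dispatch. The sketch below is a portable stand-in under stated assumptions. VEC_COPY_THRESHOLD is a hypothetical cutoff that real measurements would have to determine. copy_vec16() only mimics the shape of a 128-bit vector loop by moving 16-byte chunks; an actual AltiVec version would use vector loads and stores there, and in the kernel it would also pay the cost of enabling the unit.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical cutoff: below this size, the fixed cost of enabling the
 * vector unit is assumed to outweigh any gain. Must come from measurement. */
#define VEC_COPY_THRESHOLD 256

/* Stand-in for a 128-bit vector copy: moves 16-byte chunks, then the tail. */
static void copy_vec16(unsigned char *d, const unsigned char *s, size_t n)
{
    while (n >= 16) {
        memcpy(d, s, 16);       /* one "vector" load/store pair */
        d += 16;
        s += 16;
        n -= 16;
    }
    memcpy(d, s, n);            /* remaining 0..15 bytes */
}

/* Hypothetical dispatcher: small copies take the ordinary scalar path,
 * large copies take the (simulated) vector path. */
void *sized_memcpy(void *dst, const void *src, size_t n)
{
    if (n < VEC_COPY_THRESHOLD)
        return memcpy(dst, src, n);
    copy_vec16(dst, src, n);
    return dst;
}
```

The point of putting the threshold in one place is that it can be tuned per CPU once the overhead for each usage pattern has actually been measured.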

Once that's done, I don't see any good reason why it would be so hard to
include that work into glibc, or rather into the powerpc add-ons as a
first step and then maybe the whole thing into glibc. Maintainers rarely
reject things just for the sake of doing so. If they do, they
usually provide reasons, often boiling down to implementation details, that
can then be fixed. Note also that in the case of submitting code to
glibc, there is a copyright assignment issue to be sorted out, I think (I
don't know the details here).

I have the feeling that there is very little point to this thread. Let's
wait for Konstantinos to be back and submit his work, possibly to this
list at first for review, tests, etc... and then to the appropriate
maintainers. If there is a problem at that point, then we'll see how we
can address it.

Regards,
Ben.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* AltiVec in the kernel
@ 2009-12-11 11:45 Simon Richter
  2009-12-11 15:49 ` Arnd Bergmann
  0 siblings, 1 reply; 31+ messages in thread
From: Simon Richter @ 2009-12-11 11:45 UTC (permalink / raw)
  To: linuxppc-dev

Hi,

since there has been a thread on allowing the use of a coprocessor in
the kernel already: I am wondering if it'd make sense to use AltiVec for
AES in dm-crypt, and how difficult it would be to implement that.

I'm using a PegasosII which has a G4 running at 1 GHz; I get around 3
MB/s throughput when accessing harddisks. I think that could be
improved.

If I understand correctly, the actual encryption work runs in a kernel
thread, which is scheduled normally, so it ought to be possible to
enable AltiVec for that thread; am I missing something here?

   Simon

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: AltiVec in the kernel
  2009-12-11 11:45 Simon Richter
@ 2009-12-11 15:49 ` Arnd Bergmann
  2009-12-16 22:11   ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 31+ messages in thread
From: Arnd Bergmann @ 2009-12-11 15:49 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Simon Richter, Sebastian Siewior

On Friday 11 December 2009, Simon Richter wrote:
> Hi,
> 
> since there has been a thread on allowing the use of a coprocessor in
> the kernel already: I am wondering if it'd make sense to use AltiVec for
> AES in dm-crypt, and how difficult it would be to implement that.
> 
> I'm using a PegasosII which has a G4 running at 1 GHz; I get around 3
> MB/s throughput when accessing harddisks. I think that could be
> improved.
> 
> If I understand correctly, the actual encryption work runs in a kernel
> thread, which is scheduled normally, so it ought to be possible to
> enable AltiVec for that thread; am I missing something here?

Sebastian Siewior has implemented this some time ago:

http://old.nabble.com/-RFC-0-3--Experiments-with-AES-AltiVec,-part-2-tc10034255.html

You can try the old patches on your machine to see if they are any good
there.

	Arnd <><

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: AltiVec in the kernel
  2009-12-11 15:49 ` Arnd Bergmann
@ 2009-12-16 22:11   ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 31+ messages in thread
From: Sebastian Andrzej Siewior @ 2009-12-16 22:11 UTC (permalink / raw)
  To: Arnd Bergmann; +Cc: Simon Richter, linuxppc-dev

* Arnd Bergmann | 2009-12-11 16:49:25 [+0100]:

>On Friday 11 December 2009, Simon Richter wrote:
>> Hi,
>> 
>> since there has been a thread on allowing the use of a coprocessor in
>> the kernel already: I am wondering if it'd make sense to use AltiVec for
>> AES in dm-crypt, and how difficult it would be to implement that.
>> 
>> I'm using a PegasosII which has a G4 running at 1 GHz; I get around 3
>> MB/s throughput when accessing harddisks. I think that could be
>> improved.
>> 
>> If I understand correctly, the actual encryption work runs in a kernel
>> thread, which is scheduled normally, so it ought to be possible to
>> enable AltiVec for that thread; am I missing something here?

dm-crypt is async these days, so the patches Arnd mentioned could actually
be used :) I've never tested them with dm-crypt, but it should
work. Back then I had around 20 MiB/sec encryption and around 15 MiB/sec
decryption on 4 KiB pages on a PS3 [0]. This was pure testing; no
subsystem was involved. dm-crypt will feed multiple 512-byte requests,
and according to [1], 512 bytes aren't slow :) However, [2] says that
AltiVec was always slower than the generic implementation. Maybe
the PS3's AltiVec unit was slower than the average one because everyone was
focused on the SPUs. Maybe not, and you get similar results.
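Some arithmetic shows why request size matters. AES always works on 16-byte blocks, so a 512-byte dm-crypt request is 32 blocks, while a 4 KiB page is 256 blocks. Any fixed per-request cost is therefore spread over eight times fewer blocks. The sketch below is a toy cost model, an assumption and not a result from the thesis. Each request pays a fixed setup cost (for example, enabling the vector unit) plus a per-block cost, and the function names are hypothetical.

```c
#include <stddef.h>

#define AES_BLOCK_SIZE 16   /* AES always operates on 128-bit blocks */

/* Number of whole AES blocks in a request of req_bytes bytes. */
size_t aes_blocks(size_t req_bytes)
{
    return req_bytes / AES_BLOCK_SIZE;
}

/* Toy model (assumed, not measured): total time per request is
 * setup_cost + blocks * per_block_cost. Returns the fraction of that
 * time spent on the fixed setup. */
double setup_fraction(size_t req_bytes, double setup_cost, double per_block_cost)
{
    double work = (double)aes_blocks(req_bytes) * per_block_cost;
    return setup_cost / (setup_cost + work);
}
```

For example, if the setup cost equalled the cost of 32 blocks, it would take half the time of a 512-byte request but only about 11% of a 4 KiB one. Real numbers would have to be measured on the target G4.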

>Sebastian Siewior has implemented this some time ago:
>
>http://old.nabble.com/-RFC-0-3--Experiments-with-AES-AltiVec,-part-2-tc10034255.html
>
>You can try the old patches on your machine to see if they are any good
>there.
Ah you remember :)

[0] http://diploma-thesis.siewior.net/html/diplomarbeitch4.html#x12-46002r2
[1] http://diploma-thesis.siewior.net/html/diplomarbeitch4.html#x12-46004r4
[2] http://diploma-thesis.siewior.net/html/diplomarbeitch4.html#x12-47017r5
>
>	Arnd <><

Sebastian

^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2009-12-16 22:28 UTC | newest]

Thread overview: 31+ messages
2006-07-18 12:48 AltiVec in the kernel Matt Sealey
2006-07-18 13:53 ` Kumar Gala
2006-07-18 15:10   ` Matt Sealey
2006-07-18 17:56     ` Paul Mackerras
2006-07-19 18:10       ` Linas Vepstas
2006-07-19 18:19         ` Paul Mackerras
2006-07-19 18:38           ` Johannes Berg
2006-07-19 18:57             ` Linas Vepstas
2006-07-20 12:31         ` Matt Sealey
2006-07-20 13:23           ` Kumar Gala
2006-07-20 13:33             ` Matt Sealey
2006-07-20 17:42           ` Linas Vepstas
2006-07-20 18:47             ` Brian D. Carlstrom
2006-07-20 19:05               ` Olof Johansson
2006-07-20 21:56                 ` Brian D. Carlstrom
2006-07-20 22:39                   ` Daniel Ostrow
2006-07-21  6:35                   ` Olof Johansson
2006-07-21 14:42                   ` Matt Sealey
2006-07-21 16:51                     ` Linas Vepstas
2006-07-21 18:08                       ` Matt Sealey
2006-07-22  3:09                         ` Segher Boessenkool
2006-07-23 13:28                           ` Matt Sealey
2006-07-23 21:37                             ` Benjamin Herrenschmidt
2006-07-21 18:46                       ` Brian D. Carlstrom
2006-07-21 21:30                       ` Hollis Blanchard
2006-07-21 22:21                   ` Peter Bergner
2006-07-18 18:39     ` Benjamin Herrenschmidt
2006-07-18 17:43 ` Paul Mackerras
  -- strict thread matches above, loose matches on Subject: below --
2009-12-11 11:45 Simon Richter
2009-12-11 15:49 ` Arnd Bergmann
2009-12-16 22:11   ` Sebastian Andrzej Siewior
