* AltiVec in the kernel
From: Matt Sealey @ 2006-07-18 12:48 UTC
To: linuxppc-dev

Once upon a time we were all told this wouldn't work for some reason,
but a lot of documentation now hints that it does actually work and
for instance there is a RAID5/6 driver (for G5) which uses AltiVec
in a kernel context.

But I didn't find any definitive documentation on how one goes about
it. The largest clue I found was in Documentation/cpu_features.txt:

#ifdef CONFIG_ALTIVEC
BEGIN_FTR_SECTION
        mfspr   r22,SPRN_VRSAVE         /* if G4, save vrsave register value */
        stw     r22,THREAD_VRSAVE(r23)
END_FTR_SECTION_IFSET(CPU_FTR_ALTIVEC)
#endif /* CONFIG_ALTIVEC */

So we can use AltiVec by implementing this kind of wrapper around
kernel functions which may use AltiVec?

In the code above is there ANY significance of r22 and r23 other
than that they are fairly high up and probably marked as "will
be trashed" by all the relevant ABIs and so?

Just curious, as I would like to investigate writing some docs at
least on this (in article fashion) to go with PPCZone, Libfreevec
and so on. I think there is a problem here in that simply developers
who may be interested in doing this kind of optimized code do not
know where to start (and we are thinking from a point of view of
also teaching sessions too, like we did at FTF Frankfurt 2004, so
after we teach them what AltiVec is etc. we demonstrate application
AND kernel functionality and the quirks associated with it).

-- 
Matt Sealey <matt@genesi-usa.com>
Manager, Genesi, Developer Relations

* Re: AltiVec in the kernel
From: Kumar Gala @ 2006-07-18 13:53 UTC
To: matt
Cc: linuxppc-dev list, Paul Mackerras

On Jul 18, 2006, at 7:48 AM, Matt Sealey wrote:

> Once upon a time we were all told this wouldn't work for some reason,
> but a lot of documentation now hints that it does actually work and
> for instance there is a RAID5/6 driver (for G5) which uses AltiVec
> in a kernel context.

Using Altivec generally in the kernel is still something that is not
recommended. The key to using it is in disabling preemption, this
ensures that when the code is done the Altivec register state is back
to how the kernel found it.

        preempt_disable();
        enable_kernel_altivec();

        raid6_altivec$#_gen_syndrome_real(disks, bytes, ptrs);

        preempt_enable();

> But I didn't find any definitive documentation on how one goes about
> it. The largest clue I found was in Documentation/cpu_features.txt:
>
> #ifdef CONFIG_ALTIVEC
> BEGIN_FTR_SECTION
>         mfspr   r22,SPRN_VRSAVE         /* if G4, save vrsave register value */
>         stw     r22,THREAD_VRSAVE(r23)
> END_FTR_SECTION_IFSET(CPU_FTR_ALTIVEC)
> #endif /* CONFIG_ALTIVEC */
>
> So we can use AltiVec by implementing this kind of wrapper around
> kernel functions which may use AltiVec?
>
> In the code above is there ANY significance of r22 and r23 other
> than that they are fairly high up and probably marked as "will
> be trashed" by all the relevant ABIs and so?

I'd guess those were the registers used by the code this was snipped
from.

> Just curious, as I would like to investigate writing some docs at
> least on this (in article fashion) to go with PPCZone, Libfreevec
> and so on. I think there is a problem here in that simply developers
> who may be interested in doing this kind of optimized code do not
> know where to start (and we are thinking from a point of view of
> also teaching sessions too, like we did at FTF Frankfurt 2004, so
> after we teach them what AltiVec is etc. we demonstrate application
> AND kernel functionality and the quirks associated with it).

I'm pretty sure Paul looked into using AltiVec for memory operations
in the kernel and didn't see a significant benefit to it.

- kumar

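The call ordering in Kumar's snippet (preemption off, vector unit on, vector routine, preemption back on) can be sketched as a userspace mock. This is only an illustration of the bracketing, assuming nothing about the kernel beyond the three calls named in the mail: the `_mock` counters, the stub bodies, and the `xor_parity` routine are all hypothetical stand-ins, and the `$#` in the quoted function name appears to be an unroll placeholder from the RAID-6 template source rather than valid C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-ins for the kernel primitives in the mail.
 * The real kernel versions do far more; these only track call ordering. */
static int preempt_count_mock;  /* nesting depth of preempt_disable() */
static int altivec_on_mock;     /* set once the vector unit is "enabled" */

static void preempt_disable(void) { preempt_count_mock++; }

static void preempt_enable(void)
{
    assert(preempt_count_mock > 0);
    preempt_count_mock--;
}

static void enable_kernel_altivec(void)
{
    /* In this mock, enabling is only legal with preemption already off. */
    assert(preempt_count_mock > 0);
    altivec_on_mock = 1;
}

/* Placeholder for a vector routine: byte-wise XOR parity of two buffers.
 * A real routine would use AltiVec instructions here. */
static void xor_parity_real(uint8_t *dst, const uint8_t *a,
                            const uint8_t *b, size_t n)
{
    assert(preempt_count_mock > 0 && altivec_on_mock);
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] ^ b[i];
}

/* Wrapper with the same bracketing as the snippet in the mail. */
void xor_parity(uint8_t *dst, const uint8_t *a, const uint8_t *b, size_t n)
{
    preempt_disable();
    enable_kernel_altivec();

    xor_parity_real(dst, a, b, n);

    preempt_enable();
}
```

Callers only ever see `xor_parity()`; the `_real` routine asserts that it is never reached outside the bracket, which is the property the wrapper exists to guarantee.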
* RE: AltiVec in the kernel
From: Matt Sealey @ 2006-07-18 15:10 UTC
To: 'Kumar Gala'
Cc: 'linuxppc-dev list', 'Paul Mackerras'

> -----Original Message-----
> From: Kumar Gala [mailto:galak@kernel.crashing.org]
> Sent: Tuesday, July 18, 2006 8:53 AM
> To: matt@genesi-usa.com
> Cc: linuxppc-dev list; Paul Mackerras
> Subject: Re: AltiVec in the kernel
>
>
> On Jul 18, 2006, at 7:48 AM, Matt Sealey wrote:
>
> > for instance there is a RAID5/6 driver (for G5) which uses
> > AltiVec in a kernel context.
>
> Using Altivec generally in the kernel is still something that
> is not recommended. The key to using it is in disabling
> preemption, this ensures that when the code is done the
> Altivec register state is back to how the kernel found it.
>
>         preempt_disable();
>         enable_kernel_altivec();
>
>         raid6_altivec$#_gen_syndrome_real(disks, bytes, ptrs);
>
>         preempt_enable();

Why isn't it recommended?

For instance on FreeBSD and other operating systems they have designed
the functionality in there as it would be a feature people would want
to use. QNX uses AltiVec to perform the context switch and message
passing and keep latency down.

Restricting AltiVec to userspace code (applications..) really means
you are barely ever using it. Kernel functions and drivers are
called every second of every day.. it's about making AltiVec really
used and not having the unit sit twiddling its thumbs until you
REALLY NEED TO DECODE A JPEG VERY FAST. There are thousands of things
it could be doing.

One example could be.. in-kernel compression and encryption
subroutines.

> > teach them what AltiVec is etc. we demonstrate application
> > AND kernel functionality and the quirks associated with it).
>
> I'm pretty sure Paul looked into using AltiVec for memory
> operations in the kernel and didn't see a significant benefit to it.

We had our own guy look at it and he presented some significant
performance improvements. One problem was, though, that the best
improvement in theory came from a function which needed to be
called very early in kernel boot, well before AltiVec was
enabled, and everything else is marginal at best (1.n times
improvement, but it is still 0.n more than 1.0). I am not clear
on this and cannot find my discussion on the subject in my logs
and email backups, so I will leave it for now.

There is also plenty of example code (libmotovec, Freescale
Application Notes) which improve things like TCP checksumming
and so on using AltiVec. These patches are even used in EEMBC
benchmarks to boost the scores.

There is also plenty of examples of userspace code (as before,
checksumming, encryption, compression/decompression) which has
been improved. libfreevec includes some changes to the zlib
window functions. For example the kernel includes an MD5, SHA,
zlib compression framework.. mostly ported userspace code and
standard libraries. Would these not be candidates?

The development and speed improvements are even capable of being
tested in userspace (and this is a GREAT teaching aid also; show
how to improve some userspace app. Then show the differences it
needed to go into the kernel. Benchmark both. Detail result.)

I think there are thousands of places where AltiVec could be
used - even sparingly - to provide good performance improvements.
From your reply I suspect that these would be places which do not
rely on the effects preemption has on performance (i.e. you trade
preemption for AltiVec and gain).

I don't think people investigate it too much because the first
thing they hit is lack of documentation, and then "well we don't
really recommend it". I think this makes Linux the worst OS a
developer would want to run on a G4 and G5, then? :D

-- 
Matt Sealey <matt@genesi-usa.com>
Manager, Genesi, Developer Relations

* RE: AltiVec in the kernel 2006-07-18 15:10 ` Matt Sealey @ 2006-07-18 17:56 ` Paul Mackerras 2006-07-19 18:10 ` Linas Vepstas 2006-07-18 18:39 ` Benjamin Herrenschmidt 1 sibling, 1 reply; 31+ messages in thread From: Paul Mackerras @ 2006-07-18 17:56 UTC (permalink / raw) To: matt; +Cc: 'linuxppc-dev list' Matt Sealey writes: > Why isn't it recommended? Because the overhead of saving away the user altivec state and restoring it can easily overwhelm any advantage you get from using altivec. > We had our own guy look at it and he presented some significant > performance improvements. One problem was, though, that the best > improvement in theory came from a function which needed to be > called very early in kernel boot, well before AltiVec was > enabled, and everything else is marginal at best (1.n times > improvement, but it is still 0.n more than 1.0). I am not clear > on this and cannot find my discussion on the subject in my logs > and email backups, so. I will leave it for now. I tried using altivec for memory copies, and while I was able to show an improvement in speed of copying stuff that was hot in the cache, there was no overall improvement in the context of everything else the kernel does. In other words, the things being copied were generally not hot in the cache, and the CPU was able to saturate the memory bandwidth using ordinary loads and stores. > There is also plenty of example code (libmotovec, Freescale > Application Notes) which improve things like TCP checksumming > and so on using AltiVec. These patches are even used in EEMBC > benchmarks to boost the scores. TCP checksumming is simple enough that it is limited by memory bandwidth rather than computation speed. This is another example where you can show an improvement on a microbenchmark because the data is hot in the cache, but the improvement doesn't translate into any real improvement in a real application. 
> There is also plenty of examples of userspace code (as before, > checksumming, encryption, compression/decompression) which has > been improved. libfreevec includes some changes to the zlib > window functions. For example the kernel includes an MD5, SHA, > zlib compression framework.. mostly ported userspace code and > standard libraries. Would these not be candidates? A lot of compression and encryption algorithms, by their very nature, are very difficult to parallelize enough to get any significant improvement from altivec. I looked at SHA1 for instance, and the sequential dependencies in the computation are such that it is practically impossible to find a way to do 4 things in parallel. The sequential dependencies are of course a critical part of the way that SHA1 ensures that a small change in any part of the input data results in substantial changes in every byte of the output. > I think there are thousands of places where AltiVec could be > used - even sparingly - to provide good performance improvements. I think that there are actually very few places in the kernel where we are doing something which is parallelizable, sufficiently compute-intensive, and not bound by memory bandwidth, to be worth using altivec. Paul. ^ permalink raw reply [flat|nested] 31+ messages in thread
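Paul's remark that TCP checksumming is "simple enough" to be limited by memory bandwidth can be seen from a plain scalar version of the Internet checksum (the ones'-complement sum described in RFC 1071). The following is a minimal sketch, not the kernel's own checksum code; the name `inet_checksum` is made up for illustration. The point is how little arithmetic there is per byte fetched: one addition per 16-bit word.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Ones'-complement Internet checksum over len bytes (RFC 1071 style).
 * Hypothetical helper for illustration only. */
uint16_t inet_checksum(const uint8_t *data, size_t len)
{
    uint64_t sum = 0;  /* wide accumulator so carries cannot be lost */

    assert(data != NULL || len == 0);

    /* One add per 16-bit big-endian word: almost no work per byte loaded. */
    while (len > 1) {
        sum += ((uint64_t)data[0] << 8) | data[1];
        data += 2;
        len -= 2;
    }

    /* An odd trailing byte is treated as the high byte of a final word. */
    if (len == 1)
        sum += (uint64_t)data[0] << 8;

    /* Fold the carries back into the low 16 bits. */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);

    return (uint16_t)~sum;
}
```

With so little computation per word, a loop like this spends most of its time waiting on loads once the data is no longer in cache, which is the situation Paul describes for real workloads as opposed to microbenchmarks.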
* Re: AltiVec in the kernel
From: Linas Vepstas @ 2006-07-19 18:10 UTC
To: Paul Mackerras
Cc: 'linuxppc-dev list'

On Wed, Jul 19, 2006 at 03:56:10AM +1000, Paul Mackerras wrote:
> A lot of compression and encryption algorithms, by their very nature,
> are very difficult to parallelize enough to get any significant
> improvement from altivec. I looked at SHA1 for instance, and the
> sequential dependencies in the computation are such that it is
> practically impossible to find a way to do 4 things in parallel. The
> sequential dependencies are of course a critical part of the way that
> SHA1 ensures that a small change in any part of the input data results
> in substantial changes in every byte of the output.

But perhaps, in principle, couldn't one run four independent streams
in parallel? Thus, for example, on an SSL-enabled web server, one
could service multiple encryption/decryption threads at once.

In practice, I don't believe the infrastructure for that kind of
parallelism is in place. I'm struggling to find a reason to develop
that kind of infrastructure. Mumble something about Cell.

> I think that there are actually very few places in the kernel where we
> are doing something which is parallelizable, sufficiently
> compute-intensive, and not bound by memory bandwidth, to be worth
> using altivec.

Yes.

As to non-kernel applications, is there anything for GMP (the
Gnu Multi-Precision library, an arbitrary-precision math library)
on the Altivec? How about the Cell?

--linas

* Re: AltiVec in the kernel 2006-07-19 18:10 ` Linas Vepstas @ 2006-07-19 18:19 ` Paul Mackerras 2006-07-19 18:38 ` Johannes Berg 2006-07-20 12:31 ` Matt Sealey 1 sibling, 1 reply; 31+ messages in thread From: Paul Mackerras @ 2006-07-19 18:19 UTC (permalink / raw) To: Linas Vepstas; +Cc: 'linuxppc-dev list' Linas Vepstas writes: > But perhaps, in principle, couldn't one run four independent streams > in parallel? Thus, for example, on an SSL-enabled web server, one > could service multiple encryption/decryption threads at once. Generally that would work. If one had 4 separate streams to compute a SHA1 of, one could do all 4 at once with altivec. It would have to be 4 separate streams though, not 4 parts of a single stream. > As to non-kernel applications, is there anything for GMP (the > Gnu Multi-Precision library, an arbitrary-precision math library) > on the Altivec? How aout the Cell? I don't really know, sorry. Paul. ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: AltiVec in the kernel 2006-07-19 18:19 ` Paul Mackerras @ 2006-07-19 18:38 ` Johannes Berg 2006-07-19 18:57 ` Linas Vepstas 0 siblings, 1 reply; 31+ messages in thread From: Johannes Berg @ 2006-07-19 18:38 UTC (permalink / raw) To: Paul Mackerras; +Cc: 'linuxppc-dev list' [-- Attachment #1: Type: text/plain, Size: 749 bytes --] On Thu, 2006-07-20 at 04:19 +1000, Paul Mackerras wrote: > Linas Vepstas writes: > > > But perhaps, in principle, couldn't one run four independent streams > > in parallel? Thus, for example, on an SSL-enabled web server, one > > could service multiple encryption/decryption threads at once. > > Generally that would work. If one had 4 separate streams to compute a > SHA1 of, one could do all 4 at once with altivec. It would have to be > 4 separate streams though, not 4 parts of a single stream. I'd think it'd be pretty hard to get a real benefit from this because the data is going to come from 4 totally different places, hence you can't just load a single vector register and get data for all 4 streams... johannes [-- Attachment #2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 828 bytes --] ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: AltiVec in the kernel
From: Linas Vepstas @ 2006-07-19 18:57 UTC
To: Johannes Berg
Cc: 'linuxppc-dev list', Paul Mackerras

On Wed, Jul 19, 2006 at 08:38:21PM +0200, Johannes Berg wrote:
> On Thu, 2006-07-20 at 04:19 +1000, Paul Mackerras wrote:
> > Linas Vepstas writes:
> >
> > > But perhaps, in principle, couldn't one run four independent streams
> > > in parallel? Thus, for example, on an SSL-enabled web server, one
> > > could service multiple encryption/decryption threads at once.
> >
> > Generally that would work. If one had 4 separate streams to compute a
> > SHA1 of, one could do all 4 at once with altivec. It would have to be
> > 4 separate streams though, not 4 parts of a single stream.
>
> I'd think it'd be pretty hard to get a real benefit from this because
> the data is going to come from 4 totally different places, hence you
> can't just load a single vector register and get data for all 4
> streams...

Dohh. Right. I actually thought that while writing the email,
and then it evaporated from my head before I hit the send button.

One would have to copy the incoming data into vectors, and the
mem access latency would probably overwhelm the performance gain
(per "hot cache" as Paul discussed).

--linas

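The "four separate streams" idea and Johannes's objection to it can both be shown in scalar C. This is a rough sketch under stated assumptions: `mix()` is a toy sequentially dependent update, not SHA1, and `hash_one`/`hash_four` are hypothetical names. Each stream's state depends on its own previous state, so one stream cannot be split four ways, but four independent streams can advance in lock-step. The explicit gather loop is the extra copying step the thread worries about: one word has to be fetched from each of four unrelated buffers before a single lane-wise update can run.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LANES 4

/* Toy sequentially dependent update (NOT SHA1): the new state depends on
 * the previous state, so words of one stream must be processed in order. */
static uint32_t mix(uint32_t state, uint32_t word)
{
    state ^= word;
    state = (state << 5) | (state >> 27);  /* rotate left by 5 */
    return state * 0x9E3779B1u;
}

/* Reference: one stream, strictly sequential. */
uint32_t hash_one(const uint32_t *words, size_t n)
{
    uint32_t s = 0;
    for (size_t i = 0; i < n; i++)
        s = mix(s, words[i]);
    return s;
}

/* Four independent streams of equal length advanced in lock-step. */
void hash_four(const uint32_t *streams[LANES], size_t n, uint32_t out[LANES])
{
    uint32_t state[LANES] = {0, 0, 0, 0};

    assert(streams != NULL && out != NULL);

    for (size_t i = 0; i < n; i++) {
        uint32_t lane_in[LANES];

        /* Gather: one word from each of four unrelated buffers. A vector
         * unit cannot do this with a single contiguous load. */
        for (int l = 0; l < LANES; l++)
            lane_in[l] = streams[l][i];

        /* Lane-wise update: the same operation on all four lanes, which
         * is the part a 4 x 32-bit vector unit could do in one go. */
        for (int l = 0; l < LANES; l++)
            state[l] = mix(state[l], lane_in[l]);
    }

    for (int l = 0; l < LANES; l++)
        out[l] = state[l];
}
```

The lock-step version produces exactly the per-stream results of `hash_one`, so any gain has to come from the lane-wise step being cheaper than four scalar updates by more than the gather costs.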
* RE: AltiVec in the kernel
From: Matt Sealey @ 2006-07-20 12:31 UTC
To: 'Linas Vepstas', 'Paul Mackerras'
Cc: 'linuxppc-dev list'

> But perhaps, in principle, couldn't one run four independent
> streams in parallel? Thus, for example, on an SSL-enabled
> web server, one could service multiple encryption/decryption
> threads at once.
>
> In practice, I don't believe the infrastructure for that kind
> of parallelism is in place. I'm struggling to find a reason
> to develop that kind of infrastructure. Mumble something about Cell.

If not AltiVec there is potential to use some features which come
with AltiVec like the data stream functionality. Or even the standard
PPC cache control stuff would work.

What's the case in the kernel for the memcpy functions etc., are
they optimized for doing things like longword copies rather than
byte-per-byte etc.? We found glibc sucked for that.

-- 
Matt Sealey <matt@genesi-usa.com>
Manager, Genesi, Developer Relations

* Re: AltiVec in the kernel
From: Kumar Gala @ 2006-07-20 13:23 UTC
To: matt
Cc: 'Paul Mackerras', 'linuxppc-dev list'

On Jul 20, 2006, at 7:31 AM, Matt Sealey wrote:

>> But perhaps, in principle, couldn't one run four independent
>> streams in parallel? Thus, for example, on an SSL-enabled
>> web server, one could service multiple encryption/decryption
>> threads at once.
>>
>> In practice, I don't believe the infrastructure for that kind
>> of parallelism is in place. I'm struggling to find a reason
>> to develop that kind of infrastructure. Mumble something about Cell.
>
> If not AltiVec there is potential to use some features which come
> with AltiVec like the data stream functionality. Or even the standard
> PPC cache control stuff would work.
>
> What's the case in the kernel for the memcpy functions etc., are
> they optimized for doing things like longword copies rather than
> byte-per-byte etc.? We found glibc sucked for that.

Matt, can I ask what exactly you are trying to accomplish? There is
a lot of work put into the kernel to ensure things are optimized.
I'd say far more so than gets put into user space.

- k

* RE: AltiVec in the kernel 2006-07-20 13:23 ` Kumar Gala @ 2006-07-20 13:33 ` Matt Sealey 0 siblings, 0 replies; 31+ messages in thread From: Matt Sealey @ 2006-07-20 13:33 UTC (permalink / raw) To: 'Kumar Gala' Cc: 'Paul Mackerras', 'linuxppc-dev list' > Matt, can I ask what exactly you are trying to accomplish? There is > a lot of work put into the kernel to ensure things are optimized. > I'd say far more so than gets put into user space. Just trying to find out what the general state of play is. -- Matt Sealey <matt@genesi-usa.com> Manager, Genesi, Developer Relations ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: AltiVec in the kernel
From: Linas Vepstas @ 2006-07-20 17:42 UTC
To: Matt Sealey
Cc: 'linuxppc-dev list', 'Paul Mackerras'

On Thu, Jul 20, 2006 at 07:31:32AM -0500, Matt Sealey wrote:
>
> What's the case in the kernel for the memcpy functions etc., are
> they optimized for doing things like longword copies rather than
> byte-per-byte etc.?

arch/powerpc/lib/copy_32.S
arch/powerpc/lib/memcpy_64.S

Looks pretty darned optimized to me.

> We found glibc sucked for that.

Only because someone was asleep at the wheel, or there was a bug.

When glibc gets ported to a new architecture, one of the earliest
tasks is to create optimized versions of memcpy and the like.
Presumably, on powerpc, this would have been done more than a
decade ago; it's hard for me to imagine that there'd be a problem
there. Now, I haven't looked at the code, but I just can't imagine
how this would not have been found and fixed by now. Is there
really a problem with glibc performance on powerpc? I mean,
this is a pretty serious accusation, and something that should
be fixed asap.

--linas

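For readers wondering what "longword copies rather than byte-per-byte" means in practice, here is a small portable C sketch of the general idea. It is not the code in the assembly files named above, which is hand-tuned per CPU; `copy_wordwise` is a hypothetical name. The shape is the usual one: byte copies until the destination is word-aligned, then one machine word per iteration, then a byte tail.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative word-at-a-time copy for non-overlapping buffers. */
void *copy_wordwise(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    assert((d != NULL && s != NULL) || n == 0);

    /* Head: single bytes until the destination is word-aligned. */
    while (n > 0 && ((uintptr_t)d % sizeof(unsigned long)) != 0) {
        *d++ = *s++;
        n--;
    }

    /* Body: one machine word per iteration. The fixed-size memcpy calls
     * are the portable way to express a possibly unaligned word load and
     * store; compilers typically lower each to a single instruction. */
    while (n >= sizeof(unsigned long)) {
        unsigned long w;
        memcpy(&w, s, sizeof w);
        memcpy(d, &w, sizeof w);
        d += sizeof w;
        s += sizeof w;
        n -= sizeof w;
    }

    /* Tail: whatever is left over, byte by byte. */
    while (n > 0) {
        *d++ = *s++;
        n--;
    }

    return dst;
}
```

An AltiVec version would follow the same head/body/tail shape with 16-byte vector loads and stores in the body, which is where the alignment handling becomes the hard part.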
* Re: AltiVec in the kernel
From: Brian D. Carlstrom @ 2006-07-20 18:47 UTC
To: Linas Vepstas
Cc: 'Paul Mackerras', 'linuxppc-dev list'

At Thu, 20 Jul 2006 12:42:55 -0500,
Linas Vepstas wrote:
> > We found glibc sucked for that.
>
> Only because someone was asleep at the wheel, or there was a bug.
>
> When glibc gets ported to a new architecture, one of the earliest
> tasks is to create optimized versions of memcpy and the like.
> Presumably, on powerpc, this would have been done more than a
> decade ago; it's hard for me to imagine that there'd be a problem
> there. Now, I haven't looked at the code, but I just can't imagine
> how this would not have been found and fixed by now. Is there
> really a problem with glibc performance on powerpc? I mean,
> this is a pretty serious accusation, and something that should
> be fixed asap.

In the course of my work, I use powerpc architecture simulators. When
working on Mac OS X with a G5, I had to implement some of the basic
AltiVec specifically for use by their libc memcpy implementation.

A quick grep memcpy in the recent glibc sources on my linux/ppc box
seems to show nowhere near that level of optimization, but I admit
that I could have missed something.

However, I would not be surprised that glibc avoided AltiVec specific
optimizations since it would add to the complexity of supporting various
architectures with one binary. On Mac OS X, libc actually delegated a
small number of libc calls such as memcpy via a kernel-managed page at
the end of the address space that set up which routines to use based on
the currently running architecture.

-bri

* Re: AltiVec in the kernel 2006-07-20 18:47 ` Brian D. Carlstrom @ 2006-07-20 19:05 ` Olof Johansson 2006-07-20 21:56 ` Brian D. Carlstrom 0 siblings, 1 reply; 31+ messages in thread From: Olof Johansson @ 2006-07-20 19:05 UTC (permalink / raw) To: Brian D. Carlstrom; +Cc: 'Paul Mackerras', 'linuxppc-dev list' On Thu, Jul 20, 2006 at 11:47:04AM -0700, Brian D. Carlstrom wrote: > A quick grep memcpy in the recent glibc sources on my linux/ppc box > seems to show no where near that level of optimization, but I admit > that I could have missed something. http://penguinppc.org/dev/glibc/glibc-powerpc-cpu-addon.html -Olof ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: AltiVec in the kernel 2006-07-20 19:05 ` Olof Johansson @ 2006-07-20 21:56 ` Brian D. Carlstrom 2006-07-20 22:39 ` Daniel Ostrow ` (3 more replies) 0 siblings, 4 replies; 31+ messages in thread From: Brian D. Carlstrom @ 2006-07-20 21:56 UTC (permalink / raw) To: Olof Johansson; +Cc: 'Paul Mackerras', 'linuxppc-dev list' At Thu, 20 Jul 2006 14:05:23 -0500, Olof Johansson wrote: > On Thu, Jul 20, 2006 at 11:47:04AM -0700, Brian D. Carlstrom wrote: > > A quick grep memcpy in the recent glibc sources on my linux/ppc box > > seems to show no where near that level of optimization, but I admit > > that I could have missed something. > > http://penguinppc.org/dev/glibc/glibc-powerpc-cpu-addon.html Very interesting. According to that page, the memcpy optimizations seem to be using 64-bit operations and that 128-bit AltiVec operations are still being solicited. I was encouraged to see the following: If you need to build generic distributions (supporting several <cpu_types>) you can leverage the dl_procinfo support built into glibc. This mechanism allows for multiple versions of the core libraries (libc, libm, librt, libpthread, libpthread_db) to be stored in hardware/platform specific subdirectories under /lib[64]. However, I'm guessing this addon is not something found in common distributions for PowerPC like Debian, Fedora, Gentoo, Ubuntu, ... -bri ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: AltiVec in the kernel 2006-07-20 21:56 ` Brian D. Carlstrom @ 2006-07-20 22:39 ` Daniel Ostrow 2006-07-21 6:35 ` Olof Johansson ` (2 subsequent siblings) 3 siblings, 0 replies; 31+ messages in thread From: Daniel Ostrow @ 2006-07-20 22:39 UTC (permalink / raw) To: Brian D. Carlstrom Cc: Olof Johansson, 'linuxppc-dev list', 'Paul Mackerras' On Thu, 2006-07-20 at 14:56 -0700, Brian D. Carlstrom wrote: > At Thu, 20 Jul 2006 14:05:23 -0500, Olof Johansson wrote: > > On Thu, Jul 20, 2006 at 11:47:04AM -0700, Brian D. Carlstrom wrote: > > > A quick grep memcpy in the recent glibc sources on my linux/ppc box > > > seems to show no where near that level of optimization, but I admit > > > that I could have missed something. > > > > http://penguinppc.org/dev/glibc/glibc-powerpc-cpu-addon.html > > Very interesting. According to that page, the memcpy optimizations seem > to be using 64-bit operations and that 128-bit AltiVec operations are > still being solicited. > > I was encouraged to see the following: > > If you need to build generic distributions (supporting several > <cpu_types>) you can leverage the dl_procinfo support built into > glibc. This mechanism allows for multiple versions of the core > libraries (libc, libm, librt, libpthread, libpthread_db) to be > stored in hardware/platform specific subdirectories under /lib[64]. > > However, I'm guessing this addon is not something found in common > distributions for PowerPC like Debian, Fedora, Gentoo, Ubuntu, ... It has been part of Gentoo's glibc since 2.4 came out. --Dan ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: AltiVec in the kernel 2006-07-20 21:56 ` Brian D. Carlstrom 2006-07-20 22:39 ` Daniel Ostrow @ 2006-07-21 6:35 ` Olof Johansson 2006-07-21 14:42 ` Matt Sealey 2006-07-21 22:21 ` Peter Bergner 3 siblings, 0 replies; 31+ messages in thread From: Olof Johansson @ 2006-07-21 6:35 UTC (permalink / raw) To: Brian D. Carlstrom; +Cc: 'linuxppc-dev list', 'Paul Mackerras' On Thu, Jul 20, 2006 at 02:56:33PM -0700, Brian D. Carlstrom wrote: > However, I'm guessing this addon is not something found in common > distributions for PowerPC like Debian, Fedora, Gentoo, Ubuntu, ... There's always lead time to get things into distros, that would still be true if you modified glibc instead as well. -Olof ^ permalink raw reply [flat|nested] 31+ messages in thread
* RE: AltiVec in the kernel 2006-07-20 21:56 ` Brian D. Carlstrom 2006-07-20 22:39 ` Daniel Ostrow 2006-07-21 6:35 ` Olof Johansson @ 2006-07-21 14:42 ` Matt Sealey 2006-07-21 16:51 ` Linas Vepstas 2006-07-21 22:21 ` Peter Bergner 3 siblings, 1 reply; 31+ messages in thread From: Matt Sealey @ 2006-07-21 14:42 UTC (permalink / raw) To: 'Brian D. Carlstrom', 'Olof Johansson' Cc: 'linuxppc-dev list', 'Paul Mackerras' > -----Original Message----- > From: linuxppc-dev-bounces+matt=genesi-usa.com@ozlabs.org > [mailto:linuxppc-dev-bounces+matt=genesi-usa.com@ozlabs.org] > On Behalf Of Brian D. Carlstrom > Sent: Thursday, July 20, 2006 4:57 PM > To: Olof Johansson > Cc: 'Paul Mackerras'; 'linuxppc-dev list' > Subject: Re: AltiVec in the kernel > > At Thu, 20 Jul 2006 14:05:23 -0500, Olof Johansson wrote: > > On Thu, Jul 20, 2006 at 11:47:04AM -0700, Brian D. Carlstrom wrote: > > > A quick grep memcpy in the recent glibc sources on my > linux/ppc box > > > seems to show no where near that level of optimization, > but I admit > > > that I could have missed something. > > > > http://penguinppc.org/dev/glibc/glibc-powerpc-cpu-addon.html > > Very interesting. According to that page, the memcpy > optimizations seem to be using 64-bit operations and that > 128-bit AltiVec operations are still being solicited. "Still"? http://www.freevec.org/ Been there for months, before the glibc thing. Most of the functions are ready. Anyone can bugfix this. The beauty of GPL. The ugly part is.. we've had this there for months. Nobody has contributed a single update or bugfix or even a performance test as far as I know. > However, I'm guessing this addon is not something found in > common distributions for PowerPC like Debian, Fedora, Gentoo, > Ubuntu, ... Indeed it's a cute feature but we were scared away by the glibc guys when it came to glibc-ports (perhaps they just considered it not ready, but we wanted it in there for the first release, which was the next one). Hence freevec. 
Konstantinos will get back in a couple weeks and post some updates. The more interesting code is the MySQL stuff. All of this has been developed by finding good examples of apps, profiling them and then optimizing the top few functions that are most used. -- Matt Sealey <matt@genesi-usa.com> Manager, Genesi, Developer Relations ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: AltiVec in the kernel
From: Linas Vepstas @ 2006-07-21 16:51 UTC
To: Matt Sealey
Cc: 'Olof Johansson', 'linuxppc-dev list', 'Paul Mackerras'

On Fri, Jul 21, 2006 at 09:42:32AM -0500, Matt Sealey wrote:
>
> > > http://penguinppc.org/dev/glibc/glibc-powerpc-cpu-addon.html
> >
> > 128-bit AltiVec operations are still being solicited.
>
> "Still"?
>
> http://www.freevec.org/
>
> Been there for months, before the glibc thing. Most of the functions
> are ready. Anyone can bugfix this. The beauty of GPL. The ugly part
> is.. we've had this there for months. Nobody has contributed a single
> update or bugfix or even a performance test as far as I know.

Sounds like a problem of advertising and communications.
This is kind of "under the radar" for most users and developers.
It needs to work out-of-the-box; most people, even those with
interest in performance, will not even be aware of the possibility
to tune this.

It should be folded into glibc. It is up to the altivec
product vendor to nag the glibc folks into folding it in.
This task could be as hard as writing the code in the first place.

> Indeed it's a cute feature but we were scared away by the glibc guys

Many maintainers of core libraries have similar behaviour patterns.
Besides glibc, gcc and gsl come to mind. This is because they get tired
out by naive eager-beavers who walk in with the greatest idea in
the world, make a big fuss about it, and then proceed to demonstrate
that they have absolutely no clue of what they're talking about.
For every ten of those, there's maybe one legit idea. Worse, many
of these "clueless newbies" come in the surprising shape of PhD's
working outside their specialty, and can convincingly sling jargon
and authority for a while before it's realized they're just... clueless.

If you've got good code, you'll just need to be persistent.

--linas

* RE: AltiVec in the kernel 2006-07-21 16:51 ` Linas Vepstas @ 2006-07-21 18:08 ` Matt Sealey 2006-07-22 3:09 ` Segher Boessenkool 2006-07-21 18:46 ` Brian D. Carlstrom 2006-07-21 21:30 ` Hollis Blanchard 2 siblings, 1 reply; 31+ messages in thread From: Matt Sealey @ 2006-07-21 18:08 UTC (permalink / raw) To: 'Linas Vepstas' Cc: 'Olof Johansson', 'linuxppc-dev list', 'Paul Mackerras' > Sounds like a problem of advertising and communications. > This is kind of "under the radar" for most users and > developers. It needs to work out-of-the-box, most people, > even those with interest in performance, will not even be > aware of the possibility to tune this. It's listed on every site we have, and on PenguinPPC.org too if I recall (hi Hollis) it even got a sticky news item like a lot of the stuff we do (thanks Hollis :). Everyone who cares knows about it, I would think. Probably not enough people care, is the problem. > It should be folded into glibc. It is up to the altivec > product vendor to nag the glibc folks into folding it in. You mean Freescale? Or Genesi? Freevec was being developed as a "perfect opportunity". glibc-ports came to life and was something that code could be contributed to. Since it was such a hassle dealing with the glibc guys, it ended up being a separate library for now. > This task could be as hard as writing the code in the first place. I think we could handle it if there were fewer stubborn mules maintaining the most important software. I can think of one guy in particular.. but I won't name him. > Many maintainers of core libraries have similar behaviour patterns. > Besides glibc, gcc and gsl come to mind. This is because they > get tired out by naive eager-beavers who walk in with the > greatest idea in the world I think this kind of behaviour stalls Open Source software, because it unfairly treats those *with* clues. <-us-> do you want the AltiVec code or not? <them> Oh no because I am bored of dealing with people who only had ideas!! 
It doesn't make much sense politically or technically. So like I said we could have had this code in glibc when glibc-ports first was conceptualised and then released, but there were just too many mules in the way. Check the freevec.org whitepapers section: Konstantinos is not just "ideas", he proved out optimizations and then implemented them. Is it his fault that they're not in glibc, because he's "stupid" or "clueless"? :D > If you've got good code, you'll just need to be persistent. Personally I am pretty tired (in return) of angry-faced Open Source developers deciding that "Open Source" is equivalent to "My Source, Back Off, Your Patch Sucks". It is always the choice of the lead developer (and/or copyright holder) to refuse patches, but.. seriously.. a lot of Open Source development is the wrong kind of dictatorship. Cynicism aside.. :D </rant> -- Matt Sealey <matt@genesi-usa.com> Manager, Genesi, Developer Relations ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: AltiVec in the kernel 2006-07-21 18:08 ` Matt Sealey @ 2006-07-22 3:09 ` Segher Boessenkool 2006-07-23 13:28 ` Matt Sealey 0 siblings, 1 reply; 31+ messages in thread From: Segher Boessenkool @ 2006-07-22 3:09 UTC (permalink / raw) To: matt Cc: 'Olof Johansson', 'linuxppc-dev list', 'Paul Mackerras' > Freevec was being developed as a "perfect opportunity". glibc-ports > came to life and was something that code could be contributed to. > Since it was such a hassle dealing with the glibc guys, it ended up > being a separate library for now. Do you have a pointer to an archive of that email thread? I can't remember it. You could give Freevec a whole lot more exposure, to people who might be more interested in it than the average glibc user, by putting it into uClibc first. An additional advantage is that you don't have to care about forward/backward compatibility issues, or even whether the platform a binary ends up running on actually has AltiVec or not (uClibc gets tailored to the exact system it runs on at compile time). So you can focus on the routines you want to speed up instead of on all the infrastructure stuff required for glibc. You'll have to update uClibc's PowerPC port first though (mostly just copying stuff from recent glibc) -- it seems the libc AltiVec support (for handling setjmp() etc.) isn't in there yet. >> This task could be as hard as writing the code in the first place. Not as hard. Way, way harder instead. Part of that is that the code probably really isn't good enough yet, sorry. And then there's all the compatibility stuff, and symbol versioning, etc. And the communication issue, of course. Segher ^ permalink raw reply [flat|nested] 31+ messages in thread
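Segher's contrast between glibc (one binary, AltiVec presence unknown until run time) and uClibc (built for one exact target) can be sketched in C. This is an illustration only: `enum copy_impl`, `pick_impl_runtime`, `pick_impl_buildtime` and `impl_name` are made-up names, not part of Freevec, glibc or uClibc. The fallback definition of `PPC_FEATURE_HAS_ALTIVEC` is an assumption that mirrors the value in the Linux PowerPC headers, so that the sketch also builds on other hosts.

```c
#include <assert.h>

/* AT_HWCAP bit for AltiVec; on PowerPC Linux it comes from <asm/cputable.h>.
 * The fallback value below is an assumption for building elsewhere. */
#ifndef PPC_FEATURE_HAS_ALTIVEC
#define PPC_FEATURE_HAS_ALTIVEC 0x10000000UL
#endif

enum copy_impl { COPY_SCALAR, COPY_ALTIVEC };   /* hypothetical */

/* glibc-style: one binary, decide at run time from the hwcap mask. */
static enum copy_impl pick_impl_runtime(unsigned long hwcap)
{
    return (hwcap & PPC_FEATURE_HAS_ALTIVEC) ? COPY_ALTIVEC : COPY_SCALAR;
}

/* uClibc-style: decided once, when the library is compiled for its target. */
static enum copy_impl pick_impl_buildtime(void)
{
#ifdef __ALTIVEC__
    return COPY_ALTIVEC;
#else
    return COPY_SCALAR;
#endif
}

/* Human-readable name of the selected implementation. */
static const char *impl_name(enum copy_impl impl)
{
    assert(impl == COPY_SCALAR || impl == COPY_ALTIVEC);
    return impl == COPY_ALTIVEC ? "altivec" : "scalar";
}
```

On a real system the mask passed to `pick_impl_runtime()` would be the process's AT_HWCAP value; the build-time variant needs no such plumbing, which is the simplification Segher points out.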
* RE: AltiVec in the kernel 2006-07-22 3:09 ` Segher Boessenkool @ 2006-07-23 13:28 ` Matt Sealey 2006-07-23 21:37 ` Benjamin Herrenschmidt 0 siblings, 1 reply; 31+ messages in thread From: Matt Sealey @ 2006-07-23 13:28 UTC (permalink / raw) To: 'Segher Boessenkool' Cc: 'Olof Johansson', 'linuxppc-dev list', 'Paul Mackerras' > You could give Freevec a whole lot more exposure, to people > who might be more interested in it than the average glibc > user, by putting it into uClibc first. [snip] > You'll have to update uClibc's PowerPC port first though > (mostly just copying stuff from recent glibc) -- it seems the > libc AltiVec support (for handling setjmp() etc.) isn't in there yet. I remember a discussion from one of the Gentoo guys wanting to do this with libfreevec. Getting into Gentoo, though, is not difficult. The problem with this is Gentoo is one Linux distribution. I would be more impressed if code was in Debian or Ubuntu considering their exhausting lead times on producing new package trees and accepting new code :D -- Matt Sealey <matt@genesi-usa.com> Manager, Genesi, Developer Relations ^ permalink raw reply [flat|nested] 31+ messages in thread
* RE: AltiVec in the kernel 2006-07-23 13:28 ` Matt Sealey @ 2006-07-23 21:37 ` Benjamin Herrenschmidt 0 siblings, 0 replies; 31+ messages in thread From: Benjamin Herrenschmidt @ 2006-07-23 21:37 UTC (permalink / raw) To: matt Cc: 'Olof Johansson', 'linuxppc-dev list', 'Paul Mackerras' > I remember a discussion from one of the Gentoo guys wanting to do this > with libfreevec. > > Getting into Gentoo, though, is not difficult. The problem with this > is Gentoo is one Linux distribution. I would be more impressed if code > was in Debian or Ubuntu considering their exhausting lead times on > producing new package trees and accepting new code :D It seems to me that the "problem" just doesn't exist at the moment... libfreevec is nice, but it's unfinished, and the author is away for now and thus not able to complete it or work on a port to glibc or others. Once he's back, of course, it would be nice to have him complete the work (and maybe get some outside help). I'd also like to verify his methodology for measuring the performance improvements. I'm not saying it's wrong; I want to make sure that the overhead of enabling altivec has been properly measured for various usage patterns, and thus possibly restrict the optimisations to patterns where it matters, for example using altivec only for large memcpy's. Once that's done, I don't see any good reason why it would be so hard to include that work into glibc, or rather into the powerpc add-ons in a first step and maybe then the whole into glibc. Maintainers rarely reject things just for the sake of doing so. If they do, they usually provide reasons, often boiling down to implementation details, that can then be fixed. Note also that in the case of submitting code to glibc, there is a copyright assignment issue to be sorted out I think (I don't know the details here). I have the feeling that there is very little point to this thread. 
Let's wait for Konstantinos to be back and submit his work, possibly to this list at first for review, tests, etc... and then to the appropriate maintainers. If there is a problem at that point, then we'll see how we can address it. Regards, Ben. ^ permalink raw reply [flat|nested] 31+ messages in thread
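Ben's suggestion to restrict the vector path to usage patterns where it pays off, such as large memcpy() calls, amounts to a size-threshold dispatch. A minimal sketch in portable C follows. `VEC_THRESHOLD`, `copy_dispatch` and `copy_blocks16` are placeholder names, the 16-byte block loop merely stands in for an AltiVec path, and the cutoff value is invented; a real one would have to come from the kind of measurement Ben asks for.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define VEC_THRESHOLD 256   /* hypothetical cutoff, to be tuned by measurement */

/* Stand-in for a vectorised copy: moves 16-byte blocks, then the tail. */
static void copy_blocks16(unsigned char *dst, const unsigned char *src, size_t n)
{
    size_t i = 0;

    for (; i + 16 <= n; i += 16)
        memcpy(dst + i, src + i, 16);
    for (; i < n; i++)
        dst[i] = src[i];
}

/* Copies n bytes; returns 1 if the "vector" path was taken, 0 otherwise. */
static int copy_dispatch(void *dst, const void *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    if (n >= VEC_THRESHOLD) {
        copy_blocks16(dst, src, n);   /* large copy: setup cost is amortised */
        return 1;
    }
    memcpy(dst, src, n);              /* small copy: skip the vector setup */
    return 0;
}
```

The return value exists only so that a benchmark or test can see which path ran; a production routine would return the destination pointer like memcpy() does.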
* Re: AltiVec in the kernel 2006-07-21 16:51 ` Linas Vepstas 2006-07-21 18:08 ` Matt Sealey @ 2006-07-21 18:46 ` Brian D. Carlstrom 2006-07-21 21:30 ` Hollis Blanchard 2 siblings, 0 replies; 31+ messages in thread From: Brian D. Carlstrom @ 2006-07-21 18:46 UTC (permalink / raw) To: Linas Vepstas Cc: 'Olof Johansson', 'Paul Mackerras', 'linuxppc-dev list' At Fri, 21 Jul 2006 11:51:30 -0500, Linas Vepstas wrote: > If you've got good code, you'll just need to be persistent. While I agree with most of Matt's rant, I think Linas is right as well. Hearing that code is already in a distribution like Gentoo makes it easier to make the case that the code doesn't suck and isn't vaporware. -bri disclaimer: a PhD student working outside my specialty :) ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: AltiVec in the kernel 2006-07-21 16:51 ` Linas Vepstas 2006-07-21 18:08 ` Matt Sealey 2006-07-21 18:46 ` Brian D. Carlstrom @ 2006-07-21 21:30 ` Hollis Blanchard 2 siblings, 0 replies; 31+ messages in thread From: Hollis Blanchard @ 2006-07-21 21:30 UTC (permalink / raw) To: Linas Vepstas, Matt Sealey Cc: 'Olof Johansson', 'linuxppc-dev list', 'Paul Mackerras', Konstantinos Margaritis On Fri, 21 Jul 2006 11:51:30 -0500, "Linas Vepstas" <linas@austin.ibm.com> said: > On Fri, Jul 21, 2006 at 09:42:32AM -0500, Matt Sealey wrote: > > http://www.freevec.org/ > > > > Been there for months, before the glibc thing. Most of the functions > > are ready. Anyone can bugfix this. The beauty of GPL. The ugly part > > is.. we've had this there for months. Nobody has contributed a single > > update or bugfix or even a performance test as far as I know. > > Sounds like a problem of advertising and communications. This is > kind of "under the radar" for most users and developers. It needs to > work out-of-the-box, most people, even those with interest in > performance, will not even be aware of the possibility to tune this. It is difficult to make sure every OSS developer is notified of all work they may be interested in... However, I have noticed a trend where Genesi people seem to think everybody pays attention to their websites (and the same could be said for Debian and other subcultures). In this case there actually have been other people aware of this project, but not very many. Considering all the traffic about it on ppczone.org, people looking for exposure for their project may want to look beyond PPCZone. > It should be folded into glibc. It is up to the altivec product vendor > to nag the glibc folks into folding it in. This task could be as hard > as writing the code in the first place. Konstantinos is aware of Steve's glibc project and has indicated he'll try to contribute to it. To be fair, probably not many people have heard of Steve's project either. 
I doubt Konstantinos would have heard of it if I hadn't mentioned it. -Hollis ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: AltiVec in the kernel 2006-07-20 21:56 ` Brian D. Carlstrom ` (2 preceding siblings ...) 2006-07-21 14:42 ` Matt Sealey @ 2006-07-21 22:21 ` Peter Bergner 3 siblings, 0 replies; 31+ messages in thread From: Peter Bergner @ 2006-07-21 22:21 UTC (permalink / raw) To: Brian D. Carlstrom Cc: Olof Johansson, 'linuxppc-dev list', 'Paul Mackerras', Steve Munroe On Thu, 2006-07-20 at 14:56 -0700, Brian D. Carlstrom wrote: > At Thu, 20 Jul 2006 14:05:23 -0500, Olof Johansson wrote: > > http://penguinppc.org/dev/glibc/glibc-powerpc-cpu-addon.html > > Very interesting. According to that page, the memcpy optimizations seem > to be using 64-bit operations and that 128-bit AltiVec operations are > still being solicited. > > I was encouraged to see the following: > > If you need to build generic distributions (supporting several > <cpu_types>) you can leverage the dl_procinfo support built into > glibc. This mechanism allows for multiple versions of the core > libraries (libc, libm, librt, libpthread, libpthread_db) to be > stored in hardware/platform specific subdirectories under /lib[64]. Actually, this support is not limited to the core glibc routines or the system lib directories /lib/ and /usr/lib/. 
This works just as well for third party shipped libraries in their own library trees, as the following example (on a power5 box) shows:

  bergner@vervainp1:~/cpu-tuned-libs> pwd
  /home/bergner/cpu-tuned-libs
  bergner@vervainp1:~/cpu-tuned-libs> ls lib/ lib/power5/
  lib/:
  libfoo.so  power5/

  lib/power5/:
  libfoo.so
  bergner@vervainp1:~/cpu-tuned-libs> gcc -L/home/bergner/cpu-tuned-libs/lib -R/home/bergner/cpu-tuned-libs/lib main.c -lfoo
  bergner@vervainp1:~/cpu-tuned-libs> ldd a.out
          linux-vdso32.so.1 => (0x00100000)
          libfoo.so => /home/bergner/cpu-tuned-libs/lib/power5/libfoo.so (0x0ffde000)
          libc.so.6 => /lib/power5/libc.so.6 (0x0fe69000)
          /lib/ld.so.1 (0xf7fe1000)
  bergner@vervainp1:~/cpu-tuned-libs> ./a.out
  Loaded the optimzed lib
  bergner@vervainp1:~/cpu-tuned-libs> rm lib/power5/libfoo.so
  bergner@vervainp1:~/cpu-tuned-libs> ldd a.out
          linux-vdso32.so.1 => (0x00100000)
          libfoo.so => /home/bergner/cpu-tuned-libs/lib/libfoo.so (0x0ffde000)
          libc.so.6 => /lib/power5/libc.so.6 (0x0fe69000)
          /lib/ld.so.1 (0xf7fe1000)
  bergner@vervainp1:~/cpu-tuned-libs> ./a.out
  Loaded the unoptimzed lib

The runtime loader magic uses the AT_PLATFORM string value as the subdirectory to search in under the .../lib/ or .../lib64/ library directory. To find out what your AT_PLATFORM value is on your current box, you can do:

  bergner@vervainp1:~/cpu-tuned-libs> LD_SHOW_AUXV=1 /bin/true
  AT_DCACHEBSIZE:  0x80
  AT_ICACHEBSIZE:  0x80
  AT_UCACHEBSIZE:  0x0
  AT_SYSINFO_EHDR: 0x100000
  AT_HWCAP:        power5 mmu fpu ppc64 ppc32
  AT_PAGESZ:       4096
  AT_CLKTCK:       100
  AT_PHDR:         0x10000034
  AT_PHENT:        32
  AT_PHNUM:        9
  AT_BASE:         0xf7fe1000
  AT_FLAGS:        0x0
  AT_ENTRY:        0x10000980
  AT_UID:          1001
  AT_EUID:         1001
  AT_GID:          100
  AT_EGID:         100
  AT_SECURE:       0
  AT_PLATFORM:     power5

> However, I'm guessing this addon is not something found in common > distributions for PowerPC like Debian, Fedora, Gentoo, Ubuntu, ... At last year's GCC Developers Summit, one of the Ubuntu guys mentioned he was interested in adding it to Ubuntu. 
I haven't heard whether that has shown up yet though. It will be available in upcoming SUSE and Red Hat enterprise distros. I don't know about the others. As Olof mentioned, it can take some lead time for this to get picked up. There's also the question of how many and which processors a distro will ship cpu optimized libraries for. Given all of the PowerPC variants, they obviously can't ship optimized libs for everything. Peter ^ permalink raw reply [flat|nested] 31+ messages in thread
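The lookup order Peter describes, the AT_PLATFORM subdirectory first and then the plain library directory, can be sketched in C. This is not the dynamic loader's actual code: `platform_lib_path` and `current_platform` are hypothetical helpers written for this note, and `getauxval()` requires glibc 2.16 or later, which postdates this thread.

```c
#include <assert.h>
#include <elf.h>        /* AT_PLATFORM */
#include <stdio.h>
#include <string.h>
#include <sys/auxv.h>   /* getauxval(), glibc >= 2.16 */

/* Build "<libdir>/<platform>/<name>", or "<libdir>/<name>" when platform
 * is NULL or empty. Returns 0 on success, -1 if buf is too small. */
static int platform_lib_path(char *buf, size_t len, const char *libdir,
                             const char *platform, const char *name)
{
    int n;

    assert(buf != NULL && libdir != NULL && name != NULL);
    if (platform != NULL && strlen(platform) > 0)
        n = snprintf(buf, len, "%s/%s/%s", libdir, platform, name);
    else
        n = snprintf(buf, len, "%s/%s", libdir, name);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* AT_PLATFORM string of the running process, or NULL if not provided. */
static const char *current_platform(void)
{
    return (const char *)getauxval(AT_PLATFORM);
}
```

A caller imitating the session above would first try the path built with `current_platform()` and, if no file exists there, fall back to the path built with a NULL platform, which is what happens after `rm lib/power5/libfoo.so`.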
* RE: AltiVec in the kernel 2006-07-18 15:10 ` Matt Sealey 2006-07-18 17:56 ` Paul Mackerras @ 2006-07-18 18:39 ` Benjamin Herrenschmidt 1 sibling, 0 replies; 31+ messages in thread From: Benjamin Herrenschmidt @ 2006-07-18 18:39 UTC (permalink / raw) To: matt; +Cc: 'linuxppc-dev list', 'Paul Mackerras' > I don't think people investigate it too much because the first > thing they hit is lack of documentation, and then "well we don't > really recommend it". I think this makes Linux the worst OS a > developer would want to run on a G4 and G5, then? :D It's not recommended for the same reason the FPU isn't used in the kernel and x86 doesn't use SSE / MMX there either, except in a few places where it does make sense like the RAID code. It's possible that it might be interesting to do it for some of the crypto modules as well and we certainly welcome any patch using altivec to improve some other aspect of the kernel provided that it does indeed... improve performance :) Part of the problem is the cost of enabling/disabling it and saving/restoring the vector registers that get clobbered when using it. Essentially, the kernel entry only saves and restores GPRs. Not FPRs, not VRs. This is done to keep the cost of kernel entry low. Which means that at any given point in time, the altivec and FPU units contain whatever context was last used by userland. If the kernel wants to use the unit for its own purposes, it thus needs to flush that context to the thread struct (which also means that the unit will be disabled on the way back to userland and re-faulted in when used again). That's what enable_kernel_altivec() does (and the similar enable_kernel_fp()). This cannot happen at interrupt time though, and you shouldn't be holding locks, so it may be a problem with some of the crypto stuff, as I think it can be used in some weird code paths. 
It's also important that no scheduling happens until you are done with the unit, which is why you have to disable preemption, since otherwise, the unit could be re-used by userland behind your back. Another alternative which can work at interrupt time, but requires a bit of assembly hackery, is to manually enable MSR:VEC (if not already set) and save and restore all the altivec registers modified by the code. Ben. ^ permalink raw reply [flat|nested] 31+ messages in thread
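The lazy-save behaviour Ben describes can be modelled in ordinary user-space C. This is a toy model, not kernel code: `struct toy_thread`, `toy_unit` and the `toy_*` functions are invented for illustration, and a single integer stands in for the whole set of vector registers. It shows only why the live state must be flushed to the owning thread before the kernel touches the unit, and why that thread has to fault its state back in afterwards.

```c
#include <assert.h>
#include <stddef.h>

/* Per-thread save area, standing in for the vector state in the thread struct. */
struct toy_thread { int saved_vr; int vec_enabled; };

/* The single vector unit: its live contents and the thread that owns them. */
static struct { int live_vr; struct toy_thread *owner; } toy_unit;

/* Userland's first vector instruction after losing the unit: reload state. */
static void toy_user_fault_in(struct toy_thread *t)
{
    if (toy_unit.owner == t)
        return;                               /* state is already live */
    if (toy_unit.owner != NULL) {             /* save the previous owner */
        toy_unit.owner->saved_vr = toy_unit.live_vr;
        toy_unit.owner->vec_enabled = 0;
    }
    toy_unit.live_vr = t->saved_vr;
    toy_unit.owner = t;
    t->vec_enabled = 1;
}

/* Models the role of enable_kernel_altivec(): flush the live user state
 * to its thread struct so the kernel may clobber the registers. */
static void toy_enable_kernel_vec(void)
{
    if (toy_unit.owner != NULL) {
        toy_unit.owner->saved_vr = toy_unit.live_vr;
        toy_unit.owner->vec_enabled = 0;      /* next user use will fault */
        toy_unit.owner = NULL;
    }
}

/* Kernel work that overwrites the vector registers. */
static void toy_kernel_clobber(int v)
{
    assert(toy_unit.owner == NULL);           /* caller must flush first */
    toy_unit.live_vr = v;
}
```

The model has no notion of preemption. In the real kernel, the window between the flush and the end of the kernel's vector work is exactly where a context switch would let userland reclaim the unit, hence the preempt_disable()/preempt_enable() pair around it.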
* Re: AltiVec in the kernel 2006-07-18 12:48 AltiVec in the kernel Matt Sealey 2006-07-18 13:53 ` Kumar Gala @ 2006-07-18 17:43 ` Paul Mackerras 1 sibling, 0 replies; 31+ messages in thread From: Paul Mackerras @ 2006-07-18 17:43 UTC (permalink / raw) To: matt; +Cc: linuxppc-dev Matt Sealey writes: > Once upon a time we were all told this wouldn't work for some reason, > but a lot of documentation now hints that it does actually work and > for instance there is a RAID5/6 driver (for G5) which uses AltiVec > in a kernel context. It's possible, with some restrictions, basically the same restrictions on using floating point in the kernel. Kernel use of altivec interacts with the lazy altivec context switch that we do on UP kernels, and the fact that the kernel context switch doesn't save/restore the altivec state. That means that before using altivec in the kernel you may have to save away the altivec state, and you have to make sure you don't sleep or get preempted while using altivec. > But I didn't find any definitive documentation on how one goes about > it. The largest clue I found was in Documentation/cpu_features.txt: > > #ifdef CONFIG_ALTIVEC > BEGIN_FTR_SECTION > mfspr r22,SPRN_VRSAVE /* if G4, save vrsave register value */ > stw r22,THREAD_VRSAVE(r23) > END_FTR_SECTION_IFSET(CPU_FTR_ALTIVEC) > #endif /* CONFIG_ALTIVEC */ > > So we can use AltiVec by implementing this kind of wrapper around > kernel functions which may use AltiVec? No, that's irrelevant; that just has to do with the VRSAVE register, not the altivec state. In fact VRSAVE isn't actually even part of the altivec state. > In the code above is there ANY significance of r22 and r23 other > than that they are fairly high up and probably marked as "will > be trashed" by all the relevant ABIs and so? I hope we do a bit better than "probably" ... :) No, there is no particular significance to the choice of r22 and r23. 
If you read the code you will see that those registers are saved at the beginning of the context switch routine and restored (from the new process's stack) at the end. Paul. ^ permalink raw reply [flat|nested] 31+ messages in thread
* AltiVec in the kernel @ 2009-12-11 11:45 Simon Richter 2009-12-11 15:49 ` Arnd Bergmann 0 siblings, 1 reply; 31+ messages in thread From: Simon Richter @ 2009-12-11 11:45 UTC (permalink / raw) To: linuxppc-dev Hi, since there has been a thread on allowing the use of a coprocessor in the kernel already: I am wondering if it'd make sense to use AltiVec for AES in dm-crypt, and how difficult it would be to implement that. I'm using a PegasosII which has a G4 running at 1 GHz; I get around 3 MB/s throughput when accessing harddisks. I think that could be improved. If I understand correctly, the actual encryption work runs in a kernel thread, which is scheduled normally, so it ought to be possible to enable AltiVec for that thread; am I missing something here? Simon ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: AltiVec in the kernel 2009-12-11 11:45 Simon Richter @ 2009-12-11 15:49 ` Arnd Bergmann 2009-12-16 22:11 ` Sebastian Andrzej Siewior 0 siblings, 1 reply; 31+ messages in thread From: Arnd Bergmann @ 2009-12-11 15:49 UTC (permalink / raw) To: linuxppc-dev; +Cc: Simon Richter, Sebastian Siewior On Friday 11 December 2009, Simon Richter wrote: > Hi, > > since there has been a thread on allowing the use of a coprocessor in > the kernel already: I am wondering if it'd make sense to use AltiVec for > AES in dm-crypt, and how difficult it would be to implement that. > > I'm using a PegasosII which has a G4 running at 1 GHz; I get around 3 > MB/s throughput when accessing harddisks. I think that could be > improved. > > If I understand correctly, the actual encryption work runs in a kernel > thread, which is scheduled normally, so it ought to be possible to > enable AltiVec for that thread; am I missing something here? Sebastian Siewior has implemented this some time ago: http://old.nabble.com/-RFC-0-3--Experiments-with-AES-AltiVec,-part-2-tc10034255.html You can try the old patches on your machine to see if they are any good there. Arnd <>< ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: AltiVec in the kernel 2009-12-11 15:49 ` Arnd Bergmann @ 2009-12-16 22:11 ` Sebastian Andrzej Siewior 0 siblings, 0 replies; 31+ messages in thread From: Sebastian Andrzej Siewior @ 2009-12-16 22:11 UTC (permalink / raw) To: Arnd Bergmann; +Cc: Simon Richter, linuxppc-dev * Arnd Bergmann | 2009-12-11 16:49:25 [+0100]: >On Friday 11 December 2009, Simon Richter wrote: >> Hi, >> >> since there has been a thread on allowing the use of a coprocessor in >> the kernel already: I am wondering if it'd make sense to use AltiVec for >> AES in dm-crypt, and how difficult it would be to implement that. >> >> I'm using a PegasosII which has a G4 running at 1 GHz; I get around 3 >> MB/s throughput when accessing harddisks. I think that could be >> improved. >> >> If I understand correctly, the actual encryption work runs in a kernel >> thread, which is scheduled normally, so it ought to be possible to >> enable AltiVec for that thread; am I missing something here? dm-crypt is async these days so the patches Arnd mentioned could be used actually :) I've never tested them with dm-crypt but it should work. Back then I had around 20 MiB/sec for encryption and around 15 MiB/sec for decryption on 4 KiB pages on a PS3 [0]. This was pure testing, no subsystem was involved. dm-crypt will feed multiple 512 byte requests. And according to [1], 512 bytes aren't slow :) However [2] says that AltiVec was always slower than the generic implementation. Maybe PS3's AltiVec unit was slower than the average one because everyone was focused on the SPUs. Maybe not and you get similar results. >Sebastian Siewior has implemented this some time ago: > >http://old.nabble.com/-RFC-0-3--Experiments-with-AES-AltiVec,-part-2-tc10034255.html > >You can try the old patches on your machine to see if they are any good >there. 
Ah you remember :) [0] http://diploma-thesis.siewior.net/html/diplomarbeitch4.html#x12-46002r2 [1] http://diploma-thesis.siewior.net/html/diplomarbeitch4.html#x12-46004r4 [2] http://diploma-thesis.siewior.net/html/diplomarbeitch4.html#x12-47017r5 > > Arnd <>< Sebastian ^ permalink raw reply [flat|nested] 31+ messages in thread
Thread overview: 31+ messages
2006-07-18 12:48 AltiVec in the kernel Matt Sealey
2006-07-18 13:53 ` Kumar Gala
2006-07-18 15:10 ` Matt Sealey
2006-07-18 17:56 ` Paul Mackerras
2006-07-19 18:10 ` Linas Vepstas
2006-07-19 18:19 ` Paul Mackerras
2006-07-19 18:38 ` Johannes Berg
2006-07-19 18:57 ` Linas Vepstas
2006-07-20 12:31 ` Matt Sealey
2006-07-20 13:23 ` Kumar Gala
2006-07-20 13:33 ` Matt Sealey
2006-07-20 17:42 ` Linas Vepstas
2006-07-20 18:47 ` Brian D. Carlstrom
2006-07-20 19:05 ` Olof Johansson
2006-07-20 21:56 ` Brian D. Carlstrom
2006-07-20 22:39 ` Daniel Ostrow
2006-07-21 6:35 ` Olof Johansson
2006-07-21 14:42 ` Matt Sealey
2006-07-21 16:51 ` Linas Vepstas
2006-07-21 18:08 ` Matt Sealey
2006-07-22 3:09 ` Segher Boessenkool
2006-07-23 13:28 ` Matt Sealey
2006-07-23 21:37 ` Benjamin Herrenschmidt
2006-07-21 18:46 ` Brian D. Carlstrom
2006-07-21 21:30 ` Hollis Blanchard
2006-07-21 22:21 ` Peter Bergner
2006-07-18 18:39 ` Benjamin Herrenschmidt
2006-07-18 17:43 ` Paul Mackerras
-- strict thread matches above, loose matches on Subject: below --
2009-12-11 11:45 Simon Richter
2009-12-11 15:49 ` Arnd Bergmann
2009-12-16 22:11 ` Sebastian Andrzej Siewior