From: Konstantinos Margaritis <markos@codex.gr>
To: rsa@us.ibm.com
Cc: linuxppc-dev@ozlabs.org
Subject: Re: libfreevec benchmarks
Date: Sun, 24 Aug 2008 11:03:56 +0300
Message-ID: <200808241103.58485.markos@codex.gr>
In-Reply-To: <1219427047.7917.38.camel@localhost>
On Friday 22 August 2008 20:44:07 Ryan S. Arnold wrote:
> Do you have FSF (Free Software Foundation) copyright assignment yet?
Copyright assignment is not the issue; if there had been interest in the
first place, that would never have deterred me.
> How've you implemented the optimizations?
Scalar for small sizes, AltiVec for larger (>16 bytes, depending on the
routine).
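A minimal sketch of that size-based dispatch, in portable C. This is not
libfreevec's code: the names sketch_copy, sketch_block16 and
SKETCH_VEC_THRESHOLD are hypothetical, the 16-byte threshold is taken from
the figure above, and a 16-byte block copy stands in for the AltiVec
load/store that a real implementation would use.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical threshold, taken from the ">16 bytes" figure above. */
#define SKETCH_VEC_THRESHOLD 16

/* Stand-in for one 16-byte vector register. A real AltiVec build would
 * use vector loads and stores here instead. */
typedef struct { unsigned char b[16]; } sketch_block16;

static void *sketch_copy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    if (n > SKETCH_VEC_THRESHOLD) {
        /* Large copy: move 16 bytes per iteration. */
        while (n >= sizeof(sketch_block16)) {
            sketch_block16 tmp;
            memcpy(&tmp, s, sizeof tmp);   /* alignment-safe 16-byte load  */
            memcpy(d, &tmp, sizeof tmp);   /* alignment-safe 16-byte store */
            s += sizeof tmp;
            d += sizeof tmp;
            n -= sizeof tmp;
        }
    }

    /* Small copy, or the tail left over by the block loop: scalar bytes. */
    while (n--)
        *d++ = *s++;

    return dst;
}
```

The point of the split is that short copies never pay the setup cost of
the wide path, while long copies spend almost all their time in it.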
> Optimizations for individual architectures should follow the powerpc-cpu
> precedent for providing these routines, e.g.
>
> sysdeps/powerpc/powerpc32/power6/memcpy.S
> sysdeps/powerpc/powerpc64/power6/memcpy.S
That's the idea I got, but so far I understood that only 64-bit
PowerPC/POWER CPUs are supported. What about 32-bit CPUs? libfreevec isn't
ported to 64-bit yet (though I will finish that soon). Would it be enough
to have one directory, e.g.:
sysdeps/powerpc/powerpc32/altivec/
or would I have to refer to specific CPU models, e.g. 74xx, and use Implies
for the rest?
> Today, if glibc is configure with --with-cpu=970 it will actually
> default to the power optimizations for the string routines, as indicated
> by the sysdeps/powerpc/powerpc[32|64]/970/Implies files. It'd be worth
> verifying that your baseline glibc runs are against existing optimized
> versions of glibc. If they're not then this is a fault of the distro
> you're testing on.
Well, I used Debian Lenny and openSUSE 11.0 (using glibc 2.7 and glibc 2.8
respectively). If it doesn't work as intended, these are two popular
distros with a broken glibc, which I don't think is very likely.
> I'm not aware of the status of some of the embedded PowerPC processors
> with-regard to powerpc-cpu optimizations.
Would the G4 and 8610 fall under the "embedded" PowerPC category?
> Our research found that for some tasks on some PowerPC processors the
> expense of reserving the floating point pipeline for vector operations
> exceeds the benefit of using vector insns for the task.
Well, I would advise *strongly* against that, except for specific cases,
not for OS-wide functions. For example, in a popular 3D application such as
Blender (or the Mesa 3D library), a lot of memory copying is done along
with lots of FPU math. If you use the FPU for plain memcpy-style work, you
essentially prevent the app from using it for the important stuff, i.e.
math, and in the end you lose performance. On the other hand, the AltiVec
unit remains unused all the time, and it's certainly more capable and more
generic than the FPU for most of this work, not to mention that inside the
same app the issue of context switching becomes unimportant.
> Generally our optimizations tend to favor data an average of 12 bytes
> with 1000 byte max. We also favor aligned data and use the existing
> implementation as a model as a baseline for where we try to keep
> unaligned data performance from dropping below.
Please check the graphs of most libfreevec functions for sizes of 12-1000
bytes. Apart from strlen(), which is the only glibc function that performs
better overall than its libfreevec counterpart, most other functions offer
the same performance for sizes up to 48/96 bytes, but beyond that
libfreevec's performance increases dramatically due to the use of the
vector unit.
> This research would be a good candidate for selectively replacing some
> of the existing libm functionality. Do these results hold for all
> permutations of long double support? Do they hold for x86/x86_64 as
> well as PowerPC? I would suggest against a massive patch to libc-alpha
> and would instead recommend selective, individual replacement of
> fundamental routines to start with accompanied by exhaustive profile
> data. You have to show that you're dedicated to maintenance of these
> routines and you can't overwhelm the reviewers with massive patches.
For the moment, my focus is on 32-bit floats only, but the algorithm is the
same even for 64-bit/128-bit floating point numbers. It will just use more
terms. And yes, as I said, it doesn't use AltiVec and is totally
cross-platform (just plain C), and the code is very short. I tested the
code on an Athlon X2 again and I get even better performance than on the
PowerPC CPUs. For some reason, glibc (and FreeBSD libc for that matter, as
I had a look around) uses very complex source trees for no good reason. The
implementation of a sinf(), for example, is no more than 20 C lines.
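To give a feel for what a sinf() of that size can look like, here is a
rough sketch in plain C. It is an illustration only and not the code
discussed above: this message does not spell out the algorithm, so the
sketch assumes a simple range reduction followed by a truncated Taylor
series, and the name sketch_sinf is hypothetical. A wider floating point
type would follow the same shape with more terms in the polynomial.

```c
/* Hypothetical sketch: range reduction plus a Taylor polynomial.
 * Intended for moderate |x|; very large arguments would overflow the
 * int conversion and lose precision in the reduction step. */
static float sketch_sinf(float x)
{
    const float pi     = 3.14159265358979f;
    const float two_pi = 6.28318530717959f;

    /* Fold x into [-pi, pi] by subtracting the nearest multiple of 2*pi. */
    int k = (int)(x / two_pi + (x >= 0.0f ? 0.5f : -0.5f));
    x -= (float)k * two_pi;

    /* Fold into [-pi/2, pi/2] using sin(pi - x) = sin(x). */
    if (x > pi / 2.0f)
        x = pi - x;
    else if (x < -pi / 2.0f)
        x = -pi - x;

    /* Taylor terms up to x^9, evaluated in Horner form. */
    float x2 = x * x;
    return x * (1.0f + x2 * (-1.0f / 6.0f
                   + x2 * ( 1.0f / 120.0f
                   + x2 * (-1.0f / 5040.0f
                   + x2 * ( 1.0f / 362880.0f)))));
}
```

A production routine would normally replace the Taylor coefficients with
a minimax fit and handle NaN, infinities and huge arguments, none of
which this sketch attempts.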
As for commitment, well, I've been working on this stuff since 2004 (with a
~2-year break because of other obligations: army, family, baby, etc. :),
and unless IBM/Freescale choose to dump AltiVec altogether, I don't see
myself stopping work on it. To tell you the truth, the promotion of the
vector unit by both companies has been a disappointment, in my eyes at
least, so I might just as well switch platforms... But that won't happen
yet anyway.
> Any submission to GLIBC is going to require that you and your code
> follow the GLIBC process or it'll probably be ignored. You can engage
> me directly via CC and I can help you understand how to integrate the
> code but I can't give you a free pass or do the work for you.
I never asked for that. However, first it's more important to me to show
that the code is worth including, and then, *if* it's proven worthy, we can
worry about stuff like copyright assignment, etc.
> The new libc-help mailing list was also created as a place for people to
> learn the process and get the patches in a state where they're ready to
> be submitted to libc-alpha.
I will take a look, thanks for that info.
Konstantinos Margaritis
Codex
http://www.codex.gr