Re: libfreevec benchmarks - Konstantinos Margaritis

From: Konstantinos Margaritis <markos@codex.gr>
To: rsa@us.ibm.com
Cc: linuxppc-dev@ozlabs.org
Subject: Re: libfreevec benchmarks
Date: Sun, 24 Aug 2008 11:03:56 +0300
Message-ID: <200808241103.58485.markos@codex.gr>
In-Reply-To: <1219427047.7917.38.camel@localhost>

On Friday 22 August 2008 20:44:07, Ryan S. Arnold wrote:
> Do you have FSF (Free Software Foundation) copyright assignment yet?

Copyright assignment is not the issue; if there were interest in the first
place, that would never have deterred me.

> How've you implemented the optimizations?

Scalar for small sizes, AltiVec for larger (>16 bytes, depending on the
routine).
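
As a rough illustration of that split (a minimal sketch, not libfreevec's
actual code): the name sketch_memcpy, the helper copy_block16 and the fixed
16-byte cutoff are all assumptions made for the example, and the 16-byte
block copy is written in plain C as a stand-in for an AltiVec load/store pair
so that it builds on any platform.

```c
#include <stddef.h>
#include <string.h>

/* Assumed cutoff; the real threshold depends on the routine. */
#define VEC_THRESHOLD 16

/* Hypothetical stand-in for one 16-byte vector load/store pair. */
static void copy_block16(unsigned char *dst, const unsigned char *src)
{
    memcpy(dst, src, 16);
}

/* Copies n bytes: 16-byte blocks for large sizes, bytes otherwise. */
void *sketch_memcpy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    if (n > VEC_THRESHOLD) {
        /* "Vector" path: move whole 16-byte blocks. */
        while (n >= 16) {
            copy_block16(d, s);
            d += 16;
            s += 16;
            n -= 16;
        }
    }

    /* Scalar path: small sizes, and the tail left by the block loop. */
    for (; n > 0; n--)
        *d++ = *s++;

    return dst;
}
```

A real AltiVec version would also have to deal with source and destination
alignment, which this sketch ignores.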

> Optimizations for individual architectures should follow the powerpc-cpu
> precedent for providing these routines, e.g.
>
> sysdeps/powerpc/powerpc32/power6/memcpy.S
> sysdeps/powerpc/powerpc64/power6/memcpy.S

That's the idea I got, but as far as I understood, only 64-bit PowerPC/POWER
CPUs are supported. What about 32-bit CPUs? libfreevec isn't ported to 64-bit
yet (though I will finish that soon). Would it be enough to have one
directory, e.g.:

sysdeps/powerpc/powerpc32/altivec/

or would I have to refer to specific CPU models, e.g. 74xx, and use Implies
for the rest?

> Today, if glibc is configure with --with-cpu=970 it will actually
> default to the power optimizations for the string routines, as indicated
> by the sysdeps/powerpc/powerpc[32|64]/970/Implies files.  It'd be worth
> verifying that your baseline glibc runs are against existing optimized
> versions of glibc.  If they're not then this is a fault of the distro
> you're testing on.

Well, I used Debian Lenny and OpenSuse 11.0 (using glibc 2.7 and glibc 2.8
respectively). If it doesn't work as intended, then two popular distros ship
a broken glibc, which I think is not very likely.

> I'm not aware of the status of some of the embedded PowerPC processors
> with-regard to powerpc-cpu optimizations.

Would the G4 and 8610 fall under the "embedded" PowerPC category?

> Our research found that for some tasks on some PowerPC processors the
> expense of reserving the floating point pipeline for vector operations
> exceeds the benefit of using vector insns for the task.

Well, I would advise *strongly* against that, except for specific cases, not
for OS-wide functions. For example, in a popular 3D application such as
Blender (or the Mesa 3D library), a lot of memory copying is done along with
lots of FPU math. If you use the FPU for plain memcpy/etc stuff, you
essentially forbid the app to use it for the important stuff, i.e. math, and
in the end you lose performance. On the other hand, the AltiVec unit remains
unused all the time, and it's certainly more capable and more generic than
the FPU for most of the stuff, not to mention that inside the same app, the
issue of context switching becomes unimportant.

> Generally our optimizations tend to favor data an average of 12 bytes
> with 1000 byte max.  We also favor aligned data and use the existing
> implementation as a model as a baseline for where we try to keep
> unaligned data performance from dropping below.

Please check the graphs of most libfreevec functions for the sizes
12-1000 bytes. Apart from strlen(), which is the only glibc function that
performs better overall than its libfreevec counterpart, most other
libfreevec functions offer the same performance as glibc for sizes up to
48/96 bytes, and beyond that their performance increases dramatically due to
the use of the vector unit.

> This research would be a good candidate for selectively replacing some
> of the existing libm functionality.  Do these results hold for all
> permutations of long double support?  Do they hold for x86/x86_64 as
> well as PowerPC?  I would suggest against a massive patch to libc-alpha
> and would instead recommend selective, individual replacement of
> fundamental routines to start with accompanied by exhaustive profile
> data.  You have to show that you're dedicated to maintenance of these
> routines and you can't overwhelm the reviewers with massive patches.

For the moment, my focus is on 32-bit floats only, but the algorithm is the
same even for 64-bit/128-bit floating point numbers. It will just use more
terms. And yes, as I said, it doesn't use AltiVec and is totally
cross-platform (just plain C) and very short code even. I tested the code on
an Athlon X2 again and I get even better performance than on the PowerPC
CPUs. For some reason, glibc (and FreeBSD libc for that matter, as I had a
look around) use very complex source trees for no good reason. The
implementation of a sinf() for example is no more than 20 C lines.
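
To give an idea of what such a short routine can look like, here is a minimal
sketch. It is not the code discussed in this thread: the name sketch_sinf is
made up for the example, and the choice of a Taylor polynomial after range
reduction is an assumption, since the algorithm is only described above as
using a number of terms. It ignores NaN, infinities and very large arguments,
and its accuracy (roughly 1e-5 absolute for moderate inputs) is well below
what a production libm provides.

```c
/* Assumed approach: reduce x to r in [-pi/2, pi/2], then evaluate the
 * Taylor series of sin(r) up to r^9 in Horner form. */
float sketch_sinf(float x)
{
    const float pi = 3.14159265f;

    /* k = nearest integer to x/pi, so that r = x - k*pi is small. */
    long k = (long)(x / pi + (x >= 0.0f ? 0.5f : -0.5f));
    float r = x - (float)k * pi;
    float r2 = r * r;

    /* r - r^3/3! + r^5/5! - r^7/7! + r^9/9! */
    float p = r * (1.0f + r2 * (-1.0f / 6 + r2 * (1.0f / 120
              + r2 * (-1.0f / 5040 + r2 * (1.0f / 362880)))));

    /* sin(x) = (-1)^k * sin(r) */
    return (k & 1) ? -p : p;
}
```

A double or long double variant would follow the same structure with more
polynomial terms and a more careful range reduction.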

As for commitment, well, I've been working on that stuff since 2004 (with a
~2y break because of other obligations, army, family, baby, etc :), and unless
IBM/Freescale choose to dump AltiVec altogether, I don't see myself stopping
working on it. To tell you the truth, the promotion of the vector unit by
both companies has been a disappointment in my eyes at least, so I might just
as well switch platform... But that won't happen yet anyway.

> Any submission to GLIBC is going to require that you and your code
> follow the GLIBC process or it'll probably be ignored.  You can engage
> me directly via CC and I can help you understand how to integrate the
> code but I can't give you a free pass or do the work for you.

I never asked that. However, it's more important to me to show first that the
code is worth including, and then, *if* it's proven worthy, we can worry
about stuff like copyright assignment, etc.

> The new libc-help mailing list was also created as a place for people to
> learn the process and get the patches in a state where they're ready to
> be submitted to libc-alpha.

I will take a look, thanks for that info.

Konstantinos Margaritis
Codex
http://www.codex.gr
