From: Tom Musta <tommusta@gmail.com>
To: Richard Henderson <rth@twiddle.net>, qemu-devel@nongnu.org
Cc: qemu-ppc@nongnu.org
Subject: Re: [Qemu-devel] [PATCH 12/14] VSX Stage 4: Add Scalar SP Fused Multiply-Adds
Date: Wed, 13 Nov 2013 14:49:02 -0600
Message-ID: <5283E5BE.1080809@gmail.com>
In-Reply-To: <527C2C92.7060006@twiddle.net>
On 11/7/2013 6:13 PM, Richard Henderson wrote:
> On 11/08/2013 09:30 AM, Richard Henderson wrote:
>> On 11/08/2013 09:28 AM, Richard Henderson wrote:
>>> On 11/07/2013 06:31 AM, Tom Musta wrote:
>>>> } \
>>>> + \
>>>> + if (r2sp) { \
>>>> + float32 tmp32 = float64_to_float32(xt_out.fld[i], \
>>>> + &env->fp_status); \
>>>> + xt_out.fld[i] = float32_to_float64(tmp32, &env->fp_status); \
>>>> + } \
>>>> + \
>>>
>>> You can't get correct results for a single-precision fma from a
>>> double-precision fma and merely rounding the results.
>>>
>>> See e.g. glibc's sysdeps/ieee754/dbl-64/s_fmaf.c.
>>
>> Blah, nevermind. That would be using separate add+mul in double-precision, not
>> using a double-precision fma primitive.
>
> Hmph. I was right the first time. See
>
>> http://www.exploringbinary.com/double-rounding-errors-in-floating-point-conversions/
>
> for example inputs that suffer from double-rounding.
>
> What's needed in each of the examples are infinite precision values containing
> 55 bits. This is easy to accomplish with fma.
>
> Two 23-bit inputs can create a product with 46 significant bits. One can
> append 23 more significant bits by choosing an exponent for the addend that
> does not overlap the product. Thus one can create (almost) every intermediate
> result with up to 69 consecutive bits (the exception being products without
> factors that can be represented in 23-bits).
>
> I'm too lazy to decompose the examples therein to actual single-precision
> inputs, but I'm certain it can be done.
>
> Thus you *do* need the round-to-zero plus inexact solution that glibc uses.
> (Or to perform the fma in 128-bits and round once, but I think that would be
> way more intrusive wrt the rest of the code, and more expensive than necessary.)

I have reviewed the code and the spec, and I cannot see a flaw. The sequence is
effectively this:

  - float64_muladd - performs a proper FMA on 64-bit numbers
  - float64_to_float32 - converts to single precision, including proper rounding
  - float32_to_float64 - converts the rounded result back to double precision

The implementation of float64_muladd would seem to provide enough mantissa bits
for proper handling of the case you describe. The only rounding occurs in the
second step.
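
One way to probe candidate inputs for the case under discussion is to model
the two paths with exact integer arithmetic. The sketch below is only a model,
not QEMU's softfloat code: round_sig_bits, round_once and round_twice are
hypothetical helper names, the exact a*b + c value is passed in as a scaled
64-bit integer, and exponent range, subnormals and exception flags are ignored.
It assumes the intermediate is rounded to 53 bits before the conversion to
24 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Number of bits needed to represent v (0 for v == 0). */
static int bit_width(uint64_t v)
{
    int n = 0;
    while (v) {
        n++;
        v >>= 1;
    }
    return n;
}

/* Round v to `bits` significant bits: round-to-nearest, ties-to-even. */
static uint64_t round_sig_bits(uint64_t v, int bits)
{
    assert(v != 0 && bits >= 1);

    int drop = bit_width(v) - bits;      /* low-order bits to discard */
    if (drop <= 0) {
        return v;                        /* already fits, nothing to round */
    }

    uint64_t half = 1ULL << (drop - 1);  /* exactly half an ulp */
    uint64_t rem  = v & ((1ULL << drop) - 1);
    uint64_t q    = v >> drop;

    /* Round up when above the midpoint, or on a tie when q is odd. */
    if (rem > half || (rem == half && (q & 1))) {
        q++;
    }
    return q << drop;
}

/* Model of a single rounding straight to a 24-bit significand. */
static uint64_t round_once(uint64_t exact)
{
    return round_sig_bits(exact, 24);
}

/* Model of rounding to 53 bits first, then to 24 bits. */
static uint64_t round_twice(uint64_t exact)
{
    return round_sig_bits(round_sig_bits(exact, 53), 24);
}
```

In this model the exact value 1 + 2^-11 + 2^-24 + 2^-60 (scaled by 2^60), which
is A*A + C for A = 1 + 2^-12 and C = 2^-60, comes out differently on the two
paths, so inputs of that shape seem worth including in a targeted test.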

I have also done quite a bit of random and targeted random testing, using Power
hardware to produce the expected results. The targeted random tests followed your
suggestion above: generate A*B + C where abs(exp(A) - exp(B)) = 23 and
abs(exp(A) - exp(C)) = 46. Several million test patterns have been generated
and played back through QEMU without any miscompares in the numerical results.
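
For illustration, a generator along the lines described above might look like
the following sketch. It is not the actual test harness: struct fma_case,
gen_case, make_f32 and the xorshift PRNG are hypothetical, exp() is taken to
mean the biased binary32 exponent field, and exponents are restricted so that
all three operands stay normal.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Build a float from raw IEEE 754 binary32 fields. */
static float make_f32(uint32_t sign, uint32_t biased_exp, uint32_t frac)
{
    uint32_t bits = ((sign & 1u) << 31) |
                    ((biased_exp & 0xffu) << 23) |
                    (frac & 0x7fffffu);
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Extract the biased exponent field of a float. */
static uint32_t biased_exp_f32(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return (bits >> 23) & 0xffu;
}

/* Small xorshift32 PRNG; the state must be seeded with a nonzero value. */
static uint32_t next_rand(uint32_t *state)
{
    assert(*state != 0);
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

struct fma_case {
    float a, b, c;
};

/* Generate A, B, C with |exp(A) - exp(B)| = 23 and |exp(A) - exp(C)| = 46. */
static struct fma_case gen_case(uint32_t *rng)
{
    /* Keep exp(A) in [47, 208] so exp(A) +/- 46 stays within [1, 254]. */
    uint32_t ea = 47 + next_rand(rng) % 162;
    uint32_t eb = (next_rand(rng) & 1) ? ea + 23 : ea - 23;
    uint32_t ec = (next_rand(rng) & 1) ? ea + 46 : ea - 46;

    struct fma_case t;
    t.a = make_f32(next_rand(rng) & 1, ea, next_rand(rng));
    t.b = make_f32(next_rand(rng) & 1, eb, next_rand(rng));
    t.c = make_f32(next_rand(rng) & 1, ec, next_rand(rng));
    return t;
}
```

Each generated triple would then be run on the reference hardware and through
the emulated instruction, and the two results compared bit for bit.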