From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:41576)
	by lists.gnu.org with esmtp (Exim 4.71) (envelope-from )
	id 1Vo0GA-0000iI-Vn for qemu-devel@nongnu.org;
	Tue, 03 Dec 2013 19:24:17 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from ) id 1Vo0G4-00040D-Lg for qemu-devel@nongnu.org;
	Tue, 03 Dec 2013 19:24:10 -0500
Sender: Richard Henderson
Message-ID: <529E75FA.1000501@twiddle.net>
Date: Wed, 04 Dec 2013 13:23:22 +1300
From: Richard Henderson
MIME-Version: 1.0
References: <1386086305-8163-1-git-send-email-tommusta@gmail.com>
	<1386086305-8163-13-git-send-email-tommusta@gmail.com>
In-Reply-To: <1386086305-8163-13-git-send-email-tommusta@gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [V3 PATCH 12/14] target-ppc: VSX Stage 4: Add Scalar SP Fused Multiply-Adds
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
To: Tom Musta , qemu-devel@nongnu.org
Cc: qemu-ppc@nongnu.org

On 12/04/2013 04:58 AM, Tom Musta wrote:
> This patch adds the Single Precision VSX Scalar Fused Multiply-Add
> instructions: xsmaddasp, xsmaddmsp, xsmsubasp, xsmsubmsp, xsnmaddasp,
> xsnmaddmsp, xsnmsubasp, xsnmsubmsp.
>
> The existing VSX_MADD() macro is modified to support rounding of the
> intermediate double-precision result to single precision.
>
> V2: Re-implemented per feedback from Richard Henderson.  In order to
> avoid double rounding and incorrect results, the operands must be
> converted to true single-precision values and the single-precision
> fused multiply/add routine must be used.
>
> V3: Re-implemented per feedback from Richard Henderson.  The
> implementation now uses a round-to-odd algorithm to address subtle
> double rounding errors.
>
> Signed-off-by: Tom Musta
> ---
>  target-ppc/fpu_helper.c |   84 ++++++++++++++++++++++++++++++----------------
>  target-ppc/helper.h     |    8 ++++
>  target-ppc/translate.c  |   16 +++++++++
>  3 files changed, 79 insertions(+), 29 deletions(-)
>
> diff --git a/target-ppc/fpu_helper.c b/target-ppc/fpu_helper.c
> index 8825db2..077d057 100644
> --- a/target-ppc/fpu_helper.c
> +++ b/target-ppc/fpu_helper.c
> @@ -2192,7 +2192,7 @@ VSX_TSQRT(xvtsqrtsp, 4, float32, f32, -126, 23)
>   *   afrm  - A form (1=A, 0=M)
>   *   sfprf - set FPRF
>   */
> -#define VSX_MADD(op, nels, tp, fld, maddflgs, afrm, sfprf)                    \
> +#define VSX_MADD(op, nels, tp, fld, maddflgs, afrm, sfprf, r2sp)              \
>  void helper_##op(CPUPPCState *env, uint32_t opcode)                           \
>  {                                                                             \
>      ppc_vsr_t xt_in, xa, xb, xt_out;                                          \
> @@ -2218,8 +2218,18 @@ void helper_##op(CPUPPCState *env, uint32_t opcode)                           \
>      for (i = 0; i < nels; i++) {                                              \
>          float_status tstat = env->fp_status;                                  \
>          set_float_exception_flags(0, &tstat);                                 \
> -        xt_out.fld[i] = tp##_muladd(xa.fld[i], b->fld[i], c->fld[i],          \
> -                                    maddflgs, &tstat);                        \
> +        if (r2sp && (tstat.float_rounding_mode == float_round_nearest_even)) {\
> +            /* Avoid double rounding errors by rounding the intermediate */   \
> +            /* result to odd. */                                              \
> +            set_float_rounding_mode(float_round_to_zero, &tstat);             \
> +            xt_out.fld[i] = tp##_muladd(xa.fld[i], b->fld[i], c->fld[i],      \
> +                                        maddflgs, &tstat);                    \
> +            xt_out.fld[i] |= (get_float_exception_flags(&tstat) &             \
> +                              float_flag_inexact) != 0;                       \
> +        } else {                                                              \
> +            xt_out.fld[i] = tp##_muladd(xa.fld[i], b->fld[i], c->fld[i],      \
> +                                        maddflgs, &tstat);                    \
> +        }                                                                     \
>          env->fp_status.float_exception_flags |= tstat.float_exception_flags;  \
>                                                                                \
>          if (unlikely(tstat.float_exception_flags & float_flag_invalid)) {     \
> @@ -2242,6 +2252,13 @@ void helper_##op(CPUPPCState *env, uint32_t opcode)                           \
>                  fload_invalid_op_excp(env, POWERPC_EXCP_FP_VXISI, sfprf);     \
>              }                                                                 \
>          }                                                                     \
> +                                                                              \
> +        if (r2sp) {                                                           \
> +            float32 tmp32 = float64_to_float32(xt_out.fld[i],                 \
> +                                               &env->fp_status);              \
> +            xt_out.fld[i] = float32_to_float64(tmp32, &env->fp_status);       \
> +        }                                                                     \
> +                                                                              \

helper_frsp

Otherwise,

Reviewed-by: Richard Henderson

r~