[PATCH v3 08/10] target/arm: Fix f16_dotadd vs nan selection

From: Richard Henderson <richard.henderson@linaro.org>
To: qemu-devel@nongnu.org
Cc: qemu-arm@nongnu.org, peter.maydell@linaro.org, qemu-stable@nongnu.org
Subject: [PATCH v3 08/10] target/arm: Fix f16_dotadd vs nan selection
Date: Wed,  2 Jul 2025 06:22:11 -0600
Message-ID: <20250702122213.758588-9-richard.henderson@linaro.org>
In-Reply-To: <20250702122213.758588-1-richard.henderson@linaro.org>

Implement FPProcessNaNs4 within f16_dotadd, rather than
simply letting NaNs propagate through the function.

Cc: qemu-stable@nongnu.org
Fixes: 3916841ac75 ("target/arm: Implement FMOPA, FMOPS (widening)")
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/tcg/sme_helper.c | 62 +++++++++++++++++++++++++++----------
 1 file changed, 46 insertions(+), 16 deletions(-)

diff --git a/target/arm/tcg/sme_helper.c b/target/arm/tcg/sme_helper.c
index de0c6e54d4..8f33387e4b 100644
--- a/target/arm/tcg/sme_helper.c
+++ b/target/arm/tcg/sme_helper.c
@@ -1005,25 +1005,55 @@ static float32 f16_dotadd(float32 sum, uint32_t e1, uint32_t e2,
      *  - we have pre-set-up copy of s_std which is set to round-to-odd,
      *    for the multiply (see below)
      */
-    float64 e1r = float16_to_float64(e1 & 0xffff, true, s_f16);
-    float64 e1c = float16_to_float64(e1 >> 16, true, s_f16);
-    float64 e2r = float16_to_float64(e2 & 0xffff, true, s_f16);
-    float64 e2c = float16_to_float64(e2 >> 16, true, s_f16);
-    float64 t64;
+    float16 h1r = e1 & 0xffff;
+    float16 h1c = e1 >> 16;
+    float16 h2r = e2 & 0xffff;
+    float16 h2c = e2 >> 16;
     float32 t32;
 
-    /*
-     * The ARM pseudocode function FPDot performs both multiplies
-     * and the add with a single rounding operation.  Emulate this
-     * by performing the first multiply in round-to-odd, then doing
-     * the second multiply as fused multiply-add, and rounding to
-     * float32 all in one step.
-     */
-    t64 = float64_mul(e1r, e2r, s_odd);
-    t64 = float64r32_muladd(e1c, e2c, t64, 0, s_std);
+    /* C.f. FPProcessNaNs4 */
+    if (float16_is_any_nan(h1r) || float16_is_any_nan(h1c) ||
+        float16_is_any_nan(h2r) || float16_is_any_nan(h2c)) {
+        float16 t16;
 
-    /* This conversion is exact, because we've already rounded. */
-    t32 = float64_to_float32(t64, s_std);
+        if (float16_is_signaling_nan(h1r, s_f16)) {
+            t16 = h1r;
+        } else if (float16_is_signaling_nan(h1c, s_f16)) {
+            t16 = h1c;
+        } else if (float16_is_signaling_nan(h2r, s_f16)) {
+            t16 = h2r;
+        } else if (float16_is_signaling_nan(h2c, s_f16)) {
+            t16 = h2c;
+        } else if (float16_is_any_nan(h1r)) {
+            t16 = h1r;
+        } else if (float16_is_any_nan(h1c)) {
+            t16 = h1c;
+        } else if (float16_is_any_nan(h2r)) {
+            t16 = h2r;
+        } else {
+            t16 = h2c;
+        }
+        t32 = float16_to_float32(t16, true, s_f16);
+    } else {
+        float64 e1r = float16_to_float64(h1r, true, s_f16);
+        float64 e1c = float16_to_float64(h1c, true, s_f16);
+        float64 e2r = float16_to_float64(h2r, true, s_f16);
+        float64 e2c = float16_to_float64(h2c, true, s_f16);
+        float64 t64;
+
+        /*
+         * The ARM pseudocode function FPDot performs both multiplies
+         * and the add with a single rounding operation.  Emulate this
+         * by performing the first multiply in round-to-odd, then doing
+         * the second multiply as fused multiply-add, and rounding to
+         * float32 all in one step.
+         */
+        t64 = float64_mul(e1r, e2r, s_odd);
+        t64 = float64r32_muladd(e1c, e2c, t64, 0, s_std);
+
+        /* This conversion is exact, because we've already rounded. */
+        t32 = float64_to_float32(t64, s_std);
+    }
 
     /* The final accumulation step is not fused. */
     return float32_add(sum, t32, s_std);
-- 
2.43.0

Thread overview: 21+ messages
2025-07-02 12:22 [PATCH v3 00/10] target/arm: SME1/SVE2 fixes Richard Henderson
2025-07-02 12:22 ` [PATCH v3 01/10] target/arm: Fix SME vs AdvSIMD exception priority Richard Henderson
2025-07-02 12:22 ` [PATCH v3 02/10] target/arm: Fix sve_access_check for SME Richard Henderson
2025-07-02 12:22 ` [PATCH v3 03/10] target/arm: Fix 128-bit element ZIP, UZP, TRN Richard Henderson
2025-07-03  9:08   ` Peter Maydell
2025-07-02 12:22 ` [PATCH v3 04/10] target/arm: Replace @rda_rn_rm_e0 in sve.decode Richard Henderson
2025-07-02 14:11   ` Richard Henderson
2025-07-03  9:14   ` Peter Maydell
2025-07-02 12:22 ` [PATCH v3 05/10] target/arm: Fix FMMLA (64-bit element) for 128-bit VL Richard Henderson
2025-07-03  9:08   ` Peter Maydell
2025-07-02 12:22 ` [PATCH v3 06/10] target/arm: Disable FEAT_F64MM if maximum SVE vector size too small Richard Henderson
2025-07-02 18:15   ` Richard Henderson
2025-07-03  9:10   ` Peter Maydell
2025-07-02 12:22 ` [PATCH v3 07/10] target/arm: Fix PSEL size operands to tcg_gen_gvec_ands Richard Henderson
2025-07-03  9:09   ` Peter Maydell
2025-07-02 12:22 ` Richard Henderson [this message]
2025-07-03  9:12   ` [PATCH v3 08/10] target/arm: Fix f16_dotadd vs nan selection Peter Maydell
2025-07-02 12:22 ` [PATCH v3 09/10] target/arm: Fix bfdotadd_ebf " Richard Henderson
2025-07-03  9:13   ` Peter Maydell
2025-07-02 12:22 ` [PATCH v3 10/10] target/arm: Remove CPUARMState.vfp.scratch Richard Henderson
2025-07-02 13:47   ` Alex Bennée