* [PATCH 00/76] target/arm: Implement FEAT_AFP and FEAT_RPRES
@ 2025-01-24 16:27 Peter Maydell
  2025-01-24 16:27 ` [PATCH 01/76] target/i386: Do not raise Invalid for 0 * Inf + QNaN Peter Maydell
                   ` (77 more replies)
  0 siblings, 78 replies; 167+ messages in thread
From: Peter Maydell @ 2025-01-24 16:27 UTC (permalink / raw)
  To: qemu-arm, qemu-devel

This patchset implements emulation of the Arm FEAT_AFP and FEAT_RPRES
extensions, which are floating-point related. It's based on the
small i386 bugfix series I sent out a while back:

Based-on: 20250116112536.4117889-1-peter.maydell@linaro.org
("target/i386: Fix 0 * Inf + QNaN regression")

(It would also have been based on an initial refactoring series
I sent out on Monday, but AFAICT the list just ate those emails
and they never arrived anywhere :-(  So you get a bigger series
here than I'd hoped.)

If you'd rather have these patches as a git branch:
 https://git.linaro.org/people/pmaydell/qemu-arm.git  feat-afp
with human readable web view at:
 https://git.linaro.org/people/peter.maydell/qemu-arm.git/log/?h=feat-afp


FEAT_AFP defines three new control bits in the FPCR, whose
operations are basically independent of each other:
 * FPCR.AH: "alternate floating point mode"; this changes floating
   point behaviour in a variety of ways, including:
    - the sign of a default NaN is 1, not 0
    - if FPCR.FZ is also 1, results that are denormal after rounding
      with an unbounded exponent are flushed to zero
    - FPCR.FZ does not cause denormalized inputs to be flushed to zero
    - miscellaneous other corner-case behaviour changes
 * FPCR.FIZ: flush denormalized numbers to zero on input for
   most instructions
 * FPCR.NEP: makes scalar SIMD operations merge the result with
   higher vector elements in one of the source registers, instead
   of zeroing the higher elements of the destination
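
As a rough illustration of how these control bits interact (this is not
code from the series): FIZ, AH and NEP sit at FPCR bits 0, 1 and 2, and
FZ at bit 24. The EX_ macro names and the ex_ helper functions below are
made up for this sketch, and it only models the input-flushing and
default-NaN rules from the list above for single precision, ignoring the
"most instructions" exceptions.

```c
#include <stdbool.h>
#include <stdint.h>

/* FPCR bit positions; the EX_ names are illustrative only. */
#define EX_FPCR_FIZ (1u << 0)   /* flush denormal inputs to zero */
#define EX_FPCR_AH  (1u << 1)   /* alternate floating point mode */
#define EX_FPCR_NEP (1u << 2)   /* scalar ops merge higher elements */
#define EX_FPCR_FZ  (1u << 24)  /* flush-to-zero mode */

/*
 * Are denormal inputs flushed to zero?
 * FIZ always requests it; FZ only does so when AH is 0.
 */
static bool ex_flush_inputs(uint32_t fpcr)
{
    if (fpcr & EX_FPCR_FIZ) {
        return true;
    }
    return (fpcr & EX_FPCR_FZ) && !(fpcr & EX_FPCR_AH);
}

/* Single-precision default NaN: AH = 1 sets the sign bit. */
static uint32_t ex_default_nan_f32(uint32_t fpcr)
{
    return (fpcr & EX_FPCR_AH) ? 0xffc00000u : 0x7fc00000u;
}
```

For example, ex_flush_inputs(EX_FPCR_FZ | EX_FPCR_AH) is false, because
with AH set only FIZ controls flushing of inputs.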

FEAT_RPRES makes single-precision FRECPE and FRSQRTE use a 12-bit
mantissa precision instead of 8-bit when FPCR.AH is set.

Because FPCR.AH implies quite a lot of changes to corner cases
of floating point handling, the resulting patchseries is regrettably
quite big.

Structure of the patchseries:
 * patch 1 fixes a silly bug in arm_reset_sve_state() which only
   has a major bad effect once FEAT_AFP is implemented
 * patches 2-16 are a refactoring which splits the existing
   fp_status and fp_status_f16 so that each has separate a32 and
   a64 versions. We need this because the FEAT_AFP bits only have
   an effect for A64 insns, not A32 insns
 * patches 17-22 add some more functionality to softfloat that we
   need for FEAT_AFP:
    - an exception flag float_flag_input_denormal_used is set when
      an input to an fp op is denormal, is not squashed to zero,
      and is actually consumed (i.e. not an invalid operation or
      an operation where the other input was a NaN)
    - a control setting float_detect_ftz which lets the target
      control whether flush-to-zero of outputs should be done
      before or after rounding
   (Both these are needed for correct x86 FP emulation, incidentally.)
 * patches 23-28 define the FPCR bits and implement the parts of the
   functionality which can be handled by setting softfloat control
   knobs and adjusting how we handle softfloat exception flags.
   (This includes all of the FPCR.FIZ behaviour.)
 * patches 29-33 implement FPCR.AH handling of a small group of
   insns (FRECPE, FRECPS, FRECPX, FRSQRTE, FRSQRTS, BFCVT*, BFMLAL*,
   BFMLSL*) which must:
    - never update FPSR exception flags
    - always round-to-nearest-even
    - always flush single and double denormal inputs and outputs to zero
   We implement this via some new float_status fields that we use for
   this group of insns.
 * patches 34-42 implement the FPCR.NEP "merge high vector elements of
   a source register with the result of a scalar operation" behaviour
 * patches 43-49 implement FPCR.AH semantics for FMIN and FMAX:
    - comparing two zeroes (even of different sign) or comparing a NaN
      with anything always returns the second argument (possibly
      squashed to zero)
    - denormal outputs are not squashed to zero regardless of FZ or FZ16
 * patches 50-65 implement FPCR.AH semantics for abs and neg of floating
   point values: they must not change the sign bit of a NaN. This applies
   not just to the ABS and NEG insns but to any other insn whose
   pseudocode has it doing an FPAbs() or FPNeg() operation (e.g.
   FMLS, FRECPS, FTSSEL).
 * at this point patch 66 can enable FEAT_AFP for -cpu max
 * patches 67-70 implement FEAT_RPRES
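
To make the abs/neg rule above concrete, here is a minimal sketch on raw
single-precision bit patterns. It is not taken from the patches: the
ex_ function names and the 'ah' parameter (standing in for FPCR.AH) are
assumptions for illustration only.

```c
#include <stdbool.h>
#include <stdint.h>

/* A float32 is a NaN if the exponent is all-ones and the fraction is non-zero. */
static bool ex_f32_is_nan(uint32_t bits)
{
    return (bits & 0x7fffffffu) > 0x7f800000u;
}

/* Negate: with AH set, a NaN input is returned with its sign untouched. */
static uint32_t ex_f32_neg(uint32_t bits, bool ah)
{
    if (ah && ex_f32_is_nan(bits)) {
        return bits;
    }
    return bits ^ 0x80000000u;
}

/* Absolute value: with AH set, a NaN input keeps its sign bit. */
static uint32_t ex_f32_abs(uint32_t bits, bool ah)
{
    if (ah && ex_f32_is_nan(bits)) {
        return bits;
    }
    return bits & 0x7fffffffu;
}
```

Infinities and ordinary numbers are unaffected by the AH check; only
NaN inputs take the early return.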

I also have some patches which make target/i386 use the "detect
flush to zero after rounding" and "report when input denormal is
consumed" softfloat features added here; I don't include them in
this patchset (though you can find them in that git branch I
mentioned earlier) because I haven't done as much testing on the
i386 side and in any case this patchset is already pretty long.
I expect I'll send them out when this series has been merged.


thanks
-- PMM


Peter Maydell (76):
  target/i386: Do not raise Invalid for 0 * Inf + QNaN
  tests/tcg/x86_64/fma: Test some x86 fused-multiply-add cases
  target/arm: arm_reset_sve_state() should set FPSR, not FPCR
  target/arm: Use FPSR_ constants in vfp_exceptbits_from_host()
  target/arm: Use uint32_t in vfp_exceptbits_from_host()
  target/arm: Define new fp_status_a32 and fp_status_a64
  target/arm: Use vfp.fp_status_a64 in A64-only helper functions
  target/arm: Use fp_status_a32 in vjcvt helper
  target/arm: Use fp_status_a32 in vfp_cmp helpers
  target/arm: Use FPST_FPCR_A32 in A32 decoder
  target/arm: Use FPST_FPCR_A64 in A64 decoder
  target/arm: Remove now-unused vfp.fp_status and FPST_FPCR
  target/arm: Define new fp_status_f16_a32 and fp_status_f16_a64
  target/arm: Use fp_status_f16_a32 in AArch32-only helpers
  target/arm: Use fp_status_f16_a64 in AArch64-only helpers
  target/arm: Use FPST_FPCR_F16_A32 in A32 decoder
  target/arm: Use FPST_FPCR_F16_A64 in A64 decoder
  target/arm: Remove now-unused vfp.fp_status_f16 and FPST_FPCR_F16
  fpu: Rename float_flag_input_denormal to
    float_flag_input_denormal_flushed
  fpu: Rename float_flag_output_denormal to
    float_flag_output_denormal_flushed
  fpu: Fix a comment in softfloat-types.h
  fpu: Add float_class_denormal
  fpu: Implement float_flag_input_denormal_used
  fpu: allow flushing of output denormals to be after rounding
  target/arm: Remove redundant advsimd float16 helpers
  target/arm: Use FPST_FPCR_F16_A64 for halfprec-to-other conversions
  target/arm: Define FPCR AH, FIZ, NEP bits
  target/arm: Implement FPCR.FIZ handling
  target/arm: Adjust FP behaviour for FPCR.AH = 1
  target/arm: Adjust exception flag handling for AH = 1
  target/arm: Add FPCR.AH to tbflags
  target/arm: Set up float_status to use for FPCR.AH=1 behaviour
  target/arm: Use FPST_FPCR_AH for FRECPE, FRECPS, FRECPX, FRSQRTE,
    FRSQRTS
  target/arm: Use FPST_FPCR_AH for BFCVT* insns
  target/arm: Use FPST_FPCR_AH for BFMLAL*, BFMLSL* insns
  target/arm: Add FPCR.NEP to TBFLAGS
  target/arm: Define and use new write_fp_*reg_merging() functions
  target/arm: Handle FPCR.NEP for 3-input scalar operations
  target/arm: Handle FPCR.NEP for BFCVT scalar
  target/arm: Handle FPCR.NEP for 1-input scalar operations
  target/arm: Handle FPCR.NEP in do_cvtf_scalar()
  target/arm: Handle FPCR.NEP for scalar FABS and FNEG
  target/arm: Handle FPCR.NEP for FCVTXN (scalar)
  target/arm: Handle FPCR.NEP for FMUL, FMULX scalar by element
  target/arm: Implement FPCR.AH semantics for scalar FMIN/FMAX
  target/arm: Implement FPCR.AH semantics for vector FMIN/FMAX
  target/arm: Implement FPCR.AH semantics for FMAXV and FMINV
  target/arm: Implement FPCR.AH semantics for FMINP and FMAXP
  target/arm: Implement FPCR.AH semantics for SVE FMAXV and FMINV
  target/arm: Implement FPCR.AH semantics for SVE FMIN/FMAX immediate
  target/arm: Implement FPCR.AH semantics for SVE FMIN/FMAX vector
  target/arm: Implement FPCR.AH handling of negation of NaN
  target/arm: Implement FPCR.AH handling for scalar FABS and FABD
  target/arm: Handle FPCR.AH in vector FABD
  target/arm: Handle FPCR.AH in SVE FNEG
  target/arm: Handle FPCR.AH in SVE FABS
  target/arm: Handle FPCR.AH in SVE FABD
  target/arm: Handle FPCR.AH in negation steps in FCADD
  target/arm: Handle FPCR.AH in negation steps in SVE FCADD
  target/arm: Handle FPCR.AH in FMLSL
  target/arm: Handle FPCR.AH in FRECPS and FRSQRTS scalar insns
  target/arm: Handle FPCR.AH in FRECPS and FRSQRTS vector insns
  target/arm: Handle FPCR.AH in negation step in FMLS (indexed)
  target/arm: Handle FPCR.AH in negation in FMLS (vector)
  target/arm: Handle FPCR.AH in negation step in SVE FMLS (vector)
  target/arm: Handle FPCR.AH in SVE FTSSEL
  target/arm: Handle FPCR.AH in SVE FTMAD
  target/arm: Enable FEAT_AFP for '-cpu max'
  target/arm: Plumb FEAT_RPRES frecpe and frsqrte through to new helper
  target/arm: Implement increased precision FRECPE
  target/arm: Implement increased precision FRSQRTE
  target/arm: Enable FEAT_RPRES for -cpu max
  target/i386: Detect flush-to-zero after rounding
  target/i386: Use correct type for get_float_exception_flags() values
  target/i386: Wire up MXCSR.DE and FPUS.DE correctly
  tests/tcg/x86_64/fma: add test for exact-denormal output

 docs/system/arm/emulation.rst    |   2 +
 include/fpu/softfloat-helpers.h  |  11 +
 include/fpu/softfloat-types.h    |  51 +-
 target/arm/cpu-features.h        |  10 +
 target/arm/cpu.h                 |  32 +-
 target/arm/helper.h              |  12 +
 target/arm/internals.h           |   6 +
 target/arm/tcg/helper-a64.h      |  21 +-
 target/arm/tcg/helper-sve.h      | 120 +++++
 target/arm/tcg/translate.h       |  63 ++-
 target/i386/ops_sse.h            |  16 +-
 target/mips/fpu_helper.h         |   6 +
 fpu/softfloat.c                  |  71 ++-
 target/alpha/cpu.c               |   7 +
 target/arm/cpu.c                 |  32 +-
 target/arm/helper.c              |   4 +-
 target/arm/tcg/cpu64.c           |   2 +
 target/arm/tcg/helper-a64.c      | 173 ++++---
 target/arm/tcg/hflags.c          |  13 +
 target/arm/tcg/sme_helper.c      |   6 +-
 target/arm/tcg/sve_helper.c      | 301 ++++++++---
 target/arm/tcg/translate-a64.c   | 850 ++++++++++++++++++++++++-------
 target/arm/tcg/translate-sme.c   |   4 +-
 target/arm/tcg/translate-sve.c   | 280 ++++++----
 target/arm/tcg/translate-vfp.c   |  78 +--
 target/arm/tcg/vec_helper.c      | 174 ++++++-
 target/arm/vfp_helper.c          | 369 +++++++++++---
 target/hppa/fpu_helper.c         |  11 +
 target/i386/tcg/fpu_helper.c     | 110 ++--
 target/m68k/fpu_helper.c         |   2 +-
 target/mips/msa.c                |   9 +
 target/mips/tcg/msa_helper.c     |   4 +-
 target/ppc/cpu_init.c            |   3 +
 target/rx/cpu.c                  |   8 +
 target/rx/op_helper.c            |   4 +-
 target/sh4/cpu.c                 |   8 +
 target/tricore/fpu_helper.c      |   6 +-
 target/tricore/helper.c          |   1 +
 tests/fp/fp-bench.c              |   1 +
 tests/tcg/x86_64/fma.c           | 116 +++++
 fpu/softfloat-parts.c.inc        | 136 ++++-
 tests/tcg/x86_64/Makefile.target |   1 +
 42 files changed, 2443 insertions(+), 691 deletions(-)
 create mode 100644 tests/tcg/x86_64/fma.c

-- 
2.34.1




* [PATCH 01/76] target/i386: Do not raise Invalid for 0 * Inf + QNaN
  2025-01-24 16:27 [PATCH 00/76] target/arm: Implement FEAT_AFP and FEAT_RPRES Peter Maydell
@ 2025-01-24 16:27 ` Peter Maydell
  2025-01-24 16:27 ` [PATCH 02/76] tests/tcg/x86_64/fma: Test some x86 fused-multiply-add cases Peter Maydell
                   ` (76 subsequent siblings)
  77 siblings, 0 replies; 167+ messages in thread
From: Peter Maydell @ 2025-01-24 16:27 UTC (permalink / raw)
  To: qemu-arm, qemu-devel

In commit 8adcff4ae7 ("fpu: handle raising Invalid for infzero in
pick_nan_muladd") we changed the handling of 0 * Inf + QNaN to always
raise the Invalid exception regardless of target architecture.  (This
was a change affecting hppa, i386, sh4 and tricore.) However, this
was incorrect for i386, which documents in the SDM section 14.5.2
that for the 0 * Inf + NaN case it will only raise the Invalid
exception when the input is an SNaN.  (This is permitted by the IEEE
754-2008 specification, which documents that whether we raise Invalid
for 0 * Inf + QNaN is implementation defined.)

Adjust the softfloat pick_nan_muladd code to allow the target to
suppress the raising of Invalid for the inf * zero + NaN case (as an
extra flag orthogonal to its choice for when to use the default NaN),
and enable that for x86.
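
The "extra flag orthogonal to the default-NaN choice" idea can be
sketched as follows. This is a standalone illustration, not the
softfloat code: the Ex/EX_ names and the two helper functions are
hypothetical, and only the numeric values mirror the enum in this patch.

```c
#include <stdbool.h>

typedef enum {
    EX_RULE_NONE = 0,
    EX_RULE_DNAN_NEVER = 1,
    EX_RULE_DNAN_ALWAYS = 2,
    EX_RULE_DNAN_IF_QNAN = 3,
    /* Flag bit, ORed in with any of the values above. */
    EX_RULE_SUPPRESS_INVALID = 1 << 2,
} ExInfZeroNaNRule;

/* Does 0 * Inf + QNaN raise Invalid under this rule? */
static bool ex_infzero_raises_invalid(unsigned rule)
{
    return !(rule & EX_RULE_SUPPRESS_INVALID);
}

/* Strip the flag bit to recover the default-NaN selection. */
static unsigned ex_infzero_dnan_choice(unsigned rule)
{
    return rule & ~(unsigned)EX_RULE_SUPPRESS_INVALID;
}
```

Because the three selection values fit in the low two bits, the flag in
bit 2 can be tested and masked off without disturbing them. An SNaN
input still raises Invalid through the separate SNaN check.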

We do not revert here the behaviour change for hppa, sh4 or tricore:
 * The sh4 manual is clear that it should signal Invalid
 * The tricore manual is a bit vague but doesn't say it shouldn't
 * The hppa manual doesn't talk about fused multiply-add corner
   cases at all

Cc: qemu-stable@nongnu.org
Fixes: 8adcff4ae7 ("fpu: handle raising Invalid for infzero in pick_nan_muladd")
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 include/fpu/softfloat-types.h | 16 +++++++++++++---
 target/i386/tcg/fpu_helper.c  |  5 ++++-
 fpu/softfloat-parts.c.inc     |  5 +++--
 3 files changed, 20 insertions(+), 6 deletions(-)

diff --git a/include/fpu/softfloat-types.h b/include/fpu/softfloat-types.h
index 9d37cdfaa8e..d8f831c331d 100644
--- a/include/fpu/softfloat-types.h
+++ b/include/fpu/softfloat-types.h
@@ -278,11 +278,21 @@ typedef enum __attribute__((__packed__)) {
     /* No propagation rule specified */
     float_infzeronan_none = 0,
     /* Result is never the default NaN (so always the input NaN) */
-    float_infzeronan_dnan_never,
+    float_infzeronan_dnan_never = 1,
     /* Result is always the default NaN */
-    float_infzeronan_dnan_always,
+    float_infzeronan_dnan_always = 2,
     /* Result is the default NaN if the input NaN is quiet */
-    float_infzeronan_dnan_if_qnan,
+    float_infzeronan_dnan_if_qnan = 3,
+    /*
+     * Don't raise Invalid for 0 * Inf + NaN. Default is to raise.
+     * IEEE 754-2008 section 7.2 makes it implementation defined whether
+     * 0 * Inf + QNaN raises Invalid or not. Note that 0 * Inf + SNaN will
+     * raise the Invalid flag for the SNaN anyway.
+     *
+     * This is a flag which can be ORed in with any of the above
+     * DNaN behaviour options.
+     */
+    float_infzeronan_suppress_invalid = (1 << 2),
 } FloatInfZeroNaNRule;
 
 /*
diff --git a/target/i386/tcg/fpu_helper.c b/target/i386/tcg/fpu_helper.c
index d0a1e2f3c8a..e0a072b4ebc 100644
--- a/target/i386/tcg/fpu_helper.c
+++ b/target/i386/tcg/fpu_helper.c
@@ -178,8 +178,11 @@ void cpu_init_fp_statuses(CPUX86State *env)
      * "Fused-Multiply-ADD (FMA) Numeric Behavior" the NaN handling is
      * specified -- for 0 * inf + NaN the input NaN is selected, and if
      * there are multiple input NaNs they are selected in the order a, b, c.
+     * We also do not raise Invalid for the 0 * inf + (Q)NaN case.
      */
-    set_float_infzeronan_rule(float_infzeronan_dnan_never, &env->sse_status);
+    set_float_infzeronan_rule(float_infzeronan_dnan_never |
+                              float_infzeronan_suppress_invalid,
+                              &env->sse_status);
     set_float_3nan_prop_rule(float_3nan_prop_abc, &env->sse_status);
     /* Default NaN: sign bit set, most significant frac bit set */
     set_float_default_nan_pattern(0b11000000, &env->fp_status);
diff --git a/fpu/softfloat-parts.c.inc b/fpu/softfloat-parts.c.inc
index ebde42992fc..4bb341b2f94 100644
--- a/fpu/softfloat-parts.c.inc
+++ b/fpu/softfloat-parts.c.inc
@@ -126,7 +126,8 @@ static FloatPartsN *partsN(pick_nan_muladd)(FloatPartsN *a, FloatPartsN *b,
         float_raise(float_flag_invalid | float_flag_invalid_snan, s);
     }
 
-    if (infzero) {
+    if (infzero &&
+        !(s->float_infzeronan_rule & float_infzeronan_suppress_invalid)) {
         /* This is (0 * inf) + NaN or (inf * 0) + NaN */
         float_raise(float_flag_invalid | float_flag_invalid_imz, s);
     }
@@ -144,7 +145,7 @@ static FloatPartsN *partsN(pick_nan_muladd)(FloatPartsN *a, FloatPartsN *b,
          * Inf * 0 + NaN -- some implementations return the
          * default NaN here, and some return the input NaN.
          */
-        switch (s->float_infzeronan_rule) {
+        switch (s->float_infzeronan_rule & ~float_infzeronan_suppress_invalid) {
         case float_infzeronan_dnan_never:
             break;
         case float_infzeronan_dnan_always:
-- 
2.34.1




* [PATCH 02/76] tests/tcg/x86_64/fma: Test some x86 fused-multiply-add cases
  2025-01-24 16:27 [PATCH 00/76] target/arm: Implement FEAT_AFP and FEAT_RPRES Peter Maydell
  2025-01-24 16:27 ` [PATCH 01/76] target/i386: Do not raise Invalid for 0 * Inf + QNaN Peter Maydell
@ 2025-01-24 16:27 ` Peter Maydell
  2025-01-24 17:15   ` Alex Bennée
  2025-01-24 16:27 ` [PATCH 03/76] target/arm: arm_reset_sve_state() should set FPSR, not FPCR Peter Maydell
                   ` (75 subsequent siblings)
  77 siblings, 1 reply; 167+ messages in thread
From: Peter Maydell @ 2025-01-24 16:27 UTC (permalink / raw)
  To: qemu-arm, qemu-devel

Add a test case which tests some corner case behaviour of
fused-multiply-add on x86:
 * 0 * Inf + SNaN should raise Invalid
 * 0 * Inf + QNaN should not raise Invalid
 * tininess should be detected after rounding

There is also one currently-disabled test case:
 * flush-to-zero should be done after rounding

This is disabled because QEMU's emulation currently does this
incorrectly (and so would fail the test).  The test case is kept in
but disabled, as the justification for why the test running harness
has support for testing both with and without FTZ set.
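
For readers decoding the NaN bit patterns used by the test, here is a
small sketch of how a double-precision SNaN and QNaN are told apart on
x86, where the most significant fraction bit is the quiet bit. The ex_
helper names are invented for this illustration and are not part of the
test program.

```c
#include <stdbool.h>
#include <stdint.h>

#define EX_F64_QUIET_BIT 0x0008000000000000ull

/* NaN: exponent all-ones and a non-zero fraction. */
static bool ex_f64_is_nan(uint64_t bits)
{
    return (bits & 0x7fffffffffffffffull) > 0x7ff0000000000000ull;
}

/* Quiet NaN: a NaN with the top fraction bit set. */
static bool ex_f64_is_qnan(uint64_t bits)
{
    return ex_f64_is_nan(bits) && (bits & EX_F64_QUIET_BIT) != 0;
}

/* Signalling NaN: a NaN with the top fraction bit clear. */
static bool ex_f64_is_snan(uint64_t bits)
{
    return ex_f64_is_nan(bits) && (bits & EX_F64_QUIET_BIT) == 0;
}

/* Quieten a NaN by setting the quiet bit, keeping the payload. */
static uint64_t ex_f64_quieten(uint64_t bits)
{
    return bits | EX_F64_QUIET_BIT;
}
```

With these, 0x7ff000000000aaaa classifies as an SNaN and quietens to
0x7ff800000000aaaa, while 0x7ff0000000000000 is Inf and not a NaN.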

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 tests/tcg/x86_64/fma.c           | 109 +++++++++++++++++++++++++++++++
 tests/tcg/x86_64/Makefile.target |   1 +
 2 files changed, 110 insertions(+)
 create mode 100644 tests/tcg/x86_64/fma.c

diff --git a/tests/tcg/x86_64/fma.c b/tests/tcg/x86_64/fma.c
new file mode 100644
index 00000000000..09c622ebc00
--- /dev/null
+++ b/tests/tcg/x86_64/fma.c
@@ -0,0 +1,109 @@
+/*
+ * Test some fused multiply add corner cases.
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+#include <stdio.h>
+#include <stdint.h>
+#include <stdbool.h>
+#include <inttypes.h>
+
+#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))
+
+/*
+ * Perform one "n * m + a" operation using the vfmadd insn and return
+ * the result; on return *mxcsr_p is set to the bottom 6 bits of MXCSR
+ * (the Flag bits). If ftz is true then we set MXCSR.FTZ while doing
+ * the operation.
+ * We print the operation and its results to stdout.
+ */
+static uint64_t do_fmadd(uint64_t n, uint64_t m, uint64_t a,
+                         bool ftz, uint32_t *mxcsr_p)
+{
+    uint64_t r;
+    uint32_t mxcsr = 0;
+    uint32_t ftz_bit = ftz ? (1 << 15) : 0;
+    uint32_t saved_mxcsr = 0;
+
+    asm volatile("stmxcsr %[saved_mxcsr]\n"
+                 "stmxcsr %[mxcsr]\n"
+                 "andl $0xffff7fc0, %[mxcsr]\n"
+                 "orl %[ftz_bit], %[mxcsr]\n"
+                 "ldmxcsr %[mxcsr]\n"
+                 "movq %[a], %%xmm0\n"
+                 "movq %[m], %%xmm1\n"
+                 "movq %[n], %%xmm2\n"
+                 /* xmm0 = xmm0 + xmm2 * xmm1 */
+                 "vfmadd231sd %%xmm1, %%xmm2, %%xmm0\n"
+                 "movq %%xmm0, %[r]\n"
+                 "stmxcsr %[mxcsr]\n"
+                 "ldmxcsr %[saved_mxcsr]\n"
+                 : [r] "=r" (r), [mxcsr] "=m" (mxcsr),
+                   [saved_mxcsr] "=m" (saved_mxcsr)
+                 : [n] "r" (n), [m] "r" (m), [a] "r" (a),
+                   [ftz_bit] "r" (ftz_bit)
+                 : "xmm0", "xmm1", "xmm2");
+    *mxcsr_p = mxcsr & 0x3f;
+    printf("vfmadd231sd 0x%" PRIx64 " 0x%" PRIx64 " 0x%" PRIx64
+           " = 0x%" PRIx64 " MXCSR flags 0x%" PRIx32 "\n",
+           n, m, a, r, *mxcsr_p);
+    return r;
+}
+
+typedef struct testdata {
+    /* Input n, m, a */
+    uint64_t n;
+    uint64_t m;
+    uint64_t a;
+    bool ftz;
+    /* Expected result */
+    uint64_t expected_r;
+    /* Expected low 6 bits of MXCSR (the Flag bits) */
+    uint32_t expected_mxcsr;
+} testdata;
+
+static testdata tests[] = {
+    { 0, 0x7ff0000000000000, 0x7ff000000000aaaa, false, /* 0 * Inf + SNaN */
+      0x7ff800000000aaaa, 1 }, /* Should be QNaN and does raise Invalid */
+    { 0, 0x7ff0000000000000, 0x7ff800000000aaaa, false, /* 0 * Inf + QNaN */
+      0x7ff800000000aaaa, 0 }, /* Should be QNaN and does *not* raise Invalid */
+    /*
+     * These inputs give a result which is tiny before rounding but which
+     * becomes non-tiny after rounding. x86 is a "detect tininess after
+     * rounding" architecture, so it should give a non-denormal result and
+     * not set the Underflow flag (only the Precision flag for an inexact
+     * result).
+     */
+    { 0x3fdfffffffffffff, 0x001fffffffffffff, 0x801fffffffffffff, false,
+      0x8010000000000000, 0x20 },
+    /*
+     * Flushing of denormal outputs to zero should also happen after
+     * rounding, so setting FTZ should not affect the result or the flags.
+     * QEMU currently does not emulate this correctly because we do the
+     * flush-to-zero check before rounding, so we incorrectly produce a
+     * zero result and set Underflow as well as Precision.
+     */
+#ifdef ENABLE_FAILING_TESTS
+    { 0x3fdfffffffffffff, 0x001fffffffffffff, 0x801fffffffffffff, true,
+      0x8010000000000000, 0x20 }, /* Enabling FTZ shouldn't change flags */
+#endif
+};
+
+int main(void)
+{
+    bool passed = true;
+    for (int i = 0; i < ARRAY_SIZE(tests); i++) {
+        uint32_t mxcsr;
+        uint64_t r = do_fmadd(tests[i].n, tests[i].m, tests[i].a,
+                              tests[i].ftz, &mxcsr);
+        if (r != tests[i].expected_r) {
+            printf("expected result 0x%" PRIx64 "\n", tests[i].expected_r);
+            passed = false;
+        }
+        if (mxcsr != tests[i].expected_mxcsr) {
+            printf("expected MXCSR flags 0x%x\n", tests[i].expected_mxcsr);
+            passed = false;
+        }
+    }
+    return passed ? 0 : 1;
+}
diff --git a/tests/tcg/x86_64/Makefile.target b/tests/tcg/x86_64/Makefile.target
index d6dff559c7d..be20fc64e88 100644
--- a/tests/tcg/x86_64/Makefile.target
+++ b/tests/tcg/x86_64/Makefile.target
@@ -18,6 +18,7 @@ X86_64_TESTS += adox
 X86_64_TESTS += test-1648
 X86_64_TESTS += test-2175
 X86_64_TESTS += cross-modifying-code
+X86_64_TESTS += fma
 TESTS=$(MULTIARCH_TESTS) $(X86_64_TESTS) test-x86_64
 else
 TESTS=$(MULTIARCH_TESTS)
-- 
2.34.1




* [PATCH 03/76] target/arm: arm_reset_sve_state() should set FPSR, not FPCR
  2025-01-24 16:27 [PATCH 00/76] target/arm: Implement FEAT_AFP and FEAT_RPRES Peter Maydell
  2025-01-24 16:27 ` [PATCH 01/76] target/i386: Do not raise Invalid for 0 * Inf + QNaN Peter Maydell
  2025-01-24 16:27 ` [PATCH 02/76] tests/tcg/x86_64/fma: Test some x86 fused-multiply-add cases Peter Maydell
@ 2025-01-24 16:27 ` Peter Maydell
  2025-01-25 15:07   ` Richard Henderson
  2025-01-24 16:27 ` [PATCH 04/76] target/arm: Use FPSR_ constants in vfp_exceptbits_from_host() Peter Maydell
                   ` (74 subsequent siblings)
  77 siblings, 1 reply; 167+ messages in thread
From: Peter Maydell @ 2025-01-24 16:27 UTC (permalink / raw)
  To: qemu-arm, qemu-devel

The pseudocode ResetSVEState() does:
    FPSR = ZeroExtend(0x0800009f<31:0>, 64);
but QEMU's arm_reset_sve_state() called vfp_set_fpcr() by accident.

Before the advent of FEAT_AFP, this was only setting a collection of
RES0 bits, which vfp_set_fpcr() would then ignore, so the only effect
was that we didn't actually set the FPSR the way we are supposed to
do.  Once FEAT_AFP is implemented, setting the bottom bits of FPCR
will change the floating point behaviour.

Call vfp_set_fpsr(), as we ought to.
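
To see why the wrong register matters, it helps to decode 0x0800009f
against both layouts. The sketch below is illustrative only: the EX_
macro names and the helper are made up here, with bit positions as in
the architectural FPSR and FPCR definitions.

```c
#include <stdint.h>

#define EX_SVE_RESET_VALUE 0x0800009fu

/* FPSR cumulative exception flags and QC. */
#define EX_FPSR_IOC (1u << 0)
#define EX_FPSR_DZC (1u << 1)
#define EX_FPSR_OFC (1u << 2)
#define EX_FPSR_UFC (1u << 3)
#define EX_FPSR_IXC (1u << 4)
#define EX_FPSR_IDC (1u << 7)
#define EX_FPSR_QC  (1u << 27)

/* FPCR control bits added by FEAT_AFP. */
#define EX_FPCR_FIZ (1u << 0)
#define EX_FPCR_AH  (1u << 1)
#define EX_FPCR_NEP (1u << 2)

/* Which FEAT_AFP control bits would a write of 'value' to FPCR set? */
static uint32_t ex_fpcr_afp_bits(uint32_t value)
{
    return value & (EX_FPCR_FIZ | EX_FPCR_AH | EX_FPCR_NEP);
}
```

Interpreted as an FPSR value, the constant is exactly the six cumulative
exception flags plus QC. Written to FPCR instead, its low three bits
land on FIZ, AH and NEP, so all three FEAT_AFP controls end up enabled.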

(Note for stable backports: commit 7f2a01e7368f9 moved this function
from sme_helper.c to helper.c, but it had the same bug before the
move too.)

Cc: qemu-stable@nongnu.org
Fixes: f84734b87461 ("target/arm: Implement SMSTART, SMSTOP")
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/helper.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index 63997678513..40bdfc851a5 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -6413,7 +6413,7 @@ static void arm_reset_sve_state(CPUARMState *env)
     memset(env->vfp.zregs, 0, sizeof(env->vfp.zregs));
     /* Recall that FFR is stored as pregs[16]. */
     memset(env->vfp.pregs, 0, sizeof(env->vfp.pregs));
-    vfp_set_fpcr(env, 0x0800009f);
+    vfp_set_fpsr(env, 0x0800009f);
 }
 
 void aarch64_set_svcr(CPUARMState *env, uint64_t new, uint64_t mask)
-- 
2.34.1




* [PATCH 04/76] target/arm: Use FPSR_ constants in vfp_exceptbits_from_host()
  2025-01-24 16:27 [PATCH 00/76] target/arm: Implement FEAT_AFP and FEAT_RPRES Peter Maydell
                   ` (2 preceding siblings ...)
  2025-01-24 16:27 ` [PATCH 03/76] target/arm: arm_reset_sve_state() should set FPSR, not FPCR Peter Maydell
@ 2025-01-24 16:27 ` Peter Maydell
  2025-01-25 15:07   ` Richard Henderson
  2025-01-24 16:27 ` [PATCH 05/76] target/arm: Use uint32_t " Peter Maydell
                   ` (73 subsequent siblings)
  77 siblings, 1 reply; 167+ messages in thread
From: Peter Maydell @ 2025-01-24 16:27 UTC (permalink / raw)
  To: qemu-arm, qemu-devel

Use the FPSR_ named constants in vfp_exceptbits_from_host(),
rather than hardcoded magic numbers.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/vfp_helper.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
index fc20a567530..fcc9e5d382e 100644
--- a/target/arm/vfp_helper.c
+++ b/target/arm/vfp_helper.c
@@ -39,22 +39,22 @@ static inline int vfp_exceptbits_from_host(int host_bits)
     int target_bits = 0;
 
     if (host_bits & float_flag_invalid) {
-        target_bits |= 1;
+        target_bits |= FPSR_IOC;
     }
     if (host_bits & float_flag_divbyzero) {
-        target_bits |= 2;
+        target_bits |= FPSR_DZC;
     }
     if (host_bits & float_flag_overflow) {
-        target_bits |= 4;
+        target_bits |= FPSR_OFC;
     }
     if (host_bits & (float_flag_underflow | float_flag_output_denormal)) {
-        target_bits |= 8;
+        target_bits |= FPSR_UFC;
     }
     if (host_bits & float_flag_inexact) {
-        target_bits |= 0x10;
+        target_bits |= FPSR_IXC;
     }
     if (host_bits & float_flag_input_denormal) {
-        target_bits |= 0x80;
+        target_bits |= FPSR_IDC;
     }
     return target_bits;
 }
-- 
2.34.1




* [PATCH 05/76] target/arm: Use uint32_t in vfp_exceptbits_from_host()
  2025-01-24 16:27 [PATCH 00/76] target/arm: Implement FEAT_AFP and FEAT_RPRES Peter Maydell
                   ` (3 preceding siblings ...)
  2025-01-24 16:27 ` [PATCH 04/76] target/arm: Use FPSR_ constants in vfp_exceptbits_from_host() Peter Maydell
@ 2025-01-24 16:27 ` Peter Maydell
  2025-01-25 15:08   ` Richard Henderson
  2025-01-24 16:27 ` [PATCH 06/76] target/arm: Define new fp_status_a32 and fp_status_a64 Peter Maydell
                   ` (72 subsequent siblings)
  77 siblings, 1 reply; 167+ messages in thread
From: Peter Maydell @ 2025-01-24 16:27 UTC (permalink / raw)
  To: qemu-arm, qemu-devel

In vfp_exceptbits_from_host(), we accumulate the FPSR flags in
an "int", and our return type is also "int". However, the only
callsite returns the same information as a uint32_t, and
more generally we handle FPSR values in the code as uint32_t,
not int. Bring this function into line with that convention.

There is no behaviour change because none of the FPSR bits
we set in this function are bit 31. The input argument to
the function remains 'int' because that is the return type
of the softfloat get_float_exception_flags().

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/vfp_helper.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
index fcc9e5d382e..afc41420eb1 100644
--- a/target/arm/vfp_helper.c
+++ b/target/arm/vfp_helper.c
@@ -34,9 +34,9 @@
 #ifdef CONFIG_TCG
 
 /* Convert host exception flags to vfp form.  */
-static inline int vfp_exceptbits_from_host(int host_bits)
+static inline uint32_t vfp_exceptbits_from_host(int host_bits)
 {
-    int target_bits = 0;
+    uint32_t target_bits = 0;
 
     if (host_bits & float_flag_invalid) {
         target_bits |= FPSR_IOC;
-- 
2.34.1




* [PATCH 06/76] target/arm: Define new fp_status_a32 and fp_status_a64
  2025-01-24 16:27 [PATCH 00/76] target/arm: Implement FEAT_AFP and FEAT_RPRES Peter Maydell
                   ` (4 preceding siblings ...)
  2025-01-24 16:27 ` [PATCH 05/76] target/arm: Use uint32_t " Peter Maydell
@ 2025-01-24 16:27 ` Peter Maydell
  2025-01-25 15:12   ` Richard Henderson
  2025-01-27  4:59   ` Richard Henderson
  2025-01-24 16:27 ` [PATCH 07/76] target/arm: Use vfp.fp_status_a64 in A64-only helper functions Peter Maydell
                   ` (71 subsequent siblings)
  77 siblings, 2 replies; 167+ messages in thread
From: Peter Maydell @ 2025-01-24 16:27 UTC (permalink / raw)
  To: qemu-arm, qemu-devel

We want to split the existing fp_status in the Arm CPUState into
separate float_status fields for AArch32 and AArch64.  (This is
because new control bits defined by FEAT_AFP only have an effect for
AArch64, not AArch32.) To make this split we will:
 * define new fp_status_a32 and fp_status_a64 which have
   identical behaviour to the existing fp_status
 * move existing uses of fp_status to fp_status_a32 or
   fp_status_a64 as appropriate
 * delete the old fp_status when it has no uses left

In this patch we add the new float_status fields.

We will also need to split fp_status_f16, but we will do that
as a separate series of patches.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/cpu.h           |  4 ++++
 target/arm/tcg/translate.h | 12 ++++++++++++
 target/arm/cpu.c           |  2 ++
 target/arm/vfp_helper.c    | 12 ++++++++++++
 4 files changed, 30 insertions(+)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 9a6e8e589cc..337c5383748 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -634,6 +634,8 @@ typedef struct CPUArchState {
         /* There are a number of distinct float control structures:
          *
          *  fp_status: is the "normal" fp status.
+         *  fp_status_a32: is the "normal" fp status for AArch32 insns
+         *  fp_status_a64: is the "normal" fp status for AArch64 insns
          *  fp_status_fp16: used for half-precision calculations
          *  standard_fp_status : the ARM "Standard FPSCR Value"
          *  standard_fp_status_fp16 : used for half-precision
@@ -659,6 +661,8 @@ typedef struct CPUArchState {
          * an explicit FPSCR read.
          */
         float_status fp_status;
+        float_status fp_status_a32;
+        float_status fp_status_a64;
         float_status fp_status_f16;
         float_status standard_fp_status;
         float_status standard_fp_status_f16;
diff --git a/target/arm/tcg/translate.h b/target/arm/tcg/translate.h
index 2d37d7c9f21..a7509b314b0 100644
--- a/target/arm/tcg/translate.h
+++ b/target/arm/tcg/translate.h
@@ -671,6 +671,8 @@ static inline CPUARMTBFlags arm_tbflags_from_tb(const TranslationBlock *tb)
  */
 typedef enum ARMFPStatusFlavour {
     FPST_FPCR,
+    FPST_FPCR_A32,
+    FPST_FPCR_A64,
     FPST_FPCR_F16,
     FPST_STD,
     FPST_STD_F16,
@@ -686,6 +688,10 @@ typedef enum ARMFPStatusFlavour {
  *
  * FPST_FPCR
  *   for non-FP16 operations controlled by the FPCR
+ * FPST_FPCR_A32
+ *   for AArch32 non-FP16 operations controlled by the FPCR
+ * FPST_FPCR_A64
+ *   for AArch64 non-FP16 operations controlled by the FPCR
  * FPST_FPCR_F16
  *   for operations controlled by the FPCR where FPCR.FZ16 is to be used
  * FPST_STD
@@ -702,6 +708,12 @@ static inline TCGv_ptr fpstatus_ptr(ARMFPStatusFlavour flavour)
     case FPST_FPCR:
         offset = offsetof(CPUARMState, vfp.fp_status);
         break;
+    case FPST_FPCR_A32:
+        offset = offsetof(CPUARMState, vfp.fp_status_a32);
+        break;
+    case FPST_FPCR_A64:
+        offset = offsetof(CPUARMState, vfp.fp_status_a64);
+        break;
     case FPST_FPCR_F16:
         offset = offsetof(CPUARMState, vfp.fp_status_f16);
         break;
diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index dc0231233a6..8bdd535db95 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -573,6 +573,8 @@ static void arm_cpu_reset_hold(Object *obj, ResetType type)
     set_default_nan_mode(1, &env->vfp.standard_fp_status);
     set_default_nan_mode(1, &env->vfp.standard_fp_status_f16);
     arm_set_default_fp_behaviours(&env->vfp.fp_status);
+    arm_set_default_fp_behaviours(&env->vfp.fp_status_a32);
+    arm_set_default_fp_behaviours(&env->vfp.fp_status_a64);
     arm_set_default_fp_behaviours(&env->vfp.standard_fp_status);
     arm_set_default_fp_behaviours(&env->vfp.fp_status_f16);
     arm_set_default_fp_behaviours(&env->vfp.standard_fp_status_f16);
diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
index afc41420eb1..7475f97e0ce 100644
--- a/target/arm/vfp_helper.c
+++ b/target/arm/vfp_helper.c
@@ -64,6 +64,8 @@ static uint32_t vfp_get_fpsr_from_host(CPUARMState *env)
     uint32_t i;
 
     i = get_float_exception_flags(&env->vfp.fp_status);
+    i |= get_float_exception_flags(&env->vfp.fp_status_a32);
+    i |= get_float_exception_flags(&env->vfp.fp_status_a64);
     i |= get_float_exception_flags(&env->vfp.standard_fp_status);
     /* FZ16 does not generate an input denormal exception.  */
     i |= (get_float_exception_flags(&env->vfp.fp_status_f16)
@@ -81,6 +83,8 @@ static void vfp_clear_float_status_exc_flags(CPUARMState *env)
      * be the architecturally up-to-date exception flag information first.
      */
     set_float_exception_flags(0, &env->vfp.fp_status);
+    set_float_exception_flags(0, &env->vfp.fp_status_a32);
+    set_float_exception_flags(0, &env->vfp.fp_status_a64);
     set_float_exception_flags(0, &env->vfp.fp_status_f16);
     set_float_exception_flags(0, &env->vfp.standard_fp_status);
     set_float_exception_flags(0, &env->vfp.standard_fp_status_f16);
@@ -109,6 +113,8 @@ static void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val, uint32_t mask)
             break;
         }
         set_float_rounding_mode(i, &env->vfp.fp_status);
+        set_float_rounding_mode(i, &env->vfp.fp_status_a32);
+        set_float_rounding_mode(i, &env->vfp.fp_status_a64);
         set_float_rounding_mode(i, &env->vfp.fp_status_f16);
     }
     if (changed & FPCR_FZ16) {
@@ -122,10 +128,16 @@ static void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val, uint32_t mask)
         bool ftz_enabled = val & FPCR_FZ;
         set_flush_to_zero(ftz_enabled, &env->vfp.fp_status);
         set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status);
+        set_flush_to_zero(ftz_enabled, &env->vfp.fp_status_a32);
+        set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status_a32);
+        set_flush_to_zero(ftz_enabled, &env->vfp.fp_status_a64);
+        set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status_a64);
     }
     if (changed & FPCR_DN) {
         bool dnan_enabled = val & FPCR_DN;
         set_default_nan_mode(dnan_enabled, &env->vfp.fp_status);
+        set_default_nan_mode(dnan_enabled, &env->vfp.fp_status_a32);
+        set_default_nan_mode(dnan_enabled, &env->vfp.fp_status_a64);
         set_default_nan_mode(dnan_enabled, &env->vfp.fp_status_f16);
     }
 }
-- 
2.34.1




* [PATCH 07/76] target/arm: Use vfp.fp_status_a64 in A64-only helper functions
  2025-01-24 16:27 [PATCH 00/76] target/arm: Implement FEAT_AFP and FEAT_RPRES Peter Maydell
                   ` (5 preceding siblings ...)
  2025-01-24 16:27 ` [PATCH 06/76] target/arm: Define new fp_status_a32 and fp_status_a64 Peter Maydell
@ 2025-01-24 16:27 ` Peter Maydell
  2025-01-25 15:15   ` Richard Henderson
  2025-01-24 16:27 ` [PATCH 08/76] target/arm: Use fp_status_a32 in vjcvt helper Peter Maydell
                   ` (70 subsequent siblings)
  77 siblings, 1 reply; 167+ messages in thread
From: Peter Maydell @ 2025-01-24 16:27 UTC (permalink / raw)
  To: qemu-arm, qemu-devel

Switch from vfp.fp_status to vfp.fp_status_a64 for helpers which:
 * directly reference an fp_status field
 * are called only from the A64 decoder
 * are not called inside a set_rmode/restore_rmode sequence

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/tcg/sme_helper.c |  2 +-
 target/arm/tcg/vec_helper.c | 10 +++++-----
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/target/arm/tcg/sme_helper.c b/target/arm/tcg/sme_helper.c
index a0e6b4a41ea..2aad00d3ad9 100644
--- a/target/arm/tcg/sme_helper.c
+++ b/target/arm/tcg/sme_helper.c
@@ -1044,7 +1044,7 @@ void HELPER(sme_fmopa_h)(void *vza, void *vzn, void *vzm, void *vpn,
      * round-to-odd -- see above.
      */
     fpst_f16 = env->vfp.fp_status_f16;
-    fpst_std = env->vfp.fp_status;
+    fpst_std = env->vfp.fp_status_a64;
     set_default_nan_mode(true, &fpst_std);
     set_default_nan_mode(true, &fpst_f16);
     fpst_odd = fpst_std;
diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
index e3083c6e84e..44ee2c81fad 100644
--- a/target/arm/tcg/vec_helper.c
+++ b/target/arm/tcg/vec_helper.c
@@ -2066,7 +2066,7 @@ void HELPER(gvec_fmlal_a32)(void *vd, void *vn, void *vm,
 void HELPER(gvec_fmlal_a64)(void *vd, void *vn, void *vm,
                             CPUARMState *env, uint32_t desc)
 {
-    do_fmlal(vd, vn, vm, &env->vfp.fp_status, desc,
+    do_fmlal(vd, vn, vm, &env->vfp.fp_status_a64, desc,
              get_flush_inputs_to_zero(&env->vfp.fp_status_f16));
 }
 
@@ -2076,7 +2076,7 @@ void HELPER(sve2_fmlal_zzzw_s)(void *vd, void *vn, void *vm, void *va,
     intptr_t i, oprsz = simd_oprsz(desc);
     uint16_t negn = extract32(desc, SIMD_DATA_SHIFT, 1) << 15;
     intptr_t sel = extract32(desc, SIMD_DATA_SHIFT + 1, 1) * sizeof(float16);
-    float_status *status = &env->vfp.fp_status;
+    float_status *status = &env->vfp.fp_status_a64;
     bool fz16 = get_flush_inputs_to_zero(&env->vfp.fp_status_f16);
 
     for (i = 0; i < oprsz; i += sizeof(float32)) {
@@ -2128,7 +2128,7 @@ void HELPER(gvec_fmlal_idx_a32)(void *vd, void *vn, void *vm,
 void HELPER(gvec_fmlal_idx_a64)(void *vd, void *vn, void *vm,
                                 CPUARMState *env, uint32_t desc)
 {
-    do_fmlal_idx(vd, vn, vm, &env->vfp.fp_status, desc,
+    do_fmlal_idx(vd, vn, vm, &env->vfp.fp_status_a64, desc,
                  get_flush_inputs_to_zero(&env->vfp.fp_status_f16));
 }
 
@@ -2139,7 +2139,7 @@ void HELPER(sve2_fmlal_zzxw_s)(void *vd, void *vn, void *vm, void *va,
     uint16_t negn = extract32(desc, SIMD_DATA_SHIFT, 1) << 15;
     intptr_t sel = extract32(desc, SIMD_DATA_SHIFT + 1, 1) * sizeof(float16);
     intptr_t idx = extract32(desc, SIMD_DATA_SHIFT + 2, 3) * sizeof(float16);
-    float_status *status = &env->vfp.fp_status;
+    float_status *status = &env->vfp.fp_status_a64;
     bool fz16 = get_flush_inputs_to_zero(&env->vfp.fp_status_f16);
 
     for (i = 0; i < oprsz; i += 16) {
@@ -2808,7 +2808,7 @@ bool is_ebf(CPUARMState *env, float_status *statusp, float_status *oddstatusp)
      */
     bool ebf = is_a64(env) && env->vfp.fpcr & FPCR_EBF;
 
-    *statusp = env->vfp.fp_status;
+    *statusp = env->vfp.fp_status_a64;
     set_default_nan_mode(true, statusp);
 
     if (ebf) {
-- 
2.34.1




* [PATCH 08/76] target/arm: Use fp_status_a32 in vjcvt helper
  2025-01-24 16:27 [PATCH 00/76] target/arm: Implement FEAT_AFP and FEAT_RPRES Peter Maydell
                   ` (6 preceding siblings ...)
  2025-01-24 16:27 ` [PATCH 07/76] target/arm: Use vfp.fp_status_a64 in A64-only helper functions Peter Maydell
@ 2025-01-24 16:27 ` Peter Maydell
  2025-01-25 15:16   ` Richard Henderson
  2025-01-24 16:27 ` [PATCH 09/76] target/arm: Use fp_status_a32 in vfp_cmp helpers Peter Maydell
                   ` (69 subsequent siblings)
  77 siblings, 1 reply; 167+ messages in thread
From: Peter Maydell @ 2025-01-24 16:27 UTC (permalink / raw)
  To: qemu-arm, qemu-devel

Use fp_status_a32 in the vjcvt helper function; this is called only
from the A32/T32 decoder and is not used inside a
set_rmode/restore_rmode sequence.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/vfp_helper.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
index 7475f97e0ce..0671ba3a88b 100644
--- a/target/arm/vfp_helper.c
+++ b/target/arm/vfp_helper.c
@@ -1144,7 +1144,7 @@ uint64_t HELPER(fjcvtzs)(float64 value, float_status *status)
 
 uint32_t HELPER(vjcvt)(float64 value, CPUARMState *env)
 {
-    uint64_t pair = HELPER(fjcvtzs)(value, &env->vfp.fp_status);
+    uint64_t pair = HELPER(fjcvtzs)(value, &env->vfp.fp_status_a32);
     uint32_t result = pair;
     uint32_t z = (pair >> 32) == 0;
 
-- 
2.34.1




* [PATCH 09/76] target/arm: Use fp_status_a32 in vfp_cmp helpers
  2025-01-24 16:27 [PATCH 00/76] target/arm: Implement FEAT_AFP and FEAT_RPRES Peter Maydell
                   ` (7 preceding siblings ...)
  2025-01-24 16:27 ` [PATCH 08/76] target/arm: Use fp_status_a32 in vjcvt helper Peter Maydell
@ 2025-01-24 16:27 ` Peter Maydell
  2025-01-25 15:18   ` Richard Henderson
  2025-01-24 16:27 ` [PATCH 10/76] target/arm: Use FPST_FPCR_A32 in A32 decoder Peter Maydell
                   ` (68 subsequent siblings)
  77 siblings, 1 reply; 167+ messages in thread
From: Peter Maydell @ 2025-01-24 16:27 UTC (permalink / raw)
  To: qemu-arm, qemu-devel

The helpers vfp_cmps, vfp_cmpes, vfp_cmpd, vfp_cmped are used only from
the A32 decoder; the A64 decoder uses separate vfp_cmps_a64 etc helpers
(because for A64 we update the main NZCV flags and for A32 we update
the FPSCR NZCV flags). So we can make these helpers use the fp_status_a32
field instead of fp_status.
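
To make the A32/A64 difference concrete, here is a simplified sketch.
All names (toy_cpu, toy_compare, toy_cmp_a32, toy_cmp_a64) are
hypothetical and this is not how the real helpers are structured; the
only thing taken from the architecture is the NZCV encoding of a
floating-point comparison result (equal 0110, less than 1000, greater
than 0010, unordered 0011).  Both flavours compute the same flags and
differ only in where the result lands:

```c
#include <assert.h>

typedef enum { TOY_LT, TOY_EQ, TOY_GT, TOY_UNORDERED } toy_relation;

/* Hypothetical CPU state: the two places an FP compare can write to. */
typedef struct {
    unsigned pstate_nzcv;  /* main NZCV flags, the A64 destination */
    unsigned fpscr_nzcv;   /* FPSCR NZCV flags, the A32 destination */
} toy_cpu;

static toy_relation toy_compare(float a, float b)
{
    if (a != a || b != b) {
        return TOY_UNORDERED;  /* at least one operand is a NaN */
    }
    return a < b ? TOY_LT : a > b ? TOY_GT : TOY_EQ;
}

/* Arm NZCV encoding of an FP comparison result. */
static unsigned toy_rel_to_nzcv(toy_relation r)
{
    switch (r) {
    case TOY_EQ:
        return 0x6;
    case TOY_LT:
        return 0x8;
    case TOY_GT:
        return 0x2;
    default:
        return 0x3;  /* unordered */
    }
}

static void toy_cmp_a32(toy_cpu *cpu, float a, float b)
{
    cpu->fpscr_nzcv = toy_rel_to_nzcv(toy_compare(a, b));
}

static void toy_cmp_a64(toy_cpu *cpu, float a, float b)
{
    cpu->pstate_nzcv = toy_rel_to_nzcv(toy_compare(a, b));
}
```

Because the two sets of helpers are already separate, each can simply
pick the float_status that matches its decoder.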

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
We could in theory make A32 use the a64 helpers and do the setting
of vfp.fpsr NZCV in the generated code from the helper return value,
but it doesn't seem worthwhile to me.
---
 target/arm/vfp_helper.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
index 0671ba3a88b..034f26e5daa 100644
--- a/target/arm/vfp_helper.c
+++ b/target/arm/vfp_helper.c
@@ -373,8 +373,8 @@ void VFP_HELPER(cmpe, P)(ARGTYPE a, ARGTYPE b, CPUARMState *env) \
         FLOATTYPE ## _compare(a, b, &env->vfp.FPST)); \
 }
 DO_VFP_cmp(h, float16, dh_ctype_f16, fp_status_f16)
-DO_VFP_cmp(s, float32, float32, fp_status)
-DO_VFP_cmp(d, float64, float64, fp_status)
+DO_VFP_cmp(s, float32, float32, fp_status_a32)
+DO_VFP_cmp(d, float64, float64, fp_status_a32)
 #undef DO_VFP_cmp
 
 /* Integer to float and float to integer conversions */
-- 
2.34.1




* [PATCH 10/76] target/arm: Use FPST_FPCR_A32 in A32 decoder
  2025-01-24 16:27 [PATCH 00/76] target/arm: Implement FEAT_AFP and FEAT_RPRES Peter Maydell
                   ` (8 preceding siblings ...)
  2025-01-24 16:27 ` [PATCH 09/76] target/arm: Use fp_status_a32 in vfp_cmp helpers Peter Maydell
@ 2025-01-24 16:27 ` Peter Maydell
  2025-01-25 15:18   ` Richard Henderson
  2025-01-24 16:27 ` [PATCH 11/76] target/arm: Use FPST_FPCR_A64 in A64 decoder Peter Maydell
                   ` (67 subsequent siblings)
  77 siblings, 1 reply; 167+ messages in thread
From: Peter Maydell @ 2025-01-24 16:27 UTC (permalink / raw)
  To: qemu-arm, qemu-devel

In the A32 decoder, use FPST_FPCR_A32 rather than FPST_FPCR.  By
doing an automated conversion of the whole file we avoid possibly
using more than one fpst value in a set_rmode/op/restore_rmode
sequence.
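
The hazard being avoided can be sketched with a toy model.  The types
and functions below are hypothetical stand-ins rather than the real
gen_set_rmode()/gen_restore_rmode() TCG generators, and the rounding
mode constants are arbitrary illustrative values:

```c
#include <assert.h>

/* Hypothetical stand-in for float_status: only the rounding mode. */
typedef struct {
    int rounding_mode;
} toy_float_status;

enum { TOY_ROUND_NEAREST = 0, TOY_ROUND_ZERO = 1 };

/* Install a new rounding mode and return the previous one. */
static int toy_set_rmode(int new_mode, toy_float_status *st)
{
    int old = st->rounding_mode;

    st->rounding_mode = new_mode;
    return old;
}

static void toy_restore_rmode(int old, toy_float_status *st)
{
    st->rounding_mode = old;
}

/* The "operation": reports which rounding mode it would round with. */
static int toy_op(const toy_float_status *st)
{
    return st->rounding_mode;
}

/* Correct: set, operate and restore all use the same status. */
static int toy_consistent(toy_float_status *st)
{
    int old = toy_set_rmode(TOY_ROUND_ZERO, st);
    int seen = toy_op(st);

    toy_restore_rmode(old, st);
    return seen;
}

/* Broken: the operation looks at a different status from set/restore. */
static int toy_mixed(toy_float_status *a, const toy_float_status *b)
{
    int old = toy_set_rmode(TOY_ROUND_ZERO, a);
    int seen = toy_op(b);

    toy_restore_rmode(old, a);
    return seen;
}
```

In toy_mixed() the temporary rounding mode never reaches the operation,
which is the kind of bug a half-converted file could introduce.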

Patch created with
  perl -p -i -e 's/FPST_FPCR(?!_)/FPST_FPCR_A32/g' target/arm/tcg/translate-vfp.c
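
The (?!_) negative lookahead is what stops the substitution from
touching FPST_FPCR_F16 and friends.  For anyone who wants to see the
matching rule spelled out, here is an equivalent written by hand in C;
rewrite_fpst() is a hypothetical illustration and was not used to
produce the patch:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy 'in' to 'out', replacing each "FPST_FPCR" that is not
 * immediately followed by '_' with "FPST_FPCR_A32".
 * Returns 0 on success, -1 if 'out' is too small.
 */
static int rewrite_fpst(const char *in, char *out, size_t outsz)
{
    static const char from[] = "FPST_FPCR";
    static const char to[] = "FPST_FPCR_A32";
    const size_t flen = sizeof(from) - 1;
    const size_t tlen = sizeof(to) - 1;
    size_t o = 0;

    while (*in) {
        /* in[flen] is safe to read: it is at worst the terminating NUL */
        if (strncmp(in, from, flen) == 0 && in[flen] != '_') {
            if (o + tlen >= outsz) {
                return -1;
            }
            memcpy(out + o, to, tlen);
            o += tlen;
            in += flen;
        } else {
            if (o + 1 >= outsz) {
                return -1;
            }
            out[o++] = *in++;
        }
    }
    if (o >= outsz) {
        return -1;
    }
    out[o] = '\0';
    return 0;
}
```

Like the perl expression, this does not check what precedes the match;
that is fine here because no other identifier in the file ends in
FPST_FPCR.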

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/tcg/translate-vfp.c | 54 +++++++++++++++++-----------------
 1 file changed, 27 insertions(+), 27 deletions(-)

diff --git a/target/arm/tcg/translate-vfp.c b/target/arm/tcg/translate-vfp.c
index 3cbe9a7418d..e1b8243c5d9 100644
--- a/target/arm/tcg/translate-vfp.c
+++ b/target/arm/tcg/translate-vfp.c
@@ -462,7 +462,7 @@ static bool trans_VRINT(DisasContext *s, arg_VRINT *a)
     if (sz == 1) {
         fpst = fpstatus_ptr(FPST_FPCR_F16);
     } else {
-        fpst = fpstatus_ptr(FPST_FPCR);
+        fpst = fpstatus_ptr(FPST_FPCR_A32);
     }
 
     tcg_rmode = gen_set_rmode(rounding, fpst);
@@ -529,7 +529,7 @@ static bool trans_VCVT(DisasContext *s, arg_VCVT *a)
     if (sz == 1) {
         fpst = fpstatus_ptr(FPST_FPCR_F16);
     } else {
-        fpst = fpstatus_ptr(FPST_FPCR);
+        fpst = fpstatus_ptr(FPST_FPCR_A32);
     }
 
     tcg_shift = tcg_constant_i32(0);
@@ -1398,7 +1398,7 @@ static bool do_vfp_3op_sp(DisasContext *s, VFPGen3OpSPFn *fn,
     f0 = tcg_temp_new_i32();
     f1 = tcg_temp_new_i32();
     fd = tcg_temp_new_i32();
-    fpst = fpstatus_ptr(FPST_FPCR);
+    fpst = fpstatus_ptr(FPST_FPCR_A32);
 
     vfp_load_reg32(f0, vn);
     vfp_load_reg32(f1, vm);
@@ -1517,7 +1517,7 @@ static bool do_vfp_3op_dp(DisasContext *s, VFPGen3OpDPFn *fn,
     f0 = tcg_temp_new_i64();
     f1 = tcg_temp_new_i64();
     fd = tcg_temp_new_i64();
-    fpst = fpstatus_ptr(FPST_FPCR);
+    fpst = fpstatus_ptr(FPST_FPCR_A32);
 
     vfp_load_reg64(f0, vn);
     vfp_load_reg64(f1, vm);
@@ -2181,7 +2181,7 @@ static bool do_vfm_sp(DisasContext *s, arg_VFMA_sp *a, bool neg_n, bool neg_d)
         /* VFNMA, VFNMS */
         gen_vfp_negs(vd, vd);
     }
-    fpst = fpstatus_ptr(FPST_FPCR);
+    fpst = fpstatus_ptr(FPST_FPCR_A32);
     gen_helper_vfp_muladds(vd, vn, vm, vd, fpst);
     vfp_store_reg32(vd, a->vd);
     return true;
@@ -2246,7 +2246,7 @@ static bool do_vfm_dp(DisasContext *s, arg_VFMA_dp *a, bool neg_n, bool neg_d)
         /* VFNMA, VFNMS */
         gen_vfp_negd(vd, vd);
     }
-    fpst = fpstatus_ptr(FPST_FPCR);
+    fpst = fpstatus_ptr(FPST_FPCR_A32);
     gen_helper_vfp_muladdd(vd, vn, vm, vd, fpst);
     vfp_store_reg64(vd, a->vd);
     return true;
@@ -2429,12 +2429,12 @@ static void gen_VSQRT_hp(TCGv_i32 vd, TCGv_i32 vm)
 
 static void gen_VSQRT_sp(TCGv_i32 vd, TCGv_i32 vm)
 {
-    gen_helper_vfp_sqrts(vd, vm, fpstatus_ptr(FPST_FPCR));
+    gen_helper_vfp_sqrts(vd, vm, fpstatus_ptr(FPST_FPCR_A32));
 }
 
 static void gen_VSQRT_dp(TCGv_i64 vd, TCGv_i64 vm)
 {
-    gen_helper_vfp_sqrtd(vd, vm, fpstatus_ptr(FPST_FPCR));
+    gen_helper_vfp_sqrtd(vd, vm, fpstatus_ptr(FPST_FPCR_A32));
 }
 
 DO_VFP_2OP(VSQRT, hp, gen_VSQRT_hp, aa32_fp16_arith)
@@ -2565,7 +2565,7 @@ static bool trans_VCVT_f32_f16(DisasContext *s, arg_VCVT_f32_f16 *a)
         return true;
     }
 
-    fpst = fpstatus_ptr(FPST_FPCR);
+    fpst = fpstatus_ptr(FPST_FPCR_A32);
     ahp_mode = get_ahp_flag();
     tmp = tcg_temp_new_i32();
     /* The T bit tells us if we want the low or high 16 bits of Vm */
@@ -2599,7 +2599,7 @@ static bool trans_VCVT_f64_f16(DisasContext *s, arg_VCVT_f64_f16 *a)
         return true;
     }
 
-    fpst = fpstatus_ptr(FPST_FPCR);
+    fpst = fpstatus_ptr(FPST_FPCR_A32);
     ahp_mode = get_ahp_flag();
     tmp = tcg_temp_new_i32();
     /* The T bit tells us if we want the low or high 16 bits of Vm */
@@ -2623,7 +2623,7 @@ static bool trans_VCVT_b16_f32(DisasContext *s, arg_VCVT_b16_f32 *a)
         return true;
     }
 
-    fpst = fpstatus_ptr(FPST_FPCR);
+    fpst = fpstatus_ptr(FPST_FPCR_A32);
     tmp = tcg_temp_new_i32();
 
     vfp_load_reg32(tmp, a->vm);
@@ -2646,7 +2646,7 @@ static bool trans_VCVT_f16_f32(DisasContext *s, arg_VCVT_f16_f32 *a)
         return true;
     }
 
-    fpst = fpstatus_ptr(FPST_FPCR);
+    fpst = fpstatus_ptr(FPST_FPCR_A32);
     ahp_mode = get_ahp_flag();
     tmp = tcg_temp_new_i32();
 
@@ -2680,7 +2680,7 @@ static bool trans_VCVT_f16_f64(DisasContext *s, arg_VCVT_f16_f64 *a)
         return true;
     }
 
-    fpst = fpstatus_ptr(FPST_FPCR);
+    fpst = fpstatus_ptr(FPST_FPCR_A32);
     ahp_mode = get_ahp_flag();
     tmp = tcg_temp_new_i32();
     vm = tcg_temp_new_i64();
@@ -2727,7 +2727,7 @@ static bool trans_VRINTR_sp(DisasContext *s, arg_VRINTR_sp *a)
 
     tmp = tcg_temp_new_i32();
     vfp_load_reg32(tmp, a->vm);
-    fpst = fpstatus_ptr(FPST_FPCR);
+    fpst = fpstatus_ptr(FPST_FPCR_A32);
     gen_helper_rints(tmp, tmp, fpst);
     vfp_store_reg32(tmp, a->vd);
     return true;
@@ -2757,7 +2757,7 @@ static bool trans_VRINTR_dp(DisasContext *s, arg_VRINTR_dp *a)
 
     tmp = tcg_temp_new_i64();
     vfp_load_reg64(tmp, a->vm);
-    fpst = fpstatus_ptr(FPST_FPCR);
+    fpst = fpstatus_ptr(FPST_FPCR_A32);
     gen_helper_rintd(tmp, tmp, fpst);
     vfp_store_reg64(tmp, a->vd);
     return true;
@@ -2803,7 +2803,7 @@ static bool trans_VRINTZ_sp(DisasContext *s, arg_VRINTZ_sp *a)
 
     tmp = tcg_temp_new_i32();
     vfp_load_reg32(tmp, a->vm);
-    fpst = fpstatus_ptr(FPST_FPCR);
+    fpst = fpstatus_ptr(FPST_FPCR_A32);
     tcg_rmode = gen_set_rmode(FPROUNDING_ZERO, fpst);
     gen_helper_rints(tmp, tmp, fpst);
     gen_restore_rmode(tcg_rmode, fpst);
@@ -2836,7 +2836,7 @@ static bool trans_VRINTZ_dp(DisasContext *s, arg_VRINTZ_dp *a)
 
     tmp = tcg_temp_new_i64();
     vfp_load_reg64(tmp, a->vm);
-    fpst = fpstatus_ptr(FPST_FPCR);
+    fpst = fpstatus_ptr(FPST_FPCR_A32);
     tcg_rmode = gen_set_rmode(FPROUNDING_ZERO, fpst);
     gen_helper_rintd(tmp, tmp, fpst);
     gen_restore_rmode(tcg_rmode, fpst);
@@ -2880,7 +2880,7 @@ static bool trans_VRINTX_sp(DisasContext *s, arg_VRINTX_sp *a)
 
     tmp = tcg_temp_new_i32();
     vfp_load_reg32(tmp, a->vm);
-    fpst = fpstatus_ptr(FPST_FPCR);
+    fpst = fpstatus_ptr(FPST_FPCR_A32);
     gen_helper_rints_exact(tmp, tmp, fpst);
     vfp_store_reg32(tmp, a->vd);
     return true;
@@ -2910,7 +2910,7 @@ static bool trans_VRINTX_dp(DisasContext *s, arg_VRINTX_dp *a)
 
     tmp = tcg_temp_new_i64();
     vfp_load_reg64(tmp, a->vm);
-    fpst = fpstatus_ptr(FPST_FPCR);
+    fpst = fpstatus_ptr(FPST_FPCR_A32);
     gen_helper_rintd_exact(tmp, tmp, fpst);
     vfp_store_reg64(tmp, a->vd);
     return true;
@@ -2937,7 +2937,7 @@ static bool trans_VCVT_sp(DisasContext *s, arg_VCVT_sp *a)
     vm = tcg_temp_new_i32();
     vd = tcg_temp_new_i64();
     vfp_load_reg32(vm, a->vm);
-    gen_helper_vfp_fcvtds(vd, vm, fpstatus_ptr(FPST_FPCR));
+    gen_helper_vfp_fcvtds(vd, vm, fpstatus_ptr(FPST_FPCR_A32));
     vfp_store_reg64(vd, a->vd);
     return true;
 }
@@ -2963,7 +2963,7 @@ static bool trans_VCVT_dp(DisasContext *s, arg_VCVT_dp *a)
     vd = tcg_temp_new_i32();
     vm = tcg_temp_new_i64();
     vfp_load_reg64(vm, a->vm);
-    gen_helper_vfp_fcvtsd(vd, vm, fpstatus_ptr(FPST_FPCR));
+    gen_helper_vfp_fcvtsd(vd, vm, fpstatus_ptr(FPST_FPCR_A32));
     vfp_store_reg32(vd, a->vd);
     return true;
 }
@@ -3010,7 +3010,7 @@ static bool trans_VCVT_int_sp(DisasContext *s, arg_VCVT_int_sp *a)
 
     vm = tcg_temp_new_i32();
     vfp_load_reg32(vm, a->vm);
-    fpst = fpstatus_ptr(FPST_FPCR);
+    fpst = fpstatus_ptr(FPST_FPCR_A32);
     if (a->s) {
         /* i32 -> f32 */
         gen_helper_vfp_sitos(vm, vm, fpst);
@@ -3044,7 +3044,7 @@ static bool trans_VCVT_int_dp(DisasContext *s, arg_VCVT_int_dp *a)
     vm = tcg_temp_new_i32();
     vd = tcg_temp_new_i64();
     vfp_load_reg32(vm, a->vm);
-    fpst = fpstatus_ptr(FPST_FPCR);
+    fpst = fpstatus_ptr(FPST_FPCR_A32);
     if (a->s) {
         /* i32 -> f64 */
         gen_helper_vfp_sitod(vd, vm, fpst);
@@ -3161,7 +3161,7 @@ static bool trans_VCVT_fix_sp(DisasContext *s, arg_VCVT_fix_sp *a)
     vd = tcg_temp_new_i32();
     vfp_load_reg32(vd, a->vd);
 
-    fpst = fpstatus_ptr(FPST_FPCR);
+    fpst = fpstatus_ptr(FPST_FPCR_A32);
     shift = tcg_constant_i32(frac_bits);
 
     /* Switch on op:U:sx bits */
@@ -3223,7 +3223,7 @@ static bool trans_VCVT_fix_dp(DisasContext *s, arg_VCVT_fix_dp *a)
     vd = tcg_temp_new_i64();
     vfp_load_reg64(vd, a->vd);
 
-    fpst = fpstatus_ptr(FPST_FPCR);
+    fpst = fpstatus_ptr(FPST_FPCR_A32);
     shift = tcg_constant_i32(frac_bits);
 
     /* Switch on op:U:sx bits */
@@ -3307,7 +3307,7 @@ static bool trans_VCVT_sp_int(DisasContext *s, arg_VCVT_sp_int *a)
         return true;
     }
 
-    fpst = fpstatus_ptr(FPST_FPCR);
+    fpst = fpstatus_ptr(FPST_FPCR_A32);
     vm = tcg_temp_new_i32();
     vfp_load_reg32(vm, a->vm);
 
@@ -3347,7 +3347,7 @@ static bool trans_VCVT_dp_int(DisasContext *s, arg_VCVT_dp_int *a)
         return true;
     }
 
-    fpst = fpstatus_ptr(FPST_FPCR);
+    fpst = fpstatus_ptr(FPST_FPCR_A32);
     vm = tcg_temp_new_i64();
     vd = tcg_temp_new_i32();
     vfp_load_reg64(vm, a->vm);
-- 
2.34.1




* [PATCH 11/76] target/arm: Use FPST_FPCR_A64 in A64 decoder
  2025-01-24 16:27 [PATCH 00/76] target/arm: Implement FEAT_AFP and FEAT_RPRES Peter Maydell
                   ` (9 preceding siblings ...)
  2025-01-24 16:27 ` [PATCH 10/76] target/arm: Use FPST_FPCR_A32 in A32 decoder Peter Maydell
@ 2025-01-24 16:27 ` Peter Maydell
  2025-01-25 15:19   ` Richard Henderson
  2025-01-24 16:27 ` [PATCH 12/76] target/arm: Remove now-unused vfp.fp_status and FPST_FPCR Peter Maydell
                   ` (66 subsequent siblings)
  77 siblings, 1 reply; 167+ messages in thread
From: Peter Maydell @ 2025-01-24 16:27 UTC (permalink / raw)
  To: qemu-arm, qemu-devel

In the A64 decoder, use FPST_FPCR_A64 rather than FPST_FPCR.  By
doing an automated conversion of the whole file we avoid possibly
using more than one fpst value in a set_rmode/op/restore_rmode
sequence.

Patch created with

  perl -p -i -e 's/FPST_FPCR(?!_)/FPST_FPCR_A64/g' target/arm/tcg/translate-{a64,sve,sme}.c

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/tcg/translate-a64.c |  70 +++++++++++-----------
 target/arm/tcg/translate-sme.c |   4 +-
 target/arm/tcg/translate-sve.c | 106 ++++++++++++++++-----------------
 3 files changed, 90 insertions(+), 90 deletions(-)

diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index bd814849c19..9f10b2b2e6a 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -726,7 +726,7 @@ static void gen_gvec_op3_fpst(DisasContext *s, bool is_q, int rd, int rn,
                               int rm, bool is_fp16, int data,
                               gen_helper_gvec_3_ptr *fn)
 {
-    TCGv_ptr fpst = fpstatus_ptr(is_fp16 ? FPST_FPCR_F16 : FPST_FPCR);
+    TCGv_ptr fpst = fpstatus_ptr(is_fp16 ? FPST_FPCR_F16 : FPST_FPCR_A64);
     tcg_gen_gvec_3_ptr(vec_full_reg_offset(s, rd),
                        vec_full_reg_offset(s, rn),
                        vec_full_reg_offset(s, rm), fpst,
@@ -768,7 +768,7 @@ static void gen_gvec_op4_fpst(DisasContext *s, bool is_q, int rd, int rn,
                               int rm, int ra, bool is_fp16, int data,
                               gen_helper_gvec_4_ptr *fn)
 {
-    TCGv_ptr fpst = fpstatus_ptr(is_fp16 ? FPST_FPCR_F16 : FPST_FPCR);
+    TCGv_ptr fpst = fpstatus_ptr(is_fp16 ? FPST_FPCR_F16 : FPST_FPCR_A64);
     tcg_gen_gvec_4_ptr(vec_full_reg_offset(s, rd),
                        vec_full_reg_offset(s, rn),
                        vec_full_reg_offset(s, rm),
@@ -5043,7 +5043,7 @@ static bool do_fp3_scalar(DisasContext *s, arg_rrr_e *a, const FPScalar *f)
         if (fp_access_check(s)) {
             TCGv_i64 t0 = read_fp_dreg(s, a->rn);
             TCGv_i64 t1 = read_fp_dreg(s, a->rm);
-            f->gen_d(t0, t0, t1, fpstatus_ptr(FPST_FPCR));
+            f->gen_d(t0, t0, t1, fpstatus_ptr(FPST_FPCR_A64));
             write_fp_dreg(s, a->rd, t0);
         }
         break;
@@ -5051,7 +5051,7 @@ static bool do_fp3_scalar(DisasContext *s, arg_rrr_e *a, const FPScalar *f)
         if (fp_access_check(s)) {
             TCGv_i32 t0 = read_fp_sreg(s, a->rn);
             TCGv_i32 t1 = read_fp_sreg(s, a->rm);
-            f->gen_s(t0, t0, t1, fpstatus_ptr(FPST_FPCR));
+            f->gen_s(t0, t0, t1, fpstatus_ptr(FPST_FPCR_A64));
             write_fp_sreg(s, a->rd, t0);
         }
         break;
@@ -5243,9 +5243,9 @@ static bool do_fcmp0_s(DisasContext *s, arg_rr_e *a,
             TCGv_i64 t0 = read_fp_dreg(s, a->rn);
             TCGv_i64 t1 = tcg_constant_i64(0);
             if (swap) {
-                f->gen_d(t0, t1, t0, fpstatus_ptr(FPST_FPCR));
+                f->gen_d(t0, t1, t0, fpstatus_ptr(FPST_FPCR_A64));
             } else {
-                f->gen_d(t0, t0, t1, fpstatus_ptr(FPST_FPCR));
+                f->gen_d(t0, t0, t1, fpstatus_ptr(FPST_FPCR_A64));
             }
             write_fp_dreg(s, a->rd, t0);
         }
@@ -5255,9 +5255,9 @@ static bool do_fcmp0_s(DisasContext *s, arg_rr_e *a,
             TCGv_i32 t0 = read_fp_sreg(s, a->rn);
             TCGv_i32 t1 = tcg_constant_i32(0);
             if (swap) {
-                f->gen_s(t0, t1, t0, fpstatus_ptr(FPST_FPCR));
+                f->gen_s(t0, t1, t0, fpstatus_ptr(FPST_FPCR_A64));
             } else {
-                f->gen_s(t0, t0, t1, fpstatus_ptr(FPST_FPCR));
+                f->gen_s(t0, t0, t1, fpstatus_ptr(FPST_FPCR_A64));
             }
             write_fp_sreg(s, a->rd, t0);
         }
@@ -6207,7 +6207,7 @@ static bool do_fp3_scalar_idx(DisasContext *s, arg_rrx_e *a, const FPScalar *f)
             TCGv_i64 t1 = tcg_temp_new_i64();
 
             read_vec_element(s, t1, a->rm, a->idx, MO_64);
-            f->gen_d(t0, t0, t1, fpstatus_ptr(FPST_FPCR));
+            f->gen_d(t0, t0, t1, fpstatus_ptr(FPST_FPCR_A64));
             write_fp_dreg(s, a->rd, t0);
         }
         break;
@@ -6217,7 +6217,7 @@ static bool do_fp3_scalar_idx(DisasContext *s, arg_rrx_e *a, const FPScalar *f)
             TCGv_i32 t1 = tcg_temp_new_i32();
 
             read_vec_element_i32(s, t1, a->rm, a->idx, MO_32);
-            f->gen_s(t0, t0, t1, fpstatus_ptr(FPST_FPCR));
+            f->gen_s(t0, t0, t1, fpstatus_ptr(FPST_FPCR_A64));
             write_fp_sreg(s, a->rd, t0);
         }
         break;
@@ -6256,7 +6256,7 @@ static bool do_fmla_scalar_idx(DisasContext *s, arg_rrx_e *a, bool neg)
             if (neg) {
                 gen_vfp_negd(t1, t1);
             }
-            gen_helper_vfp_muladdd(t0, t1, t2, t0, fpstatus_ptr(FPST_FPCR));
+            gen_helper_vfp_muladdd(t0, t1, t2, t0, fpstatus_ptr(FPST_FPCR_A64));
             write_fp_dreg(s, a->rd, t0);
         }
         break;
@@ -6270,7 +6270,7 @@ static bool do_fmla_scalar_idx(DisasContext *s, arg_rrx_e *a, bool neg)
             if (neg) {
                 gen_vfp_negs(t1, t1);
             }
-            gen_helper_vfp_muladds(t0, t1, t2, t0, fpstatus_ptr(FPST_FPCR));
+            gen_helper_vfp_muladds(t0, t1, t2, t0, fpstatus_ptr(FPST_FPCR_A64));
             write_fp_sreg(s, a->rd, t0);
         }
         break;
@@ -6601,7 +6601,7 @@ static bool do_fp3_scalar_pair(DisasContext *s, arg_rr_e *a, const FPScalar *f)
 
             read_vec_element(s, t0, a->rn, 0, MO_64);
             read_vec_element(s, t1, a->rn, 1, MO_64);
-            f->gen_d(t0, t0, t1, fpstatus_ptr(FPST_FPCR));
+            f->gen_d(t0, t0, t1, fpstatus_ptr(FPST_FPCR_A64));
             write_fp_dreg(s, a->rd, t0);
         }
         break;
@@ -6612,7 +6612,7 @@ static bool do_fp3_scalar_pair(DisasContext *s, arg_rr_e *a, const FPScalar *f)
 
             read_vec_element_i32(s, t0, a->rn, 0, MO_32);
             read_vec_element_i32(s, t1, a->rn, 1, MO_32);
-            f->gen_s(t0, t0, t1, fpstatus_ptr(FPST_FPCR));
+            f->gen_s(t0, t0, t1, fpstatus_ptr(FPST_FPCR_A64));
             write_fp_sreg(s, a->rd, t0);
         }
         break;
@@ -6762,7 +6762,7 @@ static bool do_fmadd(DisasContext *s, arg_rrrr_e *a, bool neg_a, bool neg_n)
             if (neg_n) {
                 gen_vfp_negd(tn, tn);
             }
-            fpst = fpstatus_ptr(FPST_FPCR);
+            fpst = fpstatus_ptr(FPST_FPCR_A64);
             gen_helper_vfp_muladdd(ta, tn, tm, ta, fpst);
             write_fp_dreg(s, a->rd, ta);
         }
@@ -6780,7 +6780,7 @@ static bool do_fmadd(DisasContext *s, arg_rrrr_e *a, bool neg_a, bool neg_n)
             if (neg_n) {
                 gen_vfp_negs(tn, tn);
             }
-            fpst = fpstatus_ptr(FPST_FPCR);
+            fpst = fpstatus_ptr(FPST_FPCR_A64);
             gen_helper_vfp_muladds(ta, tn, tm, ta, fpst);
             write_fp_sreg(s, a->rd, ta);
         }
@@ -6895,7 +6895,7 @@ static bool do_fp_reduction(DisasContext *s, arg_qrr_e *a,
     if (fp_access_check(s)) {
         MemOp esz = a->esz;
         int elts = (a->q ? 16 : 8) >> esz;
-        TCGv_ptr fpst = fpstatus_ptr(esz == MO_16 ? FPST_FPCR_F16 : FPST_FPCR);
+        TCGv_ptr fpst = fpstatus_ptr(esz == MO_16 ? FPST_FPCR_F16 : FPST_FPCR_A64);
         TCGv_i32 res = do_reduction_op(s, a->rn, esz, 0, elts, fpst, fn);
         write_fp_sreg(s, a->rd, res);
     }
@@ -6939,7 +6939,7 @@ static void handle_fp_compare(DisasContext *s, int size,
                               bool cmp_with_zero, bool signal_all_nans)
 {
     TCGv_i64 tcg_flags = tcg_temp_new_i64();
-    TCGv_ptr fpst = fpstatus_ptr(size == MO_16 ? FPST_FPCR_F16 : FPST_FPCR);
+    TCGv_ptr fpst = fpstatus_ptr(size == MO_16 ? FPST_FPCR_F16 : FPST_FPCR_A64);
 
     if (size == MO_64) {
         TCGv_i64 tcg_vn, tcg_vm;
@@ -8407,7 +8407,7 @@ static bool do_fp1_scalar(DisasContext *s, arg_rr_e *a,
         return check == 0;
     }
 
-    fpst = fpstatus_ptr(a->esz == MO_16 ? FPST_FPCR_F16 : FPST_FPCR);
+    fpst = fpstatus_ptr(a->esz == MO_16 ? FPST_FPCR_F16 : FPST_FPCR_A64);
     if (rmode >= 0) {
         tcg_rmode = gen_set_rmode(rmode, fpst);
     }
@@ -8513,7 +8513,7 @@ static bool trans_FCVT_s_ds(DisasContext *s, arg_rr *a)
     if (fp_access_check(s)) {
         TCGv_i32 tcg_rn = read_fp_sreg(s, a->rn);
         TCGv_i64 tcg_rd = tcg_temp_new_i64();
-        TCGv_ptr fpst = fpstatus_ptr(FPST_FPCR);
+        TCGv_ptr fpst = fpstatus_ptr(FPST_FPCR_A64);
 
         gen_helper_vfp_fcvtds(tcg_rd, tcg_rn, fpst);
         write_fp_dreg(s, a->rd, tcg_rd);
@@ -8526,7 +8526,7 @@ static bool trans_FCVT_s_hs(DisasContext *s, arg_rr *a)
     if (fp_access_check(s)) {
         TCGv_i32 tmp = read_fp_sreg(s, a->rn);
         TCGv_i32 ahp = get_ahp_flag();
-        TCGv_ptr fpst = fpstatus_ptr(FPST_FPCR);
+        TCGv_ptr fpst = fpstatus_ptr(FPST_FPCR_A64);
 
         gen_helper_vfp_fcvt_f32_to_f16(tmp, tmp, fpst, ahp);
         /* write_fp_sreg is OK here because top half of result is zero */
@@ -8540,7 +8540,7 @@ static bool trans_FCVT_s_sd(DisasContext *s, arg_rr *a)
     if (fp_access_check(s)) {
         TCGv_i64 tcg_rn = read_fp_dreg(s, a->rn);
         TCGv_i32 tcg_rd = tcg_temp_new_i32();
-        TCGv_ptr fpst = fpstatus_ptr(FPST_FPCR);
+        TCGv_ptr fpst = fpstatus_ptr(FPST_FPCR_A64);
 
         gen_helper_vfp_fcvtsd(tcg_rd, tcg_rn, fpst);
         write_fp_sreg(s, a->rd, tcg_rd);
@@ -8554,7 +8554,7 @@ static bool trans_FCVT_s_hd(DisasContext *s, arg_rr *a)
         TCGv_i64 tcg_rn = read_fp_dreg(s, a->rn);
         TCGv_i32 tcg_rd = tcg_temp_new_i32();
         TCGv_i32 ahp = get_ahp_flag();
-        TCGv_ptr fpst = fpstatus_ptr(FPST_FPCR);
+        TCGv_ptr fpst = fpstatus_ptr(FPST_FPCR_A64);
 
         gen_helper_vfp_fcvt_f64_to_f16(tcg_rd, tcg_rn, fpst, ahp);
         /* write_fp_sreg is OK here because top half of tcg_rd is zero */
@@ -8568,7 +8568,7 @@ static bool trans_FCVT_s_sh(DisasContext *s, arg_rr *a)
     if (fp_access_check(s)) {
         TCGv_i32 tcg_rn = read_fp_hreg(s, a->rn);
         TCGv_i32 tcg_rd = tcg_temp_new_i32();
-        TCGv_ptr tcg_fpst = fpstatus_ptr(FPST_FPCR);
+        TCGv_ptr tcg_fpst = fpstatus_ptr(FPST_FPCR_A64);
         TCGv_i32 tcg_ahp = get_ahp_flag();
 
         gen_helper_vfp_fcvt_f16_to_f32(tcg_rd, tcg_rn, tcg_fpst, tcg_ahp);
@@ -8582,7 +8582,7 @@ static bool trans_FCVT_s_dh(DisasContext *s, arg_rr *a)
     if (fp_access_check(s)) {
         TCGv_i32 tcg_rn = read_fp_hreg(s, a->rn);
         TCGv_i64 tcg_rd = tcg_temp_new_i64();
-        TCGv_ptr tcg_fpst = fpstatus_ptr(FPST_FPCR);
+        TCGv_ptr tcg_fpst = fpstatus_ptr(FPST_FPCR_A64);
         TCGv_i32 tcg_ahp = get_ahp_flag();
 
         gen_helper_vfp_fcvt_f16_to_f64(tcg_rd, tcg_rn, tcg_fpst, tcg_ahp);
@@ -8598,7 +8598,7 @@ static bool do_cvtf_scalar(DisasContext *s, MemOp esz, int rd, int shift,
     TCGv_i32 tcg_shift, tcg_single;
     TCGv_i64 tcg_double;
 
-    tcg_fpstatus = fpstatus_ptr(esz == MO_16 ? FPST_FPCR_F16 : FPST_FPCR);
+    tcg_fpstatus = fpstatus_ptr(esz == MO_16 ? FPST_FPCR_F16 : FPST_FPCR_A64);
     tcg_shift = tcg_constant_i32(shift);
 
     switch (esz) {
@@ -8693,7 +8693,7 @@ static void do_fcvt_scalar(DisasContext *s, MemOp out, MemOp esz,
     TCGv_ptr tcg_fpstatus;
     TCGv_i32 tcg_shift, tcg_rmode, tcg_single;
 
-    tcg_fpstatus = fpstatus_ptr(esz == MO_16 ? FPST_FPCR_F16 : FPST_FPCR);
+    tcg_fpstatus = fpstatus_ptr(esz == MO_16 ? FPST_FPCR_F16 : FPST_FPCR_A64);
     tcg_shift = tcg_constant_i32(shift);
     tcg_rmode = gen_set_rmode(rmode, tcg_fpstatus);
 
@@ -8857,7 +8857,7 @@ static bool trans_FJCVTZS(DisasContext *s, arg_FJCVTZS *a)
     }
     if (fp_access_check(s)) {
         TCGv_i64 t = read_fp_dreg(s, a->rn);
-        TCGv_ptr fpstatus = fpstatus_ptr(FPST_FPCR);
+        TCGv_ptr fpstatus = fpstatus_ptr(FPST_FPCR_A64);
 
         gen_helper_fjcvtzs(t, t, fpstatus);
 
@@ -9115,7 +9115,7 @@ static void gen_fcvtxn_sd(TCGv_i64 d, TCGv_i64 n)
      * with von Neumann rounding (round to odd)
      */
     TCGv_i32 tmp = tcg_temp_new_i32();
-    gen_helper_fcvtx_f64_to_f32(tmp, n, fpstatus_ptr(FPST_FPCR));
+    gen_helper_fcvtx_f64_to_f32(tmp, n, fpstatus_ptr(FPST_FPCR_A64));
     tcg_gen_extu_i32_i64(d, tmp);
 }
 
@@ -9208,7 +9208,7 @@ static void gen_fcvtn_hs(TCGv_i64 d, TCGv_i64 n)
 {
     TCGv_i32 tcg_lo = tcg_temp_new_i32();
     TCGv_i32 tcg_hi = tcg_temp_new_i32();
-    TCGv_ptr fpst = fpstatus_ptr(FPST_FPCR);
+    TCGv_ptr fpst = fpstatus_ptr(FPST_FPCR_A64);
     TCGv_i32 ahp = get_ahp_flag();
 
     tcg_gen_extr_i64_i32(tcg_lo, tcg_hi, n);
@@ -9221,7 +9221,7 @@ static void gen_fcvtn_hs(TCGv_i64 d, TCGv_i64 n)
 static void gen_fcvtn_sd(TCGv_i64 d, TCGv_i64 n)
 {
     TCGv_i32 tmp = tcg_temp_new_i32();
-    TCGv_ptr fpst = fpstatus_ptr(FPST_FPCR);
+    TCGv_ptr fpst = fpstatus_ptr(FPST_FPCR_A64);
 
     gen_helper_vfp_fcvtsd(tmp, n, fpst);
     tcg_gen_extu_i32_i64(d, tmp);
@@ -9237,7 +9237,7 @@ TRANS(FCVTXN_v, do_2misc_narrow_vector, a, f_scalar_fcvtxn)
 
 static void gen_bfcvtn_hs(TCGv_i64 d, TCGv_i64 n)
 {
-    TCGv_ptr fpst = fpstatus_ptr(FPST_FPCR);
+    TCGv_ptr fpst = fpstatus_ptr(FPST_FPCR_A64);
     TCGv_i32 tmp = tcg_temp_new_i32();
     gen_helper_bfcvt_pair(tmp, n, fpst);
     tcg_gen_extu_i32_i64(d, tmp);
@@ -9312,7 +9312,7 @@ static bool do_fp1_vector(DisasContext *s, arg_qrr_e *a,
         return check == 0;
     }
 
-    fpst = fpstatus_ptr(a->esz == MO_16 ? FPST_FPCR_F16 : FPST_FPCR);
+    fpst = fpstatus_ptr(a->esz == MO_16 ? FPST_FPCR_F16 : FPST_FPCR_A64);
     if (rmode >= 0) {
         tcg_rmode = gen_set_rmode(rmode, fpst);
     }
@@ -9372,7 +9372,7 @@ static bool do_gvec_op2_fpst(DisasContext *s, MemOp esz, bool is_q,
         return check == 0;
     }
 
-    fpst = fpstatus_ptr(esz == MO_16 ? FPST_FPCR_F16 : FPST_FPCR);
+    fpst = fpstatus_ptr(esz == MO_16 ? FPST_FPCR_F16 : FPST_FPCR_A64);
     tcg_gen_gvec_2_ptr(vec_full_reg_offset(s, rd),
                        vec_full_reg_offset(s, rn), fpst,
                        is_q ? 16 : 8, vec_full_reg_size(s),
@@ -9511,7 +9511,7 @@ static bool trans_FCVTL_v(DisasContext *s, arg_qrr_e *a)
         return true;
     }
 
-    fpst = fpstatus_ptr(FPST_FPCR);
+    fpst = fpstatus_ptr(FPST_FPCR_A64);
     if (a->esz == MO_64) {
         /* 32 -> 64 bit fp conversion */
         TCGv_i64 tcg_res[2];
diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c
index 01ece570164..29bec7dd7bb 100644
--- a/target/arm/tcg/translate-sme.c
+++ b/target/arm/tcg/translate-sme.c
@@ -358,9 +358,9 @@ static bool do_outprod_env(DisasContext *s, arg_op *a, MemOp esz,
 TRANS_FEAT(FMOPA_h, aa64_sme, do_outprod_env, a,
            MO_32, gen_helper_sme_fmopa_h)
 TRANS_FEAT(FMOPA_s, aa64_sme, do_outprod_fpst, a,
-           MO_32, FPST_FPCR, gen_helper_sme_fmopa_s)
+           MO_32, FPST_FPCR_A64, gen_helper_sme_fmopa_s)
 TRANS_FEAT(FMOPA_d, aa64_sme_f64f64, do_outprod_fpst, a,
-           MO_64, FPST_FPCR, gen_helper_sme_fmopa_d)
+           MO_64, FPST_FPCR_A64, gen_helper_sme_fmopa_d)
 
 TRANS_FEAT(BFMOPA, aa64_sme, do_outprod_env, a, MO_32, gen_helper_sme_bfmopa)
 
diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
index e3031965920..caf8ea18328 100644
--- a/target/arm/tcg/translate-sve.c
+++ b/target/arm/tcg/translate-sve.c
@@ -141,7 +141,7 @@ static bool gen_gvec_fpst_arg_zz(DisasContext *s, gen_helper_gvec_2_ptr *fn,
                                  arg_rr_esz *a, int data)
 {
     return gen_gvec_fpst_zz(s, fn, a->rd, a->rn, data,
-                            a->esz == MO_16 ? FPST_FPCR_F16 : FPST_FPCR);
+                            a->esz == MO_16 ? FPST_FPCR_F16 : FPST_FPCR_A64);
 }
 
 /* Invoke an out-of-line helper on 3 Zregs. */
@@ -191,7 +191,7 @@ static bool gen_gvec_fpst_arg_zzz(DisasContext *s, gen_helper_gvec_3_ptr *fn,
                                   arg_rrr_esz *a, int data)
 {
     return gen_gvec_fpst_zzz(s, fn, a->rd, a->rn, a->rm, data,
-                             a->esz == MO_16 ? FPST_FPCR_F16 : FPST_FPCR);
+                             a->esz == MO_16 ? FPST_FPCR_F16 : FPST_FPCR_A64);
 }
 
 /* Invoke an out-of-line helper on 4 Zregs. */
@@ -397,7 +397,7 @@ static bool gen_gvec_fpst_arg_zpzz(DisasContext *s, gen_helper_gvec_4_ptr *fn,
                                    arg_rprr_esz *a)
 {
     return gen_gvec_fpst_zzzp(s, fn, a->rd, a->rn, a->rm, a->pg, 0,
-                              a->esz == MO_16 ? FPST_FPCR_F16 : FPST_FPCR);
+                              a->esz == MO_16 ? FPST_FPCR_F16 : FPST_FPCR_A64);
 }
 
 /* Invoke a vector expander on two Zregs and an immediate.  */
@@ -3517,7 +3517,7 @@ static bool do_FMLA_zzxz(DisasContext *s, arg_rrxr_esz *a, bool sub)
     };
     return gen_gvec_fpst_zzzz(s, fns[a->esz], a->rd, a->rn, a->rm, a->ra,
                               (a->index << 1) | sub,
-                              a->esz == MO_16 ? FPST_FPCR_F16 : FPST_FPCR);
+                              a->esz == MO_16 ? FPST_FPCR_F16 : FPST_FPCR_A64);
 }
 
 TRANS_FEAT(FMLA_zzxz, aa64_sve, do_FMLA_zzxz, a, false)
@@ -3533,7 +3533,7 @@ static gen_helper_gvec_3_ptr * const fmul_idx_fns[4] = {
 };
 TRANS_FEAT(FMUL_zzx, aa64_sve, gen_gvec_fpst_zzz,
            fmul_idx_fns[a->esz], a->rd, a->rn, a->rm, a->index,
-           a->esz == MO_16 ? FPST_FPCR_F16 : FPST_FPCR)
+           a->esz == MO_16 ? FPST_FPCR_F16 : FPST_FPCR_A64)
 
 /*
  *** SVE Floating Point Fast Reduction Group
@@ -3566,7 +3566,7 @@ static bool do_reduce(DisasContext *s, arg_rpr_esz *a,
 
     tcg_gen_addi_ptr(t_zn, tcg_env, vec_full_reg_offset(s, a->rn));
     tcg_gen_addi_ptr(t_pg, tcg_env, pred_full_reg_offset(s, a->pg));
-    status = fpstatus_ptr(a->esz == MO_16 ? FPST_FPCR_F16 : FPST_FPCR);
+    status = fpstatus_ptr(a->esz == MO_16 ? FPST_FPCR_F16 : FPST_FPCR_A64);
 
     fn(temp, t_zn, t_pg, status, t_desc);
 
@@ -3618,7 +3618,7 @@ static bool do_ppz_fp(DisasContext *s, arg_rpr_esz *a,
     if (sve_access_check(s)) {
         unsigned vsz = vec_full_reg_size(s);
         TCGv_ptr status =
-            fpstatus_ptr(a->esz == MO_16 ? FPST_FPCR_F16 : FPST_FPCR);
+            fpstatus_ptr(a->esz == MO_16 ? FPST_FPCR_F16 : FPST_FPCR_A64);
 
         tcg_gen_gvec_3_ptr(pred_full_reg_offset(s, a->rd),
                            vec_full_reg_offset(s, a->rn),
@@ -3654,7 +3654,7 @@ static gen_helper_gvec_3_ptr * const ftmad_fns[4] = {
 };
 TRANS_FEAT_NONSTREAMING(FTMAD, aa64_sve, gen_gvec_fpst_zzz,
                         ftmad_fns[a->esz], a->rd, a->rn, a->rm, a->imm,
-                        a->esz == MO_16 ? FPST_FPCR_F16 : FPST_FPCR)
+                        a->esz == MO_16 ? FPST_FPCR_F16 : FPST_FPCR_A64)
 
 /*
  *** SVE Floating Point Accumulating Reduction Group
@@ -3687,7 +3687,7 @@ static bool trans_FADDA(DisasContext *s, arg_rprr_esz *a)
     t_pg = tcg_temp_new_ptr();
     tcg_gen_addi_ptr(t_rm, tcg_env, vec_full_reg_offset(s, a->rm));
     tcg_gen_addi_ptr(t_pg, tcg_env, pred_full_reg_offset(s, a->pg));
-    t_fpst = fpstatus_ptr(a->esz == MO_16 ? FPST_FPCR_F16 : FPST_FPCR);
+    t_fpst = fpstatus_ptr(a->esz == MO_16 ? FPST_FPCR_F16 : FPST_FPCR_A64);
     t_desc = tcg_constant_i32(simd_desc(vsz, vsz, 0));
 
     fns[a->esz - 1](t_val, t_val, t_rm, t_pg, t_fpst, t_desc);
@@ -3762,7 +3762,7 @@ static void do_fp_scalar(DisasContext *s, int zd, int zn, int pg, bool is_fp16,
     tcg_gen_addi_ptr(t_zn, tcg_env, vec_full_reg_offset(s, zn));
     tcg_gen_addi_ptr(t_pg, tcg_env, pred_full_reg_offset(s, pg));
 
-    status = fpstatus_ptr(is_fp16 ? FPST_FPCR_F16 : FPST_FPCR);
+    status = fpstatus_ptr(is_fp16 ? FPST_FPCR_F16 : FPST_FPCR_A64);
     desc = tcg_constant_i32(simd_desc(vsz, vsz, 0));
     fn(t_zd, t_zn, t_pg, scalar, status, desc);
 }
@@ -3814,7 +3814,7 @@ static bool do_fp_cmp(DisasContext *s, arg_rprr_esz *a,
     }
     if (sve_access_check(s)) {
         unsigned vsz = vec_full_reg_size(s);
-        TCGv_ptr status = fpstatus_ptr(a->esz == MO_16 ? FPST_FPCR_F16 : FPST_FPCR);
+        TCGv_ptr status = fpstatus_ptr(a->esz == MO_16 ? FPST_FPCR_F16 : FPST_FPCR_A64);
         tcg_gen_gvec_4_ptr(pred_full_reg_offset(s, a->rd),
                            vec_full_reg_offset(s, a->rn),
                            vec_full_reg_offset(s, a->rm),
@@ -3847,7 +3847,7 @@ static gen_helper_gvec_4_ptr * const fcadd_fns[] = {
 };
 TRANS_FEAT(FCADD, aa64_sve, gen_gvec_fpst_zzzp, fcadd_fns[a->esz],
            a->rd, a->rn, a->rm, a->pg, a->rot,
-           a->esz == MO_16 ? FPST_FPCR_F16 : FPST_FPCR)
+           a->esz == MO_16 ? FPST_FPCR_F16 : FPST_FPCR_A64)
 
 #define DO_FMLA(NAME, name) \
     static gen_helper_gvec_5_ptr * const name##_fns[4] = {              \
@@ -3856,7 +3856,7 @@ TRANS_FEAT(FCADD, aa64_sve, gen_gvec_fpst_zzzp, fcadd_fns[a->esz],
     };                                                                  \
     TRANS_FEAT(NAME, aa64_sve, gen_gvec_fpst_zzzzp, name##_fns[a->esz], \
                a->rd, a->rn, a->rm, a->ra, a->pg, 0,                    \
-               a->esz == MO_16 ? FPST_FPCR_F16 : FPST_FPCR)
+               a->esz == MO_16 ? FPST_FPCR_F16 : FPST_FPCR_A64)
 
 DO_FMLA(FMLA_zpzzz, fmla_zpzzz)
 DO_FMLA(FMLS_zpzzz, fmls_zpzzz)
@@ -3871,35 +3871,35 @@ static gen_helper_gvec_5_ptr * const fcmla_fns[4] = {
 };
 TRANS_FEAT(FCMLA_zpzzz, aa64_sve, gen_gvec_fpst_zzzzp, fcmla_fns[a->esz],
            a->rd, a->rn, a->rm, a->ra, a->pg, a->rot,
-           a->esz == MO_16 ? FPST_FPCR_F16 : FPST_FPCR)
+           a->esz == MO_16 ? FPST_FPCR_F16 : FPST_FPCR_A64)
 
 static gen_helper_gvec_4_ptr * const fcmla_idx_fns[4] = {
     NULL, gen_helper_gvec_fcmlah_idx, gen_helper_gvec_fcmlas_idx, NULL
 };
 TRANS_FEAT(FCMLA_zzxz, aa64_sve, gen_gvec_fpst_zzzz, fcmla_idx_fns[a->esz],
            a->rd, a->rn, a->rm, a->ra, a->index * 4 + a->rot,
-           a->esz == MO_16 ? FPST_FPCR_F16 : FPST_FPCR)
+           a->esz == MO_16 ? FPST_FPCR_F16 : FPST_FPCR_A64)
 
 /*
  *** SVE Floating Point Unary Operations Predicated Group
  */
 
 TRANS_FEAT(FCVT_sh, aa64_sve, gen_gvec_fpst_arg_zpz,
-           gen_helper_sve_fcvt_sh, a, 0, FPST_FPCR)
+           gen_helper_sve_fcvt_sh, a, 0, FPST_FPCR_A64)
 TRANS_FEAT(FCVT_hs, aa64_sve, gen_gvec_fpst_arg_zpz,
-           gen_helper_sve_fcvt_hs, a, 0, FPST_FPCR)
+           gen_helper_sve_fcvt_hs, a, 0, FPST_FPCR_A64)
 
 TRANS_FEAT(BFCVT, aa64_sve_bf16, gen_gvec_fpst_arg_zpz,
-           gen_helper_sve_bfcvt, a, 0, FPST_FPCR)
+           gen_helper_sve_bfcvt, a, 0, FPST_FPCR_A64)
 
 TRANS_FEAT(FCVT_dh, aa64_sve, gen_gvec_fpst_arg_zpz,
-           gen_helper_sve_fcvt_dh, a, 0, FPST_FPCR)
+           gen_helper_sve_fcvt_dh, a, 0, FPST_FPCR_A64)
 TRANS_FEAT(FCVT_hd, aa64_sve, gen_gvec_fpst_arg_zpz,
-           gen_helper_sve_fcvt_hd, a, 0, FPST_FPCR)
+           gen_helper_sve_fcvt_hd, a, 0, FPST_FPCR_A64)
 TRANS_FEAT(FCVT_ds, aa64_sve, gen_gvec_fpst_arg_zpz,
-           gen_helper_sve_fcvt_ds, a, 0, FPST_FPCR)
+           gen_helper_sve_fcvt_ds, a, 0, FPST_FPCR_A64)
 TRANS_FEAT(FCVT_sd, aa64_sve, gen_gvec_fpst_arg_zpz,
-           gen_helper_sve_fcvt_sd, a, 0, FPST_FPCR)
+           gen_helper_sve_fcvt_sd, a, 0, FPST_FPCR_A64)
 
 TRANS_FEAT(FCVTZS_hh, aa64_sve, gen_gvec_fpst_arg_zpz,
            gen_helper_sve_fcvtzs_hh, a, 0, FPST_FPCR_F16)
@@ -3915,22 +3915,22 @@ TRANS_FEAT(FCVTZU_hd, aa64_sve, gen_gvec_fpst_arg_zpz,
            gen_helper_sve_fcvtzu_hd, a, 0, FPST_FPCR_F16)
 
 TRANS_FEAT(FCVTZS_ss, aa64_sve, gen_gvec_fpst_arg_zpz,
-           gen_helper_sve_fcvtzs_ss, a, 0, FPST_FPCR)
+           gen_helper_sve_fcvtzs_ss, a, 0, FPST_FPCR_A64)
 TRANS_FEAT(FCVTZU_ss, aa64_sve, gen_gvec_fpst_arg_zpz,
-           gen_helper_sve_fcvtzu_ss, a, 0, FPST_FPCR)
+           gen_helper_sve_fcvtzu_ss, a, 0, FPST_FPCR_A64)
 TRANS_FEAT(FCVTZS_sd, aa64_sve, gen_gvec_fpst_arg_zpz,
-           gen_helper_sve_fcvtzs_sd, a, 0, FPST_FPCR)
+           gen_helper_sve_fcvtzs_sd, a, 0, FPST_FPCR_A64)
 TRANS_FEAT(FCVTZU_sd, aa64_sve, gen_gvec_fpst_arg_zpz,
-           gen_helper_sve_fcvtzu_sd, a, 0, FPST_FPCR)
+           gen_helper_sve_fcvtzu_sd, a, 0, FPST_FPCR_A64)
 TRANS_FEAT(FCVTZS_ds, aa64_sve, gen_gvec_fpst_arg_zpz,
-           gen_helper_sve_fcvtzs_ds, a, 0, FPST_FPCR)
+           gen_helper_sve_fcvtzs_ds, a, 0, FPST_FPCR_A64)
 TRANS_FEAT(FCVTZU_ds, aa64_sve, gen_gvec_fpst_arg_zpz,
-           gen_helper_sve_fcvtzu_ds, a, 0, FPST_FPCR)
+           gen_helper_sve_fcvtzu_ds, a, 0, FPST_FPCR_A64)
 
 TRANS_FEAT(FCVTZS_dd, aa64_sve, gen_gvec_fpst_arg_zpz,
-           gen_helper_sve_fcvtzs_dd, a, 0, FPST_FPCR)
+           gen_helper_sve_fcvtzs_dd, a, 0, FPST_FPCR_A64)
 TRANS_FEAT(FCVTZU_dd, aa64_sve, gen_gvec_fpst_arg_zpz,
-           gen_helper_sve_fcvtzu_dd, a, 0, FPST_FPCR)
+           gen_helper_sve_fcvtzu_dd, a, 0, FPST_FPCR_A64)
 
 static gen_helper_gvec_3_ptr * const frint_fns[] = {
     NULL,
@@ -3939,7 +3939,7 @@ static gen_helper_gvec_3_ptr * const frint_fns[] = {
     gen_helper_sve_frint_d
 };
 TRANS_FEAT(FRINTI, aa64_sve, gen_gvec_fpst_arg_zpz, frint_fns[a->esz],
-           a, 0, a->esz == MO_16 ? FPST_FPCR_F16 : FPST_FPCR)
+           a, 0, a->esz == MO_16 ? FPST_FPCR_F16 : FPST_FPCR_A64)
 
 static gen_helper_gvec_3_ptr * const frintx_fns[] = {
     NULL,
@@ -3948,7 +3948,7 @@ static gen_helper_gvec_3_ptr * const frintx_fns[] = {
     gen_helper_sve_frintx_d
 };
 TRANS_FEAT(FRINTX, aa64_sve, gen_gvec_fpst_arg_zpz, frintx_fns[a->esz],
-           a, 0, a->esz == MO_16 ? FPST_FPCR_F16 : FPST_FPCR);
+           a, 0, a->esz == MO_16 ? FPST_FPCR_F16 : FPST_FPCR_A64);
 
 static bool do_frint_mode(DisasContext *s, arg_rpr_esz *a,
                           ARMFPRounding mode, gen_helper_gvec_3_ptr *fn)
@@ -3965,7 +3965,7 @@ static bool do_frint_mode(DisasContext *s, arg_rpr_esz *a,
     }
 
     vsz = vec_full_reg_size(s);
-    status = fpstatus_ptr(a->esz == MO_16 ? FPST_FPCR_F16 : FPST_FPCR);
+    status = fpstatus_ptr(a->esz == MO_16 ? FPST_FPCR_F16 : FPST_FPCR_A64);
     tmode = gen_set_rmode(mode, status);
 
     tcg_gen_gvec_3_ptr(vec_full_reg_offset(s, a->rd),
@@ -3993,14 +3993,14 @@ static gen_helper_gvec_3_ptr * const frecpx_fns[] = {
     gen_helper_sve_frecpx_s, gen_helper_sve_frecpx_d,
 };
 TRANS_FEAT(FRECPX, aa64_sve, gen_gvec_fpst_arg_zpz, frecpx_fns[a->esz],
-           a, 0, a->esz == MO_16 ? FPST_FPCR_F16 : FPST_FPCR)
+           a, 0, a->esz == MO_16 ? FPST_FPCR_F16 : FPST_FPCR_A64)
 
 static gen_helper_gvec_3_ptr * const fsqrt_fns[] = {
     NULL,                   gen_helper_sve_fsqrt_h,
     gen_helper_sve_fsqrt_s, gen_helper_sve_fsqrt_d,
 };
 TRANS_FEAT(FSQRT, aa64_sve, gen_gvec_fpst_arg_zpz, fsqrt_fns[a->esz],
-           a, 0, a->esz == MO_16 ? FPST_FPCR_F16 : FPST_FPCR)
+           a, 0, a->esz == MO_16 ? FPST_FPCR_F16 : FPST_FPCR_A64)
 
 TRANS_FEAT(SCVTF_hh, aa64_sve, gen_gvec_fpst_arg_zpz,
            gen_helper_sve_scvt_hh, a, 0, FPST_FPCR_F16)
@@ -4010,14 +4010,14 @@ TRANS_FEAT(SCVTF_dh, aa64_sve, gen_gvec_fpst_arg_zpz,
            gen_helper_sve_scvt_dh, a, 0, FPST_FPCR_F16)
 
 TRANS_FEAT(SCVTF_ss, aa64_sve, gen_gvec_fpst_arg_zpz,
-           gen_helper_sve_scvt_ss, a, 0, FPST_FPCR)
+           gen_helper_sve_scvt_ss, a, 0, FPST_FPCR_A64)
 TRANS_FEAT(SCVTF_ds, aa64_sve, gen_gvec_fpst_arg_zpz,
-           gen_helper_sve_scvt_ds, a, 0, FPST_FPCR)
+           gen_helper_sve_scvt_ds, a, 0, FPST_FPCR_A64)
 
 TRANS_FEAT(SCVTF_sd, aa64_sve, gen_gvec_fpst_arg_zpz,
-           gen_helper_sve_scvt_sd, a, 0, FPST_FPCR)
+           gen_helper_sve_scvt_sd, a, 0, FPST_FPCR_A64)
 TRANS_FEAT(SCVTF_dd, aa64_sve, gen_gvec_fpst_arg_zpz,
-           gen_helper_sve_scvt_dd, a, 0, FPST_FPCR)
+           gen_helper_sve_scvt_dd, a, 0, FPST_FPCR_A64)
 
 TRANS_FEAT(UCVTF_hh, aa64_sve, gen_gvec_fpst_arg_zpz,
            gen_helper_sve_ucvt_hh, a, 0, FPST_FPCR_F16)
@@ -4027,14 +4027,14 @@ TRANS_FEAT(UCVTF_dh, aa64_sve, gen_gvec_fpst_arg_zpz,
            gen_helper_sve_ucvt_dh, a, 0, FPST_FPCR_F16)
 
 TRANS_FEAT(UCVTF_ss, aa64_sve, gen_gvec_fpst_arg_zpz,
-           gen_helper_sve_ucvt_ss, a, 0, FPST_FPCR)
+           gen_helper_sve_ucvt_ss, a, 0, FPST_FPCR_A64)
 TRANS_FEAT(UCVTF_ds, aa64_sve, gen_gvec_fpst_arg_zpz,
-           gen_helper_sve_ucvt_ds, a, 0, FPST_FPCR)
+           gen_helper_sve_ucvt_ds, a, 0, FPST_FPCR_A64)
 TRANS_FEAT(UCVTF_sd, aa64_sve, gen_gvec_fpst_arg_zpz,
-           gen_helper_sve_ucvt_sd, a, 0, FPST_FPCR)
+           gen_helper_sve_ucvt_sd, a, 0, FPST_FPCR_A64)
 
 TRANS_FEAT(UCVTF_dd, aa64_sve, gen_gvec_fpst_arg_zpz,
-           gen_helper_sve_ucvt_dd, a, 0, FPST_FPCR)
+           gen_helper_sve_ucvt_dd, a, 0, FPST_FPCR_A64)
 
 /*
  *** SVE Memory - 32-bit Gather and Unsized Contiguous Group
@@ -6916,10 +6916,10 @@ DO_ZPZZ_FP(FMINP, aa64_sve2, sve2_fminp_zpzz)
 
 TRANS_FEAT_NONSTREAMING(FMMLA_s, aa64_sve_f32mm, gen_gvec_fpst_zzzz,
                         gen_helper_fmmla_s, a->rd, a->rn, a->rm, a->ra,
-                        0, FPST_FPCR)
+                        0, FPST_FPCR_A64)
 TRANS_FEAT_NONSTREAMING(FMMLA_d, aa64_sve_f64mm, gen_gvec_fpst_zzzz,
                         gen_helper_fmmla_d, a->rd, a->rn, a->rm, a->ra,
-                        0, FPST_FPCR)
+                        0, FPST_FPCR_A64)
 
 static gen_helper_gvec_4 * const sqdmlal_zzzw_fns[] = {
     NULL,                           gen_helper_sve2_sqdmlal_zzzw_h,
@@ -7035,17 +7035,17 @@ TRANS_FEAT_NONSTREAMING(RAX1, aa64_sve2_sha3, gen_gvec_fn_arg_zzz,
                         gen_gvec_rax1, a)
 
 TRANS_FEAT(FCVTNT_sh, aa64_sve2, gen_gvec_fpst_arg_zpz,
-           gen_helper_sve2_fcvtnt_sh, a, 0, FPST_FPCR)
+           gen_helper_sve2_fcvtnt_sh, a, 0, FPST_FPCR_A64)
 TRANS_FEAT(FCVTNT_ds, aa64_sve2, gen_gvec_fpst_arg_zpz,
-           gen_helper_sve2_fcvtnt_ds, a, 0, FPST_FPCR)
+           gen_helper_sve2_fcvtnt_ds, a, 0, FPST_FPCR_A64)
 
 TRANS_FEAT(BFCVTNT, aa64_sve_bf16, gen_gvec_fpst_arg_zpz,
-           gen_helper_sve_bfcvtnt, a, 0, FPST_FPCR)
+           gen_helper_sve_bfcvtnt, a, 0, FPST_FPCR_A64)
 
 TRANS_FEAT(FCVTLT_hs, aa64_sve2, gen_gvec_fpst_arg_zpz,
-           gen_helper_sve2_fcvtlt_hs, a, 0, FPST_FPCR)
+           gen_helper_sve2_fcvtlt_hs, a, 0, FPST_FPCR_A64)
 TRANS_FEAT(FCVTLT_sd, aa64_sve2, gen_gvec_fpst_arg_zpz,
-           gen_helper_sve2_fcvtlt_sd, a, 0, FPST_FPCR)
+           gen_helper_sve2_fcvtlt_sd, a, 0, FPST_FPCR_A64)
 
 TRANS_FEAT(FCVTX_ds, aa64_sve2, do_frint_mode, a,
            FPROUNDING_ODD, gen_helper_sve_fcvt_ds)
@@ -7057,7 +7057,7 @@ static gen_helper_gvec_3_ptr * const flogb_fns[] = {
     gen_helper_flogb_s, gen_helper_flogb_d
 };
 TRANS_FEAT(FLOGB, aa64_sve2, gen_gvec_fpst_arg_zpz, flogb_fns[a->esz],
-           a, 0, a->esz == MO_16 ? FPST_FPCR_F16 : FPST_FPCR)
+           a, 0, a->esz == MO_16 ? FPST_FPCR_F16 : FPST_FPCR_A64)
 
 static bool do_FMLAL_zzzw(DisasContext *s, arg_rrrr_esz *a, bool sub, bool sel)
 {
@@ -7101,7 +7101,7 @@ TRANS_FEAT_NONSTREAMING(BFMMLA, aa64_sve_bf16, gen_gvec_env_arg_zzzz,
 static bool do_BFMLAL_zzzw(DisasContext *s, arg_rrrr_esz *a, bool sel)
 {
     return gen_gvec_fpst_zzzz(s, gen_helper_gvec_bfmlal,
-                              a->rd, a->rn, a->rm, a->ra, sel, FPST_FPCR);
+                              a->rd, a->rn, a->rm, a->ra, sel, FPST_FPCR_A64);
 }
 
 TRANS_FEAT(BFMLALB_zzzw, aa64_sve_bf16, do_BFMLAL_zzzw, a, false)
@@ -7111,7 +7111,7 @@ static bool do_BFMLAL_zzxw(DisasContext *s, arg_rrxr_esz *a, bool sel)
 {
     return gen_gvec_fpst_zzzz(s, gen_helper_gvec_bfmlal_idx,
                               a->rd, a->rn, a->rm, a->ra,
-                              (a->index << 1) | sel, FPST_FPCR);
+                              (a->index << 1) | sel, FPST_FPCR_A64);
 }
 
 TRANS_FEAT(BFMLALB_zzxw, aa64_sve_bf16, do_BFMLAL_zzxw, a, false)
-- 
2.34.1




* [PATCH 12/76] target/arm: Remove now-unused vfp.fp_status and FPST_FPCR
  2025-01-24 16:27 [PATCH 00/76] target/arm: Implement FEAT_AFP and FEAT_RPRES Peter Maydell
                   ` (10 preceding siblings ...)
  2025-01-24 16:27 ` [PATCH 11/76] target/arm: Use FPST_FPCR_A64 in A64 decoder Peter Maydell
@ 2025-01-24 16:27 ` Peter Maydell
  2025-01-25 15:20   ` Richard Henderson
  2025-01-24 16:27 ` [PATCH 13/76] target/arm: Define new fp_status_f16_a32 and fp_status_f16_a64 Peter Maydell
                   ` (65 subsequent siblings)
  77 siblings, 1 reply; 167+ messages in thread
From: Peter Maydell @ 2025-01-24 16:27 UTC
  To: qemu-arm, qemu-devel

Now that all the uses of vfp.fp_status and FPST_FPCR have been
moved to either the A32 or A64 fields, we can remove them.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/cpu.h           | 2 --
 target/arm/tcg/translate.h | 6 ------
 target/arm/cpu.c           | 1 -
 target/arm/vfp_helper.c    | 8 +-------
 4 files changed, 1 insertion(+), 16 deletions(-)
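
Reviewer's illustration (not part of the patch): one consequence of
dropping fp_status is that vfp_get_fpsr_from_host() no longer has a
first assignment to seed its accumulator, so it now has to start from
zero. The sketch below models that with simplified stand-in types.
sketch_float_status, sketch_vfp and sketch_collect_flags are
hypothetical names for illustration only and are not QEMU's real
definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for softfloat's float_status: flags only. */
typedef struct {
    uint32_t exception_flags;
} sketch_float_status;

/* Hypothetical stand-in for the fields that remain after the removal. */
typedef struct {
    sketch_float_status fp_status_a32;
    sketch_float_status fp_status_a64;
    sketch_float_status standard_fp_status;
} sketch_vfp;

/*
 * OR together the exception flags of every remaining status field.
 * The accumulator is explicitly zeroed because there is no longer a
 * plain assignment from a first field to initialise it.
 */
static uint32_t sketch_collect_flags(const sketch_vfp *vfp)
{
    uint32_t i = 0;

    assert(vfp != NULL);
    i |= vfp->fp_status_a32.exception_flags;
    i |= vfp->fp_status_a64.exception_flags;
    i |= vfp->standard_fp_status.exception_flags;
    return i;
}
```

Starting from zero keeps every field on the same "|=" footing, so
removing or adding a field later is a one-line change.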

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 337c5383748..7b967bbd1d2 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -633,7 +633,6 @@ typedef struct CPUArchState {
 
         /* There are a number of distinct float control structures:
          *
-         *  fp_status: is the "normal" fp status.
          *  fp_status_a32: is the "normal" fp status for AArch32 insns
          *  fp_status_a64: is the "normal" fp status for AArch64 insns
          *  fp_status_fp16: used for half-precision calculations
@@ -660,7 +659,6 @@ typedef struct CPUArchState {
          * only thing which needs to read the exception flags being
          * an explicit FPSCR read.
          */
-        float_status fp_status;
         float_status fp_status_a32;
         float_status fp_status_a64;
         float_status fp_status_f16;
diff --git a/target/arm/tcg/translate.h b/target/arm/tcg/translate.h
index a7509b314b0..197772eb13d 100644
--- a/target/arm/tcg/translate.h
+++ b/target/arm/tcg/translate.h
@@ -670,7 +670,6 @@ static inline CPUARMTBFlags arm_tbflags_from_tb(const TranslationBlock *tb)
  * Enum for argument to fpstatus_ptr().
  */
 typedef enum ARMFPStatusFlavour {
-    FPST_FPCR,
     FPST_FPCR_A32,
     FPST_FPCR_A64,
     FPST_FPCR_F16,
@@ -686,8 +685,6 @@ typedef enum ARMFPStatusFlavour {
  * been set up to point to the requested field in the CPU state struct.
  * The options are:
  *
- * FPST_FPCR
- *   for non-FP16 operations controlled by the FPCR
  * FPST_FPCR_A32
  *   for AArch32 non-FP16 operations controlled by the FPCR
  * FPST_FPCR_A64
@@ -705,9 +702,6 @@ static inline TCGv_ptr fpstatus_ptr(ARMFPStatusFlavour flavour)
     int offset;
 
     switch (flavour) {
-    case FPST_FPCR:
-        offset = offsetof(CPUARMState, vfp.fp_status);
-        break;
     case FPST_FPCR_A32:
         offset = offsetof(CPUARMState, vfp.fp_status_a32);
         break;
diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index 8bdd535db95..a2b9bd3fb9d 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -572,7 +572,6 @@ static void arm_cpu_reset_hold(Object *obj, ResetType type)
     set_flush_inputs_to_zero(1, &env->vfp.standard_fp_status);
     set_default_nan_mode(1, &env->vfp.standard_fp_status);
     set_default_nan_mode(1, &env->vfp.standard_fp_status_f16);
-    arm_set_default_fp_behaviours(&env->vfp.fp_status);
     arm_set_default_fp_behaviours(&env->vfp.fp_status_a32);
     arm_set_default_fp_behaviours(&env->vfp.fp_status_a64);
     arm_set_default_fp_behaviours(&env->vfp.standard_fp_status);
diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
index 034f26e5daa..9fee6265f20 100644
--- a/target/arm/vfp_helper.c
+++ b/target/arm/vfp_helper.c
@@ -61,9 +61,8 @@ static inline uint32_t vfp_exceptbits_from_host(int host_bits)
 
 static uint32_t vfp_get_fpsr_from_host(CPUARMState *env)
 {
-    uint32_t i;
+    uint32_t i = 0;
 
-    i = get_float_exception_flags(&env->vfp.fp_status);
     i |= get_float_exception_flags(&env->vfp.fp_status_a32);
     i |= get_float_exception_flags(&env->vfp.fp_status_a64);
     i |= get_float_exception_flags(&env->vfp.standard_fp_status);
@@ -82,7 +81,6 @@ static void vfp_clear_float_status_exc_flags(CPUARMState *env)
      * values. The caller should have arranged for env->vfp.fpsr to
      * be the architecturally up-to-date exception flag information first.
      */
-    set_float_exception_flags(0, &env->vfp.fp_status);
     set_float_exception_flags(0, &env->vfp.fp_status_a32);
     set_float_exception_flags(0, &env->vfp.fp_status_a64);
     set_float_exception_flags(0, &env->vfp.fp_status_f16);
@@ -112,7 +110,6 @@ static void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val, uint32_t mask)
             i = float_round_to_zero;
             break;
         }
-        set_float_rounding_mode(i, &env->vfp.fp_status);
         set_float_rounding_mode(i, &env->vfp.fp_status_a32);
         set_float_rounding_mode(i, &env->vfp.fp_status_a64);
         set_float_rounding_mode(i, &env->vfp.fp_status_f16);
@@ -126,8 +123,6 @@ static void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val, uint32_t mask)
     }
     if (changed & FPCR_FZ) {
         bool ftz_enabled = val & FPCR_FZ;
-        set_flush_to_zero(ftz_enabled, &env->vfp.fp_status);
-        set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status);
         set_flush_to_zero(ftz_enabled, &env->vfp.fp_status_a32);
         set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status_a32);
         set_flush_to_zero(ftz_enabled, &env->vfp.fp_status_a64);
@@ -135,7 +130,6 @@ static void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val, uint32_t mask)
     }
     if (changed & FPCR_DN) {
         bool dnan_enabled = val & FPCR_DN;
-        set_default_nan_mode(dnan_enabled, &env->vfp.fp_status);
         set_default_nan_mode(dnan_enabled, &env->vfp.fp_status_a32);
         set_default_nan_mode(dnan_enabled, &env->vfp.fp_status_a64);
         set_default_nan_mode(dnan_enabled, &env->vfp.fp_status_f16);
-- 
2.34.1




* [PATCH 13/76] target/arm: Define new fp_status_f16_a32 and fp_status_f16_a64
  2025-01-24 16:27 [PATCH 00/76] target/arm: Implement FEAT_AFP and FEAT_RPRES Peter Maydell
                   ` (11 preceding siblings ...)
  2025-01-24 16:27 ` [PATCH 12/76] target/arm: Remove now-unused vfp.fp_status and FPST_FPCR Peter Maydell
@ 2025-01-24 16:27 ` Peter Maydell
  2025-01-25 15:21   ` Richard Henderson
  2025-01-27  5:00   ` Richard Henderson
  2025-01-24 16:27 ` [PATCH 14/76] target/arm: Use fp_status_f16_a32 in AArch32-only helpers Peter Maydell
                   ` (64 subsequent siblings)
  77 siblings, 2 replies; 167+ messages in thread
From: Peter Maydell @ 2025-01-24 16:27 UTC
  To: qemu-arm, qemu-devel

As the first part of splitting the existing fp_status_f16
into separate float_status fields for AArch32 and AArch64
(so that we can make FEAT_AFP control bits apply only
for AArch64), define the two new fp_status_f16_a32 and
fp_status_f16_a64 fields, but don't use them yet.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/cpu.h           |  4 ++++
 target/arm/tcg/translate.h | 12 ++++++++++++
 target/arm/cpu.c           |  2 ++
 target/arm/vfp_helper.c    | 14 ++++++++++++++
 4 files changed, 32 insertions(+)
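
Reviewer's illustration (not part of the patch): the new enum values
follow the same pattern as the existing ones, where each flavour
selects the offset of one float_status field inside the CPU state.
The sketch below shows that mapping with simplified stand-in types.
All sketch_* and SKETCH_* names are hypothetical and exist only for
this example; the real code returns a TCGv_ptr rather than a bare
offset.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for softfloat's float_status. */
typedef struct {
    int exception_flags;
} sketch_float_status;

/* Hypothetical stand-in holding the old field and the two new ones. */
typedef struct {
    sketch_float_status fp_status_f16;
    sketch_float_status fp_status_f16_a32;
    sketch_float_status fp_status_f16_a64;
} sketch_vfp;

typedef enum {
    SKETCH_FPST_F16,
    SKETCH_FPST_F16_A32,
    SKETCH_FPST_F16_A64,
} sketch_flavour;

/* Map a flavour to the offset of its status field, one case each. */
static size_t sketch_fpst_offset(sketch_flavour flavour)
{
    switch (flavour) {
    case SKETCH_FPST_F16:
        return offsetof(sketch_vfp, fp_status_f16);
    case SKETCH_FPST_F16_A32:
        return offsetof(sketch_vfp, fp_status_f16_a32);
    case SKETCH_FPST_F16_A64:
        return offsetof(sketch_vfp, fp_status_f16_a64);
    }
    assert(0 && "unknown flavour");
    return 0;
}
```

Because the old flavour keeps its own case, existing callers are
unaffected until they are converted one by one.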

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 7b967bbd1d2..be409c5c76e 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -636,6 +636,8 @@ typedef struct CPUArchState {
          *  fp_status_a32: is the "normal" fp status for AArch32 insns
          *  fp_status_a64: is the "normal" fp status for AArch64 insns
          *  fp_status_fp16: used for half-precision calculations
+         *  fp_status_f16_a32: used for AArch32 half-precision calculations
+         *  fp_status_f16_a64: used for AArch64 half-precision calculations
          *  standard_fp_status : the ARM "Standard FPSCR Value"
          *  standard_fp_status_fp16 : used for half-precision
          *       calculations with the ARM "Standard FPSCR Value"
@@ -662,6 +664,8 @@ typedef struct CPUArchState {
         float_status fp_status_a32;
         float_status fp_status_a64;
         float_status fp_status_f16;
+        float_status fp_status_f16_a32;
+        float_status fp_status_f16_a64;
         float_status standard_fp_status;
         float_status standard_fp_status_f16;
 
diff --git a/target/arm/tcg/translate.h b/target/arm/tcg/translate.h
index 197772eb13d..57e5d92cd60 100644
--- a/target/arm/tcg/translate.h
+++ b/target/arm/tcg/translate.h
@@ -673,6 +673,8 @@ typedef enum ARMFPStatusFlavour {
     FPST_FPCR_A32,
     FPST_FPCR_A64,
     FPST_FPCR_F16,
+    FPST_FPCR_F16_A32,
+    FPST_FPCR_F16_A64,
     FPST_STD,
     FPST_STD_F16,
 } ARMFPStatusFlavour;
@@ -691,6 +693,10 @@ typedef enum ARMFPStatusFlavour {
  *   for AArch64 non-FP16 operations controlled by the FPCR
  * FPST_FPCR_F16
  *   for operations controlled by the FPCR where FPCR.FZ16 is to be used
+ * FPST_FPCR_F16_A32
+ *   for AArch32 operations controlled by the FPCR where FPCR.FZ16 is to be used
+ * FPST_FPCR_F16_A64
+ *   for AArch64 operations controlled by the FPCR where FPCR.FZ16 is to be used
  * FPST_STD
  *   for A32/T32 Neon operations using the "standard FPSCR value"
  * FPST_STD_F16
@@ -711,6 +717,12 @@ static inline TCGv_ptr fpstatus_ptr(ARMFPStatusFlavour flavour)
     case FPST_FPCR_F16:
         offset = offsetof(CPUARMState, vfp.fp_status_f16);
         break;
+    case FPST_FPCR_F16_A32:
+        offset = offsetof(CPUARMState, vfp.fp_status_f16_a32);
+        break;
+    case FPST_FPCR_F16_A64:
+        offset = offsetof(CPUARMState, vfp.fp_status_f16_a64);
+        break;
     case FPST_STD:
         offset = offsetof(CPUARMState, vfp.standard_fp_status);
         break;
diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index a2b9bd3fb9d..ff8514edc6d 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -576,6 +576,8 @@ static void arm_cpu_reset_hold(Object *obj, ResetType type)
     arm_set_default_fp_behaviours(&env->vfp.fp_status_a64);
     arm_set_default_fp_behaviours(&env->vfp.standard_fp_status);
     arm_set_default_fp_behaviours(&env->vfp.fp_status_f16);
+    arm_set_default_fp_behaviours(&env->vfp.fp_status_f16_a32);
+    arm_set_default_fp_behaviours(&env->vfp.fp_status_f16_a64);
     arm_set_default_fp_behaviours(&env->vfp.standard_fp_status_f16);
 
 #ifndef CONFIG_USER_ONLY
diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
index 9fee6265f20..45f9dfc8861 100644
--- a/target/arm/vfp_helper.c
+++ b/target/arm/vfp_helper.c
@@ -69,6 +69,10 @@ static uint32_t vfp_get_fpsr_from_host(CPUARMState *env)
     /* FZ16 does not generate an input denormal exception.  */
     i |= (get_float_exception_flags(&env->vfp.fp_status_f16)
           & ~float_flag_input_denormal);
+    i |= (get_float_exception_flags(&env->vfp.fp_status_f16_a32)
+          & ~float_flag_input_denormal);
+    i |= (get_float_exception_flags(&env->vfp.fp_status_f16_a64)
+          & ~float_flag_input_denormal);
     i |= (get_float_exception_flags(&env->vfp.standard_fp_status_f16)
           & ~float_flag_input_denormal);
     return vfp_exceptbits_from_host(i);
@@ -84,6 +88,8 @@ static void vfp_clear_float_status_exc_flags(CPUARMState *env)
     set_float_exception_flags(0, &env->vfp.fp_status_a32);
     set_float_exception_flags(0, &env->vfp.fp_status_a64);
     set_float_exception_flags(0, &env->vfp.fp_status_f16);
+    set_float_exception_flags(0, &env->vfp.fp_status_f16_a32);
+    set_float_exception_flags(0, &env->vfp.fp_status_f16_a64);
     set_float_exception_flags(0, &env->vfp.standard_fp_status);
     set_float_exception_flags(0, &env->vfp.standard_fp_status_f16);
 }
@@ -113,12 +119,18 @@ static void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val, uint32_t mask)
         set_float_rounding_mode(i, &env->vfp.fp_status_a32);
         set_float_rounding_mode(i, &env->vfp.fp_status_a64);
         set_float_rounding_mode(i, &env->vfp.fp_status_f16);
+        set_float_rounding_mode(i, &env->vfp.fp_status_f16_a32);
+        set_float_rounding_mode(i, &env->vfp.fp_status_f16_a64);
     }
     if (changed & FPCR_FZ16) {
         bool ftz_enabled = val & FPCR_FZ16;
         set_flush_to_zero(ftz_enabled, &env->vfp.fp_status_f16);
+        set_flush_to_zero(ftz_enabled, &env->vfp.fp_status_f16_a32);
+        set_flush_to_zero(ftz_enabled, &env->vfp.fp_status_f16_a64);
         set_flush_to_zero(ftz_enabled, &env->vfp.standard_fp_status_f16);
         set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status_f16);
+        set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status_f16_a32);
+        set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status_f16_a64);
         set_flush_inputs_to_zero(ftz_enabled, &env->vfp.standard_fp_status_f16);
     }
     if (changed & FPCR_FZ) {
@@ -133,6 +145,8 @@ static void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val, uint32_t mask)
         set_default_nan_mode(dnan_enabled, &env->vfp.fp_status_a32);
         set_default_nan_mode(dnan_enabled, &env->vfp.fp_status_a64);
         set_default_nan_mode(dnan_enabled, &env->vfp.fp_status_f16);
+        set_default_nan_mode(dnan_enabled, &env->vfp.fp_status_f16_a32);
+        set_default_nan_mode(dnan_enabled, &env->vfp.fp_status_f16_a64);
     }
 }
 
-- 
2.34.1
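Illustrative note, not part of the patch: the vfp_get_fpsr_from_host() hunk
above ORs the exception flags of every float_status together, masking the
input-denormal flag for the fp16 ones. A minimal sketch of that pattern
follows. All toy_* names and the flag values are hypothetical stand-ins, not
QEMU or softfloat definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag bits; the real values live in softfloat headers. */
enum {
    TOY_FLAG_INVALID        = 1 << 0,
    TOY_FLAG_INEXACT        = 1 << 1,
    TOY_FLAG_INPUT_DENORMAL = 1 << 2,
};

/* Hypothetical stand-in for float_status; only the sticky flags are modelled. */
typedef struct {
    uint8_t exception_flags;
} toy_float_status;

/* Hypothetical subset of the vfp status fields. */
typedef struct {
    toy_float_status fp_status_a64;
    toy_float_status fp_status_f16_a32;
    toy_float_status fp_status_f16_a64;
} toy_vfp;

/*
 * Gather the cumulative flags from every status. The fp16 statuses have
 * the input-denormal flag masked out, matching the "FZ16 does not generate
 * an input denormal exception" comment in the diff.
 */
static uint8_t toy_get_flags(const toy_vfp *vfp)
{
    uint8_t i = vfp->fp_status_a64.exception_flags;

    i |= vfp->fp_status_f16_a32.exception_flags & ~TOY_FLAG_INPUT_DENORMAL;
    i |= vfp->fp_status_f16_a64.exception_flags & ~TOY_FLAG_INPUT_DENORMAL;
    return i;
}
```

In this model, a status left out of the OR would silently lose whatever flags
it had accumulated, which is presumably why each new float_status field has
to be added to the get, clear and set paths together.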




* [PATCH 14/76] target/arm: Use fp_status_f16_a32 in AArch32-only helpers
  2025-01-24 16:27 [PATCH 00/76] target/arm: Implement FEAT_AFP and FEAT_RPRES Peter Maydell
                   ` (12 preceding siblings ...)
  2025-01-24 16:27 ` [PATCH 13/76] target/arm: Define new fp_status_f16_a32 and fp_status_f16_a64 Peter Maydell
@ 2025-01-24 16:27 ` Peter Maydell
  2025-01-25 15:21   ` Richard Henderson
  2025-01-24 16:27 ` [PATCH 15/76] target/arm: Use fp_status_f16_a64 in AArch64-only helpers Peter Maydell
                   ` (63 subsequent siblings)
  77 siblings, 1 reply; 167+ messages in thread
From: Peter Maydell @ 2025-01-24 16:27 UTC (permalink / raw)
  To: qemu-arm, qemu-devel

We directly use fp_status_f16 in a handful of helpers that
are AArch32-specific; switch to fp_status_f16_a32 for these.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
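Illustrative note, not part of the commit: the fmlal helpers touched here do
their arithmetic with one float_status but read only the
flush-inputs-to-zero setting from the fp16 status. A minimal sketch of that
split follows. All toy_* names are hypothetical, and the assumption that the
flag is used to flush denormal half-precision inputs before widening is mine,
not something this diff shows.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for QEMU's float_status; one field is modelled. */
typedef struct {
    bool flush_inputs_to_zero;
} toy_float_status;

/* Hypothetical subset of the vfp status fields used by the A32 helper. */
typedef struct {
    toy_float_status standard_fp_status;   /* used for the arithmetic */
    toy_float_status fp_status_f16_a32;    /* consulted only for FZ16 */
} toy_vfp;

/*
 * Flush a denormal IEEE half-precision input to a zero of the same sign
 * when fz16 is set. A denormal has an all-zero exponent field (bits 14:10)
 * and a non-zero fraction (bits 9:0).
 */
static uint16_t toy_f16_squash_input(uint16_t bits, bool fz16)
{
    bool is_denormal = (bits & 0x7c00) == 0 && (bits & 0x03ff) != 0;

    return (fz16 && is_denormal) ? (bits & 0x8000) : bits;
}

/* Mirrors the call shape in the diff: FZ16 comes from the A32 fp16 status. */
static uint16_t toy_fmlal_a32_input(const toy_vfp *vfp, uint16_t bits)
{
    return toy_f16_squash_input(bits,
                                vfp->fp_status_f16_a32.flush_inputs_to_zero);
}
```

With the flag set, a negative denormal input such as 0x8001 becomes 0x8000
(negative zero), while normal values pass through unchanged.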
 target/arm/tcg/vec_helper.c | 4 ++--
 target/arm/vfp_helper.c     | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
index 44ee2c81fad..aaad947e506 100644
--- a/target/arm/tcg/vec_helper.c
+++ b/target/arm/tcg/vec_helper.c
@@ -2060,7 +2060,7 @@ void HELPER(gvec_fmlal_a32)(void *vd, void *vn, void *vm,
                             CPUARMState *env, uint32_t desc)
 {
     do_fmlal(vd, vn, vm, &env->vfp.standard_fp_status, desc,
-             get_flush_inputs_to_zero(&env->vfp.fp_status_f16));
+             get_flush_inputs_to_zero(&env->vfp.fp_status_f16_a32));
 }
 
 void HELPER(gvec_fmlal_a64)(void *vd, void *vn, void *vm,
@@ -2122,7 +2122,7 @@ void HELPER(gvec_fmlal_idx_a32)(void *vd, void *vn, void *vm,
                                 CPUARMState *env, uint32_t desc)
 {
     do_fmlal_idx(vd, vn, vm, &env->vfp.standard_fp_status, desc,
-                 get_flush_inputs_to_zero(&env->vfp.fp_status_f16));
+                 get_flush_inputs_to_zero(&env->vfp.fp_status_f16_a32));
 }
 
 void HELPER(gvec_fmlal_idx_a64)(void *vd, void *vn, void *vm,
diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
index 45f9dfc8861..f3aa80bbfb6 100644
--- a/target/arm/vfp_helper.c
+++ b/target/arm/vfp_helper.c
@@ -380,7 +380,7 @@ void VFP_HELPER(cmpe, P)(ARGTYPE a, ARGTYPE b, CPUARMState *env) \
     softfloat_to_vfp_compare(env, \
         FLOATTYPE ## _compare(a, b, &env->vfp.FPST)); \
 }
-DO_VFP_cmp(h, float16, dh_ctype_f16, fp_status_f16)
+DO_VFP_cmp(h, float16, dh_ctype_f16, fp_status_f16_a32)
 DO_VFP_cmp(s, float32, float32, fp_status_a32)
 DO_VFP_cmp(d, float64, float64, fp_status_a32)
 #undef DO_VFP_cmp
-- 
2.34.1




* [PATCH 15/76] target/arm: Use fp_status_f16_a64 in AArch64-only helpers
  2025-01-24 16:27 [PATCH 00/76] target/arm: Implement FEAT_AFP and FEAT_RPRES Peter Maydell
                   ` (13 preceding siblings ...)
  2025-01-24 16:27 ` [PATCH 14/76] target/arm: Use fp_status_f16_a32 in AArch32-only helpers Peter Maydell
@ 2025-01-24 16:27 ` Peter Maydell
  2025-01-25 15:22   ` Richard Henderson
  2025-01-24 16:27 ` [PATCH 16/76] target/arm: Use FPST_FPCR_F16_A32 in A32 decoder Peter Maydell
                   ` (62 subsequent siblings)
  77 siblings, 1 reply; 167+ messages in thread
From: Peter Maydell @ 2025-01-24 16:27 UTC (permalink / raw)
  To: qemu-arm, qemu-devel

We directly use fp_status_f16 in a handful of helpers that are
AArch64-specific; switch to fp_status_f16_a64 for these.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
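Illustrative note, not part of the commit: the sme_fmopa_h comment explains
that the helper works on copies of the status fields so that the operation
does not update the cumulative exception status. A minimal sketch of that
copy-then-discard pattern follows. All toy_* names are hypothetical and only
model the idea.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for QEMU's float_status. */
typedef struct {
    bool default_nan_mode;
    uint8_t exception_flags;   /* cumulative, sticky flags */
} toy_float_status;

enum { TOY_FLAG_INEXACT = 1 << 0 };

/* A toy operation that always raises "inexact" on the status it is given. */
static void toy_op(toy_float_status *st)
{
    st->exception_flags |= TOY_FLAG_INEXACT;
}

/*
 * Run the operation on a local copy: the copy has default-NaN mode forced
 * on, and any flags it raises are discarded when the copy goes out of
 * scope. The caller's status is never written.
 */
static void toy_fmopa_like(const toy_float_status *env_status)
{
    toy_float_status local = *env_status;   /* struct copy, not a pointer */

    local.default_nan_mode = true;
    toy_op(&local);
    assert(local.exception_flags & TOY_FLAG_INEXACT);
}
```

After toy_fmopa_like() returns, the caller's exception_flags and
default_nan_mode are exactly what they were before the call.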
 target/arm/tcg/sme_helper.c | 4 ++--
 target/arm/tcg/vec_helper.c | 8 ++++----
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/target/arm/tcg/sme_helper.c b/target/arm/tcg/sme_helper.c
index 2aad00d3ad9..727c085f374 100644
--- a/target/arm/tcg/sme_helper.c
+++ b/target/arm/tcg/sme_helper.c
@@ -1038,12 +1038,12 @@ void HELPER(sme_fmopa_h)(void *vza, void *vzn, void *vzm, void *vpn,
     float_status fpst_odd, fpst_std, fpst_f16;
 
     /*
-     * Make copies of fp_status and fp_status_f16, because this operation
+     * Make copies of the fp status fields we use, because this operation
      * does not update the cumulative fp exception status.  It also
      * produces default NaNs. We also need a second copy of fp_status with
      * round-to-odd -- see above.
      */
-    fpst_f16 = env->vfp.fp_status_f16;
+    fpst_f16 = env->vfp.fp_status_f16_a64;
     fpst_std = env->vfp.fp_status_a64;
     set_default_nan_mode(true, &fpst_std);
     set_default_nan_mode(true, &fpst_f16);
diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
index aaad947e506..3fbca8bc8bf 100644
--- a/target/arm/tcg/vec_helper.c
+++ b/target/arm/tcg/vec_helper.c
@@ -2067,7 +2067,7 @@ void HELPER(gvec_fmlal_a64)(void *vd, void *vn, void *vm,
                             CPUARMState *env, uint32_t desc)
 {
     do_fmlal(vd, vn, vm, &env->vfp.fp_status_a64, desc,
-             get_flush_inputs_to_zero(&env->vfp.fp_status_f16));
+             get_flush_inputs_to_zero(&env->vfp.fp_status_f16_a64));
 }
 
 void HELPER(sve2_fmlal_zzzw_s)(void *vd, void *vn, void *vm, void *va,
@@ -2077,7 +2077,7 @@ void HELPER(sve2_fmlal_zzzw_s)(void *vd, void *vn, void *vm, void *va,
     uint16_t negn = extract32(desc, SIMD_DATA_SHIFT, 1) << 15;
     intptr_t sel = extract32(desc, SIMD_DATA_SHIFT + 1, 1) * sizeof(float16);
     float_status *status = &env->vfp.fp_status_a64;
-    bool fz16 = get_flush_inputs_to_zero(&env->vfp.fp_status_f16);
+    bool fz16 = get_flush_inputs_to_zero(&env->vfp.fp_status_f16_a64);
 
     for (i = 0; i < oprsz; i += sizeof(float32)) {
         float16 nn_16 = *(float16 *)(vn + H1_2(i + sel)) ^ negn;
@@ -2129,7 +2129,7 @@ void HELPER(gvec_fmlal_idx_a64)(void *vd, void *vn, void *vm,
                                 CPUARMState *env, uint32_t desc)
 {
     do_fmlal_idx(vd, vn, vm, &env->vfp.fp_status_a64, desc,
-                 get_flush_inputs_to_zero(&env->vfp.fp_status_f16));
+                 get_flush_inputs_to_zero(&env->vfp.fp_status_f16_a64));
 }
 
 void HELPER(sve2_fmlal_zzxw_s)(void *vd, void *vn, void *vm, void *va,
@@ -2140,7 +2140,7 @@ void HELPER(sve2_fmlal_zzxw_s)(void *vd, void *vn, void *vm, void *va,
     intptr_t sel = extract32(desc, SIMD_DATA_SHIFT + 1, 1) * sizeof(float16);
     intptr_t idx = extract32(desc, SIMD_DATA_SHIFT + 2, 3) * sizeof(float16);
     float_status *status = &env->vfp.fp_status_a64;
-    bool fz16 = get_flush_inputs_to_zero(&env->vfp.fp_status_f16);
+    bool fz16 = get_flush_inputs_to_zero(&env->vfp.fp_status_f16_a64);
 
     for (i = 0; i < oprsz; i += 16) {
         float16 mm_16 = *(float16 *)(vm + i + idx);
-- 
2.34.1




* [PATCH 16/76] target/arm: Use FPST_FPCR_F16_A32 in A32 decoder
  2025-01-24 16:27 [PATCH 00/76] target/arm: Implement FEAT_AFP and FEAT_RPRES Peter Maydell
                   ` (14 preceding siblings ...)
  2025-01-24 16:27 ` [PATCH 15/76] target/arm: Use fp_status_f16_a64 in AArch64-only helpers Peter Maydell
@ 2025-01-24 16:27 ` Peter Maydell
  2025-01-25 15:23   ` Richard Henderson
  2025-01-24 16:27 ` [PATCH 17/76] target/arm: Use FPST_FPCR_F16_A64 in A64 decoder Peter Maydell
                   ` (61 subsequent siblings)
  77 siblings, 1 reply; 167+ messages in thread
From: Peter Maydell @ 2025-01-24 16:27 UTC (permalink / raw)
  To: qemu-arm, qemu-devel

In the A32 decoder, use FPST_FPCR_F16_A32 rather than FPST_FPCR_F16.
By doing an automated conversion of the whole file we avoid possibly
using more than one fpst value in a set_rmode/op/restore_rmode
sequence.

Patch created with
  perl -p -i -e 's/FPST_FPCR_F16(?!_)/FPST_FPCR_F16_A32/g' target/arm/tcg/translate-vfp.c

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
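Illustrative note, not part of the commit: the (?!_) in the perl one-liner is
a negative lookahead, so FPST_FPCR_F16 is rewritten only when it is not
followed by an underscore. Existing FPST_FPCR_F16_A32 or FPST_FPCR_F16_A64
uses are left alone, and running the command twice changes nothing further.
POSIX regexes in C have no lookahead, so this sketch hand-codes the same
check. toy_rename_f16() is a hypothetical helper, not anything in QEMU.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy src to dst, replacing each "FPST_FPCR_F16" that is NOT immediately
 * followed by '_' with "FPST_FPCR_F16_A32". This mimics the effect of
 * s/FPST_FPCR_F16(?!_)/FPST_FPCR_F16_A32/g.
 * Returns 0 on success, -1 if dst (of size dstsz) is too small.
 */
static int toy_rename_f16(const char *src, char *dst, size_t dstsz)
{
    static const char from[] = "FPST_FPCR_F16";
    static const char to[] = "FPST_FPCR_F16_A32";
    const size_t flen = sizeof(from) - 1;
    const size_t tlen = sizeof(to) - 1;
    size_t out = 0;

    assert(dstsz > 0);
    while (*src) {
        /* src[flen] is in bounds after a match; it may be the NUL byte. */
        if (strncmp(src, from, flen) == 0 && src[flen] != '_') {
            if (out + tlen >= dstsz) {
                return -1;
            }
            memcpy(dst + out, to, tlen);
            out += tlen;
            src += flen;
        } else {
            if (out + 1 >= dstsz) {
                return -1;
            }
            dst[out++] = *src++;
        }
    }
    dst[out] = '\0';
    return 0;
}
```

Feeding it "fpstatus_ptr(FPST_FPCR_F16);" yields
"fpstatus_ptr(FPST_FPCR_F16_A32);", while a line that already contains
FPST_FPCR_F16_A32 is copied through unchanged.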
 target/arm/tcg/translate-vfp.c | 24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/target/arm/tcg/translate-vfp.c b/target/arm/tcg/translate-vfp.c
index e1b8243c5d9..8eebba0f272 100644
--- a/target/arm/tcg/translate-vfp.c
+++ b/target/arm/tcg/translate-vfp.c
@@ -460,7 +460,7 @@ static bool trans_VRINT(DisasContext *s, arg_VRINT *a)
     }
 
     if (sz == 1) {
-        fpst = fpstatus_ptr(FPST_FPCR_F16);
+        fpst = fpstatus_ptr(FPST_FPCR_F16_A32);
     } else {
         fpst = fpstatus_ptr(FPST_FPCR_A32);
     }
@@ -527,7 +527,7 @@ static bool trans_VCVT(DisasContext *s, arg_VCVT *a)
     }
 
     if (sz == 1) {
-        fpst = fpstatus_ptr(FPST_FPCR_F16);
+        fpst = fpstatus_ptr(FPST_FPCR_F16_A32);
     } else {
         fpst = fpstatus_ptr(FPST_FPCR_A32);
     }
@@ -1433,7 +1433,7 @@ static bool do_vfp_3op_hp(DisasContext *s, VFPGen3OpSPFn *fn,
     /*
      * Do a half-precision operation. Functionally this is
      * the same as do_vfp_3op_sp(), except:
-     *  - it uses the FPST_FPCR_F16
+     *  - it uses the FPST_FPCR_F16_A32
      *  - it doesn't need the VFP vector handling (fp16 is a
      *    v8 feature, and in v8 VFP vectors don't exist)
      *  - it does the aa32_fp16_arith feature test
@@ -1456,7 +1456,7 @@ static bool do_vfp_3op_hp(DisasContext *s, VFPGen3OpSPFn *fn,
     f0 = tcg_temp_new_i32();
     f1 = tcg_temp_new_i32();
     fd = tcg_temp_new_i32();
-    fpst = fpstatus_ptr(FPST_FPCR_F16);
+    fpst = fpstatus_ptr(FPST_FPCR_F16_A32);
 
     vfp_load_reg16(f0, vn);
     vfp_load_reg16(f1, vm);
@@ -2122,7 +2122,7 @@ static bool do_vfm_hp(DisasContext *s, arg_VFMA_sp *a, bool neg_n, bool neg_d)
         /* VFNMA, VFNMS */
         gen_vfp_negh(vd, vd);
     }
-    fpst = fpstatus_ptr(FPST_FPCR_F16);
+    fpst = fpstatus_ptr(FPST_FPCR_F16_A32);
     gen_helper_vfp_muladdh(vd, vn, vm, vd, fpst);
     vfp_store_reg32(vd, a->vd);
     return true;
@@ -2424,7 +2424,7 @@ DO_VFP_2OP(VNEG, dp, gen_vfp_negd, aa32_fpdp_v2)
 
 static void gen_VSQRT_hp(TCGv_i32 vd, TCGv_i32 vm)
 {
-    gen_helper_vfp_sqrth(vd, vm, fpstatus_ptr(FPST_FPCR_F16));
+    gen_helper_vfp_sqrth(vd, vm, fpstatus_ptr(FPST_FPCR_F16_A32));
 }
 
 static void gen_VSQRT_sp(TCGv_i32 vd, TCGv_i32 vm)
@@ -2706,7 +2706,7 @@ static bool trans_VRINTR_hp(DisasContext *s, arg_VRINTR_sp *a)
 
     tmp = tcg_temp_new_i32();
     vfp_load_reg16(tmp, a->vm);
-    fpst = fpstatus_ptr(FPST_FPCR_F16);
+    fpst = fpstatus_ptr(FPST_FPCR_F16_A32);
     gen_helper_rinth(tmp, tmp, fpst);
     vfp_store_reg32(tmp, a->vd);
     return true;
@@ -2779,7 +2779,7 @@ static bool trans_VRINTZ_hp(DisasContext *s, arg_VRINTZ_sp *a)
 
     tmp = tcg_temp_new_i32();
     vfp_load_reg16(tmp, a->vm);
-    fpst = fpstatus_ptr(FPST_FPCR_F16);
+    fpst = fpstatus_ptr(FPST_FPCR_F16_A32);
     tcg_rmode = gen_set_rmode(FPROUNDING_ZERO, fpst);
     gen_helper_rinth(tmp, tmp, fpst);
     gen_restore_rmode(tcg_rmode, fpst);
@@ -2859,7 +2859,7 @@ static bool trans_VRINTX_hp(DisasContext *s, arg_VRINTX_sp *a)
 
     tmp = tcg_temp_new_i32();
     vfp_load_reg16(tmp, a->vm);
-    fpst = fpstatus_ptr(FPST_FPCR_F16);
+    fpst = fpstatus_ptr(FPST_FPCR_F16_A32);
     gen_helper_rinth_exact(tmp, tmp, fpst);
     vfp_store_reg32(tmp, a->vd);
     return true;
@@ -2983,7 +2983,7 @@ static bool trans_VCVT_int_hp(DisasContext *s, arg_VCVT_int_sp *a)
 
     vm = tcg_temp_new_i32();
     vfp_load_reg32(vm, a->vm);
-    fpst = fpstatus_ptr(FPST_FPCR_F16);
+    fpst = fpstatus_ptr(FPST_FPCR_F16_A32);
     if (a->s) {
         /* i32 -> f16 */
         gen_helper_vfp_sitoh(vm, vm, fpst);
@@ -3105,7 +3105,7 @@ static bool trans_VCVT_fix_hp(DisasContext *s, arg_VCVT_fix_sp *a)
     vd = tcg_temp_new_i32();
     vfp_load_reg32(vd, a->vd);
 
-    fpst = fpstatus_ptr(FPST_FPCR_F16);
+    fpst = fpstatus_ptr(FPST_FPCR_F16_A32);
     shift = tcg_constant_i32(frac_bits);
 
     /* Switch on op:U:sx bits */
@@ -3273,7 +3273,7 @@ static bool trans_VCVT_hp_int(DisasContext *s, arg_VCVT_sp_int *a)
         return true;
     }
 
-    fpst = fpstatus_ptr(FPST_FPCR_F16);
+    fpst = fpstatus_ptr(FPST_FPCR_F16_A32);
     vm = tcg_temp_new_i32();
     vfp_load_reg16(vm, a->vm);
 
-- 
2.34.1




* [PATCH 17/76] target/arm: Use FPST_FPCR_F16_A64 in A64 decoder
  2025-01-24 16:27 [PATCH 00/76] target/arm: Implement FEAT_AFP and FEAT_RPRES Peter Maydell
                   ` (15 preceding siblings ...)
  2025-01-24 16:27 ` [PATCH 16/76] target/arm: Use FPST_FPCR_F16_A32 in A32 decoder Peter Maydell
@ 2025-01-24 16:27 ` Peter Maydell
  2025-01-25 15:23   ` Richard Henderson
  2025-01-24 16:27 ` [PATCH 18/76] target/arm: Remove now-unused vfp.fp_status_f16 and FPST_FPCR_F16 Peter Maydell
                   ` (60 subsequent siblings)
  77 siblings, 1 reply; 167+ messages in thread
From: Peter Maydell @ 2025-01-24 16:27 UTC (permalink / raw)
  To: qemu-arm, qemu-devel

In the A64 decoder, use FPST_FPCR_F16_A64 rather than FPST_FPCR_F16.
By doing an automated conversion of the whole file we avoid possibly
using more than one fpst value in a set_rmode/op/restore_rmode
sequence.

Patch created with
  perl -p -i -e 's/FPST_FPCR_F16(?!_)/FPST_FPCR_F16_A64/g' target/arm/tcg/translate-{a64,sve,sme}.c

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
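Illustrative note, not part of the commit: the commit message's concern about
using more than one fpst value in a set_rmode/op/restore_rmode sequence can
be modelled at runtime as follows. The real gen_set_rmode() and
gen_restore_rmode() emit TCG ops at translate time. The toy_* functions here
are hypothetical and only model what the generated code would do to a status
struct.

```c
#include <assert.h>

/* Hypothetical stand-in for QEMU's float_status; only rounding is modelled. */
typedef struct {
    int rounding_mode;
} toy_float_status;

/* Hypothetical rounding-mode encodings. */
enum { TOY_ROUND_NEAREST = 0, TOY_ROUND_ZERO = 3 };

/* Set a new rounding mode and return the previous one. */
static int toy_set_rmode(int mode, toy_float_status *st)
{
    int old = st->rounding_mode;

    st->rounding_mode = mode;
    return old;
}

static void toy_restore_rmode(int old, toy_float_status *st)
{
    st->rounding_mode = old;
}

/* The operation simply reports which rounding mode it would have used. */
static int toy_op(const toy_float_status *st)
{
    return st->rounding_mode;
}

/* Consistent: the same status is used for set, op and restore. */
static int toy_rint_consistent(toy_float_status *f16_a64)
{
    int old = toy_set_rmode(TOY_ROUND_ZERO, f16_a64);
    int used = toy_op(f16_a64);

    toy_restore_rmode(old, f16_a64);
    return used;
}

/* Mixed: the mode is set on one status but the op reads another. */
static int toy_rint_mixed(toy_float_status *f16, toy_float_status *f16_a64)
{
    int old = toy_set_rmode(TOY_ROUND_ZERO, f16);
    int used = toy_op(f16_a64);

    toy_restore_rmode(old, f16);
    return used;
}
```

In the mixed case the operation still sees round-to-nearest, because the
temporary mode was written to a different status. Converting every use in a
file at once rules out that kind of half-converted sequence.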
 target/arm/tcg/translate-a64.c | 32 ++++++++---------
 target/arm/tcg/translate-sve.c | 66 +++++++++++++++++-----------------
 2 files changed, 49 insertions(+), 49 deletions(-)

diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index 9f10b2b2e6a..b713a5f6025 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -726,7 +726,7 @@ static void gen_gvec_op3_fpst(DisasContext *s, bool is_q, int rd, int rn,
                               int rm, bool is_fp16, int data,
                               gen_helper_gvec_3_ptr *fn)
 {
-    TCGv_ptr fpst = fpstatus_ptr(is_fp16 ? FPST_FPCR_F16 : FPST_FPCR_A64);
+    TCGv_ptr fpst = fpstatus_ptr(is_fp16 ? FPST_FPCR_F16_A64 : FPST_FPCR_A64);
     tcg_gen_gvec_3_ptr(vec_full_reg_offset(s, rd),
                        vec_full_reg_offset(s, rn),
                        vec_full_reg_offset(s, rm), fpst,
@@ -768,7 +768,7 @@ static void gen_gvec_op4_fpst(DisasContext *s, bool is_q, int rd, int rn,
                               int rm, int ra, bool is_fp16, int data,
                               gen_helper_gvec_4_ptr *fn)
 {
-    TCGv_ptr fpst = fpstatus_ptr(is_fp16 ? FPST_FPCR_F16 : FPST_FPCR_A64);
+    TCGv_ptr fpst = fpstatus_ptr(is_fp16 ? FPST_FPCR_F16_A64 : FPST_FPCR_A64);
     tcg_gen_gvec_4_ptr(vec_full_reg_offset(s, rd),
                        vec_full_reg_offset(s, rn),
                        vec_full_reg_offset(s, rm),
@@ -5062,7 +5062,7 @@ static bool do_fp3_scalar(DisasContext *s, arg_rrr_e *a, const FPScalar *f)
         if (fp_access_check(s)) {
             TCGv_i32 t0 = read_fp_hreg(s, a->rn);
             TCGv_i32 t1 = read_fp_hreg(s, a->rm);
-            f->gen_h(t0, t0, t1, fpstatus_ptr(FPST_FPCR_F16));
+            f->gen_h(t0, t0, t1, fpstatus_ptr(FPST_FPCR_F16_A64));
             write_fp_sreg(s, a->rd, t0);
         }
         break;
@@ -5270,9 +5270,9 @@ static bool do_fcmp0_s(DisasContext *s, arg_rr_e *a,
             TCGv_i32 t0 = read_fp_hreg(s, a->rn);
             TCGv_i32 t1 = tcg_constant_i32(0);
             if (swap) {
-                f->gen_h(t0, t1, t0, fpstatus_ptr(FPST_FPCR_F16));
+                f->gen_h(t0, t1, t0, fpstatus_ptr(FPST_FPCR_F16_A64));
             } else {
-                f->gen_h(t0, t0, t1, fpstatus_ptr(FPST_FPCR_F16));
+                f->gen_h(t0, t0, t1, fpstatus_ptr(FPST_FPCR_F16_A64));
             }
             write_fp_sreg(s, a->rd, t0);
         }
@@ -6230,7 +6230,7 @@ static bool do_fp3_scalar_idx(DisasContext *s, arg_rrx_e *a, const FPScalar *f)
             TCGv_i32 t1 = tcg_temp_new_i32();
 
             read_vec_element_i32(s, t1, a->rm, a->idx, MO_16);
-            f->gen_h(t0, t0, t1, fpstatus_ptr(FPST_FPCR_F16));
+            f->gen_h(t0, t0, t1, fpstatus_ptr(FPST_FPCR_F16_A64));
             write_fp_sreg(s, a->rd, t0);
         }
         break;
@@ -6288,7 +6288,7 @@ static bool do_fmla_scalar_idx(DisasContext *s, arg_rrx_e *a, bool neg)
                 gen_vfp_negh(t1, t1);
             }
             gen_helper_advsimd_muladdh(t0, t1, t2, t0,
-                                       fpstatus_ptr(FPST_FPCR_F16));
+                                       fpstatus_ptr(FPST_FPCR_F16_A64));
             write_fp_sreg(s, a->rd, t0);
         }
         break;
@@ -6626,7 +6626,7 @@ static bool do_fp3_scalar_pair(DisasContext *s, arg_rr_e *a, const FPScalar *f)
 
             read_vec_element_i32(s, t0, a->rn, 0, MO_16);
             read_vec_element_i32(s, t1, a->rn, 1, MO_16);
-            f->gen_h(t0, t0, t1, fpstatus_ptr(FPST_FPCR_F16));
+            f->gen_h(t0, t0, t1, fpstatus_ptr(FPST_FPCR_F16_A64));
             write_fp_sreg(s, a->rd, t0);
         }
         break;
@@ -6801,7 +6801,7 @@ static bool do_fmadd(DisasContext *s, arg_rrrr_e *a, bool neg_a, bool neg_n)
             if (neg_n) {
                 gen_vfp_negh(tn, tn);
             }
-            fpst = fpstatus_ptr(FPST_FPCR_F16);
+            fpst = fpstatus_ptr(FPST_FPCR_F16_A64);
             gen_helper_advsimd_muladdh(ta, tn, tm, ta, fpst);
             write_fp_sreg(s, a->rd, ta);
         }
@@ -6895,7 +6895,7 @@ static bool do_fp_reduction(DisasContext *s, arg_qrr_e *a,
     if (fp_access_check(s)) {
         MemOp esz = a->esz;
         int elts = (a->q ? 16 : 8) >> esz;
-        TCGv_ptr fpst = fpstatus_ptr(esz == MO_16 ? FPST_FPCR_F16 : FPST_FPCR_A64);
+        TCGv_ptr fpst = fpstatus_ptr(esz == MO_16 ? FPST_FPCR_F16_A64 : FPST_FPCR_A64);
         TCGv_i32 res = do_reduction_op(s, a->rn, esz, 0, elts, fpst, fn);
         write_fp_sreg(s, a->rd, res);
     }
@@ -6939,7 +6939,7 @@ static void handle_fp_compare(DisasContext *s, int size,
                               bool cmp_with_zero, bool signal_all_nans)
 {
     TCGv_i64 tcg_flags = tcg_temp_new_i64();
-    TCGv_ptr fpst = fpstatus_ptr(size == MO_16 ? FPST_FPCR_F16 : FPST_FPCR_A64);
+    TCGv_ptr fpst = fpstatus_ptr(size == MO_16 ? FPST_FPCR_F16_A64 : FPST_FPCR_A64);
 
     if (size == MO_64) {
         TCGv_i64 tcg_vn, tcg_vm;
@@ -8407,7 +8407,7 @@ static bool do_fp1_scalar(DisasContext *s, arg_rr_e *a,
         return check == 0;
     }
 
-    fpst = fpstatus_ptr(a->esz == MO_16 ? FPST_FPCR_F16 : FPST_FPCR_A64);
+    fpst = fpstatus_ptr(a->esz == MO_16 ? FPST_FPCR_F16_A64 : FPST_FPCR_A64);
     if (rmode >= 0) {
         tcg_rmode = gen_set_rmode(rmode, fpst);
     }
@@ -8598,7 +8598,7 @@ static bool do_cvtf_scalar(DisasContext *s, MemOp esz, int rd, int shift,
     TCGv_i32 tcg_shift, tcg_single;
     TCGv_i64 tcg_double;
 
-    tcg_fpstatus = fpstatus_ptr(esz == MO_16 ? FPST_FPCR_F16 : FPST_FPCR_A64);
+    tcg_fpstatus = fpstatus_ptr(esz == MO_16 ? FPST_FPCR_F16_A64 : FPST_FPCR_A64);
     tcg_shift = tcg_constant_i32(shift);
 
     switch (esz) {
@@ -8693,7 +8693,7 @@ static void do_fcvt_scalar(DisasContext *s, MemOp out, MemOp esz,
     TCGv_ptr tcg_fpstatus;
     TCGv_i32 tcg_shift, tcg_rmode, tcg_single;
 
-    tcg_fpstatus = fpstatus_ptr(esz == MO_16 ? FPST_FPCR_F16 : FPST_FPCR_A64);
+    tcg_fpstatus = fpstatus_ptr(esz == MO_16 ? FPST_FPCR_F16_A64 : FPST_FPCR_A64);
     tcg_shift = tcg_constant_i32(shift);
     tcg_rmode = gen_set_rmode(rmode, tcg_fpstatus);
 
@@ -9312,7 +9312,7 @@ static bool do_fp1_vector(DisasContext *s, arg_qrr_e *a,
         return check == 0;
     }
 
-    fpst = fpstatus_ptr(a->esz == MO_16 ? FPST_FPCR_F16 : FPST_FPCR_A64);
+    fpst = fpstatus_ptr(a->esz == MO_16 ? FPST_FPCR_F16_A64 : FPST_FPCR_A64);
     if (rmode >= 0) {
         tcg_rmode = gen_set_rmode(rmode, fpst);
     }
@@ -9372,7 +9372,7 @@ static bool do_gvec_op2_fpst(DisasContext *s, MemOp esz, bool is_q,
         return check == 0;
     }
 
-    fpst = fpstatus_ptr(esz == MO_16 ? FPST_FPCR_F16 : FPST_FPCR_A64);
+    fpst = fpstatus_ptr(esz == MO_16 ? FPST_FPCR_F16_A64 : FPST_FPCR_A64);
     tcg_gen_gvec_2_ptr(vec_full_reg_offset(s, rd),
                        vec_full_reg_offset(s, rn), fpst,
                        is_q ? 16 : 8, vec_full_reg_size(s),
diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
index caf8ea18328..37de816964a 100644
--- a/target/arm/tcg/translate-sve.c
+++ b/target/arm/tcg/translate-sve.c
@@ -141,7 +141,7 @@ static bool gen_gvec_fpst_arg_zz(DisasContext *s, gen_helper_gvec_2_ptr *fn,
                                  arg_rr_esz *a, int data)
 {
     return gen_gvec_fpst_zz(s, fn, a->rd, a->rn, data,
-                            a->esz == MO_16 ? FPST_FPCR_F16 : FPST_FPCR_A64);
+                            a->esz == MO_16 ? FPST_FPCR_F16_A64 : FPST_FPCR_A64);
 }
 
 /* Invoke an out-of-line helper on 3 Zregs. */
@@ -191,7 +191,7 @@ static bool gen_gvec_fpst_arg_zzz(DisasContext *s, gen_helper_gvec_3_ptr *fn,
                                   arg_rrr_esz *a, int data)
 {
     return gen_gvec_fpst_zzz(s, fn, a->rd, a->rn, a->rm, data,
-                             a->esz == MO_16 ? FPST_FPCR_F16 : FPST_FPCR_A64);
+                             a->esz == MO_16 ? FPST_FPCR_F16_A64 : FPST_FPCR_A64);
 }
 
 /* Invoke an out-of-line helper on 4 Zregs. */
@@ -397,7 +397,7 @@ static bool gen_gvec_fpst_arg_zpzz(DisasContext *s, gen_helper_gvec_4_ptr *fn,
                                    arg_rprr_esz *a)
 {
     return gen_gvec_fpst_zzzp(s, fn, a->rd, a->rn, a->rm, a->pg, 0,
-                              a->esz == MO_16 ? FPST_FPCR_F16 : FPST_FPCR_A64);
+                              a->esz == MO_16 ? FPST_FPCR_F16_A64 : FPST_FPCR_A64);
 }
 
 /* Invoke a vector expander on two Zregs and an immediate.  */
@@ -3517,7 +3517,7 @@ static bool do_FMLA_zzxz(DisasContext *s, arg_rrxr_esz *a, bool sub)
     };
     return gen_gvec_fpst_zzzz(s, fns[a->esz], a->rd, a->rn, a->rm, a->ra,
                               (a->index << 1) | sub,
-                              a->esz == MO_16 ? FPST_FPCR_F16 : FPST_FPCR_A64);
+                              a->esz == MO_16 ? FPST_FPCR_F16_A64 : FPST_FPCR_A64);
 }
 
 TRANS_FEAT(FMLA_zzxz, aa64_sve, do_FMLA_zzxz, a, false)
@@ -3533,7 +3533,7 @@ static gen_helper_gvec_3_ptr * const fmul_idx_fns[4] = {
 };
 TRANS_FEAT(FMUL_zzx, aa64_sve, gen_gvec_fpst_zzz,
            fmul_idx_fns[a->esz], a->rd, a->rn, a->rm, a->index,
-           a->esz == MO_16 ? FPST_FPCR_F16 : FPST_FPCR_A64)
+           a->esz == MO_16 ? FPST_FPCR_F16_A64 : FPST_FPCR_A64)
 
 /*
  *** SVE Floating Point Fast Reduction Group
@@ -3566,7 +3566,7 @@ static bool do_reduce(DisasContext *s, arg_rpr_esz *a,
 
     tcg_gen_addi_ptr(t_zn, tcg_env, vec_full_reg_offset(s, a->rn));
     tcg_gen_addi_ptr(t_pg, tcg_env, pred_full_reg_offset(s, a->pg));
-    status = fpstatus_ptr(a->esz == MO_16 ? FPST_FPCR_F16 : FPST_FPCR_A64);
+    status = fpstatus_ptr(a->esz == MO_16 ? FPST_FPCR_F16_A64 : FPST_FPCR_A64);
 
     fn(temp, t_zn, t_pg, status, t_desc);
 
@@ -3618,7 +3618,7 @@ static bool do_ppz_fp(DisasContext *s, arg_rpr_esz *a,
     if (sve_access_check(s)) {
         unsigned vsz = vec_full_reg_size(s);
         TCGv_ptr status =
-            fpstatus_ptr(a->esz == MO_16 ? FPST_FPCR_F16 : FPST_FPCR_A64);
+            fpstatus_ptr(a->esz == MO_16 ? FPST_FPCR_F16_A64 : FPST_FPCR_A64);
 
         tcg_gen_gvec_3_ptr(pred_full_reg_offset(s, a->rd),
                            vec_full_reg_offset(s, a->rn),
@@ -3654,7 +3654,7 @@ static gen_helper_gvec_3_ptr * const ftmad_fns[4] = {
 };
 TRANS_FEAT_NONSTREAMING(FTMAD, aa64_sve, gen_gvec_fpst_zzz,
                         ftmad_fns[a->esz], a->rd, a->rn, a->rm, a->imm,
-                        a->esz == MO_16 ? FPST_FPCR_F16 : FPST_FPCR_A64)
+                        a->esz == MO_16 ? FPST_FPCR_F16_A64 : FPST_FPCR_A64)
 
 /*
  *** SVE Floating Point Accumulating Reduction Group
@@ -3687,7 +3687,7 @@ static bool trans_FADDA(DisasContext *s, arg_rprr_esz *a)
     t_pg = tcg_temp_new_ptr();
     tcg_gen_addi_ptr(t_rm, tcg_env, vec_full_reg_offset(s, a->rm));
     tcg_gen_addi_ptr(t_pg, tcg_env, pred_full_reg_offset(s, a->pg));
-    t_fpst = fpstatus_ptr(a->esz == MO_16 ? FPST_FPCR_F16 : FPST_FPCR_A64);
+    t_fpst = fpstatus_ptr(a->esz == MO_16 ? FPST_FPCR_F16_A64 : FPST_FPCR_A64);
     t_desc = tcg_constant_i32(simd_desc(vsz, vsz, 0));
 
     fns[a->esz - 1](t_val, t_val, t_rm, t_pg, t_fpst, t_desc);
@@ -3762,7 +3762,7 @@ static void do_fp_scalar(DisasContext *s, int zd, int zn, int pg, bool is_fp16,
     tcg_gen_addi_ptr(t_zn, tcg_env, vec_full_reg_offset(s, zn));
     tcg_gen_addi_ptr(t_pg, tcg_env, pred_full_reg_offset(s, pg));
 
-    status = fpstatus_ptr(is_fp16 ? FPST_FPCR_F16 : FPST_FPCR_A64);
+    status = fpstatus_ptr(is_fp16 ? FPST_FPCR_F16_A64 : FPST_FPCR_A64);
     desc = tcg_constant_i32(simd_desc(vsz, vsz, 0));
     fn(t_zd, t_zn, t_pg, scalar, status, desc);
 }
@@ -3814,7 +3814,7 @@ static bool do_fp_cmp(DisasContext *s, arg_rprr_esz *a,
     }
     if (sve_access_check(s)) {
         unsigned vsz = vec_full_reg_size(s);
-        TCGv_ptr status = fpstatus_ptr(a->esz == MO_16 ? FPST_FPCR_F16 : FPST_FPCR_A64);
+        TCGv_ptr status = fpstatus_ptr(a->esz == MO_16 ? FPST_FPCR_F16_A64 : FPST_FPCR_A64);
         tcg_gen_gvec_4_ptr(pred_full_reg_offset(s, a->rd),
                            vec_full_reg_offset(s, a->rn),
                            vec_full_reg_offset(s, a->rm),
@@ -3847,7 +3847,7 @@ static gen_helper_gvec_4_ptr * const fcadd_fns[] = {
 };
 TRANS_FEAT(FCADD, aa64_sve, gen_gvec_fpst_zzzp, fcadd_fns[a->esz],
            a->rd, a->rn, a->rm, a->pg, a->rot,
-           a->esz == MO_16 ? FPST_FPCR_F16 : FPST_FPCR_A64)
+           a->esz == MO_16 ? FPST_FPCR_F16_A64 : FPST_FPCR_A64)
 
 #define DO_FMLA(NAME, name) \
     static gen_helper_gvec_5_ptr * const name##_fns[4] = {              \
@@ -3856,7 +3856,7 @@ TRANS_FEAT(FCADD, aa64_sve, gen_gvec_fpst_zzzp, fcadd_fns[a->esz],
     };                                                                  \
     TRANS_FEAT(NAME, aa64_sve, gen_gvec_fpst_zzzzp, name##_fns[a->esz], \
                a->rd, a->rn, a->rm, a->ra, a->pg, 0,                    \
-               a->esz == MO_16 ? FPST_FPCR_F16 : FPST_FPCR_A64)
+               a->esz == MO_16 ? FPST_FPCR_F16_A64 : FPST_FPCR_A64)
 
 DO_FMLA(FMLA_zpzzz, fmla_zpzzz)
 DO_FMLA(FMLS_zpzzz, fmls_zpzzz)
@@ -3871,14 +3871,14 @@ static gen_helper_gvec_5_ptr * const fcmla_fns[4] = {
 };
 TRANS_FEAT(FCMLA_zpzzz, aa64_sve, gen_gvec_fpst_zzzzp, fcmla_fns[a->esz],
            a->rd, a->rn, a->rm, a->ra, a->pg, a->rot,
-           a->esz == MO_16 ? FPST_FPCR_F16 : FPST_FPCR_A64)
+           a->esz == MO_16 ? FPST_FPCR_F16_A64 : FPST_FPCR_A64)
 
 static gen_helper_gvec_4_ptr * const fcmla_idx_fns[4] = {
     NULL, gen_helper_gvec_fcmlah_idx, gen_helper_gvec_fcmlas_idx, NULL
 };
 TRANS_FEAT(FCMLA_zzxz, aa64_sve, gen_gvec_fpst_zzzz, fcmla_idx_fns[a->esz],
            a->rd, a->rn, a->rm, a->ra, a->index * 4 + a->rot,
-           a->esz == MO_16 ? FPST_FPCR_F16 : FPST_FPCR_A64)
+           a->esz == MO_16 ? FPST_FPCR_F16_A64 : FPST_FPCR_A64)
 
 /*
  *** SVE Floating Point Unary Operations Predicated Group
@@ -3902,17 +3902,17 @@ TRANS_FEAT(FCVT_sd, aa64_sve, gen_gvec_fpst_arg_zpz,
            gen_helper_sve_fcvt_sd, a, 0, FPST_FPCR_A64)
 
 TRANS_FEAT(FCVTZS_hh, aa64_sve, gen_gvec_fpst_arg_zpz,
-           gen_helper_sve_fcvtzs_hh, a, 0, FPST_FPCR_F16)
+           gen_helper_sve_fcvtzs_hh, a, 0, FPST_FPCR_F16_A64)
 TRANS_FEAT(FCVTZU_hh, aa64_sve, gen_gvec_fpst_arg_zpz,
-           gen_helper_sve_fcvtzu_hh, a, 0, FPST_FPCR_F16)
+           gen_helper_sve_fcvtzu_hh, a, 0, FPST_FPCR_F16_A64)
 TRANS_FEAT(FCVTZS_hs, aa64_sve, gen_gvec_fpst_arg_zpz,
-           gen_helper_sve_fcvtzs_hs, a, 0, FPST_FPCR_F16)
+           gen_helper_sve_fcvtzs_hs, a, 0, FPST_FPCR_F16_A64)
 TRANS_FEAT(FCVTZU_hs, aa64_sve, gen_gvec_fpst_arg_zpz,
-           gen_helper_sve_fcvtzu_hs, a, 0, FPST_FPCR_F16)
+           gen_helper_sve_fcvtzu_hs, a, 0, FPST_FPCR_F16_A64)
 TRANS_FEAT(FCVTZS_hd, aa64_sve, gen_gvec_fpst_arg_zpz,
-           gen_helper_sve_fcvtzs_hd, a, 0, FPST_FPCR_F16)
+           gen_helper_sve_fcvtzs_hd, a, 0, FPST_FPCR_F16_A64)
 TRANS_FEAT(FCVTZU_hd, aa64_sve, gen_gvec_fpst_arg_zpz,
-           gen_helper_sve_fcvtzu_hd, a, 0, FPST_FPCR_F16)
+           gen_helper_sve_fcvtzu_hd, a, 0, FPST_FPCR_F16_A64)
 
 TRANS_FEAT(FCVTZS_ss, aa64_sve, gen_gvec_fpst_arg_zpz,
            gen_helper_sve_fcvtzs_ss, a, 0, FPST_FPCR_A64)
@@ -3939,7 +3939,7 @@ static gen_helper_gvec_3_ptr * const frint_fns[] = {
     gen_helper_sve_frint_d
 };
 TRANS_FEAT(FRINTI, aa64_sve, gen_gvec_fpst_arg_zpz, frint_fns[a->esz],
-           a, 0, a->esz == MO_16 ? FPST_FPCR_F16 : FPST_FPCR_A64)
+           a, 0, a->esz == MO_16 ? FPST_FPCR_F16_A64 : FPST_FPCR_A64)
 
 static gen_helper_gvec_3_ptr * const frintx_fns[] = {
     NULL,
@@ -3948,7 +3948,7 @@ static gen_helper_gvec_3_ptr * const frintx_fns[] = {
     gen_helper_sve_frintx_d
 };
 TRANS_FEAT(FRINTX, aa64_sve, gen_gvec_fpst_arg_zpz, frintx_fns[a->esz],
-           a, 0, a->esz == MO_16 ? FPST_FPCR_F16 : FPST_FPCR_A64);
+           a, 0, a->esz == MO_16 ? FPST_FPCR_F16_A64 : FPST_FPCR_A64);
 
 static bool do_frint_mode(DisasContext *s, arg_rpr_esz *a,
                           ARMFPRounding mode, gen_helper_gvec_3_ptr *fn)
@@ -3965,7 +3965,7 @@ static bool do_frint_mode(DisasContext *s, arg_rpr_esz *a,
     }
 
     vsz = vec_full_reg_size(s);
-    status = fpstatus_ptr(a->esz == MO_16 ? FPST_FPCR_F16 : FPST_FPCR_A64);
+    status = fpstatus_ptr(a->esz == MO_16 ? FPST_FPCR_F16_A64 : FPST_FPCR_A64);
     tmode = gen_set_rmode(mode, status);
 
     tcg_gen_gvec_3_ptr(vec_full_reg_offset(s, a->rd),
@@ -3993,21 +3993,21 @@ static gen_helper_gvec_3_ptr * const frecpx_fns[] = {
     gen_helper_sve_frecpx_s, gen_helper_sve_frecpx_d,
 };
 TRANS_FEAT(FRECPX, aa64_sve, gen_gvec_fpst_arg_zpz, frecpx_fns[a->esz],
-           a, 0, a->esz == MO_16 ? FPST_FPCR_F16 : FPST_FPCR_A64)
+           a, 0, a->esz == MO_16 ? FPST_FPCR_F16_A64 : FPST_FPCR_A64)
 
 static gen_helper_gvec_3_ptr * const fsqrt_fns[] = {
     NULL,                   gen_helper_sve_fsqrt_h,
     gen_helper_sve_fsqrt_s, gen_helper_sve_fsqrt_d,
 };
 TRANS_FEAT(FSQRT, aa64_sve, gen_gvec_fpst_arg_zpz, fsqrt_fns[a->esz],
-           a, 0, a->esz == MO_16 ? FPST_FPCR_F16 : FPST_FPCR_A64)
+           a, 0, a->esz == MO_16 ? FPST_FPCR_F16_A64 : FPST_FPCR_A64)
 
 TRANS_FEAT(SCVTF_hh, aa64_sve, gen_gvec_fpst_arg_zpz,
-           gen_helper_sve_scvt_hh, a, 0, FPST_FPCR_F16)
+           gen_helper_sve_scvt_hh, a, 0, FPST_FPCR_F16_A64)
 TRANS_FEAT(SCVTF_sh, aa64_sve, gen_gvec_fpst_arg_zpz,
-           gen_helper_sve_scvt_sh, a, 0, FPST_FPCR_F16)
+           gen_helper_sve_scvt_sh, a, 0, FPST_FPCR_F16_A64)
 TRANS_FEAT(SCVTF_dh, aa64_sve, gen_gvec_fpst_arg_zpz,
-           gen_helper_sve_scvt_dh, a, 0, FPST_FPCR_F16)
+           gen_helper_sve_scvt_dh, a, 0, FPST_FPCR_F16_A64)
 
 TRANS_FEAT(SCVTF_ss, aa64_sve, gen_gvec_fpst_arg_zpz,
            gen_helper_sve_scvt_ss, a, 0, FPST_FPCR_A64)
@@ -4020,11 +4020,11 @@ TRANS_FEAT(SCVTF_dd, aa64_sve, gen_gvec_fpst_arg_zpz,
            gen_helper_sve_scvt_dd, a, 0, FPST_FPCR_A64)
 
 TRANS_FEAT(UCVTF_hh, aa64_sve, gen_gvec_fpst_arg_zpz,
-           gen_helper_sve_ucvt_hh, a, 0, FPST_FPCR_F16)
+           gen_helper_sve_ucvt_hh, a, 0, FPST_FPCR_F16_A64)
 TRANS_FEAT(UCVTF_sh, aa64_sve, gen_gvec_fpst_arg_zpz,
-           gen_helper_sve_ucvt_sh, a, 0, FPST_FPCR_F16)
+           gen_helper_sve_ucvt_sh, a, 0, FPST_FPCR_F16_A64)
 TRANS_FEAT(UCVTF_dh, aa64_sve, gen_gvec_fpst_arg_zpz,
-           gen_helper_sve_ucvt_dh, a, 0, FPST_FPCR_F16)
+           gen_helper_sve_ucvt_dh, a, 0, FPST_FPCR_F16_A64)
 
 TRANS_FEAT(UCVTF_ss, aa64_sve, gen_gvec_fpst_arg_zpz,
            gen_helper_sve_ucvt_ss, a, 0, FPST_FPCR_A64)
@@ -7057,7 +7057,7 @@ static gen_helper_gvec_3_ptr * const flogb_fns[] = {
     gen_helper_flogb_s, gen_helper_flogb_d
 };
 TRANS_FEAT(FLOGB, aa64_sve2, gen_gvec_fpst_arg_zpz, flogb_fns[a->esz],
-           a, 0, a->esz == MO_16 ? FPST_FPCR_F16 : FPST_FPCR_A64)
+           a, 0, a->esz == MO_16 ? FPST_FPCR_F16_A64 : FPST_FPCR_A64)
 
 static bool do_FMLAL_zzzw(DisasContext *s, arg_rrrr_esz *a, bool sub, bool sel)
 {
-- 
2.34.1




* [PATCH 18/76] target/arm: Remove now-unused vfp.fp_status_f16 and FPST_FPCR_F16
  2025-01-24 16:27 [PATCH 00/76] target/arm: Implement FEAT_AFP and FEAT_RPRES Peter Maydell
                   ` (16 preceding siblings ...)
  2025-01-24 16:27 ` [PATCH 17/76] target/arm: Use FPST_FPCR_F16_A64 in A64 decoder Peter Maydell
@ 2025-01-24 16:27 ` Peter Maydell
  2025-01-25 15:23   ` Richard Henderson
  2025-01-24 16:27 ` [PATCH 19/76] fpu: Rename float_flag_input_denormal to float_flag_input_denormal_flushed Peter Maydell
                   ` (59 subsequent siblings)
  77 siblings, 1 reply; 167+ messages in thread
From: Peter Maydell @ 2025-01-24 16:27 UTC (permalink / raw)
  To: qemu-arm, qemu-devel

Now that we have moved all the uses of vfp.fp_status_f16 and
FPST_FPCR_F16 to the new A32-specific or A64-specific fields, we can
remove the old ones.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/cpu.h           | 2 --
 target/arm/tcg/translate.h | 6 ------
 target/arm/cpu.c           | 1 -
 target/arm/vfp_helper.c    | 7 -------
 4 files changed, 16 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index be409c5c76e..2213c277348 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -635,7 +635,6 @@ typedef struct CPUArchState {
          *
          *  fp_status_a32: is the "normal" fp status for AArch32 insns
          *  fp_status_a64: is the "normal" fp status for AArch64 insns
-         *  fp_status_fp16: used for half-precision calculations
          *  fp_status_fp16_a32: used for AArch32 half-precision calculations
          *  fp_status_fp16_a64: used for AArch64 half-precision calculations
          *  standard_fp_status : the ARM "Standard FPSCR Value"
@@ -663,7 +662,6 @@ typedef struct CPUArchState {
          */
         float_status fp_status_a32;
         float_status fp_status_a64;
-        float_status fp_status_f16;
         float_status fp_status_f16_a32;
         float_status fp_status_f16_a64;
         float_status standard_fp_status;
diff --git a/target/arm/tcg/translate.h b/target/arm/tcg/translate.h
index 57e5d92cd60..ec4c0cf03fc 100644
--- a/target/arm/tcg/translate.h
+++ b/target/arm/tcg/translate.h
@@ -672,7 +672,6 @@ static inline CPUARMTBFlags arm_tbflags_from_tb(const TranslationBlock *tb)
 typedef enum ARMFPStatusFlavour {
     FPST_FPCR_A32,
     FPST_FPCR_A64,
-    FPST_FPCR_F16,
     FPST_FPCR_F16_A32,
     FPST_FPCR_F16_A64,
     FPST_STD,
@@ -691,8 +690,6 @@ typedef enum ARMFPStatusFlavour {
  *   for AArch32 non-FP16 operations controlled by the FPCR
  * FPST_FPCR_A64
  *   for AArch64 non-FP16 operations controlled by the FPCR
- * FPST_FPCR_F16
- *   for operations controlled by the FPCR where FPCR.FZ16 is to be used
  * FPST_FPCR_F16_A32
  *   for AArch32 operations controlled by the FPCR where FPCR.FZ16 is to be used
  * FPST_FPCR_F16_A64
@@ -714,9 +711,6 @@ static inline TCGv_ptr fpstatus_ptr(ARMFPStatusFlavour flavour)
     case FPST_FPCR_A64:
         offset = offsetof(CPUARMState, vfp.fp_status_a64);
         break;
-    case FPST_FPCR_F16:
-        offset = offsetof(CPUARMState, vfp.fp_status_f16);
-        break;
     case FPST_FPCR_F16_A32:
         offset = offsetof(CPUARMState, vfp.fp_status_f16_a32);
         break;
diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index ff8514edc6d..7a83b9ee34f 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -575,7 +575,6 @@ static void arm_cpu_reset_hold(Object *obj, ResetType type)
     arm_set_default_fp_behaviours(&env->vfp.fp_status_a32);
     arm_set_default_fp_behaviours(&env->vfp.fp_status_a64);
     arm_set_default_fp_behaviours(&env->vfp.standard_fp_status);
-    arm_set_default_fp_behaviours(&env->vfp.fp_status_f16);
     arm_set_default_fp_behaviours(&env->vfp.fp_status_f16_a32);
     arm_set_default_fp_behaviours(&env->vfp.fp_status_f16_a64);
     arm_set_default_fp_behaviours(&env->vfp.standard_fp_status_f16);
diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
index f3aa80bbfb6..3ed69d73698 100644
--- a/target/arm/vfp_helper.c
+++ b/target/arm/vfp_helper.c
@@ -67,8 +67,6 @@ static uint32_t vfp_get_fpsr_from_host(CPUARMState *env)
     i |= get_float_exception_flags(&env->vfp.fp_status_a64);
     i |= get_float_exception_flags(&env->vfp.standard_fp_status);
     /* FZ16 does not generate an input denormal exception.  */
-    i |= (get_float_exception_flags(&env->vfp.fp_status_f16)
-          & ~float_flag_input_denormal);
     i |= (get_float_exception_flags(&env->vfp.fp_status_f16_a32)
           & ~float_flag_input_denormal);
     i |= (get_float_exception_flags(&env->vfp.fp_status_f16_a64)
@@ -87,7 +85,6 @@ static void vfp_clear_float_status_exc_flags(CPUARMState *env)
      */
     set_float_exception_flags(0, &env->vfp.fp_status_a32);
     set_float_exception_flags(0, &env->vfp.fp_status_a64);
-    set_float_exception_flags(0, &env->vfp.fp_status_f16);
     set_float_exception_flags(0, &env->vfp.fp_status_f16_a32);
     set_float_exception_flags(0, &env->vfp.fp_status_f16_a64);
     set_float_exception_flags(0, &env->vfp.standard_fp_status);
@@ -118,17 +115,14 @@ static void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val, uint32_t mask)
         }
         set_float_rounding_mode(i, &env->vfp.fp_status_a32);
         set_float_rounding_mode(i, &env->vfp.fp_status_a64);
-        set_float_rounding_mode(i, &env->vfp.fp_status_f16);
         set_float_rounding_mode(i, &env->vfp.fp_status_f16_a32);
         set_float_rounding_mode(i, &env->vfp.fp_status_f16_a64);
     }
     if (changed & FPCR_FZ16) {
         bool ftz_enabled = val & FPCR_FZ16;
-        set_flush_to_zero(ftz_enabled, &env->vfp.fp_status_f16);
         set_flush_to_zero(ftz_enabled, &env->vfp.fp_status_f16_a32);
         set_flush_to_zero(ftz_enabled, &env->vfp.fp_status_f16_a64);
         set_flush_to_zero(ftz_enabled, &env->vfp.standard_fp_status_f16);
-        set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status_f16);
         set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status_f16_a32);
         set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status_f16_a64);
         set_flush_inputs_to_zero(ftz_enabled, &env->vfp.standard_fp_status_f16);
@@ -144,7 +138,6 @@ static void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val, uint32_t mask)
         bool dnan_enabled = val & FPCR_DN;
         set_default_nan_mode(dnan_enabled, &env->vfp.fp_status_a32);
         set_default_nan_mode(dnan_enabled, &env->vfp.fp_status_a64);
-        set_default_nan_mode(dnan_enabled, &env->vfp.fp_status_f16);
         set_default_nan_mode(dnan_enabled, &env->vfp.fp_status_f16_a32);
         set_default_nan_mode(dnan_enabled, &env->vfp.fp_status_f16_a64);
     }
-- 
2.34.1




* [PATCH 19/76] fpu: Rename float_flag_input_denormal to float_flag_input_denormal_flushed
  2025-01-24 16:27 [PATCH 00/76] target/arm: Implement FEAT_AFP and FEAT_RPRES Peter Maydell
                   ` (17 preceding siblings ...)
  2025-01-24 16:27 ` [PATCH 18/76] target/arm: Remove now-unused vfp.fp_status_f16 and FPST_FPCR_F16 Peter Maydell
@ 2025-01-24 16:27 ` Peter Maydell
  2025-01-25 15:25   ` Richard Henderson
  2025-01-24 16:27 ` [PATCH 20/76] fpu: Rename float_flag_output_denormal to float_flag_output_denormal_flushed Peter Maydell
                   ` (58 subsequent siblings)
  77 siblings, 1 reply; 167+ messages in thread
From: Peter Maydell @ 2025-01-24 16:27 UTC (permalink / raw)
  To: qemu-arm, qemu-devel

Our float_flag_input_denormal exception flag is set when the fpu code
flushes an input denormal to zero.  This is what many guest
architectures (e.g. classic Arm behaviour) require, but it is not the
only denormal-related reason we might want to set an exception flag.
The x86 behaviour (which we do not currently model correctly) wants
to see an exception flag when a denormal input is *not* flushed to
zero and is actually used in an arithmetic operation. Arm's FEAT_AFP
also wants these semantics.

Rename float_flag_input_denormal to float_flag_input_denormal_flushed
to make it clearer when it is set and to allow us to add a new
float_flag_input_denormal_used next to it for the x86/FEAT_AFP
semantics.
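
As a minimal sketch of the distinction between the two flags (not the
actual softfloat code): the helper below handles one float32 input and
assumes the input really is consumed by the operation.  The sketch_*
and SKETCH_* names are hypothetical, and the value chosen for the
"used" flag is an assumption; only 0x0020 for the "flushed" flag comes
from softfloat-types.h.

```c
#include <stdbool.h>
#include <stdint.h>

enum {
    /* Same value as float_flag_input_denormal_flushed in the patch. */
    SKETCH_FLAG_INPUT_DENORMAL_FLUSHED = 0x0020,
    /* Hypothetical value: the real flag is added by a later patch. */
    SKETCH_FLAG_INPUT_DENORMAL_USED    = 0x4000,
};

/* Cut-down stand-in for float_status; illustration only. */
typedef struct {
    bool flush_inputs_to_zero;
    unsigned flags;
} SketchStatus;

/* A float32 is denormal when its exponent is 0 and its fraction is not. */
static bool sketch_f32_is_denormal(uint32_t bits)
{
    return (bits & 0x7f800000u) == 0 && (bits & 0x007fffffu) != 0;
}

/*
 * Prepare one input operand.  Either the denormal is squashed to a
 * zero of the same sign ("flushed"), or it is passed through and
 * consumed as-is ("used").  Exactly one of the two flags is raised.
 */
static uint32_t sketch_prepare_input(uint32_t bits, SketchStatus *s)
{
    if (!sketch_f32_is_denormal(bits)) {
        return bits;
    }
    if (s->flush_inputs_to_zero) {
        s->flags |= SKETCH_FLAG_INPUT_DENORMAL_FLUSHED;
        return bits & 0x80000000u;   /* keep only the sign bit */
    }
    s->flags |= SKETCH_FLAG_INPUT_DENORMAL_USED;
    return bits;
}
```

With flush_inputs_to_zero set, a denormal input becomes a signed zero
and only the "flushed" flag is raised; with it clear, the value is
unchanged and only the "used" flag is raised.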

Commit created with
 for f in `git grep -l float_flag_input_denormal`; do sed -i -e 's/float_flag_input_denormal/float_flag_input_denormal_flushed/' $f; done

and manual editing of softfloat-types.h and softfloat.c to clean
up the indentation afterwards and to fix a comment which wasn't
using the full name of the flag.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 include/fpu/softfloat-types.h |  5 +++--
 fpu/softfloat.c               |  4 ++--
 target/arm/tcg/sve_helper.c   |  6 +++---
 target/arm/vfp_helper.c       | 10 +++++-----
 target/i386/tcg/fpu_helper.c  |  6 +++---
 target/mips/tcg/msa_helper.c  |  2 +-
 target/rx/op_helper.c         |  2 +-
 fpu/softfloat-parts.c.inc     |  2 +-
 8 files changed, 19 insertions(+), 18 deletions(-)

diff --git a/include/fpu/softfloat-types.h b/include/fpu/softfloat-types.h
index d8f831c331d..77bc172a074 100644
--- a/include/fpu/softfloat-types.h
+++ b/include/fpu/softfloat-types.h
@@ -154,7 +154,8 @@ enum {
     float_flag_overflow        = 0x0004,
     float_flag_underflow       = 0x0008,
     float_flag_inexact         = 0x0010,
-    float_flag_input_denormal  = 0x0020,
+    /* We flushed an input denormal to 0 (because of flush_inputs_to_zero) */
+    float_flag_input_denormal_flushed = 0x0020,
     float_flag_output_denormal = 0x0040,
     float_flag_invalid_isi     = 0x0080,  /* inf - inf */
     float_flag_invalid_imz     = 0x0100,  /* inf * 0 */
@@ -312,7 +313,7 @@ typedef struct float_status {
     bool tininess_before_rounding;
     /* should denormalised results go to zero and set the inexact flag? */
     bool flush_to_zero;
-    /* should denormalised inputs go to zero and set the input_denormal flag? */
+    /* should denormalised inputs go to zero and set input_denormal_flushed? */
     bool flush_inputs_to_zero;
     bool default_nan_mode;
     /*
diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 8d75d668172..648050be6fb 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -132,7 +132,7 @@ this code that are retained.
         if (unlikely(soft_t ## _is_denormal(*a))) {                     \
             *a = soft_t ## _set_sign(soft_t ## _zero,                   \
                                      soft_t ## _is_neg(*a));            \
-            float_raise(float_flag_input_denormal, s);                  \
+            float_raise(float_flag_input_denormal_flushed, s);          \
         }                                                               \
     }
 
@@ -4848,7 +4848,7 @@ float128 float128_silence_nan(float128 a, float_status *status)
 static bool parts_squash_denormal(FloatParts64 p, float_status *status)
 {
     if (p.exp == 0 && p.frac != 0) {
-        float_raise(float_flag_input_denormal, status);
+        float_raise(float_flag_input_denormal_flushed, status);
         return true;
     }
 
diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c
index d0865dece35..9837c5bc7ac 100644
--- a/target/arm/tcg/sve_helper.c
+++ b/target/arm/tcg/sve_helper.c
@@ -4658,7 +4658,7 @@ static int16_t do_float16_logb_as_int(float16 a, float_status *s)
                 return -15 - clz32(frac);
             }
             /* flush to zero */
-            float_raise(float_flag_input_denormal, s);
+            float_raise(float_flag_input_denormal_flushed, s);
         }
     } else if (unlikely(exp == 0x1f)) {
         if (frac == 0) {
@@ -4686,7 +4686,7 @@ static int32_t do_float32_logb_as_int(float32 a, float_status *s)
                 return -127 - clz32(frac);
             }
             /* flush to zero */
-            float_raise(float_flag_input_denormal, s);
+            float_raise(float_flag_input_denormal_flushed, s);
         }
     } else if (unlikely(exp == 0xff)) {
         if (frac == 0) {
@@ -4714,7 +4714,7 @@ static int64_t do_float64_logb_as_int(float64 a, float_status *s)
                 return -1023 - clz64(frac);
             }
             /* flush to zero */
-            float_raise(float_flag_input_denormal, s);
+            float_raise(float_flag_input_denormal_flushed, s);
         }
     } else if (unlikely(exp == 0x7ff)) {
         if (frac == 0) {
diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
index 3ed69d73698..444702a4600 100644
--- a/target/arm/vfp_helper.c
+++ b/target/arm/vfp_helper.c
@@ -53,7 +53,7 @@ static inline uint32_t vfp_exceptbits_from_host(int host_bits)
     if (host_bits & float_flag_inexact) {
         target_bits |= FPSR_IXC;
     }
-    if (host_bits & float_flag_input_denormal) {
+    if (host_bits & float_flag_input_denormal_flushed) {
         target_bits |= FPSR_IDC;
     }
     return target_bits;
@@ -68,11 +68,11 @@ static uint32_t vfp_get_fpsr_from_host(CPUARMState *env)
     i |= get_float_exception_flags(&env->vfp.standard_fp_status);
     /* FZ16 does not generate an input denormal exception.  */
     i |= (get_float_exception_flags(&env->vfp.fp_status_f16_a32)
-          & ~float_flag_input_denormal);
+          & ~float_flag_input_denormal_flushed);
     i |= (get_float_exception_flags(&env->vfp.fp_status_f16_a64)
-          & ~float_flag_input_denormal);
+          & ~float_flag_input_denormal_flushed);
     i |= (get_float_exception_flags(&env->vfp.standard_fp_status_f16)
-          & ~float_flag_input_denormal);
+          & ~float_flag_input_denormal_flushed);
     return vfp_exceptbits_from_host(i);
 }
 
@@ -1133,7 +1133,7 @@ uint64_t HELPER(fjcvtzs)(float64 value, float_status *status)
 
     /* Normal inexact, denormal with flush-to-zero, or overflow or NaN */
     inexact = e_new & (float_flag_inexact |
-                       float_flag_input_denormal |
+                       float_flag_input_denormal_flushed |
                        float_flag_invalid);
 
     /* While not inexact for IEEE FP, -0.0 is inexact for JavaScript. */
diff --git a/target/i386/tcg/fpu_helper.c b/target/i386/tcg/fpu_helper.c
index e0a072b4ebc..7151e809643 100644
--- a/target/i386/tcg/fpu_helper.c
+++ b/target/i386/tcg/fpu_helper.c
@@ -207,7 +207,7 @@ static void merge_exception_flags(CPUX86State *env, uint8_t old_flags)
                        (new_flags & float_flag_overflow ? FPUS_OE : 0) |
                        (new_flags & float_flag_underflow ? FPUS_UE : 0) |
                        (new_flags & float_flag_inexact ? FPUS_PE : 0) |
-                       (new_flags & float_flag_input_denormal ? FPUS_DE : 0)));
+                       (new_flags & float_flag_input_denormal_flushed ? FPUS_DE : 0)));
 }
 
 static inline floatx80 helper_fdiv(CPUX86State *env, floatx80 a, floatx80 b)
@@ -1832,7 +1832,7 @@ void helper_fxtract(CPUX86State *env)
             int shift = clz64(temp.l.lower);
             temp.l.lower <<= shift;
             expdif = 1 - EXPBIAS - shift;
-            float_raise(float_flag_input_denormal, &env->fp_status);
+            float_raise(float_flag_input_denormal_flushed, &env->fp_status);
         } else {
             expdif = EXPD(temp) - EXPBIAS;
         }
@@ -3261,7 +3261,7 @@ void update_mxcsr_from_sse_status(CPUX86State *env)
     uint8_t flags = get_float_exception_flags(&env->sse_status);
     /*
      * The MXCSR denormal flag has opposite semantics to
-     * float_flag_input_denormal (the softfloat code sets that flag
+     * float_flag_input_denormal_flushed (the softfloat code sets that flag
      * only when flushing input denormals to zero, but SSE sets it
      * only when not flushing them to zero), so is not converted
      * here.
diff --git a/target/mips/tcg/msa_helper.c b/target/mips/tcg/msa_helper.c
index 1d40383ca4f..aeab6a1d8b3 100644
--- a/target/mips/tcg/msa_helper.c
+++ b/target/mips/tcg/msa_helper.c
@@ -6231,7 +6231,7 @@ static inline int update_msacsr(CPUMIPSState *env, int action, int denormal)
     enable = GET_FP_ENABLE(env->active_tc.msacsr) | FP_UNIMPLEMENTED;
 
     /* Set Inexact (I) when flushing inputs to zero */
-    if ((ieee_exception_flags & float_flag_input_denormal) &&
+    if ((ieee_exception_flags & float_flag_input_denormal_flushed) &&
             (env->active_tc.msacsr & MSACSR_FS_MASK) != 0) {
         if (action & CLEAR_IS_INEXACT) {
             mips_exception_flags &= ~FP_INEXACT;
diff --git a/target/rx/op_helper.c b/target/rx/op_helper.c
index 691a12b2be1..59dd1ae6128 100644
--- a/target/rx/op_helper.c
+++ b/target/rx/op_helper.c
@@ -99,7 +99,7 @@ static void update_fpsw(CPURXState *env, float32 ret, uintptr_t retaddr)
         if (xcpt & float_flag_inexact) {
             SET_FPSW(X);
         }
-        if ((xcpt & (float_flag_input_denormal
+        if ((xcpt & (float_flag_input_denormal_flushed
                      | float_flag_output_denormal))
             && !FIELD_EX32(env->fpsw, FPSW, DN)) {
             env->fpsw = FIELD_DP32(env->fpsw, FPSW, CE, 1);
diff --git a/fpu/softfloat-parts.c.inc b/fpu/softfloat-parts.c.inc
index 4bb341b2f94..ec2467e9fff 100644
--- a/fpu/softfloat-parts.c.inc
+++ b/fpu/softfloat-parts.c.inc
@@ -199,7 +199,7 @@ static void partsN(canonicalize)(FloatPartsN *p, float_status *status,
         if (likely(frac_eqz(p))) {
             p->cls = float_class_zero;
         } else if (status->flush_inputs_to_zero) {
-            float_raise(float_flag_input_denormal, status);
+            float_raise(float_flag_input_denormal_flushed, status);
             p->cls = float_class_zero;
             frac_clear(p);
         } else {
-- 
2.34.1




* [PATCH 20/76] fpu: Rename float_flag_output_denormal to float_flag_output_denormal_flushed
  2025-01-24 16:27 [PATCH 00/76] target/arm: Implement FEAT_AFP and FEAT_RPRES Peter Maydell
                   ` (18 preceding siblings ...)
  2025-01-24 16:27 ` [PATCH 19/76] fpu: Rename float_flag_input_denormal to float_flag_input_denormal_flushed Peter Maydell
@ 2025-01-24 16:27 ` Peter Maydell
  2025-01-25 15:26   ` Richard Henderson
  2025-01-24 16:27 ` [PATCH 21/76] fpu: Fix a comment in softfloat-types.h Peter Maydell
                   ` (57 subsequent siblings)
  77 siblings, 1 reply; 167+ messages in thread
From: Peter Maydell @ 2025-01-24 16:27 UTC (permalink / raw)
  To: qemu-arm, qemu-devel

Our float_flag_output_denormal exception flag is set when
the fpu code flushes an output denormal to zero. Rename
it to float_flag_output_denormal_flushed:
 * this keeps it parallel with the flag for flushing
   input denormals, which we just renamed
 * it makes it clearer that it doesn't mean "set when
   the output is a denormal"
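
To illustrate how a target consumes this flag, here is a hedged sketch
modelled loosely on the Arm vfp_exceptbits_from_host() hunk in this
patch.  The sketch_* and SKETCH_* names are hypothetical; the flag
values are the ones shown in softfloat-types.h, and the UFC bit
position (bit 3) is taken from the Arm FPSR layout.

```c
#include <stdint.h>

enum {
    sketch_flag_underflow               = 0x0008,
    sketch_flag_output_denormal_flushed = 0x0040,
};

/* FPSR.UFC, the cumulative underflow bit (bit 3 of the Arm FPSR). */
#define SKETCH_FPSR_UFC (1u << 3)

/*
 * Fold the host flags into the guest underflow bit.  A result that
 * was flushed to zero is reported as an underflow, exactly like an
 * ordinary underflow; a denormal result that was *not* flushed does
 * not set the "flushed" flag at all.
 */
static uint32_t sketch_ufc_from_host(int host_bits)
{
    uint32_t target_bits = 0;

    if (host_bits & (sketch_flag_underflow |
                     sketch_flag_output_denormal_flushed)) {
        target_bits |= SKETCH_FPSR_UFC;
    }
    return target_bits;
}
```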

Commit created with
 for f in `git grep -l float_flag_output_denormal`; do sed -i -e 's/float_flag_output_denormal/float_flag_output_denormal_flushed/' $f; done

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 include/fpu/softfloat-types.h | 3 ++-
 fpu/softfloat.c               | 2 +-
 target/arm/vfp_helper.c       | 2 +-
 target/i386/tcg/fpu_helper.c  | 2 +-
 target/m68k/fpu_helper.c      | 2 +-
 target/mips/tcg/msa_helper.c  | 2 +-
 target/rx/op_helper.c         | 2 +-
 target/tricore/fpu_helper.c   | 6 +++---
 fpu/softfloat-parts.c.inc     | 2 +-
 9 files changed, 12 insertions(+), 11 deletions(-)

diff --git a/include/fpu/softfloat-types.h b/include/fpu/softfloat-types.h
index 77bc172a074..4a806e3981a 100644
--- a/include/fpu/softfloat-types.h
+++ b/include/fpu/softfloat-types.h
@@ -156,7 +156,8 @@ enum {
     float_flag_inexact         = 0x0010,
     /* We flushed an input denormal to 0 (because of flush_inputs_to_zero) */
     float_flag_input_denormal_flushed = 0x0020,
-    float_flag_output_denormal = 0x0040,
+    /* We flushed an output denormal to 0 (because of flush_to_zero) */
+    float_flag_output_denormal_flushed = 0x0040,
     float_flag_invalid_isi     = 0x0080,  /* inf - inf */
     float_flag_invalid_imz     = 0x0100,  /* inf * 0 */
     float_flag_invalid_idi     = 0x0200,  /* inf / inf */
diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 648050be6fb..26f3a8dc87e 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -5017,7 +5017,7 @@ floatx80 roundAndPackFloatx80(FloatX80RoundPrec roundingPrecision, bool zSign,
         }
         if ( zExp <= 0 ) {
             if (status->flush_to_zero) {
-                float_raise(float_flag_output_denormal, status);
+                float_raise(float_flag_output_denormal_flushed, status);
                 return packFloatx80(zSign, 0, 0);
             }
             isTiny = status->tininess_before_rounding
diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
index 444702a4600..3c8f3e65887 100644
--- a/target/arm/vfp_helper.c
+++ b/target/arm/vfp_helper.c
@@ -47,7 +47,7 @@ static inline uint32_t vfp_exceptbits_from_host(int host_bits)
     if (host_bits & float_flag_overflow) {
         target_bits |= FPSR_OFC;
     }
-    if (host_bits & (float_flag_underflow | float_flag_output_denormal)) {
+    if (host_bits & (float_flag_underflow | float_flag_output_denormal_flushed)) {
         target_bits |= FPSR_UFC;
     }
     if (host_bits & float_flag_inexact) {
diff --git a/target/i386/tcg/fpu_helper.c b/target/i386/tcg/fpu_helper.c
index 7151e809643..de6d0b252ec 100644
--- a/target/i386/tcg/fpu_helper.c
+++ b/target/i386/tcg/fpu_helper.c
@@ -3271,7 +3271,7 @@ void update_mxcsr_from_sse_status(CPUX86State *env)
                    (flags & float_flag_overflow ? FPUS_OE : 0) |
                    (flags & float_flag_underflow ? FPUS_UE : 0) |
                    (flags & float_flag_inexact ? FPUS_PE : 0) |
-                   (flags & float_flag_output_denormal ? FPUS_UE | FPUS_PE :
+                   (flags & float_flag_output_denormal_flushed ? FPUS_UE | FPUS_PE :
                     0));
 }
 
diff --git a/target/m68k/fpu_helper.c b/target/m68k/fpu_helper.c
index e3f4a188501..339b73ad7dc 100644
--- a/target/m68k/fpu_helper.c
+++ b/target/m68k/fpu_helper.c
@@ -175,7 +175,7 @@ static int cpu_m68k_exceptbits_from_host(int host_bits)
     if (host_bits & float_flag_overflow) {
         target_bits |= 0x40;
     }
-    if (host_bits & (float_flag_underflow | float_flag_output_denormal)) {
+    if (host_bits & (float_flag_underflow | float_flag_output_denormal_flushed)) {
         target_bits |= 0x20;
     }
     if (host_bits & float_flag_divbyzero) {
diff --git a/target/mips/tcg/msa_helper.c b/target/mips/tcg/msa_helper.c
index aeab6a1d8b3..ec38d9fde5e 100644
--- a/target/mips/tcg/msa_helper.c
+++ b/target/mips/tcg/msa_helper.c
@@ -6241,7 +6241,7 @@ static inline int update_msacsr(CPUMIPSState *env, int action, int denormal)
     }
 
     /* Set Inexact (I) and Underflow (U) when flushing outputs to zero */
-    if ((ieee_exception_flags & float_flag_output_denormal) &&
+    if ((ieee_exception_flags & float_flag_output_denormal_flushed) &&
             (env->active_tc.msacsr & MSACSR_FS_MASK) != 0) {
         mips_exception_flags |= FP_INEXACT;
         if (action & CLEAR_FS_UNDERFLOW) {
diff --git a/target/rx/op_helper.c b/target/rx/op_helper.c
index 59dd1ae6128..b3ed822dd11 100644
--- a/target/rx/op_helper.c
+++ b/target/rx/op_helper.c
@@ -100,7 +100,7 @@ static void update_fpsw(CPURXState *env, float32 ret, uintptr_t retaddr)
             SET_FPSW(X);
         }
         if ((xcpt & (float_flag_input_denormal_flushed
-                     | float_flag_output_denormal))
+                     | float_flag_output_denormal_flushed))
             && !FIELD_EX32(env->fpsw, FPSW, DN)) {
             env->fpsw = FIELD_DP32(env->fpsw, FPSW, CE, 1);
         }
diff --git a/target/tricore/fpu_helper.c b/target/tricore/fpu_helper.c
index 5d38aea143a..1b72dcc5f5c 100644
--- a/target/tricore/fpu_helper.c
+++ b/target/tricore/fpu_helper.c
@@ -43,7 +43,7 @@ static inline uint8_t f_get_excp_flags(CPUTriCoreState *env)
            & (float_flag_invalid
               | float_flag_overflow
               | float_flag_underflow
-              | float_flag_output_denormal
+              | float_flag_output_denormal_flushed
               | float_flag_divbyzero
               | float_flag_inexact);
 }
@@ -99,7 +99,7 @@ static void f_update_psw_flags(CPUTriCoreState *env, uint8_t flags)
         some_excp = 1;
     }
 
-    if (flags & float_flag_underflow || flags & float_flag_output_denormal) {
+    if (flags & float_flag_underflow || flags & float_flag_output_denormal_flushed) {
         env->FPU_FU = 1 << 31;
         some_excp = 1;
     }
@@ -109,7 +109,7 @@ static void f_update_psw_flags(CPUTriCoreState *env, uint8_t flags)
         some_excp = 1;
     }
 
-    if (flags & float_flag_inexact || flags & float_flag_output_denormal) {
+    if (flags & float_flag_inexact || flags & float_flag_output_denormal_flushed) {
         env->PSW |= 1 << 26;
         some_excp = 1;
     }
diff --git a/fpu/softfloat-parts.c.inc b/fpu/softfloat-parts.c.inc
index ec2467e9fff..73621f4a970 100644
--- a/fpu/softfloat-parts.c.inc
+++ b/fpu/softfloat-parts.c.inc
@@ -335,7 +335,7 @@ static void partsN(uncanon_normal)(FloatPartsN *p, float_status *s,
         }
         frac_shr(p, frac_shift);
     } else if (s->flush_to_zero) {
-        flags |= float_flag_output_denormal;
+        flags |= float_flag_output_denormal_flushed;
         p->cls = float_class_zero;
         exp = 0;
         frac_clear(p);
-- 
2.34.1




* [PATCH 21/76] fpu: Fix a comment in softfloat-types.h
  2025-01-24 16:27 [PATCH 00/76] target/arm: Implement FEAT_AFP and FEAT_RPRES Peter Maydell
                   ` (19 preceding siblings ...)
  2025-01-24 16:27 ` [PATCH 20/76] fpu: Rename float_flag_output_denormal to float_flag_output_denormal_flushed Peter Maydell
@ 2025-01-24 16:27 ` Peter Maydell
  2025-01-25 15:27   ` Richard Henderson
  2025-01-24 16:27 ` [PATCH 22/76] fpu: Add float_class_denormal Peter Maydell
                   ` (56 subsequent siblings)
  77 siblings, 1 reply; 167+ messages in thread
From: Peter Maydell @ 2025-01-24 16:27 UTC (permalink / raw)
  To: qemu-arm, qemu-devel

In softfloat-types.h a comment documents that if the float_status
field flush_to_zero is set then we flush denormalised results to 0
and set the inexact flag.  This isn't correct: the status flag that
we set when flush_to_zero causes us to flush an output to zero is
float_flag_output_denormal_flushed.

Correct the comment.
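
A simplified sketch of the behaviour the corrected comment describes
(the real logic lives in the softfloat rounding and packing code and
is more involved; the sketch_* names are hypothetical, and the flag
values are those shown in softfloat-types.h):

```c
#include <stdbool.h>
#include <stdint.h>

enum {
    sketch_flag_inexact                 = 0x0010,
    sketch_flag_output_denormal_flushed = 0x0040,
};

/*
 * Final step for a float32 result.  If the result is denormal and
 * flush_to_zero is set, return a zero of the same sign and raise
 * output_denormal_flushed.  The inexact flag is deliberately left
 * alone here: the flush itself does not set it.
 */
static uint32_t sketch_finish_f32(uint32_t result_bits, bool flush_to_zero,
                                  unsigned *flags)
{
    bool is_denormal = (result_bits & 0x7f800000u) == 0 &&
                       (result_bits & 0x007fffffu) != 0;

    if (is_denormal && flush_to_zero) {
        *flags |= sketch_flag_output_denormal_flushed;
        return result_bits & 0x80000000u;   /* signed zero */
    }
    return result_bits;
}
```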

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 include/fpu/softfloat-types.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/fpu/softfloat-types.h b/include/fpu/softfloat-types.h
index 4a806e3981a..c177923e319 100644
--- a/include/fpu/softfloat-types.h
+++ b/include/fpu/softfloat-types.h
@@ -312,7 +312,7 @@ typedef struct float_status {
     Float3NaNPropRule float_3nan_prop_rule;
     FloatInfZeroNaNRule float_infzeronan_rule;
     bool tininess_before_rounding;
-    /* should denormalised results go to zero and set the inexact flag? */
+    /* should denormalised results go to zero and set output_denormal_flushed? */
     bool flush_to_zero;
     /* should denormalised inputs go to zero and set input_denormal_flushed? */
     bool flush_inputs_to_zero;
-- 
2.34.1




* [PATCH 22/76] fpu: Add float_class_denormal
  2025-01-24 16:27 [PATCH 00/76] target/arm: Implement FEAT_AFP and FEAT_RPRES Peter Maydell
                   ` (20 preceding siblings ...)
  2025-01-24 16:27 ` [PATCH 21/76] fpu: Fix a comment in softfloat-types.h Peter Maydell
@ 2025-01-24 16:27 ` Peter Maydell
  2025-01-25 15:31   ` Richard Henderson
  2025-01-24 16:27 ` [PATCH 23/76] fpu: Implement float_flag_input_denormal_used Peter Maydell
                   ` (55 subsequent siblings)
  77 siblings, 1 reply; 167+ messages in thread
From: Peter Maydell @ 2025-01-24 16:27 UTC (permalink / raw)
  To: qemu-arm, qemu-devel

Currently in softfloat we canonicalize input denormals and so the
code that implements floating point operations does not need to care
whether the input value was originally normal or denormal.  However,
both x86 and Arm FEAT_AFP require that an exception flag is set if:
 * an input is denormal
 * that input is not squashed to zero
 * that input is actually used in the calculation (e.g. we
   did not find the other input was a NaN)

So we need to track that the input was a non-squashed denormal.  To
do this we add a new value to the FloatClass enum.  In this commit we
add the value and adjust the code everywhere that looks at FloatClass
values so that the new float_class_denormal behaves identically to
float_class_normal.  We will add the code that does the "raise a new
float exception flag if an input was an unsquashed denormal and we
used it" in a subsequent commit.

There should be no behavioural change in this commit.
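
As a rough illustration of why the "ab_mask == float_cmask_normal"
tests have to change, here is a minimal standalone sketch.  It copies
the enum ordering and the two mask helpers from the diff below; the
float_cmask() definition and the pair_mask() helper are assumptions
made so the example compiles on its own, and are not part of the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Standalone copy of the class ordering introduced by the patch. */
typedef enum {
    float_class_unclassified,
    float_class_zero,
    float_class_normal,
    float_class_denormal, /* input was a non-squashed denormal */
    float_class_inf,
    float_class_qnan,
    float_class_snan,
} FloatClass;

/* Assumed definition: one bit per class. */
#define float_cmask(cls) (1 << (cls))

enum {
    float_cmask_zero     = float_cmask(float_class_zero),
    float_cmask_normal   = float_cmask(float_class_normal),
    float_cmask_denormal = float_cmask(float_class_denormal),
    float_cmask_anynorm  = float_cmask_normal | float_cmask_denormal,
};

/* True if the mask contains nothing except normal/denormal classes. */
static inline bool cmask_is_only_normals(int cmask)
{
    return !(cmask & ~float_cmask_anynorm);
}

/* Hypothetical helper: combined class mask of two operands. */
static inline int pair_mask(FloatClass a, FloatClass b)
{
    return float_cmask(a) | float_cmask(b);
}
```

With this, a normal/denormal operand pair no longer compares equal to
float_cmask_normal, but cmask_is_only_normals() still accepts it, so
the fast paths keep being taken for canonicalized denormals.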

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 fpu/softfloat.c           | 32 ++++++++++++++++++++++++++++---
 fpu/softfloat-parts.c.inc | 40 ++++++++++++++++++++++++---------------
 2 files changed, 54 insertions(+), 18 deletions(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 26f3a8dc87e..03a604c38ec 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -404,12 +404,16 @@ float64_gen2(float64 xa, float64 xb, float_status *s,
 /*
  * Classify a floating point number. Everything above float_class_qnan
  * is a NaN so cls >= float_class_qnan is any NaN.
+ *
+ * Note that we canonicalize denormals, so most code should treat
+ * class_normal and class_denormal identically.
  */
 
 typedef enum __attribute__ ((__packed__)) {
     float_class_unclassified,
     float_class_zero,
     float_class_normal,
+    float_class_denormal, /* input was a non-squashed denormal */
     float_class_inf,
     float_class_qnan,  /* all NaNs from here */
     float_class_snan,
@@ -420,12 +424,14 @@ typedef enum __attribute__ ((__packed__)) {
 enum {
     float_cmask_zero    = float_cmask(float_class_zero),
     float_cmask_normal  = float_cmask(float_class_normal),
+    float_cmask_denormal = float_cmask(float_class_denormal),
     float_cmask_inf     = float_cmask(float_class_inf),
     float_cmask_qnan    = float_cmask(float_class_qnan),
     float_cmask_snan    = float_cmask(float_class_snan),
 
     float_cmask_infzero = float_cmask_zero | float_cmask_inf,
     float_cmask_anynan  = float_cmask_qnan | float_cmask_snan,
+    float_cmask_anynorm = float_cmask_normal | float_cmask_denormal,
 };
 
 /* Flags for parts_minmax. */
@@ -459,6 +465,20 @@ static inline __attribute__((unused)) bool is_qnan(FloatClass c)
     return c == float_class_qnan;
 }
 
+/*
+ * Return true if the float_cmask has only normals in it
+ * (including input denormals that were canonicalized)
+ */
+static inline bool cmask_is_only_normals(int cmask)
+{
+    return !(cmask & ~float_cmask_anynorm);
+}
+
+static inline bool is_anynorm(FloatClass c)
+{
+    return float_cmask(c) & float_cmask_anynorm;
+}
+
 /*
  * Structure holding all of the decomposed parts of a float.
  * The exponent is unbiased and the fraction is normalized.
@@ -1729,6 +1749,7 @@ static float64 float64r32_round_pack_canonical(FloatParts64 *p,
      */
     switch (p->cls) {
     case float_class_normal:
+    case float_class_denormal:
         if (unlikely(p->exp == 0)) {
             /*
              * The result is denormal for float32, but can be represented
@@ -1817,6 +1838,7 @@ static floatx80 floatx80_round_pack_canonical(FloatParts128 *p,
 
     switch (p->cls) {
     case float_class_normal:
+    case float_class_denormal:
         if (s->floatx80_rounding_precision == floatx80_precision_x) {
             parts_uncanon_normal(p, s, fmt);
             frac = p->frac_hi;
@@ -2697,6 +2719,7 @@ static void parts_float_to_ahp(FloatParts64 *a, float_status *s)
         break;
 
     case float_class_normal:
+    case float_class_denormal:
     case float_class_zero:
         break;
 
@@ -2729,7 +2752,7 @@ static void parts_float_to_float_narrow(FloatParts64 *a, FloatParts128 *b,
     a->sign = b->sign;
     a->exp = b->exp;
 
-    if (a->cls == float_class_normal) {
+    if (is_anynorm(a->cls)) {
         frac_truncjam(a, b);
     } else if (is_nan(a->cls)) {
         /* Discard the low bits of the NaN. */
@@ -3218,6 +3241,7 @@ static Int128 float128_to_int128_scalbn(float128 a, FloatRoundMode rmode,
         return int128_zero();
 
     case float_class_normal:
+    case float_class_denormal:
         if (parts_round_to_int_normal(&p, rmode, scale, 128 - 2)) {
             flags = float_flag_inexact;
         }
@@ -3645,6 +3669,7 @@ static Int128 float128_to_uint128_scalbn(float128 a, FloatRoundMode rmode,
         return int128_zero();
 
     case float_class_normal:
+    case float_class_denormal:
         if (parts_round_to_int_normal(&p, rmode, scale, 128 - 2)) {
             flags = float_flag_inexact;
             if (p.cls == float_class_zero) {
@@ -5231,6 +5256,8 @@ float32 float32_exp2(float32 a, float_status *status)
     float32_unpack_canonical(&xp, a, status);
     if (unlikely(xp.cls != float_class_normal)) {
         switch (xp.cls) {
+        case float_class_denormal:
+            break;
         case float_class_snan:
         case float_class_qnan:
             parts_return_nan(&xp, status);
@@ -5240,9 +5267,8 @@ float32 float32_exp2(float32 a, float_status *status)
         case float_class_zero:
             return float32_one;
         default:
-            break;
+            g_assert_not_reached();
         }
-        g_assert_not_reached();
     }
 
     float_raise(float_flag_inexact, status);
diff --git a/fpu/softfloat-parts.c.inc b/fpu/softfloat-parts.c.inc
index 73621f4a970..8621cb87185 100644
--- a/fpu/softfloat-parts.c.inc
+++ b/fpu/softfloat-parts.c.inc
@@ -204,7 +204,7 @@ static void partsN(canonicalize)(FloatPartsN *p, float_status *status,
             frac_clear(p);
         } else {
             int shift = frac_normalize(p);
-            p->cls = float_class_normal;
+            p->cls = float_class_denormal;
             p->exp = fmt->frac_shift - fmt->exp_bias
                    - shift + !fmt->m68k_denormal;
         }
@@ -395,7 +395,7 @@ static void partsN(uncanon_normal)(FloatPartsN *p, float_status *s,
 static void partsN(uncanon)(FloatPartsN *p, float_status *s,
                             const FloatFmt *fmt)
 {
-    if (likely(p->cls == float_class_normal)) {
+    if (likely(is_anynorm(p->cls))) {
         parts_uncanon_normal(p, s, fmt);
     } else {
         switch (p->cls) {
@@ -435,7 +435,7 @@ static FloatPartsN *partsN(addsub)(FloatPartsN *a, FloatPartsN *b,
 
     if (a->sign != b_sign) {
         /* Subtraction */
-        if (likely(ab_mask == float_cmask_normal)) {
+        if (likely(cmask_is_only_normals(ab_mask))) {
             if (parts_sub_normal(a, b)) {
                 return a;
             }
@@ -468,7 +468,7 @@ static FloatPartsN *partsN(addsub)(FloatPartsN *a, FloatPartsN *b,
         }
     } else {
         /* Addition */
-        if (likely(ab_mask == float_cmask_normal)) {
+        if (likely(cmask_is_only_normals(ab_mask))) {
             parts_add_normal(a, b);
             return a;
         }
@@ -488,12 +488,12 @@ static FloatPartsN *partsN(addsub)(FloatPartsN *a, FloatPartsN *b,
     }
 
     if (b->cls == float_class_zero) {
-        g_assert(a->cls == float_class_normal);
+        g_assert(is_anynorm(a->cls));
         return a;
     }
 
     g_assert(a->cls == float_class_zero);
-    g_assert(b->cls == float_class_normal);
+    g_assert(is_anynorm(b->cls));
  return_b:
     b->sign = b_sign;
     return b;
@@ -513,7 +513,7 @@ static FloatPartsN *partsN(mul)(FloatPartsN *a, FloatPartsN *b,
     int ab_mask = float_cmask(a->cls) | float_cmask(b->cls);
     bool sign = a->sign ^ b->sign;
 
-    if (likely(ab_mask == float_cmask_normal)) {
+    if (likely(cmask_is_only_normals(ab_mask))) {
         FloatPartsW tmp;
 
         frac_mulw(&tmp, a, b);
@@ -596,7 +596,7 @@ static FloatPartsN *partsN(muladd_scalbn)(FloatPartsN *a, FloatPartsN *b,
         a->sign ^= 1;
     }
 
-    if (unlikely(ab_mask != float_cmask_normal)) {
+    if (unlikely(!cmask_is_only_normals(ab_mask))) {
         if (unlikely(ab_mask == float_cmask_infzero)) {
             float_raise(float_flag_invalid | float_flag_invalid_imz, s);
             goto d_nan;
@@ -611,7 +611,7 @@ static FloatPartsN *partsN(muladd_scalbn)(FloatPartsN *a, FloatPartsN *b,
         }
 
         g_assert(ab_mask & float_cmask_zero);
-        if (c->cls == float_class_normal) {
+        if (is_anynorm(c->cls)) {
             *a = *c;
             goto return_normal;
         }
@@ -692,7 +692,7 @@ static FloatPartsN *partsN(div)(FloatPartsN *a, FloatPartsN *b,
     int ab_mask = float_cmask(a->cls) | float_cmask(b->cls);
     bool sign = a->sign ^ b->sign;
 
-    if (likely(ab_mask == float_cmask_normal)) {
+    if (likely(cmask_is_only_normals(ab_mask))) {
         a->sign = sign;
         a->exp -= b->exp + frac_div(a, b);
         return a;
@@ -750,7 +750,7 @@ static FloatPartsN *partsN(modrem)(FloatPartsN *a, FloatPartsN *b,
 {
     int ab_mask = float_cmask(a->cls) | float_cmask(b->cls);
 
-    if (likely(ab_mask == float_cmask_normal)) {
+    if (likely(cmask_is_only_normals(ab_mask))) {
         frac_modrem(a, b, mod_quot);
         return a;
     }
@@ -800,6 +800,8 @@ static void partsN(sqrt)(FloatPartsN *a, float_status *status,
 
     if (unlikely(a->cls != float_class_normal)) {
         switch (a->cls) {
+        case float_class_denormal:
+            break;
         case float_class_snan:
         case float_class_qnan:
             parts_return_nan(a, status);
@@ -1130,6 +1132,7 @@ static void partsN(round_to_int)(FloatPartsN *a, FloatRoundMode rmode,
     case float_class_inf:
         break;
     case float_class_normal:
+    case float_class_denormal:
         if (parts_round_to_int_normal(a, rmode, scale, fmt->frac_size)) {
             float_raise(float_flag_inexact, s);
         }
@@ -1174,6 +1177,7 @@ static int64_t partsN(float_to_sint)(FloatPartsN *p, FloatRoundMode rmode,
         return 0;
 
     case float_class_normal:
+    case float_class_denormal:
         /* TODO: N - 2 is frac_size for rounding; could use input fmt. */
         if (parts_round_to_int_normal(p, rmode, scale, N - 2)) {
             flags = float_flag_inexact;
@@ -1241,6 +1245,7 @@ static uint64_t partsN(float_to_uint)(FloatPartsN *p, FloatRoundMode rmode,
         return 0;
 
     case float_class_normal:
+    case float_class_denormal:
         /* TODO: N - 2 is frac_size for rounding; could use input fmt. */
         if (parts_round_to_int_normal(p, rmode, scale, N - 2)) {
             flags = float_flag_inexact;
@@ -1304,6 +1309,7 @@ static int64_t partsN(float_to_sint_modulo)(FloatPartsN *p,
         return 0;
 
     case float_class_normal:
+    case float_class_denormal:
         /* TODO: N - 2 is frac_size for rounding; could use input fmt. */
         if (parts_round_to_int_normal(p, rmode, 0, N - 2)) {
             flags = float_flag_inexact;
@@ -1452,9 +1458,10 @@ static FloatPartsN *partsN(minmax)(FloatPartsN *a, FloatPartsN *b,
     a_exp = a->exp;
     b_exp = b->exp;
 
-    if (unlikely(ab_mask != float_cmask_normal)) {
+    if (unlikely(!cmask_is_only_normals(ab_mask))) {
         switch (a->cls) {
         case float_class_normal:
+        case float_class_denormal:
             break;
         case float_class_inf:
             a_exp = INT16_MAX;
@@ -1467,6 +1474,7 @@ static FloatPartsN *partsN(minmax)(FloatPartsN *a, FloatPartsN *b,
         }
         switch (b->cls) {
         case float_class_normal:
+        case float_class_denormal:
             break;
         case float_class_inf:
             b_exp = INT16_MAX;
@@ -1513,7 +1521,7 @@ static FloatRelation partsN(compare)(FloatPartsN *a, FloatPartsN *b,
 {
     int ab_mask = float_cmask(a->cls) | float_cmask(b->cls);
 
-    if (likely(ab_mask == float_cmask_normal)) {
+    if (likely(cmask_is_only_normals(ab_mask))) {
         FloatRelation cmp;
 
         if (a->sign != b->sign) {
@@ -1581,6 +1589,7 @@ static void partsN(scalbn)(FloatPartsN *a, int n, float_status *s)
     case float_class_inf:
         break;
     case float_class_normal:
+    case float_class_denormal:
         a->exp += MIN(MAX(n, -0x10000), 0x10000);
         break;
     default:
@@ -1599,6 +1608,8 @@ static void partsN(log2)(FloatPartsN *a, float_status *s, const FloatFmt *fmt)
 
     if (unlikely(a->cls != float_class_normal)) {
         switch (a->cls) {
+        case float_class_denormal:
+            break;
         case float_class_snan:
         case float_class_qnan:
             parts_return_nan(a, s);
@@ -1615,9 +1626,8 @@ static void partsN(log2)(FloatPartsN *a, float_status *s, const FloatFmt *fmt)
             }
             return;
         default:
-            break;
+            g_assert_not_reached();
         }
-        g_assert_not_reached();
     }
     if (unlikely(a->sign)) {
         goto d_nan;
-- 
2.34.1




* [PATCH 23/76] fpu: Implement float_flag_input_denormal_used
  2025-01-24 16:27 [PATCH 00/76] target/arm: Implement FEAT_AFP and FEAT_RPRES Peter Maydell
                   ` (21 preceding siblings ...)
  2025-01-24 16:27 ` [PATCH 22/76] fpu: Add float_class_denormal Peter Maydell
@ 2025-01-24 16:27 ` Peter Maydell
  2025-01-25 15:42   ` Richard Henderson
  2025-01-24 16:27 ` [PATCH 24/76] fpu: allow flushing of output denormals to be after rounding Peter Maydell
                   ` (54 subsequent siblings)
  77 siblings, 1 reply; 167+ messages in thread
From: Peter Maydell @ 2025-01-24 16:27 UTC (permalink / raw)
  To: qemu-arm, qemu-devel

For the x86 and the Arm FEAT_AFP semantics, we need to be able to
tell the target code that the FPU operation has used an input
denormal.  Implement this; when it happens we set the new
float_flag_input_denormal_used.

Note that we only set this when an input denormal is actually used by
the operation: if the operation results in Invalid Operation or
Divide By Zero or the result is a NaN because some other input was a
NaN then we never needed to look at the input denormal and do not set
input_denormal_used.

We mostly do not need to adjust the hardfloat codepaths to deal with
this flag, because almost all hardfloat operations are already gated
on the input not being a denormal, and will fall back to softfloat
for a denormal input.  The only exception is the comparison
operations, where we need to add the check for input denormals, which
must now fall back to softfloat where they did not before.
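
To make the "actually used" rule concrete, here is a small standalone
sketch of the decision for two of the operations.  The ToyClass enum,
the TOY_* masks and both function names are hypothetical; only the
conditions are taken from the addsub and div hunks in this patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the FloatClass values. */
typedef enum {
    toy_zero, toy_normal, toy_denormal, toy_inf, toy_qnan, toy_snan,
} ToyClass;

#define TOY_CMASK(c) (1u << (c))
#define TOY_DENORMAL TOY_CMASK(toy_denormal)
#define TOY_ANYNAN   (TOY_CMASK(toy_qnan) | TOY_CMASK(toy_snan))

/*
 * Add/sub: a denormal input is consumed unless the other
 * input is a NaN.
 */
static bool toy_addsub_uses_denormal(ToyClass a, ToyClass b)
{
    unsigned ab_mask = TOY_CMASK(a) | TOY_CMASK(b);

    return (ab_mask & (TOY_DENORMAL | TOY_ANYNAN)) == TOY_DENORMAL;
}

/*
 * Division: a NaN operand or a zero divisor (Divide By Zero)
 * means the denormal was never looked at.
 */
static bool toy_div_uses_denormal(ToyClass a, ToyClass b)
{
    unsigned ab_mask = TOY_CMASK(a) | TOY_CMASK(b);

    if (ab_mask & TOY_ANYNAN) {
        return false;
    }
    return (ab_mask & TOY_DENORMAL) && b != toy_zero;
}
```

For example, denormal + normal reports the flag, while denormal + qnan
and denormal / 0 do not.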

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 include/fpu/softfloat-types.h |  7 ++++
 fpu/softfloat.c               | 37 +++++++++++++++++--
 fpu/softfloat-parts.c.inc     | 68 ++++++++++++++++++++++++++++++++++-
 3 files changed, 108 insertions(+), 4 deletions(-)

diff --git a/include/fpu/softfloat-types.h b/include/fpu/softfloat-types.h
index c177923e319..b9b4e8e55fc 100644
--- a/include/fpu/softfloat-types.h
+++ b/include/fpu/softfloat-types.h
@@ -165,6 +165,13 @@ enum {
     float_flag_invalid_sqrt    = 0x0800,  /* sqrt(-x) */
     float_flag_invalid_cvti    = 0x1000,  /* non-nan to integer */
     float_flag_invalid_snan    = 0x2000,  /* any operand was snan */
+    /*
+     * An input was denormal and we used it (without flushing it to zero).
+     * Not set if we do not actually use the denormal input (e.g.
+     * because some other input was a NaN, or because the operation
+     * wasn't actually carried out (divide-by-zero; invalid))
+     */
+    float_flag_input_denormal_used = 0x4000,
 };
 
 /*
diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 03a604c38ec..1b4046e81a9 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -2718,8 +2718,10 @@ static void parts_float_to_ahp(FloatParts64 *a, float_status *s)
                                   float16_params_ahp.frac_size + 1);
         break;
 
-    case float_class_normal:
     case float_class_denormal:
+        float_raise(float_flag_input_denormal_used, s);
+        break;
+    case float_class_normal:
     case float_class_zero:
         break;
 
@@ -2733,6 +2735,9 @@ static void parts64_float_to_float(FloatParts64 *a, float_status *s)
     if (is_nan(a->cls)) {
         parts_return_nan(a, s);
     }
+    if (a->cls == float_class_denormal) {
+        float_raise(float_flag_input_denormal_used, s);
+    }
 }
 
 static void parts128_float_to_float(FloatParts128 *a, float_status *s)
@@ -2740,6 +2745,9 @@ static void parts128_float_to_float(FloatParts128 *a, float_status *s)
     if (is_nan(a->cls)) {
         parts_return_nan(a, s);
     }
+    if (a->cls == float_class_denormal) {
+        float_raise(float_flag_input_denormal_used, s);
+    }
 }
 
 #define parts_float_to_float(P, S) \
@@ -2752,12 +2760,21 @@ static void parts_float_to_float_narrow(FloatParts64 *a, FloatParts128 *b,
     a->sign = b->sign;
     a->exp = b->exp;
 
-    if (is_anynorm(a->cls)) {
+    switch (a->cls) {
+    case float_class_denormal:
+        float_raise(float_flag_input_denormal_used, s);
+        /* fall through */
+    case float_class_normal:
         frac_truncjam(a, b);
-    } else if (is_nan(a->cls)) {
+        break;
+    case float_class_snan:
+    case float_class_qnan:
         /* Discard the low bits of the NaN. */
         a->frac = b->frac_hi;
         parts_return_nan(a, s);
+        break;
+    default:
+        break;
     }
 }
 
@@ -2772,6 +2789,9 @@ static void parts_float_to_float_widen(FloatParts128 *a, FloatParts64 *b,
     if (is_nan(a->cls)) {
         parts_return_nan(a, s);
     }
+    if (a->cls == float_class_denormal) {
+        float_raise(float_flag_input_denormal_used, s);
+    }
 }
 
 float32 float16_to_float32(float16 a, bool ieee, float_status *s)
@@ -4411,6 +4431,11 @@ float32_hs_compare(float32 xa, float32 xb, float_status *s, bool is_quiet)
         goto soft;
     }
 
+    if (unlikely(float32_is_denormal(ua.s) || float32_is_denormal(ub.s))) {
+        /* We may need to set the input_denormal_used flag */
+        goto soft;
+    }
+
     float32_input_flush2(&ua.s, &ub.s, s);
     if (isgreaterequal(ua.h, ub.h)) {
         if (isgreater(ua.h, ub.h)) {
@@ -4462,6 +4487,12 @@ float64_hs_compare(float64 xa, float64 xb, float_status *s, bool is_quiet)
     }
 
     float64_input_flush2(&ua.s, &ub.s, s);
+
+    if (unlikely(float64_is_denormal(ua.s) || float64_is_denormal(ub.s))) {
+        /* We may need to set the input_denormal_used flag */
+        goto soft;
+    }
+
     if (isgreaterequal(ua.h, ub.h)) {
         if (isgreater(ua.h, ub.h)) {
             return float_relation_greater;
diff --git a/fpu/softfloat-parts.c.inc b/fpu/softfloat-parts.c.inc
index 8621cb87185..0122b35008a 100644
--- a/fpu/softfloat-parts.c.inc
+++ b/fpu/softfloat-parts.c.inc
@@ -433,6 +433,15 @@ static FloatPartsN *partsN(addsub)(FloatPartsN *a, FloatPartsN *b,
     bool b_sign = b->sign ^ subtract;
     int ab_mask = float_cmask(a->cls) | float_cmask(b->cls);
 
+    /*
+     * For addition and subtraction, we will consume an
+     * input denormal unless the other input is a NaN.
+     */
+    if ((ab_mask & (float_cmask_denormal | float_cmask_anynan)) ==
+        float_cmask_denormal) {
+        float_raise(float_flag_input_denormal_used, s);
+    }
+
     if (a->sign != b_sign) {
         /* Subtraction */
         if (likely(cmask_is_only_normals(ab_mask))) {
@@ -516,6 +525,10 @@ static FloatPartsN *partsN(mul)(FloatPartsN *a, FloatPartsN *b,
     if (likely(cmask_is_only_normals(ab_mask))) {
         FloatPartsW tmp;
 
+        if (ab_mask & float_cmask_denormal) {
+            float_raise(float_flag_input_denormal_used, s);
+        }
+
         frac_mulw(&tmp, a, b);
         frac_truncjam(a, &tmp);
 
@@ -541,6 +554,10 @@ static FloatPartsN *partsN(mul)(FloatPartsN *a, FloatPartsN *b,
     }
 
     /* Multiply by 0 or Inf */
+    if (ab_mask & float_cmask_denormal) {
+        float_raise(float_flag_input_denormal_used, s);
+    }
+
     if (ab_mask & float_cmask_inf) {
         a->cls = float_class_inf;
         a->sign = sign;
@@ -664,6 +681,16 @@ static FloatPartsN *partsN(muladd_scalbn)(FloatPartsN *a, FloatPartsN *b,
     if (flags & float_muladd_negate_result) {
         a->sign ^= 1;
     }
+
+    /*
+     * All result types except for "return the default NaN
+     * because this is an Invalid Operation" go through here;
+     * this matches the set of cases where we consumed a
+     * denormal input.
+     */
+    if (abc_mask & float_cmask_denormal) {
+        float_raise(float_flag_input_denormal_used, s);
+    }
     return a;
 
  return_sub_zero:
@@ -693,6 +720,9 @@ static FloatPartsN *partsN(div)(FloatPartsN *a, FloatPartsN *b,
     bool sign = a->sign ^ b->sign;
 
     if (likely(cmask_is_only_normals(ab_mask))) {
+        if (ab_mask & float_cmask_denormal) {
+            float_raise(float_flag_input_denormal_used, s);
+        }
         a->sign = sign;
         a->exp -= b->exp + frac_div(a, b);
         return a;
@@ -713,6 +743,10 @@ static FloatPartsN *partsN(div)(FloatPartsN *a, FloatPartsN *b,
         return parts_pick_nan(a, b, s);
     }
 
+    if ((ab_mask & float_cmask_denormal) && b->cls != float_class_zero) {
+        float_raise(float_flag_input_denormal_used, s);
+    }
+
     a->sign = sign;
 
     /* Inf / X */
@@ -751,6 +785,9 @@ static FloatPartsN *partsN(modrem)(FloatPartsN *a, FloatPartsN *b,
     int ab_mask = float_cmask(a->cls) | float_cmask(b->cls);
 
     if (likely(cmask_is_only_normals(ab_mask))) {
+        if (ab_mask & float_cmask_denormal) {
+            float_raise(float_flag_input_denormal_used, s);
+        }
         frac_modrem(a, b, mod_quot);
         return a;
     }
@@ -771,6 +808,10 @@ static FloatPartsN *partsN(modrem)(FloatPartsN *a, FloatPartsN *b,
         return a;
     }
 
+    if (ab_mask & float_cmask_denormal) {
+        float_raise(float_flag_input_denormal_used, s);
+    }
+
     /* N % Inf; 0 % N */
     g_assert(b->cls == float_class_inf || a->cls == float_class_zero);
     return a;
@@ -801,6 +842,10 @@ static void partsN(sqrt)(FloatPartsN *a, float_status *status,
     if (unlikely(a->cls != float_class_normal)) {
         switch (a->cls) {
         case float_class_denormal:
+            if (!a->sign) {
+                /* -ve denormal will be InvalidOperation */
+                float_raise(float_flag_input_denormal_used, status);
+            }
             break;
         case float_class_snan:
         case float_class_qnan:
@@ -1431,6 +1476,9 @@ static FloatPartsN *partsN(minmax)(FloatPartsN *a, FloatPartsN *b,
         if ((flags & (minmax_isnum | minmax_isnumber))
             && !(ab_mask & float_cmask_snan)
             && (ab_mask & ~float_cmask_qnan)) {
+            if (ab_mask & float_cmask_denormal) {
+                float_raise(float_flag_input_denormal_used, s);
+            }
             return is_nan(a->cls) ? b : a;
         }
 
@@ -1455,6 +1503,10 @@ static FloatPartsN *partsN(minmax)(FloatPartsN *a, FloatPartsN *b,
         return parts_pick_nan(a, b, s);
     }
 
+    if (ab_mask & float_cmask_denormal) {
+        float_raise(float_flag_input_denormal_used, s);
+    }
+
     a_exp = a->exp;
     b_exp = b->exp;
 
@@ -1524,6 +1576,10 @@ static FloatRelation partsN(compare)(FloatPartsN *a, FloatPartsN *b,
     if (likely(cmask_is_only_normals(ab_mask))) {
         FloatRelation cmp;
 
+        if (ab_mask & float_cmask_denormal) {
+            float_raise(float_flag_input_denormal_used, s);
+        }
+
         if (a->sign != b->sign) {
             goto a_sign;
         }
@@ -1549,6 +1605,10 @@ static FloatRelation partsN(compare)(FloatPartsN *a, FloatPartsN *b,
         return float_relation_unordered;
     }
 
+    if (ab_mask & float_cmask_denormal) {
+        float_raise(float_flag_input_denormal_used, s);
+    }
+
     if (ab_mask & float_cmask_zero) {
         if (ab_mask == float_cmask_zero) {
             return float_relation_equal;
@@ -1588,8 +1648,10 @@ static void partsN(scalbn)(FloatPartsN *a, int n, float_status *s)
     case float_class_zero:
     case float_class_inf:
         break;
-    case float_class_normal:
     case float_class_denormal:
+        float_raise(float_flag_input_denormal_used, s);
+        /* fall through */
+    case float_class_normal:
         a->exp += MIN(MAX(n, -0x10000), 0x10000);
         break;
     default:
@@ -1609,6 +1671,10 @@ static void partsN(log2)(FloatPartsN *a, float_status *s, const FloatFmt *fmt)
     if (unlikely(a->cls != float_class_normal)) {
         switch (a->cls) {
         case float_class_denormal:
+            if (!a->sign) {
+                /* -ve denormal will be InvalidOperation */
+                float_raise(float_flag_input_denormal_used, s);
+            }
             break;
         case float_class_snan:
         case float_class_qnan:
-- 
2.34.1




* [PATCH 24/76] fpu: allow flushing of output denormals to be after rounding
  2025-01-24 16:27 [PATCH 00/76] target/arm: Implement FEAT_AFP and FEAT_RPRES Peter Maydell
                   ` (22 preceding siblings ...)
  2025-01-24 16:27 ` [PATCH 23/76] fpu: Implement float_flag_input_denormal_used Peter Maydell
@ 2025-01-24 16:27 ` Peter Maydell
  2025-01-25 16:41   ` Richard Henderson
  2025-01-24 16:27 ` [PATCH 25/76] target/arm: Remove redundant advsimd float16 helpers Peter Maydell
                   ` (53 subsequent siblings)
  77 siblings, 1 reply; 167+ messages in thread
From: Peter Maydell @ 2025-01-24 16:27 UTC (permalink / raw)
  To: qemu-arm, qemu-devel

Currently uncanon_normal always flushes output denormals before it
deals with rounding.  This works for architectures that detect
tininess before rounding, but is usually not the right place when
the architecture detects tininess after rounding.  For
example, for x86 the SDM states that the MXCSR FTZ control bit causes
outputs to be flushed to zero "when it detects a floating-point
underflow condition".  This means that we mustn't flush to zero if
the input is such that after rounding it is no longer tiny.

At least one of our guest architectures does underflow detection
after rounding but flushing of denormals before rounding (MIPS MSA);
this means we need to have a config knob for this that is separate
from our existing tininess_before_rounding setting.

Add an ftz_detection flag.  For consistency with
tininess_before_rounding, we make it default to "detect ftz after
rounding"; this means that we need to explicitly set the flag to
"detect ftz before rounding" on every existing architecture that sets
flush_to_zero, so that this commit has no behaviour change.
(This means more code change here but for the long term a less
confusing API.)
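
The difference between the two settings only shows up for results
that sit just below the smallest normal number.  The following is a
toy model, not the softfloat code: ToyFTZDetection and toy_round_ftz()
are hypothetical names, and the "after rounding" branch rounds at the
destination's subnormal precision rather than with an unbounded
exponent, so it only approximates the real behaviour near the
boundary.

```c
#include <assert.h>
#include <float.h>
#include <stdbool.h>

/* Hypothetical stand-in for the FloatFTZDetection enum. */
typedef enum {
    toy_ftz_after_rounding = 0,
    toy_ftz_before_rounding = 1,
} ToyFTZDetection;

/*
 * Round an "exact" double result to float with flush-to-zero
 * enabled, deciding tininess either on the unrounded value or
 * on the rounded one.
 */
static float toy_round_ftz(double exact, ToyFTZDetection d)
{
    /* The cast rounds in the current mode (nearest-even by default). */
    float rounded = (float)exact;
    double mag = exact < 0 ? -exact : exact;
    float rmag = rounded < 0 ? -rounded : rounded;
    bool tiny;

    if (d == toy_ftz_before_rounding) {
        tiny = mag != 0.0 && mag < (double)FLT_MIN;
    } else {
        tiny = rmag != 0.0f && rmag < FLT_MIN;
    }
    if (tiny) {
        return exact < 0 ? -0.0f : 0.0f;
    }
    return rounded;
}
```

Feeding it 0x1p-126 - 0x1p-152, which is below FLT_MIN but rounds up
to FLT_MIN, gives zero with toy_ftz_before_rounding and FLT_MIN with
toy_ftz_after_rounding.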

For several architectures the current behaviour is either
definitely or possibly wrong; annotate those with TODO comments.
These architectures are definitely wrong (and should detect
ftz after rounding):
 * x86
 * Alpha

For these architectures the spec is unclear:
 * MIPS (for non-MSA)
 * RX
 * SH4

PA-RISC makes ftz detection IMPDEF; we pick "before rounding".
Separately, we aren't setting the "tininess before rounding" setting
that we ought to for this architecture.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 include/fpu/softfloat-helpers.h | 11 +++++++++++
 include/fpu/softfloat-types.h   | 18 ++++++++++++++++++
 target/mips/fpu_helper.h        |  6 ++++++
 target/alpha/cpu.c              |  7 +++++++
 target/arm/cpu.c                |  1 +
 target/hppa/fpu_helper.c        | 11 +++++++++++
 target/i386/tcg/fpu_helper.c    |  8 ++++++++
 target/mips/msa.c               |  9 +++++++++
 target/ppc/cpu_init.c           |  3 +++
 target/rx/cpu.c                 |  8 ++++++++
 target/sh4/cpu.c                |  8 ++++++++
 target/tricore/helper.c         |  1 +
 tests/fp/fp-bench.c             |  1 +
 fpu/softfloat-parts.c.inc       | 21 +++++++++++++++------
 14 files changed, 107 insertions(+), 6 deletions(-)

diff --git a/include/fpu/softfloat-helpers.h b/include/fpu/softfloat-helpers.h
index 4cb30a48220..a4c1a4fa3b8 100644
--- a/include/fpu/softfloat-helpers.h
+++ b/include/fpu/softfloat-helpers.h
@@ -109,6 +109,12 @@ static inline void set_flush_inputs_to_zero(bool val, float_status *status)
     status->flush_inputs_to_zero = val;
 }
 
+static inline void set_float_detect_ftz(FloatFTZDetection d,
+                                        float_status *status)
+{
+    status->ftz_detection = d;
+}
+
 static inline void set_default_nan_mode(bool val, float_status *status)
 {
     status->default_nan_mode = val;
@@ -183,4 +189,9 @@ static inline bool get_default_nan_mode(const float_status *status)
     return status->default_nan_mode;
 }
 
+static inline FloatFTZDetection get_float_detect_ftz(const float_status *status)
+{
+    return status->ftz_detection;
+}
+
 #endif /* SOFTFLOAT_HELPERS_H */
diff --git a/include/fpu/softfloat-types.h b/include/fpu/softfloat-types.h
index b9b4e8e55fc..77cfed9d52e 100644
--- a/include/fpu/softfloat-types.h
+++ b/include/fpu/softfloat-types.h
@@ -304,6 +304,22 @@ typedef enum __attribute__((__packed__)) {
     float_infzeronan_suppress_invalid = (1 << 2),
 } FloatInfZeroNaNRule;
 
+/*
+ * When flush_to_zero is set, should we detect denormal results to
+ * be flushed before or after rounding? For most architectures this
+ * should be set to match the tininess_before_rounding setting,
+ * but a few architectures, e.g. MIPS MSA, detect FTZ before
+ * rounding but tininess after rounding.
+ *
+ * This enum is arranged so that the default if the target doesn't
+ * configure it matches the default for tininess_before_rounding
+ * (i.e. "after rounding").
+ */
+typedef enum __attribute__((__packed__)) {
+    detect_ftz_after_rounding = 0,
+    detect_ftz_before_rounding = 1,
+} FloatFTZDetection;
+
 /*
  * Floating Point Status. Individual architectures may maintain
  * several versions of float_status for different functions. The
@@ -321,6 +337,8 @@ typedef struct float_status {
     bool tininess_before_rounding;
     /* should denormalised results go to zero and set output_denormal_flushed? */
     bool flush_to_zero;
+    /* do we detect and flush denormal results before or after rounding? */
+    FloatFTZDetection ftz_detection;
     /* should denormalised inputs go to zero and set input_denormal_flushed? */
     bool flush_inputs_to_zero;
     bool default_nan_mode;
diff --git a/target/mips/fpu_helper.h b/target/mips/fpu_helper.h
index 6ad1e466cfd..042f7e02c03 100644
--- a/target/mips/fpu_helper.h
+++ b/target/mips/fpu_helper.h
@@ -84,6 +84,12 @@ static inline void fp_reset(CPUMIPSState *env)
      */
     set_float_2nan_prop_rule(float_2nan_prop_s_ab,
                              &env->active_fpu.fp_status);
+    /*
+     * TODO: the spec doesn't say clearly whether FTZ happens before
+     * or after rounding for normal FPU operations.
+     */
+    set_float_detect_ftz(detect_ftz_before_rounding,
+                         &env->active_fpu.fp_status);
 }
 
 /* MSA */
diff --git a/target/alpha/cpu.c b/target/alpha/cpu.c
index e1b898e5755..d4bffd58834 100644
--- a/target/alpha/cpu.c
+++ b/target/alpha/cpu.c
@@ -202,6 +202,13 @@ static void alpha_cpu_initfn(Object *obj)
     set_float_2nan_prop_rule(float_2nan_prop_x87, &env->fp_status);
     /* Default NaN: sign bit clear, msb frac bit set */
     set_float_default_nan_pattern(0b01000000, &env->fp_status);
+    /*
+     * TODO: this is incorrect. The Alpha Architecture Handbook version 4
+     * section 4.7.7.11 says that we flush to zero for underflow cases, so
+     * this should be detect_ftz_after_rounding to match the
+     * tininess_after_rounding (which is specified in section 4.7.5).
+     */
+    set_float_detect_ftz(detect_ftz_before_rounding, &env->fp_status);
 #if defined(CONFIG_USER_ONLY)
     env->flags = ENV_FLAG_PS_USER | ENV_FLAG_FEN;
     cpu_alpha_store_fpcr(env, (uint64_t)(FPCR_INVD | FPCR_DZED | FPCR_OVFD
diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index 7a83b9ee34f..0b4cd872d27 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -185,6 +185,7 @@ void arm_register_el_change_hook(ARMCPU *cpu, ARMELChangeHookFn *hook,
 static void arm_set_default_fp_behaviours(float_status *s)
 {
     set_float_detect_tininess(float_tininess_before_rounding, s);
+    set_float_detect_ftz(detect_ftz_before_rounding, s);
     set_float_2nan_prop_rule(float_2nan_prop_s_ab, s);
     set_float_3nan_prop_rule(float_3nan_prop_s_cab, s);
     set_float_infzeronan_rule(float_infzeronan_dnan_if_qnan, s);
diff --git a/target/hppa/fpu_helper.c b/target/hppa/fpu_helper.c
index 239c027ec52..a0f01e3e734 100644
--- a/target/hppa/fpu_helper.c
+++ b/target/hppa/fpu_helper.c
@@ -67,6 +67,17 @@ void HELPER(loaded_fr0)(CPUHPPAState *env)
     set_float_infzeronan_rule(float_infzeronan_dnan_never, &env->fp_status);
     /* Default NaN: sign bit clear, msb-1 frac bit set */
     set_float_default_nan_pattern(0b00100000, &env->fp_status);
+    /*
+     * "PA-RISC 2.0 Architecture" says it is IMPDEF whether the flushing
+     * enabled by FPSR.D happens before or after rounding. We pick "before"
+     * for consistency with tininess detection.
+     */
+    set_float_detect_ftz(detect_ftz_before_rounding, &env->fp_status);
+    /*
+     * TODO: "PA-RISC 2.0 Architecture" chapter 10 says that we should
+     * detect tininess before rounding, but we don't set that here so we
+     * get the default tininess after rounding.
+     */
 }
 
 void cpu_hppa_loaded_fr0(CPUHPPAState *env)
diff --git a/target/i386/tcg/fpu_helper.c b/target/i386/tcg/fpu_helper.c
index de6d0b252ec..9bf23fdd0f6 100644
--- a/target/i386/tcg/fpu_helper.c
+++ b/target/i386/tcg/fpu_helper.c
@@ -188,6 +188,14 @@ void cpu_init_fp_statuses(CPUX86State *env)
     set_float_default_nan_pattern(0b11000000, &env->fp_status);
     set_float_default_nan_pattern(0b11000000, &env->mmx_status);
     set_float_default_nan_pattern(0b11000000, &env->sse_status);
+    /*
+     * TODO: x86 does flush-to-zero detection after rounding (the SDM
+     * section 10.2.3.3 on the FTZ bit of MXCSR says that we flush
+     * when we detect underflow, which x86 does after rounding).
+     */
+    set_float_detect_ftz(detect_ftz_before_rounding, &env->fp_status);
+    set_float_detect_ftz(detect_ftz_before_rounding, &env->mmx_status);
+    set_float_detect_ftz(detect_ftz_before_rounding, &env->sse_status);
 }
 
 static inline uint8_t save_exception_flags(CPUX86State *env)
diff --git a/target/mips/msa.c b/target/mips/msa.c
index fc77bfc7b9a..2899577e8e5 100644
--- a/target/mips/msa.c
+++ b/target/mips/msa.c
@@ -48,6 +48,15 @@ void msa_reset(CPUMIPSState *env)
     /* tininess detected after rounding.*/
     set_float_detect_tininess(float_tininess_after_rounding,
                               &env->active_tc.msa_fp_status);
+    /*
+     * MSACSR.FS detects tiny results to flush to zero before rounding
+     * (per "MIPS Architecture for Programmers Volume IV-j: The MIPS64 SIMD
+     * Architecture Module, Revision 1.1" section 3.5.4), even though it
+     * detects tininess after rounding for underflow purposes (section 3.4.2
+     * table 3.3).
+     */
+    set_float_detect_ftz(detect_ftz_before_rounding,
+                         &env->active_tc.msa_fp_status);
 
     /*
      * According to MIPS specifications, if one of the two operands is
diff --git a/target/ppc/cpu_init.c b/target/ppc/cpu_init.c
index c05c2dc42dc..8fa41307370 100644
--- a/target/ppc/cpu_init.c
+++ b/target/ppc/cpu_init.c
@@ -7262,6 +7262,9 @@ static void ppc_cpu_reset_hold(Object *obj, ResetType type)
     /* tininess for underflow is detected before rounding */
     set_float_detect_tininess(float_tininess_before_rounding,
                               &env->fp_status);
+    /* Similarly for flush-to-zero */
+    set_float_detect_ftz(detect_ftz_before_rounding, &env->fp_status);
+
     /*
      * PowerPC propagation rules:
      *  1. A if it sNaN or qNaN
diff --git a/target/rx/cpu.c b/target/rx/cpu.c
index 8c50c7a1bc8..a18c3d81e38 100644
--- a/target/rx/cpu.c
+++ b/target/rx/cpu.c
@@ -103,6 +103,14 @@ static void rx_cpu_reset_hold(Object *obj, ResetType type)
     set_float_2nan_prop_rule(float_2nan_prop_x87, &env->fp_status);
     /* Default NaN value: sign bit clear, set frac msb */
     set_float_default_nan_pattern(0b01000000, &env->fp_status);
+    /*
+     * TODO: "RX Family RXv1 Instruction Set Architecture" is not 100% clear
+     * on whether flush-to-zero should happen before or after rounding, but
+     * section 1.3.2 says that it happens when underflow is detected, and
+     * implies that underflow is detected after rounding. So this may not
+     * be the correct setting.
+     */
+    set_float_detect_ftz(detect_ftz_before_rounding, &env->fp_status);
 }
 
 static ObjectClass *rx_cpu_class_by_name(const char *cpu_model)
diff --git a/target/sh4/cpu.c b/target/sh4/cpu.c
index 24a22724c61..cade4463119 100644
--- a/target/sh4/cpu.c
+++ b/target/sh4/cpu.c
@@ -130,6 +130,14 @@ static void superh_cpu_reset_hold(Object *obj, ResetType type)
     set_default_nan_mode(1, &env->fp_status);
     /* sign bit clear, set all frac bits other than msb */
     set_float_default_nan_pattern(0b00111111, &env->fp_status);
+    /*
+     * TODO: "SH-4 CPU Core Architecture ADCS 7182230F" doesn't say whether
+     * it detects tininess before or after rounding. Section 6.4 is clear
+     * that flush-to-zero happens when the result underflows, though, so
+     * either this should be "detect ftz after rounding" or else we should
+     * be setting "detect tininess before rounding".
+     */
+    set_float_detect_ftz(detect_ftz_before_rounding, &env->fp_status);
 }
 
 static void superh_cpu_disas_set_info(CPUState *cpu, disassemble_info *info)
diff --git a/target/tricore/helper.c b/target/tricore/helper.c
index e8b0ec51611..df4a2b5b9d8 100644
--- a/target/tricore/helper.c
+++ b/target/tricore/helper.c
@@ -116,6 +116,7 @@ void fpu_set_state(CPUTriCoreState *env)
     set_flush_inputs_to_zero(1, &env->fp_status);
     set_flush_to_zero(1, &env->fp_status);
     set_float_detect_tininess(float_tininess_before_rounding, &env->fp_status);
+    set_float_detect_ftz(detect_ftz_before_rounding, &env->fp_status);
     set_default_nan_mode(1, &env->fp_status);
     /* Default NaN pattern: sign bit clear, frac msb set */
     set_float_default_nan_pattern(0b01000000, &env->fp_status);
diff --git a/tests/fp/fp-bench.c b/tests/fp/fp-bench.c
index eacb39b99cb..9e3694bc4e1 100644
--- a/tests/fp/fp-bench.c
+++ b/tests/fp/fp-bench.c
@@ -496,6 +496,7 @@ static void run_bench(void)
     set_float_3nan_prop_rule(float_3nan_prop_s_cab, &soft_status);
     set_float_infzeronan_rule(float_infzeronan_dnan_if_qnan, &soft_status);
     set_float_default_nan_pattern(0b01000000, &soft_status);
+    set_float_detect_ftz(detect_ftz_before_rounding, &soft_status);
 
     f = bench_funcs[operation][precision];
     g_assert(f);
diff --git a/fpu/softfloat-parts.c.inc b/fpu/softfloat-parts.c.inc
index 0122b35008a..324e67de259 100644
--- a/fpu/softfloat-parts.c.inc
+++ b/fpu/softfloat-parts.c.inc
@@ -334,7 +334,8 @@ static void partsN(uncanon_normal)(FloatPartsN *p, float_status *s,
             p->frac_lo &= ~round_mask;
         }
         frac_shr(p, frac_shift);
-    } else if (s->flush_to_zero) {
+    } else if (s->flush_to_zero &&
+               s->ftz_detection == detect_ftz_before_rounding) {
         flags |= float_flag_output_denormal_flushed;
         p->cls = float_class_zero;
         exp = 0;
@@ -381,11 +382,19 @@ static void partsN(uncanon_normal)(FloatPartsN *p, float_status *s,
         exp = (p->frac_hi & DECOMPOSED_IMPLICIT_BIT) && !fmt->m68k_denormal;
         frac_shr(p, frac_shift);
 
-        if (is_tiny && (flags & float_flag_inexact)) {
-            flags |= float_flag_underflow;
-        }
-        if (exp == 0 && frac_eqz(p)) {
-            p->cls = float_class_zero;
+        if (is_tiny) {
+            if (s->flush_to_zero) {
+                assert(s->ftz_detection == detect_ftz_after_rounding);
+                flags |= float_flag_output_denormal_flushed;
+                p->cls = float_class_zero;
+                exp = 0;
+                frac_clear(p);
+            } else if (flags & float_flag_inexact) {
+                flags |= float_flag_underflow;
+            }
+            if (exp == 0 && frac_eqz(p)) {
+                p->cls = float_class_zero;
+            }
         }
     }
     p->exp = exp;
-- 
2.34.1




* [PATCH 25/76] target/arm: Remove redundant advsimd float16 helpers
  2025-01-24 16:27 [PATCH 00/76] target/arm: Implement FEAT_AFP and FEAT_RPRES Peter Maydell
                   ` (23 preceding siblings ...)
  2025-01-24 16:27 ` [PATCH 24/76] fpu: allow flushing of output denormals to be after rounding Peter Maydell
@ 2025-01-24 16:27 ` Peter Maydell
  2025-01-25 16:59   ` Richard Henderson
  2025-01-24 16:27 ` [PATCH 26/76] target/arm: Use FPST_FPCR_F16_A64 for halfprec-to-other conversions Peter Maydell
                   ` (52 subsequent siblings)
  77 siblings, 1 reply; 167+ messages in thread
From: Peter Maydell @ 2025-01-24 16:27 UTC (permalink / raw)
  To: qemu-arm, qemu-devel

The advsimd_addh etc helpers defined in helper-a64.c are identical to
the vfp_addh etc helpers defined in helper-vfp.c: both take two
float16 inputs (in a uint32_t type) plus a float_status* and are
simple wrappers around the softfloat float16_* functions.

(The duplication seems to be a historical accident: we added the
advsimd helpers in 2018 as part of the A64 implementation, and at
that time there was no f16 emulation in A32.  Then later we added the
A32 f16 handling by extending the existing VFP helper macros to
generate f16 versions as well as f32 and f64, and didn't realise we
could clean things up.)

Remove the now-unnecessary advsimd helpers and make the places that
generated calls to them use the vfp helpers instead. Many of the
helper functions were already unused.

(The remaining advsimd_ helpers are those which don't have vfp
versions.)
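
As a rough illustration of the duplication described above, here is a
minimal standalone sketch. Every name in it (toy_float16,
toy_float_status, toy_float16_add, the TOY_* macros) is a hypothetical
stand-in so that it compiles without QEMU; it is not the real helper
code, and the toy "add" is plain integer addition rather than
softfloat arithmetic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the softfloat types (not QEMU's). */
typedef uint16_t toy_float16;
typedef struct { int flags; } toy_float_status;

/* Toy operation: integer addition, just to give an observable result. */
toy_float16 toy_float16_add(toy_float16 a, toy_float16 b,
                            toy_float_status *s)
{
    assert(s != NULL);
    return (toy_float16)(a + b);
}

/* First family of wrappers: f16 values passed in uint32_t. */
#define TOY_ADVSIMD_HALFOP(name)                                        \
uint32_t toy_advsimd_##name##h(uint32_t a, uint32_t b,                  \
                               toy_float_status *s)                     \
{                                                                       \
    return toy_float16_##name((toy_float16)a, (toy_float16)b, s);       \
}

/* Second family: same signature and same body, so fully redundant. */
#define TOY_VFP_HALFOP(name)                                            \
uint32_t toy_vfp_##name##h(uint32_t a, uint32_t b,                      \
                           toy_float_status *s)                         \
{                                                                       \
    return toy_float16_##name((toy_float16)a, (toy_float16)b, s);       \
}

TOY_ADVSIMD_HALFOP(add)
TOY_VFP_HALFOP(add)
```

Since both wrappers expand to the same body, any caller of one can be
switched to the other without a behaviour change, which is all the
translate-a64.c part of this patch does.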

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/tcg/helper-a64.h    |  8 --------
 target/arm/tcg/helper-a64.c    |  9 ---------
 target/arm/tcg/translate-a64.c | 16 ++++++++--------
 3 files changed, 8 insertions(+), 25 deletions(-)

diff --git a/target/arm/tcg/helper-a64.h b/target/arm/tcg/helper-a64.h
index 0c120bf3883..bac12fbe55b 100644
--- a/target/arm/tcg/helper-a64.h
+++ b/target/arm/tcg/helper-a64.h
@@ -47,14 +47,6 @@ DEF_HELPER_FLAGS_2(frecpx_f16, TCG_CALL_NO_RWG, f16, f16, fpst)
 DEF_HELPER_FLAGS_2(fcvtx_f64_to_f32, TCG_CALL_NO_RWG, f32, f64, fpst)
 DEF_HELPER_FLAGS_3(crc32_64, TCG_CALL_NO_RWG_SE, i64, i64, i64, i32)
 DEF_HELPER_FLAGS_3(crc32c_64, TCG_CALL_NO_RWG_SE, i64, i64, i64, i32)
-DEF_HELPER_FLAGS_3(advsimd_maxh, TCG_CALL_NO_RWG, f16, f16, f16, fpst)
-DEF_HELPER_FLAGS_3(advsimd_minh, TCG_CALL_NO_RWG, f16, f16, f16, fpst)
-DEF_HELPER_FLAGS_3(advsimd_maxnumh, TCG_CALL_NO_RWG, f16, f16, f16, fpst)
-DEF_HELPER_FLAGS_3(advsimd_minnumh, TCG_CALL_NO_RWG, f16, f16, f16, fpst)
-DEF_HELPER_3(advsimd_addh, f16, f16, f16, fpst)
-DEF_HELPER_3(advsimd_subh, f16, f16, f16, fpst)
-DEF_HELPER_3(advsimd_mulh, f16, f16, f16, fpst)
-DEF_HELPER_3(advsimd_divh, f16, f16, f16, fpst)
 DEF_HELPER_3(advsimd_ceq_f16, i32, f16, f16, fpst)
 DEF_HELPER_3(advsimd_cge_f16, i32, f16, f16, fpst)
 DEF_HELPER_3(advsimd_cgt_f16, i32, f16, f16, fpst)
diff --git a/target/arm/tcg/helper-a64.c b/target/arm/tcg/helper-a64.c
index 3b226daee78..05036089dd7 100644
--- a/target/arm/tcg/helper-a64.c
+++ b/target/arm/tcg/helper-a64.c
@@ -439,15 +439,6 @@ uint32_t ADVSIMD_HELPER(name, h)(uint32_t a, uint32_t b, float_status *fpst) \
     return float16_ ## name(a, b, fpst);    \
 }
 
-ADVSIMD_HALFOP(add)
-ADVSIMD_HALFOP(sub)
-ADVSIMD_HALFOP(mul)
-ADVSIMD_HALFOP(div)
-ADVSIMD_HALFOP(min)
-ADVSIMD_HALFOP(max)
-ADVSIMD_HALFOP(minnum)
-ADVSIMD_HALFOP(maxnum)
-
 #define ADVSIMD_TWOHALFOP(name)                                         \
 uint32_t ADVSIMD_HELPER(name, 2h)(uint32_t two_a, uint32_t two_b,       \
                                   float_status *fpst)                   \
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index b713a5f6025..74766a0bc47 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -5101,28 +5101,28 @@ static const FPScalar f_scalar_fmul = {
 TRANS(FMUL_s, do_fp3_scalar, a, &f_scalar_fmul)
 
 static const FPScalar f_scalar_fmax = {
-    gen_helper_advsimd_maxh,
+    gen_helper_vfp_maxh,
     gen_helper_vfp_maxs,
     gen_helper_vfp_maxd,
 };
 TRANS(FMAX_s, do_fp3_scalar, a, &f_scalar_fmax)
 
 static const FPScalar f_scalar_fmin = {
-    gen_helper_advsimd_minh,
+    gen_helper_vfp_minh,
     gen_helper_vfp_mins,
     gen_helper_vfp_mind,
 };
 TRANS(FMIN_s, do_fp3_scalar, a, &f_scalar_fmin)
 
 static const FPScalar f_scalar_fmaxnm = {
-    gen_helper_advsimd_maxnumh,
+    gen_helper_vfp_maxnumh,
     gen_helper_vfp_maxnums,
     gen_helper_vfp_maxnumd,
 };
 TRANS(FMAXNM_s, do_fp3_scalar, a, &f_scalar_fmaxnm)
 
 static const FPScalar f_scalar_fminnm = {
-    gen_helper_advsimd_minnumh,
+    gen_helper_vfp_minnumh,
     gen_helper_vfp_minnums,
     gen_helper_vfp_minnumd,
 };
@@ -6902,10 +6902,10 @@ static bool do_fp_reduction(DisasContext *s, arg_qrr_e *a,
     return true;
 }
 
-TRANS_FEAT(FMAXNMV_h, aa64_fp16, do_fp_reduction, a, gen_helper_advsimd_maxnumh)
-TRANS_FEAT(FMINNMV_h, aa64_fp16, do_fp_reduction, a, gen_helper_advsimd_minnumh)
-TRANS_FEAT(FMAXV_h, aa64_fp16, do_fp_reduction, a, gen_helper_advsimd_maxh)
-TRANS_FEAT(FMINV_h, aa64_fp16, do_fp_reduction, a, gen_helper_advsimd_minh)
+TRANS_FEAT(FMAXNMV_h, aa64_fp16, do_fp_reduction, a, gen_helper_vfp_maxnumh)
+TRANS_FEAT(FMINNMV_h, aa64_fp16, do_fp_reduction, a, gen_helper_vfp_minnumh)
+TRANS_FEAT(FMAXV_h, aa64_fp16, do_fp_reduction, a, gen_helper_vfp_maxh)
+TRANS_FEAT(FMINV_h, aa64_fp16, do_fp_reduction, a, gen_helper_vfp_minh)
 
 TRANS(FMAXNMV_s, do_fp_reduction, a, gen_helper_vfp_maxnums)
 TRANS(FMINNMV_s, do_fp_reduction, a, gen_helper_vfp_minnums)
-- 
2.34.1




* [PATCH 26/76] target/arm: Use FPST_FPCR_F16_A64 for halfprec-to-other conversions
  2025-01-24 16:27 [PATCH 00/76] target/arm: Implement FEAT_AFP and FEAT_RPRES Peter Maydell
                   ` (24 preceding siblings ...)
  2025-01-24 16:27 ` [PATCH 25/76] target/arm: Remove redundant advsimd float16 helpers Peter Maydell
@ 2025-01-24 16:27 ` Peter Maydell
  2025-01-25 17:01   ` Richard Henderson
  2025-01-24 16:27 ` [PATCH 27/76] target/arm: Define FPCR AH, FIZ, NEP bits Peter Maydell
                   ` (51 subsequent siblings)
  77 siblings, 1 reply; 167+ messages in thread
From: Peter Maydell @ 2025-01-24 16:27 UTC (permalink / raw)
  To: qemu-arm, qemu-devel

We should be using the F16-specific float_status for conversions from
half-precision, because halfprec inputs never set Input Denormal.

Without FEAT_AFP, using the wrong fpst here had no effect, because
the only difference between the F16_A64 and A64 fpst is in their
handling of flush-to-zero on input and output, the helper functions
vfp_fcvt_f16_to_* and vfp_fcvt_*_to_f16 all explicitly squash the
relevant flushing flags, and flush_inputs_to_zero was the only way
that IDC could be set.

With FEAT_AFP, the FPCR.AH=1 behaviour sets IDC for
input_denormal_used, which we will only ignore in
vfp_get_fpsr_from_host() for the F16_A64 fpst; so it matters that we
use that one for f16 inputs (and the normal one for single/double to
f16 conversions).

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/tcg/translate-a64.c | 9 ++++++---
 target/arm/tcg/translate-sve.c | 4 ++--
 2 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index 74766a0bc47..a47fdcd2e48 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -8568,7 +8568,7 @@ static bool trans_FCVT_s_sh(DisasContext *s, arg_rr *a)
     if (fp_access_check(s)) {
         TCGv_i32 tcg_rn = read_fp_hreg(s, a->rn);
         TCGv_i32 tcg_rd = tcg_temp_new_i32();
-        TCGv_ptr tcg_fpst = fpstatus_ptr(FPST_FPCR_A64);
+        TCGv_ptr tcg_fpst = fpstatus_ptr(FPST_FPCR_F16_A64);
         TCGv_i32 tcg_ahp = get_ahp_flag();
 
         gen_helper_vfp_fcvt_f16_to_f32(tcg_rd, tcg_rn, tcg_fpst, tcg_ahp);
@@ -8582,7 +8582,7 @@ static bool trans_FCVT_s_dh(DisasContext *s, arg_rr *a)
     if (fp_access_check(s)) {
         TCGv_i32 tcg_rn = read_fp_hreg(s, a->rn);
         TCGv_i64 tcg_rd = tcg_temp_new_i64();
-        TCGv_ptr tcg_fpst = fpstatus_ptr(FPST_FPCR_A64);
+        TCGv_ptr tcg_fpst = fpstatus_ptr(FPST_FPCR_F16_A64);
         TCGv_i32 tcg_ahp = get_ahp_flag();
 
         gen_helper_vfp_fcvt_f16_to_f64(tcg_rd, tcg_rn, tcg_fpst, tcg_ahp);
@@ -9511,13 +9511,14 @@ static bool trans_FCVTL_v(DisasContext *s, arg_qrr_e *a)
         return true;
     }
 
-    fpst = fpstatus_ptr(FPST_FPCR_A64);
     if (a->esz == MO_64) {
         /* 32 -> 64 bit fp conversion */
         TCGv_i64 tcg_res[2];
         TCGv_i32 tcg_op = tcg_temp_new_i32();
         int srcelt = a->q ? 2 : 0;
 
+        fpst = fpstatus_ptr(FPST_FPCR_A64);
+
         for (pass = 0; pass < 2; pass++) {
             tcg_res[pass] = tcg_temp_new_i64();
             read_vec_element_i32(s, tcg_op, a->rn, srcelt + pass, MO_32);
@@ -9532,6 +9533,8 @@ static bool trans_FCVTL_v(DisasContext *s, arg_qrr_e *a)
         TCGv_i32 tcg_res[4];
         TCGv_i32 ahp = get_ahp_flag();
 
+        fpst = fpstatus_ptr(FPST_FPCR_F16_A64);
+
         for (pass = 0; pass < 4; pass++) {
             tcg_res[pass] = tcg_temp_new_i32();
             read_vec_element_i32(s, tcg_res[pass], a->rn, srcelt + pass, MO_16);
diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
index 37de816964a..fc7f0d077a5 100644
--- a/target/arm/tcg/translate-sve.c
+++ b/target/arm/tcg/translate-sve.c
@@ -3887,7 +3887,7 @@ TRANS_FEAT(FCMLA_zzxz, aa64_sve, gen_gvec_fpst_zzzz, fcmla_idx_fns[a->esz],
 TRANS_FEAT(FCVT_sh, aa64_sve, gen_gvec_fpst_arg_zpz,
            gen_helper_sve_fcvt_sh, a, 0, FPST_FPCR_A64)
 TRANS_FEAT(FCVT_hs, aa64_sve, gen_gvec_fpst_arg_zpz,
-           gen_helper_sve_fcvt_hs, a, 0, FPST_FPCR_A64)
+           gen_helper_sve_fcvt_hs, a, 0, FPST_FPCR_F16_A64)
 
 TRANS_FEAT(BFCVT, aa64_sve_bf16, gen_gvec_fpst_arg_zpz,
            gen_helper_sve_bfcvt, a, 0, FPST_FPCR_A64)
@@ -3895,7 +3895,7 @@ TRANS_FEAT(BFCVT, aa64_sve_bf16, gen_gvec_fpst_arg_zpz,
 TRANS_FEAT(FCVT_dh, aa64_sve, gen_gvec_fpst_arg_zpz,
            gen_helper_sve_fcvt_dh, a, 0, FPST_FPCR_A64)
 TRANS_FEAT(FCVT_hd, aa64_sve, gen_gvec_fpst_arg_zpz,
-           gen_helper_sve_fcvt_hd, a, 0, FPST_FPCR_A64)
+           gen_helper_sve_fcvt_hd, a, 0, FPST_FPCR_F16_A64)
 TRANS_FEAT(FCVT_ds, aa64_sve, gen_gvec_fpst_arg_zpz,
            gen_helper_sve_fcvt_ds, a, 0, FPST_FPCR_A64)
 TRANS_FEAT(FCVT_sd, aa64_sve, gen_gvec_fpst_arg_zpz,
-- 
2.34.1




* [PATCH 27/76] target/arm: Define FPCR AH, FIZ, NEP bits
  2025-01-24 16:27 [PATCH 00/76] target/arm: Implement FEAT_AFP and FEAT_RPRES Peter Maydell
                   ` (25 preceding siblings ...)
  2025-01-24 16:27 ` [PATCH 26/76] target/arm: Use FPST_FPCR_F16_A64 for halfprec-to-other conversions Peter Maydell
@ 2025-01-24 16:27 ` Peter Maydell
  2025-01-25 17:08   ` Richard Henderson
  2025-01-24 16:27 ` [PATCH 28/76] target/arm: Implement FPCR.FIZ handling Peter Maydell
                   ` (50 subsequent siblings)
  77 siblings, 1 reply; 167+ messages in thread
From: Peter Maydell @ 2025-01-24 16:27 UTC (permalink / raw)
  To: qemu-arm, qemu-devel

The Armv8.7 FEAT_AFP feature defines three new control bits in
the FPCR:
 * FPCR.AH: "alternate floating point mode"; this changes floating
   point behaviour in a variety of ways, including:
    - the sign of a default NaN is 1, not 0
    - if FPCR.FZ is also 1, results that are denormal after rounding
      with an unbounded exponent are flushed to zero
    - FPCR.FZ does not cause denormalized inputs to be flushed to zero
    - miscellaneous other corner-case behaviour changes
 * FPCR.FIZ: flush denormalized numbers to zero on input for
   most instructions
 * FPCR.NEP: makes scalar SIMD operations merge the result with
   higher vector elements in one of the source registers, instead
   of zeroing the higher elements of the destination

This commit defines the new bits in the FPCR, and allows them to be
read or written when FEAT_AFP is implemented.  Actual behaviour
changes will be implemented in subsequent commits.

Note that these are the first FPCR bits which don't appear in the
AArch32 FPSCR view of the register, and which share bit positions
with FPSR bits.
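
For illustration, a minimal standalone sketch of the "only writable
when FEAT_AFP is implemented" rule. The bit positions match the
defines added by this patch; fpcr_filter_afp_bits() and its has_afp
parameter are hypothetical names standing in for the
cpu_isar_feature() check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FPCR_FIZ    (1u << 0)   /* Flush Inputs to Zero */
#define FPCR_AH     (1u << 1)   /* Alternate Handling */
#define FPCR_NEP    (1u << 2)   /* SIMD scalar ops preserve elts */

/* The three new bits occupy the low three bits of the register. */
static_assert((FPCR_FIZ | FPCR_AH | FPCR_NEP) == 0x7u,
              "FEAT_AFP bits are FPCR[2:0]");

/*
 * Hypothetical helper: drop the FEAT_AFP bits from a value being
 * written to the FPCR when the feature is not implemented.
 */
uint32_t fpcr_filter_afp_bits(uint32_t val, bool has_afp)
{
    if (!has_afp) {
        val &= ~(FPCR_FIZ | FPCR_AH | FPCR_NEP);
    }
    return val;
}
```

On a CPU without the feature the bits then read back as zero whatever
the guest writes, and all other bits pass through unchanged.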

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/cpu-features.h |  5 +++++
 target/arm/cpu.h          |  3 +++
 target/arm/vfp_helper.c   | 11 ++++++++---
 3 files changed, 16 insertions(+), 3 deletions(-)

diff --git a/target/arm/cpu-features.h b/target/arm/cpu-features.h
index 30302d6c5b4..7bf24c506b3 100644
--- a/target/arm/cpu-features.h
+++ b/target/arm/cpu-features.h
@@ -802,6 +802,11 @@ static inline bool isar_feature_aa64_hcx(const ARMISARegisters *id)
     return FIELD_EX64(id->id_aa64mmfr1, ID_AA64MMFR1, HCX) != 0;
 }
 
+static inline bool isar_feature_aa64_afp(const ARMISARegisters *id)
+{
+    return FIELD_EX64(id->id_aa64mmfr1, ID_AA64MMFR1, AFP) != 0;
+}
+
 static inline bool isar_feature_aa64_tidcp1(const ARMISARegisters *id)
 {
     return FIELD_EX64(id->id_aa64mmfr1, ID_AA64MMFR1, TIDCP1) != 0;
diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 2213c277348..7ba227ac4c5 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -1713,6 +1713,9 @@ void vfp_set_fpscr(CPUARMState *env, uint32_t val);
  */
 
 /* FPCR bits */
+#define FPCR_FIZ    (1 << 0)    /* Flush Inputs to Zero (FEAT_AFP) */
+#define FPCR_AH     (1 << 1)    /* Alternate Handling (FEAT_AFP) */
+#define FPCR_NEP    (1 << 2)    /* SIMD scalar ops preserve elts (FEAT_AFP) */
 #define FPCR_IOE    (1 << 8)    /* Invalid Operation exception trap enable */
 #define FPCR_DZE    (1 << 9)    /* Divide by Zero exception trap enable */
 #define FPCR_OFE    (1 << 10)   /* Overflow exception trap enable */
diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
index 3c8f3e65887..8c79ab4fc8a 100644
--- a/target/arm/vfp_helper.c
+++ b/target/arm/vfp_helper.c
@@ -242,6 +242,9 @@ static void vfp_set_fpcr_masked(CPUARMState *env, uint32_t val, uint32_t mask)
     if (!cpu_isar_feature(any_fp16, cpu)) {
         val &= ~FPCR_FZ16;
     }
+    if (!cpu_isar_feature(aa64_afp, cpu)) {
+        val &= ~(FPCR_FIZ | FPCR_AH | FPCR_NEP);
+    }
 
     if (!cpu_isar_feature(aa64_ebf16, cpu)) {
         val &= ~FPCR_EBF;
@@ -271,12 +274,14 @@ static void vfp_set_fpcr_masked(CPUARMState *env, uint32_t val, uint32_t mask)
      * We don't implement trapped exception handling, so the
      * trap enable bits, IDE|IXE|UFE|OFE|DZE|IOE are all RAZ/WI (not RES0!)
      *
-     * The FPCR bits we keep in vfp.fpcr are AHP, DN, FZ, RMode, EBF
-     * and FZ16. Len, Stride and LTPSIZE we just handled. Store those bits
+     * The FPCR bits we keep in vfp.fpcr are AHP, DN, FZ, RMode, EBF, FZ16,
+     * FIZ, AH, and NEP.
+     * Len, Stride and LTPSIZE we just handled. Store those bits
      * there, and zero any of the other FPCR bits and the RES0 and RAZ/WI
      * bits.
      */
-    val &= FPCR_AHP | FPCR_DN | FPCR_FZ | FPCR_RMODE_MASK | FPCR_FZ16 | FPCR_EBF;
+    val &= FPCR_AHP | FPCR_DN | FPCR_FZ | FPCR_RMODE_MASK | FPCR_FZ16 |
+        FPCR_EBF | FPCR_FIZ | FPCR_AH | FPCR_NEP;
     env->vfp.fpcr &= ~mask;
     env->vfp.fpcr |= val;
 }
-- 
2.34.1




* [PATCH 28/76] target/arm: Implement FPCR.FIZ handling
  2025-01-24 16:27 [PATCH 00/76] target/arm: Implement FEAT_AFP and FEAT_RPRES Peter Maydell
                   ` (26 preceding siblings ...)
  2025-01-24 16:27 ` [PATCH 27/76] target/arm: Define FPCR AH, FIZ, NEP bits Peter Maydell
@ 2025-01-24 16:27 ` Peter Maydell
  2025-01-25 17:25   ` Richard Henderson
  2025-01-24 16:27 ` [PATCH 29/76] target/arm: Adjust FP behaviour for FPCR.AH = 1 Peter Maydell
                   ` (49 subsequent siblings)
  77 siblings, 1 reply; 167+ messages in thread
From: Peter Maydell @ 2025-01-24 16:27 UTC (permalink / raw)
  To: qemu-arm, qemu-devel

Part of FEAT_AFP is the new control bit FPCR.FIZ.  This bit affects
flushing of single and double precision denormal inputs to zero for
AArch64 floating point instructions.  (For half-precision, the
existing FPCR.FZ16 control remains the only one.)

FPCR.FIZ differs from FPCR.FZ in that if we flush an input denormal
only because of FPCR.FIZ then we should *not* set the cumulative
exception bit FPSR.IDC.

FEAT_AFP also defines that in AArch64 the existing FPCR.FZ only
flushes denormal inputs to zero when FPCR.AH is 0.

We can implement this by setting the "flush inputs to zero" state
appropriately when FPCR is written, and by not reflecting the
float_flag_input_denormal_flushed status flag into FPSR reads when it
is the result only of FPCR.FIZ.
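
The two conditions can be sketched in isolation like this. It is a
simplified model rather than the code in the patch: the function
names are hypothetical, and FPCR_FZ being bit 24 is taken from the
architecture rather than from this diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FPCR_FIZ    (1u << 0)
#define FPCR_AH     (1u << 1)
#define FPCR_FZ     (1u << 24)  /* assumption: architectural position */

static_assert((FPCR_FZ & (FPCR_FIZ | FPCR_AH)) == 0,
              "control bits must not overlap");

/*
 * A64: flush denormal inputs if FIZ is set, or if FZ is set while
 * AH is clear.
 */
bool a64_flush_inputs(uint32_t fpcr)
{
    return (fpcr & FPCR_FIZ) || (fpcr & (FPCR_FZ | FPCR_AH)) == FPCR_FZ;
}

/*
 * A flushed input is reported in FPSR.IDC only when FZ (with AH
 * clear) asks for the flush; a flush caused by FIZ alone is silent.
 */
bool a64_flush_sets_idc(uint32_t fpcr)
{
    return (fpcr & (FPCR_FZ | FPCR_AH)) == FPCR_FZ;
}
```

With FIZ alone the input is flushed but IDC stays clear; with FZ=1
and AH=1 and FIZ=0 the input is not flushed at all.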

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/vfp_helper.c | 58 ++++++++++++++++++++++++++++++++++-------
 1 file changed, 48 insertions(+), 10 deletions(-)

diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
index 8c79ab4fc8a..5a0b389f7a3 100644
--- a/target/arm/vfp_helper.c
+++ b/target/arm/vfp_helper.c
@@ -61,19 +61,29 @@ static inline uint32_t vfp_exceptbits_from_host(int host_bits)
 
 static uint32_t vfp_get_fpsr_from_host(CPUARMState *env)
 {
-    uint32_t i = 0;
+    uint32_t a32_flags = 0, a64_flags = 0;
 
-    i |= get_float_exception_flags(&env->vfp.fp_status_a32);
-    i |= get_float_exception_flags(&env->vfp.fp_status_a64);
-    i |= get_float_exception_flags(&env->vfp.standard_fp_status);
+    a32_flags |= get_float_exception_flags(&env->vfp.fp_status_a32);
+    a32_flags |= get_float_exception_flags(&env->vfp.standard_fp_status);
     /* FZ16 does not generate an input denormal exception.  */
-    i |= (get_float_exception_flags(&env->vfp.fp_status_f16_a32)
+    a32_flags |= (get_float_exception_flags(&env->vfp.fp_status_f16_a32)
           & ~float_flag_input_denormal_flushed);
-    i |= (get_float_exception_flags(&env->vfp.fp_status_f16_a64)
+    a32_flags |= (get_float_exception_flags(&env->vfp.standard_fp_status_f16)
           & ~float_flag_input_denormal_flushed);
-    i |= (get_float_exception_flags(&env->vfp.standard_fp_status_f16)
+
+    a64_flags |= get_float_exception_flags(&env->vfp.fp_status_a64);
+    a64_flags |= (get_float_exception_flags(&env->vfp.fp_status_f16_a64)
           & ~float_flag_input_denormal_flushed);
-    return vfp_exceptbits_from_host(i);
+    /*
+     * Flushing an input denormal only because FPCR.FIZ == 1 does
+     * not set FPSR.IDC. So squash it unless (FPCR.AH == 0 && FPCR.FZ == 1).
+     * We only do this for the a64 flags because FIZ has no effect
+     * on AArch32 even if it is set.
+     */
+    if ((env->vfp.fpcr & (FPCR_FZ | FPCR_AH)) != FPCR_FZ) {
+        a64_flags &= ~float_flag_input_denormal_flushed;
+    }
+    return vfp_exceptbits_from_host(a32_flags | a64_flags);
 }
 
 static void vfp_clear_float_status_exc_flags(CPUARMState *env)
@@ -91,6 +101,17 @@ static void vfp_clear_float_status_exc_flags(CPUARMState *env)
     set_float_exception_flags(0, &env->vfp.standard_fp_status_f16);
 }
 
+static void vfp_sync_and_clear_float_status_exc_flags(CPUARMState *env)
+{
+    /*
+     * Synchronize any pending exception-flag information in the
+     * float_status values into env->vfp.fpsr, and then clear out
+     * the float_status data.
+     */
+    env->vfp.fpsr |= vfp_get_fpsr_from_host(env);
+    vfp_clear_float_status_exc_flags(env);
+}
+
 static void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val, uint32_t mask)
 {
     uint64_t changed = env->vfp.fpcr;
@@ -130,9 +151,18 @@ static void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val, uint32_t mask)
     if (changed & FPCR_FZ) {
         bool ftz_enabled = val & FPCR_FZ;
         set_flush_to_zero(ftz_enabled, &env->vfp.fp_status_a32);
-        set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status_a32);
         set_flush_to_zero(ftz_enabled, &env->vfp.fp_status_a64);
-        set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status_a64);
+        /* FIZ is A64 only so FZ always makes A32 code flush inputs to zero */
+        set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status_a32);
+    }
+    if (changed & (FPCR_FZ | FPCR_AH | FPCR_FIZ)) {
+        /*
+         * A64: Flush denormalized inputs to zero if FPCR.FIZ = 1, or
+         * both FPCR.AH = 0 and FPCR.FZ = 1.
+         */
+        bool fitz_enabled = (val & FPCR_FIZ) ||
+            (val & (FPCR_FZ | FPCR_AH)) == FPCR_FZ;
+        set_flush_inputs_to_zero(fitz_enabled, &env->vfp.fp_status_a64);
     }
     if (changed & FPCR_DN) {
         bool dnan_enabled = val & FPCR_DN;
@@ -141,6 +171,14 @@ static void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val, uint32_t mask)
         set_default_nan_mode(dnan_enabled, &env->vfp.fp_status_f16_a32);
         set_default_nan_mode(dnan_enabled, &env->vfp.fp_status_f16_a64);
     }
+    /*
+     * If any bits changed that we look at in vfp_get_fpsr_from_host(),
+     * we must sync the float_status flags into vfp.fpsr now (under the
+     * old regime) before we update vfp.fpcr.
+     */
+    if (changed & (FPCR_FZ | FPCR_AH | FPCR_FIZ)) {
+        vfp_sync_and_clear_float_status_exc_flags(env);
+    }
 }
 
 #else
-- 
2.34.1




* [PATCH 29/76] target/arm: Adjust FP behaviour for FPCR.AH = 1
  2025-01-24 16:27 [PATCH 00/76] target/arm: Implement FEAT_AFP and FEAT_RPRES Peter Maydell
                   ` (27 preceding siblings ...)
  2025-01-24 16:27 ` [PATCH 28/76] target/arm: Implement FPCR.FIZ handling Peter Maydell
@ 2025-01-24 16:27 ` Peter Maydell
  2025-01-25 17:27   ` Richard Henderson
  2025-01-24 16:27 ` [PATCH 30/76] target/arm: Adjust exception flag handling for AH " Peter Maydell
                   ` (48 subsequent siblings)
  77 siblings, 1 reply; 167+ messages in thread
From: Peter Maydell @ 2025-01-24 16:27 UTC (permalink / raw)
  To: qemu-arm, qemu-devel

When FPCR.AH is set, various behaviours of AArch64 floating point
operations which are controlled by softfloat config settings change:
 * tininess and ftz detection before/after rounding
 * NaN propagation order
 * result of 0 * Inf + NaN
 * default NaN value

When the guest changes the value of the AH bit, switch these config
settings on the fp_status_a64 and fp_status_f16_a64 float_status
fields.

This requires us to make the arm_set_default_fp_behaviours() function
global, since we now need to call it from cpu.c and vfp_helper.c; we
move it to vfp_helper.c so it can be next to the new
arm_set_ah_fp_behaviours().
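
The shape of the switch, as a toy model only: ToyFPConfig and the
toy_* functions are hypothetical and do not use the softfloat API.
The AH=1 values shown (detection after rounding, default NaN with the
sign bit set) are assumptions based on the FEAT_AFP summary; the NaN
propagation and 0 * Inf + NaN rules are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cut-down stand-in for float_status config fields. */
typedef struct {
    bool tininess_after_rounding;
    bool ftz_after_rounding;
    uint8_t default_nan_pattern;    /* sign bit plus top fraction bits */
} ToyFPConfig;

/* Arm defaults: detect before rounding, default NaN sign bit clear. */
void toy_set_default_behaviours(ToyFPConfig *c)
{
    c->tininess_after_rounding = false;
    c->ftz_after_rounding = false;
    c->default_nan_pattern = 0x40;  /* 0b01000000 */
}

/* Assumed AH=1 settings: detect after rounding, NaN sign bit set. */
void toy_set_ah_behaviours(ToyFPConfig *c)
{
    c->tininess_after_rounding = true;
    c->ftz_after_rounding = true;
    c->default_nan_pattern = 0xC0;  /* 0b11000000 */
}

/* Called when the guest changes FPCR.AH. */
void toy_update_for_ah(ToyFPConfig *c, bool ah)
{
    assert(c != NULL);
    if (ah) {
        toy_set_ah_behaviours(c);
    } else {
        toy_set_default_behaviours(c);
    }
}
```

Putting both sets of settings in adjacent functions makes it easy to
check that every field changed for AH=1 is restored when AH is
cleared again.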

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/internals.h  |  4 +++
 target/arm/cpu.c        | 23 -----------------
 target/arm/vfp_helper.c | 56 +++++++++++++++++++++++++++++++++++++++++
 3 files changed, 60 insertions(+), 23 deletions(-)

diff --git a/target/arm/internals.h b/target/arm/internals.h
index 863a84edf81..98073acc276 100644
--- a/target/arm/internals.h
+++ b/target/arm/internals.h
@@ -1828,4 +1828,8 @@ uint64_t gt_virt_cnt_offset(CPUARMState *env);
  * all EL1" scope; this covers stage 1 and stage 2.
  */
 int alle1_tlbmask(CPUARMState *env);
+
+/* Set the float_status behaviour to match the Arm defaults */
+void arm_set_default_fp_behaviours(float_status *s);
+
 #endif
diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index 0b4cd872d27..1ba22c4c7aa 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -169,29 +169,6 @@ void arm_register_el_change_hook(ARMCPU *cpu, ARMELChangeHookFn *hook,
     QLIST_INSERT_HEAD(&cpu->el_change_hooks, entry, node);
 }
 
-/*
- * Set the float_status behaviour to match the Arm defaults:
- *  * tininess-before-rounding
- *  * 2-input NaN propagation prefers SNaN over QNaN, and then
- *    operand A over operand B (see FPProcessNaNs() pseudocode)
- *  * 3-input NaN propagation prefers SNaN over QNaN, and then
- *    operand C over A over B (see FPProcessNaNs3() pseudocode,
- *    but note that for QEMU muladd is a * b + c, whereas for
- *    the pseudocode function the arguments are in the order c, a, b.
- *  * 0 * Inf + NaN returns the default NaN if the input NaN is quiet,
- *    and the input NaN if it is signalling
- *  * Default NaN has sign bit clear, msb frac bit set
- */
-static void arm_set_default_fp_behaviours(float_status *s)
-{
-    set_float_detect_tininess(float_tininess_before_rounding, s);
-    set_float_detect_ftz(detect_ftz_before_rounding, s);
-    set_float_2nan_prop_rule(float_2nan_prop_s_ab, s);
-    set_float_3nan_prop_rule(float_3nan_prop_s_cab, s);
-    set_float_infzeronan_rule(float_infzeronan_dnan_if_qnan, s);
-    set_float_default_nan_pattern(0b01000000, s);
-}
-
 static void cp_reg_reset(gpointer key, gpointer value, gpointer opaque)
 {
     /* Reset a single ARMCPRegInfo register */
diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
index 5a0b389f7a3..7507ff24bc0 100644
--- a/target/arm/vfp_helper.c
+++ b/target/arm/vfp_helper.c
@@ -31,6 +31,50 @@
    Single precision routines have a "s" suffix, double precision a
    "d" suffix.  */
 
+/*
+ * Set the float_status behaviour to match the Arm defaults:
+ *  * tininess-before-rounding
+ *  * 2-input NaN propagation prefers SNaN over QNaN, and then
+ *    operand A over operand B (see FPProcessNaNs() pseudocode)
+ *  * 3-input NaN propagation prefers SNaN over QNaN, and then
+ *    operand C over A over B (see FPProcessNaNs3() pseudocode,
+ *    but note that for QEMU muladd is a * b + c, whereas for
+ *    the pseudocode function the arguments are in the order c, a, b.
+ *  * 0 * Inf + NaN returns the default NaN if the input NaN is quiet,
+ *    and the input NaN if it is signalling
+ *  * Default NaN has sign bit clear, msb frac bit set
+ */
+void arm_set_default_fp_behaviours(float_status *s)
+{
+    set_float_detect_tininess(float_tininess_before_rounding, s);
+    set_float_detect_ftz(detect_ftz_before_rounding, s);
+    set_float_2nan_prop_rule(float_2nan_prop_s_ab, s);
+    set_float_3nan_prop_rule(float_3nan_prop_s_cab, s);
+    set_float_infzeronan_rule(float_infzeronan_dnan_if_qnan, s);
+    set_float_default_nan_pattern(0b01000000, s);
+}
+
+/*
+ * Set the float_status behaviour to match the FEAT_AFP
+ * FPCR.AH=1 requirements:
+ *  * tininess-after-rounding
+ *  * 2-input NaN propagation prefers the first NaN
+ *  * 3-input NaN propagation prefers a over b over c
+ *  * 0 * Inf + NaN always returns the input NaN and doesn't
+ *    set Invalid for a QNaN
+ *  * default NaN has sign bit set, msb frac bit set
+ */
+static void arm_set_ah_fp_behaviours(float_status *s)
+{
+    set_float_detect_tininess(float_tininess_after_rounding, s);
+    set_float_detect_ftz(detect_ftz_after_rounding, s);
+    set_float_2nan_prop_rule(float_2nan_prop_ab, s);
+    set_float_3nan_prop_rule(float_3nan_prop_abc, s);
+    set_float_infzeronan_rule(float_infzeronan_dnan_never |
+                              float_infzeronan_suppress_invalid, s);
+    set_float_default_nan_pattern(0b11000000, s);
+}
+
 #ifdef CONFIG_TCG
 
 /* Convert host exception flags to vfp form.  */
@@ -171,6 +215,18 @@ static void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val, uint32_t mask)
         set_default_nan_mode(dnan_enabled, &env->vfp.fp_status_f16_a32);
         set_default_nan_mode(dnan_enabled, &env->vfp.fp_status_f16_a64);
     }
+    if (changed & FPCR_AH) {
+        bool ah_enabled = val & FPCR_AH;
+
+        if (ah_enabled) {
+            /* Change behaviours for A64 FP operations */
+            arm_set_ah_fp_behaviours(&env->vfp.fp_status_a64);
+            arm_set_ah_fp_behaviours(&env->vfp.fp_status_f16_a64);
+        } else {
+            arm_set_default_fp_behaviours(&env->vfp.fp_status_a64);
+            arm_set_default_fp_behaviours(&env->vfp.fp_status_f16_a64);
+        }
+    }
     /*
      * If any bits changed that we look at in vfp_get_fpsr_from_host(),
      * we must sync the float_status flags into vfp.fpsr now (under the
-- 
2.34.1




* [PATCH 30/76] target/arm: Adjust exception flag handling for AH = 1
  2025-01-24 16:27 [PATCH 00/76] target/arm: Implement FEAT_AFP and FEAT_RPRES Peter Maydell
                   ` (28 preceding siblings ...)
  2025-01-24 16:27 ` [PATCH 29/76] target/arm: Adjust FP behaviour for FPCR.AH = 1 Peter Maydell
@ 2025-01-24 16:27 ` Peter Maydell
  2025-01-25 17:29   ` Richard Henderson
  2025-01-24 16:27 ` [PATCH 31/76] target/arm: Add FPCR.AH to tbflags Peter Maydell
                   ` (47 subsequent siblings)
  77 siblings, 1 reply; 167+ messages in thread
From: Peter Maydell @ 2025-01-24 16:27 UTC (permalink / raw)
  To: qemu-arm, qemu-devel

When FPCR.AH = 1, some of the cumulative exception flags in the FPSR
behave slightly differently for A64 operations:
 * IDC is set when a denormal input is used without flushing
 * IXC (Inexact) is set when an output denormal is flushed to zero

Update vfp_get_fpsr_from_host() to do this.

Note that because half-precision operations never set IDC, we now
need to add float_flag_input_denormal_used to the set we mask out of
fp_status_f16_a64.
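
The denormal-related part of the mapping can be sketched standalone
as below. This is only a model for illustration: the MODEL_HOST_*
flag values are made up and do not match softfloat's float_flag_*
values, and only the IDC/IXC cases discussed above are covered. The
FPSR bit positions (IXC bit 4, IDC bit 7) are the architectural ones.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural FPSR cumulative exception bits used in this sketch */
#define MODEL_FPSR_IXC (1u << 4)
#define MODEL_FPSR_IDC (1u << 7)

/* Hypothetical host flags; the values do not match softfloat's */
enum {
    MODEL_HOST_INPUT_DENORMAL_FLUSHED  = 1 << 0,
    MODEL_HOST_INPUT_DENORMAL_USED     = 1 << 1,
    MODEL_HOST_OUTPUT_DENORMAL_FLUSHED = 1 << 2,
};

/* Map the denormal-related host flags to FPSR bits */
static uint32_t model_exceptbits(int host_bits, bool ah)
{
    uint32_t target_bits = 0;

    if (host_bits & MODEL_HOST_INPUT_DENORMAL_FLUSHED) {
        target_bits |= MODEL_FPSR_IDC;
    }
    /* Only with AH=1: a denormal input that was used sets IDC */
    if (ah && (host_bits & MODEL_HOST_INPUT_DENORMAL_USED)) {
        target_bits |= MODEL_FPSR_IDC;
    }
    /* Only with AH=1: flushing an output denormal sets IXC */
    if (ah && (host_bits & MODEL_HOST_OUTPUT_DENORMAL_FLUSHED)) {
        target_bits |= MODEL_FPSR_IXC;
    }
    return target_bits;
}

/* A32 flags never get the AH treatment; A64 flags get it if AH is set */
static uint32_t model_fpsr(int a32_flags, int a64_flags, bool fpcr_ah)
{
    return model_exceptbits(a64_flags, fpcr_ah) |
        model_exceptbits(a32_flags, false);
}
```

Converting the A32 and A64 flag sets separately is what keeps the
AH-specific cases from leaking into flags raised by A32 operations.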

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/vfp_helper.c | 17 ++++++++++++++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
index 7507ff24bc0..2eb75bd7ecc 100644
--- a/target/arm/vfp_helper.c
+++ b/target/arm/vfp_helper.c
@@ -78,7 +78,7 @@ static void arm_set_ah_fp_behaviours(float_status *s)
 #ifdef CONFIG_TCG
 
 /* Convert host exception flags to vfp form.  */
-static inline uint32_t vfp_exceptbits_from_host(int host_bits)
+static inline uint32_t vfp_exceptbits_from_host(int host_bits, bool ah)
 {
     uint32_t target_bits = 0;
 
@@ -100,6 +100,16 @@ static inline uint32_t vfp_exceptbits_from_host(int host_bits)
     if (host_bits & float_flag_input_denormal_flushed) {
         target_bits |= FPSR_IDC;
     }
+    /*
+     * With FPCR.AH, IDC is set when an input denormal is used,
+     * and flushing an output denormal to zero sets both IXC and UFC.
+     */
+    if (ah && (host_bits & float_flag_input_denormal_used)) {
+        target_bits |= FPSR_IDC;
+    }
+    if (ah && (host_bits & float_flag_output_denormal_flushed)) {
+        target_bits |= FPSR_IXC;
+    }
     return target_bits;
 }
 
@@ -117,7 +127,7 @@ static uint32_t vfp_get_fpsr_from_host(CPUARMState *env)
 
     a64_flags |= get_float_exception_flags(&env->vfp.fp_status_a64);
     a64_flags |= (get_float_exception_flags(&env->vfp.fp_status_f16_a64)
-          & ~float_flag_input_denormal_flushed);
+          & ~(float_flag_input_denormal_flushed | float_flag_input_denormal_used));
     /*
      * Flushing an input denormal only because FPCR.FIZ == 1 does
      * not set FPSR.IDC. So squash it unless (FPCR.AH == 0 && FPCR.FZ == 1).
@@ -127,7 +137,8 @@ static uint32_t vfp_get_fpsr_from_host(CPUARMState *env)
     if ((env->vfp.fpcr & (FPCR_FZ | FPCR_AH)) != FPCR_FZ) {
         a64_flags &= ~float_flag_input_denormal_flushed;
     }
-    return vfp_exceptbits_from_host(a32_flags | a64_flags);
+    return vfp_exceptbits_from_host(a64_flags, env->vfp.fpcr & FPCR_AH) |
+        vfp_exceptbits_from_host(a32_flags, false);
 }
 
 static void vfp_clear_float_status_exc_flags(CPUARMState *env)
-- 
2.34.1




* [PATCH 31/76] target/arm: Add FPCR.AH to tbflags
  2025-01-24 16:27 [PATCH 00/76] target/arm: Implement FEAT_AFP and FEAT_RPRES Peter Maydell
                   ` (29 preceding siblings ...)
  2025-01-24 16:27 ` [PATCH 30/76] target/arm: Adjust exception flag handling for AH " Peter Maydell
@ 2025-01-24 16:27 ` Peter Maydell
  2025-01-25 17:30   ` Richard Henderson
  2025-01-24 16:27 ` [PATCH 32/76] target/arm: Set up float_status to use for FPCR.AH=1 behaviour Peter Maydell
                   ` (46 subsequent siblings)
  77 siblings, 1 reply; 167+ messages in thread
From: Peter Maydell @ 2025-01-24 16:27 UTC (permalink / raw)
  To: qemu-arm, qemu-devel

We are going to need to generate different code in some cases when
FPCR.AH is 1.  For example:
 * Floating point neg and abs must not flip the sign bit of NaNs
 * some insns (FRECPE, FRECPS, FRECPX, FRSQRTE, FRSQRTS, and various
   BFCVT and BFM bfloat16 ops) need to use a different float_status
   to the usual one

Encode FPCR.AH into the A64 tbflags, so we can refer to it at
translate time.

Because we now have a bit in FPCR that affects codegen, we can't mark
the AArch64 FPCR register as being SUPPRESS_TB_END any more; writes
to it will now end the TB and trigger a regeneration of hflags.
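
For readers unfamiliar with tbflags, the round trip can be modelled
roughly as below: hflags rebuild copies FPCR.AH into a bit of the
flags word, and the translator reads it back when a TB is started.
This is a standalone sketch, not the QEMU macros; the model_* names
are hypothetical, bit 37 is taken from the FIELD() definition in
this patch, and FPCR.AH is bit 1 of the FPCR.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_FPCR_AH         (1u << 1)  /* FPCR.AH */
#define MODEL_TBFLAG_AH_SHIFT 37         /* FIELD(TBFLAG_A64, AH, 37, 1) */

/* hflags side: record FPCR.AH in the (modelled) A64 tbflags word */
static uint64_t model_rebuild_flags(uint64_t flags, uint32_t fpcr)
{
    flags &= ~(UINT64_C(1) << MODEL_TBFLAG_AH_SHIFT);
    if (fpcr & MODEL_FPCR_AH) {
        flags |= UINT64_C(1) << MODEL_TBFLAG_AH_SHIFT;
    }
    return flags;
}

/* translator side: recover the bit when setting up the DisasContext */
static bool model_extract_ah(uint64_t tb_flags)
{
    return (tb_flags >> MODEL_TBFLAG_AH_SHIFT) & 1;
}
```

Since the flags word is part of what identifies a TB, code translated
with AH=0 is not reused once the guest sets AH=1, which is why FPCR
writes now have to end the TB.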

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/cpu.h               | 1 +
 target/arm/tcg/translate.h     | 2 ++
 target/arm/helper.c            | 2 +-
 target/arm/tcg/hflags.c        | 4 ++++
 target/arm/tcg/translate-a64.c | 1 +
 5 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 7ba227ac4c5..c8b44c725d0 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -3197,6 +3197,7 @@ FIELD(TBFLAG_A64, NV2, 34, 1)
 FIELD(TBFLAG_A64, NV2_MEM_E20, 35, 1)
 /* Set if FEAT_NV2 RAM accesses are big-endian */
 FIELD(TBFLAG_A64, NV2_MEM_BE, 36, 1)
+FIELD(TBFLAG_A64, AH, 37, 1)   /* FPCR.AH */
 
 /*
  * Helpers for using the above. Note that only the A64 accessors use
diff --git a/target/arm/tcg/translate.h b/target/arm/tcg/translate.h
index ec4c0cf03fc..c37c0b539e2 100644
--- a/target/arm/tcg/translate.h
+++ b/target/arm/tcg/translate.h
@@ -155,6 +155,8 @@ typedef struct DisasContext {
     bool nv2_mem_e20;
     /* True if NV2 enabled and NV2 RAM accesses are big-endian */
     bool nv2_mem_be;
+    /* True if FPCR.AH is 1 (alternate floating point handling) */
+    bool fpcr_ah;
     /*
      * >= 0, a copy of PSTATE.BTYPE, which will be 0 without v8.5-BTI.
      *  < 0, set by the current instruction.
diff --git a/target/arm/helper.c b/target/arm/helper.c
index 40bdfc851a5..7d95eae9971 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -4848,7 +4848,7 @@ static const ARMCPRegInfo v8_cp_reginfo[] = {
       .writefn = aa64_daif_write, .resetfn = arm_cp_reset_ignore },
     { .name = "FPCR", .state = ARM_CP_STATE_AA64,
       .opc0 = 3, .opc1 = 3, .opc2 = 0, .crn = 4, .crm = 4,
-      .access = PL0_RW, .type = ARM_CP_FPU | ARM_CP_SUPPRESS_TB_END,
+      .access = PL0_RW, .type = ARM_CP_FPU,
       .readfn = aa64_fpcr_read, .writefn = aa64_fpcr_write },
     { .name = "FPSR", .state = ARM_CP_STATE_AA64,
       .opc0 = 3, .opc1 = 3, .opc2 = 1, .crn = 4, .crm = 4,
diff --git a/target/arm/tcg/hflags.c b/target/arm/tcg/hflags.c
index f03977b4b00..b3a78564ec1 100644
--- a/target/arm/tcg/hflags.c
+++ b/target/arm/tcg/hflags.c
@@ -404,6 +404,10 @@ static CPUARMTBFlags rebuild_hflags_a64(CPUARMState *env, int el, int fp_el,
         DP_TBFLAG_A64(flags, TCMA, aa64_va_parameter_tcma(tcr, mmu_idx));
     }
 
+    if (env->vfp.fpcr & FPCR_AH) {
+        DP_TBFLAG_A64(flags, AH, 1);
+    }
+
     return rebuild_hflags_common(env, fp_el, mmu_idx, flags);
 }
 
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index a47fdcd2e48..556da6d23cd 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -9666,6 +9666,7 @@ static void aarch64_tr_init_disas_context(DisasContextBase *dcbase,
     dc->nv2 = EX_TBFLAG_A64(tb_flags, NV2);
     dc->nv2_mem_e20 = EX_TBFLAG_A64(tb_flags, NV2_MEM_E20);
     dc->nv2_mem_be = EX_TBFLAG_A64(tb_flags, NV2_MEM_BE);
+    dc->fpcr_ah = EX_TBFLAG_A64(tb_flags, AH);
     dc->vec_len = 0;
     dc->vec_stride = 0;
     dc->cp_regs = arm_cpu->cp_regs;
-- 
2.34.1




* [PATCH 32/76] target/arm: Set up float_status to use for FPCR.AH=1 behaviour
  2025-01-24 16:27 [PATCH 00/76] target/arm: Implement FEAT_AFP and FEAT_RPRES Peter Maydell
                   ` (30 preceding siblings ...)
  2025-01-24 16:27 ` [PATCH 31/76] target/arm: Add FPCR.AH to tbflags Peter Maydell
@ 2025-01-24 16:27 ` Peter Maydell
  2025-01-25 17:36   ` Richard Henderson
  2025-01-24 16:27 ` [PATCH 33/76] target/arm: Use FPST_FPCR_AH for FRECPE, FRECPS, FRECPX, FRSQRTE, FRSQRTS Peter Maydell
                   ` (45 subsequent siblings)
  77 siblings, 1 reply; 167+ messages in thread
From: Peter Maydell @ 2025-01-24 16:27 UTC (permalink / raw)
  To: qemu-arm, qemu-devel

When FPCR.AH is 1, the behaviour of some instructions changes:
 * AdvSIMD BFCVT, BFCVTN, BFCVTN2, BFMLALB, BFMLALT
 * SVE BFCVT, BFCVTNT, BFMLALB, BFMLALT, BFMLSLB, BFMLSLT
 * SME BFCVT, BFCVTN, BFMLAL, BFMLSL (these are all in SME2 which
   QEMU does not yet implement)
 * FRECPE, FRECPS, FRECPX, FRSQRTE, FRSQRTS

The behaviour change is:
 * the instructions do not update the FPSR cumulative exception flags
 * trapped floating point exceptions are disabled (a no-op for QEMU,
   which doesn't implement FPCR.{IDE,IXE,UFE,OFE,DZE,IOE})
 * rounding is always round-to-nearest-even regardless of FPCR.RMode
 * denormalized inputs and outputs are always flushed to zero, as if
   FPCR.{FZ,FIZ} is {1,1}
 * FPCR.FZ16 is still honoured for half-precision inputs

(See the Arm ARM DDI0487L.a section A1.5.9.)

We can provide all these behaviours with another pair of float_status fields
which we use only for these insns, when FPCR.AH is 1. These float_status
fields will always have:
 * flush_to_zero and flush_inputs_to_zero set for the non-F16 field
 * rounding mode set to round-to-nearest-even
and so the only FPCR fields they need to honour are DN and FZ16.

In this commit we only define the new fp_status fields and give them
the required behaviour when FPCR is updated.  In subsequent commits
we will arrange to use these new fp_status fields for the instructions
that should be affected by FPCR.AH in this way.
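
Put another way, the settings of the two new fields are a function of
only FPCR.DN and FPCR.FZ16. The sketch below models that; it is not
QEMU code, the model_* names and the struct are hypothetical, and
only the FPCR bit positions (FZ16 bit 19, DN bit 25) are
architectural.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_FPCR_FZ16 (1u << 19)
#define MODEL_FPCR_DN   (1u << 25)

enum model_rounding { MODEL_ROUND_NEAREST_EVEN = 0 };

/* Hypothetical stand-in for the float_status settings of interest */
struct model_ah_status {
    bool flush_to_zero;
    bool flush_inputs_to_zero;
    bool default_nan_mode;
    enum model_rounding rounding;
};

/* Derive the settings for the AH=1 "special insns" status fields */
static struct model_ah_status model_ah_status_from_fpcr(uint32_t fpcr,
                                                        bool is_f16)
{
    struct model_ah_status st;
    /* Non-F16: always flush.  F16: flushing still follows FPCR.FZ16. */
    bool flush = is_f16 ? (fpcr & MODEL_FPCR_FZ16) != 0 : true;

    st.flush_to_zero = flush;
    st.flush_inputs_to_zero = flush;
    st.default_nan_mode = (fpcr & MODEL_FPCR_DN) != 0;
    /* FPCR.RMode is deliberately ignored */
    st.rounding = MODEL_ROUND_NEAREST_EVEN;
    return st;
}
```

So, for instance, FPCR.FZ = 0 with RMode = 0b11 still gives flushing
and round-to-nearest-even for the non-F16 field.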

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
I'm not super enthusiastic about the ah_fp_status naming, which sort
of suggests it's always to be used when AH=1, rather than "for this
specific group of insns when AH=1". But I couldn't think of a better
name that was still reasonably short...
---
 target/arm/cpu.h           | 15 +++++++++++++++
 target/arm/internals.h     |  2 ++
 target/arm/tcg/translate.h | 14 ++++++++++++++
 target/arm/cpu.c           |  4 ++++
 target/arm/vfp_helper.c    | 13 ++++++++++++-
 5 files changed, 47 insertions(+), 1 deletion(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index c8b44c725d0..cfb16151577 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -640,6 +640,13 @@ typedef struct CPUArchState {
          *  standard_fp_status : the ARM "Standard FPSCR Value"
          *  standard_fp_status_fp16 : used for half-precision
          *       calculations with the ARM "Standard FPSCR Value"
+         *  ah_fp_status: used for the A64 insns which change behaviour
+         *       when FPCR.AH == 1 (bfloat16 conversions and multiplies,
+         *       and the reciprocal and square root estimate/step insns)
+         *  ah_fp_status_f16: used for the A64 insns which change behaviour
+         *       when FPCR.AH == 1 (bfloat16 conversions and multiplies,
+         *       and the reciprocal and square root estimate/step insns);
+         *       for half-precision
          *
          * Half-precision operations are governed by a separate
          * flush-to-zero control bit in FPSCR:FZ16. We pass a separate
@@ -654,6 +661,12 @@ typedef struct CPUArchState {
          * the "standard FPSCR" tracks the FPSCR.FZ16 bit rather than
          * using a fixed value for it.
          *
+         * The ah_fp_status is needed because some insns have different
+         * behaviour when FPCR.AH == 1: they don't update cumulative
+         * exception flags, they act like FPCR.{FZ,FIZ} = {1,1} and
+         * they ignore FPCR.RMode. But they don't ignore FPCR.FZ16,
+         * which means we need an ah_fp_status_f16 as well.
+         *
          * To avoid having to transfer exception bits around, we simply
          * say that the FPSCR cumulative exception flags are the logical
          * OR of the flags in the four fp statuses. This relies on the
@@ -666,6 +679,8 @@ typedef struct CPUArchState {
         float_status fp_status_f16_a64;
         float_status standard_fp_status;
         float_status standard_fp_status_f16;
+        float_status ah_fp_status;
+        float_status ah_fp_status_f16;
 
         uint64_t zcr_el[4];   /* ZCR_EL[1-3] */
         uint64_t smcr_el[4];  /* SMCR_EL[1-3] */
diff --git a/target/arm/internals.h b/target/arm/internals.h
index 98073acc276..b3187341456 100644
--- a/target/arm/internals.h
+++ b/target/arm/internals.h
@@ -1831,5 +1831,7 @@ int alle1_tlbmask(CPUARMState *env);
 
 /* Set the float_status behaviour to match the Arm defaults */
 void arm_set_default_fp_behaviours(float_status *s);
+/* Set the float_status behaviour to match Arm FPCR.AH=1 behaviour */
+void arm_set_ah_fp_behaviours(float_status *s);
 
 #endif
diff --git a/target/arm/tcg/translate.h b/target/arm/tcg/translate.h
index c37c0b539e2..d6edd8db76b 100644
--- a/target/arm/tcg/translate.h
+++ b/target/arm/tcg/translate.h
@@ -676,6 +676,8 @@ typedef enum ARMFPStatusFlavour {
     FPST_FPCR_A64,
     FPST_FPCR_F16_A32,
     FPST_FPCR_F16_A64,
+    FPST_FPCR_AH,
+    FPST_FPCR_AH_F16,
     FPST_STD,
     FPST_STD_F16,
 } ARMFPStatusFlavour;
@@ -696,6 +698,12 @@ typedef enum ARMFPStatusFlavour {
  *   for AArch32 operations controlled by the FPCR where FPCR.FZ16 is to be used
  * FPST_FPCR_F16_A64
  *   for AArch64 operations controlled by the FPCR where FPCR.FZ16 is to be used
+ * FPST_FPCR_AH:
+ *   for AArch64 operations which change behaviour when AH=1 (specifically,
+ *   bfloat16 conversions and multiplies, and the reciprocal and square root
+ *   estimate/step insns)
+ * FPST_FPCR_AH_F16:
+ *   ditto, but for half-precision operations
  * FPST_STD
  *   for A32/T32 Neon operations using the "standard FPSCR value"
  * FPST_STD_F16
@@ -719,6 +727,12 @@ static inline TCGv_ptr fpstatus_ptr(ARMFPStatusFlavour flavour)
     case FPST_FPCR_F16_A64:
         offset = offsetof(CPUARMState, vfp.fp_status_f16_a64);
         break;
+    case FPST_FPCR_AH:
+        offset = offsetof(CPUARMState, vfp.ah_fp_status);
+        break;
+    case FPST_FPCR_AH_F16:
+        offset = offsetof(CPUARMState, vfp.ah_fp_status_f16);
+        break;
     case FPST_STD:
         offset = offsetof(CPUARMState, vfp.standard_fp_status);
         break;
diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index 1ba22c4c7aa..8fa220a7165 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -556,6 +556,10 @@ static void arm_cpu_reset_hold(Object *obj, ResetType type)
     arm_set_default_fp_behaviours(&env->vfp.fp_status_f16_a32);
     arm_set_default_fp_behaviours(&env->vfp.fp_status_f16_a64);
     arm_set_default_fp_behaviours(&env->vfp.standard_fp_status_f16);
+    arm_set_ah_fp_behaviours(&env->vfp.ah_fp_status);
+    set_flush_to_zero(1, &env->vfp.ah_fp_status);
+    set_flush_inputs_to_zero(1, &env->vfp.ah_fp_status);
+    arm_set_ah_fp_behaviours(&env->vfp.ah_fp_status_f16);
 
 #ifndef CONFIG_USER_ONLY
     if (kvm_enabled()) {
diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
index 2eb75bd7ecc..50a8a659577 100644
--- a/target/arm/vfp_helper.c
+++ b/target/arm/vfp_helper.c
@@ -64,7 +64,7 @@ void arm_set_default_fp_behaviours(float_status *s)
  *    set Invalid for a QNaN
  *  * default NaN has sign bit set, msb frac bit set
  */
-static void arm_set_ah_fp_behaviours(float_status *s)
+void arm_set_ah_fp_behaviours(float_status *s)
 {
     set_float_detect_tininess(float_tininess_after_rounding, s);
     set_float_detect_ftz(detect_ftz_after_rounding, s);
@@ -128,6 +128,11 @@ static uint32_t vfp_get_fpsr_from_host(CPUARMState *env)
     a64_flags |= get_float_exception_flags(&env->vfp.fp_status_a64);
     a64_flags |= (get_float_exception_flags(&env->vfp.fp_status_f16_a64)
           & ~(float_flag_input_denormal_flushed | float_flag_input_denormal_used));
+    /*
+     * We do not merge in flags from ah_fp_status or ah_fp_status_f16, because
+     * they are used for insns that must not set the cumulative exception bits.
+     */
+
     /*
      * Flushing an input denormal only because FPCR.FIZ == 1 does
      * not set FPSR.IDC. So squash it unless (FPCR.AH == 0 && FPCR.FZ == 1).
@@ -154,6 +159,8 @@ static void vfp_clear_float_status_exc_flags(CPUARMState *env)
     set_float_exception_flags(0, &env->vfp.fp_status_f16_a64);
     set_float_exception_flags(0, &env->vfp.standard_fp_status);
     set_float_exception_flags(0, &env->vfp.standard_fp_status_f16);
+    set_float_exception_flags(0, &env->vfp.ah_fp_status);
+    set_float_exception_flags(0, &env->vfp.ah_fp_status_f16);
 }
 
 static void vfp_sync_and_clear_float_status_exc_flags(CPUARMState *env)
@@ -199,9 +206,11 @@ static void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val, uint32_t mask)
         set_flush_to_zero(ftz_enabled, &env->vfp.fp_status_f16_a32);
         set_flush_to_zero(ftz_enabled, &env->vfp.fp_status_f16_a64);
         set_flush_to_zero(ftz_enabled, &env->vfp.standard_fp_status_f16);
+        set_flush_to_zero(ftz_enabled, &env->vfp.ah_fp_status_f16);
         set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status_f16_a32);
         set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status_f16_a64);
         set_flush_inputs_to_zero(ftz_enabled, &env->vfp.standard_fp_status_f16);
+        set_flush_inputs_to_zero(ftz_enabled, &env->vfp.ah_fp_status_f16);
     }
     if (changed & FPCR_FZ) {
         bool ftz_enabled = val & FPCR_FZ;
@@ -225,6 +234,8 @@ static void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val, uint32_t mask)
         set_default_nan_mode(dnan_enabled, &env->vfp.fp_status_a64);
         set_default_nan_mode(dnan_enabled, &env->vfp.fp_status_f16_a32);
         set_default_nan_mode(dnan_enabled, &env->vfp.fp_status_f16_a64);
+        set_default_nan_mode(dnan_enabled, &env->vfp.ah_fp_status);
+        set_default_nan_mode(dnan_enabled, &env->vfp.ah_fp_status_f16);
     }
     if (changed & FPCR_AH) {
         bool ah_enabled = val & FPCR_AH;
-- 
2.34.1




* [PATCH 33/76] target/arm: Use FPST_FPCR_AH for FRECPE, FRECPS, FRECPX, FRSQRTE, FRSQRTS
  2025-01-24 16:27 [PATCH 00/76] target/arm: Implement FEAT_AFP and FEAT_RPRES Peter Maydell
                   ` (31 preceding siblings ...)
  2025-01-24 16:27 ` [PATCH 32/76] target/arm: Set up float_status to use for FPCR.AH=1 behaviour Peter Maydell
@ 2025-01-24 16:27 ` Peter Maydell
  2025-01-25 17:40   ` Richard Henderson
  2025-01-24 16:27 ` [PATCH 34/76] target/arm: Use FPST_FPCR_AH for BFCVT* insns Peter Maydell
                   ` (44 subsequent siblings)
  77 siblings, 1 reply; 167+ messages in thread
From: Peter Maydell @ 2025-01-24 16:27 UTC (permalink / raw)
  To: qemu-arm, qemu-devel

For the instructions FRECPE, FRECPS, FRECPX, FRSQRTE, FRSQRTS, use
FPST_FPCR_AH or FPST_FPCR_AH_F16 when FPCR.AH is 1, so that they get
the required behaviour changes.
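
The selection is a two-by-two choice on (FPCR.AH, half-precision or
not). A standalone model of it, using hypothetical model_* enums in
place of ARMFPStatusFlavour and MemOp (the element-size values are
chosen to mirror MO_16/MO_32/MO_64), might look like:

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the four relevant ARMFPStatusFlavour values */
enum model_flavour {
    MODEL_FPST_FPCR_A64,
    MODEL_FPST_FPCR_F16_A64,
    MODEL_FPST_FPCR_AH,
    MODEL_FPST_FPCR_AH_F16,
};

/* Hypothetical stand-in for the MemOp element sizes */
enum model_esz { MODEL_MO_16 = 1, MODEL_MO_32 = 2, MODEL_MO_64 = 3 };

/* Pick the status flavour for one of the insns listed above */
static enum model_flavour model_select_fpst(bool fpcr_ah, enum model_esz esz)
{
    if (fpcr_ah) {
        return esz == MODEL_MO_16 ? MODEL_FPST_FPCR_AH_F16
                                  : MODEL_FPST_FPCR_AH;
    }
    return esz == MODEL_MO_16 ? MODEL_FPST_FPCR_F16_A64
                              : MODEL_FPST_FPCR_A64;
}
```

With AH clear this gives the same choice the existing helpers make
from the element size alone, so the affected insns behave as before
when AH is 0.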

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
select_fpst() is another function I'm not super happy with the
naming of, because again it should only be used for the subset
of insns which have this particular behaviour, but the current
name kind of implies more generality than that. Suggestions welcome.
---
 target/arm/tcg/translate.h     |  13 ++++
 target/arm/tcg/translate-a64.c | 119 +++++++++++++++++++++++++--------
 target/arm/tcg/translate-sve.c |  30 ++++++---
 3 files changed, 127 insertions(+), 35 deletions(-)

diff --git a/target/arm/tcg/translate.h b/target/arm/tcg/translate.h
index d6edd8db76b..680ca52a181 100644
--- a/target/arm/tcg/translate.h
+++ b/target/arm/tcg/translate.h
@@ -746,6 +746,19 @@ static inline TCGv_ptr fpstatus_ptr(ARMFPStatusFlavour flavour)
     return statusptr;
 }
 
+/*
+ * Return the ARMFPStatusFlavour to use based on element size and
+ * whether FPCR.AH is set.
+ */
+static inline ARMFPStatusFlavour select_fpst(DisasContext *s, MemOp esz)
+{
+    if (s->fpcr_ah) {
+        return esz == MO_16 ? FPST_FPCR_AH_F16 : FPST_FPCR_AH;
+    } else {
+        return esz == MO_16 ? FPST_FPCR_F16_A64 : FPST_FPCR_A64;
+    }
+}
+
 /**
  * finalize_memop_atom:
  * @s: DisasContext
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index 556da6d23cd..2a0c5e23e74 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -723,10 +723,10 @@ static void gen_gvec_op3_ool(DisasContext *s, bool is_q, int rd,
  * an out-of-line helper.
  */
 static void gen_gvec_op3_fpst(DisasContext *s, bool is_q, int rd, int rn,
-                              int rm, bool is_fp16, int data,
+                              int rm, ARMFPStatusFlavour fpsttype, int data,
                               gen_helper_gvec_3_ptr *fn)
 {
-    TCGv_ptr fpst = fpstatus_ptr(is_fp16 ? FPST_FPCR_F16_A64 : FPST_FPCR_A64);
+    TCGv_ptr fpst = fpstatus_ptr(fpsttype);
     tcg_gen_gvec_3_ptr(vec_full_reg_offset(s, rd),
                        vec_full_reg_offset(s, rn),
                        vec_full_reg_offset(s, rm), fpst,
@@ -5036,14 +5036,16 @@ typedef struct FPScalar {
     void (*gen_d)(TCGv_i64, TCGv_i64, TCGv_i64, TCGv_ptr);
 } FPScalar;
 
-static bool do_fp3_scalar(DisasContext *s, arg_rrr_e *a, const FPScalar *f)
+static bool do_fp3_scalar_with_fpsttype(DisasContext *s, arg_rrr_e *a,
+                                        const FPScalar *f,
+                                        ARMFPStatusFlavour fpsttype)
 {
     switch (a->esz) {
     case MO_64:
         if (fp_access_check(s)) {
             TCGv_i64 t0 = read_fp_dreg(s, a->rn);
             TCGv_i64 t1 = read_fp_dreg(s, a->rm);
-            f->gen_d(t0, t0, t1, fpstatus_ptr(FPST_FPCR_A64));
+            f->gen_d(t0, t0, t1, fpstatus_ptr(fpsttype));
             write_fp_dreg(s, a->rd, t0);
         }
         break;
@@ -5051,7 +5053,7 @@ static bool do_fp3_scalar(DisasContext *s, arg_rrr_e *a, const FPScalar *f)
         if (fp_access_check(s)) {
             TCGv_i32 t0 = read_fp_sreg(s, a->rn);
             TCGv_i32 t1 = read_fp_sreg(s, a->rm);
-            f->gen_s(t0, t0, t1, fpstatus_ptr(FPST_FPCR_A64));
+            f->gen_s(t0, t0, t1, fpstatus_ptr(fpsttype));
             write_fp_sreg(s, a->rd, t0);
         }
         break;
@@ -5062,7 +5064,7 @@ static bool do_fp3_scalar(DisasContext *s, arg_rrr_e *a, const FPScalar *f)
         if (fp_access_check(s)) {
             TCGv_i32 t0 = read_fp_hreg(s, a->rn);
             TCGv_i32 t1 = read_fp_hreg(s, a->rm);
-            f->gen_h(t0, t0, t1, fpstatus_ptr(FPST_FPCR_F16_A64));
+            f->gen_h(t0, t0, t1, fpstatus_ptr(fpsttype));
             write_fp_sreg(s, a->rd, t0);
         }
         break;
@@ -5072,6 +5074,18 @@ static bool do_fp3_scalar(DisasContext *s, arg_rrr_e *a, const FPScalar *f)
     return true;
 }
 
+static bool do_fp3_scalar(DisasContext *s, arg_rrr_e *a, const FPScalar *f)
+{
+    return do_fp3_scalar_with_fpsttype(s, a, f,
+                                       a->esz == MO_16 ?
+                                       FPST_FPCR_F16_A64 : FPST_FPCR_A64);
+}
+
+static bool do_fp3_scalar_ah(DisasContext *s, arg_rrr_e *a, const FPScalar *f)
+{
+    return do_fp3_scalar_with_fpsttype(s, a, f, select_fpst(s, a->esz));
+}
+
 static const FPScalar f_scalar_fadd = {
     gen_helper_vfp_addh,
     gen_helper_vfp_adds,
@@ -5225,14 +5239,14 @@ static const FPScalar f_scalar_frecps = {
     gen_helper_recpsf_f32,
     gen_helper_recpsf_f64,
 };
-TRANS(FRECPS_s, do_fp3_scalar, a, &f_scalar_frecps)
+TRANS(FRECPS_s, do_fp3_scalar_ah, a, &f_scalar_frecps)
 
 static const FPScalar f_scalar_frsqrts = {
     gen_helper_rsqrtsf_f16,
     gen_helper_rsqrtsf_f32,
     gen_helper_rsqrtsf_f64,
 };
-TRANS(FRSQRTS_s, do_fp3_scalar, a, &f_scalar_frsqrts)
+TRANS(FRSQRTS_s, do_fp3_scalar_ah, a, &f_scalar_frsqrts)
 
 static bool do_fcmp0_s(DisasContext *s, arg_rr_e *a,
                        const FPScalar *f, bool swap)
@@ -5483,8 +5497,10 @@ TRANS(CMHS_s, do_cmop_d, a, TCG_COND_GEU)
 TRANS(CMEQ_s, do_cmop_d, a, TCG_COND_EQ)
 TRANS(CMTST_s, do_cmop_d, a, TCG_COND_TSTNE)
 
-static bool do_fp3_vector(DisasContext *s, arg_qrrr_e *a, int data,
-                          gen_helper_gvec_3_ptr * const fns[3])
+static bool do_fp3_vector_with_fpsttype(DisasContext *s, arg_qrrr_e *a,
+                                        int data,
+                                        gen_helper_gvec_3_ptr * const fns[3],
+                                        ARMFPStatusFlavour fpsttype)
 {
     MemOp esz = a->esz;
     int check = fp_access_check_vector_hsd(s, a->q, esz);
@@ -5493,11 +5509,26 @@ static bool do_fp3_vector(DisasContext *s, arg_qrrr_e *a, int data,
         return check == 0;
     }
 
-    gen_gvec_op3_fpst(s, a->q, a->rd, a->rn, a->rm,
-                      esz == MO_16, data, fns[esz - 1]);
+    gen_gvec_op3_fpst(s, a->q, a->rd, a->rn, a->rm, fpsttype,
+                      data, fns[esz - 1]);
     return true;
 }
 
+static bool do_fp3_vector(DisasContext *s, arg_qrrr_e *a, int data,
+                          gen_helper_gvec_3_ptr * const fns[3])
+{
+    return do_fp3_vector_with_fpsttype(s, a, data, fns,
+                                       a->esz == MO_16 ?
+                                       FPST_FPCR_F16_A64 : FPST_FPCR_A64);
+}
+
+static bool do_fp3_vector_ah(DisasContext *s, arg_qrrr_e *a, int data,
+                             gen_helper_gvec_3_ptr * const f[3])
+{
+    return do_fp3_vector_with_fpsttype(s, a, data, f,
+                                       select_fpst(s, a->esz));
+}
+
 static gen_helper_gvec_3_ptr * const f_vector_fadd[3] = {
     gen_helper_gvec_fadd_h,
     gen_helper_gvec_fadd_s,
@@ -5622,14 +5653,14 @@ static gen_helper_gvec_3_ptr * const f_vector_frecps[3] = {
     gen_helper_gvec_recps_s,
     gen_helper_gvec_recps_d,
 };
-TRANS(FRECPS_v, do_fp3_vector, a, 0, f_vector_frecps)
+TRANS(FRECPS_v, do_fp3_vector_ah, a, 0, f_vector_frecps)
 
 static gen_helper_gvec_3_ptr * const f_vector_frsqrts[3] = {
     gen_helper_gvec_rsqrts_h,
     gen_helper_gvec_rsqrts_s,
     gen_helper_gvec_rsqrts_d,
 };
-TRANS(FRSQRTS_v, do_fp3_vector, a, 0, f_vector_frsqrts)
+TRANS(FRSQRTS_v, do_fp3_vector_ah, a, 0, f_vector_frsqrts)
 
 static gen_helper_gvec_3_ptr * const f_vector_faddp[3] = {
     gen_helper_gvec_faddp_h,
@@ -6385,7 +6416,8 @@ static bool do_fp3_vector_idx(DisasContext *s, arg_qrrx_e *a,
     }
 
     gen_gvec_op3_fpst(s, a->q, a->rd, a->rn, a->rm,
-                      esz == MO_16, a->idx, fns[esz - 1]);
+                      esz == MO_16 ? FPST_FPCR_F16_A64 : FPST_FPCR_A64,
+                      a->idx, fns[esz - 1]);
     return true;
 }
 
@@ -8394,8 +8426,9 @@ typedef struct FPScalar1 {
     void (*gen_d)(TCGv_i64, TCGv_i64, TCGv_ptr);
 } FPScalar1;
 
-static bool do_fp1_scalar(DisasContext *s, arg_rr_e *a,
-                          const FPScalar1 *f, int rmode)
+static bool do_fp1_scalar_with_fpsttype(DisasContext *s, arg_rr_e *a,
+                                        const FPScalar1 *f, int rmode,
+                                        ARMFPStatusFlavour fpsttype)
 {
     TCGv_i32 tcg_rmode = NULL;
     TCGv_ptr fpst;
@@ -8407,7 +8440,7 @@ static bool do_fp1_scalar(DisasContext *s, arg_rr_e *a,
         return check == 0;
     }
 
-    fpst = fpstatus_ptr(a->esz == MO_16 ? FPST_FPCR_F16_A64 : FPST_FPCR_A64);
+    fpst = fpstatus_ptr(fpsttype);
     if (rmode >= 0) {
         tcg_rmode = gen_set_rmode(rmode, fpst);
     }
@@ -8438,6 +8471,20 @@ static bool do_fp1_scalar(DisasContext *s, arg_rr_e *a,
     return true;
 }
 
+static bool do_fp1_scalar(DisasContext *s, arg_rr_e *a,
+                          const FPScalar1 *f, int rmode)
+{
+    return do_fp1_scalar_with_fpsttype(s, a, f, rmode,
+                                       a->esz == MO_16 ?
+                                       FPST_FPCR_F16_A64 : FPST_FPCR_A64);
+}
+
+static bool do_fp1_scalar_ah(DisasContext *s, arg_rr_e *a,
+                             const FPScalar1 *f, int rmode)
+{
+    return do_fp1_scalar_with_fpsttype(s, a, f, rmode, select_fpst(s, a->esz));
+}
+
 static const FPScalar1 f_scalar_fsqrt = {
     gen_helper_vfp_sqrth,
     gen_helper_vfp_sqrts,
@@ -8492,21 +8539,21 @@ static const FPScalar1 f_scalar_frecpe = {
     gen_helper_recpe_f32,
     gen_helper_recpe_f64,
 };
-TRANS(FRECPE_s, do_fp1_scalar, a, &f_scalar_frecpe, -1)
+TRANS(FRECPE_s, do_fp1_scalar_ah, a, &f_scalar_frecpe, -1)
 
 static const FPScalar1 f_scalar_frecpx = {
     gen_helper_frecpx_f16,
     gen_helper_frecpx_f32,
     gen_helper_frecpx_f64,
 };
-TRANS(FRECPX_s, do_fp1_scalar, a, &f_scalar_frecpx, -1)
+TRANS(FRECPX_s, do_fp1_scalar_ah, a, &f_scalar_frecpx, -1)
 
 static const FPScalar1 f_scalar_frsqrte = {
     gen_helper_rsqrte_f16,
     gen_helper_rsqrte_f32,
     gen_helper_rsqrte_f64,
 };
-TRANS(FRSQRTE_s, do_fp1_scalar, a, &f_scalar_frsqrte, -1)
+TRANS(FRSQRTE_s, do_fp1_scalar_ah, a, &f_scalar_frsqrte, -1)
 
 static bool trans_FCVT_s_ds(DisasContext *s, arg_rr *a)
 {
@@ -9361,9 +9408,10 @@ TRANS_FEAT(FRINT64Z_v, aa64_frint, do_fp1_vector, a,
            &f_scalar_frint64, FPROUNDING_ZERO)
 TRANS_FEAT(FRINT64X_v, aa64_frint, do_fp1_vector, a, &f_scalar_frint64, -1)
 
-static bool do_gvec_op2_fpst(DisasContext *s, MemOp esz, bool is_q,
-                             int rd, int rn, int data,
-                             gen_helper_gvec_2_ptr * const fns[3])
+static bool do_gvec_op2_fpst_with_fpsttype(DisasContext *s, MemOp esz,
+                                           bool is_q, int rd, int rn, int data,
+                                           gen_helper_gvec_2_ptr * const fns[3],
+                                           ARMFPStatusFlavour fpsttype)
 {
     int check = fp_access_check_vector_hsd(s, is_q, esz);
     TCGv_ptr fpst;
@@ -9372,7 +9420,7 @@ static bool do_gvec_op2_fpst(DisasContext *s, MemOp esz, bool is_q,
         return check == 0;
     }
 
-    fpst = fpstatus_ptr(esz == MO_16 ? FPST_FPCR_F16_A64 : FPST_FPCR_A64);
+    fpst = fpstatus_ptr(fpsttype);
     tcg_gen_gvec_2_ptr(vec_full_reg_offset(s, rd),
                        vec_full_reg_offset(s, rn), fpst,
                        is_q ? 16 : 8, vec_full_reg_size(s),
@@ -9380,6 +9428,23 @@ static bool do_gvec_op2_fpst(DisasContext *s, MemOp esz, bool is_q,
     return true;
 }
 
+static bool do_gvec_op2_fpst(DisasContext *s, MemOp esz, bool is_q,
+                             int rd, int rn, int data,
+                             gen_helper_gvec_2_ptr * const fns[3])
+{
+    return do_gvec_op2_fpst_with_fpsttype(s, esz, is_q, rd, rn, data, fns,
+                                          esz == MO_16 ? FPST_FPCR_F16_A64 :
+                                          FPST_FPCR_A64);
+}
+
+static bool do_gvec_op2_ah_fpst(DisasContext *s, MemOp esz, bool is_q,
+                                int rd, int rn, int data,
+                                gen_helper_gvec_2_ptr * const fns[3])
+{
+    return do_gvec_op2_fpst_with_fpsttype(s, esz, is_q, rd, rn, data,
+                                          fns, select_fpst(s, esz));
+}
+
 static gen_helper_gvec_2_ptr * const f_scvtf_v[] = {
     gen_helper_gvec_vcvt_sh,
     gen_helper_gvec_vcvt_sf,
@@ -9489,14 +9554,14 @@ static gen_helper_gvec_2_ptr * const f_frecpe[] = {
     gen_helper_gvec_frecpe_s,
     gen_helper_gvec_frecpe_d,
 };
-TRANS(FRECPE_v, do_gvec_op2_fpst, a->esz, a->q, a->rd, a->rn, 0, f_frecpe)
+TRANS(FRECPE_v, do_gvec_op2_ah_fpst, a->esz, a->q, a->rd, a->rn, 0, f_frecpe)
 
 static gen_helper_gvec_2_ptr * const f_frsqrte[] = {
     gen_helper_gvec_frsqrte_h,
     gen_helper_gvec_frsqrte_s,
     gen_helper_gvec_frsqrte_d,
 };
-TRANS(FRSQRTE_v, do_gvec_op2_fpst, a->esz, a->q, a->rd, a->rn, 0, f_frsqrte)
+TRANS(FRSQRTE_v, do_gvec_op2_ah_fpst, a->esz, a->q, a->rd, a->rn, 0, f_frsqrte)
 
 static bool trans_FCVTL_v(DisasContext *s, arg_qrr_e *a)
 {
diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
index fc7f0d077a5..8ed8677baa8 100644
--- a/target/arm/tcg/translate-sve.c
+++ b/target/arm/tcg/translate-sve.c
@@ -137,11 +137,11 @@ static bool gen_gvec_fpst_zz(DisasContext *s, gen_helper_gvec_2_ptr *fn,
     return true;
 }
 
-static bool gen_gvec_fpst_arg_zz(DisasContext *s, gen_helper_gvec_2_ptr *fn,
-                                 arg_rr_esz *a, int data)
+static bool gen_gvec_fpst_ah_arg_zz(DisasContext *s, gen_helper_gvec_2_ptr *fn,
+                                    arg_rr_esz *a, int data)
 {
     return gen_gvec_fpst_zz(s, fn, a->rd, a->rn, data,
-                            a->esz == MO_16 ? FPST_FPCR_F16_A64 : FPST_FPCR_A64);
+                            select_fpst(s, a->esz));
 }
 
 /* Invoke an out-of-line helper on 3 Zregs. */
@@ -194,6 +194,13 @@ static bool gen_gvec_fpst_arg_zzz(DisasContext *s, gen_helper_gvec_3_ptr *fn,
                              a->esz == MO_16 ? FPST_FPCR_F16_A64 : FPST_FPCR_A64);
 }
 
+static bool gen_gvec_fpst_ah_arg_zzz(DisasContext *s, gen_helper_gvec_3_ptr *fn,
+                                     arg_rrr_esz *a, int data)
+{
+    return gen_gvec_fpst_zzz(s, fn, a->rd, a->rn, a->rm, data,
+                             select_fpst(s, a->esz));
+}
+
 /* Invoke an out-of-line helper on 4 Zregs. */
 static bool gen_gvec_ool_zzzz(DisasContext *s, gen_helper_gvec_4 *fn,
                               int rd, int rn, int rm, int ra, int data)
@@ -3597,13 +3604,13 @@ static gen_helper_gvec_2_ptr * const frecpe_fns[] = {
     NULL,                     gen_helper_gvec_frecpe_h,
     gen_helper_gvec_frecpe_s, gen_helper_gvec_frecpe_d,
 };
-TRANS_FEAT(FRECPE, aa64_sve, gen_gvec_fpst_arg_zz, frecpe_fns[a->esz], a, 0)
+TRANS_FEAT(FRECPE, aa64_sve, gen_gvec_fpst_ah_arg_zz, frecpe_fns[a->esz], a, 0)
 
 static gen_helper_gvec_2_ptr * const frsqrte_fns[] = {
     NULL,                      gen_helper_gvec_frsqrte_h,
     gen_helper_gvec_frsqrte_s, gen_helper_gvec_frsqrte_d,
 };
-TRANS_FEAT(FRSQRTE, aa64_sve, gen_gvec_fpst_arg_zz, frsqrte_fns[a->esz], a, 0)
+TRANS_FEAT(FRSQRTE, aa64_sve, gen_gvec_fpst_ah_arg_zz, frsqrte_fns[a->esz], a, 0)
 
 /*
  *** SVE Floating Point Compare with Zero Group
@@ -3707,11 +3714,18 @@ static bool trans_FADDA(DisasContext *s, arg_rprr_esz *a)
     };                                                              \
     TRANS_FEAT(NAME, aa64_sve, gen_gvec_fpst_arg_zzz, name##_fns[a->esz], a, 0)
 
+#define DO_FP3_AH(NAME, name) \
+    static gen_helper_gvec_3_ptr * const name##_fns[4] = {          \
+        NULL, gen_helper_gvec_##name##_h,                           \
+        gen_helper_gvec_##name##_s, gen_helper_gvec_##name##_d      \
+    };                                                              \
+    TRANS_FEAT(NAME, aa64_sve, gen_gvec_fpst_ah_arg_zzz, name##_fns[a->esz], a, 0)
+
 DO_FP3(FADD_zzz, fadd)
 DO_FP3(FSUB_zzz, fsub)
 DO_FP3(FMUL_zzz, fmul)
-DO_FP3(FRECPS, recps)
-DO_FP3(FRSQRTS, rsqrts)
+DO_FP3_AH(FRECPS, recps)
+DO_FP3_AH(FRSQRTS, rsqrts)
 
 #undef DO_FP3
 
@@ -3993,7 +4007,7 @@ static gen_helper_gvec_3_ptr * const frecpx_fns[] = {
     gen_helper_sve_frecpx_s, gen_helper_sve_frecpx_d,
 };
 TRANS_FEAT(FRECPX, aa64_sve, gen_gvec_fpst_arg_zpz, frecpx_fns[a->esz],
-           a, 0, a->esz == MO_16 ? FPST_FPCR_F16_A64 : FPST_FPCR_A64)
+           a, 0, select_fpst(s, a->esz))
 
 static gen_helper_gvec_3_ptr * const fsqrt_fns[] = {
     NULL,                   gen_helper_sve_fsqrt_h,
-- 
2.34.1




* [PATCH 34/76] target/arm: Use FPST_FPCR_AH for BFCVT* insns
  2025-01-24 16:27 [PATCH 00/76] target/arm: Implement FEAT_AFP and FEAT_RPRES Peter Maydell
                   ` (32 preceding siblings ...)
  2025-01-24 16:27 ` [PATCH 33/76] target/arm: Use FPST_FPCR_AH for FRECPE, FRECPS, FRECPX, FRSQRTE, FRSQRTS Peter Maydell
@ 2025-01-24 16:27 ` Peter Maydell
  2025-01-25 17:42   ` Richard Henderson
  2025-01-24 16:27 ` [PATCH 35/76] target/arm: Use FPST_FPCR_AH for BFMLAL*, BFMLSL* insns Peter Maydell
                   ` (43 subsequent siblings)
  77 siblings, 1 reply; 167+ messages in thread
From: Peter Maydell @ 2025-01-24 16:27 UTC (permalink / raw)
  To: qemu-arm, qemu-devel

When FPCR.AH is 1, use FPST_FPCR_AH for:
 * AdvSIMD BFCVT, BFCVTN, BFCVTN2
 * SVE BFCVT, BFCVTNT

so that they get the required behaviour changes.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/tcg/translate-a64.c | 27 +++++++++++++++++++++------
 target/arm/tcg/translate-sve.c |  6 ++++--
 2 files changed, 25 insertions(+), 8 deletions(-)

diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index 2a0c5e23e74..d53864ad794 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -8514,7 +8514,7 @@ TRANS(FRINTX_s, do_fp1_scalar, a, &f_scalar_frintx, -1)
 static const FPScalar1 f_scalar_bfcvt = {
     .gen_s = gen_helper_bfcvt,
 };
-TRANS_FEAT(BFCVT_s, aa64_bf16, do_fp1_scalar, a, &f_scalar_bfcvt, -1)
+TRANS_FEAT(BFCVT_s, aa64_bf16, do_fp1_scalar_ah, a, &f_scalar_bfcvt, -1)
 
 static const FPScalar1 f_scalar_frint32 = {
     NULL,
@@ -9290,12 +9290,27 @@ static void gen_bfcvtn_hs(TCGv_i64 d, TCGv_i64 n)
     tcg_gen_extu_i32_i64(d, tmp);
 }
 
-static ArithOneOp * const f_vector_bfcvtn[] = {
-    NULL,
-    gen_bfcvtn_hs,
-    NULL,
+static void gen_bfcvtn_ah_hs(TCGv_i64 d, TCGv_i64 n)
+{
+    TCGv_ptr fpst = fpstatus_ptr(FPST_FPCR_AH);
+    TCGv_i32 tmp = tcg_temp_new_i32();
+    gen_helper_bfcvt_pair(tmp, n, fpst);
+    tcg_gen_extu_i32_i64(d, tmp);
+}
+
+static ArithOneOp * const f_vector_bfcvtn[2][3] = {
+    {
+        NULL,
+        gen_bfcvtn_hs,
+        NULL,
+    }, {
+        NULL,
+        gen_bfcvtn_ah_hs,
+        NULL,
+    }
 };
-TRANS_FEAT(BFCVTN_v, aa64_bf16, do_2misc_narrow_vector, a, f_vector_bfcvtn)
+TRANS_FEAT(BFCVTN_v, aa64_bf16, do_2misc_narrow_vector, a,
+           f_vector_bfcvtn[s->fpcr_ah])
 
 static bool trans_SHLL_v(DisasContext *s, arg_qrr_e *a)
 {
diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
index 8ed8677baa8..4d77b55d545 100644
--- a/target/arm/tcg/translate-sve.c
+++ b/target/arm/tcg/translate-sve.c
@@ -3904,7 +3904,8 @@ TRANS_FEAT(FCVT_hs, aa64_sve, gen_gvec_fpst_arg_zpz,
            gen_helper_sve_fcvt_hs, a, 0, FPST_FPCR_F16_A64)
 
 TRANS_FEAT(BFCVT, aa64_sve_bf16, gen_gvec_fpst_arg_zpz,
-           gen_helper_sve_bfcvt, a, 0, FPST_FPCR_A64)
+           gen_helper_sve_bfcvt, a, 0,
+           s->fpcr_ah ? FPST_FPCR_AH : FPST_FPCR_A64)
 
 TRANS_FEAT(FCVT_dh, aa64_sve, gen_gvec_fpst_arg_zpz,
            gen_helper_sve_fcvt_dh, a, 0, FPST_FPCR_A64)
@@ -7054,7 +7055,8 @@ TRANS_FEAT(FCVTNT_ds, aa64_sve2, gen_gvec_fpst_arg_zpz,
            gen_helper_sve2_fcvtnt_ds, a, 0, FPST_FPCR_A64)
 
 TRANS_FEAT(BFCVTNT, aa64_sve_bf16, gen_gvec_fpst_arg_zpz,
-           gen_helper_sve_bfcvtnt, a, 0, FPST_FPCR_A64)
+           gen_helper_sve_bfcvtnt, a, 0,
+           s->fpcr_ah ? FPST_FPCR_AH : FPST_FPCR_A64)
 
 TRANS_FEAT(FCVTLT_hs, aa64_sve2, gen_gvec_fpst_arg_zpz,
            gen_helper_sve2_fcvtlt_hs, a, 0, FPST_FPCR_A64)
-- 
2.34.1




* [PATCH 35/76] target/arm: Use FPST_FPCR_AH for BFMLAL*, BFMLSL* insns
  2025-01-24 16:27 [PATCH 00/76] target/arm: Implement FEAT_AFP and FEAT_RPRES Peter Maydell
                   ` (33 preceding siblings ...)
  2025-01-24 16:27 ` [PATCH 34/76] target/arm: Use FPST_FPCR_AH for BFCVT* insns Peter Maydell
@ 2025-01-24 16:27 ` Peter Maydell
  2025-01-25 17:44   ` Richard Henderson
  2025-01-24 16:27 ` [PATCH 36/76] target/arm: Add FPCR.NEP to TBFLAGS Peter Maydell
                   ` (42 subsequent siblings)
  77 siblings, 1 reply; 167+ messages in thread
From: Peter Maydell @ 2025-01-24 16:27 UTC (permalink / raw)
  To: qemu-arm, qemu-devel

When FPCR.AH is 1, use FPST_FPCR_AH for:
 * AdvSIMD BFMLALB, BFMLALT
 * SVE BFMLALB, BFMLALT, BFMLSLB, BFMLSLT

so that they get the required behaviour changes.

We do this by making gen_gvec_op4_fpst() take an ARMFPStatusFlavour
rather than a bool is_fp16; existing callsites now select
FPST_FPCR_F16_A64 vs FPST_FPCR_A64 themselves rather than passing in
the boolean.
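
As a rough illustration of that interface change (a standalone sketch, not
the QEMU code: the enum and function names below are hypothetical stand-ins
for ARMFPStatusFlavour and the gen_gvec_op4_fpst() call sites), the point is
that a bool can only pick between two flavours, whereas a flavour parameter
lets a BFMLAL-style caller ask for the AH variant:

```c
#include <stdbool.h>

/* Hypothetical stand-in for ARMFPStatusFlavour; only three values modelled. */
typedef enum {
    SK_FPST_A64,
    SK_FPST_F16_A64,
    SK_FPST_AH,
} SkFlavour;

/* Old shape: the callee maps a bool onto one of two flavours. */
static SkFlavour sk_op4_old(bool is_fp16)
{
    return is_fp16 ? SK_FPST_F16_A64 : SK_FPST_A64;
}

/* New shape: the callee uses whichever flavour the caller selected. */
static SkFlavour sk_op4_new(SkFlavour fpsttype)
{
    return fpsttype;
}

/* An ordinary FP caller keeps the old behaviour by choosing explicitly. */
static SkFlavour sk_caller_fcmla(bool esz_is_16)
{
    return sk_op4_new(esz_is_16 ? SK_FPST_F16_A64 : SK_FPST_A64);
}

/* A BF16 caller can now request the AH flavour when FPCR.AH is set. */
static SkFlavour sk_caller_bfmlal(bool fpcr_ah)
{
    return sk_op4_new(fpcr_ah ? SK_FPST_AH : SK_FPST_A64);
}
```

In this model the existing callers give the same answer as before, and only
the BF16 caller ever produces the AH flavour.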

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/tcg/translate-a64.c | 20 +++++++++++++-------
 target/arm/tcg/translate-sve.c |  6 ++++--
 2 files changed, 17 insertions(+), 9 deletions(-)

diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index d53864ad794..0b3e4ec136d 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -765,10 +765,11 @@ static void gen_gvec_op4_env(DisasContext *s, bool is_q, int rd, int rn,
  * an out-of-line helper.
  */
 static void gen_gvec_op4_fpst(DisasContext *s, bool is_q, int rd, int rn,
-                              int rm, int ra, bool is_fp16, int data,
+                              int rm, int ra, ARMFPStatusFlavour fpsttype,
+                              int data,
                               gen_helper_gvec_4_ptr *fn)
 {
-    TCGv_ptr fpst = fpstatus_ptr(is_fp16 ? FPST_FPCR_F16_A64 : FPST_FPCR_A64);
+    TCGv_ptr fpst = fpstatus_ptr(fpsttype);
     tcg_gen_gvec_4_ptr(vec_full_reg_offset(s, rd),
                        vec_full_reg_offset(s, rn),
                        vec_full_reg_offset(s, rm),
@@ -5837,7 +5838,8 @@ static bool trans_BFMLAL_v(DisasContext *s, arg_qrrr_e *a)
     }
     if (fp_access_check(s)) {
         /* Q bit selects BFMLALB vs BFMLALT. */
-        gen_gvec_op4_fpst(s, true, a->rd, a->rn, a->rm, a->rd, false, a->q,
+        gen_gvec_op4_fpst(s, true, a->rd, a->rn, a->rm, a->rd,
+                          s->fpcr_ah ? FPST_FPCR_AH : FPST_FPCR_A64, a->q,
                           gen_helper_gvec_bfmlal);
     }
     return true;
@@ -5870,7 +5872,8 @@ static bool trans_FCMLA_v(DisasContext *s, arg_FCMLA_v *a)
     }
 
     gen_gvec_op4_fpst(s, a->q, a->rd, a->rn, a->rm, a->rd,
-                      a->esz == MO_16, a->rot, fn[a->esz]);
+                      a->esz == MO_16 ? FPST_FPCR_F16_A64 : FPST_FPCR_A64,
+                      a->rot, fn[a->esz]);
     return true;
 }
 
@@ -6450,7 +6453,8 @@ static bool do_fmla_vector_idx(DisasContext *s, arg_qrrx_e *a, bool neg)
     }
 
     gen_gvec_op4_fpst(s, a->q, a->rd, a->rn, a->rm, a->rd,
-                      esz == MO_16, (a->idx << 1) | neg,
+                      esz == MO_16 ? FPST_FPCR_F16_A64 : FPST_FPCR_A64,
+                      (a->idx << 1) | neg,
                       fns[esz - 1]);
     return true;
 }
@@ -6585,7 +6589,8 @@ static bool trans_BFMLAL_vi(DisasContext *s, arg_qrrx_e *a)
     }
     if (fp_access_check(s)) {
         /* Q bit selects BFMLALB vs BFMLALT. */
-        gen_gvec_op4_fpst(s, true, a->rd, a->rn, a->rm, a->rd, 0,
+        gen_gvec_op4_fpst(s, true, a->rd, a->rn, a->rm, a->rd,
+                          s->fpcr_ah ? FPST_FPCR_AH : FPST_FPCR_A64,
                           (a->idx << 1) | a->q,
                           gen_helper_gvec_bfmlal_idx);
     }
@@ -6614,7 +6619,8 @@ static bool trans_FCMLA_vi(DisasContext *s, arg_FCMLA_vi *a)
     }
     if (fp_access_check(s)) {
         gen_gvec_op4_fpst(s, a->q, a->rd, a->rn, a->rm, a->rd,
-                          a->esz == MO_16, (a->idx << 2) | a->rot, fn);
+                          a->esz == MO_16 ? FPST_FPCR_F16_A64 : FPST_FPCR_A64,
+                          (a->idx << 2) | a->rot, fn);
     }
     return true;
 }
diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
index 4d77b55d545..ad415c43565 100644
--- a/target/arm/tcg/translate-sve.c
+++ b/target/arm/tcg/translate-sve.c
@@ -7117,7 +7117,8 @@ TRANS_FEAT_NONSTREAMING(BFMMLA, aa64_sve_bf16, gen_gvec_env_arg_zzzz,
 static bool do_BFMLAL_zzzw(DisasContext *s, arg_rrrr_esz *a, bool sel)
 {
     return gen_gvec_fpst_zzzz(s, gen_helper_gvec_bfmlal,
-                              a->rd, a->rn, a->rm, a->ra, sel, FPST_FPCR_A64);
+                              a->rd, a->rn, a->rm, a->ra, sel,
+                              s->fpcr_ah ? FPST_FPCR_AH : FPST_FPCR_A64);
 }
 
 TRANS_FEAT(BFMLALB_zzzw, aa64_sve_bf16, do_BFMLAL_zzzw, a, false)
@@ -7127,7 +7128,8 @@ static bool do_BFMLAL_zzxw(DisasContext *s, arg_rrxr_esz *a, bool sel)
 {
     return gen_gvec_fpst_zzzz(s, gen_helper_gvec_bfmlal_idx,
                               a->rd, a->rn, a->rm, a->ra,
-                              (a->index << 1) | sel, FPST_FPCR_A64);
+                              (a->index << 1) | sel,
+                              s->fpcr_ah ? FPST_FPCR_AH : FPST_FPCR_A64);
 }
 
 TRANS_FEAT(BFMLALB_zzxw, aa64_sve_bf16, do_BFMLAL_zzxw, a, false)
-- 
2.34.1




* [PATCH 36/76] target/arm: Add FPCR.NEP to TBFLAGS
  2025-01-24 16:27 [PATCH 00/76] target/arm: Implement FEAT_AFP and FEAT_RPRES Peter Maydell
                   ` (34 preceding siblings ...)
  2025-01-24 16:27 ` [PATCH 35/76] target/arm: Use FPST_FPCR_AH for BFMLAL*, BFMLSL* insns Peter Maydell
@ 2025-01-24 16:27 ` Peter Maydell
  2025-01-25 17:45   ` Richard Henderson
  2025-01-24 16:27 ` [PATCH 37/76] target/arm: Define and use new write_fp_*reg_merging() functions Peter Maydell
                   ` (41 subsequent siblings)
  77 siblings, 1 reply; 167+ messages in thread
From: Peter Maydell @ 2025-01-24 16:27 UTC (permalink / raw)
  To: qemu-arm, qemu-devel

For FEAT_AFP, we want to emit different code when FPCR.NEP is set, so
that instead of zeroing the high elements of a vector register when
we write the output of a scalar operation to it, we instead merge in
those elements from one of the source registers.  Since this affects
the generated code, we need to put FPCR.NEP into the TBFLAGS.

FPCR.NEP is treated as 0 when in streaming SVE mode and FEAT_SME_FA64
is not implemented or not enabled; we can implement this logic in
rebuild_hflags_a64().
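
The rule above can be sketched as a small standalone predicate. This is only
a model under stated assumptions: the function and parameter names are
illustrative, and the three booleans stand in for the FPCR.NEP bit, the
PSTATE.SM state and the "FEAT_SME_FA64 implemented and enabled" check
respectively.

```c
#include <stdbool.h>

/*
 * Sketch: the NEP value the translator should act on.
 * fpcr_nep      - raw FPCR.NEP bit
 * streaming_sve - true when in streaming SVE mode
 * fa64_enabled  - true when FEAT_SME_FA64 is implemented and enabled
 * (all three names are hypothetical, chosen for this example)
 */
static bool sketch_effective_nep(bool fpcr_nep, bool streaming_sve,
                                 bool fa64_enabled)
{
    if (!fpcr_nep) {
        return false;
    }
    /* In streaming SVE mode without FA64, NEP is treated as 0. */
    return !(streaming_sve && !fa64_enabled);
}
```

Because the result depends on more than the raw FPCR bit, folding the whole
condition into the flag means the translator only has to test one boolean.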

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/cpu.h               | 1 +
 target/arm/tcg/translate.h     | 2 ++
 target/arm/tcg/hflags.c        | 9 +++++++++
 target/arm/tcg/translate-a64.c | 1 +
 4 files changed, 13 insertions(+)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index cfb16151577..f562e0687c9 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -3213,6 +3213,7 @@ FIELD(TBFLAG_A64, NV2_MEM_E20, 35, 1)
 /* Set if FEAT_NV2 RAM accesses are big-endian */
 FIELD(TBFLAG_A64, NV2_MEM_BE, 36, 1)
 FIELD(TBFLAG_A64, AH, 37, 1)   /* FPCR.AH */
+FIELD(TBFLAG_A64, NEP, 38, 1)   /* FPCR.NEP */
 
 /*
  * Helpers for using the above. Note that only the A64 accessors use
diff --git a/target/arm/tcg/translate.h b/target/arm/tcg/translate.h
index 680ca52a181..59e780df2ee 100644
--- a/target/arm/tcg/translate.h
+++ b/target/arm/tcg/translate.h
@@ -157,6 +157,8 @@ typedef struct DisasContext {
     bool nv2_mem_be;
     /* True if FPCR.AH is 1 (alternate floating point handling) */
     bool fpcr_ah;
+    /* True if FPCR.NEP is 1 (FEAT_AFP scalar upper-element result handling) */
+    bool fpcr_nep;
     /*
      * >= 0, a copy of PSTATE.BTYPE, which will be 0 without v8.5-BTI.
      *  < 0, set by the current instruction.
diff --git a/target/arm/tcg/hflags.c b/target/arm/tcg/hflags.c
index b3a78564ec1..9e6a1869f94 100644
--- a/target/arm/tcg/hflags.c
+++ b/target/arm/tcg/hflags.c
@@ -407,6 +407,15 @@ static CPUARMTBFlags rebuild_hflags_a64(CPUARMState *env, int el, int fp_el,
     if (env->vfp.fpcr & FPCR_AH) {
         DP_TBFLAG_A64(flags, AH, 1);
     }
+    if (env->vfp.fpcr & FPCR_NEP) {
+        /*
+         * In streaming-SVE without FA64, NEP behaves as if zero;
+         * compare pseudocode IsMerging()
+         */
+        if (!(EX_TBFLAG_A64(flags, PSTATE_SM) && !sme_fa64(env, el))) {
+            DP_TBFLAG_A64(flags, NEP, 1);
+        }
+    }
 
     return rebuild_hflags_common(env, fp_el, mmu_idx, flags);
 }
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index 0b3e4ec136d..d34672a8ba6 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -9753,6 +9753,7 @@ static void aarch64_tr_init_disas_context(DisasContextBase *dcbase,
     dc->nv2_mem_e20 = EX_TBFLAG_A64(tb_flags, NV2_MEM_E20);
     dc->nv2_mem_be = EX_TBFLAG_A64(tb_flags, NV2_MEM_BE);
     dc->fpcr_ah = EX_TBFLAG_A64(tb_flags, AH);
+    dc->fpcr_nep = EX_TBFLAG_A64(tb_flags, NEP);
     dc->vec_len = 0;
     dc->vec_stride = 0;
     dc->cp_regs = arm_cpu->cp_regs;
-- 
2.34.1




* [PATCH 37/76] target/arm: Define and use new write_fp_*reg_merging() functions
  2025-01-24 16:27 [PATCH 00/76] target/arm: Implement FEAT_AFP and FEAT_RPRES Peter Maydell
                   ` (35 preceding siblings ...)
  2025-01-24 16:27 ` [PATCH 36/76] target/arm: Add FPCR.NEP to TBFLAGS Peter Maydell
@ 2025-01-24 16:27 ` Peter Maydell
  2025-01-25 17:52   ` Richard Henderson
  2025-01-24 16:27 ` [PATCH 38/76] target/arm: Handle FPCR.NEP for 3-input scalar operations Peter Maydell
                   ` (40 subsequent siblings)
  77 siblings, 1 reply; 167+ messages in thread
From: Peter Maydell @ 2025-01-24 16:27 UTC (permalink / raw)
  To: qemu-arm, qemu-devel

For FEAT_AFP's FPCR.NEP bit, we need to programmatically change the
behaviour of the writeback of the result for most SIMD scalar
operations, so that instead of zeroing the upper part of the result
register it merges the upper elements from one of the input
registers.

Provide new functions write_fp_*reg_merging() which can be used
instead of the existing write_fp_*reg() functions when we want this
"merge the result with one of the input registers if FPCR.NEP is
enabled" handling, and use them in do_fp3_scalar_with_fpsttype().

Note that the input register used as the merge source varies by
instruction, as documented in the description of the FPCR.NEP bit:
among these 2-input scalar operations, the comparison instructions
take the upper elements from Rm, not Rn.
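
To make the intended writeback semantics concrete, here is a standalone
model of them. It is a sketch only: the SketchVReg type and the sketch_*
functions are invented for this example, a register is modelled as just
128 bits (SVE bits above 128 are ignored), and the result value is assumed
to have been computed before the writeback starts.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one 128-bit vector register as two 64-bit halves. */
typedef struct {
    uint64_t d[2];
} SketchVReg;

/* Write a 64-bit scalar result to regs[rd], merging if nep is set. */
static void sketch_write_dreg_merging(SketchVReg *regs, int rd, int mergereg,
                                      uint64_t v, bool nep)
{
    if (nep) {
        /* Take the high elements from the merge source (may equal rd). */
        regs[rd] = regs[mergereg];
    } else {
        regs[rd].d[1] = 0;
    }
    regs[rd].d[0] = v;
}

/* Write a 32-bit scalar result to regs[rd], merging if nep is set. */
static void sketch_write_sreg_merging(SketchVReg *regs, int rd, int mergereg,
                                      uint32_t v, bool nep)
{
    if (nep) {
        regs[rd] = regs[mergereg];
        /* Replace only the low 32 bits; bits [127:32] come from mergereg. */
        regs[rd].d[0] = (regs[rd].d[0] & ~UINT64_C(0xffffffff)) | v;
    } else {
        regs[rd].d[0] = v;
        regs[rd].d[1] = 0;
    }
}
```

In this model an arithmetic operation would pass the index of Rn as
mergereg and a comparison would pass the index of Rm, matching the
per-instruction choice described above.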

We'll extend this to also provide the merging behaviour for
the remaining scalar insns in subsequent commits.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/tcg/translate-a64.c | 117 +++++++++++++++++++++++++--------
 1 file changed, 91 insertions(+), 26 deletions(-)

diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index d34672a8ba6..19a4ae14c15 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -665,6 +665,68 @@ static void write_fp_sreg(DisasContext *s, int reg, TCGv_i32 v)
     write_fp_dreg(s, reg, tmp);
 }
 
+/*
+ * Write a double result to 128 bit vector register reg, honouring FPCR.NEP:
+ * - if FPCR.NEP == 0, clear the high elements of reg
+ * - if FPCR.NEP == 1, set the high elements of reg from mergereg
+ *   (i.e. merge the result with those high elements)
+ * In either case, SVE register bits above 128 are zeroed (per R_WKYLB).
+ */
+static void write_fp_dreg_merging(DisasContext *s, int reg, int mergereg,
+                                  TCGv_i64 v)
+{
+    if (!s->fpcr_nep) {
+        write_fp_dreg(s, reg, v);
+        return;
+    }
+
+    /*
+     * Move from mergereg to reg; this sets the high elements and
+     * clears the bits above 128 as a side effect.
+     */
+    tcg_gen_gvec_mov(MO_64, fp_reg_offset(s, reg, MO_64),
+                     fp_reg_offset(s, mergereg, MO_64),
+                     16, vec_full_reg_size(s));
+    tcg_gen_st_i64(v, tcg_env, fp_reg_offset(s, reg, MO_64));
+}
+
+/*
+ * Write a single-prec result. If FPCR.NEP is 0, clear the higher elements
+ * of the destination register; otherwise set them from mergereg.
+ */
+static void write_fp_sreg_merging(DisasContext *s, int reg, int mergereg,
+                                  TCGv_i32 v)
+{
+    if (!s->fpcr_nep) {
+        write_fp_sreg(s, reg, v);
+        return;
+    }
+
+    tcg_gen_gvec_mov(MO_64, fp_reg_offset(s, reg, MO_64),
+                     fp_reg_offset(s, mergereg, MO_64),
+                     16, vec_full_reg_size(s));
+    tcg_gen_st_i32(v, tcg_env, fp_reg_offset(s, reg, MO_32));
+}
+
+/*
+ * Write a half-prec result. If FPCR.NEP is 0, clear the higher elements
+ * of the destination register; otherwise set them from mergereg.
+ * The caller must ensure that the top 16 bits of v are zero.
+ */
+static void write_fp_hreg_merging(DisasContext *s, int reg, int mergereg,
+                                  TCGv_i32 v)
+{
+    if (!s->fpcr_nep) {
+        write_fp_sreg(s, reg, v);
+        return;
+    }
+
+    tcg_gen_gvec_mov(MO_64, fp_reg_offset(s, reg, MO_64),
+                     fp_reg_offset(s, mergereg, MO_64),
+                     16, vec_full_reg_size(s));
+    tcg_gen_st16_i32(v, tcg_env, fp_reg_offset(s, reg, MO_16));
+}
+
 /* Expand a 2-operand AdvSIMD vector operation using an expander function.  */
 static void gen_gvec_fn2(DisasContext *s, bool is_q, int rd, int rn,
                          GVecGen2Fn *gvec_fn, int vece)
@@ -5038,7 +5100,7 @@ typedef struct FPScalar {
 } FPScalar;
 
 static bool do_fp3_scalar_with_fpsttype(DisasContext *s, arg_rrr_e *a,
-                                        const FPScalar *f,
+                                        const FPScalar *f, int mergereg,
                                         ARMFPStatusFlavour fpsttype)
 {
     switch (a->esz) {
@@ -5047,7 +5109,7 @@ static bool do_fp3_scalar_with_fpsttype(DisasContext *s, arg_rrr_e *a,
             TCGv_i64 t0 = read_fp_dreg(s, a->rn);
             TCGv_i64 t1 = read_fp_dreg(s, a->rm);
             f->gen_d(t0, t0, t1, fpstatus_ptr(fpsttype));
-            write_fp_dreg(s, a->rd, t0);
+            write_fp_dreg_merging(s, a->rd, mergereg, t0);
         }
         break;
     case MO_32:
@@ -5055,7 +5117,7 @@ static bool do_fp3_scalar_with_fpsttype(DisasContext *s, arg_rrr_e *a,
             TCGv_i32 t0 = read_fp_sreg(s, a->rn);
             TCGv_i32 t1 = read_fp_sreg(s, a->rm);
             f->gen_s(t0, t0, t1, fpstatus_ptr(fpsttype));
-            write_fp_sreg(s, a->rd, t0);
+            write_fp_sreg_merging(s, a->rd, mergereg, t0);
         }
         break;
     case MO_16:
@@ -5066,7 +5128,7 @@ static bool do_fp3_scalar_with_fpsttype(DisasContext *s, arg_rrr_e *a,
             TCGv_i32 t0 = read_fp_hreg(s, a->rn);
             TCGv_i32 t1 = read_fp_hreg(s, a->rm);
             f->gen_h(t0, t0, t1, fpstatus_ptr(fpsttype));
-            write_fp_sreg(s, a->rd, t0);
+            write_fp_hreg_merging(s, a->rd, mergereg, t0);
         }
         break;
     default:
@@ -5075,16 +5137,19 @@ static bool do_fp3_scalar_with_fpsttype(DisasContext *s, arg_rrr_e *a,
     return true;
 }
 
-static bool do_fp3_scalar(DisasContext *s, arg_rrr_e *a, const FPScalar *f)
+static bool do_fp3_scalar(DisasContext *s, arg_rrr_e *a, const FPScalar *f,
+                          int mergereg)
 {
-    return do_fp3_scalar_with_fpsttype(s, a, f,
+    return do_fp3_scalar_with_fpsttype(s, a, f, mergereg,
                                        a->esz == MO_16 ?
                                        FPST_FPCR_F16_A64 : FPST_FPCR_A64);
 }
 
-static bool do_fp3_scalar_ah(DisasContext *s, arg_rrr_e *a, const FPScalar *f)
+static bool do_fp3_scalar_ah(DisasContext *s, arg_rrr_e *a, const FPScalar *f,
+                             int mergereg)
 {
-    return do_fp3_scalar_with_fpsttype(s, a, f, select_fpst(s, a->esz));
+    return do_fp3_scalar_with_fpsttype(s, a, f, mergereg,
+                                       select_fpst(s, a->esz));
 }
 
 static const FPScalar f_scalar_fadd = {
@@ -5092,63 +5157,63 @@ static const FPScalar f_scalar_fadd = {
     gen_helper_vfp_adds,
     gen_helper_vfp_addd,
 };
-TRANS(FADD_s, do_fp3_scalar, a, &f_scalar_fadd)
+TRANS(FADD_s, do_fp3_scalar, a, &f_scalar_fadd, a->rn)
 
 static const FPScalar f_scalar_fsub = {
     gen_helper_vfp_subh,
     gen_helper_vfp_subs,
     gen_helper_vfp_subd,
 };
-TRANS(FSUB_s, do_fp3_scalar, a, &f_scalar_fsub)
+TRANS(FSUB_s, do_fp3_scalar, a, &f_scalar_fsub, a->rn)
 
 static const FPScalar f_scalar_fdiv = {
     gen_helper_vfp_divh,
     gen_helper_vfp_divs,
     gen_helper_vfp_divd,
 };
-TRANS(FDIV_s, do_fp3_scalar, a, &f_scalar_fdiv)
+TRANS(FDIV_s, do_fp3_scalar, a, &f_scalar_fdiv, a->rn)
 
 static const FPScalar f_scalar_fmul = {
     gen_helper_vfp_mulh,
     gen_helper_vfp_muls,
     gen_helper_vfp_muld,
 };
-TRANS(FMUL_s, do_fp3_scalar, a, &f_scalar_fmul)
+TRANS(FMUL_s, do_fp3_scalar, a, &f_scalar_fmul, a->rn)
 
 static const FPScalar f_scalar_fmax = {
     gen_helper_vfp_maxh,
     gen_helper_vfp_maxs,
     gen_helper_vfp_maxd,
 };
-TRANS(FMAX_s, do_fp3_scalar, a, &f_scalar_fmax)
+TRANS(FMAX_s, do_fp3_scalar, a, &f_scalar_fmax, a->rn)
 
 static const FPScalar f_scalar_fmin = {
     gen_helper_vfp_minh,
     gen_helper_vfp_mins,
     gen_helper_vfp_mind,
 };
-TRANS(FMIN_s, do_fp3_scalar, a, &f_scalar_fmin)
+TRANS(FMIN_s, do_fp3_scalar, a, &f_scalar_fmin, a->rn)
 
 static const FPScalar f_scalar_fmaxnm = {
     gen_helper_vfp_maxnumh,
     gen_helper_vfp_maxnums,
     gen_helper_vfp_maxnumd,
 };
-TRANS(FMAXNM_s, do_fp3_scalar, a, &f_scalar_fmaxnm)
+TRANS(FMAXNM_s, do_fp3_scalar, a, &f_scalar_fmaxnm, a->rn)
 
 static const FPScalar f_scalar_fminnm = {
     gen_helper_vfp_minnumh,
     gen_helper_vfp_minnums,
     gen_helper_vfp_minnumd,
 };
-TRANS(FMINNM_s, do_fp3_scalar, a, &f_scalar_fminnm)
+TRANS(FMINNM_s, do_fp3_scalar, a, &f_scalar_fminnm, a->rn)
 
 static const FPScalar f_scalar_fmulx = {
     gen_helper_advsimd_mulxh,
     gen_helper_vfp_mulxs,
     gen_helper_vfp_mulxd,
 };
-TRANS(FMULX_s, do_fp3_scalar, a, &f_scalar_fmulx)
+TRANS(FMULX_s, do_fp3_scalar, a, &f_scalar_fmulx, a->rn)
 
 static void gen_fnmul_h(TCGv_i32 d, TCGv_i32 n, TCGv_i32 m, TCGv_ptr s)
 {
@@ -5173,42 +5238,42 @@ static const FPScalar f_scalar_fnmul = {
     gen_fnmul_s,
     gen_fnmul_d,
 };
-TRANS(FNMUL_s, do_fp3_scalar, a, &f_scalar_fnmul)
+TRANS(FNMUL_s, do_fp3_scalar, a, &f_scalar_fnmul, a->rn)
 
 static const FPScalar f_scalar_fcmeq = {
     gen_helper_advsimd_ceq_f16,
     gen_helper_neon_ceq_f32,
     gen_helper_neon_ceq_f64,
 };
-TRANS(FCMEQ_s, do_fp3_scalar, a, &f_scalar_fcmeq)
+TRANS(FCMEQ_s, do_fp3_scalar, a, &f_scalar_fcmeq, a->rm)
 
 static const FPScalar f_scalar_fcmge = {
     gen_helper_advsimd_cge_f16,
     gen_helper_neon_cge_f32,
     gen_helper_neon_cge_f64,
 };
-TRANS(FCMGE_s, do_fp3_scalar, a, &f_scalar_fcmge)
+TRANS(FCMGE_s, do_fp3_scalar, a, &f_scalar_fcmge, a->rm)
 
 static const FPScalar f_scalar_fcmgt = {
     gen_helper_advsimd_cgt_f16,
     gen_helper_neon_cgt_f32,
     gen_helper_neon_cgt_f64,
 };
-TRANS(FCMGT_s, do_fp3_scalar, a, &f_scalar_fcmgt)
+TRANS(FCMGT_s, do_fp3_scalar, a, &f_scalar_fcmgt, a->rm)
 
 static const FPScalar f_scalar_facge = {
     gen_helper_advsimd_acge_f16,
     gen_helper_neon_acge_f32,
     gen_helper_neon_acge_f64,
 };
-TRANS(FACGE_s, do_fp3_scalar, a, &f_scalar_facge)
+TRANS(FACGE_s, do_fp3_scalar, a, &f_scalar_facge, a->rm)
 
 static const FPScalar f_scalar_facgt = {
     gen_helper_advsimd_acgt_f16,
     gen_helper_neon_acgt_f32,
     gen_helper_neon_acgt_f64,
 };
-TRANS(FACGT_s, do_fp3_scalar, a, &f_scalar_facgt)
+TRANS(FACGT_s, do_fp3_scalar, a, &f_scalar_facgt, a->rm)
 
 static void gen_fabd_h(TCGv_i32 d, TCGv_i32 n, TCGv_i32 m, TCGv_ptr s)
 {
@@ -5233,21 +5298,21 @@ static const FPScalar f_scalar_fabd = {
     gen_fabd_s,
     gen_fabd_d,
 };
-TRANS(FABD_s, do_fp3_scalar, a, &f_scalar_fabd)
+TRANS(FABD_s, do_fp3_scalar, a, &f_scalar_fabd, a->rn)
 
 static const FPScalar f_scalar_frecps = {
     gen_helper_recpsf_f16,
     gen_helper_recpsf_f32,
     gen_helper_recpsf_f64,
 };
-TRANS(FRECPS_s, do_fp3_scalar_ah, a, &f_scalar_frecps)
+TRANS(FRECPS_s, do_fp3_scalar_ah, a, &f_scalar_frecps, a->rn)
 
 static const FPScalar f_scalar_frsqrts = {
     gen_helper_rsqrtsf_f16,
     gen_helper_rsqrtsf_f32,
     gen_helper_rsqrtsf_f64,
 };
-TRANS(FRSQRTS_s, do_fp3_scalar_ah, a, &f_scalar_frsqrts)
+TRANS(FRSQRTS_s, do_fp3_scalar_ah, a, &f_scalar_frsqrts, a->rn)
 
 static bool do_fcmp0_s(DisasContext *s, arg_rr_e *a,
                        const FPScalar *f, bool swap)
-- 
2.34.1




* [PATCH 38/76] target/arm: Handle FPCR.NEP for 3-input scalar operations
  2025-01-24 16:27 [PATCH 00/76] target/arm: Implement FEAT_AFP and FEAT_RPRES Peter Maydell
                   ` (36 preceding siblings ...)
  2025-01-24 16:27 ` [PATCH 37/76] target/arm: Define and use new write_fp_*reg_merging() functions Peter Maydell
@ 2025-01-24 16:27 ` Peter Maydell
  2025-01-25 17:53   ` Richard Henderson
  2025-01-24 16:27 ` [PATCH 39/76] target/arm: Handle FPCR.NEP for BFCVT scalar Peter Maydell
                   ` (39 subsequent siblings)
  77 siblings, 1 reply; 167+ messages in thread
From: Peter Maydell @ 2025-01-24 16:27 UTC (permalink / raw)
  To: qemu-arm, qemu-devel

Handle FPCR.NEP for the 3-input scalar operations which use
do_fmla_scalar_idx() and do_fmadd(), by making them call the
appropriate write_fp_*reg_merging() functions.
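
As a rough sketch of what the merging write means (an illustrative
model, not the QEMU code): the names VReg and
model_write_dreg_merging() are hypothetical, and only a 128-bit
register is modelled. With NEP clear the bits above the scalar result
are zeroed; with NEP set they are copied from the merge register,
which is Rd for do_fmla_scalar_idx() and Ra for do_fmadd().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a 128-bit vector register as two 64-bit halves. */
typedef struct {
    uint64_t lo;
    uint64_t hi;
} VReg;

/*
 * Model of a double-precision scalar write to register rd.
 * nep == false: the upper 64 bits of rd are zeroed.
 * nep == true:  the upper 64 bits are taken from register mergereg.
 */
void model_write_dreg_merging(VReg regs[32], int rd, int mergereg,
                              uint64_t val, bool nep)
{
    assert(rd >= 0 && rd < 32 && mergereg >= 0 && mergereg < 32);

    /* Read the merge source first so rd == mergereg also works. */
    uint64_t hi = nep ? regs[mergereg].hi : 0;

    regs[rd].lo = val;
    regs[rd].hi = hi;
}
```

In this model an FMADD-style call passes (rd, ra) and an FMLA by
element call passes (rd, rd), matching the register arguments used in
the diff below.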

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/tcg/translate-a64.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index 19a4ae14c15..66c214ed278 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -6356,7 +6356,7 @@ static bool do_fmla_scalar_idx(DisasContext *s, arg_rrx_e *a, bool neg)
                 gen_vfp_negd(t1, t1);
             }
             gen_helper_vfp_muladdd(t0, t1, t2, t0, fpstatus_ptr(FPST_FPCR_A64));
-            write_fp_dreg(s, a->rd, t0);
+            write_fp_dreg_merging(s, a->rd, a->rd, t0);
         }
         break;
     case MO_32:
@@ -6370,7 +6370,7 @@ static bool do_fmla_scalar_idx(DisasContext *s, arg_rrx_e *a, bool neg)
                 gen_vfp_negs(t1, t1);
             }
             gen_helper_vfp_muladds(t0, t1, t2, t0, fpstatus_ptr(FPST_FPCR_A64));
-            write_fp_sreg(s, a->rd, t0);
+            write_fp_sreg_merging(s, a->rd, a->rd, t0);
         }
         break;
     case MO_16:
@@ -6388,7 +6388,7 @@ static bool do_fmla_scalar_idx(DisasContext *s, arg_rrx_e *a, bool neg)
             }
             gen_helper_advsimd_muladdh(t0, t1, t2, t0,
                                        fpstatus_ptr(FPST_FPCR_F16_A64));
-            write_fp_sreg(s, a->rd, t0);
+            write_fp_hreg_merging(s, a->rd, a->rd, t0);
         }
         break;
     default:
@@ -6867,7 +6867,7 @@ static bool do_fmadd(DisasContext *s, arg_rrrr_e *a, bool neg_a, bool neg_n)
             }
             fpst = fpstatus_ptr(FPST_FPCR_A64);
             gen_helper_vfp_muladdd(ta, tn, tm, ta, fpst);
-            write_fp_dreg(s, a->rd, ta);
+            write_fp_dreg_merging(s, a->rd, a->ra, ta);
         }
         break;
 
@@ -6885,7 +6885,7 @@ static bool do_fmadd(DisasContext *s, arg_rrrr_e *a, bool neg_a, bool neg_n)
             }
             fpst = fpstatus_ptr(FPST_FPCR_A64);
             gen_helper_vfp_muladds(ta, tn, tm, ta, fpst);
-            write_fp_sreg(s, a->rd, ta);
+            write_fp_sreg_merging(s, a->rd, a->ra, ta);
         }
         break;
 
@@ -6906,7 +6906,7 @@ static bool do_fmadd(DisasContext *s, arg_rrrr_e *a, bool neg_a, bool neg_n)
             }
             fpst = fpstatus_ptr(FPST_FPCR_F16_A64);
             gen_helper_advsimd_muladdh(ta, tn, tm, ta, fpst);
-            write_fp_sreg(s, a->rd, ta);
+            write_fp_hreg_merging(s, a->rd, a->ra, ta);
         }
         break;
 
-- 
2.34.1




* [PATCH 39/76] target/arm: Handle FPCR.NEP for BFCVT scalar
  2025-01-24 16:27 [PATCH 00/76] target/arm: Implement FEAT_AFP and FEAT_RPRES Peter Maydell
                   ` (37 preceding siblings ...)
  2025-01-24 16:27 ` [PATCH 38/76] target/arm: Handle FPCR.NEP for 3-input scalar operations Peter Maydell
@ 2025-01-24 16:27 ` Peter Maydell
  2025-01-25 17:55   ` Richard Henderson
  2025-01-24 16:28 ` [PATCH 40/76] target/arm: Handle FPCR.NEP for 1-input scalar operations Peter Maydell
                   ` (38 subsequent siblings)
  77 siblings, 1 reply; 167+ messages in thread
From: Peter Maydell @ 2025-01-24 16:27 UTC (permalink / raw)
  To: qemu-arm, qemu-devel

Currently we implement BFCVT scalar via do_fp1_scalar().  This works
even though BFCVT is a narrowing operation from 32 to 16 bits,
because we can use write_fp_sreg() for float16. However, FPCR.NEP
support requires that we use write_fp_hreg_merging() for float16
outputs, so we can't continue to borrow the non-narrowing
do_fp1_scalar() function for this. Split out trans_BFCVT_s()
into its own implementation that honours FPCR.NEP.
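
To illustrate why the write width only starts to matter with NEP, here
is a small model of the low 64 bits of the destination (hypothetical
helper name model_write_low(), not QEMU code). A 16-bit result held in
a 32-bit temporary has a zero top half, so a zeroing 32-bit write and
a zeroing 16-bit write give the same register contents; a merging
32-bit write would instead overwrite bits [31:16] that NEP requires to
be preserved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Model of writing a 'bits'-wide scalar result into the low 64 bits
 * of a vector register whose previous low half was old_lo.
 * nep == false: everything above the result is zeroed.
 * nep == true:  everything above the result is kept from old_lo.
 */
uint64_t model_write_low(uint64_t old_lo, uint32_t val,
                         unsigned bits, bool nep)
{
    assert(bits == 16 || bits == 32);

    uint64_t mask = (UINT64_C(1) << bits) - 1;
    uint64_t keep = nep ? (old_lo & ~mask) : 0;

    return keep | (val & mask);
}
```

With val = 0x3f80 (the bfloat16 encoding of 1.0) the 16-bit and 32-bit
writes agree when nep is false and differ in bits [31:16] when nep is
true.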

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/tcg/translate-a64.c | 25 +++++++++++++++++++++----
 1 file changed, 21 insertions(+), 4 deletions(-)

diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index 66c214ed278..944bdf8cafe 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -8582,10 +8582,27 @@ static const FPScalar1 f_scalar_frintx = {
 };
 TRANS(FRINTX_s, do_fp1_scalar, a, &f_scalar_frintx, -1)
 
-static const FPScalar1 f_scalar_bfcvt = {
-    .gen_s = gen_helper_bfcvt,
-};
-TRANS_FEAT(BFCVT_s, aa64_bf16, do_fp1_scalar_ah, a, &f_scalar_bfcvt, -1)
+static bool trans_BFCVT_s(DisasContext *s, arg_rr_e *a)
+{
+    ARMFPStatusFlavour fpsttype = s->fpcr_ah ? FPST_FPCR_AH : FPST_FPCR_A64;
+    TCGv_i32 t32;
+    int check;
+
+    if (!dc_isar_feature(aa64_bf16, s)) {
+        return false;
+    }
+
+    check = fp_access_check_scalar_hsd(s, a->esz);
+
+    if (check <= 0) {
+        return check == 0;
+    }
+
+    t32 = read_fp_sreg(s, a->rn);
+    gen_helper_bfcvt(t32, t32, fpstatus_ptr(fpsttype));
+    write_fp_hreg_merging(s, a->rd, a->rd, t32);
+    return true;
+}
 
 static const FPScalar1 f_scalar_frint32 = {
     NULL,
-- 
2.34.1




* [PATCH 40/76] target/arm: Handle FPCR.NEP for 1-input scalar operations
  2025-01-24 16:27 [PATCH 00/76] target/arm: Implement FEAT_AFP and FEAT_RPRES Peter Maydell
                   ` (38 preceding siblings ...)
  2025-01-24 16:27 ` [PATCH 39/76] target/arm: Handle FPCR.NEP for BFCVT scalar Peter Maydell
@ 2025-01-24 16:28 ` Peter Maydell
  2025-01-26 12:33   ` Richard Henderson
  2025-01-24 16:28 ` [PATCH 41/76] target/arm: Handle FPCR.NEP in do_cvtf_scalar() Peter Maydell
                   ` (37 subsequent siblings)
  77 siblings, 1 reply; 167+ messages in thread
From: Peter Maydell @ 2025-01-24 16:28 UTC (permalink / raw)
  To: qemu-arm, qemu-devel

Handle FPCR.NEP for the 1-input scalar operations.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/tcg/translate-a64.c | 26 ++++++++++++++------------
 1 file changed, 14 insertions(+), 12 deletions(-)

diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index 944bdf8cafe..64994d3212f 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -8520,17 +8520,17 @@ static bool do_fp1_scalar_with_fpsttype(DisasContext *s, arg_rr_e *a,
     case MO_64:
         t64 = read_fp_dreg(s, a->rn);
         f->gen_d(t64, t64, fpst);
-        write_fp_dreg(s, a->rd, t64);
+        write_fp_dreg_merging(s, a->rd, a->rd, t64);
         break;
     case MO_32:
         t32 = read_fp_sreg(s, a->rn);
         f->gen_s(t32, t32, fpst);
-        write_fp_sreg(s, a->rd, t32);
+        write_fp_sreg_merging(s, a->rd, a->rd, t32);
         break;
     case MO_16:
         t32 = read_fp_hreg(s, a->rn);
         f->gen_h(t32, t32, fpst);
-        write_fp_sreg(s, a->rd, t32);
+        write_fp_hreg_merging(s, a->rd, a->rd, t32);
         break;
     default:
         g_assert_not_reached();
@@ -8651,7 +8651,7 @@ static bool trans_FCVT_s_ds(DisasContext *s, arg_rr *a)
         TCGv_ptr fpst = fpstatus_ptr(FPST_FPCR_A64);
 
         gen_helper_vfp_fcvtds(tcg_rd, tcg_rn, fpst);
-        write_fp_dreg(s, a->rd, tcg_rd);
+        write_fp_dreg_merging(s, a->rd, a->rd, tcg_rd);
     }
     return true;
 }
@@ -8664,8 +8664,8 @@ static bool trans_FCVT_s_hs(DisasContext *s, arg_rr *a)
         TCGv_ptr fpst = fpstatus_ptr(FPST_FPCR_A64);
 
         gen_helper_vfp_fcvt_f32_to_f16(tmp, tmp, fpst, ahp);
-        /* write_fp_sreg is OK here because top half of result is zero */
-        write_fp_sreg(s, a->rd, tmp);
+        /* write_fp_hreg_merging is OK here because top half of result is zero */
+        write_fp_hreg_merging(s, a->rd, a->rd, tmp);
     }
     return true;
 }
@@ -8678,7 +8678,7 @@ static bool trans_FCVT_s_sd(DisasContext *s, arg_rr *a)
         TCGv_ptr fpst = fpstatus_ptr(FPST_FPCR_A64);
 
         gen_helper_vfp_fcvtsd(tcg_rd, tcg_rn, fpst);
-        write_fp_sreg(s, a->rd, tcg_rd);
+        write_fp_sreg_merging(s, a->rd, a->rd, tcg_rd);
     }
     return true;
 }
@@ -8692,8 +8692,8 @@ static bool trans_FCVT_s_hd(DisasContext *s, arg_rr *a)
         TCGv_ptr fpst = fpstatus_ptr(FPST_FPCR_A64);
 
         gen_helper_vfp_fcvt_f64_to_f16(tcg_rd, tcg_rn, fpst, ahp);
-        /* write_fp_sreg is OK here because top half of tcg_rd is zero */
-        write_fp_sreg(s, a->rd, tcg_rd);
+        /* write_fp_hreg_merging is OK here because top half of tcg_rd is zero */
+        write_fp_hreg_merging(s, a->rd, a->rd, tcg_rd);
     }
     return true;
 }
@@ -8707,7 +8707,7 @@ static bool trans_FCVT_s_sh(DisasContext *s, arg_rr *a)
         TCGv_i32 tcg_ahp = get_ahp_flag();
 
         gen_helper_vfp_fcvt_f16_to_f32(tcg_rd, tcg_rn, tcg_fpst, tcg_ahp);
-        write_fp_sreg(s, a->rd, tcg_rd);
+        write_fp_sreg_merging(s, a->rd, a->rd, tcg_rd);
     }
     return true;
 }
@@ -8721,7 +8721,7 @@ static bool trans_FCVT_s_dh(DisasContext *s, arg_rr *a)
         TCGv_i32 tcg_ahp = get_ahp_flag();
 
         gen_helper_vfp_fcvt_f16_to_f64(tcg_rd, tcg_rn, tcg_fpst, tcg_ahp);
-        write_fp_dreg(s, a->rd, tcg_rd);
+        write_fp_dreg_merging(s, a->rd, a->rd, tcg_rd);
     }
     return true;
 }
@@ -8969,7 +8969,9 @@ static bool do_fcvt_f(DisasContext *s, arg_fcvt *a,
     do_fcvt_scalar(s, a->esz | (is_signed ? MO_SIGN : 0),
                    a->esz, tcg_int, a->shift, a->rn, rmode);
 
-    clear_vec(s, a->rd);
+    if (!s->fpcr_nep) {
+        clear_vec(s, a->rd);
+    }
     write_vec_element(s, tcg_int, a->rd, 0, a->esz);
     return true;
 }
-- 
2.34.1




* [PATCH 41/76] target/arm: Handle FPCR.NEP in do_cvtf_scalar()
  2025-01-24 16:27 [PATCH 00/76] target/arm: Implement FEAT_AFP and FEAT_RPRES Peter Maydell
                   ` (39 preceding siblings ...)
  2025-01-24 16:28 ` [PATCH 40/76] target/arm: Handle FPCR.NEP for 1-input scalar operations Peter Maydell
@ 2025-01-24 16:28 ` Peter Maydell
  2025-01-26 12:33   ` Richard Henderson
  2025-01-24 16:28 ` [PATCH 42/76] target/arm: Handle FPCR.NEP for scalar FABS and FNEG Peter Maydell
                   ` (36 subsequent siblings)
  77 siblings, 1 reply; 167+ messages in thread
From: Peter Maydell @ 2025-01-24 16:28 UTC (permalink / raw)
  To: qemu-arm, qemu-devel

Handle FPCR.NEP in the operations handled by do_cvtf_scalar().

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/tcg/translate-a64.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index 64994d3212f..6c20293961a 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -8744,7 +8744,7 @@ static bool do_cvtf_scalar(DisasContext *s, MemOp esz, int rd, int shift,
         } else {
             gen_helper_vfp_uqtod(tcg_double, tcg_int, tcg_shift, tcg_fpstatus);
         }
-        write_fp_dreg(s, rd, tcg_double);
+        write_fp_dreg_merging(s, rd, rd, tcg_double);
         break;
 
     case MO_32:
@@ -8754,7 +8754,7 @@ static bool do_cvtf_scalar(DisasContext *s, MemOp esz, int rd, int shift,
         } else {
             gen_helper_vfp_uqtos(tcg_single, tcg_int, tcg_shift, tcg_fpstatus);
         }
-        write_fp_sreg(s, rd, tcg_single);
+        write_fp_sreg_merging(s, rd, rd, tcg_single);
         break;
 
     case MO_16:
@@ -8764,7 +8764,7 @@ static bool do_cvtf_scalar(DisasContext *s, MemOp esz, int rd, int shift,
         } else {
             gen_helper_vfp_uqtoh(tcg_single, tcg_int, tcg_shift, tcg_fpstatus);
         }
-        write_fp_sreg(s, rd, tcg_single);
+        write_fp_hreg_merging(s, rd, rd, tcg_single);
         break;
 
     default:
-- 
2.34.1




* [PATCH 42/76] target/arm: Handle FPCR.NEP for scalar FABS and FNEG
  2025-01-24 16:27 [PATCH 00/76] target/arm: Implement FEAT_AFP and FEAT_RPRES Peter Maydell
                   ` (40 preceding siblings ...)
  2025-01-24 16:28 ` [PATCH 41/76] target/arm: Handle FPCR.NEP in do_cvtf_scalar() Peter Maydell
@ 2025-01-24 16:28 ` Peter Maydell
  2025-01-26 12:34   ` Richard Henderson
  2025-01-24 16:28 ` [PATCH 43/76] target/arm: Handle FPCR.NEP for FCVTXN (scalar) Peter Maydell
                   ` (35 subsequent siblings)
  77 siblings, 1 reply; 167+ messages in thread
From: Peter Maydell @ 2025-01-24 16:28 UTC (permalink / raw)
  To: qemu-arm, qemu-devel

Handle FPCR.NEP merging for scalar FABS and FNEG; this requires
an extra parameter to do_fp1_scalar_int(), since FMOV scalar
does not have the merging behaviour.
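
A minimal sketch of the per-instruction flag, assuming FABS and FNEG
act as pure sign-bit operations on the single-precision bit pattern
(the op_* and model_fp1_scalar_int_s() names are hypothetical, and
only the low 64 bits of the destination are modelled):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t (*Op32)(uint32_t);

uint32_t op_mov(uint32_t x) { return x; }                /* FMOV */
uint32_t op_abs(uint32_t x) { return x & 0x7fffffffu; }  /* clear sign bit */
uint32_t op_neg(uint32_t x) { return x ^ 0x80000000u; }  /* flip sign bit */

/*
 * Returns the low 64 bits of Vd after a single-precision 1-input op.
 * 'merging' is a per-instruction property (true for FABS and FNEG,
 * false for FMOV); 'nep' models the current FPCR.NEP value.
 */
uint64_t model_fp1_scalar_int_s(uint64_t vd_old_lo, uint32_t rn, Op32 op,
                                bool merging, bool nep)
{
    assert(op != NULL);

    uint32_t r = op(rn);

    if (merging && nep) {
        /* Keep bits [63:32] of the old destination. */
        return (vd_old_lo & UINT64_C(0xffffffff00000000)) | r;
    }
    return r;
}
```

Passing merging = false for the FMOV case gives a zeroing write even
when nep is true.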

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/tcg/translate-a64.c | 27 ++++++++++++++++++++-------
 1 file changed, 20 insertions(+), 7 deletions(-)

diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index 6c20293961a..7412787b6b6 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -8437,21 +8437,30 @@ typedef struct FPScalar1Int {
 } FPScalar1Int;
 
 static bool do_fp1_scalar_int(DisasContext *s, arg_rr_e *a,
-                              const FPScalar1Int *f)
+                              const FPScalar1Int *f,
+                              bool merging)
 {
     switch (a->esz) {
     case MO_64:
         if (fp_access_check(s)) {
             TCGv_i64 t = read_fp_dreg(s, a->rn);
             f->gen_d(t, t);
-            write_fp_dreg(s, a->rd, t);
+            if (merging) {
+                write_fp_dreg_merging(s, a->rd, a->rd, t);
+            } else {
+                write_fp_dreg(s, a->rd, t);
+            }
         }
         break;
     case MO_32:
         if (fp_access_check(s)) {
             TCGv_i32 t = read_fp_sreg(s, a->rn);
             f->gen_s(t, t);
-            write_fp_sreg(s, a->rd, t);
+            if (merging) {
+                write_fp_sreg_merging(s, a->rd, a->rd, t);
+            } else {
+                write_fp_sreg(s, a->rd, t);
+            }
         }
         break;
     case MO_16:
@@ -8461,7 +8470,11 @@ static bool do_fp1_scalar_int(DisasContext *s, arg_rr_e *a,
         if (fp_access_check(s)) {
             TCGv_i32 t = read_fp_hreg(s, a->rn);
             f->gen_h(t, t);
-            write_fp_sreg(s, a->rd, t);
+            if (merging) {
+                write_fp_hreg_merging(s, a->rd, a->rd, t);
+            } else {
+                write_fp_sreg(s, a->rd, t);
+            }
         }
         break;
     default:
@@ -8475,21 +8488,21 @@ static const FPScalar1Int f_scalar_fmov = {
     tcg_gen_mov_i32,
     tcg_gen_mov_i64,
 };
-TRANS(FMOV_s, do_fp1_scalar_int, a, &f_scalar_fmov)
+TRANS(FMOV_s, do_fp1_scalar_int, a, &f_scalar_fmov, false)
 
 static const FPScalar1Int f_scalar_fabs = {
     gen_vfp_absh,
     gen_vfp_abss,
     gen_vfp_absd,
 };
-TRANS(FABS_s, do_fp1_scalar_int, a, &f_scalar_fabs)
+TRANS(FABS_s, do_fp1_scalar_int, a, &f_scalar_fabs, true)
 
 static const FPScalar1Int f_scalar_fneg = {
     gen_vfp_negh,
     gen_vfp_negs,
     gen_vfp_negd,
 };
-TRANS(FNEG_s, do_fp1_scalar_int, a, &f_scalar_fneg)
+TRANS(FNEG_s, do_fp1_scalar_int, a, &f_scalar_fneg, true)
 
 typedef struct FPScalar1 {
     void (*gen_h)(TCGv_i32, TCGv_i32, TCGv_ptr);
-- 
2.34.1




* [PATCH 43/76] target/arm: Handle FPCR.NEP for FCVTXN (scalar)
  2025-01-24 16:27 [PATCH 00/76] target/arm: Implement FEAT_AFP and FEAT_RPRES Peter Maydell
                   ` (41 preceding siblings ...)
  2025-01-24 16:28 ` [PATCH 42/76] target/arm: Handle FPCR.NEP for scalar FABS and FNEG Peter Maydell
@ 2025-01-24 16:28 ` Peter Maydell
  2025-01-26 12:36   ` Richard Henderson
  2025-01-24 16:28 ` [PATCH 44/76] target/arm: Handle FPCR.NEP for FMUL, FMULX scalar by element Peter Maydell
                   ` (34 subsequent siblings)
  77 siblings, 1 reply; 167+ messages in thread
From: Peter Maydell @ 2025-01-24 16:28 UTC (permalink / raw)
  To: qemu-arm, qemu-devel

Unlike the other users of do_2misc_narrow_scalar(), FCVTXN (scalar)
is always double-to-single and must honour FPCR.NEP.  Implement this
directly in a trans function rather than using
do_2misc_narrow_scalar().

We still need gen_fcvtxn_sd() and the f_scalar_fcvtxn[] array for
the FCVTXN (vector) insn, so we move those down in the file to
where they are used.
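
For readers unfamiliar with round-to-odd, the following is a
host-side sketch of the double-to-single conversion. It is not the
QEMU helper: the function name is hypothetical, exception flags and
NaN quieting are ignored, and it assumes IEEE 754 binary32/binary64
host types without flush-to-zero. The idea is to truncate toward zero
and, if any precision was lost, force the low bit of the result to 1.

```c
#include <assert.h>
#include <float.h>
#include <stdint.h>
#include <string.h>

/* Returns the binary32 bit pattern of d converted with round-to-odd. */
uint32_t model_f64_to_f32_round_to_odd(double d)
{
    float f;
    uint32_t bits;
    double mag = d < 0 ? -d : d;

    assert(sizeof(f) == sizeof(bits));

    if (d == d && mag > FLT_MAX && mag <= DBL_MAX) {
        /* Finite overflow: largest finite float, whose low bit is 1. */
        return (d < 0 ? 0x80000000u : 0u) | 0x7f7fffffu;
    }

    f = (float)d;                /* one of the two neighbouring floats */
    memcpy(&bits, &f, sizeof(bits));

    if (d != d || (double)f == d) {
        return bits;             /* NaN, or exactly representable */
    }

    /* Inexact: if the host rounded away from zero, step back one ulp. */
    float fmag = f < 0 ? -f : f;
    if ((double)fmag > mag) {
        bits -= 1;
    }
    return bits | 1;             /* sticky low bit makes the result odd */
}
```

For example, in this model an input slightly above 1.0 yields
0x3f800001 rather than 0x3f800000, so a later rounding of the float
result cannot double-round incorrectly.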

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/tcg/translate-a64.c | 43 ++++++++++++++++++++++------------
 1 file changed, 28 insertions(+), 15 deletions(-)

diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index 7412787b6b6..6dc5648cb1b 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -9258,24 +9258,21 @@ static ArithOneOp * const f_scalar_uqxtn[] = {
 };
 TRANS(UQXTN_s, do_2misc_narrow_scalar, a, f_scalar_uqxtn)
 
-static void gen_fcvtxn_sd(TCGv_i64 d, TCGv_i64 n)
+static bool trans_FCVTXN_s(DisasContext *s, arg_rr_e *a)
 {
-    /*
-     * 64 bit to 32 bit float conversion
-     * with von Neumann rounding (round to odd)
-     */
-    TCGv_i32 tmp = tcg_temp_new_i32();
-    gen_helper_fcvtx_f64_to_f32(tmp, n, fpstatus_ptr(FPST_FPCR_A64));
-    tcg_gen_extu_i32_i64(d, tmp);
+    if (fp_access_check(s)) {
+        /*
+         * 64 bit to 32 bit float conversion
+         * with von Neumann rounding (round to odd)
+         */
+        TCGv_i64 src = read_fp_dreg(s, a->rn);
+        TCGv_i32 dst = tcg_temp_new_i32();
+        gen_helper_fcvtx_f64_to_f32(dst, src, fpstatus_ptr(FPST_FPCR_A64));
+        write_fp_sreg_merging(s, a->rd, a->rd, dst);
+    }
+    return true;
 }
 
-static ArithOneOp * const f_scalar_fcvtxn[] = {
-    NULL,
-    NULL,
-    gen_fcvtxn_sd,
-};
-TRANS(FCVTXN_s, do_2misc_narrow_scalar, a, f_scalar_fcvtxn)
-
 #undef WRAP_ENV
 
 static bool do_gvec_fn2(DisasContext *s, arg_qrr_e *a, GVecGen2Fn *fn)
@@ -9377,11 +9374,27 @@ static void gen_fcvtn_sd(TCGv_i64 d, TCGv_i64 n)
     tcg_gen_extu_i32_i64(d, tmp);
 }
 
+static void gen_fcvtxn_sd(TCGv_i64 d, TCGv_i64 n)
+{
+    /*
+     * 64 bit to 32 bit float conversion
+     * with von Neumann rounding (round to odd)
+     */
+    TCGv_i32 tmp = tcg_temp_new_i32();
+    gen_helper_fcvtx_f64_to_f32(tmp, n, fpstatus_ptr(FPST_FPCR_A64));
+    tcg_gen_extu_i32_i64(d, tmp);
+}
+
 static ArithOneOp * const f_vector_fcvtn[] = {
     NULL,
     gen_fcvtn_hs,
     gen_fcvtn_sd,
 };
+static ArithOneOp * const f_scalar_fcvtxn[] = {
+    NULL,
+    NULL,
+    gen_fcvtxn_sd,
+};
 TRANS(FCVTN_v, do_2misc_narrow_vector, a, f_vector_fcvtn)
 TRANS(FCVTXN_v, do_2misc_narrow_vector, a, f_scalar_fcvtxn)
 
-- 
2.34.1




* [PATCH 44/76] target/arm: Handle FPCR.NEP for FMUL, FMULX scalar by element
  2025-01-24 16:27 [PATCH 00/76] target/arm: Implement FEAT_AFP and FEAT_RPRES Peter Maydell
                   ` (42 preceding siblings ...)
  2025-01-24 16:28 ` [PATCH 43/76] target/arm: Handle FPCR.NEP for FCVTXN (scalar) Peter Maydell
@ 2025-01-24 16:28 ` Peter Maydell
  2025-01-26 12:36   ` Richard Henderson
  2025-01-24 16:28 ` [PATCH 45/76] target/arm: Implement FPCR.AH semantics for scalar FMIN/FMAX Peter Maydell
                   ` (33 subsequent siblings)
  77 siblings, 1 reply; 167+ messages in thread
From: Peter Maydell @ 2025-01-24 16:28 UTC (permalink / raw)
  To: qemu-arm, qemu-devel

do_fp3_scalar_idx() is used only for the FMUL and FMULX scalar by
element instructions; these both need to merge the result with the Rn
register when FPCR.NEP is set.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/tcg/translate-a64.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index 6dc5648cb1b..d3575ac1154 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -6307,7 +6307,7 @@ static bool do_fp3_scalar_idx(DisasContext *s, arg_rrx_e *a, const FPScalar *f)
 
             read_vec_element(s, t1, a->rm, a->idx, MO_64);
             f->gen_d(t0, t0, t1, fpstatus_ptr(FPST_FPCR_A64));
-            write_fp_dreg(s, a->rd, t0);
+            write_fp_dreg_merging(s, a->rd, a->rn, t0);
         }
         break;
     case MO_32:
@@ -6317,7 +6317,7 @@ static bool do_fp3_scalar_idx(DisasContext *s, arg_rrx_e *a, const FPScalar *f)
 
             read_vec_element_i32(s, t1, a->rm, a->idx, MO_32);
             f->gen_s(t0, t0, t1, fpstatus_ptr(FPST_FPCR_A64));
-            write_fp_sreg(s, a->rd, t0);
+            write_fp_sreg_merging(s, a->rd, a->rn, t0);
         }
         break;
     case MO_16:
@@ -6330,7 +6330,7 @@ static bool do_fp3_scalar_idx(DisasContext *s, arg_rrx_e *a, const FPScalar *f)
 
             read_vec_element_i32(s, t1, a->rm, a->idx, MO_16);
             f->gen_h(t0, t0, t1, fpstatus_ptr(FPST_FPCR_F16_A64));
-            write_fp_sreg(s, a->rd, t0);
+            write_fp_hreg_merging(s, a->rd, a->rn, t0);
         }
         break;
     default:
-- 
2.34.1




* [PATCH 45/76] target/arm: Implement FPCR.AH semantics for scalar FMIN/FMAX
  2025-01-24 16:27 [PATCH 00/76] target/arm: Implement FEAT_AFP and FEAT_RPRES Peter Maydell
                   ` (43 preceding siblings ...)
  2025-01-24 16:28 ` [PATCH 44/76] target/arm: Handle FPCR.NEP for FMUL, FMULX scalar by element Peter Maydell
@ 2025-01-24 16:28 ` Peter Maydell
  2025-01-26 12:43   ` Richard Henderson
  2025-01-24 16:28 ` [PATCH 46/76] target/arm: Implement FPCR.AH semantics for vector FMIN/FMAX Peter Maydell
                   ` (32 subsequent siblings)
  77 siblings, 1 reply; 167+ messages in thread
From: Peter Maydell @ 2025-01-24 16:28 UTC (permalink / raw)
  To: qemu-arm, qemu-devel

When FPCR.AH == 1, floating point FMIN and FMAX have some odd special
cases:

 * comparing two zeroes (even of different sign) or comparing a NaN
   with anything always returns the second argument (possibly
   squashed to zero)
 * denormal outputs are not squashed to zero regardless of FZ or FZ16

Implement these semantics in new helper functions and select them at
translate time if FPCR.AH is 1 for the scalar FMAX and FMIN insns.
(We will convert the other FMAX and FMIN insns in subsequent
commits.)

Note that FMINNM and FMAXNM are not affected.
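
The special cases can be modelled on host floats as below. This is a
simplified sketch with hypothetical names (model_ah_minf() and
friends): it omits the input-denormal squashing, the Invalid flag
raised for NaN inputs, and the flush-to-zero handling that the real
helpers need, and shows only which operand is returned.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Bit-pattern helpers so that +0 and -0 can be told apart. */
uint32_t f32_bits(float f)
{
    uint32_t u;
    assert(sizeof(f) == sizeof(u));
    memcpy(&u, &f, sizeof(u));
    return u;
}

float f32_from_bits(uint32_t u)
{
    float f;
    memcpy(&f, &u, sizeof(f));
    return f;
}

int f32_is_nan(float f) { return f != f; }

/* AH=1 style minimum: zero/zero and any-NaN cases return operand b. */
float model_ah_minf(float a, float b)
{
    if (a == 0.0f && b == 0.0f) {   /* true for any mix of +0 and -0 */
        return b;
    }
    if (f32_is_nan(a) || f32_is_nan(b)) {
        return b;
    }
    return a < b ? a : b;
}

/* AH=1 style maximum: same special cases, ordinary max otherwise. */
float model_ah_maxf(float a, float b)
{
    if (a == 0.0f && b == 0.0f) {
        return b;
    }
    if (f32_is_nan(a) || f32_is_nan(b)) {
        return b;
    }
    return a > b ? a : b;
}
```

Note that in this model min(-0, +0) is +0 and min(NaN, 1.0) is 1.0,
both of which differ from the usual IEEE-style minimum.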

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/tcg/helper-a64.h    |  7 +++++++
 target/arm/tcg/helper-a64.c    | 36 ++++++++++++++++++++++++++++++++++
 target/arm/tcg/translate-a64.c | 23 ++++++++++++++++++++--
 3 files changed, 64 insertions(+), 2 deletions(-)

diff --git a/target/arm/tcg/helper-a64.h b/target/arm/tcg/helper-a64.h
index bac12fbe55b..ae0424f6de9 100644
--- a/target/arm/tcg/helper-a64.h
+++ b/target/arm/tcg/helper-a64.h
@@ -67,6 +67,13 @@ DEF_HELPER_4(advsimd_muladd2h, i32, i32, i32, i32, fpst)
 DEF_HELPER_2(advsimd_rinth_exact, f16, f16, fpst)
 DEF_HELPER_2(advsimd_rinth, f16, f16, fpst)
 
+DEF_HELPER_3(vfp_ah_minh, f16, f16, f16, fpst)
+DEF_HELPER_3(vfp_ah_mins, f32, f32, f32, fpst)
+DEF_HELPER_3(vfp_ah_mind, f64, f64, f64, fpst)
+DEF_HELPER_3(vfp_ah_maxh, f16, f16, f16, fpst)
+DEF_HELPER_3(vfp_ah_maxs, f32, f32, f32, fpst)
+DEF_HELPER_3(vfp_ah_maxd, f64, f64, f64, fpst)
+
 DEF_HELPER_2(exception_return, void, env, i64)
 DEF_HELPER_FLAGS_2(dc_zva, TCG_CALL_NO_WG, void, env, i64)
 
diff --git a/target/arm/tcg/helper-a64.c b/target/arm/tcg/helper-a64.c
index 05036089dd7..406d76e1129 100644
--- a/target/arm/tcg/helper-a64.c
+++ b/target/arm/tcg/helper-a64.c
@@ -399,6 +399,42 @@ float32 HELPER(fcvtx_f64_to_f32)(float64 a, float_status *fpst)
     return r;
 }
 
+/*
+ * AH=1 min/max have some odd special cases:
+ * comparing two zeroes (even of different sign), (NaN, anything),
+ * or (anything, NaN) should return the second argument (possibly
+ * squashed to zero).
+ * Also, denormal outputs are not squashed to zero regardless of FZ or FZ16.
+ */
+#define AH_MINMAX_HELPER(NAME, CTYPE, FLOATTYPE, MINMAX)                \
+    CTYPE HELPER(NAME)(CTYPE a, CTYPE b, float_status *fpst)            \
+    {                                                                   \
+        bool save;                                                      \
+        CTYPE r;                                                        \
+        a = FLOATTYPE ## _squash_input_denormal(a, fpst);               \
+        b = FLOATTYPE ## _squash_input_denormal(b, fpst);               \
+        if (FLOATTYPE ## _is_zero(a) && FLOATTYPE ## _is_zero(b)) {     \
+            return b;                                                   \
+        }                                                               \
+        if (FLOATTYPE ## _is_any_nan(a) ||                              \
+            FLOATTYPE ## _is_any_nan(b)) {                              \
+            float_raise(float_flag_invalid, fpst);                      \
+            return b;                                                   \
+        }                                                               \
+        save = get_flush_to_zero(fpst);                                 \
+        set_flush_to_zero(false, fpst);                                 \
+        r = FLOATTYPE ## _ ## MINMAX(a, b, fpst);                       \
+        set_flush_to_zero(save, fpst);                                  \
+        return r;                                                       \
+    }
+
+AH_MINMAX_HELPER(vfp_ah_minh, dh_ctype_f16, float16, min)
+AH_MINMAX_HELPER(vfp_ah_mins, float32, float32, min)
+AH_MINMAX_HELPER(vfp_ah_mind, float64, float64, min)
+AH_MINMAX_HELPER(vfp_ah_maxh, dh_ctype_f16, float16, max)
+AH_MINMAX_HELPER(vfp_ah_maxs, float32, float32, max)
+AH_MINMAX_HELPER(vfp_ah_maxd, float64, float64, max)
+
 /* 64-bit versions of the CRC helpers. Note that although the operation
  * (and the prototypes of crc32c() and crc32() mean that only the bottom
  * 32 bits of the accumulator and result are used, we pass and return
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index d3575ac1154..a6f24ad9746 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -5152,6 +5152,15 @@ static bool do_fp3_scalar_ah(DisasContext *s, arg_rrr_e *a, const FPScalar *f,
                                        select_fpst(s, a->esz));
 }
 
+/* Some insns need to call different helpers when FPCR.AH == 1 */
+static bool do_fp3_scalar_2fn(DisasContext *s, arg_rrr_e *a,
+                              const FPScalar *fnormal,
+                              const FPScalar *fah,
+                              int mergereg)
+{
+    return do_fp3_scalar(s, a, s->fpcr_ah ? fah : fnormal, mergereg);
+}
+
 static const FPScalar f_scalar_fadd = {
     gen_helper_vfp_addh,
     gen_helper_vfp_adds,
@@ -5185,14 +5194,24 @@ static const FPScalar f_scalar_fmax = {
     gen_helper_vfp_maxs,
     gen_helper_vfp_maxd,
 };
-TRANS(FMAX_s, do_fp3_scalar, a, &f_scalar_fmax, a->rn)
+static const FPScalar f_scalar_fmax_ah = {
+    gen_helper_vfp_ah_maxh,
+    gen_helper_vfp_ah_maxs,
+    gen_helper_vfp_ah_maxd,
+};
+TRANS(FMAX_s, do_fp3_scalar_2fn, a, &f_scalar_fmax, &f_scalar_fmax_ah, a->rn)
 
 static const FPScalar f_scalar_fmin = {
     gen_helper_vfp_minh,
     gen_helper_vfp_mins,
     gen_helper_vfp_mind,
 };
-TRANS(FMIN_s, do_fp3_scalar, a, &f_scalar_fmin, a->rn)
+static const FPScalar f_scalar_fmin_ah = {
+    gen_helper_vfp_ah_minh,
+    gen_helper_vfp_ah_mins,
+    gen_helper_vfp_ah_mind,
+};
+TRANS(FMIN_s, do_fp3_scalar_2fn, a, &f_scalar_fmin, &f_scalar_fmin_ah, a->rn)
 
 static const FPScalar f_scalar_fmaxnm = {
     gen_helper_vfp_maxnumh,
-- 
2.34.1




* [PATCH 46/76] target/arm: Implement FPCR.AH semantics for vector FMIN/FMAX
  2025-01-24 16:27 [PATCH 00/76] target/arm: Implement FEAT_AFP and FEAT_RPRES Peter Maydell
                   ` (44 preceding siblings ...)
  2025-01-24 16:28 ` [PATCH 45/76] target/arm: Implement FPCR.AH semantics for scalar FMIN/FMAX Peter Maydell
@ 2025-01-24 16:28 ` Peter Maydell
  2025-01-26 12:45   ` Richard Henderson
  2025-01-24 16:28 ` [PATCH 47/76] target/arm: Implement FPCR.AH semantics for FMAXV and FMINV Peter Maydell
                   ` (31 subsequent siblings)
  77 siblings, 1 reply; 167+ messages in thread
From: Peter Maydell @ 2025-01-24 16:28 UTC (permalink / raw)
  To: qemu-arm, qemu-devel

Implement the FPCR.AH == 1 semantics for vector FMIN/FMAX, by
creating new _ah_ versions of the gvec helpers which invoke the
scalar vfp_ah_min* and vfp_ah_max* helpers on each element.
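
The per-element expansion amounts to a loop like the one below. This
is a simplified model with hypothetical names; the real gvec helpers
take untyped pointers plus a descriptor encoding the vector size, and
a float_status pointer, none of which are modelled here.

```c
#include <assert.h>
#include <stddef.h>

typedef float (*ScalarOp)(float, float);

/* Simplified scalar AH=1 maximum (no denormal or flag handling). */
float model_ah_maxf(float a, float b)
{
    if ((a == 0.0f && b == 0.0f) || a != a || b != b) {
        return b;
    }
    return a > b ? a : b;
}

/* Apply a two-operand scalar op to each element of n and m. */
void model_gvec_3op_s(float *d, const float *n, const float *m,
                      size_t elems, ScalarOp op)
{
    assert(op != NULL);

    for (size_t i = 0; i < elems; i++) {
        d[i] = op(n[i], m[i]);
    }
}
```

Because the vector form reuses the scalar function unchanged, each
element gets the same zero and NaN special cases as the scalar insn.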

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/tcg/helper-sve.h    | 14 ++++++++++++++
 target/arm/tcg/translate-a64.c | 21 +++++++++++++++++++--
 target/arm/tcg/vec_helper.c    |  8 ++++++++
 3 files changed, 41 insertions(+), 2 deletions(-)

diff --git a/target/arm/tcg/helper-sve.h b/target/arm/tcg/helper-sve.h
index fea43b319c3..f1b4606f763 100644
--- a/target/arm/tcg/helper-sve.h
+++ b/target/arm/tcg/helper-sve.h
@@ -972,6 +972,20 @@ DEF_HELPER_FLAGS_5(gvec_rsqrts_s, TCG_CALL_NO_RWG,
 DEF_HELPER_FLAGS_5(gvec_rsqrts_d, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, fpst, i32)
 
+DEF_HELPER_FLAGS_5(gvec_ah_fmax_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, fpst, i32)
+DEF_HELPER_FLAGS_5(gvec_ah_fmax_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, fpst, i32)
+DEF_HELPER_FLAGS_5(gvec_ah_fmax_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, fpst, i32)
+
+DEF_HELPER_FLAGS_5(gvec_ah_fmin_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, fpst, i32)
+DEF_HELPER_FLAGS_5(gvec_ah_fmin_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, fpst, i32)
+DEF_HELPER_FLAGS_5(gvec_ah_fmin_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, fpst, i32)
+
 DEF_HELPER_FLAGS_4(sve_faddv_h, TCG_CALL_NO_RWG,
                    i64, ptr, ptr, fpst, i32)
 DEF_HELPER_FLAGS_4(sve_faddv_s, TCG_CALL_NO_RWG,
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index a6f24ad9746..330336f0828 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -5607,6 +5607,13 @@ static bool do_fp3_vector(DisasContext *s, arg_qrrr_e *a, int data,
                                        FPST_FPCR_F16_A64 :FPST_FPCR_A64);
 }
 
+static bool do_fp3_vector_2fn(DisasContext *s, arg_qrrr_e *a, int data,
+                              gen_helper_gvec_3_ptr * const fnormal[3],
+                              gen_helper_gvec_3_ptr * const fah[3])
+{
+    return do_fp3_vector(s, a, data, s->fpcr_ah ? fah : fnormal);
+}
+
 static bool do_fp3_vector_ah(DisasContext *s, arg_qrrr_e *a, int data,
                              gen_helper_gvec_3_ptr * const f[3])
 {
@@ -5647,14 +5654,24 @@ static gen_helper_gvec_3_ptr * const f_vector_fmax[3] = {
     gen_helper_gvec_fmax_s,
     gen_helper_gvec_fmax_d,
 };
-TRANS(FMAX_v, do_fp3_vector, a, 0, f_vector_fmax)
+static gen_helper_gvec_3_ptr * const f_vector_fmax_ah[3] = {
+    gen_helper_gvec_ah_fmax_h,
+    gen_helper_gvec_ah_fmax_s,
+    gen_helper_gvec_ah_fmax_d,
+};
+TRANS(FMAX_v, do_fp3_vector_2fn, a, 0, f_vector_fmax, f_vector_fmax_ah)
 
 static gen_helper_gvec_3_ptr * const f_vector_fmin[3] = {
     gen_helper_gvec_fmin_h,
     gen_helper_gvec_fmin_s,
     gen_helper_gvec_fmin_d,
 };
-TRANS(FMIN_v, do_fp3_vector, a, 0, f_vector_fmin)
+static gen_helper_gvec_3_ptr * const f_vector_fmin_ah[3] = {
+    gen_helper_gvec_ah_fmin_h,
+    gen_helper_gvec_ah_fmin_s,
+    gen_helper_gvec_ah_fmin_d,
+};
+TRANS(FMIN_v, do_fp3_vector_2fn, a, 0, f_vector_fmin, f_vector_fmin_ah)
 
 static gen_helper_gvec_3_ptr * const f_vector_fmaxnm[3] = {
     gen_helper_gvec_fmaxnum_h,
diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
index 3fbca8bc8bf..c7af9a04a27 100644
--- a/target/arm/tcg/vec_helper.c
+++ b/target/arm/tcg/vec_helper.c
@@ -1448,6 +1448,14 @@ DO_3OP(gvec_rsqrts_h, helper_rsqrtsf_f16, float16)
 DO_3OP(gvec_rsqrts_s, helper_rsqrtsf_f32, float32)
 DO_3OP(gvec_rsqrts_d, helper_rsqrtsf_f64, float64)
 
+DO_3OP(gvec_ah_fmax_h, helper_vfp_ah_maxh, float16)
+DO_3OP(gvec_ah_fmax_s, helper_vfp_ah_maxs, float32)
+DO_3OP(gvec_ah_fmax_d, helper_vfp_ah_maxd, float64)
+
+DO_3OP(gvec_ah_fmin_h, helper_vfp_ah_minh, float16)
+DO_3OP(gvec_ah_fmin_s, helper_vfp_ah_mins, float32)
+DO_3OP(gvec_ah_fmin_d, helper_vfp_ah_mind, float64)
+
 #endif
 #undef DO_3OP
 
-- 
2.34.1




* [PATCH 47/76] target/arm: Implement FPCR.AH semantics for FMAXV and FMINV
  2025-01-24 16:27 [PATCH 00/76] target/arm: Implement FEAT_AFP and FEAT_RPRES Peter Maydell
                   ` (45 preceding siblings ...)
  2025-01-24 16:28 ` [PATCH 46/76] target/arm: Implement FPCR.AH semantics for vector FMIN/FMAX Peter Maydell
@ 2025-01-24 16:28 ` Peter Maydell
  2025-01-26 12:47   ` Richard Henderson
  2025-01-24 16:28 ` [PATCH 48/76] target/arm: Implement FPCR.AH semantics for FMINP and FMAXP Peter Maydell
                   ` (30 subsequent siblings)
  77 siblings, 1 reply; 167+ messages in thread
From: Peter Maydell @ 2025-01-24 16:28 UTC (permalink / raw)
  To: qemu-arm, qemu-devel

Implement the FPCR.AH semantics for FMAXV and FMINV.  These are the
"recursively reduce all lanes of a vector to a scalar result" insns;
we just need to use the _ah_ helper for the reduction step when
FPCR.AH == 1.
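
The recursive reduction described above can be sketched in plain C as follows. This is only an illustration under stated assumptions: `reduce_lanes`, `demo_fminv`, `min_normal` and `min_ah` are hypothetical names, the lane count is assumed to be a power of two, NaN handling is left out, and the AH rule used (two zeros yield the second operand) is assumed for the sake of the example.

```c
#include <math.h>      /* signbit */
#include <stdbool.h>
#include <stddef.h>

typedef float (*reduce_fn)(float, float);

/* Hypothetical default minimum: of two zeros, -0 is treated as the smaller. */
static float min_normal(float a, float b)
{
    if (a == 0.0f && b == 0.0f) {
        return signbit(a) ? a : b;
    }
    return a < b ? a : b;
}

/* Assumed AH-style minimum: two zeros yield the second operand. */
static float min_ah(float a, float b)
{
    if (a == 0.0f && b == 0.0f) {
        return b;
    }
    return a < b ? a : b;
}

/* Reduce the low half and the high half, then combine the two partial results. */
static float reduce_lanes(const float *v, size_t elts, reduce_fn fn)
{
    if (elts == 1) {
        return v[0];
    }
    size_t half = elts / 2;
    float lo = reduce_lanes(v, half, fn);
    float hi = reduce_lanes(v + half, half, fn);
    return fn(lo, hi);
}

/* Only the function used for the combining step depends on the AH flag. */
static float demo_fminv(const float *v, size_t elts, bool fpcr_ah)
{
    return reduce_lanes(v, elts, fpcr_ah ? min_ah : min_normal);
}
```

Because the combining function is order-sensitive for zeros in this sketch, the shape of the reduction tree affects the sign of a zero result, which is why only the reduction step needs to change.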

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/tcg/translate-a64.c | 28 ++++++++++++++++++----------
 1 file changed, 18 insertions(+), 10 deletions(-)

diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index 330336f0828..c07e22bad31 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -7029,27 +7029,35 @@ static TCGv_i32 do_reduction_op(DisasContext *s, int rn, MemOp esz,
 }
 
 static bool do_fp_reduction(DisasContext *s, arg_qrr_e *a,
-                              NeonGenTwoSingleOpFn *fn)
+                            NeonGenTwoSingleOpFn *fnormal,
+                            NeonGenTwoSingleOpFn *fah)
 {
     if (fp_access_check(s)) {
         MemOp esz = a->esz;
         int elts = (a->q ? 16 : 8) >> esz;
         TCGv_ptr fpst = fpstatus_ptr(esz == MO_16 ? FPST_FPCR_F16_A64 : FPST_FPCR_A64);
-        TCGv_i32 res = do_reduction_op(s, a->rn, esz, 0, elts, fpst, fn);
+        TCGv_i32 res = do_reduction_op(s, a->rn, esz, 0, elts, fpst,
+                                       s->fpcr_ah ? fah : fnormal);
         write_fp_sreg(s, a->rd, res);
     }
     return true;
 }
 
-TRANS_FEAT(FMAXNMV_h, aa64_fp16, do_fp_reduction, a, gen_helper_vfp_maxnumh)
-TRANS_FEAT(FMINNMV_h, aa64_fp16, do_fp_reduction, a, gen_helper_vfp_minnumh)
-TRANS_FEAT(FMAXV_h, aa64_fp16, do_fp_reduction, a, gen_helper_vfp_maxh)
-TRANS_FEAT(FMINV_h, aa64_fp16, do_fp_reduction, a, gen_helper_vfp_minh)
+TRANS_FEAT(FMAXNMV_h, aa64_fp16, do_fp_reduction, a,
+           gen_helper_vfp_maxnumh, gen_helper_vfp_maxnumh)
+TRANS_FEAT(FMINNMV_h, aa64_fp16, do_fp_reduction, a,
+           gen_helper_vfp_minnumh, gen_helper_vfp_minnumh)
+TRANS_FEAT(FMAXV_h, aa64_fp16, do_fp_reduction, a,
+           gen_helper_vfp_maxh, gen_helper_vfp_ah_maxh)
+TRANS_FEAT(FMINV_h, aa64_fp16, do_fp_reduction, a,
+           gen_helper_vfp_minh, gen_helper_vfp_ah_minh)
 
-TRANS(FMAXNMV_s, do_fp_reduction, a, gen_helper_vfp_maxnums)
-TRANS(FMINNMV_s, do_fp_reduction, a, gen_helper_vfp_minnums)
-TRANS(FMAXV_s, do_fp_reduction, a, gen_helper_vfp_maxs)
-TRANS(FMINV_s, do_fp_reduction, a, gen_helper_vfp_mins)
+TRANS(FMAXNMV_s, do_fp_reduction, a,
+      gen_helper_vfp_maxnums, gen_helper_vfp_maxnums)
+TRANS(FMINNMV_s, do_fp_reduction, a,
+      gen_helper_vfp_minnums, gen_helper_vfp_minnums)
+TRANS(FMAXV_s, do_fp_reduction, a, gen_helper_vfp_maxs, gen_helper_vfp_ah_maxs)
+TRANS(FMINV_s, do_fp_reduction, a, gen_helper_vfp_mins, gen_helper_vfp_ah_mins)
 
 /*
  * Floating-point Immediate
-- 
2.34.1




* [PATCH 48/76] target/arm: Implement FPCR.AH semantics for FMINP and FMAXP
  2025-01-24 16:27 [PATCH 00/76] target/arm: Implement FEAT_AFP and FEAT_RPRES Peter Maydell
                   ` (46 preceding siblings ...)
  2025-01-24 16:28 ` [PATCH 47/76] target/arm: Implement FPCR.AH semantics for FMAXV and FMINV Peter Maydell
@ 2025-01-24 16:28 ` Peter Maydell
  2025-01-26 12:49   ` Richard Henderson
  2025-01-24 16:28 ` [PATCH 49/76] target/arm: Implement FPCR.AH semantics for SVE FMAXV and FMINV Peter Maydell
                   ` (29 subsequent siblings)
  77 siblings, 1 reply; 167+ messages in thread
From: Peter Maydell @ 2025-01-24 16:28 UTC (permalink / raw)
  To: qemu-arm, qemu-devel

Implement the FPCR.AH semantics for the pairwise floating
point minimum/maximum insns FMINP and FMAXP.
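
As a reminder of what "pairwise" means for the vector forms, the sketch below combines adjacent element pairs, with results from the first source filling the low half of the destination and results from the second source filling the high half. The names `vec_pairwise`, `plain_max` and `MAX_ELTS` are hypothetical, and `plain_max` is a simple stand-in for whichever scalar helper (normal or AH) would be plugged in.

```c
#include <stddef.h>
#include <string.h>

#define MAX_ELTS 4   /* e.g. four float32 lanes in a 128-bit vector */

typedef float (*pair_fn)(float, float);

/* Simple stand-in for the scalar max helper; ignores NaN and signed-zero rules. */
static float plain_max(float a, float b)
{
    return a > b ? a : b;
}

/*
 * Pairwise operation on two source vectors of 'elts' lanes
 * (elts must be even and no larger than MAX_ELTS).
 * A temporary buffer is used so that d may alias n or m.
 */
static void vec_pairwise(float *d, const float *n, const float *m,
                         size_t elts, pair_fn op)
{
    float tmp[MAX_ELTS];
    size_t half = elts / 2;

    for (size_t i = 0; i < half; i++) {
        tmp[i] = op(n[2 * i], n[2 * i + 1]);
        tmp[half + i] = op(m[2 * i], m[2 * i + 1]);
    }
    memcpy(d, tmp, elts * sizeof(float));
}
```

Since the structure of the operation is independent of the scalar function, supporting FPCR.AH only requires passing a different `op`.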

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/tcg/helper-sve.h    | 14 ++++++++++++++
 target/arm/tcg/translate-a64.c | 25 +++++++++++++++++++++----
 target/arm/tcg/vec_helper.c    | 10 ++++++++++
 3 files changed, 45 insertions(+), 4 deletions(-)

diff --git a/target/arm/tcg/helper-sve.h b/target/arm/tcg/helper-sve.h
index f1b4606f763..8349752e99b 100644
--- a/target/arm/tcg/helper-sve.h
+++ b/target/arm/tcg/helper-sve.h
@@ -986,6 +986,20 @@ DEF_HELPER_FLAGS_5(gvec_ah_fmin_s, TCG_CALL_NO_RWG,
 DEF_HELPER_FLAGS_5(gvec_ah_fmin_d, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, fpst, i32)
 
+DEF_HELPER_FLAGS_5(gvec_ah_fmaxp_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, fpst, i32)
+DEF_HELPER_FLAGS_5(gvec_ah_fmaxp_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, fpst, i32)
+DEF_HELPER_FLAGS_5(gvec_ah_fmaxp_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, fpst, i32)
+
+DEF_HELPER_FLAGS_5(gvec_ah_fminp_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, fpst, i32)
+DEF_HELPER_FLAGS_5(gvec_ah_fminp_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, fpst, i32)
+DEF_HELPER_FLAGS_5(gvec_ah_fminp_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, fpst, i32)
+
 DEF_HELPER_FLAGS_4(sve_faddv_h, TCG_CALL_NO_RWG,
                    i64, ptr, ptr, fpst, i32)
 DEF_HELPER_FLAGS_4(sve_faddv_s, TCG_CALL_NO_RWG,
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index c07e22bad31..9d164b80c22 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -5776,14 +5776,24 @@ static gen_helper_gvec_3_ptr * const f_vector_fmaxp[3] = {
     gen_helper_gvec_fmaxp_s,
     gen_helper_gvec_fmaxp_d,
 };
-TRANS(FMAXP_v, do_fp3_vector, a, 0, f_vector_fmaxp)
+static gen_helper_gvec_3_ptr * const f_vector_ah_fmaxp[3] = {
+    gen_helper_gvec_ah_fmaxp_h,
+    gen_helper_gvec_ah_fmaxp_s,
+    gen_helper_gvec_ah_fmaxp_d,
+};
+TRANS(FMAXP_v, do_fp3_vector_2fn, a, 0, f_vector_fmaxp, f_vector_ah_fmaxp)
 
 static gen_helper_gvec_3_ptr * const f_vector_fminp[3] = {
     gen_helper_gvec_fminp_h,
     gen_helper_gvec_fminp_s,
     gen_helper_gvec_fminp_d,
 };
-TRANS(FMINP_v, do_fp3_vector, a, 0, f_vector_fminp)
+static gen_helper_gvec_3_ptr * const f_vector_ah_fminp[3] = {
+    gen_helper_gvec_ah_fminp_h,
+    gen_helper_gvec_ah_fminp_s,
+    gen_helper_gvec_ah_fminp_d,
+};
+TRANS(FMINP_v, do_fp3_vector_2fn, a, 0, f_vector_fminp, f_vector_ah_fminp)
 
 static gen_helper_gvec_3_ptr * const f_vector_fmaxnmp[3] = {
     gen_helper_gvec_fmaxnump_h,
@@ -6775,9 +6785,16 @@ static bool do_fp3_scalar_pair(DisasContext *s, arg_rr_e *a, const FPScalar *f)
     return true;
 }
 
+static bool do_fp3_scalar_pair_2fn(DisasContext *s, arg_rr_e *a,
+                                   const FPScalar *fnormal,
+                                   const FPScalar *fah)
+{
+    return do_fp3_scalar_pair(s, a, s->fpcr_ah ? fah : fnormal);
+}
+
 TRANS(FADDP_s, do_fp3_scalar_pair, a, &f_scalar_fadd)
-TRANS(FMAXP_s, do_fp3_scalar_pair, a, &f_scalar_fmax)
-TRANS(FMINP_s, do_fp3_scalar_pair, a, &f_scalar_fmin)
+TRANS(FMAXP_s, do_fp3_scalar_pair_2fn, a, &f_scalar_fmax, &f_scalar_fmax_ah)
+TRANS(FMINP_s, do_fp3_scalar_pair_2fn, a, &f_scalar_fmin, &f_scalar_fmin_ah)
 TRANS(FMAXNMP_s, do_fp3_scalar_pair, a, &f_scalar_fmaxnm)
 TRANS(FMINNMP_s, do_fp3_scalar_pair, a, &f_scalar_fminnm)
 
diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
index c7af9a04a27..d3f2eaa807e 100644
--- a/target/arm/tcg/vec_helper.c
+++ b/target/arm/tcg/vec_helper.c
@@ -2444,6 +2444,16 @@ DO_3OP_PAIR(gvec_fminnump_h, float16_minnum, float16, H2)
 DO_3OP_PAIR(gvec_fminnump_s, float32_minnum, float32, H4)
 DO_3OP_PAIR(gvec_fminnump_d, float64_minnum, float64, )
 
+#ifdef TARGET_AARCH64
+DO_3OP_PAIR(gvec_ah_fmaxp_h, helper_vfp_ah_maxh, float16, H2)
+DO_3OP_PAIR(gvec_ah_fmaxp_s, helper_vfp_ah_maxs, float32, H4)
+DO_3OP_PAIR(gvec_ah_fmaxp_d, helper_vfp_ah_maxd, float64, )
+
+DO_3OP_PAIR(gvec_ah_fminp_h, helper_vfp_ah_minh, float16, H2)
+DO_3OP_PAIR(gvec_ah_fminp_s, helper_vfp_ah_mins, float32, H4)
+DO_3OP_PAIR(gvec_ah_fminp_d, helper_vfp_ah_mind, float64, )
+#endif
+
 #undef DO_3OP_PAIR
 
 #define DO_3OP_PAIR(NAME, FUNC, TYPE, H) \
-- 
2.34.1




* [PATCH 49/76] target/arm: Implement FPCR.AH semantics for SVE FMAXV and FMINV
  2025-01-24 16:27 [PATCH 00/76] target/arm: Implement FEAT_AFP and FEAT_RPRES Peter Maydell
                   ` (47 preceding siblings ...)
  2025-01-24 16:28 ` [PATCH 48/76] target/arm: Implement FPCR.AH semantics for FMINP and FMAXP Peter Maydell
@ 2025-01-24 16:28 ` Peter Maydell
  2025-01-26 12:51   ` Richard Henderson
  2025-01-24 16:28 ` [PATCH 50/76] target/arm: Implement FPCR.AH semantics for SVE FMIN/FMAX immediate Peter Maydell
                   ` (28 subsequent siblings)
  77 siblings, 1 reply; 167+ messages in thread
From: Peter Maydell @ 2025-01-24 16:28 UTC (permalink / raw)
  To: qemu-arm, qemu-devel

Implement the FPCR.AH semantics for the SVE FMAXV and FMINV
vector-reduction-to-scalar max/min operations.
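
A rough sketch of the shape of such a predicated reduction, loosely modelled on the idea of a generator macro that takes the full name of the combining function plus an identity value: inactive lanes and padding are replaced by the identity, then the buffer is reduced as a binary tree. Everything here (`DEFINE_REDUCE`, `MAX_LANES`, `demo_min`, `demo_max`, `demo_fminv`, `demo_fmaxv`) is hypothetical and simplified to `float`; it is not the actual sve_helper.c code.

```c
#include <math.h>      /* INFINITY */
#include <stdbool.h>
#include <stddef.h>

#define MAX_LANES 8    /* hypothetical fixed vector length, a power of two */

/* Simple stand-ins for the scalar min/max helpers. */
static float demo_min(float a, float b) { return a < b ? a : b; }
static float demo_max(float a, float b) { return a > b ? a : b; }

/*
 * FUNC is the complete function name, so any helper with a matching
 * signature can be plugged in without relying on token pasting.
 */
#define DEFINE_REDUCE(NAME, FUNC, IDENT)                              \
static float NAME##_tree(const float *data, size_t n)                \
{                                                                     \
    if (n == 1) {                                                     \
        return data[0];                                               \
    }                                                                 \
    size_t half = n / 2;                                              \
    float lo = NAME##_tree(data, half);                               \
    float hi = NAME##_tree(data + half, half);                        \
    return FUNC(lo, hi);                                              \
}                                                                     \
static float NAME(const float *v, const bool *pred, size_t lanes)    \
{                                                                     \
    float data[MAX_LANES];                                            \
    for (size_t i = 0; i < MAX_LANES; i++) {                          \
        data[i] = (i < lanes && pred[i]) ? v[i] : (IDENT);            \
    }                                                                 \
    return NAME##_tree(data, MAX_LANES);                              \
}

DEFINE_REDUCE(demo_fminv, demo_min, INFINITY)
DEFINE_REDUCE(demo_fmaxv, demo_max, -INFINITY)
```

Passing the whole function name is what lets a helper whose name does not follow the `TYPE_op` pattern be used as the combining step.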

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/tcg/helper-sve.h    | 14 +++++++++++
 target/arm/tcg/sve_helper.c    | 43 +++++++++++++++++++++-------------
 target/arm/tcg/translate-sve.c | 16 +++++++++++--
 3 files changed, 55 insertions(+), 18 deletions(-)

diff --git a/target/arm/tcg/helper-sve.h b/target/arm/tcg/helper-sve.h
index 8349752e99b..7ca95b8fa94 100644
--- a/target/arm/tcg/helper-sve.h
+++ b/target/arm/tcg/helper-sve.h
@@ -1035,6 +1035,20 @@ DEF_HELPER_FLAGS_4(sve_fminv_s, TCG_CALL_NO_RWG,
 DEF_HELPER_FLAGS_4(sve_fminv_d, TCG_CALL_NO_RWG,
                    i64, ptr, ptr, fpst, i32)
 
+DEF_HELPER_FLAGS_4(sve_ah_fmaxv_h, TCG_CALL_NO_RWG,
+                   i64, ptr, ptr, fpst, i32)
+DEF_HELPER_FLAGS_4(sve_ah_fmaxv_s, TCG_CALL_NO_RWG,
+                   i64, ptr, ptr, fpst, i32)
+DEF_HELPER_FLAGS_4(sve_ah_fmaxv_d, TCG_CALL_NO_RWG,
+                   i64, ptr, ptr, fpst, i32)
+
+DEF_HELPER_FLAGS_4(sve_ah_fminv_h, TCG_CALL_NO_RWG,
+                   i64, ptr, ptr, fpst, i32)
+DEF_HELPER_FLAGS_4(sve_ah_fminv_s, TCG_CALL_NO_RWG,
+                   i64, ptr, ptr, fpst, i32)
+DEF_HELPER_FLAGS_4(sve_ah_fminv_d, TCG_CALL_NO_RWG,
+                   i64, ptr, ptr, fpst, i32)
+
 DEF_HELPER_FLAGS_5(sve_fadda_h, TCG_CALL_NO_RWG,
                    i64, i64, ptr, ptr, fpst, i32)
 DEF_HELPER_FLAGS_5(sve_fadda_s, TCG_CALL_NO_RWG,
diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c
index 9837c5bc7ac..3631d85f23a 100644
--- a/target/arm/tcg/sve_helper.c
+++ b/target/arm/tcg/sve_helper.c
@@ -4190,7 +4190,7 @@ static TYPE NAME##_reduce(TYPE *data, float_status *status, uintptr_t n) \
         uintptr_t half = n / 2;                                       \
         TYPE lo = NAME##_reduce(data, status, half);                  \
         TYPE hi = NAME##_reduce(data + half, status, half);           \
-        return TYPE##_##FUNC(lo, hi, status);                         \
+        return FUNC(lo, hi, status);                                  \
     }                                                                 \
 }                                                                     \
 uint64_t HELPER(NAME)(void *vn, void *vg, float_status *s, uint32_t desc) \
@@ -4211,26 +4211,37 @@ uint64_t HELPER(NAME)(void *vn, void *vg, float_status *s, uint32_t desc) \
     return NAME##_reduce(data, s, maxsz / sizeof(TYPE));              \
 }
 
-DO_REDUCE(sve_faddv_h, float16, H1_2, add, float16_zero)
-DO_REDUCE(sve_faddv_s, float32, H1_4, add, float32_zero)
-DO_REDUCE(sve_faddv_d, float64, H1_8, add, float64_zero)
+DO_REDUCE(sve_faddv_h, float16, H1_2, float16_add, float16_zero)
+DO_REDUCE(sve_faddv_s, float32, H1_4, float32_add, float32_zero)
+DO_REDUCE(sve_faddv_d, float64, H1_8, float64_add, float64_zero)
 
 /* Identity is floatN_default_nan, without the function call.  */
-DO_REDUCE(sve_fminnmv_h, float16, H1_2, minnum, 0x7E00)
-DO_REDUCE(sve_fminnmv_s, float32, H1_4, minnum, 0x7FC00000)
-DO_REDUCE(sve_fminnmv_d, float64, H1_8, minnum, 0x7FF8000000000000ULL)
+DO_REDUCE(sve_fminnmv_h, float16, H1_2, float16_minnum, 0x7E00)
+DO_REDUCE(sve_fminnmv_s, float32, H1_4, float32_minnum, 0x7FC00000)
+DO_REDUCE(sve_fminnmv_d, float64, H1_8, float64_minnum, 0x7FF8000000000000ULL)
 
-DO_REDUCE(sve_fmaxnmv_h, float16, H1_2, maxnum, 0x7E00)
-DO_REDUCE(sve_fmaxnmv_s, float32, H1_4, maxnum, 0x7FC00000)
-DO_REDUCE(sve_fmaxnmv_d, float64, H1_8, maxnum, 0x7FF8000000000000ULL)
+DO_REDUCE(sve_fmaxnmv_h, float16, H1_2, float16_maxnum, 0x7E00)
+DO_REDUCE(sve_fmaxnmv_s, float32, H1_4, float32_maxnum, 0x7FC00000)
+DO_REDUCE(sve_fmaxnmv_d, float64, H1_8, float64_maxnum, 0x7FF8000000000000ULL)
 
-DO_REDUCE(sve_fminv_h, float16, H1_2, min, float16_infinity)
-DO_REDUCE(sve_fminv_s, float32, H1_4, min, float32_infinity)
-DO_REDUCE(sve_fminv_d, float64, H1_8, min, float64_infinity)
+DO_REDUCE(sve_fminv_h, float16, H1_2, float16_min, float16_infinity)
+DO_REDUCE(sve_fminv_s, float32, H1_4, float32_min, float32_infinity)
+DO_REDUCE(sve_fminv_d, float64, H1_8, float64_min, float64_infinity)
 
-DO_REDUCE(sve_fmaxv_h, float16, H1_2, max, float16_chs(float16_infinity))
-DO_REDUCE(sve_fmaxv_s, float32, H1_4, max, float32_chs(float32_infinity))
-DO_REDUCE(sve_fmaxv_d, float64, H1_8, max, float64_chs(float64_infinity))
+DO_REDUCE(sve_fmaxv_h, float16, H1_2, float16_max, float16_chs(float16_infinity))
+DO_REDUCE(sve_fmaxv_s, float32, H1_4, float32_max, float32_chs(float32_infinity))
+DO_REDUCE(sve_fmaxv_d, float64, H1_8, float64_max, float64_chs(float64_infinity))
+
+DO_REDUCE(sve_ah_fminv_h, float16, H1_2, helper_vfp_ah_minh, float16_infinity)
+DO_REDUCE(sve_ah_fminv_s, float32, H1_4, helper_vfp_ah_mins, float32_infinity)
+DO_REDUCE(sve_ah_fminv_d, float64, H1_8, helper_vfp_ah_mind, float64_infinity)
+
+DO_REDUCE(sve_ah_fmaxv_h, float16, H1_2, helper_vfp_ah_maxh,
+          float16_chs(float16_infinity))
+DO_REDUCE(sve_ah_fmaxv_s, float32, H1_4, helper_vfp_ah_maxs,
+          float32_chs(float32_infinity))
+DO_REDUCE(sve_ah_fmaxv_d, float64, H1_8, helper_vfp_ah_maxd,
+          float64_chs(float64_infinity))
 
 #undef DO_REDUCE
 
diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
index ad415c43565..effa23cefd7 100644
--- a/target/arm/tcg/translate-sve.c
+++ b/target/arm/tcg/translate-sve.c
@@ -3588,11 +3588,23 @@ static bool do_reduce(DisasContext *s, arg_rpr_esz *a,
     };                                                                   \
     TRANS_FEAT(NAME, aa64_sve, do_reduce, a, name##_fns[a->esz])
 
+#define DO_VPZ_AH(NAME, name)                                            \
+    static gen_helper_fp_reduce * const name##_fns[4] = {                \
+        NULL,                      gen_helper_sve_##name##_h,            \
+        gen_helper_sve_##name##_s, gen_helper_sve_##name##_d,            \
+    };                                                                   \
+    static gen_helper_fp_reduce * const name##_ah_fns[4] = {             \
+        NULL,                      gen_helper_sve_ah_##name##_h,         \
+        gen_helper_sve_ah_##name##_s, gen_helper_sve_ah_##name##_d,      \
+    };                                                                   \
+    TRANS_FEAT(NAME, aa64_sve, do_reduce, a,                             \
+               s->fpcr_ah ? name##_ah_fns[a->esz] : name##_fns[a->esz])
+
 DO_VPZ(FADDV, faddv)
 DO_VPZ(FMINNMV, fminnmv)
 DO_VPZ(FMAXNMV, fmaxnmv)
-DO_VPZ(FMINV, fminv)
-DO_VPZ(FMAXV, fmaxv)
+DO_VPZ_AH(FMINV, fminv)
+DO_VPZ_AH(FMAXV, fmaxv)
 
 #undef DO_VPZ
 
-- 
2.34.1




* [PATCH 50/76] target/arm: Implement FPCR.AH semantics for SVE FMIN/FMAX immediate
  2025-01-24 16:27 [PATCH 00/76] target/arm: Implement FEAT_AFP and FEAT_RPRES Peter Maydell
                   ` (48 preceding siblings ...)
  2025-01-24 16:28 ` [PATCH 49/76] target/arm: Implement FPCR.AH semantics for SVE FMAXV and FMINV Peter Maydell
@ 2025-01-24 16:28 ` Peter Maydell
  2025-01-26 12:54   ` Richard Henderson
  2025-01-24 16:28 ` [PATCH 51/76] target/arm: Implement FPCR.AH semantics for SVE FMIN/FMAX vector Peter Maydell
                   ` (27 subsequent siblings)
  77 siblings, 1 reply; 167+ messages in thread
From: Peter Maydell @ 2025-01-24 16:28 UTC (permalink / raw)
  To: qemu-arm, qemu-devel

Implement the FPCR.AH semantics for the SVE FMAX and FMIN operations
that take an immediate as the second operand.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/tcg/helper-sve.h    | 14 ++++++++++++++
 target/arm/tcg/sve_helper.c    |  8 ++++++++
 target/arm/tcg/translate-sve.c | 25 +++++++++++++++++++++++--
 3 files changed, 45 insertions(+), 2 deletions(-)

diff --git a/target/arm/tcg/helper-sve.h b/target/arm/tcg/helper-sve.h
index 7ca95b8fa94..3c1d2624ed4 100644
--- a/target/arm/tcg/helper-sve.h
+++ b/target/arm/tcg/helper-sve.h
@@ -1231,6 +1231,20 @@ DEF_HELPER_FLAGS_6(sve_fmins_s, TCG_CALL_NO_RWG,
 DEF_HELPER_FLAGS_6(sve_fmins_d, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, i64, fpst, i32)
 
+DEF_HELPER_FLAGS_6(sve_ah_fmaxs_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, i64, fpst, i32)
+DEF_HELPER_FLAGS_6(sve_ah_fmaxs_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, i64, fpst, i32)
+DEF_HELPER_FLAGS_6(sve_ah_fmaxs_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, i64, fpst, i32)
+
+DEF_HELPER_FLAGS_6(sve_ah_fmins_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, i64, fpst, i32)
+DEF_HELPER_FLAGS_6(sve_ah_fmins_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, i64, fpst, i32)
+DEF_HELPER_FLAGS_6(sve_ah_fmins_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, i64, fpst, i32)
+
 DEF_HELPER_FLAGS_5(sve_fcvt_sh, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, fpst, i32)
 DEF_HELPER_FLAGS_5(sve_fcvt_dh, TCG_CALL_NO_RWG,
diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c
index 3631d85f23a..2f6fc82ee4f 100644
--- a/target/arm/tcg/sve_helper.c
+++ b/target/arm/tcg/sve_helper.c
@@ -4459,6 +4459,14 @@ DO_ZPZS_FP(sve_fmins_h, float16, H1_2, float16_min)
 DO_ZPZS_FP(sve_fmins_s, float32, H1_4, float32_min)
 DO_ZPZS_FP(sve_fmins_d, float64, H1_8, float64_min)
 
+DO_ZPZS_FP(sve_ah_fmaxs_h, float16, H1_2, helper_vfp_ah_maxh)
+DO_ZPZS_FP(sve_ah_fmaxs_s, float32, H1_4, helper_vfp_ah_maxs)
+DO_ZPZS_FP(sve_ah_fmaxs_d, float64, H1_8, helper_vfp_ah_maxd)
+
+DO_ZPZS_FP(sve_ah_fmins_h, float16, H1_2, helper_vfp_ah_minh)
+DO_ZPZS_FP(sve_ah_fmins_s, float32, H1_4, helper_vfp_ah_mins)
+DO_ZPZS_FP(sve_ah_fmins_d, float64, H1_8, helper_vfp_ah_mind)
+
 /* Fully general two-operand expander, controlled by a predicate,
  * With the extra float_status parameter.
  */
diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
index effa23cefd7..214aec7f83b 100644
--- a/target/arm/tcg/translate-sve.c
+++ b/target/arm/tcg/translate-sve.c
@@ -3821,14 +3821,35 @@ static bool do_fp_imm(DisasContext *s, arg_rpri_esz *a, uint64_t imm,
     TRANS_FEAT(NAME##_zpzi, aa64_sve, do_fp_imm, a,                     \
                name##_const[a->esz][a->imm], name##_fns[a->esz])
 
+#define DO_FP_AH_IMM(NAME, name, const0, const1)                        \
+    static gen_helper_sve_fp2scalar * const name##_fns[4] = {           \
+        NULL, gen_helper_sve_##name##_h,                                \
+        gen_helper_sve_##name##_s,                                      \
+        gen_helper_sve_##name##_d                                       \
+    };                                                                  \
+    static gen_helper_sve_fp2scalar * const name##_ah_fns[4] = {        \
+        NULL, gen_helper_sve_ah_##name##_h,                             \
+        gen_helper_sve_ah_##name##_s,                                   \
+        gen_helper_sve_ah_##name##_d                                    \
+    };                                                                  \
+    static uint64_t const name##_const[4][2] = {                        \
+        { -1, -1 },                                                     \
+        { float16_##const0, float16_##const1 },                         \
+        { float32_##const0, float32_##const1 },                         \
+        { float64_##const0, float64_##const1 },                         \
+    };                                                                  \
+    TRANS_FEAT(NAME##_zpzi, aa64_sve, do_fp_imm, a,                     \
+               name##_const[a->esz][a->imm],                            \
+               s->fpcr_ah ? name##_ah_fns[a->esz] : name##_fns[a->esz])
+
 DO_FP_IMM(FADD, fadds, half, one)
 DO_FP_IMM(FSUB, fsubs, half, one)
 DO_FP_IMM(FMUL, fmuls, half, two)
 DO_FP_IMM(FSUBR, fsubrs, half, one)
 DO_FP_IMM(FMAXNM, fmaxnms, zero, one)
 DO_FP_IMM(FMINNM, fminnms, zero, one)
-DO_FP_IMM(FMAX, fmaxs, zero, one)
-DO_FP_IMM(FMIN, fmins, zero, one)
+DO_FP_AH_IMM(FMAX, fmaxs, zero, one)
+DO_FP_AH_IMM(FMIN, fmins, zero, one)
 
 #undef DO_FP_IMM
 
-- 
2.34.1




* [PATCH 51/76] target/arm: Implement FPCR.AH semantics for SVE FMIN/FMAX vector
  2025-01-24 16:27 [PATCH 00/76] target/arm: Implement FEAT_AFP and FEAT_RPRES Peter Maydell
                   ` (49 preceding siblings ...)
  2025-01-24 16:28 ` [PATCH 50/76] target/arm: Implement FPCR.AH semantics for SVE FMIN/FMAX immediate Peter Maydell
@ 2025-01-24 16:28 ` Peter Maydell
  2025-01-26 12:55   ` Richard Henderson
  2025-01-24 16:28 ` [PATCH 52/76] target/arm: Implement FPCR.AH handling of negation of NaN Peter Maydell
                   ` (26 subsequent siblings)
  77 siblings, 1 reply; 167+ messages in thread
From: Peter Maydell @ 2025-01-24 16:28 UTC (permalink / raw)
  To: qemu-arm, qemu-devel

Implement the FPCR.AH semantics for the SVE FMAX and FMIN
operations that take two vector operands.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/tcg/helper-sve.h    | 14 ++++++++++++++
 target/arm/tcg/sve_helper.c    |  8 ++++++++
 target/arm/tcg/translate-sve.c | 17 +++++++++++++++--
 3 files changed, 37 insertions(+), 2 deletions(-)

diff --git a/target/arm/tcg/helper-sve.h b/target/arm/tcg/helper-sve.h
index 3c1d2624ed4..918f2e61b7e 100644
--- a/target/arm/tcg/helper-sve.h
+++ b/target/arm/tcg/helper-sve.h
@@ -1140,6 +1140,20 @@ DEF_HELPER_FLAGS_6(sve_fmax_s, TCG_CALL_NO_RWG,
 DEF_HELPER_FLAGS_6(sve_fmax_d, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, fpst, i32)
 
+DEF_HELPER_FLAGS_6(sve_ah_fmin_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, fpst, i32)
+DEF_HELPER_FLAGS_6(sve_ah_fmin_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, fpst, i32)
+DEF_HELPER_FLAGS_6(sve_ah_fmin_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, fpst, i32)
+
+DEF_HELPER_FLAGS_6(sve_ah_fmax_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, fpst, i32)
+DEF_HELPER_FLAGS_6(sve_ah_fmax_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, fpst, i32)
+DEF_HELPER_FLAGS_6(sve_ah_fmax_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, fpst, i32)
+
 DEF_HELPER_FLAGS_6(sve_fminnum_h, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, fpst, i32)
 DEF_HELPER_FLAGS_6(sve_fminnum_s, TCG_CALL_NO_RWG,
diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c
index 2f6fc82ee4f..a688b98d284 100644
--- a/target/arm/tcg/sve_helper.c
+++ b/target/arm/tcg/sve_helper.c
@@ -4347,6 +4347,14 @@ DO_ZPZZ_FP(sve_fmax_h, uint16_t, H1_2, float16_max)
 DO_ZPZZ_FP(sve_fmax_s, uint32_t, H1_4, float32_max)
 DO_ZPZZ_FP(sve_fmax_d, uint64_t, H1_8, float64_max)
 
+DO_ZPZZ_FP(sve_ah_fmin_h, uint16_t, H1_2, helper_vfp_ah_minh)
+DO_ZPZZ_FP(sve_ah_fmin_s, uint32_t, H1_4, helper_vfp_ah_mins)
+DO_ZPZZ_FP(sve_ah_fmin_d, uint64_t, H1_8, helper_vfp_ah_mind)
+
+DO_ZPZZ_FP(sve_ah_fmax_h, uint16_t, H1_2, helper_vfp_ah_maxh)
+DO_ZPZZ_FP(sve_ah_fmax_s, uint32_t, H1_4, helper_vfp_ah_maxs)
+DO_ZPZZ_FP(sve_ah_fmax_d, uint64_t, H1_8, helper_vfp_ah_maxd)
+
 DO_ZPZZ_FP(sve_fminnum_h, uint16_t, H1_2, float16_minnum)
 DO_ZPZZ_FP(sve_fminnum_s, uint32_t, H1_4, float32_minnum)
 DO_ZPZZ_FP(sve_fminnum_d, uint64_t, H1_8, float64_minnum)
diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
index 214aec7f83b..0fed92fa48a 100644
--- a/target/arm/tcg/translate-sve.c
+++ b/target/arm/tcg/translate-sve.c
@@ -3759,11 +3759,24 @@ TRANS_FEAT_NONSTREAMING(FTSMUL, aa64_sve, gen_gvec_fpst_arg_zzz,
     };                                                          \
     TRANS_FEAT(NAME, FEAT, gen_gvec_fpst_arg_zpzz, name##_zpzz_fns[a->esz], a)
 
+#define DO_ZPZZ_AH_FP(NAME, FEAT, name, ah_name)                        \
+    static gen_helper_gvec_4_ptr * const name##_zpzz_fns[4] = {         \
+        NULL,                  gen_helper_##name##_h,                   \
+        gen_helper_##name##_s, gen_helper_##name##_d                    \
+    };                                                                  \
+    static gen_helper_gvec_4_ptr * const name##_ah_zpzz_fns[4] = {      \
+        NULL,                  gen_helper_##ah_name##_h,                \
+        gen_helper_##ah_name##_s, gen_helper_##ah_name##_d              \
+    };                                                                  \
+    TRANS_FEAT(NAME, FEAT, gen_gvec_fpst_arg_zpzz,                      \
+               s->fpcr_ah ? name##_ah_zpzz_fns[a->esz] :                \
+               name##_zpzz_fns[a->esz], a)
+
 DO_ZPZZ_FP(FADD_zpzz, aa64_sve, sve_fadd)
 DO_ZPZZ_FP(FSUB_zpzz, aa64_sve, sve_fsub)
 DO_ZPZZ_FP(FMUL_zpzz, aa64_sve, sve_fmul)
-DO_ZPZZ_FP(FMIN_zpzz, aa64_sve, sve_fmin)
-DO_ZPZZ_FP(FMAX_zpzz, aa64_sve, sve_fmax)
+DO_ZPZZ_AH_FP(FMIN_zpzz, aa64_sve, sve_fmin, sve_ah_fmin)
+DO_ZPZZ_AH_FP(FMAX_zpzz, aa64_sve, sve_fmax, sve_ah_fmax)
 DO_ZPZZ_FP(FMINNM_zpzz, aa64_sve, sve_fminnum)
 DO_ZPZZ_FP(FMAXNM_zpzz, aa64_sve, sve_fmaxnum)
 DO_ZPZZ_FP(FABD, aa64_sve, sve_fabd)
-- 
2.34.1




* [PATCH 52/76] target/arm: Implement FPCR.AH handling of negation of NaN
  2025-01-24 16:27 [PATCH 00/76] target/arm: Implement FEAT_AFP and FEAT_RPRES Peter Maydell
                   ` (50 preceding siblings ...)
  2025-01-24 16:28 ` [PATCH 51/76] target/arm: Implement FPCR.AH semantics for SVE FMIN/FMAX vector Peter Maydell
@ 2025-01-24 16:28 ` Peter Maydell
  2025-01-26 13:00   ` Richard Henderson
  2025-01-24 16:28 ` [PATCH 53/76] target/arm: Implement FPCR.AH handling for scalar FABS and FABD Peter Maydell
                   ` (25 subsequent siblings)
  77 siblings, 1 reply; 167+ messages in thread
From: Peter Maydell @ 2025-01-24 16:28 UTC (permalink / raw)
  To: qemu-arm, qemu-devel

FPCR.AH == 1 mandates that negation of a NaN value should not flip
its sign bit.  This means we can no longer use gen_vfp_neg*()
everywhere but must instead generate slightly more complex code when
FPCR.AH is set.

Make this change for the scalar FNEG and for those places in
translate-a64.c which were previously directly calling
gen_vfp_neg*().

This change in semantics also affects any other instruction whose
pseudocode calls FPNeg(); in following commits we extend this
change to the other affected instructions.
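
To make the rule concrete, here is a small host-side C sketch of the same selection on raw IEEE bit patterns: a value is a NaN exactly when its magnitude bits exceed the bit pattern of infinity, and in that case the input is returned unchanged. The function names are hypothetical and this models only the arithmetic, not the TCG code generation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Plain negation: always flip the sign bit. */
static uint32_t neg_f32_bits(uint32_t s)
{
    return s ^ 0x80000000u;
}

/* AH-style negation for float32: leave NaNs untouched, otherwise flip the sign. */
static uint32_t ah_neg_f32_bits(uint32_t s)
{
    uint32_t abs_s = s & 0x7fffffffu;            /* clear the sign bit */
    return abs_s > 0x7f800000u ? s : (s ^ 0x80000000u);
}

/* The same idea for float16, where infinity is 0x7c00. */
static uint16_t ah_neg_f16_bits(uint16_t s)
{
    uint16_t abs_s = s & 0x7fff;
    return abs_s > 0x7c00 ? s : (uint16_t)(s ^ 0x8000);
}

/* Pick the behaviour from the AH flag, mirroring a "maybe_ah" wrapper. */
static uint32_t maybe_ah_neg_f32_bits(bool fpcr_ah, uint32_t s)
{
    return fpcr_ah ? ah_neg_f32_bits(s) : neg_f32_bits(s);
}
```

Note that infinities are still negated, since the comparison is strictly greater-than against the infinity bit pattern.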

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/tcg/translate-a64.c | 125 ++++++++++++++++++++++++++++++---
 1 file changed, 114 insertions(+), 11 deletions(-)

diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index 9d164b80c22..085b29ee536 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -839,6 +839,74 @@ static void gen_gvec_op4_fpst(DisasContext *s, bool is_q, int rd, int rn,
                        is_q ? 16 : 8, vec_full_reg_size(s), data, fn);
 }
 
+/*
+ * When FPCR.AH == 1, NEG and ABS do not flip the sign bit of a NaN.
+ * These functions implement
+ *   d = floatN_is_any_nan(s) ? s : floatN_chs(s)
+ * which for float32 is
+ *   d = ((s & ~(1 << 31)) > 0x7f800000UL) ? s : (s ^ (1 << 31))
+ * and similarly for the other float sizes.
+ */
+static void gen_vfp_ah_negh(TCGv_i32 d, TCGv_i32 s)
+{
+    TCGv_i32 abs_s = tcg_temp_new_i32(), chs_s = tcg_temp_new_i32();
+
+    gen_vfp_negh(chs_s, s);
+    gen_vfp_absh(abs_s, s);
+    tcg_gen_movcond_i32(TCG_COND_GTU, d,
+                        abs_s, tcg_constant_i32(0x7c00),
+                        s, chs_s);
+}
+
+static void gen_vfp_ah_negs(TCGv_i32 d, TCGv_i32 s)
+{
+    TCGv_i32 abs_s = tcg_temp_new_i32(), chs_s = tcg_temp_new_i32();
+
+    gen_vfp_negs(chs_s, s);
+    gen_vfp_abss(abs_s, s);
+    tcg_gen_movcond_i32(TCG_COND_GTU, d,
+                        abs_s, tcg_constant_i32(0x7f800000UL),
+                        s, chs_s);
+}
+
+static void gen_vfp_ah_negd(TCGv_i64 d, TCGv_i64 s)
+{
+    TCGv_i64 abs_s = tcg_temp_new_i64(), chs_s = tcg_temp_new_i64();
+
+    gen_vfp_negd(chs_s, s);
+    gen_vfp_absd(abs_s, s);
+    tcg_gen_movcond_i64(TCG_COND_GTU, d,
+                        abs_s, tcg_constant_i64(0x7ff0000000000000ULL),
+                        s, chs_s);
+}
+
+static void gen_vfp_maybe_ah_negh(DisasContext *dc, TCGv_i32 d, TCGv_i32 s)
+{
+    if (dc->fpcr_ah) {
+        gen_vfp_ah_negh(d, s);
+    } else {
+        gen_vfp_negh(d, s);
+    }
+}
+
+static void gen_vfp_maybe_ah_negs(DisasContext *dc, TCGv_i32 d, TCGv_i32 s)
+{
+    if (dc->fpcr_ah) {
+        gen_vfp_ah_negs(d, s);
+    } else {
+        gen_vfp_negs(d, s);
+    }
+}
+
+static void gen_vfp_maybe_ah_negd(DisasContext *dc, TCGv_i64 d, TCGv_i64 s)
+{
+    if (dc->fpcr_ah) {
+        gen_vfp_ah_negd(d, s);
+    } else {
+        gen_vfp_negd(d, s);
+    }
+}
+
 /* Set ZF and NF based on a 64 bit result. This is alas fiddlier
  * than the 32 bit equivalent.
  */
@@ -5252,12 +5320,35 @@ static void gen_fnmul_d(TCGv_i64 d, TCGv_i64 n, TCGv_i64 m, TCGv_ptr s)
     gen_vfp_negd(d, d);
 }
 
+static void gen_fnmul_ah_h(TCGv_i32 d, TCGv_i32 n, TCGv_i32 m, TCGv_ptr s)
+{
+    gen_helper_vfp_mulh(d, n, m, s);
+    gen_vfp_ah_negh(d, d);
+}
+
+static void gen_fnmul_ah_s(TCGv_i32 d, TCGv_i32 n, TCGv_i32 m, TCGv_ptr s)
+{
+    gen_helper_vfp_muls(d, n, m, s);
+    gen_vfp_ah_negs(d, d);
+}
+
+static void gen_fnmul_ah_d(TCGv_i64 d, TCGv_i64 n, TCGv_i64 m, TCGv_ptr s)
+{
+    gen_helper_vfp_muld(d, n, m, s);
+    gen_vfp_ah_negd(d, d);
+}
+
 static const FPScalar f_scalar_fnmul = {
     gen_fnmul_h,
     gen_fnmul_s,
     gen_fnmul_d,
 };
-TRANS(FNMUL_s, do_fp3_scalar, a, &f_scalar_fnmul, a->rn)
+static const FPScalar f_scalar_ah_fnmul = {
+    gen_fnmul_ah_h,
+    gen_fnmul_ah_s,
+    gen_fnmul_ah_d,
+};
+TRANS(FNMUL_s, do_fp3_scalar_2fn, a, &f_scalar_fnmul, &f_scalar_ah_fnmul, a->rn)
 
 static const FPScalar f_scalar_fcmeq = {
     gen_helper_advsimd_ceq_f16,
@@ -6399,7 +6490,7 @@ static bool do_fmla_scalar_idx(DisasContext *s, arg_rrx_e *a, bool neg)
 
             read_vec_element(s, t2, a->rm, a->idx, MO_64);
             if (neg) {
-                gen_vfp_negd(t1, t1);
+                gen_vfp_maybe_ah_negd(s, t1, t1);
             }
             gen_helper_vfp_muladdd(t0, t1, t2, t0, fpstatus_ptr(FPST_FPCR_A64));
             write_fp_dreg_merging(s, a->rd, a->rd, t0);
@@ -6413,7 +6504,7 @@ static bool do_fmla_scalar_idx(DisasContext *s, arg_rrx_e *a, bool neg)
 
             read_vec_element_i32(s, t2, a->rm, a->idx, MO_32);
             if (neg) {
-                gen_vfp_negs(t1, t1);
+                gen_vfp_maybe_ah_negs(s, t1, t1);
             }
             gen_helper_vfp_muladds(t0, t1, t2, t0, fpstatus_ptr(FPST_FPCR_A64));
             write_fp_sreg_merging(s, a->rd, a->rd, t0);
@@ -6430,7 +6521,7 @@ static bool do_fmla_scalar_idx(DisasContext *s, arg_rrx_e *a, bool neg)
 
             read_vec_element_i32(s, t2, a->rm, a->idx, MO_16);
             if (neg) {
-                gen_vfp_negh(t1, t1);
+                gen_vfp_maybe_ah_negh(s, t1, t1);
             }
             gen_helper_advsimd_muladdh(t0, t1, t2, t0,
                                        fpstatus_ptr(FPST_FPCR_F16_A64));
@@ -6913,10 +7004,10 @@ static bool do_fmadd(DisasContext *s, arg_rrrr_e *a, bool neg_a, bool neg_n)
             TCGv_i64 ta = read_fp_dreg(s, a->ra);
 
             if (neg_a) {
-                gen_vfp_negd(ta, ta);
+                gen_vfp_maybe_ah_negd(s, ta, ta);
             }
             if (neg_n) {
-                gen_vfp_negd(tn, tn);
+                gen_vfp_maybe_ah_negd(s, tn, tn);
             }
             fpst = fpstatus_ptr(FPST_FPCR_A64);
             gen_helper_vfp_muladdd(ta, tn, tm, ta, fpst);
@@ -6931,10 +7022,10 @@ static bool do_fmadd(DisasContext *s, arg_rrrr_e *a, bool neg_a, bool neg_n)
             TCGv_i32 ta = read_fp_sreg(s, a->ra);
 
             if (neg_a) {
-                gen_vfp_negs(ta, ta);
+                gen_vfp_maybe_ah_negs(s, ta, ta);
             }
             if (neg_n) {
-                gen_vfp_negs(tn, tn);
+                gen_vfp_maybe_ah_negs(s, tn, tn);
             }
             fpst = fpstatus_ptr(FPST_FPCR_A64);
             gen_helper_vfp_muladds(ta, tn, tm, ta, fpst);
@@ -6952,10 +7043,10 @@ static bool do_fmadd(DisasContext *s, arg_rrrr_e *a, bool neg_a, bool neg_n)
             TCGv_i32 ta = read_fp_hreg(s, a->ra);
 
             if (neg_a) {
-                gen_vfp_negh(ta, ta);
+                gen_vfp_maybe_ah_negh(s, ta, ta);
             }
             if (neg_n) {
-                gen_vfp_negh(tn, tn);
+                gen_vfp_maybe_ah_negh(s, tn, tn);
             }
             fpst = fpstatus_ptr(FPST_FPCR_F16_A64);
             gen_helper_advsimd_muladdh(ta, tn, tm, ta, fpst);
@@ -8544,6 +8635,13 @@ static bool do_fp1_scalar_int(DisasContext *s, arg_rr_e *a,
     return true;
 }
 
+static bool do_fp1_scalar_int_2fn(DisasContext *s, arg_rr_e *a,
+                                  const FPScalar1Int *fnormal,
+                                  const FPScalar1Int *fah)
+{
+    return do_fp1_scalar_int(s, a, s->fpcr_ah ? fah : fnormal, true);
+}
+
 static const FPScalar1Int f_scalar_fmov = {
     tcg_gen_mov_i32,
     tcg_gen_mov_i32,
@@ -8563,7 +8661,12 @@ static const FPScalar1Int f_scalar_fneg = {
     gen_vfp_negs,
     gen_vfp_negd,
 };
-TRANS(FNEG_s, do_fp1_scalar_int, a, &f_scalar_fneg, true)
+static const FPScalar1Int f_scalar_ah_fneg = {
+    gen_vfp_ah_negh,
+    gen_vfp_ah_negs,
+    gen_vfp_ah_negd,
+};
+TRANS(FNEG_s, do_fp1_scalar_int_2fn, a, &f_scalar_fneg, &f_scalar_ah_fneg)
 
 typedef struct FPScalar1 {
     void (*gen_h)(TCGv_i32, TCGv_i32, TCGv_ptr);
-- 
2.34.1




* [PATCH 53/76] target/arm: Implement FPCR.AH handling for scalar FABS and FABD
  2025-01-24 16:27 [PATCH 00/76] target/arm: Implement FEAT_AFP and FEAT_RPRES Peter Maydell
                   ` (51 preceding siblings ...)
  2025-01-24 16:28 ` [PATCH 52/76] target/arm: Implement FPCR.AH handling of negation of NaN Peter Maydell
@ 2025-01-24 16:28 ` Peter Maydell
  2025-01-26 13:01   ` Richard Henderson
  2025-01-24 16:28 ` [PATCH 54/76] target/arm: Handle FPCR.AH in vector FABD Peter Maydell
                   ` (24 subsequent siblings)
  77 siblings, 1 reply; 167+ messages in thread
From: Peter Maydell @ 2025-01-24 16:28 UTC (permalink / raw)
  To: qemu-arm, qemu-devel

FPCR.AH == 1 mandates that taking the absolute value of a NaN should
not change its sign bit.  This means we can no longer use
gen_vfp_abs*() everywhere but must instead generate slightly more
complex code when FPCR.AH is set.

Implement these semantics for scalar FABS and FABD.  This change also
affects all other instructions whose pseudocode calls FPAbs(); we
will extend the change to those instructions in following commits.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
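Reviewer aid, not part of the commit: a plain C model of the "leave a
NaN alone, otherwise clear the sign bit" rule for the half and double
precision cases. This is only a sketch on raw bit patterns; ah_abs16
and ah_abs64 are hypothetical names, not functions in the tree.

```c
#include <stdint.h>

/* Hypothetical model of FABS with FPCR.AH == 1 on raw float16 bits. */
static uint16_t ah_abs16(uint16_t s)
{
    uint16_t abs_s = s & 0x7fff;          /* clear the sign bit */

    /* 0x7c00 is +Inf; any larger magnitude is a NaN and is kept as is. */
    return abs_s > 0x7c00 ? s : abs_s;
}

/* The same rule for raw float64 bits. */
static uint64_t ah_abs64(uint64_t s)
{
    const uint64_t sign = UINT64_C(1) << 63;
    const uint64_t inf = UINT64_C(0x7ff0000000000000);
    uint64_t abs_s = s & ~sign;

    return abs_s > inf ? s : abs_s;
}
```

Only the constants differ between sizes, which is why the three
gen_vfp_ah_abs*() functions are otherwise identical.
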
 target/arm/tcg/translate-a64.c | 69 +++++++++++++++++++++++++++++++++-
 1 file changed, 67 insertions(+), 2 deletions(-)

diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index 085b29ee536..542e774790b 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -880,6 +880,43 @@ static void gen_vfp_ah_negd(TCGv_i64 d, TCGv_i64 s)
                         s, chs_s);
 }
 
+/*
+ * These functions implement
+ *  d = floatN_is_any_nan(s) ? s : floatN_abs(s)
+ * which for float32 is
+ *  d = ((s & ~(1 << 31)) > 0x7f800000UL) ? s : (s & ~(1 << 31))
+ * and similarly for the other float sizes.
+ */
+static void gen_vfp_ah_absh(TCGv_i32 d, TCGv_i32 s)
+{
+    TCGv_i32 abs_s = tcg_temp_new_i32();
+
+    gen_vfp_absh(abs_s, s);
+    tcg_gen_movcond_i32(TCG_COND_GTU, d,
+                        abs_s, tcg_constant_i32(0x7c00),
+                        s, abs_s);
+}
+
+static void gen_vfp_ah_abss(TCGv_i32 d, TCGv_i32 s)
+{
+    TCGv_i32 abs_s = tcg_temp_new_i32();
+
+    gen_vfp_abss(abs_s, s);
+    tcg_gen_movcond_i32(TCG_COND_GTU, d,
+                        abs_s, tcg_constant_i32(0x7f800000UL),
+                        s, abs_s);
+}
+
+static void gen_vfp_ah_absd(TCGv_i64 d, TCGv_i64 s)
+{
+    TCGv_i64 abs_s = tcg_temp_new_i64();
+
+    gen_vfp_absd(abs_s, s);
+    tcg_gen_movcond_i64(TCG_COND_GTU, d,
+                        abs_s, tcg_constant_i64(0x7ff0000000000000ULL),
+                        s, abs_s);
+}
+
 static void gen_vfp_maybe_ah_negh(DisasContext *dc, TCGv_i32 d, TCGv_i32 s)
 {
     if (dc->fpcr_ah) {
@@ -5403,12 +5440,35 @@ static void gen_fabd_d(TCGv_i64 d, TCGv_i64 n, TCGv_i64 m, TCGv_ptr s)
     gen_vfp_absd(d, d);
 }
 
+static void gen_fabd_ah_h(TCGv_i32 d, TCGv_i32 n, TCGv_i32 m, TCGv_ptr s)
+{
+    gen_helper_vfp_subh(d, n, m, s);
+    gen_vfp_ah_absh(d, d);
+}
+
+static void gen_fabd_ah_s(TCGv_i32 d, TCGv_i32 n, TCGv_i32 m, TCGv_ptr s)
+{
+    gen_helper_vfp_subs(d, n, m, s);
+    gen_vfp_ah_abss(d, d);
+}
+
+static void gen_fabd_ah_d(TCGv_i64 d, TCGv_i64 n, TCGv_i64 m, TCGv_ptr s)
+{
+    gen_helper_vfp_subd(d, n, m, s);
+    gen_vfp_ah_absd(d, d);
+}
+
 static const FPScalar f_scalar_fabd = {
     gen_fabd_h,
     gen_fabd_s,
     gen_fabd_d,
 };
-TRANS(FABD_s, do_fp3_scalar, a, &f_scalar_fabd, a->rn)
+static const FPScalar f_scalar_ah_fabd = {
+    gen_fabd_ah_h,
+    gen_fabd_ah_s,
+    gen_fabd_ah_d,
+};
+TRANS(FABD_s, do_fp3_scalar_2fn, a, &f_scalar_fabd, &f_scalar_ah_fabd, a->rn)
 
 static const FPScalar f_scalar_frecps = {
     gen_helper_recpsf_f16,
@@ -8654,7 +8714,12 @@ static const FPScalar1Int f_scalar_fabs = {
     gen_vfp_abss,
     gen_vfp_absd,
 };
-TRANS(FABS_s, do_fp1_scalar_int, a, &f_scalar_fabs, true)
+static const FPScalar1Int f_scalar_ah_fabs = {
+    gen_vfp_ah_absh,
+    gen_vfp_ah_abss,
+    gen_vfp_ah_absd,
+};
+TRANS(FABS_s, do_fp1_scalar_int_2fn, a, &f_scalar_fabs, &f_scalar_ah_fabs)
 
 static const FPScalar1Int f_scalar_fneg = {
     gen_vfp_negh,
-- 
2.34.1




* [PATCH 54/76] target/arm: Handle FPCR.AH in vector FABD
  2025-01-24 16:27 [PATCH 00/76] target/arm: Implement FEAT_AFP and FEAT_RPRES Peter Maydell
                   ` (52 preceding siblings ...)
  2025-01-24 16:28 ` [PATCH 53/76] target/arm: Implement FPCR.AH handling for scalar FABS and FABD Peter Maydell
@ 2025-01-24 16:28 ` Peter Maydell
  2025-01-26 13:03   ` Richard Henderson
  2025-01-24 16:28 ` [PATCH 55/76] target/arm: Handle FPCR.AH in SVE FNEG Peter Maydell
                   ` (23 subsequent siblings)
  77 siblings, 1 reply; 167+ messages in thread
From: Peter Maydell @ 2025-01-24 16:28 UTC (permalink / raw)
  To: qemu-arm, qemu-devel

Split the handling of vector FABD so that it calls a different set
of helpers when FPCR.AH is 1, which implement the "no negation of
the sign of a NaN" semantics.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
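Reviewer aid, not part of the commit: a per-element model of what the
AH variant of vector FABD computes. This sketch uses host float
arithmetic for the subtraction, whereas QEMU uses softfloat with a
float_status, so it illustrates only the sign handling; ah_abd32 and
ah_fabd_vec are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model: |a - b|, except that a NaN result keeps its sign. */
static float ah_abd32(float a, float b)
{
    float r = a - b;
    uint32_t bits;

    memcpy(&bits, &r, sizeof(bits));
    if ((bits & 0x7fffffffu) <= 0x7f800000u) {
        bits &= 0x7fffffffu;    /* not a NaN: clear the sign bit */
    }
    memcpy(&r, &bits, sizeof(r));
    return r;
}

/* Apply the operation independently to every element of the vectors. */
static void ah_fabd_vec(float *d, const float *n, const float *m, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        d[i] = ah_abd32(n[i], m[i]);
    }
}
```
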
 target/arm/helper.h            |  4 ++++
 target/arm/tcg/translate-a64.c |  7 ++++++-
 target/arm/tcg/vec_helper.c    | 23 +++++++++++++++++++++++
 3 files changed, 33 insertions(+), 1 deletion(-)

diff --git a/target/arm/helper.h b/target/arm/helper.h
index 15bad0773c0..43505d5fedc 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -722,6 +722,10 @@ DEF_HELPER_FLAGS_5(gvec_fabd_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32)
 DEF_HELPER_FLAGS_5(gvec_fabd_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32)
 DEF_HELPER_FLAGS_5(gvec_fabd_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32)
 
+DEF_HELPER_FLAGS_5(gvec_ah_fabd_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32)
+DEF_HELPER_FLAGS_5(gvec_ah_fabd_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32)
+DEF_HELPER_FLAGS_5(gvec_ah_fabd_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32)
+
 DEF_HELPER_FLAGS_5(gvec_fceq_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32)
 DEF_HELPER_FLAGS_5(gvec_fceq_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32)
 DEF_HELPER_FLAGS_5(gvec_fceq_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32)
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index 542e774790b..ce9ab75bc2f 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -5899,7 +5899,12 @@ static gen_helper_gvec_3_ptr * const f_vector_fabd[3] = {
     gen_helper_gvec_fabd_s,
     gen_helper_gvec_fabd_d,
 };
-TRANS(FABD_v, do_fp3_vector, a, 0, f_vector_fabd)
+static gen_helper_gvec_3_ptr * const f_vector_ah_fabd[3] = {
+    gen_helper_gvec_ah_fabd_h,
+    gen_helper_gvec_ah_fabd_s,
+    gen_helper_gvec_ah_fabd_d,
+};
+TRANS(FABD_v, do_fp3_vector_2fn, a, 0, f_vector_fabd, f_vector_ah_fabd)
 
 static gen_helper_gvec_3_ptr * const f_vector_frecps[3] = {
     gen_helper_gvec_recps_h,
diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
index d3f2eaa807e..3b87e5b8d6d 100644
--- a/target/arm/tcg/vec_helper.c
+++ b/target/arm/tcg/vec_helper.c
@@ -1302,6 +1302,25 @@ static float64 float64_abd(float64 op1, float64 op2, float_status *stat)
     return float64_abs(float64_sub(op1, op2, stat));
 }
 
+/* ABD when FPCR.AH = 1: avoid flipping sign bit of a NaN result */
+static float16 float16_ah_abd(float16 op1, float16 op2, float_status *stat)
+{
+    float16 r = float16_sub(op1, op2, stat);
+    return float16_is_any_nan(r) ? r : float16_abs(r);
+}
+
+static float32 float32_ah_abd(float32 op1, float32 op2, float_status *stat)
+{
+    float32 r = float32_sub(op1, op2, stat);
+    return float32_is_any_nan(r) ? r : float32_abs(r);
+}
+
+static float64 float64_ah_abd(float64 op1, float64 op2, float_status *stat)
+{
+    float64 r = float64_sub(op1, op2, stat);
+    return float64_is_any_nan(r) ? r : float64_abs(r);
+}
+
 /*
  * Reciprocal step. These are the AArch32 version which uses a
  * non-fused multiply-and-subtract.
@@ -1389,6 +1408,10 @@ DO_3OP(gvec_fabd_h, float16_abd, float16)
 DO_3OP(gvec_fabd_s, float32_abd, float32)
 DO_3OP(gvec_fabd_d, float64_abd, float64)
 
+DO_3OP(gvec_ah_fabd_h, float16_ah_abd, float16)
+DO_3OP(gvec_ah_fabd_s, float32_ah_abd, float32)
+DO_3OP(gvec_ah_fabd_d, float64_ah_abd, float64)
+
 DO_3OP(gvec_fceq_h, float16_ceq, float16)
 DO_3OP(gvec_fceq_s, float32_ceq, float32)
 DO_3OP(gvec_fceq_d, float64_ceq, float64)
-- 
2.34.1




* [PATCH 55/76] target/arm: Handle FPCR.AH in SVE FNEG
  2025-01-24 16:27 [PATCH 00/76] target/arm: Implement FEAT_AFP and FEAT_RPRES Peter Maydell
                   ` (53 preceding siblings ...)
  2025-01-24 16:28 ` [PATCH 54/76] target/arm: Handle FPCR.AH in vector FABD Peter Maydell
@ 2025-01-24 16:28 ` Peter Maydell
  2025-01-26 13:05   ` Richard Henderson
  2025-01-24 16:28 ` [PATCH 56/76] target/arm: Handle FPCR.AH in SVE FABS Peter Maydell
                   ` (22 subsequent siblings)
  77 siblings, 1 reply; 167+ messages in thread
From: Peter Maydell @ 2025-01-24 16:28 UTC (permalink / raw)
  To: qemu-arm, qemu-devel

Make SVE FNEG honour the FPCR.AH "don't negate the sign of a NaN" semantics.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
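Reviewer aid, not part of the commit: a simplified model of a
predicated FNEG with the AH rule applied per element. It assumes
merging predication (inactive destination elements are left
untouched) and uses one bool per element instead of the packed
predicate layout the real helpers read; all names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Flip the sign of raw float16 bits unless the value is a NaN. */
static uint16_t ah_neg16(uint16_t n)
{
    bool is_nan = (n & 0x7fff) > 0x7c00;

    return is_nan ? n : (uint16_t)(n ^ 0x8000);
}

/* Hypothetical predicated loop: only active elements are written. */
static void ah_fneg_pred(uint16_t *d, const uint16_t *n,
                         const bool *pg, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (pg[i]) {
            d[i] = ah_neg16(n[i]);
        }
    }
}
```
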
 target/arm/tcg/helper-sve.h    | 4 ++++
 target/arm/tcg/sve_helper.c    | 8 ++++++++
 target/arm/tcg/translate-sve.c | 7 ++++++-
 3 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/target/arm/tcg/helper-sve.h b/target/arm/tcg/helper-sve.h
index 918f2e61b7e..867a6d96e04 100644
--- a/target/arm/tcg/helper-sve.h
+++ b/target/arm/tcg/helper-sve.h
@@ -545,6 +545,10 @@ DEF_HELPER_FLAGS_4(sve_fneg_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(sve_fneg_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(sve_fneg_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 
+DEF_HELPER_FLAGS_4(sve_ah_fneg_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_ah_fneg_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_ah_fneg_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
 DEF_HELPER_FLAGS_4(sve_not_zpz_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(sve_not_zpz_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(sve_not_zpz_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c
index a688b98d284..976f3be44e0 100644
--- a/target/arm/tcg/sve_helper.c
+++ b/target/arm/tcg/sve_helper.c
@@ -885,6 +885,14 @@ DO_ZPZ(sve_fneg_h, uint16_t, H1_2, DO_FNEG)
 DO_ZPZ(sve_fneg_s, uint32_t, H1_4, DO_FNEG)
 DO_ZPZ_D(sve_fneg_d, uint64_t, DO_FNEG)
 
+#define DO_AH_FNEG_H(N) (float16_is_any_nan(N) ? (N) : DO_FNEG(N))
+#define DO_AH_FNEG_S(N) (float32_is_any_nan(N) ? (N) : DO_FNEG(N))
+#define DO_AH_FNEG_D(N) (float64_is_any_nan(N) ? (N) : DO_FNEG(N))
+
+DO_ZPZ(sve_ah_fneg_h, uint16_t, H1_2, DO_AH_FNEG_H)
+DO_ZPZ(sve_ah_fneg_s, uint32_t, H1_4, DO_AH_FNEG_S)
+DO_ZPZ_D(sve_ah_fneg_d, uint64_t, DO_AH_FNEG_D)
+
 #define DO_NOT(N)    (~N)
 
 DO_ZPZ(sve_not_zpz_b, uint8_t, H1, DO_NOT)
diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
index 0fed92fa48a..c173627ad49 100644
--- a/target/arm/tcg/translate-sve.c
+++ b/target/arm/tcg/translate-sve.c
@@ -789,7 +789,12 @@ static gen_helper_gvec_3 * const fneg_fns[4] = {
     NULL,                  gen_helper_sve_fneg_h,
     gen_helper_sve_fneg_s, gen_helper_sve_fneg_d,
 };
-TRANS_FEAT(FNEG, aa64_sve, gen_gvec_ool_arg_zpz, fneg_fns[a->esz], a, 0)
+static gen_helper_gvec_3 * const fneg_ah_fns[4] = {
+    NULL,                  gen_helper_sve_ah_fneg_h,
+    gen_helper_sve_ah_fneg_s, gen_helper_sve_ah_fneg_d,
+};
+TRANS_FEAT(FNEG, aa64_sve, gen_gvec_ool_arg_zpz,
+           s->fpcr_ah ? fneg_ah_fns[a->esz] : fneg_fns[a->esz], a, 0)
 
 static gen_helper_gvec_3 * const sxtb_fns[4] = {
     NULL,                  gen_helper_sve_sxtb_h,
-- 
2.34.1




* [PATCH 56/76] target/arm: Handle FPCR.AH in SVE FABS
  2025-01-24 16:27 [PATCH 00/76] target/arm: Implement FEAT_AFP and FEAT_RPRES Peter Maydell
                   ` (54 preceding siblings ...)
  2025-01-24 16:28 ` [PATCH 55/76] target/arm: Handle FPCR.AH in SVE FNEG Peter Maydell
@ 2025-01-24 16:28 ` Peter Maydell
  2025-01-26 13:05   ` Richard Henderson
  2025-01-24 16:28 ` [PATCH 57/76] target/arm: Handle FPCR.AH in SVE FABD Peter Maydell
                   ` (21 subsequent siblings)
  77 siblings, 1 reply; 167+ messages in thread
From: Peter Maydell @ 2025-01-24 16:28 UTC (permalink / raw)
  To: qemu-arm, qemu-devel

Make SVE FABS honour the FPCR.AH "don't change the sign of a NaN" semantics.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/tcg/helper-sve.h    | 4 ++++
 target/arm/tcg/sve_helper.c    | 8 ++++++++
 target/arm/tcg/translate-sve.c | 7 ++++++-
 3 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/target/arm/tcg/helper-sve.h b/target/arm/tcg/helper-sve.h
index 867a6d96e04..ff12f650c87 100644
--- a/target/arm/tcg/helper-sve.h
+++ b/target/arm/tcg/helper-sve.h
@@ -541,6 +541,10 @@ DEF_HELPER_FLAGS_4(sve_fabs_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(sve_fabs_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(sve_fabs_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 
+DEF_HELPER_FLAGS_4(sve_ah_fabs_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_ah_fabs_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_ah_fabs_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
 DEF_HELPER_FLAGS_4(sve_fneg_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(sve_fneg_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(sve_fneg_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c
index 976f3be44e0..5ce7d736475 100644
--- a/target/arm/tcg/sve_helper.c
+++ b/target/arm/tcg/sve_helper.c
@@ -879,6 +879,14 @@ DO_ZPZ(sve_fabs_h, uint16_t, H1_2, DO_FABS)
 DO_ZPZ(sve_fabs_s, uint32_t, H1_4, DO_FABS)
 DO_ZPZ_D(sve_fabs_d, uint64_t, DO_FABS)
 
+#define DO_AH_FABS_H(N) (float16_is_any_nan(N) ? (N) : DO_FABS(N))
+#define DO_AH_FABS_S(N) (float32_is_any_nan(N) ? (N) : DO_FABS(N))
+#define DO_AH_FABS_D(N) (float64_is_any_nan(N) ? (N) : DO_FABS(N))
+
+DO_ZPZ(sve_ah_fabs_h, uint16_t, H1_2, DO_AH_FABS_H)
+DO_ZPZ(sve_ah_fabs_s, uint32_t, H1_4, DO_AH_FABS_S)
+DO_ZPZ_D(sve_ah_fabs_d, uint64_t, DO_AH_FABS_D)
+
 #define DO_FNEG(N)    (N ^ ~((__typeof(N))-1 >> 1))
 
 DO_ZPZ(sve_fneg_h, uint16_t, H1_2, DO_FNEG)
diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
index c173627ad49..c234a4910dd 100644
--- a/target/arm/tcg/translate-sve.c
+++ b/target/arm/tcg/translate-sve.c
@@ -783,7 +783,12 @@ static gen_helper_gvec_3 * const fabs_fns[4] = {
     NULL,                  gen_helper_sve_fabs_h,
     gen_helper_sve_fabs_s, gen_helper_sve_fabs_d,
 };
-TRANS_FEAT(FABS, aa64_sve, gen_gvec_ool_arg_zpz, fabs_fns[a->esz], a, 0)
+static gen_helper_gvec_3 * const fabs_ah_fns[4] = {
+    NULL,                  gen_helper_sve_ah_fabs_h,
+    gen_helper_sve_ah_fabs_s, gen_helper_sve_ah_fabs_d,
+};
+TRANS_FEAT(FABS, aa64_sve, gen_gvec_ool_arg_zpz,
+           s->fpcr_ah ? fabs_ah_fns[a->esz] : fabs_fns[a->esz], a, 0)
 
 static gen_helper_gvec_3 * const fneg_fns[4] = {
     NULL,                  gen_helper_sve_fneg_h,
-- 
2.34.1




* [PATCH 57/76] target/arm: Handle FPCR.AH in SVE FABD
  2025-01-24 16:27 [PATCH 00/76] target/arm: Implement FEAT_AFP and FEAT_RPRES Peter Maydell
                   ` (55 preceding siblings ...)
  2025-01-24 16:28 ` [PATCH 56/76] target/arm: Handle FPCR.AH in SVE FABS Peter Maydell
@ 2025-01-24 16:28 ` Peter Maydell
  2025-01-26 13:06   ` Richard Henderson
  2025-01-24 16:28 ` [PATCH 58/76] target/arm: Handle FPCR.AH in negation steps in FCADD Peter Maydell
                   ` (20 subsequent siblings)
  77 siblings, 1 reply; 167+ messages in thread
From: Peter Maydell @ 2025-01-24 16:28 UTC (permalink / raw)
  To: qemu-arm, qemu-devel

Make the SVE FABD insn honour the FPCR.AH "don't negate the sign
of a NaN" semantics.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/tcg/helper-sve.h    |  7 +++++++
 target/arm/tcg/sve_helper.c    | 22 ++++++++++++++++++++++
 target/arm/tcg/translate-sve.c |  2 +-
 3 files changed, 30 insertions(+), 1 deletion(-)

diff --git a/target/arm/tcg/helper-sve.h b/target/arm/tcg/helper-sve.h
index ff12f650c87..29c70f054af 100644
--- a/target/arm/tcg/helper-sve.h
+++ b/target/arm/tcg/helper-sve.h
@@ -1183,6 +1183,13 @@ DEF_HELPER_FLAGS_6(sve_fabd_s, TCG_CALL_NO_RWG,
 DEF_HELPER_FLAGS_6(sve_fabd_d, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, fpst, i32)
 
+DEF_HELPER_FLAGS_6(sve_ah_fabd_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, fpst, i32)
+DEF_HELPER_FLAGS_6(sve_ah_fabd_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, fpst, i32)
+DEF_HELPER_FLAGS_6(sve_ah_fabd_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, fpst, i32)
+
 DEF_HELPER_FLAGS_6(sve_fscalbn_h, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, fpst, i32)
 DEF_HELPER_FLAGS_6(sve_fscalbn_s, TCG_CALL_NO_RWG,
diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c
index 5ce7d736475..8527a7495a6 100644
--- a/target/arm/tcg/sve_helper.c
+++ b/target/arm/tcg/sve_helper.c
@@ -4394,9 +4394,31 @@ static inline float64 abd_d(float64 a, float64 b, float_status *s)
     return float64_abs(float64_sub(a, b, s));
 }
 
+/* ABD when FPCR.AH = 1: avoid flipping sign bit of a NaN result */
+static float16 ah_abd_h(float16 op1, float16 op2, float_status *stat)
+{
+    float16 r = float16_sub(op1, op2, stat);
+    return float16_is_any_nan(r) ? r : float16_abs(r);
+}
+
+static float32 ah_abd_s(float32 op1, float32 op2, float_status *stat)
+{
+    float32 r = float32_sub(op1, op2, stat);
+    return float32_is_any_nan(r) ? r : float32_abs(r);
+}
+
+static float64 ah_abd_d(float64 op1, float64 op2, float_status *stat)
+{
+    float64 r = float64_sub(op1, op2, stat);
+    return float64_is_any_nan(r) ? r : float64_abs(r);
+}
+
 DO_ZPZZ_FP(sve_fabd_h, uint16_t, H1_2, abd_h)
 DO_ZPZZ_FP(sve_fabd_s, uint32_t, H1_4, abd_s)
 DO_ZPZZ_FP(sve_fabd_d, uint64_t, H1_8, abd_d)
+DO_ZPZZ_FP(sve_ah_fabd_h, uint16_t, H1_2, ah_abd_h)
+DO_ZPZZ_FP(sve_ah_fabd_s, uint32_t, H1_4, ah_abd_s)
+DO_ZPZZ_FP(sve_ah_fabd_d, uint64_t, H1_8, ah_abd_d)
 
 static inline float64 scalbn_d(float64 a, int64_t b, float_status *s)
 {
diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
index c234a4910dd..9200f7f8a49 100644
--- a/target/arm/tcg/translate-sve.c
+++ b/target/arm/tcg/translate-sve.c
@@ -3789,7 +3789,7 @@ DO_ZPZZ_AH_FP(FMIN_zpzz, aa64_sve, sve_fmin, sve_ah_fmin)
 DO_ZPZZ_AH_FP(FMAX_zpzz, aa64_sve, sve_fmax, sve_ah_fmax)
 DO_ZPZZ_FP(FMINNM_zpzz, aa64_sve, sve_fminnum)
 DO_ZPZZ_FP(FMAXNM_zpzz, aa64_sve, sve_fmaxnum)
-DO_ZPZZ_FP(FABD, aa64_sve, sve_fabd)
+DO_ZPZZ_AH_FP(FABD, aa64_sve, sve_fabd, sve_ah_fabd)
 DO_ZPZZ_FP(FSCALE, aa64_sve, sve_fscalbn)
 DO_ZPZZ_FP(FDIV, aa64_sve, sve_fdiv)
 DO_ZPZZ_FP(FMULX, aa64_sve, sve_fmulx)
-- 
2.34.1




* [PATCH 58/76] target/arm: Handle FPCR.AH in negation steps in FCADD
  2025-01-24 16:27 [PATCH 00/76] target/arm: Implement FEAT_AFP and FEAT_RPRES Peter Maydell
                   ` (56 preceding siblings ...)
  2025-01-24 16:28 ` [PATCH 57/76] target/arm: Handle FPCR.AH in SVE FABD Peter Maydell
@ 2025-01-24 16:28 ` Peter Maydell
  2025-01-26 13:08   ` Richard Henderson
  2025-01-24 16:28 ` [PATCH 59/76] target/arm: Handle FPCR.AH in negation steps in SVE FCADD Peter Maydell
                   ` (19 subsequent siblings)
  77 siblings, 1 reply; 167+ messages in thread
From: Peter Maydell @ 2025-01-24 16:28 UTC (permalink / raw)
  To: qemu-arm, qemu-devel

The negation steps in FCADD must honour FPCR.AH's "don't change the
sign of a NaN" semantics.  Implement this by encoding FPCR.AH into
the SIMD data field passed to the helper and using that to decide
whether to negate the values.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
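Reviewer aid, not part of the commit: a small C model of the data
field encoding and of the conditional negation the helpers perform on
the two elements taken from the second operand. It works on raw
float32 bits and omits the additions; fcadd_encode, fcadd_neg32 and
fcadd_prepare are hypothetical names used only for this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 0 selects the rotation (1 means 270 degrees), bit 1 carries FPCR.AH. */
static uint32_t fcadd_encode(bool rot270, bool fpcr_ah)
{
    return (uint32_t)rot270 | ((uint32_t)fpcr_ah << 1);
}

/* Negate raw float32 bits unless AH is set and the value is a NaN. */
static uint32_t fcadd_neg32(uint32_t e, bool do_neg, bool fpcr_ah)
{
    bool is_nan = (e & 0x7fffffffu) > 0x7f800000u;

    if (do_neg && !(fpcr_ah && is_nan)) {
        e ^= 0x80000000u;
    }
    return e;
}

/* Decode the data field and negate exactly one of m's two elements. */
static void fcadd_prepare(uint32_t data, uint32_t *m_real, uint32_t *m_imag)
{
    bool neg_real = data & 1;
    bool neg_imag = !neg_real;
    bool fpcr_ah = (data >> 1) & 1;

    *m_imag = fcadd_neg32(*m_imag, neg_imag, fpcr_ah);
    *m_real = fcadd_neg32(*m_real, neg_real, fpcr_ah);
}
```

With rotation 90 the imaginary element of the second operand is the
one negated, and with rotation 270 it is the real element; in both
cases a NaN passes through unchanged when the AH bit is set.
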
 target/arm/tcg/translate-a64.c | 10 +++++++--
 target/arm/tcg/vec_helper.c    | 39 ++++++++++++++++++++++++++++------
 2 files changed, 41 insertions(+), 8 deletions(-)

diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index ce9ab75bc2f..0827dff16b2 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -6117,8 +6117,14 @@ static gen_helper_gvec_3_ptr * const f_vector_fcadd[3] = {
     gen_helper_gvec_fcadds,
     gen_helper_gvec_fcaddd,
 };
-TRANS_FEAT(FCADD_90, aa64_fcma, do_fp3_vector, a, 0, f_vector_fcadd)
-TRANS_FEAT(FCADD_270, aa64_fcma, do_fp3_vector, a, 1, f_vector_fcadd)
+/*
+ * Encode FPCR.AH into the data so the helper knows whether the
+ * negations it does should avoid flipping the sign bit on a NaN
+ */
+TRANS_FEAT(FCADD_90, aa64_fcma, do_fp3_vector, a, 0 | (s->fpcr_ah << 1),
+           f_vector_fcadd)
+TRANS_FEAT(FCADD_270, aa64_fcma, do_fp3_vector, a, 1 | (s->fpcr_ah << 1),
+           f_vector_fcadd)
 
 static bool trans_FCMLA_v(DisasContext *s, arg_FCMLA_v *a)
 {
diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
index 3b87e5b8d6d..382b5da4a9c 100644
--- a/target/arm/tcg/vec_helper.c
+++ b/target/arm/tcg/vec_helper.c
@@ -881,6 +881,7 @@ void HELPER(gvec_fcaddh)(void *vd, void *vn, void *vm,
     float16 *m = vm;
     uint32_t neg_real = extract32(desc, SIMD_DATA_SHIFT, 1);
     uint32_t neg_imag = neg_real ^ 1;
+    bool fpcr_ah = extract64(desc, SIMD_DATA_SHIFT + 1, 1);
     uintptr_t i;
 
     /* Shift boolean to the sign bit so we can xor to negate.  */
@@ -889,9 +890,17 @@ void HELPER(gvec_fcaddh)(void *vd, void *vn, void *vm,
 
     for (i = 0; i < opr_sz / 2; i += 2) {
         float16 e0 = n[H2(i)];
-        float16 e1 = m[H2(i + 1)] ^ neg_imag;
+        float16 e1 = m[H2(i + 1)];
         float16 e2 = n[H2(i + 1)];
-        float16 e3 = m[H2(i)] ^ neg_real;
+        float16 e3 = m[H2(i)];
+
+        /* FPNeg() mustn't flip sign of a NaN if FPCR.AH == 1 */
+        if (!(fpcr_ah && float16_is_any_nan(e1))) {
+            e1 ^= neg_imag;
+        }
+        if (!(fpcr_ah && float16_is_any_nan(e3))) {
+            e3 ^= neg_real;
+        }
 
         d[H2(i)] = float16_add(e0, e1, fpst);
         d[H2(i + 1)] = float16_add(e2, e3, fpst);
@@ -908,6 +917,7 @@ void HELPER(gvec_fcadds)(void *vd, void *vn, void *vm,
     float32 *m = vm;
     uint32_t neg_real = extract32(desc, SIMD_DATA_SHIFT, 1);
     uint32_t neg_imag = neg_real ^ 1;
+    bool fpcr_ah = extract64(desc, SIMD_DATA_SHIFT + 1, 1);
     uintptr_t i;
 
     /* Shift boolean to the sign bit so we can xor to negate.  */
@@ -916,9 +926,17 @@ void HELPER(gvec_fcadds)(void *vd, void *vn, void *vm,
 
     for (i = 0; i < opr_sz / 4; i += 2) {
         float32 e0 = n[H4(i)];
-        float32 e1 = m[H4(i + 1)] ^ neg_imag;
+        float32 e1 = m[H4(i + 1)];
         float32 e2 = n[H4(i + 1)];
-        float32 e3 = m[H4(i)] ^ neg_real;
+        float32 e3 = m[H4(i)];
+
+        /* FPNeg() mustn't flip sign of a NaN if FPCR.AH == 1 */
+        if (!(fpcr_ah && float32_is_any_nan(e1))) {
+            e1 ^= neg_imag;
+        }
+        if (!(fpcr_ah && float32_is_any_nan(e3))) {
+            e3 ^= neg_real;
+        }
 
         d[H4(i)] = float32_add(e0, e1, fpst);
         d[H4(i + 1)] = float32_add(e2, e3, fpst);
@@ -935,6 +953,7 @@ void HELPER(gvec_fcaddd)(void *vd, void *vn, void *vm,
     float64 *m = vm;
     uint64_t neg_real = extract64(desc, SIMD_DATA_SHIFT, 1);
     uint64_t neg_imag = neg_real ^ 1;
+    bool fpcr_ah = extract64(desc, SIMD_DATA_SHIFT + 1, 1);
     uintptr_t i;
 
     /* Shift boolean to the sign bit so we can xor to negate.  */
@@ -943,9 +962,17 @@ void HELPER(gvec_fcaddd)(void *vd, void *vn, void *vm,
 
     for (i = 0; i < opr_sz / 8; i += 2) {
         float64 e0 = n[i];
-        float64 e1 = m[i + 1] ^ neg_imag;
+        float64 e1 = m[i + 1];
         float64 e2 = n[i + 1];
-        float64 e3 = m[i] ^ neg_real;
+        float64 e3 = m[i];
+
+        /* FPNeg() mustn't flip sign of a NaN if FPCR.AH == 1 */
+        if (!(fpcr_ah && float64_is_any_nan(e1))) {
+            e1 ^= neg_imag;
+        }
+        if (!(fpcr_ah && float64_is_any_nan(e3))) {
+            e3 ^= neg_real;
+        }
 
         d[i] = float64_add(e0, e1, fpst);
         d[i + 1] = float64_add(e2, e3, fpst);
-- 
2.34.1




* [PATCH 59/76] target/arm: Handle FPCR.AH in negation steps in SVE FCADD
  2025-01-24 16:27 [PATCH 00/76] target/arm: Implement FEAT_AFP and FEAT_RPRES Peter Maydell
                   ` (57 preceding siblings ...)
  2025-01-24 16:28 ` [PATCH 58/76] target/arm: Handle FPCR.AH in negation steps in FCADD Peter Maydell
@ 2025-01-24 16:28 ` Peter Maydell
  2025-01-26 13:10   ` Richard Henderson
  2025-01-24 16:28 ` [PATCH 60/76] target/arm: Handle FPCR.AH in FMLSL Peter Maydell
                   ` (18 subsequent siblings)
  77 siblings, 1 reply; 167+ messages in thread
From: Peter Maydell @ 2025-01-24 16:28 UTC (permalink / raw)
  To: qemu-arm, qemu-devel

The negation steps in FCADD must honour FPCR.AH's "don't change the
sign of a NaN" semantics.  Implement this in the same way we did for
the base ASIMD FCADD, by encoding FPCR.AH into the SIMD data field
passed to the helper and using that to decide whether to negate the
values.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/tcg/sve_helper.c    | 45 +++++++++++++++++++++++++++-------
 target/arm/tcg/translate-sve.c |  2 +-
 2 files changed, 37 insertions(+), 10 deletions(-)

diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c
index 8527a7495a6..dc5a35b46ef 100644
--- a/target/arm/tcg/sve_helper.c
+++ b/target/arm/tcg/sve_helper.c
@@ -5131,7 +5131,9 @@ void HELPER(sve_fcadd_h)(void *vd, void *vn, void *vm, void *vg,
 {
     intptr_t j, i = simd_oprsz(desc);
     uint64_t *g = vg;
-    float16 neg_imag = float16_set_sign(0, simd_data(desc));
+    bool rot = extract32(desc, SIMD_DATA_SHIFT, 1);
+    bool fpcr_ah = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
+    float16 neg_imag = float16_set_sign(0, rot);
     float16 neg_real = float16_chs(neg_imag);
 
     do {
@@ -5144,9 +5146,16 @@ void HELPER(sve_fcadd_h)(void *vd, void *vn, void *vm, void *vg,
             i -= 2 * sizeof(float16);
 
             e0 = *(float16 *)(vn + H1_2(i));
-            e1 = *(float16 *)(vm + H1_2(j)) ^ neg_real;
+            e1 = *(float16 *)(vm + H1_2(j));
             e2 = *(float16 *)(vn + H1_2(j));
-            e3 = *(float16 *)(vm + H1_2(i)) ^ neg_imag;
+            e3 = *(float16 *)(vm + H1_2(i));
+
+            if (neg_real && !(fpcr_ah && float16_is_any_nan(e1))) {
+                e1 ^= neg_real;
+            }
+            if (neg_imag && !(fpcr_ah && float16_is_any_nan(e3))) {
+                e3 ^= neg_imag;
+            }
 
             if (likely((pg >> (i & 63)) & 1)) {
                 *(float16 *)(vd + H1_2(i)) = float16_add(e0, e1, s);
@@ -5163,7 +5172,9 @@ void HELPER(sve_fcadd_s)(void *vd, void *vn, void *vm, void *vg,
 {
     intptr_t j, i = simd_oprsz(desc);
     uint64_t *g = vg;
-    float32 neg_imag = float32_set_sign(0, simd_data(desc));
+    bool rot = extract32(desc, SIMD_DATA_SHIFT, 1);
+    bool fpcr_ah = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
+    float32 neg_imag = float32_set_sign(0, rot);
     float32 neg_real = float32_chs(neg_imag);
 
     do {
@@ -5176,9 +5187,16 @@ void HELPER(sve_fcadd_s)(void *vd, void *vn, void *vm, void *vg,
             i -= 2 * sizeof(float32);
 
             e0 = *(float32 *)(vn + H1_2(i));
-            e1 = *(float32 *)(vm + H1_2(j)) ^ neg_real;
+            e1 = *(float32 *)(vm + H1_2(j));
             e2 = *(float32 *)(vn + H1_2(j));
-            e3 = *(float32 *)(vm + H1_2(i)) ^ neg_imag;
+            e3 = *(float32 *)(vm + H1_2(i));
+
+            if (neg_real && !(fpcr_ah && float32_is_any_nan(e1))) {
+                e1 ^= neg_real;
+            }
+            if (neg_imag && !(fpcr_ah && float32_is_any_nan(e3))) {
+                e3 ^= neg_imag;
+            }
 
             if (likely((pg >> (i & 63)) & 1)) {
                 *(float32 *)(vd + H1_2(i)) = float32_add(e0, e1, s);
@@ -5195,7 +5213,9 @@ void HELPER(sve_fcadd_d)(void *vd, void *vn, void *vm, void *vg,
 {
     intptr_t j, i = simd_oprsz(desc);
     uint64_t *g = vg;
-    float64 neg_imag = float64_set_sign(0, simd_data(desc));
+    bool rot = extract32(desc, SIMD_DATA_SHIFT, 1);
+    bool fpcr_ah = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
+    float64 neg_imag = float64_set_sign(0, rot);
     float64 neg_real = float64_chs(neg_imag);
 
     do {
@@ -5208,9 +5228,16 @@ void HELPER(sve_fcadd_d)(void *vd, void *vn, void *vm, void *vg,
             i -= 2 * sizeof(float64);
 
             e0 = *(float64 *)(vn + H1_2(i));
-            e1 = *(float64 *)(vm + H1_2(j)) ^ neg_real;
+            e1 = *(float64 *)(vm + H1_2(j));
             e2 = *(float64 *)(vn + H1_2(j));
-            e3 = *(float64 *)(vm + H1_2(i)) ^ neg_imag;
+            e3 = *(float64 *)(vm + H1_2(i));
+
+            if (neg_real && !(fpcr_ah && float64_is_any_nan(e1))) {
+                e1 ^= neg_real;
+            }
+            if (neg_imag && !(fpcr_ah && float64_is_any_nan(e3))) {
+                e3 ^= neg_imag;
+            }
 
             if (likely((pg >> (i & 63)) & 1)) {
                 *(float64 *)(vd + H1_2(i)) = float64_add(e0, e1, s);
diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
index 9200f7f8a49..0696192148c 100644
--- a/target/arm/tcg/translate-sve.c
+++ b/target/arm/tcg/translate-sve.c
@@ -3916,7 +3916,7 @@ static gen_helper_gvec_4_ptr * const fcadd_fns[] = {
     gen_helper_sve_fcadd_s, gen_helper_sve_fcadd_d,
 };
 TRANS_FEAT(FCADD, aa64_sve, gen_gvec_fpst_zzzp, fcadd_fns[a->esz],
-           a->rd, a->rn, a->rm, a->pg, a->rot,
+           a->rd, a->rn, a->rm, a->pg, a->rot | (s->fpcr_ah << 1),
            a->esz == MO_16 ? FPST_FPCR_F16_A64 : FPST_FPCR_A64)
 
 #define DO_FMLA(NAME, name) \
-- 
2.34.1




* [PATCH 60/76] target/arm: Handle FPCR.AH in FMLSL
  2025-01-24 16:27 [PATCH 00/76] target/arm: Implement FEAT_AFP and FEAT_RPRES Peter Maydell
                   ` (58 preceding siblings ...)
  2025-01-24 16:28 ` [PATCH 59/76] target/arm: Handle FPCR.AH in negation steps in SVE FCADD Peter Maydell
@ 2025-01-24 16:28 ` Peter Maydell
  2025-01-26 13:13   ` Richard Henderson
  2025-01-24 16:28 ` [PATCH 61/76] target/arm: Handle FPCR.AH in FRECPS and FRSQRTS scalar insns Peter Maydell
                   ` (17 subsequent siblings)
  77 siblings, 1 reply; 167+ messages in thread
From: Peter Maydell @ 2025-01-24 16:28 UTC (permalink / raw)
  To: qemu-arm, qemu-devel

Honour the FPCR.AH "don't negate the sign of a NaN" semantics in
FMLSL. We pass in the value of FPCR.AH in the SIMD data field, and
use this to determine whether we should suppress the negation for
NaN inputs.
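
A standalone sketch of the "negate four f16 lanes at once, skipping
NaNs" idea follows. It is a simplified model, not the helper from this
patch: neg4_f16_sketch and f16_is_any_nan are hypothetical names, and
the lanes are treated as raw bit patterns inside a uint64_t.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical float16 NaN test on a raw bit pattern. */
static bool f16_is_any_nan(uint16_t v)
{
    return (v & 0x7fff) > 0x7c00;
}

/* Flip the sign bit of each of the four 16-bit lanes in v.
 * When fpcr_ah is true, lanes that hold a NaN keep their sign. */
static uint64_t neg4_f16_sketch(uint64_t v, bool fpcr_ah)
{
    uint64_t mask = 0x8000800080008000ull;

    if (fpcr_ah) {
        for (int lane = 0; lane < 4; lane++) {
            uint16_t elt = (uint16_t)(v >> (16 * lane));
            if (f16_is_any_nan(elt)) {
                /* Drop this lane's sign bit from the XOR mask. */
                mask &= ~(0x8000ull << (16 * lane));
            }
        }
    }
    return v ^ mask;
}
```

Building the mask first and applying one XOR at the end keeps the
AH == 0 path as cheap as the original single-XOR code.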

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/tcg/translate-a64.c |  4 ++--
 target/arm/tcg/vec_helper.c    | 28 ++++++++++++++++++++++++----
 2 files changed, 26 insertions(+), 6 deletions(-)

diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index 0827dff16b2..e22c2a148ab 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -5968,7 +5968,7 @@ TRANS(FMINNMP_v, do_fp3_vector, a, 0, f_vector_fminnmp)
 static bool do_fmlal(DisasContext *s, arg_qrrr_e *a, bool is_s, bool is_2)
 {
     if (fp_access_check(s)) {
-        int data = (is_2 << 1) | is_s;
+        int data = (s->fpcr_ah << 2) | (is_2 << 1) | is_s;
         tcg_gen_gvec_3_ptr(vec_full_reg_offset(s, a->rd),
                            vec_full_reg_offset(s, a->rn),
                            vec_full_reg_offset(s, a->rm), tcg_env,
@@ -6738,7 +6738,7 @@ TRANS(FMLS_vi, do_fmla_vector_idx, a, true)
 static bool do_fmlal_idx(DisasContext *s, arg_qrrx_e *a, bool is_s, bool is_2)
 {
     if (fp_access_check(s)) {
-        int data = (a->idx << 2) | (is_2 << 1) | is_s;
+        int data = (s->fpcr_ah << 5) | (a->idx << 2) | (is_2 << 1) | is_s;
         tcg_gen_gvec_3_ptr(vec_full_reg_offset(s, a->rd),
                            vec_full_reg_offset(s, a->rn),
                            vec_full_reg_offset(s, a->rm), tcg_env,
diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
index 382b5da4a9c..aa42c50f9fe 100644
--- a/target/arm/tcg/vec_helper.c
+++ b/target/arm/tcg/vec_helper.c
@@ -2083,6 +2083,26 @@ static uint64_t load4_f16(uint64_t *ptr, int is_q, int is_2)
     return ptr[is_q & is_2] >> ((is_2 & ~is_q) << 5);
 }
 
+static uint64_t neg4_f16(uint64_t v, bool fpcr_ah)
+{
+    /*
+     * Negate all inputs for FMLSL at once. This is slightly complicated
+     * by the need to avoid flipping the sign of a NaN when FPCR.AH == 1
+     */
+    uint64_t mask = 0x8000800080008000ull;
+    if (fpcr_ah) {
+        uint64_t tmp = v, signbit = 0x8000;
+        for (int i = 0; i < 4; i++) {
+            if (float16_is_any_nan(extract64(tmp, 0, 16))) {
+                mask ^= signbit;
+            }
+            tmp >>= 16;
+            signbit <<= 16;
+        }
+    }
+    return v ^ mask;
+}
+
 /*
  * Note that FMLAL requires oprsz == 8 or oprsz == 16,
  * as there is not yet SVE versions that might use blocking.
@@ -2094,6 +2114,7 @@ static void do_fmlal(float32 *d, void *vn, void *vm, float_status *fpst,
     intptr_t i, oprsz = simd_oprsz(desc);
     int is_s = extract32(desc, SIMD_DATA_SHIFT, 1);
     int is_2 = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
+    bool fpcr_ah = extract32(desc, SIMD_DATA_SHIFT + 2, 1);
     int is_q = oprsz == 16;
     uint64_t n_4, m_4;
 
@@ -2101,9 +2122,8 @@ static void do_fmlal(float32 *d, void *vn, void *vm, float_status *fpst,
     n_4 = load4_f16(vn, is_q, is_2);
     m_4 = load4_f16(vm, is_q, is_2);
 
-    /* Negate all inputs for FMLSL at once.  */
     if (is_s) {
-        n_4 ^= 0x8000800080008000ull;
+        n_4 = neg4_f16(n_4, fpcr_ah);
     }
 
     for (i = 0; i < oprsz / 4; i++) {
@@ -2155,6 +2175,7 @@ static void do_fmlal_idx(float32 *d, void *vn, void *vm, float_status *fpst,
     int is_s = extract32(desc, SIMD_DATA_SHIFT, 1);
     int is_2 = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
     int index = extract32(desc, SIMD_DATA_SHIFT + 2, 3);
+    bool fpcr_ah = extract32(desc, SIMD_DATA_SHIFT + 5, 1);
     int is_q = oprsz == 16;
     uint64_t n_4;
     float32 m_1;
@@ -2162,9 +2183,8 @@ static void do_fmlal_idx(float32 *d, void *vn, void *vm, float_status *fpst,
     /* Pre-load all of the f16 data, avoiding overlap issues.  */
     n_4 = load4_f16(vn, is_q, is_2);
 
-    /* Negate all inputs for FMLSL at once.  */
     if (is_s) {
-        n_4 ^= 0x8000800080008000ull;
+        n_4 = neg4_f16(n_4, fpcr_ah);
     }
 
     m_1 = float16_to_float32_by_bits(((float16 *)vm)[H2(index)], fz16);
-- 
2.34.1




* [PATCH 61/76] target/arm: Handle FPCR.AH in FRECPS and FRSQRTS scalar insns
  2025-01-24 16:27 [PATCH 00/76] target/arm: Implement FEAT_AFP and FEAT_RPRES Peter Maydell
                   ` (59 preceding siblings ...)
  2025-01-24 16:28 ` [PATCH 60/76] target/arm: Handle FPCR.AH in FMLSL Peter Maydell
@ 2025-01-24 16:28 ` Peter Maydell
  2025-01-26 13:14   ` Richard Henderson
  2025-01-24 16:28 ` [PATCH 62/76] target/arm: Handle FPCR.AH in FRECPS and FRSQRTS vector insns Peter Maydell
                   ` (16 subsequent siblings)
  77 siblings, 1 reply; 167+ messages in thread
From: Peter Maydell @ 2025-01-24 16:28 UTC (permalink / raw)
  To: qemu-arm, qemu-devel

Handle the FPCR.AH semantics that we do not change the sign of an
input NaN in the FRECPS and FRSQRTS scalar insns, by providing
new helper functions that do the CHS part of the operation
differently.

Since the extra helper functions would be very repetitive if written
out longhand, we generate both them and the existing non-AH helpers
from macros.
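
The macro approach can be sketched in isolation as below. This is only
an illustration under simplifying assumptions: float32 values are raw
uint32_t bit patterns, the generated functions model just the negation
step (the real helpers continue with a fused multiply-add), and all
names (f32_chs, f32_ah_chs, DEFINE_NEG_STEP, neg_step, neg_step_ah) are
hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical float32 NaN test on a raw bit pattern. */
static bool f32_is_any_nan(uint32_t v)
{
    return (v & 0x7fffffffu) > 0x7f800000u;
}

/* Plain change-sign: always flips the sign bit. */
static uint32_t f32_chs(uint32_t v)
{
    return v ^ 0x80000000u;
}

/* AH-style change-sign: a NaN keeps its sign. */
static uint32_t f32_ah_chs(uint32_t v)
{
    return f32_is_any_nan(v) ? v : f32_chs(v);
}

/* One macro emits both variants; CHSFN is pasted onto the
 * "f32_" prefix to select which negation function is called. */
#define DEFINE_NEG_STEP(NAME, CHSFN)            \
    static uint32_t NAME(uint32_t a)            \
    {                                           \
        return f32_ ## CHSFN(a);                \
    }

DEFINE_NEG_STEP(neg_step, chs)
DEFINE_NEG_STEP(neg_step_ah, ah_chs)
```

Token pasting keeps the two variants textually identical apart from the
negation function, which is the point of folding them into a macro.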

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/tcg/helper-a64.h    |   6 ++
 target/arm/tcg/helper-a64.c    | 128 ++++++++++++++-------------------
 target/arm/tcg/translate-a64.c |  25 +++++--
 3 files changed, 78 insertions(+), 81 deletions(-)

diff --git a/target/arm/tcg/helper-a64.h b/target/arm/tcg/helper-a64.h
index ae0424f6de9..85023465b76 100644
--- a/target/arm/tcg/helper-a64.h
+++ b/target/arm/tcg/helper-a64.h
@@ -38,9 +38,15 @@ DEF_HELPER_FLAGS_3(neon_cgt_f64, TCG_CALL_NO_RWG, i64, i64, i64, fpst)
 DEF_HELPER_FLAGS_3(recpsf_f16, TCG_CALL_NO_RWG, f16, f16, f16, fpst)
 DEF_HELPER_FLAGS_3(recpsf_f32, TCG_CALL_NO_RWG, f32, f32, f32, fpst)
 DEF_HELPER_FLAGS_3(recpsf_f64, TCG_CALL_NO_RWG, f64, f64, f64, fpst)
+DEF_HELPER_FLAGS_3(recpsf_ah_f16, TCG_CALL_NO_RWG, f16, f16, f16, fpst)
+DEF_HELPER_FLAGS_3(recpsf_ah_f32, TCG_CALL_NO_RWG, f32, f32, f32, fpst)
+DEF_HELPER_FLAGS_3(recpsf_ah_f64, TCG_CALL_NO_RWG, f64, f64, f64, fpst)
 DEF_HELPER_FLAGS_3(rsqrtsf_f16, TCG_CALL_NO_RWG, f16, f16, f16, fpst)
 DEF_HELPER_FLAGS_3(rsqrtsf_f32, TCG_CALL_NO_RWG, f32, f32, f32, fpst)
 DEF_HELPER_FLAGS_3(rsqrtsf_f64, TCG_CALL_NO_RWG, f64, f64, f64, fpst)
+DEF_HELPER_FLAGS_3(rsqrtsf_ah_f16, TCG_CALL_NO_RWG, f16, f16, f16, fpst)
+DEF_HELPER_FLAGS_3(rsqrtsf_ah_f32, TCG_CALL_NO_RWG, f32, f32, f32, fpst)
+DEF_HELPER_FLAGS_3(rsqrtsf_ah_f64, TCG_CALL_NO_RWG, f64, f64, f64, fpst)
 DEF_HELPER_FLAGS_2(frecpx_f64, TCG_CALL_NO_RWG, f64, f64, fpst)
 DEF_HELPER_FLAGS_2(frecpx_f32, TCG_CALL_NO_RWG, f32, f32, fpst)
 DEF_HELPER_FLAGS_2(frecpx_f16, TCG_CALL_NO_RWG, f16, f16, fpst)
diff --git a/target/arm/tcg/helper-a64.c b/target/arm/tcg/helper-a64.c
index 406d76e1129..ba21efd0bb0 100644
--- a/target/arm/tcg/helper-a64.c
+++ b/target/arm/tcg/helper-a64.c
@@ -208,88 +208,66 @@ uint64_t HELPER(neon_cgt_f64)(float64 a, float64 b, float_status *fpst)
     return -float64_lt(b, a, fpst);
 }
 
-/* Reciprocal step and sqrt step. Note that unlike the A32/T32
+static float16 float16_ah_chs(float16 a)
+{
+    return float16_is_any_nan(a) ? a : float16_chs(a);
+}
+
+static float32 float32_ah_chs(float32 a)
+{
+    return float32_is_any_nan(a) ? a : float32_chs(a);
+}
+
+static float64 float64_ah_chs(float64 a)
+{
+    return float64_is_any_nan(a) ? a : float64_chs(a);
+}
+/*
+ * Reciprocal step and sqrt step. Note that unlike the A32/T32
  * versions, these do a fully fused multiply-add or
  * multiply-add-and-halve.
+ * The FPCR.AH == 1 versions need to avoid flipping the sign of NaN.
  */
-
-uint32_t HELPER(recpsf_f16)(uint32_t a, uint32_t b, float_status *fpst)
-{
-    a = float16_squash_input_denormal(a, fpst);
-    b = float16_squash_input_denormal(b, fpst);
-
-    a = float16_chs(a);
-    if ((float16_is_infinity(a) && float16_is_zero(b)) ||
-        (float16_is_infinity(b) && float16_is_zero(a))) {
-        return float16_two;
+#define DO_RECPS(NAME, CTYPE, FLOATTYPE, CHSFN)                         \
+    CTYPE HELPER(NAME)(CTYPE a, CTYPE b, float_status *fpst)            \
+    {                                                                   \
+        a = FLOATTYPE ## _squash_input_denormal(a, fpst);               \
+        b = FLOATTYPE ## _squash_input_denormal(b, fpst);               \
+        a = FLOATTYPE ## _ ## CHSFN(a);                                 \
+        if ((FLOATTYPE ## _is_infinity(a) && FLOATTYPE ## _is_zero(b)) || \
+            (FLOATTYPE ## _is_infinity(b) && FLOATTYPE ## _is_zero(a))) { \
+            return FLOATTYPE ## _two;                                   \
+        }                                                               \
+        return FLOATTYPE ## _muladd(a, b, FLOATTYPE ## _two, 0, fpst);  \
     }
-    return float16_muladd(a, b, float16_two, 0, fpst);
-}
 
-float32 HELPER(recpsf_f32)(float32 a, float32 b, float_status *fpst)
-{
-    a = float32_squash_input_denormal(a, fpst);
-    b = float32_squash_input_denormal(b, fpst);
+DO_RECPS(recpsf_f16, uint32_t, float16, chs)
+DO_RECPS(recpsf_f32, float32, float32, chs)
+DO_RECPS(recpsf_f64, float64, float64, chs)
+DO_RECPS(recpsf_ah_f16, uint32_t, float16, ah_chs)
+DO_RECPS(recpsf_ah_f32, float32, float32, ah_chs)
+DO_RECPS(recpsf_ah_f64, float64, float64, ah_chs)
 
-    a = float32_chs(a);
-    if ((float32_is_infinity(a) && float32_is_zero(b)) ||
-        (float32_is_infinity(b) && float32_is_zero(a))) {
-        return float32_two;
-    }
-    return float32_muladd(a, b, float32_two, 0, fpst);
-}
+#define DO_RSQRTSF(NAME, CTYPE, FLOATTYPE, CHSFN)                       \
+    CTYPE HELPER(NAME)(CTYPE a, CTYPE b, float_status *fpst)            \
+    {                                                                   \
+        a = FLOATTYPE ## _squash_input_denormal(a, fpst);               \
+        b = FLOATTYPE ## _squash_input_denormal(b, fpst);               \
+        a = FLOATTYPE ## _ ## CHSFN(a);                                 \
+        if ((FLOATTYPE ## _is_infinity(a) && FLOATTYPE ## _is_zero(b)) || \
+            (FLOATTYPE ## _is_infinity(b) && FLOATTYPE ## _is_zero(a))) { \
+            return FLOATTYPE ## _one_point_five;                        \
+        }                                                               \
+        return FLOATTYPE ## _muladd_scalbn(a, b, FLOATTYPE ## _three,   \
+                                           -1, 0, fpst);                \
+    }                                                                   \
 
-float64 HELPER(recpsf_f64)(float64 a, float64 b, float_status *fpst)
-{
-    a = float64_squash_input_denormal(a, fpst);
-    b = float64_squash_input_denormal(b, fpst);
-
-    a = float64_chs(a);
-    if ((float64_is_infinity(a) && float64_is_zero(b)) ||
-        (float64_is_infinity(b) && float64_is_zero(a))) {
-        return float64_two;
-    }
-    return float64_muladd(a, b, float64_two, 0, fpst);
-}
-
-uint32_t HELPER(rsqrtsf_f16)(uint32_t a, uint32_t b, float_status *fpst)
-{
-    a = float16_squash_input_denormal(a, fpst);
-    b = float16_squash_input_denormal(b, fpst);
-
-    a = float16_chs(a);
-    if ((float16_is_infinity(a) && float16_is_zero(b)) ||
-        (float16_is_infinity(b) && float16_is_zero(a))) {
-        return float16_one_point_five;
-    }
-    return float16_muladd_scalbn(a, b, float16_three, -1, 0, fpst);
-}
-
-float32 HELPER(rsqrtsf_f32)(float32 a, float32 b, float_status *fpst)
-{
-    a = float32_squash_input_denormal(a, fpst);
-    b = float32_squash_input_denormal(b, fpst);
-
-    a = float32_chs(a);
-    if ((float32_is_infinity(a) && float32_is_zero(b)) ||
-        (float32_is_infinity(b) && float32_is_zero(a))) {
-        return float32_one_point_five;
-    }
-    return float32_muladd_scalbn(a, b, float32_three, -1, 0, fpst);
-}
-
-float64 HELPER(rsqrtsf_f64)(float64 a, float64 b, float_status *fpst)
-{
-    a = float64_squash_input_denormal(a, fpst);
-    b = float64_squash_input_denormal(b, fpst);
-
-    a = float64_chs(a);
-    if ((float64_is_infinity(a) && float64_is_zero(b)) ||
-        (float64_is_infinity(b) && float64_is_zero(a))) {
-        return float64_one_point_five;
-    }
-    return float64_muladd_scalbn(a, b, float64_three, -1, 0, fpst);
-}
+DO_RSQRTSF(rsqrtsf_f16, uint32_t, float16, chs)
+DO_RSQRTSF(rsqrtsf_f32, float32, float32, chs)
+DO_RSQRTSF(rsqrtsf_f64, float64, float64, chs)
+DO_RSQRTSF(rsqrtsf_ah_f16, uint32_t, float16, ah_chs)
+DO_RSQRTSF(rsqrtsf_ah_f32, float32, float32, ah_chs)
+DO_RSQRTSF(rsqrtsf_ah_f64, float64, float64, ah_chs)
 
 /* Floating-point reciprocal exponent - see FPRecpX in ARM ARM */
 uint32_t HELPER(frecpx_f16)(uint32_t a, float_status *fpst)
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index e22c2a148ab..977a1589e53 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -5250,11 +5250,12 @@ static bool do_fp3_scalar(DisasContext *s, arg_rrr_e *a, const FPScalar *f,
                                        FPST_FPCR_F16_A64 : FPST_FPCR_A64);
 }
 
-static bool do_fp3_scalar_ah(DisasContext *s, arg_rrr_e *a, const FPScalar *f,
-                             int mergereg)
+static bool do_fp3_scalar_ah_2fn(DisasContext *s, arg_rrr_e *a,
+                                 const FPScalar *fnormal, const FPScalar *fah,
+                                 int mergereg)
 {
-    return do_fp3_scalar_with_fpsttype(s, a, f, mergereg,
-                                       select_fpst(s, a->esz));
+    return do_fp3_scalar_with_fpsttype(s, a, s->fpcr_ah ? fah : fnormal,
+                                       mergereg, select_fpst(s, a->esz));
 }
 
 /* Some insns need to call different helpers when FPCR.AH == 1 */
@@ -5475,14 +5476,26 @@ static const FPScalar f_scalar_frecps = {
     gen_helper_recpsf_f32,
     gen_helper_recpsf_f64,
 };
-TRANS(FRECPS_s, do_fp3_scalar_ah, a, &f_scalar_frecps, a->rn)
+static const FPScalar f_scalar_ah_frecps = {
+    gen_helper_recpsf_ah_f16,
+    gen_helper_recpsf_ah_f32,
+    gen_helper_recpsf_ah_f64,
+};
+TRANS(FRECPS_s, do_fp3_scalar_ah_2fn, a,
+      &f_scalar_frecps, &f_scalar_ah_frecps, a->rn)
 
 static const FPScalar f_scalar_frsqrts = {
     gen_helper_rsqrtsf_f16,
     gen_helper_rsqrtsf_f32,
     gen_helper_rsqrtsf_f64,
 };
-TRANS(FRSQRTS_s, do_fp3_scalar_ah, a, &f_scalar_frsqrts, a->rn)
+static const FPScalar f_scalar_ah_frsqrts = {
+    gen_helper_rsqrtsf_ah_f16,
+    gen_helper_rsqrtsf_ah_f32,
+    gen_helper_rsqrtsf_ah_f64,
+};
+TRANS(FRSQRTS_s, do_fp3_scalar_ah_2fn, a,
+      &f_scalar_frsqrts, &f_scalar_ah_frsqrts, a->rn)
 
 static bool do_fcmp0_s(DisasContext *s, arg_rr_e *a,
                        const FPScalar *f, bool swap)
-- 
2.34.1




* [PATCH 62/76] target/arm: Handle FPCR.AH in FRECPS and FRSQRTS vector insns
  2025-01-24 16:27 [PATCH 00/76] target/arm: Implement FEAT_AFP and FEAT_RPRES Peter Maydell
                   ` (60 preceding siblings ...)
  2025-01-24 16:28 ` [PATCH 61/76] target/arm: Handle FPCR.AH in FRECPS and FRSQRTS scalar insns Peter Maydell
@ 2025-01-24 16:28 ` Peter Maydell
  2025-01-26 13:15   ` Richard Henderson
  2025-01-24 16:28 ` [PATCH 63/76] target/arm: Handle FPCR.AH in negation step in FMLS (indexed) Peter Maydell
                   ` (15 subsequent siblings)
  77 siblings, 1 reply; 167+ messages in thread
From: Peter Maydell @ 2025-01-24 16:28 UTC (permalink / raw)
  To: qemu-arm, qemu-devel

Handle the FPCR.AH "don't negate the sign of a NaN" semantics
in the vector versions of FRECPS and FRSQRTS, by implementing
new vector wrappers that call the _ah_ scalar helpers.
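
A minimal sketch of "vector wrapper around a scalar helper, selected by
FPCR.AH" is shown below. It does not use QEMU's DO_3OP or gvec
machinery; the scalar functions model only the negation of the first
operand, and every name here (step_f32, step_ah_f32, DEFINE_VEC,
select_vec_step) is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar helpers on raw float32 bit patterns.
 * Only the negation of 'a' is modelled; 'b' is ignored here. */
static uint32_t step_f32(uint32_t a, uint32_t b)
{
    (void)b;
    return a ^ 0x80000000u;
}

static uint32_t step_ah_f32(uint32_t a, uint32_t b)
{
    bool is_nan = (a & 0x7fffffffu) > 0x7f800000u;
    (void)b;
    return is_nan ? a : (a ^ 0x80000000u);
}

/* Emit an element-wise vector wrapper around a scalar function. */
#define DEFINE_VEC(NAME, FN)                                        \
    static void NAME(uint32_t *d, const uint32_t *n,                \
                     const uint32_t *m, size_t count)               \
    {                                                               \
        for (size_t i = 0; i < count; i++) {                        \
            d[i] = FN(n[i], m[i]);                                  \
        }                                                           \
    }

DEFINE_VEC(vec_step_f32, step_f32)
DEFINE_VEC(vec_ah_step_f32, step_ah_f32)

typedef void vec_fn(uint32_t *, const uint32_t *, const uint32_t *, size_t);

/* Pick the wrapper once, up front, instead of testing AH per element. */
static vec_fn *select_vec_step(bool fpcr_ah)
{
    return fpcr_ah ? vec_ah_step_f32 : vec_step_f32;
}
```

Choosing the function once mirrors the translate-time selection in the
patch, so the per-element loop carries no extra branch on FPCR.AH.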

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/tcg/helper-sve.h    | 14 ++++++++++++++
 target/arm/tcg/translate-a64.c | 21 ++++++++++++++++-----
 target/arm/tcg/translate-sve.c |  7 ++++++-
 target/arm/tcg/vec_helper.c    |  8 ++++++++
 4 files changed, 44 insertions(+), 6 deletions(-)

diff --git a/target/arm/tcg/helper-sve.h b/target/arm/tcg/helper-sve.h
index 29c70f054af..a2e96a498dd 100644
--- a/target/arm/tcg/helper-sve.h
+++ b/target/arm/tcg/helper-sve.h
@@ -980,6 +980,20 @@ DEF_HELPER_FLAGS_5(gvec_rsqrts_s, TCG_CALL_NO_RWG,
 DEF_HELPER_FLAGS_5(gvec_rsqrts_d, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, fpst, i32)
 
+DEF_HELPER_FLAGS_5(gvec_ah_recps_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, fpst, i32)
+DEF_HELPER_FLAGS_5(gvec_ah_recps_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, fpst, i32)
+DEF_HELPER_FLAGS_5(gvec_ah_recps_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, fpst, i32)
+
+DEF_HELPER_FLAGS_5(gvec_ah_rsqrts_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, fpst, i32)
+DEF_HELPER_FLAGS_5(gvec_ah_rsqrts_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, fpst, i32)
+DEF_HELPER_FLAGS_5(gvec_ah_rsqrts_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, fpst, i32)
+
 DEF_HELPER_FLAGS_5(gvec_ah_fmax_h, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, fpst, i32)
 DEF_HELPER_FLAGS_5(gvec_ah_fmax_s, TCG_CALL_NO_RWG,
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index 977a1589e53..3fe8e041093 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -5778,10 +5778,11 @@ static bool do_fp3_vector_2fn(DisasContext *s, arg_qrrr_e *a, int data,
     return do_fp3_vector(s, a, data, s->fpcr_ah ? fah : fnormal);
 }
 
-static bool do_fp3_vector_ah(DisasContext *s, arg_qrrr_e *a, int data,
-                             gen_helper_gvec_3_ptr * const f[3])
+static bool do_fp3_vector_ah_2fn(DisasContext *s, arg_qrrr_e *a, int data,
+                                 gen_helper_gvec_3_ptr * const fnormal[3],
+                                 gen_helper_gvec_3_ptr * const fah[3])
 {
-    return do_fp3_vector_with_fpsttype(s, a, data, f,
+    return do_fp3_vector_with_fpsttype(s, a, data, s->fpcr_ah ? fah : fnormal,
                                        select_fpst(s, a->esz));
 }
 
@@ -5924,14 +5925,24 @@ static gen_helper_gvec_3_ptr * const f_vector_frecps[3] = {
     gen_helper_gvec_recps_s,
     gen_helper_gvec_recps_d,
 };
-TRANS(FRECPS_v, do_fp3_vector_ah, a, 0, f_vector_frecps)
+static gen_helper_gvec_3_ptr * const f_vector_ah_frecps[3] = {
+    gen_helper_gvec_ah_recps_h,
+    gen_helper_gvec_ah_recps_s,
+    gen_helper_gvec_ah_recps_d,
+};
+TRANS(FRECPS_v, do_fp3_vector_ah_2fn, a, 0, f_vector_frecps, f_vector_ah_frecps)
 
 static gen_helper_gvec_3_ptr * const f_vector_frsqrts[3] = {
     gen_helper_gvec_rsqrts_h,
     gen_helper_gvec_rsqrts_s,
     gen_helper_gvec_rsqrts_d,
 };
-TRANS(FRSQRTS_v, do_fp3_vector_ah, a, 0, f_vector_frsqrts)
+static gen_helper_gvec_3_ptr * const f_vector_ah_frsqrts[3] = {
+    gen_helper_gvec_ah_rsqrts_h,
+    gen_helper_gvec_ah_rsqrts_s,
+    gen_helper_gvec_ah_rsqrts_d,
+};
+TRANS(FRSQRTS_v, do_fp3_vector_ah_2fn, a, 0, f_vector_frsqrts, f_vector_ah_frsqrts)
 
 static gen_helper_gvec_3_ptr * const f_vector_faddp[3] = {
     gen_helper_gvec_faddp_h,
diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
index 0696192148c..eef3623fd3a 100644
--- a/target/arm/tcg/translate-sve.c
+++ b/target/arm/tcg/translate-sve.c
@@ -3741,7 +3741,12 @@ static bool trans_FADDA(DisasContext *s, arg_rprr_esz *a)
         NULL, gen_helper_gvec_##name##_h,                           \
         gen_helper_gvec_##name##_s, gen_helper_gvec_##name##_d      \
     };                                                              \
-    TRANS_FEAT(NAME, aa64_sve, gen_gvec_fpst_ah_arg_zzz, name##_fns[a->esz], a, 0)
+    static gen_helper_gvec_3_ptr * const name##_ah_fns[4] = {       \
+        NULL, gen_helper_gvec_ah_##name##_h,                        \
+        gen_helper_gvec_ah_##name##_s, gen_helper_gvec_ah_##name##_d    \
+    };                                                              \
+    TRANS_FEAT(NAME, aa64_sve, gen_gvec_fpst_ah_arg_zzz,            \
+               s->fpcr_ah ? name##_ah_fns[a->esz] : name##_fns[a->esz], a, 0)
 
 DO_FP3(FADD_zzz, fadd)
 DO_FP3(FSUB_zzz, fsub)
diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
index aa42c50f9fe..bf6f6a97636 100644
--- a/target/arm/tcg/vec_helper.c
+++ b/target/arm/tcg/vec_helper.c
@@ -1498,6 +1498,14 @@ DO_3OP(gvec_rsqrts_h, helper_rsqrtsf_f16, float16)
 DO_3OP(gvec_rsqrts_s, helper_rsqrtsf_f32, float32)
 DO_3OP(gvec_rsqrts_d, helper_rsqrtsf_f64, float64)
 
+DO_3OP(gvec_ah_recps_h, helper_recpsf_ah_f16, float16)
+DO_3OP(gvec_ah_recps_s, helper_recpsf_ah_f32, float32)
+DO_3OP(gvec_ah_recps_d, helper_recpsf_ah_f64, float64)
+
+DO_3OP(gvec_ah_rsqrts_h, helper_rsqrtsf_ah_f16, float16)
+DO_3OP(gvec_ah_rsqrts_s, helper_rsqrtsf_ah_f32, float32)
+DO_3OP(gvec_ah_rsqrts_d, helper_rsqrtsf_ah_f64, float64)
+
 DO_3OP(gvec_ah_fmax_h, helper_vfp_ah_maxh, float16)
 DO_3OP(gvec_ah_fmax_s, helper_vfp_ah_maxs, float32)
 DO_3OP(gvec_ah_fmax_d, helper_vfp_ah_maxd, float64)
-- 
2.34.1




* [PATCH 63/76] target/arm: Handle FPCR.AH in negation step in FMLS (indexed)
  2025-01-24 16:27 [PATCH 00/76] target/arm: Implement FEAT_AFP and FEAT_RPRES Peter Maydell
                   ` (61 preceding siblings ...)
  2025-01-24 16:28 ` [PATCH 62/76] target/arm: Handle FPCR.AH in FRECPS and FRSQRTS vector insns Peter Maydell
@ 2025-01-24 16:28 ` Peter Maydell
  2025-01-26 13:16   ` Richard Henderson
  2025-01-24 16:28 ` [PATCH 64/76] target/arm: Handle FPCR.AH in negation in FMLS (vector) Peter Maydell
                   ` (14 subsequent siblings)
  77 siblings, 1 reply; 167+ messages in thread
From: Peter Maydell @ 2025-01-24 16:28 UTC (permalink / raw)
  To: qemu-arm, qemu-devel

Handle the FPCR.AH "don't negate the sign of a NaN" semantics in FMLS
(indexed), by passing through FPCR.AH in the SIMD data word, for the
helper to use to determine whether to negate.
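
The data word layout used here (negate flag in bit 0, a 3-bit index in
bits 1 to 3, FPCR.AH in bit 5, with bit 4 left unused) can be modelled
as below. This is an illustrative sketch only: pack_fmls_idx_data and
field are hypothetical names, and the SIMD_DATA_SHIFT offset that the
real descriptor adds is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Pack the three fields into one word, matching the bit positions
 * described above. */
static uint32_t pack_fmls_idx_data(bool neg, unsigned idx, bool fpcr_ah)
{
    assert(idx < 8);    /* the index field is 3 bits wide */
    return ((uint32_t)fpcr_ah << 5) | ((uint32_t)idx << 1) | (uint32_t)neg;
}

/* Extract 'len' bits starting at bit 'start' (len must be < 32). */
static uint32_t field(uint32_t word, int start, int len)
{
    return (word >> start) & ((1u << len) - 1);
}
```

Extracting the index with an explicit width, rather than shifting the
whole word down, is what keeps the new AH bit from leaking into it.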

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/tcg/translate-a64.c | 2 +-
 target/arm/tcg/translate-sve.c | 2 +-
 target/arm/tcg/vec_helper.c    | 9 +++++++--
 3 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index 3fe8e041093..c688275106f 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -6751,7 +6751,7 @@ static bool do_fmla_vector_idx(DisasContext *s, arg_qrrx_e *a, bool neg)
 
     gen_gvec_op4_fpst(s, a->q, a->rd, a->rn, a->rm, a->rd,
                       esz == MO_16 ? FPST_FPCR_F16_A64 : FPST_FPCR_A64,
-                      (a->idx << 1) | neg,
+                      (s->fpcr_ah << 5) | (a->idx << 1) | neg,
                       fns[esz - 1]);
     return true;
 }
diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
index eef3623fd3a..a7033fe93ab 100644
--- a/target/arm/tcg/translate-sve.c
+++ b/target/arm/tcg/translate-sve.c
@@ -3533,7 +3533,7 @@ static bool do_FMLA_zzxz(DisasContext *s, arg_rrxr_esz *a, bool sub)
         gen_helper_gvec_fmla_idx_d,
     };
     return gen_gvec_fpst_zzzz(s, fns[a->esz], a->rd, a->rn, a->rm, a->ra,
-                              (a->index << 1) | sub,
+                              (s->fpcr_ah << 5) | (a->index << 1) | sub,
                               a->esz == MO_16 ? FPST_FPCR_F16_A64 : FPST_FPCR_A64);
 }
 
diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
index bf6f6a97636..5e9663382a9 100644
--- a/target/arm/tcg/vec_helper.c
+++ b/target/arm/tcg/vec_helper.c
@@ -1708,13 +1708,18 @@ void HELPER(NAME)(void *vd, void *vn, void *vm, void *va,                  \
     intptr_t i, j, oprsz = simd_oprsz(desc);                               \
     intptr_t segment = MIN(16, oprsz) / sizeof(TYPE);                      \
     TYPE op1_neg = extract32(desc, SIMD_DATA_SHIFT, 1);                    \
-    intptr_t idx = desc >> (SIMD_DATA_SHIFT + 1);                          \
+    intptr_t idx = extract32(desc, SIMD_DATA_SHIFT + 1, 3);                \
+    bool fpcr_ah = extract32(desc, SIMD_DATA_SHIFT + 5, 1);                \
     TYPE *d = vd, *n = vn, *m = vm, *a = va;                               \
     op1_neg <<= (8 * sizeof(TYPE) - 1);                                    \
     for (i = 0; i < oprsz / sizeof(TYPE); i += segment) {                  \
         TYPE mm = m[H(i + idx)];                                           \
         for (j = 0; j < segment; j++) {                                    \
-            d[i + j] = TYPE##_muladd(n[i + j] ^ op1_neg,                   \
+            TYPE nval = n[i + j];                                          \
+            if (!(fpcr_ah && TYPE ## _is_any_nan(nval))) {                 \
+                nval ^= op1_neg;                                           \
+            }                                                              \
+            d[i + j] = TYPE##_muladd(nval,                                 \
                                      mm, a[i + j], 0, stat);               \
         }                                                                  \
     }                                                                      \
-- 
2.34.1




* [PATCH 64/76] target/arm: Handle FPCR.AH in negation in FMLS (vector)
  2025-01-24 16:27 [PATCH 00/76] target/arm: Implement FEAT_AFP and FEAT_RPRES Peter Maydell
                   ` (62 preceding siblings ...)
  2025-01-24 16:28 ` [PATCH 63/76] target/arm: Handle FPCR.AH in negation step in FMLS (indexed) Peter Maydell
@ 2025-01-24 16:28 ` Peter Maydell
  2025-01-26 13:17   ` Richard Henderson
  2025-01-24 16:28 ` [PATCH 65/76] target/arm: Handle FPCR.AH in negation step in SVE " Peter Maydell
                   ` (13 subsequent siblings)
  77 siblings, 1 reply; 167+ messages in thread
From: Peter Maydell @ 2025-01-24 16:28 UTC (permalink / raw)
  To: qemu-arm, qemu-devel

Handle the FPCR.AH "don't negate the sign of a NaN" semantics
in FMLS (vector), by implementing a new set of helpers for
the AH=1 case.
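
For intuition, the AH=1 multiply-subtract can be approximated with
native C floats as in the sketch below. It is deliberately simplified:
it uses an unfused multiply and add on the host's float type rather
than a softfloat fused multiply-add, and neg_unless_nan and
mulsub_sketch are hypothetical names.

```c
#include <math.h>
#include <stdbool.h>

/* Flip the sign of x, except that a NaN is returned unchanged
 * when AH semantics are requested. */
static float neg_unless_nan(float x, bool fpcr_ah)
{
    if (fpcr_ah && isnan(x)) {
        return x;
    }
    return -x;
}

/* dest + (-op1 * op2), with the AH rule applied to the negation.
 * Simplification: this is not a fused operation. */
static float mulsub_sketch(float dest, float op1, float op2, bool fpcr_ah)
{
    return neg_unless_nan(op1, fpcr_ah) * op2 + dest;
}
```

For ordinary inputs both modes agree, for example dest = 1, op1 = 2,
op2 = 3 gives 1 - 6 in either case; the two only differ in the sign
seen by the multiply when op1 is a NaN.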

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/helper.h            |  4 ++++
 target/arm/tcg/translate-a64.c |  7 ++++++-
 target/arm/tcg/vec_helper.c    | 25 +++++++++++++++++++++++++
 3 files changed, 35 insertions(+), 1 deletion(-)

diff --git a/target/arm/helper.h b/target/arm/helper.h
index 43505d5fedc..0a8b4c946e1 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -782,6 +782,10 @@ DEF_HELPER_FLAGS_5(gvec_vfms_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32)
 DEF_HELPER_FLAGS_5(gvec_vfms_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32)
 DEF_HELPER_FLAGS_5(gvec_vfms_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32)
 
+DEF_HELPER_FLAGS_5(gvec_ah_vfms_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32)
+DEF_HELPER_FLAGS_5(gvec_ah_vfms_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32)
+DEF_HELPER_FLAGS_5(gvec_ah_vfms_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32)
+
 DEF_HELPER_FLAGS_5(gvec_ftsmul_h, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, fpst, i32)
 DEF_HELPER_FLAGS_5(gvec_ftsmul_s, TCG_CALL_NO_RWG,
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index c688275106f..0b57e35f999 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -5871,7 +5871,12 @@ static gen_helper_gvec_3_ptr * const f_vector_fmls[3] = {
     gen_helper_gvec_vfms_s,
     gen_helper_gvec_vfms_d,
 };
-TRANS(FMLS_v, do_fp3_vector, a, 0, f_vector_fmls)
+static gen_helper_gvec_3_ptr * const f_vector_fmls_ah[3] = {
+    gen_helper_gvec_ah_vfms_h,
+    gen_helper_gvec_ah_vfms_s,
+    gen_helper_gvec_ah_vfms_d,
+};
+TRANS(FMLS_v, do_fp3_vector_2fn, a, 0, f_vector_fmls, f_vector_fmls_ah)
 
 static gen_helper_gvec_3_ptr * const f_vector_fcmeq[3] = {
     gen_helper_gvec_fceq_h,
diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
index 5e9663382a9..c720b435d58 100644
--- a/target/arm/tcg/vec_helper.c
+++ b/target/arm/tcg/vec_helper.c
@@ -1579,6 +1579,27 @@ static float64 float64_mulsub_f(float64 dest, float64 op1, float64 op2,
     return float64_muladd(float64_chs(op1), op2, dest, 0, stat);
 }
 
+static float16 float16_ah_mulsub_f(float16 dest, float16 op1, float16 op2,
+                                 float_status *stat)
+{
+    op1 = float16_is_any_nan(op1) ? op1 : float16_chs(op1);
+    return float16_muladd(op1, op2, dest, 0, stat);
+}
+
+static float32 float32_ah_mulsub_f(float32 dest, float32 op1, float32 op2,
+                                 float_status *stat)
+{
+    op1 = float32_is_any_nan(op1) ? op1 : float32_chs(op1);
+    return float32_muladd(op1, op2, dest, 0, stat);
+}
+
+static float64 float64_ah_mulsub_f(float64 dest, float64 op1, float64 op2,
+                                 float_status *stat)
+{
+    op1 = float64_is_any_nan(op1) ? op1 : float64_chs(op1);
+    return float64_muladd(op1, op2, dest, 0, stat);
+}
+
 #define DO_MULADD(NAME, FUNC, TYPE)                                        \
 void HELPER(NAME)(void *vd, void *vn, void *vm,                            \
                   float_status *stat, uint32_t desc)                       \
@@ -1605,6 +1626,10 @@ DO_MULADD(gvec_vfms_h, float16_mulsub_f, float16)
 DO_MULADD(gvec_vfms_s, float32_mulsub_f, float32)
 DO_MULADD(gvec_vfms_d, float64_mulsub_f, float64)
 
+DO_MULADD(gvec_ah_vfms_h, float16_ah_mulsub_f, float16)
+DO_MULADD(gvec_ah_vfms_s, float32_ah_mulsub_f, float32)
+DO_MULADD(gvec_ah_vfms_d, float64_ah_mulsub_f, float64)
+
 /* For the indexed ops, SVE applies the index per 128-bit vector segment.
  * For AdvSIMD, there is of course only one such vector segment.
  */
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 167+ messages in thread

* [PATCH 65/76] target/arm: Handle FPCR.AH in negation step in SVE FMLS (vector)
  2025-01-24 16:27 [PATCH 00/76] target/arm: Implement FEAT_AFP and FEAT_RPRES Peter Maydell
                   ` (63 preceding siblings ...)
  2025-01-24 16:28 ` [PATCH 64/76] target/arm: Handle FPCR.AH in negation in FMLS (vector) Peter Maydell
@ 2025-01-24 16:28 ` Peter Maydell
  2025-01-26 13:19   ` Richard Henderson
  2025-01-27 20:41   ` Richard Henderson
  2025-01-24 16:28 ` [PATCH 66/76] target/arm: Handle FPCR.AH in SVE FTSSEL Peter Maydell
                   ` (12 subsequent siblings)
  77 siblings, 2 replies; 167+ messages in thread
From: Peter Maydell @ 2025-01-24 16:28 UTC (permalink / raw)
  To: qemu-arm, qemu-devel

Handle the FPCR.AH "don't negate the sign of a NaN" semantics for the
SVE FMLS (vector) insns, by providing new helpers for the AH=1 case
which end up passing fpcr_ah = true to the do_fmla_zpzzz_* functions
that do the work.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
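The neg1/neg3 sign-mask scheme described above can be sketched for a
single half-precision operand as follows. This is only an
illustration under stated assumptions: f16_bits_is_nan() and
f16_apply_neg() are hypothetical names, and the values are treated as
raw bit patterns rather than QEMU's float16 type.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for float16_is_any_nan(). */
static bool f16_bits_is_nan(uint16_t bits)
{
    return (bits & 0x7c00u) == 0x7c00u && (bits & 0x03ffu) != 0;
}

/*
 * Apply a sign mask (0 or 0x8000) to one operand.
 * Masks used by the four instructions, as (neg1, neg3):
 *   FMLA  (0, 0)            FMLS  (0x8000, 0)
 *   FNMLA (0x8000, 0x8000)  FNMLS (0, 0x8000)
 * With fpcr_ah set, a NaN operand is passed through unchanged.
 */
static uint16_t f16_apply_neg(uint16_t e, uint16_t neg, bool fpcr_ah)
{
    assert(neg == 0 || neg == 0x8000u);
    if (neg && !(fpcr_ah && f16_bits_is_nan(e))) {
        e ^= neg;
    }
    return e;
}
```

Testing the mask first keeps the NaN check off the path for operands
that are never negated, such as both inputs of FMLA.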
 target/arm/tcg/helper-sve.h    |  21 ++++++
 target/arm/tcg/sve_helper.c    | 114 +++++++++++++++++++++++++++------
 target/arm/tcg/translate-sve.c |  18 ++++--
 3 files changed, 126 insertions(+), 27 deletions(-)

diff --git a/target/arm/tcg/helper-sve.h b/target/arm/tcg/helper-sve.h
index a2e96a498dd..0b1b5887834 100644
--- a/target/arm/tcg/helper-sve.h
+++ b/target/arm/tcg/helper-sve.h
@@ -1475,6 +1475,27 @@ DEF_HELPER_FLAGS_7(sve_fnmls_zpzzz_s, TCG_CALL_NO_RWG,
 DEF_HELPER_FLAGS_7(sve_fnmls_zpzzz_d, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, ptr, fpst, i32)
 
+DEF_HELPER_FLAGS_7(sve_ah_fmls_zpzzz_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, fpst, i32)
+DEF_HELPER_FLAGS_7(sve_ah_fmls_zpzzz_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, fpst, i32)
+DEF_HELPER_FLAGS_7(sve_ah_fmls_zpzzz_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, fpst, i32)
+
+DEF_HELPER_FLAGS_7(sve_ah_fnmla_zpzzz_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, fpst, i32)
+DEF_HELPER_FLAGS_7(sve_ah_fnmla_zpzzz_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, fpst, i32)
+DEF_HELPER_FLAGS_7(sve_ah_fnmla_zpzzz_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, fpst, i32)
+
+DEF_HELPER_FLAGS_7(sve_ah_fnmls_zpzzz_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, fpst, i32)
+DEF_HELPER_FLAGS_7(sve_ah_fnmls_zpzzz_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, fpst, i32)
+DEF_HELPER_FLAGS_7(sve_ah_fnmls_zpzzz_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, fpst, i32)
+
 DEF_HELPER_FLAGS_7(sve_fcmla_zpzzz_h, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, ptr, fpst, i32)
 DEF_HELPER_FLAGS_7(sve_fcmla_zpzzz_s, TCG_CALL_NO_RWG,
diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c
index dc5a35b46ef..90bcf680fa4 100644
--- a/target/arm/tcg/sve_helper.c
+++ b/target/arm/tcg/sve_helper.c
@@ -4802,7 +4802,7 @@ DO_ZPZ_FP(flogb_d, float64, H1_8, do_float64_logb_as_int)
 
 static void do_fmla_zpzzz_h(void *vd, void *vn, void *vm, void *va, void *vg,
                             float_status *status, uint32_t desc,
-                            uint16_t neg1, uint16_t neg3)
+                            uint16_t neg1, uint16_t neg3, bool fpcr_ah)
 {
     intptr_t i = simd_oprsz(desc);
     uint64_t *g = vg;
@@ -4814,9 +4814,15 @@ static void do_fmla_zpzzz_h(void *vd, void *vn, void *vm, void *va, void *vg,
             if (likely((pg >> (i & 63)) & 1)) {
                 float16 e1, e2, e3, r;
 
-                e1 = *(uint16_t *)(vn + H1_2(i)) ^ neg1;
+                e1 = *(uint16_t *)(vn + H1_2(i));
                 e2 = *(uint16_t *)(vm + H1_2(i));
-                e3 = *(uint16_t *)(va + H1_2(i)) ^ neg3;
+                e3 = *(uint16_t *)(va + H1_2(i));
+                if (neg1 && !(fpcr_ah && float16_is_any_nan(e1))) {
+                    e1 ^= neg1;
+                }
+                if (neg3 && !(fpcr_ah && float16_is_any_nan(e3))) {
+                    e3 ^= neg3;
+                }
                 r = float16_muladd(e1, e2, e3, 0, status);
                 *(uint16_t *)(vd + H1_2(i)) = r;
             }
@@ -4827,30 +4833,48 @@ static void do_fmla_zpzzz_h(void *vd, void *vn, void *vm, void *va, void *vg,
 void HELPER(sve_fmla_zpzzz_h)(void *vd, void *vn, void *vm, void *va,
                               void *vg, float_status *status, uint32_t desc)
 {
-    do_fmla_zpzzz_h(vd, vn, vm, va, vg, status, desc, 0, 0);
+    do_fmla_zpzzz_h(vd, vn, vm, va, vg, status, desc, 0, 0, false);
 }
 
 void HELPER(sve_fmls_zpzzz_h)(void *vd, void *vn, void *vm, void *va,
                               void *vg, float_status *status, uint32_t desc)
 {
-    do_fmla_zpzzz_h(vd, vn, vm, va, vg, status, desc, 0x8000, 0);
+    do_fmla_zpzzz_h(vd, vn, vm, va, vg, status, desc, 0x8000, 0, false);
 }
 
 void HELPER(sve_fnmla_zpzzz_h)(void *vd, void *vn, void *vm, void *va,
                                void *vg, float_status *status, uint32_t desc)
 {
-    do_fmla_zpzzz_h(vd, vn, vm, va, vg, status, desc, 0x8000, 0x8000);
+    do_fmla_zpzzz_h(vd, vn, vm, va, vg, status, desc, 0x8000, 0x8000, false);
 }
 
 void HELPER(sve_fnmls_zpzzz_h)(void *vd, void *vn, void *vm, void *va,
                                void *vg, float_status *status, uint32_t desc)
 {
-    do_fmla_zpzzz_h(vd, vn, vm, va, vg, status, desc, 0, 0x8000);
+    do_fmla_zpzzz_h(vd, vn, vm, va, vg, status, desc, 0, 0x8000, false);
+}
+
+void HELPER(sve_ah_fmls_zpzzz_h)(void *vd, void *vn, void *vm, void *va,
+                              void *vg, float_status *status, uint32_t desc)
+{
+    do_fmla_zpzzz_h(vd, vn, vm, va, vg, status, desc, 0x8000, 0, true);
+}
+
+void HELPER(sve_ah_fnmla_zpzzz_h)(void *vd, void *vn, void *vm, void *va,
+                               void *vg, float_status *status, uint32_t desc)
+{
+    do_fmla_zpzzz_h(vd, vn, vm, va, vg, status, desc, 0x8000, 0x8000, true);
+}
+
+void HELPER(sve_ah_fnmls_zpzzz_h)(void *vd, void *vn, void *vm, void *va,
+                               void *vg, float_status *status, uint32_t desc)
+{
+    do_fmla_zpzzz_h(vd, vn, vm, va, vg, status, desc, 0, 0x8000, true);
 }
 
 static void do_fmla_zpzzz_s(void *vd, void *vn, void *vm, void *va, void *vg,
                             float_status *status, uint32_t desc,
-                            uint32_t neg1, uint32_t neg3)
+                            uint32_t neg1, uint32_t neg3, bool fpcr_ah)
 {
     intptr_t i = simd_oprsz(desc);
     uint64_t *g = vg;
@@ -4862,9 +4886,15 @@ static void do_fmla_zpzzz_s(void *vd, void *vn, void *vm, void *va, void *vg,
             if (likely((pg >> (i & 63)) & 1)) {
                 float32 e1, e2, e3, r;
 
-                e1 = *(uint32_t *)(vn + H1_4(i)) ^ neg1;
+                e1 = *(uint32_t *)(vn + H1_4(i));
                 e2 = *(uint32_t *)(vm + H1_4(i));
-                e3 = *(uint32_t *)(va + H1_4(i)) ^ neg3;
+                e3 = *(uint32_t *)(va + H1_4(i));
+                if (neg1 && !(fpcr_ah && float32_is_any_nan(e1))) {
+                    e1 ^= neg1;
+                }
+                if (neg3 && !(fpcr_ah && float32_is_any_nan(e3))) {
+                    e3 ^= neg3;
+                }
                 r = float32_muladd(e1, e2, e3, 0, status);
                 *(uint32_t *)(vd + H1_4(i)) = r;
             }
@@ -4875,30 +4905,48 @@ static void do_fmla_zpzzz_s(void *vd, void *vn, void *vm, void *va, void *vg,
 void HELPER(sve_fmla_zpzzz_s)(void *vd, void *vn, void *vm, void *va,
                               void *vg, float_status *status, uint32_t desc)
 {
-    do_fmla_zpzzz_s(vd, vn, vm, va, vg, status, desc, 0, 0);
+    do_fmla_zpzzz_s(vd, vn, vm, va, vg, status, desc, 0, 0, false);
 }
 
 void HELPER(sve_fmls_zpzzz_s)(void *vd, void *vn, void *vm, void *va,
                               void *vg, float_status *status, uint32_t desc)
 {
-    do_fmla_zpzzz_s(vd, vn, vm, va, vg, status, desc, 0x80000000, 0);
+    do_fmla_zpzzz_s(vd, vn, vm, va, vg, status, desc, 0x80000000, 0, false);
 }
 
 void HELPER(sve_fnmla_zpzzz_s)(void *vd, void *vn, void *vm, void *va,
                                void *vg, float_status *status, uint32_t desc)
 {
-    do_fmla_zpzzz_s(vd, vn, vm, va, vg, status, desc, 0x80000000, 0x80000000);
+    do_fmla_zpzzz_s(vd, vn, vm, va, vg, status, desc, 0x80000000, 0x80000000, false);
 }
 
 void HELPER(sve_fnmls_zpzzz_s)(void *vd, void *vn, void *vm, void *va,
                                void *vg, float_status *status, uint32_t desc)
 {
-    do_fmla_zpzzz_s(vd, vn, vm, va, vg, status, desc, 0, 0x80000000);
+    do_fmla_zpzzz_s(vd, vn, vm, va, vg, status, desc, 0, 0x80000000, false);
+}
+
+void HELPER(sve_ah_fmls_zpzzz_s)(void *vd, void *vn, void *vm, void *va,
+                              void *vg, float_status *status, uint32_t desc)
+{
+    do_fmla_zpzzz_s(vd, vn, vm, va, vg, status, desc, 0x80000000, 0, true);
+}
+
+void HELPER(sve_ah_fnmla_zpzzz_s)(void *vd, void *vn, void *vm, void *va,
+                               void *vg, float_status *status, uint32_t desc)
+{
+    do_fmla_zpzzz_s(vd, vn, vm, va, vg, status, desc, 0x80000000, 0x80000000, true);
+}
+
+void HELPER(sve_ah_fnmls_zpzzz_s)(void *vd, void *vn, void *vm, void *va,
+                               void *vg, float_status *status, uint32_t desc)
+{
+    do_fmla_zpzzz_s(vd, vn, vm, va, vg, status, desc, 0, 0x80000000, true);
 }
 
 static void do_fmla_zpzzz_d(void *vd, void *vn, void *vm, void *va, void *vg,
                             float_status *status, uint32_t desc,
-                            uint64_t neg1, uint64_t neg3)
+                            uint64_t neg1, uint64_t neg3, bool fpcr_ah)
 {
     intptr_t i = simd_oprsz(desc);
     uint64_t *g = vg;
@@ -4910,9 +4958,15 @@ static void do_fmla_zpzzz_d(void *vd, void *vn, void *vm, void *va, void *vg,
             if (likely((pg >> (i & 63)) & 1)) {
                 float64 e1, e2, e3, r;
 
-                e1 = *(uint64_t *)(vn + i) ^ neg1;
+                e1 = *(uint64_t *)(vn + i);
                 e2 = *(uint64_t *)(vm + i);
-                e3 = *(uint64_t *)(va + i) ^ neg3;
+                e3 = *(uint64_t *)(va + i);
+                if (neg1 && !(fpcr_ah && float64_is_any_nan(e1))) {
+                    e1 ^= neg1;
+                }
+                if (neg3 && !(fpcr_ah && float64_is_any_nan(e3))) {
+                    e3 ^= neg3;
+                }
                 r = float64_muladd(e1, e2, e3, 0, status);
                 *(uint64_t *)(vd + i) = r;
             }
@@ -4923,25 +4977,43 @@ static void do_fmla_zpzzz_d(void *vd, void *vn, void *vm, void *va, void *vg,
 void HELPER(sve_fmla_zpzzz_d)(void *vd, void *vn, void *vm, void *va,
                               void *vg, float_status *status, uint32_t desc)
 {
-    do_fmla_zpzzz_d(vd, vn, vm, va, vg, status, desc, 0, 0);
+    do_fmla_zpzzz_d(vd, vn, vm, va, vg, status, desc, 0, 0, false);
 }
 
 void HELPER(sve_fmls_zpzzz_d)(void *vd, void *vn, void *vm, void *va,
                               void *vg, float_status *status, uint32_t desc)
 {
-    do_fmla_zpzzz_d(vd, vn, vm, va, vg, status, desc, INT64_MIN, 0);
+    do_fmla_zpzzz_d(vd, vn, vm, va, vg, status, desc, INT64_MIN, 0, false);
 }
 
 void HELPER(sve_fnmla_zpzzz_d)(void *vd, void *vn, void *vm, void *va,
                                void *vg, float_status *status, uint32_t desc)
 {
-    do_fmla_zpzzz_d(vd, vn, vm, va, vg, status, desc, INT64_MIN, INT64_MIN);
+    do_fmla_zpzzz_d(vd, vn, vm, va, vg, status, desc, INT64_MIN, INT64_MIN, false);
 }
 
 void HELPER(sve_fnmls_zpzzz_d)(void *vd, void *vn, void *vm, void *va,
                                void *vg, float_status *status, uint32_t desc)
 {
-    do_fmla_zpzzz_d(vd, vn, vm, va, vg, status, desc, 0, INT64_MIN);
+    do_fmla_zpzzz_d(vd, vn, vm, va, vg, status, desc, 0, INT64_MIN, false);
+}
+
+void HELPER(sve_ah_fmls_zpzzz_d)(void *vd, void *vn, void *vm, void *va,
+                              void *vg, float_status *status, uint32_t desc)
+{
+    do_fmla_zpzzz_d(vd, vn, vm, va, vg, status, desc, INT64_MIN, 0, true);
+}
+
+void HELPER(sve_ah_fnmla_zpzzz_d)(void *vd, void *vn, void *vm, void *va,
+                               void *vg, float_status *status, uint32_t desc)
+{
+    do_fmla_zpzzz_d(vd, vn, vm, va, vg, status, desc, INT64_MIN, INT64_MIN, true);
+}
+
+void HELPER(sve_ah_fnmls_zpzzz_d)(void *vd, void *vn, void *vm, void *va,
+                               void *vg, float_status *status, uint32_t desc)
+{
+    do_fmla_zpzzz_d(vd, vn, vm, va, vg, status, desc, 0, INT64_MIN, true);
 }
 
 /* Two operand floating-point comparison controlled by a predicate.
diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
index a7033fe93ab..663634e3a39 100644
--- a/target/arm/tcg/translate-sve.c
+++ b/target/arm/tcg/translate-sve.c
@@ -3924,19 +3924,25 @@ TRANS_FEAT(FCADD, aa64_sve, gen_gvec_fpst_zzzp, fcadd_fns[a->esz],
            a->rd, a->rn, a->rm, a->pg, a->rot | (s->fpcr_ah << 1),
            a->esz == MO_16 ? FPST_FPCR_F16_A64 : FPST_FPCR_A64)
 
-#define DO_FMLA(NAME, name) \
+#define DO_FMLA(NAME, name, ah_name)                                    \
     static gen_helper_gvec_5_ptr * const name##_fns[4] = {              \
         NULL, gen_helper_sve_##name##_h,                                \
         gen_helper_sve_##name##_s, gen_helper_sve_##name##_d            \
     };                                                                  \
-    TRANS_FEAT(NAME, aa64_sve, gen_gvec_fpst_zzzzp, name##_fns[a->esz], \
+    static gen_helper_gvec_5_ptr * const name##_ah_fns[4] = {           \
+        NULL, gen_helper_sve_##ah_name##_h,                             \
+        gen_helper_sve_##ah_name##_s, gen_helper_sve_##ah_name##_d      \
+    };                                                                  \
+    TRANS_FEAT(NAME, aa64_sve, gen_gvec_fpst_zzzzp,                     \
+               s->fpcr_ah ? name##_ah_fns[a->esz] : name##_fns[a->esz], \
                a->rd, a->rn, a->rm, a->ra, a->pg, 0,                    \
                a->esz == MO_16 ? FPST_FPCR_F16_A64 : FPST_FPCR_A64)
 
-DO_FMLA(FMLA_zpzzz, fmla_zpzzz)
-DO_FMLA(FMLS_zpzzz, fmls_zpzzz)
-DO_FMLA(FNMLA_zpzzz, fnmla_zpzzz)
-DO_FMLA(FNMLS_zpzzz, fnmls_zpzzz)
+/* We don't need an ah_fmla_zpzzz because fmla doesn't negate anything */
+DO_FMLA(FMLA_zpzzz, fmla_zpzzz, fmla_zpzzz)
+DO_FMLA(FMLS_zpzzz, fmls_zpzzz, ah_fmls_zpzzz)
+DO_FMLA(FNMLA_zpzzz, fnmla_zpzzz, ah_fnmla_zpzzz)
+DO_FMLA(FNMLS_zpzzz, fnmls_zpzzz, ah_fnmls_zpzzz)
 
 #undef DO_FMLA
 
-- 
2.34.1




* [PATCH 66/76] target/arm: Handle FPCR.AH in SVE FTSSEL
  2025-01-24 16:27 [PATCH 00/76] target/arm: Implement FEAT_AFP and FEAT_RPRES Peter Maydell
                   ` (64 preceding siblings ...)
  2025-01-24 16:28 ` [PATCH 65/76] target/arm: Handle FPCR.AH in negation step in SVE " Peter Maydell
@ 2025-01-24 16:28 ` Peter Maydell
  2025-01-26 13:20   ` Richard Henderson
  2025-01-24 16:28 ` [PATCH 67/76] target/arm: Handle FPCR.AH in SVE FTMAD Peter Maydell
                   ` (11 subsequent siblings)
  77 siblings, 1 reply; 167+ messages in thread
From: Peter Maydell @ 2025-01-24 16:28 UTC (permalink / raw)
  To: qemu-arm, qemu-devel

The negation step in the SVE FTSSEL insn mustn't negate a NaN when
FPCR.AH is set.  Pass FPCR.AH to the helper via the SIMD data field
and use that to determine whether to do the negation.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
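A minimal sketch of the per-element behaviour described above, for
the half-precision case. The function and macro names are
hypothetical and the operands are raw bit patterns; in the real
helper the AH flag arrives through the SIMD data field of the
descriptor rather than as a plain argument.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define F16_ONE 0x3c00u  /* bit pattern of 1.0 in IEEE half precision */

/* Hypothetical stand-in for float16_is_any_nan(). */
static bool f16_bits_is_nan(uint16_t bits)
{
    return (bits & 0x7c00u) == 0x7c00u && (bits & 0x03ffu) != 0;
}

/*
 * FTSSEL on 'count' half-precision elements: bit 0 of each m[i]
 * selects 1.0 instead of n[i], bit 1 requests negation.  With
 * fpcr_ah set, a NaN is not negated.
 */
static void ftssel_h(uint16_t *d, const uint16_t *n, const uint16_t *m,
                     size_t count, bool fpcr_ah)
{
    assert(d != NULL && n != NULL && m != NULL);
    for (size_t i = 0; i < count; i++) {
        uint16_t nn = n[i];
        uint16_t mm = m[i];
        if (mm & 1) {
            nn = F16_ONE;
        }
        if ((mm & 2) && !(fpcr_ah && f16_bits_is_nan(nn))) {
            nn ^= 0x8000u;
        }
        d[i] = nn;
    }
}
```

The branch-free form used before this patch, nn ^ (mm & 2) << 14,
cannot express the NaN exception, which is why the helpers switch to
an explicit condition.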
 target/arm/tcg/sve_helper.c    | 18 +++++++++++++++---
 target/arm/tcg/translate-sve.c |  4 ++--
 2 files changed, 17 insertions(+), 5 deletions(-)

diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c
index 90bcf680fa4..a39a3ed0cf9 100644
--- a/target/arm/tcg/sve_helper.c
+++ b/target/arm/tcg/sve_helper.c
@@ -2555,6 +2555,7 @@ void HELPER(sve_fexpa_d)(void *vd, void *vn, uint32_t desc)
 void HELPER(sve_ftssel_h)(void *vd, void *vn, void *vm, uint32_t desc)
 {
     intptr_t i, opr_sz = simd_oprsz(desc) / 2;
+    bool fpcr_ah = extract32(desc, SIMD_DATA_SHIFT, 1);
     uint16_t *d = vd, *n = vn, *m = vm;
     for (i = 0; i < opr_sz; i += 1) {
         uint16_t nn = n[i];
@@ -2562,13 +2563,17 @@ void HELPER(sve_ftssel_h)(void *vd, void *vn, void *vm, uint32_t desc)
         if (mm & 1) {
             nn = float16_one;
         }
-        d[i] = nn ^ (mm & 2) << 14;
+        if ((mm & 2) && !(fpcr_ah && float16_is_any_nan(nn))) {
+            nn ^= (1 << 15);
+        }
+        d[i] = nn;
     }
 }
 
 void HELPER(sve_ftssel_s)(void *vd, void *vn, void *vm, uint32_t desc)
 {
     intptr_t i, opr_sz = simd_oprsz(desc) / 4;
+    bool fpcr_ah = extract32(desc, SIMD_DATA_SHIFT, 1);
     uint32_t *d = vd, *n = vn, *m = vm;
     for (i = 0; i < opr_sz; i += 1) {
         uint32_t nn = n[i];
@@ -2576,13 +2581,17 @@ void HELPER(sve_ftssel_s)(void *vd, void *vn, void *vm, uint32_t desc)
         if (mm & 1) {
             nn = float32_one;
         }
-        d[i] = nn ^ (mm & 2) << 30;
+        if ((mm & 2) && !(fpcr_ah && float32_is_any_nan(nn))) {
+            nn ^= (1U << 31);
+        }
+        d[i] = nn;
     }
 }
 
 void HELPER(sve_ftssel_d)(void *vd, void *vn, void *vm, uint32_t desc)
 {
     intptr_t i, opr_sz = simd_oprsz(desc) / 8;
+    bool fpcr_ah = extract32(desc, SIMD_DATA_SHIFT, 1);
     uint64_t *d = vd, *n = vn, *m = vm;
     for (i = 0; i < opr_sz; i += 1) {
         uint64_t nn = n[i];
@@ -2590,7 +2599,10 @@ void HELPER(sve_ftssel_d)(void *vd, void *vn, void *vm, uint32_t desc)
         if (mm & 1) {
             nn = float64_one;
         }
-        d[i] = nn ^ (mm & 2) << 62;
+        if ((mm & 2) && !(fpcr_ah && float64_is_any_nan(nn))) {
+            nn ^= (1ULL << 63);
+        }
+        d[i] = nn;
     }
 }
 
diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
index 663634e3a39..2d70b0faad2 100644
--- a/target/arm/tcg/translate-sve.c
+++ b/target/arm/tcg/translate-sve.c
@@ -1238,14 +1238,14 @@ static gen_helper_gvec_2 * const fexpa_fns[4] = {
     gen_helper_sve_fexpa_s, gen_helper_sve_fexpa_d,
 };
 TRANS_FEAT_NONSTREAMING(FEXPA, aa64_sve, gen_gvec_ool_zz,
-                        fexpa_fns[a->esz], a->rd, a->rn, 0)
+                        fexpa_fns[a->esz], a->rd, a->rn, s->fpcr_ah)
 
 static gen_helper_gvec_3 * const ftssel_fns[4] = {
     NULL,                    gen_helper_sve_ftssel_h,
     gen_helper_sve_ftssel_s, gen_helper_sve_ftssel_d,
 };
 TRANS_FEAT_NONSTREAMING(FTSSEL, aa64_sve, gen_gvec_ool_arg_zzz,
-                        ftssel_fns[a->esz], a, 0)
+                        ftssel_fns[a->esz], a, s->fpcr_ah)
 
 /*
  *** SVE Predicate Logical Operations Group
-- 
2.34.1




* [PATCH 67/76] target/arm: Handle FPCR.AH in SVE FTMAD
  2025-01-24 16:27 [PATCH 00/76] target/arm: Implement FEAT_AFP and FEAT_RPRES Peter Maydell
                   ` (65 preceding siblings ...)
  2025-01-24 16:28 ` [PATCH 66/76] target/arm: Handle FPCR.AH in SVE FTSSEL Peter Maydell
@ 2025-01-24 16:28 ` Peter Maydell
  2025-01-26 13:21   ` Richard Henderson
  2025-01-24 16:28 ` [PATCH 68/76] target/arm: Enable FEAT_AFP for '-cpu max' Peter Maydell
                   ` (10 subsequent siblings)
  77 siblings, 1 reply; 167+ messages in thread
From: Peter Maydell @ 2025-01-24 16:28 UTC (permalink / raw)
  To: qemu-arm, qemu-devel

The negation step in the SVE FTMAD insn mustn't negate a NaN when
FPCR.AH is set.  Pass FPCR.AH to the helper via the SIMD data field
and use that to determine whether to do the negation.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
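The packing of FPCR.AH next to the 3-bit immediate can be sketched as
below. This models only the data field itself, not the full
descriptor word: extract_bits() is a hypothetical stand-in for
QEMU's extract32(), ftmad_pack_data() is an illustrative name, and
the SIMD_DATA_SHIFT offset that the real helpers add is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for extract32(): return 'length' bits of
 * 'value' starting at bit 'start'.
 */
static uint32_t extract_bits(uint32_t value, unsigned start, unsigned length)
{
    assert(length > 0 && length < 32 && start + length <= 32);
    return (value >> start) & ((1u << length) - 1);
}

/*
 * Translator side: the FTMAD immediate occupies bits [2:0] and
 * FPCR.AH is placed in bit 3 of the data field.
 */
static uint32_t ftmad_pack_data(uint32_t imm, bool fpcr_ah)
{
    assert(imm < 8);
    return imm | ((uint32_t)fpcr_ah << 3);
}
```

On the helper side the two values are recovered with
extract_bits(data, 0, 3) and extract_bits(data, 3, 1). Reading the
whole field as the immediate, as the old simd_data() call did, would
now pick up the AH bit as well.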
 target/arm/tcg/sve_helper.c    | 21 +++++++++++++++------
 target/arm/tcg/translate-sve.c |  3 ++-
 2 files changed, 17 insertions(+), 7 deletions(-)

diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c
index a39a3ed0cf9..3f38e078291 100644
--- a/target/arm/tcg/sve_helper.c
+++ b/target/arm/tcg/sve_helper.c
@@ -5143,13 +5143,16 @@ void HELPER(sve_ftmad_h)(void *vd, void *vn, void *vm,
         0x3c00, 0xb800, 0x293a, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000,
     };
     intptr_t i, opr_sz = simd_oprsz(desc) / sizeof(float16);
-    intptr_t x = simd_data(desc);
+    intptr_t x = extract32(desc, SIMD_DATA_SHIFT, 3);
+    bool fpcr_ah = extract32(desc, SIMD_DATA_SHIFT + 3, 1);
     float16 *d = vd, *n = vn, *m = vm;
     for (i = 0; i < opr_sz; i++) {
         float16 mm = m[i];
         intptr_t xx = x;
         if (float16_is_neg(mm)) {
-            mm = float16_abs(mm);
+            if (!(fpcr_ah && float16_is_any_nan(mm))) {
+                mm = float16_abs(mm);
+            }
             xx += 8;
         }
         d[i] = float16_muladd(n[i], mm, coeff[xx], 0, s);
@@ -5166,13 +5169,16 @@ void HELPER(sve_ftmad_s)(void *vd, void *vn, void *vm,
         0x37cd37cc, 0x00000000, 0x00000000, 0x00000000,
     };
     intptr_t i, opr_sz = simd_oprsz(desc) / sizeof(float32);
-    intptr_t x = simd_data(desc);
+    intptr_t x = extract32(desc, SIMD_DATA_SHIFT, 3);
+    bool fpcr_ah = extract32(desc, SIMD_DATA_SHIFT + 3, 1);
     float32 *d = vd, *n = vn, *m = vm;
     for (i = 0; i < opr_sz; i++) {
         float32 mm = m[i];
         intptr_t xx = x;
         if (float32_is_neg(mm)) {
-            mm = float32_abs(mm);
+            if (!(fpcr_ah && float32_is_any_nan(mm))) {
+                mm = float32_abs(mm);
+            }
             xx += 8;
         }
         d[i] = float32_muladd(n[i], mm, coeff[xx], 0, s);
@@ -5193,13 +5199,16 @@ void HELPER(sve_ftmad_d)(void *vd, void *vn, void *vm,
         0x3e21ee96d2641b13ull, 0xbda8f76380fbb401ull,
     };
     intptr_t i, opr_sz = simd_oprsz(desc) / sizeof(float64);
-    intptr_t x = simd_data(desc);
+    intptr_t x = extract32(desc, SIMD_DATA_SHIFT, 3);
+    bool fpcr_ah = extract32(desc, SIMD_DATA_SHIFT + 3, 1);
     float64 *d = vd, *n = vn, *m = vm;
     for (i = 0; i < opr_sz; i++) {
         float64 mm = m[i];
         intptr_t xx = x;
         if (float64_is_neg(mm)) {
-            mm = float64_abs(mm);
+            if (!(fpcr_ah && float64_is_any_nan(mm))) {
+                mm = float64_abs(mm);
+            }
             xx += 8;
         }
         d[i] = float64_muladd(n[i], mm, coeff[xx], 0, s);
diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
index 2d70b0faad2..26bdda8f96e 100644
--- a/target/arm/tcg/translate-sve.c
+++ b/target/arm/tcg/translate-sve.c
@@ -3682,7 +3682,8 @@ static gen_helper_gvec_3_ptr * const ftmad_fns[4] = {
     gen_helper_sve_ftmad_s, gen_helper_sve_ftmad_d,
 };
 TRANS_FEAT_NONSTREAMING(FTMAD, aa64_sve, gen_gvec_fpst_zzz,
-                        ftmad_fns[a->esz], a->rd, a->rn, a->rm, a->imm,
+                        ftmad_fns[a->esz], a->rd, a->rn, a->rm,
+                        a->imm | (s->fpcr_ah << 3),
                         a->esz == MO_16 ? FPST_FPCR_F16_A64 : FPST_FPCR_A64)
 
 /*
-- 
2.34.1




* [PATCH 68/76] target/arm: Enable FEAT_AFP for '-cpu max'
  2025-01-24 16:27 [PATCH 00/76] target/arm: Implement FEAT_AFP and FEAT_RPRES Peter Maydell
                   ` (66 preceding siblings ...)
  2025-01-24 16:28 ` [PATCH 67/76] target/arm: Handle FPCR.AH in SVE FTMAD Peter Maydell
@ 2025-01-24 16:28 ` Peter Maydell
  2025-01-26 13:21   ` Richard Henderson
  2025-01-24 16:28 ` [PATCH 69/76] target/arm: Plumb FEAT_RPRES frecpe and frsqrte through to new helper Peter Maydell
                   ` (9 subsequent siblings)
  77 siblings, 1 reply; 167+ messages in thread
From: Peter Maydell @ 2025-01-24 16:28 UTC (permalink / raw)
  To: qemu-arm, qemu-devel

Now that we have completed the handling for FPCR.{AH,FIZ,NEP}, we
can enable FEAT_AFP for '-cpu max', and document that we support it.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 docs/system/arm/emulation.rst | 1 +
 target/arm/tcg/cpu64.c        | 1 +
 2 files changed, 2 insertions(+)

diff --git a/docs/system/arm/emulation.rst b/docs/system/arm/emulation.rst
index 60176d08597..63b4cdf5fb1 100644
--- a/docs/system/arm/emulation.rst
+++ b/docs/system/arm/emulation.rst
@@ -20,6 +20,7 @@ the following architecture extensions:
 - FEAT_AA64EL3 (Support for AArch64 at EL3)
 - FEAT_AdvSIMD (Advanced SIMD Extension)
 - FEAT_AES (AESD and AESE instructions)
+- FEAT_AFP (Alternate floating-point behavior)
 - FEAT_Armv9_Crypto (Armv9 Cryptographic Extension)
 - FEAT_ASID16 (16 bit ASID)
 - FEAT_BBM at level 2 (Translation table break-before-make levels)
diff --git a/target/arm/tcg/cpu64.c b/target/arm/tcg/cpu64.c
index 93573ceeb1a..0bc68aac177 100644
--- a/target/arm/tcg/cpu64.c
+++ b/target/arm/tcg/cpu64.c
@@ -1218,6 +1218,7 @@ void aarch64_max_tcg_initfn(Object *obj)
     t = FIELD_DP64(t, ID_AA64MMFR1, XNX, 1);      /* FEAT_XNX */
     t = FIELD_DP64(t, ID_AA64MMFR1, ETS, 2);      /* FEAT_ETS2 */
     t = FIELD_DP64(t, ID_AA64MMFR1, HCX, 1);      /* FEAT_HCX */
+    t = FIELD_DP64(t, ID_AA64MMFR1, AFP, 1);      /* FEAT_AFP */
     t = FIELD_DP64(t, ID_AA64MMFR1, TIDCP1, 1);   /* FEAT_TIDCP1 */
     t = FIELD_DP64(t, ID_AA64MMFR1, CMOW, 1);     /* FEAT_CMOW */
     cpu->isar.id_aa64mmfr1 = t;
-- 
2.34.1




* [PATCH 69/76] target/arm: Plumb FEAT_RPRES frecpe and frsqrte through to new helper
  2025-01-24 16:27 [PATCH 00/76] target/arm: Implement FEAT_AFP and FEAT_RPRES Peter Maydell
                   ` (67 preceding siblings ...)
  2025-01-24 16:28 ` [PATCH 68/76] target/arm: Enable FEAT_AFP for '-cpu max' Peter Maydell
@ 2025-01-24 16:28 ` Peter Maydell
  2025-01-26 13:23   ` Richard Henderson
  2025-01-24 16:28 ` [PATCH 70/76] target/arm: Implement increased precision FRECPE Peter Maydell
                   ` (8 subsequent siblings)
  77 siblings, 1 reply; 167+ messages in thread
From: Peter Maydell @ 2025-01-24 16:28 UTC (permalink / raw)
  To: qemu-arm, qemu-devel

FEAT_RPRES implements an "increased precision" variant of the single
precision FRECPE and FRSQRTE instructions, which increases the
precision of the estimate from an 8 bit to a 12 bit mantissa. This
applies only when FPCR.AH == 1. Note that the halfprec and double
versions of these insns retain the 8 bit precision regardless.

In this commit we add all the plumbing to make these instructions
call a new helper function when increased precision is in effect.
In the following commit we will provide the actual change in
behaviour in the helpers.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
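The selection rule stated above can be condensed into a small
decision function. This is a sketch under stated assumptions, not
the translator code: the enum and function names are hypothetical,
and the patch itself expresses the same choice by picking between
two helper tables that differ only in their single-precision entry.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical element-size tags for the three FP formats. */
enum elem_size { ES_HALF, ES_SINGLE, ES_DOUBLE };

/*
 * Mantissa precision of the FRECPE/FRSQRTE estimate, in bits.
 * Only single precision with FEAT_RPRES present and FPCR.AH == 1
 * gets 12 bits; every other case keeps 8.
 */
static unsigned estimate_mantissa_bits(bool fpcr_ah, bool have_rpres,
                                       enum elem_size esz)
{
    assert(esz == ES_HALF || esz == ES_SINGLE || esz == ES_DOUBLE);
    if (fpcr_ah && have_rpres && esz == ES_SINGLE) {
        return 12;
    }
    return 8;
}
```

Keeping the half and double entries identical in both tables means
the table choice can ignore the element size.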
 target/arm/cpu-features.h      |  5 +++++
 target/arm/helper.h            |  4 ++++
 target/arm/tcg/translate-a64.c | 34 ++++++++++++++++++++++++++++++----
 target/arm/tcg/translate-sve.c | 16 ++++++++++++++--
 target/arm/tcg/vec_helper.c    |  2 ++
 target/arm/vfp_helper.c        | 32 ++++++++++++++++++++++++++++++--
 6 files changed, 85 insertions(+), 8 deletions(-)

diff --git a/target/arm/cpu-features.h b/target/arm/cpu-features.h
index 7bf24c506b3..525e4cee12f 100644
--- a/target/arm/cpu-features.h
+++ b/target/arm/cpu-features.h
@@ -597,6 +597,11 @@ static inline bool isar_feature_aa64_mops(const ARMISARegisters *id)
     return FIELD_EX64(id->id_aa64isar2, ID_AA64ISAR2, MOPS);
 }
 
+static inline bool isar_feature_aa64_rpres(const ARMISARegisters *id)
+{
+    return FIELD_EX64(id->id_aa64isar2, ID_AA64ISAR2, RPRES);
+}
+
 static inline bool isar_feature_aa64_fp_simd(const ARMISARegisters *id)
 {
     /* We always set the AdvSIMD and FP fields identically.  */
diff --git a/target/arm/helper.h b/target/arm/helper.h
index 0a8b4c946e1..dbad1f5d741 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -245,9 +245,11 @@ DEF_HELPER_4(vfp_muladdh, f16, f16, f16, f16, fpst)
 
 DEF_HELPER_FLAGS_2(recpe_f16, TCG_CALL_NO_RWG, f16, f16, fpst)
 DEF_HELPER_FLAGS_2(recpe_f32, TCG_CALL_NO_RWG, f32, f32, fpst)
+DEF_HELPER_FLAGS_2(recpe_rpres_f32, TCG_CALL_NO_RWG, f32, f32, fpst)
 DEF_HELPER_FLAGS_2(recpe_f64, TCG_CALL_NO_RWG, f64, f64, fpst)
 DEF_HELPER_FLAGS_2(rsqrte_f16, TCG_CALL_NO_RWG, f16, f16, fpst)
 DEF_HELPER_FLAGS_2(rsqrte_f32, TCG_CALL_NO_RWG, f32, f32, fpst)
+DEF_HELPER_FLAGS_2(rsqrte_rpres_f32, TCG_CALL_NO_RWG, f32, f32, fpst)
 DEF_HELPER_FLAGS_2(rsqrte_f64, TCG_CALL_NO_RWG, f64, f64, fpst)
 DEF_HELPER_FLAGS_1(recpe_u32, TCG_CALL_NO_RWG, i32, i32)
 DEF_HELPER_FLAGS_1(rsqrte_u32, TCG_CALL_NO_RWG, i32, i32)
@@ -680,10 +682,12 @@ DEF_HELPER_FLAGS_4(gvec_vrintx_s, TCG_CALL_NO_RWG, void, ptr, ptr, fpst, i32)
 
 DEF_HELPER_FLAGS_4(gvec_frecpe_h, TCG_CALL_NO_RWG, void, ptr, ptr, fpst, i32)
 DEF_HELPER_FLAGS_4(gvec_frecpe_s, TCG_CALL_NO_RWG, void, ptr, ptr, fpst, i32)
+DEF_HELPER_FLAGS_4(gvec_frecpe_rpres_s, TCG_CALL_NO_RWG, void, ptr, ptr, fpst, i32)
 DEF_HELPER_FLAGS_4(gvec_frecpe_d, TCG_CALL_NO_RWG, void, ptr, ptr, fpst, i32)
 
 DEF_HELPER_FLAGS_4(gvec_frsqrte_h, TCG_CALL_NO_RWG, void, ptr, ptr, fpst, i32)
 DEF_HELPER_FLAGS_4(gvec_frsqrte_s, TCG_CALL_NO_RWG, void, ptr, ptr, fpst, i32)
+DEF_HELPER_FLAGS_4(gvec_frsqrte_rpres_s, TCG_CALL_NO_RWG, void, ptr, ptr, fpst, i32)
 DEF_HELPER_FLAGS_4(gvec_frsqrte_d, TCG_CALL_NO_RWG, void, ptr, ptr, fpst, i32)
 
 DEF_HELPER_FLAGS_4(gvec_fcgt0_h, TCG_CALL_NO_RWG, void, ptr, ptr, fpst, i32)
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index 0b57e35f999..3e2fe46464f 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -8909,7 +8909,14 @@ static const FPScalar1 f_scalar_frecpe = {
     gen_helper_recpe_f32,
     gen_helper_recpe_f64,
 };
-TRANS(FRECPE_s, do_fp1_scalar_ah, a, &f_scalar_frecpe, -1)
+static const FPScalar1 f_scalar_frecpe_rpres = {
+    gen_helper_recpe_f16,
+    gen_helper_recpe_rpres_f32,
+    gen_helper_recpe_f64,
+};
+TRANS(FRECPE_s, do_fp1_scalar_ah, a,
+      s->fpcr_ah && dc_isar_feature(aa64_rpres, s) ?
+      &f_scalar_frecpe_rpres : &f_scalar_frecpe, -1)
 
 static const FPScalar1 f_scalar_frecpx = {
     gen_helper_frecpx_f16,
@@ -8923,7 +8930,14 @@ static const FPScalar1 f_scalar_frsqrte = {
     gen_helper_rsqrte_f32,
     gen_helper_rsqrte_f64,
 };
-TRANS(FRSQRTE_s, do_fp1_scalar_ah, a, &f_scalar_frsqrte, -1)
+static const FPScalar1 f_scalar_frsqrte_rpres = {
+    gen_helper_rsqrte_f16,
+    gen_helper_rsqrte_rpres_f32,
+    gen_helper_rsqrte_f64,
+};
+TRANS(FRSQRTE_s, do_fp1_scalar_ah, a,
+      s->fpcr_ah && dc_isar_feature(aa64_rpres, s) ?
+      &f_scalar_frsqrte_rpres : &f_scalar_frsqrte, -1)
 
 static bool trans_FCVT_s_ds(DisasContext *s, arg_rr *a)
 {
@@ -9954,14 +9968,26 @@ static gen_helper_gvec_2_ptr * const f_frecpe[] = {
     gen_helper_gvec_frecpe_s,
     gen_helper_gvec_frecpe_d,
 };
-TRANS(FRECPE_v, do_gvec_op2_ah_fpst, a->esz, a->q, a->rd, a->rn, 0, f_frecpe)
+static gen_helper_gvec_2_ptr * const f_frecpe_rpres[] = {
+    gen_helper_gvec_frecpe_h,
+    gen_helper_gvec_frecpe_rpres_s,
+    gen_helper_gvec_frecpe_d,
+};
+TRANS(FRECPE_v, do_gvec_op2_ah_fpst, a->esz, a->q, a->rd, a->rn, 0,
+      s->fpcr_ah && dc_isar_feature(aa64_rpres, s) ? f_frecpe_rpres : f_frecpe)
 
 static gen_helper_gvec_2_ptr * const f_frsqrte[] = {
     gen_helper_gvec_frsqrte_h,
     gen_helper_gvec_frsqrte_s,
     gen_helper_gvec_frsqrte_d,
 };
-TRANS(FRSQRTE_v, do_gvec_op2_ah_fpst, a->esz, a->q, a->rd, a->rn, 0, f_frsqrte)
+static gen_helper_gvec_2_ptr * const f_frsqrte_rpres[] = {
+    gen_helper_gvec_frsqrte_h,
+    gen_helper_gvec_frsqrte_rpres_s,
+    gen_helper_gvec_frsqrte_d,
+};
+TRANS(FRSQRTE_v, do_gvec_op2_ah_fpst, a->esz, a->q, a->rd, a->rn, 0,
+      s->fpcr_ah && dc_isar_feature(aa64_rpres, s) ? f_frsqrte_rpres : f_frsqrte)
 
 static bool trans_FCVTL_v(DisasContext *s, arg_qrr_e *a)
 {
diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
index 26bdda8f96e..454f7ff9008 100644
--- a/target/arm/tcg/translate-sve.c
+++ b/target/arm/tcg/translate-sve.c
@@ -3626,13 +3626,25 @@ static gen_helper_gvec_2_ptr * const frecpe_fns[] = {
     NULL,                     gen_helper_gvec_frecpe_h,
     gen_helper_gvec_frecpe_s, gen_helper_gvec_frecpe_d,
 };
-TRANS_FEAT(FRECPE, aa64_sve, gen_gvec_fpst_ah_arg_zz, frecpe_fns[a->esz], a, 0)
+static gen_helper_gvec_2_ptr * const frecpe_rpres_fns[] = {
+    NULL,                           gen_helper_gvec_frecpe_h,
+    gen_helper_gvec_frecpe_rpres_s, gen_helper_gvec_frecpe_d,
+};
+TRANS_FEAT(FRECPE, aa64_sve, gen_gvec_fpst_ah_arg_zz,
+           s->fpcr_ah && dc_isar_feature(aa64_rpres, s) ?
+           frecpe_rpres_fns[a->esz] : frecpe_fns[a->esz], a, 0)
 
 static gen_helper_gvec_2_ptr * const frsqrte_fns[] = {
     NULL,                      gen_helper_gvec_frsqrte_h,
     gen_helper_gvec_frsqrte_s, gen_helper_gvec_frsqrte_d,
 };
-TRANS_FEAT(FRSQRTE, aa64_sve, gen_gvec_fpst_ah_arg_zz, frsqrte_fns[a->esz], a, 0)
+static gen_helper_gvec_2_ptr * const frsqrte_rpres_fns[] = {
+    NULL,                            gen_helper_gvec_frsqrte_h,
+    gen_helper_gvec_frsqrte_rpres_s, gen_helper_gvec_frsqrte_d,
+};
+TRANS_FEAT(FRSQRTE, aa64_sve, gen_gvec_fpst_ah_arg_zz,
+           s->fpcr_ah && dc_isar_feature(aa64_rpres, s) ?
+           frsqrte_rpres_fns[a->esz] : frsqrte_fns[a->esz], a, 0)
 
 /*
  *** SVE Floating Point Compare with Zero Group
diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
index c720b435d58..b369c9f45b3 100644
--- a/target/arm/tcg/vec_helper.c
+++ b/target/arm/tcg/vec_helper.c
@@ -1237,10 +1237,12 @@ void HELPER(NAME)(void *vd, void *vn, float_status *stat, uint32_t desc)  \
 
 DO_2OP(gvec_frecpe_h, helper_recpe_f16, float16)
 DO_2OP(gvec_frecpe_s, helper_recpe_f32, float32)
+DO_2OP(gvec_frecpe_rpres_s, helper_recpe_rpres_f32, float32)
 DO_2OP(gvec_frecpe_d, helper_recpe_f64, float64)
 
 DO_2OP(gvec_frsqrte_h, helper_rsqrte_f16, float16)
 DO_2OP(gvec_frsqrte_s, helper_rsqrte_f32, float32)
+DO_2OP(gvec_frsqrte_rpres_s, helper_rsqrte_rpres_f32, float32)
 DO_2OP(gvec_frsqrte_d, helper_rsqrte_f64, float64)
 
 DO_2OP(gvec_vrintx_h, float16_round_to_int, float16)
diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
index 50a8a659577..1b7ecc14621 100644
--- a/target/arm/vfp_helper.c
+++ b/target/arm/vfp_helper.c
@@ -839,7 +839,11 @@ uint32_t HELPER(recpe_f16)(uint32_t input, float_status *fpst)
     return make_float16(f16_val);
 }
 
-float32 HELPER(recpe_f32)(float32 input, float_status *fpst)
+/*
+ * FEAT_RPRES means the f32 FRECPE has an "increased precision" variant
+ * which is used when FPCR.AH == 1.
+ */
+static float32 do_recpe_f32(float32 input, float_status *fpst, bool rpres)
 {
     float32 f32 = float32_squash_input_denormal(input, fpst);
     uint32_t f32_val = float32_val(f32);
@@ -888,6 +892,16 @@ float32 HELPER(recpe_f32)(float32 input, float_status *fpst)
     return make_float32(f32_val);
 }
 
+float32 HELPER(recpe_f32)(float32 input, float_status *fpst)
+{
+    return do_recpe_f32(input, fpst, false);
+}
+
+float32 HELPER(recpe_rpres_f32)(float32 input, float_status *fpst)
+{
+    return do_recpe_f32(input, fpst, true);
+}
+
 float64 HELPER(recpe_f64)(float64 input, float_status *fpst)
 {
     float64 f64 = float64_squash_input_denormal(input, fpst);
@@ -1033,7 +1047,11 @@ uint32_t HELPER(rsqrte_f16)(uint32_t input, float_status *s)
     return make_float16(val);
 }
 
-float32 HELPER(rsqrte_f32)(float32 input, float_status *s)
+/*
+ * FEAT_RPRES means the f32 FRSQRTE has an "increased precision" variant
+ * which is used when FPCR.AH == 1.
+ */
+static float32 do_rsqrte_f32(float32 input, float_status *s, bool rpres)
 {
     float32 f32 = float32_squash_input_denormal(input, s);
     uint32_t val = float32_val(f32);
@@ -1078,6 +1096,16 @@ float32 HELPER(rsqrte_f32)(float32 input, float_status *s)
     return make_float32(val);
 }
 
+float32 HELPER(rsqrte_f32)(float32 input, float_status *s)
+{
+    return do_rsqrte_f32(input, s, false);
+}
+
+float32 HELPER(rsqrte_rpres_f32)(float32 input, float_status *s)
+{
+    return do_rsqrte_f32(input, s, true);
+}
+
 float64 HELPER(rsqrte_f64)(float64 input, float_status *s)
 {
     float64 f64 = float64_squash_input_denormal(input, s);
-- 
2.34.1




* [PATCH 70/76] target/arm: Implement increased precision FRECPE
  2025-01-24 16:27 [PATCH 00/76] target/arm: Implement FEAT_AFP and FEAT_RPRES Peter Maydell
                   ` (68 preceding siblings ...)
  2025-01-24 16:28 ` [PATCH 69/76] target/arm: Plumb FEAT_RPRES frecpe and frsqrte through to new helper Peter Maydell
@ 2025-01-24 16:28 ` Peter Maydell
  2025-01-26 13:26   ` Richard Henderson
  2025-01-24 16:28 ` [PATCH 71/76] target/arm: Implement increased precision FRSQRTE Peter Maydell
                   ` (7 subsequent siblings)
  77 siblings, 1 reply; 167+ messages in thread
From: Peter Maydell @ 2025-01-24 16:28 UTC (permalink / raw)
  To: qemu-arm, qemu-devel

Implement the increased precision variant of FRECPE.  In the
pseudocode this corresponds to the handling of the
"increasedprecision" boolean in the FPRecipEstimate() and
RecipEstimate() functions.
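
As a rough standalone illustration of the rounding step (a sketch that
mirrors the integer arithmetic of recip_estimate_incprec() in this patch,
not the Arm pseudocode verbatim), the estimate could be computed as
below. The function name recip_estimate_incprec_sketch is hypothetical
and exists only for this example.

```c
#include <assert.h>

/*
 * Sketch of the increased-precision reciprocal estimate.
 * input range 2048 .. 4095 for a number from 0.5 <= x < 1.0
 * result range 4096 .. 8191 for a number from 1.0 to 2.0
 */
static int recip_estimate_incprec_sketch(int input)
{
    int a, b;

    assert(2048 <= input && input < 4096);
    /* Midpoint of the input interval, scaled by 2: 2 * input + 1 */
    a = input * 2 + 1;
    /* floor(2^26 / a): one bit more than the final result needs */
    b = (1 << 26) / a;
    /* "Add one and halve" rounds that extra bit to nearest */
    return (b + 1) >> 1;
}
```

With this sketch an input of 2048 gives 8190 and an input of 4095 gives
4097, so both ends of the input range stay inside the asserted result
range.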

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/vfp_helper.c | 54 +++++++++++++++++++++++++++++++++++------
 1 file changed, 46 insertions(+), 8 deletions(-)

diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
index 1b7ecc14621..79e58c5bb2a 100644
--- a/target/arm/vfp_helper.c
+++ b/target/arm/vfp_helper.c
@@ -731,6 +731,33 @@ static int recip_estimate(int input)
     return r;
 }
 
+/*
+ * Increased precision version:
+ * input is a 12 bit fixed point number
+ * input range 2048 .. 4095 for a number from 0.5 <= x < 1.0.
+ * result range 4096 .. 8191 for a number from 1.0 to 2.0
+ */
+static int recip_estimate_incprec(int input)
+{
+    int a, b, r;
+    assert(2048 <= input && input < 4096);
+    a = (input * 2) + 1;
+    /*
+     * The pseudocode expresses this as an operation on infinite
+     * precision reals where it calculates 2^25 / a and then looks
+     * at the error between that and the rounded-down-to-integer
+     * value to see if it should instead round up. We instead
+     * follow the same approach as the pseudocode for the 8-bit
+     * precision version, and calculate (2 * (2^25 / a)) as an
+     * integer so we can do the "add one and halve" to round it.
+     * So the 1 << 26 here is correct.
+     */
+    b = (1 << 26) / a;
+    r = (b + 1) >> 1;
+    assert(4096 <= r && r < 8192);
+    return r;
+}
+
 /*
  * Common wrapper to call recip_estimate
  *
@@ -740,7 +767,8 @@ static int recip_estimate(int input)
  * callee.
  */
 
-static uint64_t call_recip_estimate(int *exp, int exp_off, uint64_t frac)
+static uint64_t call_recip_estimate(int *exp, int exp_off, uint64_t frac,
+                                    bool increasedprecision)
 {
     uint32_t scaled, estimate;
     uint64_t result_frac;
@@ -756,12 +784,22 @@ static uint64_t call_recip_estimate(int *exp, int exp_off, uint64_t frac)
         }
     }
 
-    /* scaled = UInt('1':fraction<51:44>) */
-    scaled = deposit32(1 << 8, 0, 8, extract64(frac, 44, 8));
-    estimate = recip_estimate(scaled);
+    if (increasedprecision) {
+        /* scaled = UInt('1':fraction<51:41>) */
+        scaled = deposit32(1 << 11, 0, 11, extract64(frac, 41, 11));
+        estimate = recip_estimate_incprec(scaled);
+    } else {
+        /* scaled = UInt('1':fraction<51:44>) */
+        scaled = deposit32(1 << 8, 0, 8, extract64(frac, 44, 8));
+        estimate = recip_estimate(scaled);
+    }
 
     result_exp = exp_off - *exp;
-    result_frac = deposit64(0, 44, 8, estimate);
+    if (increasedprecision) {
+        result_frac = deposit64(0, 40, 12, estimate);
+    } else {
+        result_frac = deposit64(0, 44, 8, estimate);
+    }
     if (result_exp == 0) {
         result_frac = deposit64(result_frac >> 1, 51, 1, 1);
     } else if (result_exp == -1) {
@@ -830,7 +868,7 @@ uint32_t HELPER(recpe_f16)(uint32_t input, float_status *fpst)
     }
 
     f64_frac = call_recip_estimate(&f16_exp, 29,
-                                   ((uint64_t) f16_frac) << (52 - 10));
+                                   ((uint64_t) f16_frac) << (52 - 10), false);
 
     /* result = sign : result_exp<4:0> : fraction<51:42> */
     f16_val = deposit32(0, 15, 1, f16_sign);
@@ -883,7 +921,7 @@ static float32 do_recpe_f32(float32 input, float_status *fpst, bool rpres)
     }
 
     f64_frac = call_recip_estimate(&f32_exp, 253,
-                                   ((uint64_t) f32_frac) << (52 - 23));
+                                   ((uint64_t) f32_frac) << (52 - 23), rpres);
 
     /* result = sign : result_exp<7:0> : fraction<51:29> */
     f32_val = deposit32(0, 31, 1, f32_sign);
@@ -941,7 +979,7 @@ float64 HELPER(recpe_f64)(float64 input, float_status *fpst)
         return float64_set_sign(float64_zero, float64_is_neg(f64));
     }
 
-    f64_frac = call_recip_estimate(&f64_exp, 2045, f64_frac);
+    f64_frac = call_recip_estimate(&f64_exp, 2045, f64_frac, false);
 
     /* result = sign : result_exp<10:0> : fraction<51:0>; */
     f64_val = deposit64(0, 63, 1, f64_sign);
-- 
2.34.1




* [PATCH 71/76] target/arm: Implement increased precision FRSQRTE
  2025-01-24 16:27 [PATCH 00/76] target/arm: Implement FEAT_AFP and FEAT_RPRES Peter Maydell
                   ` (69 preceding siblings ...)
  2025-01-24 16:28 ` [PATCH 70/76] target/arm: Implement increased precision FRECPE Peter Maydell
@ 2025-01-24 16:28 ` Peter Maydell
  2025-01-26 13:28   ` Richard Henderson
  2025-01-24 16:28 ` [PATCH 72/76] target/arm: Enable FEAT_RPRES for -cpu max Peter Maydell
                   ` (6 subsequent siblings)
  77 siblings, 1 reply; 167+ messages in thread
From: Peter Maydell @ 2025-01-24 16:28 UTC (permalink / raw)
  To: qemu-arm, qemu-devel

Implement the increased precision variant of FRSQRTE.  In the
pseudocode this corresponds to the handling of the
"increasedprecision" boolean in the FPRSqrtEstimate() and
RecipSqrtEstimate() functions.
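
For readers who want to experiment outside QEMU, the integer search used
for the 12-bit estimate can be sketched as a standalone function. This
mirrors do_recip_sqrt_estimate_incprec() from this patch; the name
recip_sqrt_estimate_incprec_sketch and the local variable m are
hypothetical, chosen only for this example.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Sketch of the increased-precision reciprocal square root estimate.
 * a is the scaled input in the range 1024 .. 4095; the result is in
 * the range 4096 .. 8191.
 */
static int recip_sqrt_estimate_incprec_sketch(int a)
{
    int64_t m, b;

    assert(1024 <= a && a < 4096);
    if (a < 2048) {
        /* Lower half of the input range: use 2 * a + 1 */
        m = a * 2 + 1;
    } else {
        /* Upper half: clear the low bit, then use 2 * (a + 1) */
        m = (a >> 1) << 1;
        m = (m + 1) * 2;
    }
    /* Find the largest b such that m * b * b < 2^39 */
    b = 8192;
    while (m * (b + 1) * (b + 1) < (INT64_C(1) << 39)) {
        b += 1;
    }
    /* Round to nearest by adding one and halving */
    return (int)((b + 1) / 2);
}
```

With this sketch an input of 1024 gives 8190 and an input of 4095 gives
4097. The linear search is slow but keeps the arithmetic purely integer,
which is the property the patch relies on for bit-exact results.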

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/vfp_helper.c | 77 ++++++++++++++++++++++++++++++++++-------
 1 file changed, 64 insertions(+), 13 deletions(-)

diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
index 79e58c5bb2a..e63455c4bb9 100644
--- a/target/arm/vfp_helper.c
+++ b/target/arm/vfp_helper.c
@@ -1013,8 +1013,36 @@ static int do_recip_sqrt_estimate(int a)
     return estimate;
 }
 
+static int do_recip_sqrt_estimate_incprec(int a)
+{
+    /*
+     * The Arm ARM describes the 12-bit precision version of RecipSqrtEstimate
+     * in terms of an infinite-precision floating point calculation of a
+     * square root. We implement this using the same kind of pure integer
+     * algorithm as the 8-bit version, to get the same bit-for-bit result.
+     */
+    int64_t b, estimate;
 
-static uint64_t recip_sqrt_estimate(int *exp , int exp_off, uint64_t frac)
+    assert(1024 <= a && a < 4096);
+    if (a < 2048) {
+        a = a * 2 + 1;
+    } else {
+        a = (a >> 1) << 1;
+        a = (a + 1) * 2;
+    }
+    b = 8192;
+    while (a * (b + 1) * (b + 1) < (1ULL << 39)) {
+        b += 1;
+    }
+    estimate = (b + 1) / 2;
+
+    assert(4096 <= estimate && estimate < 8192);
+
+    return estimate;
+}
+
+static uint64_t recip_sqrt_estimate(int *exp , int exp_off, uint64_t frac,
+                                    bool increasedprecision)
 {
     int estimate;
     uint32_t scaled;
@@ -1027,17 +1055,32 @@ static uint64_t recip_sqrt_estimate(int *exp , int exp_off, uint64_t frac)
         frac = extract64(frac, 0, 51) << 1;
     }
 
-    if (*exp & 1) {
-        /* scaled = UInt('01':fraction<51:45>) */
-        scaled = deposit32(1 << 7, 0, 7, extract64(frac, 45, 7));
+    if (increasedprecision) {
+        if (*exp & 1) {
+            /* scaled = UInt('01':fraction<51:42>) */
+            scaled = deposit32(1 << 10, 0, 10, extract64(frac, 42, 10));
+        } else {
+            /* scaled = UInt('1':fraction<51:41>) */
+            scaled = deposit32(1 << 11, 0, 11, extract64(frac, 41, 11));
+        }
+        estimate = do_recip_sqrt_estimate_incprec(scaled);
     } else {
-        /* scaled = UInt('1':fraction<51:44>) */
-        scaled = deposit32(1 << 8, 0, 8, extract64(frac, 44, 8));
+        if (*exp & 1) {
+            /* scaled = UInt('01':fraction<51:45>) */
+            scaled = deposit32(1 << 7, 0, 7, extract64(frac, 45, 7));
+        } else {
+            /* scaled = UInt('1':fraction<51:44>) */
+            scaled = deposit32(1 << 8, 0, 8, extract64(frac, 44, 8));
+        }
+        estimate = do_recip_sqrt_estimate(scaled);
     }
-    estimate = do_recip_sqrt_estimate(scaled);
 
     *exp = (exp_off - *exp) / 2;
-    return extract64(estimate, 0, 8) << 44;
+    if (increasedprecision) {
+        return extract64(estimate, 0, 12) << 40;
+    } else {
+        return extract64(estimate, 0, 8) << 44;
+    }
 }
 
 uint32_t HELPER(rsqrte_f16)(uint32_t input, float_status *s)
@@ -1076,7 +1119,7 @@ uint32_t HELPER(rsqrte_f16)(uint32_t input, float_status *s)
 
     f64_frac = ((uint64_t) f16_frac) << (52 - 10);
 
-    f64_frac = recip_sqrt_estimate(&f16_exp, 44, f64_frac);
+    f64_frac = recip_sqrt_estimate(&f16_exp, 44, f64_frac, false);
 
     /* result = sign : result_exp<4:0> : estimate<7:0> : Zeros(2) */
     val = deposit32(0, 15, 1, f16_sign);
@@ -1125,12 +1168,20 @@ static float32 do_rsqrte_f32(float32 input, float_status *s, bool rpres)
 
     f64_frac = ((uint64_t) f32_frac) << 29;
 
-    f64_frac = recip_sqrt_estimate(&f32_exp, 380, f64_frac);
+    f64_frac = recip_sqrt_estimate(&f32_exp, 380, f64_frac, rpres);
 
-    /* result = sign : result_exp<4:0> : estimate<7:0> : Zeros(15) */
+    /*
+     * result = sign : result_exp<7:0> : estimate<7:0> : Zeros(15)
+     * or for increased precision
+     * result = sign : result_exp<7:0> : estimate<11:0> : Zeros(11)
+     */
     val = deposit32(0, 31, 1, f32_sign);
     val = deposit32(val, 23, 8, f32_exp);
-    val = deposit32(val, 15, 8, extract64(f64_frac, 52 - 8, 8));
+    if (rpres) {
+        val = deposit32(val, 11, 12, extract64(f64_frac, 52 - 12, 12));
+    } else {
+        val = deposit32(val, 15, 8, extract64(f64_frac, 52 - 8, 8));
+    }
     return make_float32(val);
 }
 
@@ -1174,7 +1225,7 @@ float64 HELPER(rsqrte_f64)(float64 input, float_status *s)
         return float64_zero;
     }
 
-    f64_frac = recip_sqrt_estimate(&f64_exp, 3068, f64_frac);
+    f64_frac = recip_sqrt_estimate(&f64_exp, 3068, f64_frac, false);
 
     /* result = sign : result_exp<4:0> : estimate<7:0> : Zeros(44) */
     val = deposit64(0, 61, 1, f64_sign);
-- 
2.34.1




* [PATCH 72/76] target/arm: Enable FEAT_RPRES for -cpu max
  2025-01-24 16:27 [PATCH 00/76] target/arm: Implement FEAT_AFP and FEAT_RPRES Peter Maydell
                   ` (70 preceding siblings ...)
  2025-01-24 16:28 ` [PATCH 71/76] target/arm: Implement increased precision FRSQRTE Peter Maydell
@ 2025-01-24 16:28 ` Peter Maydell
  2025-01-26 13:29   ` Richard Henderson
  2025-01-24 16:28 ` [PATCH 73/76] target/i386: Detect flush-to-zero after rounding Peter Maydell
                   ` (5 subsequent siblings)
  77 siblings, 1 reply; 167+ messages in thread
From: Peter Maydell @ 2025-01-24 16:28 UTC (permalink / raw)
  To: qemu-arm, qemu-devel

Now that the emulation is complete, enable FEAT_RPRES for the 'max'
CPU type.
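
As a minimal sketch of what setting an ID register field amounts to, the
helpers below deposit and extract a bit field in a 64-bit value. The
names demo_field_dp64 and demo_field_ex64 are hypothetical stand-ins and
are not QEMU's FIELD_DP64/FIELD_EX64 macros; the placement of RPRES at
bits [7:4] of ID_AA64ISAR2 is an assumption here and should be checked
against the Arm ARM.

```c
#include <stdint.h>

/* Assumed field position for illustration only: RPRES at bits [7:4] */
#define DEMO_RPRES_SHIFT  4
#define DEMO_RPRES_LENGTH 4

/* Write val into reg[shift + length - 1 : shift]; length must be < 64 */
static uint64_t demo_field_dp64(uint64_t reg, unsigned shift,
                                unsigned length, uint64_t val)
{
    uint64_t mask = ((UINT64_C(1) << length) - 1) << shift;

    return (reg & ~mask) | ((val << shift) & mask);
}

/* Read reg[shift + length - 1 : shift]; length must be < 64 */
static uint64_t demo_field_ex64(uint64_t reg, unsigned shift,
                                unsigned length)
{
    return (reg >> shift) & ((UINT64_C(1) << length) - 1);
}
```

Depositing 1 into the assumed RPRES field of an all-zero register gives
0x10, and a later extract of the same field reads back 1, which is the
kind of nonzero check isar_feature_aa64_rpres() performs.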

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 docs/system/arm/emulation.rst | 1 +
 target/arm/tcg/cpu64.c        | 1 +
 2 files changed, 2 insertions(+)

diff --git a/docs/system/arm/emulation.rst b/docs/system/arm/emulation.rst
index 63b4cdf5fb1..78c2fd2113c 100644
--- a/docs/system/arm/emulation.rst
+++ b/docs/system/arm/emulation.rst
@@ -118,6 +118,7 @@ the following architecture extensions:
 - FEAT_RDM (Advanced SIMD rounding double multiply accumulate instructions)
 - FEAT_RME (Realm Management Extension) (NB: support status in QEMU is experimental)
 - FEAT_RNG (Random number generator)
+- FEAT_RPRES (Increased precision of FRECPE and FRSQRTE)
 - FEAT_S2FWB (Stage 2 forced Write-Back)
 - FEAT_SB (Speculation Barrier)
 - FEAT_SEL2 (Secure EL2)
diff --git a/target/arm/tcg/cpu64.c b/target/arm/tcg/cpu64.c
index 0bc68aac177..29ab0ac79da 100644
--- a/target/arm/tcg/cpu64.c
+++ b/target/arm/tcg/cpu64.c
@@ -1167,6 +1167,7 @@ void aarch64_max_tcg_initfn(Object *obj)
     cpu->isar.id_aa64isar1 = t;
 
     t = cpu->isar.id_aa64isar2;
+    t = FIELD_DP64(t, ID_AA64ISAR2, RPRES, 1);    /* FEAT_RPRES */
     t = FIELD_DP64(t, ID_AA64ISAR2, MOPS, 1);     /* FEAT_MOPS */
     t = FIELD_DP64(t, ID_AA64ISAR2, BC, 1);       /* FEAT_HBC */
     t = FIELD_DP64(t, ID_AA64ISAR2, WFXT, 2);     /* FEAT_WFxT */
-- 
2.34.1




* [PATCH 73/76] target/i386: Detect flush-to-zero after rounding
  2025-01-24 16:27 [PATCH 00/76] target/arm: Implement FEAT_AFP and FEAT_RPRES Peter Maydell
                   ` (71 preceding siblings ...)
  2025-01-24 16:28 ` [PATCH 72/76] target/arm: Enable FEAT_RPRES for -cpu max Peter Maydell
@ 2025-01-24 16:28 ` Peter Maydell
  2025-01-26 13:30   ` Richard Henderson
  2025-01-24 16:28 ` [PATCH 74/76] target/i386: Use correct type for get_float_exception_flags() values Peter Maydell
                   ` (4 subsequent siblings)
  77 siblings, 1 reply; 167+ messages in thread
From: Peter Maydell @ 2025-01-24 16:28 UTC (permalink / raw)
  To: qemu-arm, qemu-devel

The Intel SDM section 10.2.3.3 on the MXCSR.FTZ bit says that outputs
are flushed to zero when underflow is detected, which x86 does after
rounding.  Set the detect_ftz flag to detect_ftz_after_rounding for
fp_status, mmx_status and sse_status accordingly.

This allows us to enable the test in fma.c which checks this
behaviour.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/i386/tcg/fpu_helper.c | 8 ++++----
 tests/tcg/x86_64/fma.c       | 5 -----
 2 files changed, 4 insertions(+), 9 deletions(-)

diff --git a/target/i386/tcg/fpu_helper.c b/target/i386/tcg/fpu_helper.c
index 9bf23fdd0f6..5c233fdf5b4 100644
--- a/target/i386/tcg/fpu_helper.c
+++ b/target/i386/tcg/fpu_helper.c
@@ -189,13 +189,13 @@ void cpu_init_fp_statuses(CPUX86State *env)
     set_float_default_nan_pattern(0b11000000, &env->mmx_status);
     set_float_default_nan_pattern(0b11000000, &env->sse_status);
     /*
-     * TODO: x86 does flush-to-zero detection after rounding (the SDM
+     * x86 does flush-to-zero detection after rounding (the SDM
      * section 10.2.3.3 on the FTZ bit of MXCSR says that we flush
      * when we detect underflow, which x86 does after rounding).
      */
-    set_float_detect_ftz(detect_ftz_before_rounding, &env->fp_status);
-    set_float_detect_ftz(detect_ftz_before_rounding, &env->mmx_status);
-    set_float_detect_ftz(detect_ftz_before_rounding, &env->sse_status);
+    set_float_detect_ftz(detect_ftz_after_rounding, &env->fp_status);
+    set_float_detect_ftz(detect_ftz_after_rounding, &env->mmx_status);
+    set_float_detect_ftz(detect_ftz_after_rounding, &env->sse_status);
 }
 
 static inline uint8_t save_exception_flags(CPUX86State *env)
diff --git a/tests/tcg/x86_64/fma.c b/tests/tcg/x86_64/fma.c
index 09c622ebc00..46f863005ed 100644
--- a/tests/tcg/x86_64/fma.c
+++ b/tests/tcg/x86_64/fma.c
@@ -79,14 +79,9 @@ static testdata tests[] = {
     /*
      * Flushing of denormal outputs to zero should also happen after
      * rounding, so setting FTZ should not affect the result or the flags.
-     * QEMU currently does not emulate this correctly because we do the
-     * flush-to-zero check before rounding, so we incorrectly produce a
-     * zero result and set Underflow as well as Precision.
      */
-#ifdef ENABLE_FAILING_TESTS
     { 0x3fdfffffffffffff, 0x001fffffffffffff, 0x801fffffffffffff, true,
       0x8010000000000000, 0x20 }, /* Enabling FTZ shouldn't change flags */
-#endif
 };
 
 int main(void)
-- 
2.34.1




* [PATCH 74/76] target/i386: Use correct type for get_float_exception_flags() values
  2025-01-24 16:27 [PATCH 00/76] target/arm: Implement FEAT_AFP and FEAT_RPRES Peter Maydell
                   ` (72 preceding siblings ...)
  2025-01-24 16:28 ` [PATCH 73/76] target/i386: Detect flush-to-zero after rounding Peter Maydell
@ 2025-01-24 16:28 ` Peter Maydell
  2025-01-26 13:30   ` Richard Henderson
  2025-01-24 16:28 ` [PATCH 75/76] target/i386: Wire up MXCSR.DE and FPUS.DE correctly Peter Maydell
                   ` (3 subsequent siblings)
  77 siblings, 1 reply; 167+ messages in thread
From: Peter Maydell @ 2025-01-24 16:28 UTC (permalink / raw)
  To: qemu-arm, qemu-devel

The softfloat get_float_exception_flags() function returns 'int', but
in various places in target/i386 we incorrectly store the returned
value into a uint8_t.  This currently has no ill effects because i386
doesn't care about any of the float_flag enum values above 0x40.
However, we want to start using float_flag_input_denormal_used, which
is 0x4000.

Switch to using 'int' so that we can handle all the possible valid
float_flag_* values. This includes changing the return type of
save_exception_flags() and the argument to merge_exception_flags().
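
The truncation can be shown with a small standalone sketch. All names
below are hypothetical stand-ins for this example; only the value 0x4000
is taken from the text above, and DEMO_FLAG_LOW is an arbitrary flag
that fits in 8 bits.

```c
#include <stdint.h>

/* Hypothetical flag values; only 0x4000 is quoted from the commit message */
enum {
    DEMO_FLAG_LOW = 0x20,
    DEMO_FLAG_INPUT_DENORMAL_USED = 0x4000,
};

/* Stand-in for a flags getter that returns 'int' */
static int demo_get_flags(void)
{
    return DEMO_FLAG_LOW | DEMO_FLAG_INPUT_DENORMAL_USED;
}

/* Old pattern: the high flag bit is silently lost in the uint8_t */
static int flags_kept_in_uint8(void)
{
    uint8_t old_flags = demo_get_flags();

    return old_flags;
}

/* New pattern: 'int' keeps every flag bit */
static int flags_kept_in_int(void)
{
    int old_flags = demo_get_flags();

    return old_flags;
}
```

Here flags_kept_in_uint8() returns 0x20 while flags_kept_in_int()
returns 0x4020, so any later test for the 0x4000 bit only works with the
'int' version.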

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/i386/ops_sse.h        | 16 +++----
 target/i386/tcg/fpu_helper.c | 82 ++++++++++++++++++------------------
 2 files changed, 49 insertions(+), 49 deletions(-)

diff --git a/target/i386/ops_sse.h b/target/i386/ops_sse.h
index f0aa1894aa2..a2e4d480399 100644
--- a/target/i386/ops_sse.h
+++ b/target/i386/ops_sse.h
@@ -842,7 +842,7 @@ int64_t helper_cvttsd2sq(CPUX86State *env, ZMMReg *s)
 
 void glue(helper_rsqrtps, SUFFIX)(CPUX86State *env, ZMMReg *d, ZMMReg *s)
 {
-    uint8_t old_flags = get_float_exception_flags(&env->sse_status);
+    int old_flags = get_float_exception_flags(&env->sse_status);
     int i;
     for (i = 0; i < 2 << SHIFT; i++) {
         d->ZMM_S(i) = float32_div(float32_one,
@@ -855,7 +855,7 @@ void glue(helper_rsqrtps, SUFFIX)(CPUX86State *env, ZMMReg *d, ZMMReg *s)
 #if SHIFT == 1
 void helper_rsqrtss(CPUX86State *env, ZMMReg *d, ZMMReg *v, ZMMReg *s)
 {
-    uint8_t old_flags = get_float_exception_flags(&env->sse_status);
+    int old_flags = get_float_exception_flags(&env->sse_status);
     int i;
     d->ZMM_S(0) = float32_div(float32_one,
                               float32_sqrt(s->ZMM_S(0), &env->sse_status),
@@ -869,7 +869,7 @@ void helper_rsqrtss(CPUX86State *env, ZMMReg *d, ZMMReg *v, ZMMReg *s)
 
 void glue(helper_rcpps, SUFFIX)(CPUX86State *env, ZMMReg *d, ZMMReg *s)
 {
-    uint8_t old_flags = get_float_exception_flags(&env->sse_status);
+    int old_flags = get_float_exception_flags(&env->sse_status);
     int i;
     for (i = 0; i < 2 << SHIFT; i++) {
         d->ZMM_S(i) = float32_div(float32_one, s->ZMM_S(i), &env->sse_status);
@@ -880,7 +880,7 @@ void glue(helper_rcpps, SUFFIX)(CPUX86State *env, ZMMReg *d, ZMMReg *s)
 #if SHIFT == 1
 void helper_rcpss(CPUX86State *env, ZMMReg *d, ZMMReg *v, ZMMReg *s)
 {
-    uint8_t old_flags = get_float_exception_flags(&env->sse_status);
+    int old_flags = get_float_exception_flags(&env->sse_status);
     int i;
     d->ZMM_S(0) = float32_div(float32_one, s->ZMM_S(0), &env->sse_status);
     for (i = 1; i < 2 << SHIFT; i++) {
@@ -1714,7 +1714,7 @@ void glue(helper_phminposuw, SUFFIX)(CPUX86State *env, Reg *d, Reg *s)
 void glue(helper_roundps, SUFFIX)(CPUX86State *env, Reg *d, Reg *s,
                                   uint32_t mode)
 {
-    uint8_t old_flags = get_float_exception_flags(&env->sse_status);
+    int old_flags = get_float_exception_flags(&env->sse_status);
     signed char prev_rounding_mode;
     int i;
 
@@ -1738,7 +1738,7 @@ void glue(helper_roundps, SUFFIX)(CPUX86State *env, Reg *d, Reg *s,
 void glue(helper_roundpd, SUFFIX)(CPUX86State *env, Reg *d, Reg *s,
                                   uint32_t mode)
 {
-    uint8_t old_flags = get_float_exception_flags(&env->sse_status);
+    int old_flags = get_float_exception_flags(&env->sse_status);
     signed char prev_rounding_mode;
     int i;
 
@@ -1763,7 +1763,7 @@ void glue(helper_roundpd, SUFFIX)(CPUX86State *env, Reg *d, Reg *s,
 void glue(helper_roundss, SUFFIX)(CPUX86State *env, Reg *d, Reg *v, Reg *s,
                                   uint32_t mode)
 {
-    uint8_t old_flags = get_float_exception_flags(&env->sse_status);
+    int old_flags = get_float_exception_flags(&env->sse_status);
     signed char prev_rounding_mode;
     int i;
 
@@ -1788,7 +1788,7 @@ void glue(helper_roundss, SUFFIX)(CPUX86State *env, Reg *d, Reg *v, Reg *s,
 void glue(helper_roundsd, SUFFIX)(CPUX86State *env, Reg *d, Reg *v, Reg *s,
                                   uint32_t mode)
 {
-    uint8_t old_flags = get_float_exception_flags(&env->sse_status);
+    int old_flags = get_float_exception_flags(&env->sse_status);
     signed char prev_rounding_mode;
     int i;
 
diff --git a/target/i386/tcg/fpu_helper.c b/target/i386/tcg/fpu_helper.c
index 5c233fdf5b4..97b46307d56 100644
--- a/target/i386/tcg/fpu_helper.c
+++ b/target/i386/tcg/fpu_helper.c
@@ -198,16 +198,16 @@ void cpu_init_fp_statuses(CPUX86State *env)
     set_float_detect_ftz(detect_ftz_after_rounding, &env->sse_status);
 }
 
-static inline uint8_t save_exception_flags(CPUX86State *env)
+static inline int save_exception_flags(CPUX86State *env)
 {
-    uint8_t old_flags = get_float_exception_flags(&env->fp_status);
+    int old_flags = get_float_exception_flags(&env->fp_status);
     set_float_exception_flags(0, &env->fp_status);
     return old_flags;
 }
 
-static void merge_exception_flags(CPUX86State *env, uint8_t old_flags)
+static void merge_exception_flags(CPUX86State *env, int old_flags)
 {
-    uint8_t new_flags = get_float_exception_flags(&env->fp_status);
+    int new_flags = get_float_exception_flags(&env->fp_status);
     float_raise(old_flags, &env->fp_status);
     fpu_set_exception(env,
                       ((new_flags & float_flag_invalid ? FPUS_IE : 0) |
@@ -220,7 +220,7 @@ static void merge_exception_flags(CPUX86State *env, uint8_t old_flags)
 
 static inline floatx80 helper_fdiv(CPUX86State *env, floatx80 a, floatx80 b)
 {
-    uint8_t old_flags = save_exception_flags(env);
+    int old_flags = save_exception_flags(env);
     floatx80 ret = floatx80_div(a, b, &env->fp_status);
     merge_exception_flags(env, old_flags);
     return ret;
@@ -240,7 +240,7 @@ static void fpu_raise_exception(CPUX86State *env, uintptr_t retaddr)
 
 void helper_flds_FT0(CPUX86State *env, uint32_t val)
 {
-    uint8_t old_flags = save_exception_flags(env);
+    int old_flags = save_exception_flags(env);
     union {
         float32 f;
         uint32_t i;
@@ -253,7 +253,7 @@ void helper_flds_FT0(CPUX86State *env, uint32_t val)
 
 void helper_fldl_FT0(CPUX86State *env, uint64_t val)
 {
-    uint8_t old_flags = save_exception_flags(env);
+    int old_flags = save_exception_flags(env);
     union {
         float64 f;
         uint64_t i;
@@ -271,7 +271,7 @@ void helper_fildl_FT0(CPUX86State *env, int32_t val)
 
 void helper_flds_ST0(CPUX86State *env, uint32_t val)
 {
-    uint8_t old_flags = save_exception_flags(env);
+    int old_flags = save_exception_flags(env);
     int new_fpstt;
     union {
         float32 f;
@@ -288,7 +288,7 @@ void helper_flds_ST0(CPUX86State *env, uint32_t val)
 
 void helper_fldl_ST0(CPUX86State *env, uint64_t val)
 {
-    uint8_t old_flags = save_exception_flags(env);
+    int old_flags = save_exception_flags(env);
     int new_fpstt;
     union {
         float64 f;
@@ -338,7 +338,7 @@ void helper_fildll_ST0(CPUX86State *env, int64_t val)
 
 uint32_t helper_fsts_ST0(CPUX86State *env)
 {
-    uint8_t old_flags = save_exception_flags(env);
+    int old_flags = save_exception_flags(env);
     union {
         float32 f;
         uint32_t i;
@@ -351,7 +351,7 @@ uint32_t helper_fsts_ST0(CPUX86State *env)
 
 uint64_t helper_fstl_ST0(CPUX86State *env)
 {
-    uint8_t old_flags = save_exception_flags(env);
+    int old_flags = save_exception_flags(env);
     union {
         float64 f;
         uint64_t i;
@@ -364,7 +364,7 @@ uint64_t helper_fstl_ST0(CPUX86State *env)
 
 int32_t helper_fist_ST0(CPUX86State *env)
 {
-    uint8_t old_flags = save_exception_flags(env);
+    int old_flags = save_exception_flags(env);
     int32_t val;
 
     val = floatx80_to_int32(ST0, &env->fp_status);
@@ -378,7 +378,7 @@ int32_t helper_fist_ST0(CPUX86State *env)
 
 int32_t helper_fistl_ST0(CPUX86State *env)
 {
-    uint8_t old_flags = save_exception_flags(env);
+    int old_flags = save_exception_flags(env);
     int32_t val;
 
     val = floatx80_to_int32(ST0, &env->fp_status);
@@ -391,7 +391,7 @@ int32_t helper_fistl_ST0(CPUX86State *env)
 
 int64_t helper_fistll_ST0(CPUX86State *env)
 {
-    uint8_t old_flags = save_exception_flags(env);
+    int old_flags = save_exception_flags(env);
     int64_t val;
 
     val = floatx80_to_int64(ST0, &env->fp_status);
@@ -404,7 +404,7 @@ int64_t helper_fistll_ST0(CPUX86State *env)
 
 int32_t helper_fistt_ST0(CPUX86State *env)
 {
-    uint8_t old_flags = save_exception_flags(env);
+    int old_flags = save_exception_flags(env);
     int32_t val;
 
     val = floatx80_to_int32_round_to_zero(ST0, &env->fp_status);
@@ -418,7 +418,7 @@ int32_t helper_fistt_ST0(CPUX86State *env)
 
 int32_t helper_fisttl_ST0(CPUX86State *env)
 {
-    uint8_t old_flags = save_exception_flags(env);
+    int old_flags = save_exception_flags(env);
     int32_t val;
 
     val = floatx80_to_int32_round_to_zero(ST0, &env->fp_status);
@@ -431,7 +431,7 @@ int32_t helper_fisttl_ST0(CPUX86State *env)
 
 int64_t helper_fisttll_ST0(CPUX86State *env)
 {
-    uint8_t old_flags = save_exception_flags(env);
+    int old_flags = save_exception_flags(env);
     int64_t val;
 
     val = floatx80_to_int64_round_to_zero(ST0, &env->fp_status);
@@ -527,7 +527,7 @@ static const int fcom_ccval[4] = {0x0100, 0x4000, 0x0000, 0x4500};
 
 void helper_fcom_ST0_FT0(CPUX86State *env)
 {
-    uint8_t old_flags = save_exception_flags(env);
+    int old_flags = save_exception_flags(env);
     FloatRelation ret;
 
     ret = floatx80_compare(ST0, FT0, &env->fp_status);
@@ -537,7 +537,7 @@ void helper_fcom_ST0_FT0(CPUX86State *env)
 
 void helper_fucom_ST0_FT0(CPUX86State *env)
 {
-    uint8_t old_flags = save_exception_flags(env);
+    int old_flags = save_exception_flags(env);
     FloatRelation ret;
 
     ret = floatx80_compare_quiet(ST0, FT0, &env->fp_status);
@@ -549,7 +549,7 @@ static const int fcomi_ccval[4] = {CC_C, CC_Z, 0, CC_Z | CC_P | CC_C};
 
 void helper_fcomi_ST0_FT0(CPUX86State *env)
 {
-    uint8_t old_flags = save_exception_flags(env);
+    int old_flags = save_exception_flags(env);
     int eflags;
     FloatRelation ret;
 
@@ -562,7 +562,7 @@ void helper_fcomi_ST0_FT0(CPUX86State *env)
 
 void helper_fucomi_ST0_FT0(CPUX86State *env)
 {
-    uint8_t old_flags = save_exception_flags(env);
+    int old_flags = save_exception_flags(env);
     int eflags;
     FloatRelation ret;
 
@@ -575,28 +575,28 @@ void helper_fucomi_ST0_FT0(CPUX86State *env)
 
 void helper_fadd_ST0_FT0(CPUX86State *env)
 {
-    uint8_t old_flags = save_exception_flags(env);
+    int old_flags = save_exception_flags(env);
     ST0 = floatx80_add(ST0, FT0, &env->fp_status);
     merge_exception_flags(env, old_flags);
 }
 
 void helper_fmul_ST0_FT0(CPUX86State *env)
 {
-    uint8_t old_flags = save_exception_flags(env);
+    int old_flags = save_exception_flags(env);
     ST0 = floatx80_mul(ST0, FT0, &env->fp_status);
     merge_exception_flags(env, old_flags);
 }
 
 void helper_fsub_ST0_FT0(CPUX86State *env)
 {
-    uint8_t old_flags = save_exception_flags(env);
+    int old_flags = save_exception_flags(env);
     ST0 = floatx80_sub(ST0, FT0, &env->fp_status);
     merge_exception_flags(env, old_flags);
 }
 
 void helper_fsubr_ST0_FT0(CPUX86State *env)
 {
-    uint8_t old_flags = save_exception_flags(env);
+    int old_flags = save_exception_flags(env);
     ST0 = floatx80_sub(FT0, ST0, &env->fp_status);
     merge_exception_flags(env, old_flags);
 }
@@ -615,28 +615,28 @@ void helper_fdivr_ST0_FT0(CPUX86State *env)
 
 void helper_fadd_STN_ST0(CPUX86State *env, int st_index)
 {
-    uint8_t old_flags = save_exception_flags(env);
+    int old_flags = save_exception_flags(env);
     ST(st_index) = floatx80_add(ST(st_index), ST0, &env->fp_status);
     merge_exception_flags(env, old_flags);
 }
 
 void helper_fmul_STN_ST0(CPUX86State *env, int st_index)
 {
-    uint8_t old_flags = save_exception_flags(env);
+    int old_flags = save_exception_flags(env);
     ST(st_index) = floatx80_mul(ST(st_index), ST0, &env->fp_status);
     merge_exception_flags(env, old_flags);
 }
 
 void helper_fsub_STN_ST0(CPUX86State *env, int st_index)
 {
-    uint8_t old_flags = save_exception_flags(env);
+    int old_flags = save_exception_flags(env);
     ST(st_index) = floatx80_sub(ST(st_index), ST0, &env->fp_status);
     merge_exception_flags(env, old_flags);
 }
 
 void helper_fsubr_STN_ST0(CPUX86State *env, int st_index)
 {
-    uint8_t old_flags = save_exception_flags(env);
+    int old_flags = save_exception_flags(env);
     ST(st_index) = floatx80_sub(ST0, ST(st_index), &env->fp_status);
     merge_exception_flags(env, old_flags);
 }
@@ -861,7 +861,7 @@ void helper_fbld_ST0(CPUX86State *env, target_ulong ptr)
 
 void helper_fbst_ST0(CPUX86State *env, target_ulong ptr)
 {
-    uint8_t old_flags = save_exception_flags(env);
+    int old_flags = save_exception_flags(env);
     int v;
     target_ulong mem_ref, mem_end;
     int64_t val;
@@ -1136,7 +1136,7 @@ static const struct f2xm1_data f2xm1_table[65] = {
 
 void helper_f2xm1(CPUX86State *env)
 {
-    uint8_t old_flags = save_exception_flags(env);
+    int old_flags = save_exception_flags(env);
     uint64_t sig = extractFloatx80Frac(ST0);
     int32_t exp = extractFloatx80Exp(ST0);
     bool sign = extractFloatx80Sign(ST0);
@@ -1369,7 +1369,7 @@ static const struct fpatan_data fpatan_table[9] = {
 
 void helper_fpatan(CPUX86State *env)
 {
-    uint8_t old_flags = save_exception_flags(env);
+    int old_flags = save_exception_flags(env);
     uint64_t arg0_sig = extractFloatx80Frac(ST0);
     int32_t arg0_exp = extractFloatx80Exp(ST0);
     bool arg0_sign = extractFloatx80Sign(ST0);
@@ -1806,7 +1806,7 @@ void helper_fpatan(CPUX86State *env)
 
 void helper_fxtract(CPUX86State *env)
 {
-    uint8_t old_flags = save_exception_flags(env);
+    int old_flags = save_exception_flags(env);
     CPU_LDoubleU temp;
 
     temp.d = ST0;
@@ -1855,7 +1855,7 @@ void helper_fxtract(CPUX86State *env)
 
 static void helper_fprem_common(CPUX86State *env, bool mod)
 {
-    uint8_t old_flags = save_exception_flags(env);
+    int old_flags = save_exception_flags(env);
     uint64_t quotient;
     CPU_LDoubleU temp0, temp1;
     int exp0, exp1, expdiff;
@@ -2050,7 +2050,7 @@ static void helper_fyl2x_common(CPUX86State *env, floatx80 arg, int32_t *exp,
 
 void helper_fyl2xp1(CPUX86State *env)
 {
-    uint8_t old_flags = save_exception_flags(env);
+    int old_flags = save_exception_flags(env);
     uint64_t arg0_sig = extractFloatx80Frac(ST0);
     int32_t arg0_exp = extractFloatx80Exp(ST0);
     bool arg0_sign = extractFloatx80Sign(ST0);
@@ -2148,7 +2148,7 @@ void helper_fyl2xp1(CPUX86State *env)
 
 void helper_fyl2x(CPUX86State *env)
 {
-    uint8_t old_flags = save_exception_flags(env);
+    int old_flags = save_exception_flags(env);
     uint64_t arg0_sig = extractFloatx80Frac(ST0);
     int32_t arg0_exp = extractFloatx80Exp(ST0);
     bool arg0_sign = extractFloatx80Sign(ST0);
@@ -2295,7 +2295,7 @@ void helper_fyl2x(CPUX86State *env)
 
 void helper_fsqrt(CPUX86State *env)
 {
-    uint8_t old_flags = save_exception_flags(env);
+    int old_flags = save_exception_flags(env);
     if (floatx80_is_neg(ST0)) {
         env->fpus &= ~0x4700;  /* (C3,C2,C1,C0) <-- 0000 */
         env->fpus |= 0x400;
@@ -2321,14 +2321,14 @@ void helper_fsincos(CPUX86State *env)
 
 void helper_frndint(CPUX86State *env)
 {
-    uint8_t old_flags = save_exception_flags(env);
+    int old_flags = save_exception_flags(env);
     ST0 = floatx80_round_to_int(ST0, &env->fp_status);
     merge_exception_flags(env, old_flags);
 }
 
 void helper_fscale(CPUX86State *env)
 {
-    uint8_t old_flags = save_exception_flags(env);
+    int old_flags = save_exception_flags(env);
     if (floatx80_invalid_encoding(ST1) || floatx80_invalid_encoding(ST0)) {
         float_raise(float_flag_invalid, &env->fp_status);
         ST0 = floatx80_default_nan(&env->fp_status);
@@ -2366,7 +2366,7 @@ void helper_fscale(CPUX86State *env)
     } else {
         int n;
         FloatX80RoundPrec save = env->fp_status.floatx80_rounding_precision;
-        uint8_t save_flags = get_float_exception_flags(&env->fp_status);
+        int save_flags = get_float_exception_flags(&env->fp_status);
         set_float_exception_flags(0, &env->fp_status);
         n = floatx80_to_int32_round_to_zero(ST1, &env->fp_status);
         set_float_exception_flags(save_flags, &env->fp_status);
@@ -3266,7 +3266,7 @@ void update_mxcsr_status(CPUX86State *env)
 
 void update_mxcsr_from_sse_status(CPUX86State *env)
 {
-    uint8_t flags = get_float_exception_flags(&env->sse_status);
+    int flags = get_float_exception_flags(&env->sse_status);
     /*
      * The MXCSR denormal flag has opposite semantics to
      * float_flag_input_denormal_flushed (the softfloat code sets that flag
-- 
2.34.1
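
Editorial note on the type change in this patch: get_float_exception_flags() returns int, so a uint8_t temporary can only hold flag bits 0..7. The following is a minimal sketch of that truncation, assuming a hypothetical flag at bit 8. The names EXAMPLE_FLAG_HIGH and example_get_flags() are illustrative and are not taken from softfloat.

```c
#include <stdint.h>

/* Hypothetical flag above bit 7; the name and value are illustrative only. */
#define EXAMPLE_FLAG_HIGH 0x100

/* Stand-in for a flags getter that returns int. */
int example_get_flags(void)
{
    return EXAMPLE_FLAG_HIGH | 0x01;
}

/* Saving into a uint8_t keeps only bits 0..7, so the high flag is lost. */
int save_flags_narrow(void)
{
    uint8_t saved = (uint8_t)example_get_flags();
    return saved;
}

/* Saving into an int keeps every flag bit. */
int save_flags_wide(void)
{
    int saved = example_get_flags();
    return saved;
}
```

With these definitions, save_flags_narrow() yields 0x01 while save_flags_wide() yields 0x101, which is the kind of silent loss an int temporary avoids.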




* [PATCH 75/76] target/i386: Wire up MXCSR.DE and FPUS.DE correctly
  2025-01-24 16:27 [PATCH 00/76] target/arm: Implement FEAT_AFP and FEAT_RPRES Peter Maydell
                   ` (73 preceding siblings ...)
  2025-01-24 16:28 ` [PATCH 74/76] target/i386: Use correct type for get_float_exception_flags() values Peter Maydell
@ 2025-01-24 16:28 ` Peter Maydell
  2025-01-26 13:31   ` Richard Henderson
  2025-01-24 16:28 ` [PATCH 76/76] tests/tcg/x86_64/fma: add test for exact-denormal output Peter Maydell
                   ` (2 subsequent siblings)
  77 siblings, 1 reply; 167+ messages in thread
From: Peter Maydell @ 2025-01-24 16:28 UTC (permalink / raw)
  To: qemu-arm, qemu-devel

The x86 DE bit in the FPU and MXCSR status is supposed to be set
when an input denormal is consumed. We didn't previously report
this from softfloat, so the x86 code either simply didn't set
the DE bit or else incorrectly wired it up to denormal_flushed,
depending on which register you looked at.

Now we have input_denormal_used we can wire up these DE bits
with the semantics they are supposed to have.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/i386/tcg/fpu_helper.c | 11 +++--------
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/target/i386/tcg/fpu_helper.c b/target/i386/tcg/fpu_helper.c
index 97b46307d56..fd1cd823e9e 100644
--- a/target/i386/tcg/fpu_helper.c
+++ b/target/i386/tcg/fpu_helper.c
@@ -215,7 +215,7 @@ static void merge_exception_flags(CPUX86State *env, int old_flags)
                        (new_flags & float_flag_overflow ? FPUS_OE : 0) |
                        (new_flags & float_flag_underflow ? FPUS_UE : 0) |
                        (new_flags & float_flag_inexact ? FPUS_PE : 0) |
-                       (new_flags & float_flag_input_denormal_flushed ? FPUS_DE : 0)));
+                       (new_flags & float_flag_input_denormal_used ? FPUS_DE : 0)));
 }
 
 static inline floatx80 helper_fdiv(CPUX86State *env, floatx80 a, floatx80 b)
@@ -3251,6 +3251,7 @@ void update_mxcsr_status(CPUX86State *env)
 
     /* Set exception flags.  */
     set_float_exception_flags((mxcsr & FPUS_IE ? float_flag_invalid : 0) |
+                              (mxcsr & FPUS_DE ? float_flag_input_denormal_used : 0) |
                               (mxcsr & FPUS_ZE ? float_flag_divbyzero : 0) |
                               (mxcsr & FPUS_OE ? float_flag_overflow : 0) |
                               (mxcsr & FPUS_UE ? float_flag_underflow : 0) |
@@ -3267,14 +3268,8 @@ void update_mxcsr_status(CPUX86State *env)
 void update_mxcsr_from_sse_status(CPUX86State *env)
 {
     int flags = get_float_exception_flags(&env->sse_status);
-    /*
-     * The MXCSR denormal flag has opposite semantics to
-     * float_flag_input_denormal_flushed (the softfloat code sets that flag
-     * only when flushing input denormals to zero, but SSE sets it
-     * only when not flushing them to zero), so is not converted
-     * here.
-     */
     env->mxcsr |= ((flags & float_flag_invalid ? FPUS_IE : 0) |
+                   (flags & float_flag_input_denormal_used ? FPUS_DE : 0) |
                    (flags & float_flag_divbyzero ? FPUS_ZE : 0) |
                    (flags & float_flag_overflow ? FPUS_OE : 0) |
                    (flags & float_flag_underflow ? FPUS_UE : 0) |
-- 
2.34.1
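
Editorial note: the wiring described in this patch can be sketched as a pair of conversion functions in which only the "input denormal used" flag maps to DE, while "input denormal flushed" maps to nothing. This is a simplified illustration, not the QEMU code. The X86_* bit positions follow the x86 status flag layout; the EX_FLAG_* names and values are assumptions made for the sketch.

```c
#include <stdint.h>

/* x86 exception status bits (low six bits of MXCSR / x87 status word). */
#define X86_IE (1u << 0) /* Invalid */
#define X86_DE (1u << 1) /* Denormal operand */
#define X86_ZE (1u << 2) /* Divide by zero */
#define X86_OE (1u << 3) /* Overflow */
#define X86_UE (1u << 4) /* Underflow */
#define X86_PE (1u << 5) /* Precision (inexact) */

/* Illustrative softfloat-style flags; names and values are assumptions. */
enum {
    EX_FLAG_INVALID                = 1 << 0,
    EX_FLAG_DIVBYZERO              = 1 << 1,
    EX_FLAG_OVERFLOW               = 1 << 2,
    EX_FLAG_UNDERFLOW              = 1 << 3,
    EX_FLAG_INEXACT                = 1 << 4,
    EX_FLAG_INPUT_DENORMAL_FLUSHED = 1 << 5,
    EX_FLAG_INPUT_DENORMAL_USED    = 1 << 6,
};

/* Emulator flags -> guest status bits. "Flushed" deliberately maps to 0. */
uint32_t ex_flags_to_x86(int flags)
{
    return (flags & EX_FLAG_INVALID ? X86_IE : 0) |
           (flags & EX_FLAG_INPUT_DENORMAL_USED ? X86_DE : 0) |
           (flags & EX_FLAG_DIVBYZERO ? X86_ZE : 0) |
           (flags & EX_FLAG_OVERFLOW ? X86_OE : 0) |
           (flags & EX_FLAG_UNDERFLOW ? X86_UE : 0) |
           (flags & EX_FLAG_INEXACT ? X86_PE : 0);
}

/* Guest status bits -> emulator flags, the reverse direction. */
int x86_to_ex_flags(uint32_t status)
{
    return (status & X86_IE ? EX_FLAG_INVALID : 0) |
           (status & X86_DE ? EX_FLAG_INPUT_DENORMAL_USED : 0) |
           (status & X86_ZE ? EX_FLAG_DIVBYZERO : 0) |
           (status & X86_OE ? EX_FLAG_OVERFLOW : 0) |
           (status & X86_UE ? EX_FLAG_UNDERFLOW : 0) |
           (status & X86_PE ? EX_FLAG_INEXACT : 0);
}
```

Keeping the two directions symmetric means a guest write of DE survives a round trip through the emulator's flag word, which matches the intent of adding the DE line to both update_mxcsr_status() and update_mxcsr_from_sse_status().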




* [PATCH 76/76] tests/tcg/x86_64/fma: add test for exact-denormal output
  2025-01-24 16:27 [PATCH 00/76] target/arm: Implement FEAT_AFP and FEAT_RPRES Peter Maydell
                   ` (74 preceding siblings ...)
  2025-01-24 16:28 ` [PATCH 75/76] target/i386: Wire up MXCSR.DE and FPUS.DE correctly Peter Maydell
@ 2025-01-24 16:28 ` Peter Maydell
  2025-01-26 13:32   ` Richard Henderson
  2025-01-24 16:35 ` [PATCH 00/76] target/arm: Implement FEAT_AFP and FEAT_RPRES Peter Maydell
  2025-01-28 13:23 ` Peter Maydell
  77 siblings, 1 reply; 167+ messages in thread
From: Peter Maydell @ 2025-01-24 16:28 UTC (permalink / raw)
  To: qemu-arm, qemu-devel

Add some fma test cases that check for correct handling of FTZ and
for the flag that indicates that the input denormal was consumed.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 tests/tcg/x86_64/fma.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/tests/tcg/x86_64/fma.c b/tests/tcg/x86_64/fma.c
index 46f863005ed..34219614c0a 100644
--- a/tests/tcg/x86_64/fma.c
+++ b/tests/tcg/x86_64/fma.c
@@ -82,6 +82,18 @@ static testdata tests[] = {
      */
     { 0x3fdfffffffffffff, 0x001fffffffffffff, 0x801fffffffffffff, true,
       0x8010000000000000, 0x20 }, /* Enabling FTZ shouldn't change flags */
+    /*
+     * normal * 0 + a denormal. With FTZ disabled this gives an exact
+     * result (equal to the input denormal) that has consumed the denormal.
+     */
+    { 0x3cc8000000000000, 0x0000000000000000, 0x8008000000000000, false,
+      0x8008000000000000, 0x2 }, /* Denormal */
+    /*
+     * With FTZ enabled, this consumes the denormal, returns zero (because
+     * flushed) and indicates also Underflow and Precision.
+     */
+    { 0x3cc8000000000000, 0x0000000000000000, 0x8008000000000000, true,
+      0x8000000000000000, 0x32 }, /* Precision, Underflow, Denormal */
 };
 
 int main(void)
-- 
2.34.1
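
Editorial note: the hex constants in the new test vectors are raw IEEE 754 double bit patterns, and the expected flag values are the low six bits of MXCSR. A small helper along these lines can be used to sanity-check them by hand. It is a sketch only; the F64Class type, f64_classify() and the MXCSR_* macro names are made up for this example.

```c
#include <stdint.h>

/* MXCSR exception flag bits that appear in the test expectations. */
#define MXCSR_DE 0x02u /* Denormal operand */
#define MXCSR_UE 0x10u /* Underflow */
#define MXCSR_PE 0x20u /* Precision (inexact) */

typedef enum {
    F64_ZERO,
    F64_DENORMAL,
    F64_NORMAL,
    F64_INF,
    F64_NAN,
} F64Class;

/* Classify a double from its bit pattern: 1 sign, 11 exponent, 52 fraction bits. */
F64Class f64_classify(uint64_t bits)
{
    uint64_t exp = (bits >> 52) & 0x7ff;
    uint64_t frac = bits & ((UINT64_C(1) << 52) - 1);

    if (exp == 0) {
        return frac == 0 ? F64_ZERO : F64_DENORMAL;
    }
    if (exp == 0x7ff) {
        return frac == 0 ? F64_INF : F64_NAN;
    }
    return F64_NORMAL;
}
```

Applied to the vectors above, 0x3cc8000000000000 classifies as normal, 0x8008000000000000 as a (negative) denormal and 0x8000000000000000 as (negative) zero, while the expected flags 0x2 and 0x32 decompose as DE alone and PE | UE | DE respectively.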




* Re: [PATCH 00/76] target/arm: Implement FEAT_AFP and FEAT_RPRES
  2025-01-24 16:27 [PATCH 00/76] target/arm: Implement FEAT_AFP and FEAT_RPRES Peter Maydell
                   ` (75 preceding siblings ...)
  2025-01-24 16:28 ` [PATCH 76/76] tests/tcg/x86_64/fma: add test for exact-denormal output Peter Maydell
@ 2025-01-24 16:35 ` Peter Maydell
  2025-01-28 13:23 ` Peter Maydell
  77 siblings, 0 replies; 167+ messages in thread
From: Peter Maydell @ 2025-01-24 16:35 UTC (permalink / raw)
  To: qemu-arm, qemu-devel

On Fri, 24 Jan 2025 at 16:28, Peter Maydell <peter.maydell@linaro.org> wrote:
>
> This patchset implements emulation of the Arm FEAT_AFP and FEAT_RPRES
> extensions, which are floating-point related. It's based on the
> small i386 bugfix series I sent out a while back:
>
> Based-on: 20250116112536.4117889-1-peter.maydell@linaro.org
> ("target/i386: Fix 0 * Inf + QNaN regression")


> I also have some patches which make target/i386 use the "detect
> flush to zero after rounding" and "report when input denormal is
> consumed" softfloat features added here; I don't include them in
> this patchset (though you can find them in that git branch I
> mentioned earlier) because I haven't done as much testing on the
> i386 side and in any case this patchset is already pretty long.
> I expect I'll send them out when this series has been merged.

...having said which, I was so eager to get the series out
once I'd finished the last test run that I forgot that I
didn't intend to send out the first two or the last four
patches in this series; whoops. Feel free to ignore them.

(The patch numbering in the explanation of the series structure
in the cover letter is all going to be off by 2 as a result,
as well. This doesn't seem worth resending a monster patchset
just to fix, though.)

thanks
-- PMM



* Re: [PATCH 02/76] tests/tcg/x86_64/fma: Test some x86 fused-multiply-add cases
  2025-01-24 16:27 ` [PATCH 02/76] tests/tcg/x86_64/fma: Test some x86 fused-multiply-add cases Peter Maydell
@ 2025-01-24 17:15   ` Alex Bennée
  2025-01-27  9:54     ` Peter Maydell
  0 siblings, 1 reply; 167+ messages in thread
From: Alex Bennée @ 2025-01-24 17:15 UTC (permalink / raw)
  To: Peter Maydell; +Cc: qemu-arm, qemu-devel

Peter Maydell <peter.maydell@linaro.org> writes:

> Add a test case which tests some corner case behaviour of
> fused-multiply-add on x86:
>  * 0 * Inf + SNaN should raise Invalid
>  * 0 * Inf + QNaN should not raise Invalid
>  * tininess should be detected after rounding
>
> There is also one currently-disabled test case:
>  * flush-to-zero should be done after rounding
>
> This is disabled because QEMU's emulation currently does this
> incorrectly (and so would fail the test).  The test case is kept in
> but disabled, as the justification for why the test running harness
> has support for testing both with and without FTZ set.
>
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>  tests/tcg/x86_64/fma.c           | 109 +++++++++++++++++++++++++++++++
>  tests/tcg/x86_64/Makefile.target |   1 +
>  2 files changed, 110 insertions(+)
>  create mode 100644 tests/tcg/x86_64/fma.c
>
> diff --git a/tests/tcg/x86_64/fma.c b/tests/tcg/x86_64/fma.c
> new file mode 100644
> index 00000000000..09c622ebc00
> --- /dev/null
> +++ b/tests/tcg/x86_64/fma.c
> @@ -0,0 +1,109 @@
> +/*
> + * Test some fused multiply add corner cases.
> + *
> + * SPDX-License-Identifier: GPL-2.0-or-later
> + */
> +#include <stdio.h>
> +#include <stdint.h>
> +#include <stdbool.h>
> +#include <inttypes.h>
> +
> +#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))
> +
> +/*
> + * Perform one "n * m + a" operation using the vfmadd insn and return
> + * the result; on return *mxcsr_p is set to the bottom 6 bits of MXCSR
> + * (the Flag bits). If ftz is true then we set MXCSR.FTZ while doing
> + * the operation.
> + * We print the operation and its results to stdout.
> + */
> +static uint64_t do_fmadd(uint64_t n, uint64_t m, uint64_t a,
> +                         bool ftz, uint32_t *mxcsr_p)
> +{
> +    uint64_t r;
> +    uint32_t mxcsr = 0;
> +    uint32_t ftz_bit = ftz ? (1 << 15) : 0;
> +    uint32_t saved_mxcsr = 0;
> +
> +    asm volatile("stmxcsr %[saved_mxcsr]\n"
> +                 "stmxcsr %[mxcsr]\n"
> +                 "andl $0xffff7fc0, %[mxcsr]\n"
> +                 "orl %[ftz_bit], %[mxcsr]\n"
> +                 "ldmxcsr %[mxcsr]\n"
> +                 "movq %[a], %%xmm0\n"
> +                 "movq %[m], %%xmm1\n"
> +                 "movq %[n], %%xmm2\n"
> +                 /* xmm0 = xmm0 + xmm2 * xmm1 */
> +                 "vfmadd231sd %%xmm1, %%xmm2, %%xmm0\n"
> +                 "movq %%xmm0, %[r]\n"
> +                 "stmxcsr %[mxcsr]\n"
> +                 "ldmxcsr %[saved_mxcsr]\n"
> +                 : [r] "=r" (r), [mxcsr] "=m" (mxcsr),
> +                   [saved_mxcsr] "=m" (saved_mxcsr)
> +                 : [n] "r" (n), [m] "r" (m), [a] "r" (a),
> +                   [ftz_bit] "r" (ftz_bit)
> +                 : "xmm0", "xmm1", "xmm2");
> +    *mxcsr_p = mxcsr & 0x3f;
> +    printf("vfmadd231sd 0x%" PRIx64 " 0x%" PRIx64 " 0x%" PRIx64
> +           " = 0x%" PRIx64 " MXCSR flags 0x%" PRIx32 "\n",
> +           n, m, a, r, *mxcsr_p);
> +    return r;
> +}
> +
> +typedef struct testdata {
> +    /* Input n, m, a */
> +    uint64_t n;
> +    uint64_t m;
> +    uint64_t a;
> +    bool ftz;
> +    /* Expected result */
> +    uint64_t expected_r;
> +    /* Expected low 6 bits of MXCSR (the Flag bits) */
> +    uint32_t expected_mxcsr;
> +} testdata;
> +
> +static testdata tests[] = {
> +    { 0, 0x7ff0000000000000, 0x7ff000000000aaaa, false, /* 0 * Inf + SNaN */
> +      0x7ff800000000aaaa, 1 }, /* Should be QNaN and does raise Invalid */
> +    { 0, 0x7ff0000000000000, 0x7ff800000000aaaa, false, /* 0 * Inf + QNaN */
> +      0x7ff800000000aaaa, 0 }, /* Should be QNaN and does *not* raise Invalid */
> +    /*
> +     * These inputs give a result which is tiny before rounding but which
> +     * becomes non-tiny after rounding. x86 is a "detect tininess after
> +     * rounding" architecture, so it should give a non-denormal result and
> +     * not set the Underflow flag (only the Precision flag for an inexact
> +     * result).
> +     */
> +    { 0x3fdfffffffffffff, 0x001fffffffffffff, 0x801fffffffffffff, false,
> +      0x8010000000000000, 0x20 },
> +    /*
> +     * Flushing of denormal outputs to zero should also happen after
> +     * rounding, so setting FTZ should not affect the result or the flags.
> +     * QEMU currently does not emulate this correctly because we do the
> +     * flush-to-zero check before rounding, so we incorrectly produce a
> +     * zero result and set Underflow as well as Precision.
> +     */
> +#ifdef ENABLE_FAILING_TESTS
> +    { 0x3fdfffffffffffff, 0x001fffffffffffff, 0x801fffffffffffff, true,
> +      0x8010000000000000, 0x20 }, /* Enabling FTZ shouldn't change flags */
> +#endif

We could extend the multiarch/float_madds test to handle doubles as well
(or create a new multiarch test).

If the right FMA instructions can be forced via cflags you could then
add a specific binary to exercise the vfmadd231sd instruction like we do
for neon:

  float_madds: CFLAGS+=-mfpu=neon-vfpv4

The test does basic testing but ideally you would add a reference output
to check against.

> +};
> +
> +int main(void)
> +{
> +    bool passed = true;
> +    for (int i = 0; i < ARRAY_SIZE(tests); i++) {
> +        uint32_t mxcsr;
> +        uint64_t r = do_fmadd(tests[i].n, tests[i].m, tests[i].a,
> +                              tests[i].ftz, &mxcsr);
> +        if (r != tests[i].expected_r) {
> +            printf("expected result 0x%" PRIx64 "\n", tests[i].expected_r);
> +            passed = false;
> +        }
> +        if (mxcsr != tests[i].expected_mxcsr) {
> +            printf("expected MXCSR flags 0x%x\n", tests[i].expected_mxcsr);
> +            passed = false;
> +        }
> +    }
> +    return passed ? 0 : 1;
> +}
> diff --git a/tests/tcg/x86_64/Makefile.target b/tests/tcg/x86_64/Makefile.target
> index d6dff559c7d..be20fc64e88 100644
> --- a/tests/tcg/x86_64/Makefile.target
> +++ b/tests/tcg/x86_64/Makefile.target
> @@ -18,6 +18,7 @@ X86_64_TESTS += adox
>  X86_64_TESTS += test-1648
>  X86_64_TESTS += test-2175
>  X86_64_TESTS += cross-modifying-code
> +X86_64_TESTS += fma
>  TESTS=$(MULTIARCH_TESTS) $(X86_64_TESTS) test-x86_64
>  else
>  TESTS=$(MULTIARCH_TESTS)

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro



* Re: [PATCH 03/76] target/arm: arm_reset_sve_state() should set FPSR, not FPCR
  2025-01-24 16:27 ` [PATCH 03/76] target/arm: arm_reset_sve_state() should set FPSR, not FPCR Peter Maydell
@ 2025-01-25 15:07   ` Richard Henderson
  0 siblings, 0 replies; 167+ messages in thread
From: Richard Henderson @ 2025-01-25 15:07 UTC (permalink / raw)
  To: qemu-devel

On 1/24/25 08:27, Peter Maydell wrote:
> The pseudocode ResetSVEState() does:
>      FPSR = ZeroExtend(0x0800009f<31:0>, 64);
> but QEMU's arm_reset_sve_state() called vfp_set_fpcr() by accident.
> 
> Before the advent of FEAT_AFP, this was only setting a collection of
> RES0 bits, which vfp_set_fpcr() would then ignore, so the only effect
> was that we didn't actually set the FPSR the way we are supposed to
> do.  Once FEAT_AFP is implemented, setting the bottom bits of FPCR
> will change the floating point behaviour.
> 
> Call vfp_set_fpsr(), as we ought to.
> 
> (Note for stable backports: commit 7f2a01e7368f9 moved this function
> from sme_helper.c to helper.c, but it had the same bug before the
> move too.)
> 
> Cc: qemu-stable@nongnu.org
> Fixes: f84734b87461 ("target/arm: Implement SMSTART, SMSTOP")
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>   target/arm/helper.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/target/arm/helper.c b/target/arm/helper.c
> index 63997678513..40bdfc851a5 100644
> --- a/target/arm/helper.c
> +++ b/target/arm/helper.c
> @@ -6413,7 +6413,7 @@ static void arm_reset_sve_state(CPUARMState *env)
>       memset(env->vfp.zregs, 0, sizeof(env->vfp.zregs));
>       /* Recall that FFR is stored as pregs[16]. */
>       memset(env->vfp.pregs, 0, sizeof(env->vfp.pregs));
> -    vfp_set_fpcr(env, 0x0800009f);
> +    vfp_set_fpsr(env, 0x0800009f);
>   }
>   
>   void aarch64_set_svcr(CPUARMState *env, uint64_t new, uint64_t mask)

Oops.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~



* Re: [PATCH 04/76] target/arm: Use FPSR_ constants in vfp_exceptbits_from_host()
  2025-01-24 16:27 ` [PATCH 04/76] target/arm: Use FPSR_ constants in vfp_exceptbits_from_host() Peter Maydell
@ 2025-01-25 15:07   ` Richard Henderson
  0 siblings, 0 replies; 167+ messages in thread
From: Richard Henderson @ 2025-01-25 15:07 UTC (permalink / raw)
  To: qemu-devel

On 1/24/25 08:27, Peter Maydell wrote:
> Use the FPSR_ named constants in vfp_exceptbits_from_host(),
> rather than hardcoded magic numbers.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>   target/arm/vfp_helper.c | 12 ++++++------
>   1 file changed, 6 insertions(+), 6 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~



* Re: [PATCH 05/76] target/arm: Use uint32_t in vfp_exceptbits_from_host()
  2025-01-24 16:27 ` [PATCH 05/76] target/arm: Use uint32_t " Peter Maydell
@ 2025-01-25 15:08   ` Richard Henderson
  0 siblings, 0 replies; 167+ messages in thread
From: Richard Henderson @ 2025-01-25 15:08 UTC (permalink / raw)
  To: qemu-devel

On 1/24/25 08:27, Peter Maydell wrote:
> In vfp_exceptbits_from_host(), we accumulate the FPSR flags in
> an "int", and our return type is also "int". However, the only
> callsite returns the same information as a uint32_t, and
> more generally we handle FPSR values in the code as uint32_t,
> not int. Bring this function into line with that convention.
> 
> There is no behaviour change because none of the FPSR bits
> we set in this function are bit 31. The input argument to
> the function remains 'int' because that is the return type
> of the softfloat get_float_exception_flags().
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>   target/arm/vfp_helper.c | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
> index fcc9e5d382e..afc41420eb1 100644
> --- a/target/arm/vfp_helper.c
> +++ b/target/arm/vfp_helper.c
> @@ -34,9 +34,9 @@
>   #ifdef CONFIG_TCG
>   
>   /* Convert host exception flags to vfp form.  */
> -static inline int vfp_exceptbits_from_host(int host_bits)
> +static inline uint32_t vfp_exceptbits_from_host(int host_bits)
>   {
> -    int target_bits = 0;
> +    uint32_t target_bits = 0;
>   
>       if (host_bits & float_flag_invalid) {
>           target_bits |= FPSR_IOC;

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~
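
Editorial note: the "no behaviour change" argument in the quoted commit message rests on bit 31 never being set. A signed 32-bit value with bit 31 set sign-extends when widened to 64 bits, whereas an unsigned one zero-extends. A minimal sketch of the difference, using made-up helper names:

```c
#include <stdint.h>

/* Widening a signed 32-bit value: negative values sign-extend. */
uint64_t widen_signed(int32_t v)
{
    return (uint64_t)v;
}

/* Widening an unsigned 32-bit value: always zero-extends. */
uint64_t widen_unsigned(uint32_t v)
{
    return v;
}
```

For any value with bit 31 clear the two functions agree, so the type change is invisible. With bit 31 set, widen_signed(INT32_MIN) gives 0xffffffff80000000 while widen_unsigned(0x80000000) gives 0x80000000, which is why uint32_t is the safer convention for register-shaped values.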



* Re: [PATCH 06/76] target/arm: Define new fp_status_a32 and fp_status_a64
  2025-01-24 16:27 ` [PATCH 06/76] target/arm: Define new fp_status_a32 and fp_status_a64 Peter Maydell
@ 2025-01-25 15:12   ` Richard Henderson
  2025-01-27  4:59   ` Richard Henderson
  1 sibling, 0 replies; 167+ messages in thread
From: Richard Henderson @ 2025-01-25 15:12 UTC (permalink / raw)
  To: qemu-devel

On 1/24/25 08:27, Peter Maydell wrote:
> We want to split the existing fp_status in the Arm CPUState into
> separate float_status fields for AArch32 and AArch64.  (This is
> because new control bits defined by FEAT_AFP only have an effect for
> AArch64, not AArch32.) To make this split we will:
>   * define new fp_status_a32 and fp_status_a64 which have
>     identical behaviour to the existing fp_status
>   * move existing uses of fp_status to fp_status_a32 or
>     fp_status_a64 as appropriate
>   * delete the old fp_status when it has no uses left
> 
> In this patch we add the new float_status fields.
> 
> We will also need to split fp_status_f16, but we will do that
> as a separate series of patches.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>   target/arm/cpu.h           |  4 ++++
>   target/arm/tcg/translate.h | 12 ++++++++++++
>   target/arm/cpu.c           |  2 ++
>   target/arm/vfp_helper.c    | 12 ++++++++++++
>   4 files changed, 30 insertions(+)

Hmm.  My mistake for being slow about posting SME2 patches.
I converted fp_status* to an array indexed by ARMFPStatusFlavour.

> @@ -702,6 +708,12 @@ static inline TCGv_ptr fpstatus_ptr(ARMFPStatusFlavour flavour)
>       case FPST_FPCR:
>           offset = offsetof(CPUARMState, vfp.fp_status);
>           break;
> +    case FPST_FPCR_A32:
> +        offset = offsetof(CPUARMState, vfp.fp_status_a32);
> +        break;
> +    case FPST_FPCR_A64:
> +        offset = offsetof(CPUARMState, vfp.fp_status_a64);
> +        break;
>       case FPST_FPCR_F16:
>           offset = offsetof(CPUARMState, vfp.fp_status_f16);
>           break;

... which eliminated this.

Anyway,
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~
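
Editorial note: the array-based layout mentioned in this reply is not shown in the thread. The following is a rough sketch of the general idea only, with entirely hypothetical type and field names (ExFloatStatus, ExFlavour, ExVfpState). It shows how indexing an array by a flavour enum replaces a per-flavour switch over offsetof() expressions.

```c
#include <stddef.h>

/* Hypothetical stand-in for a float_status structure. */
typedef struct {
    int exception_flags;
    int rounding_mode;
} ExFloatStatus;

/* Hypothetical flavour enum; the last entry gives the array size. */
typedef enum {
    EX_FPST_A32,
    EX_FPST_A64,
    EX_FPST_F16,
    EX_FPST_COUNT,
} ExFlavour;

/* One status per flavour, stored contiguously. */
typedef struct {
    ExFloatStatus fp_status[EX_FPST_COUNT];
} ExVfpState;

/* Offset of a flavour's status: plain arithmetic, no switch needed. */
size_t ex_status_offset(ExFlavour flavour)
{
    return offsetof(ExVfpState, fp_status) +
           (size_t)flavour * sizeof(ExFloatStatus);
}
```

Adding a new flavour then only requires a new enum entry before EX_FPST_COUNT, rather than a new struct field plus a new switch case.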




* Re: [PATCH 07/76] target/arm: Use vfp.fp_status_a64 in A64-only helper functions
  2025-01-24 16:27 ` [PATCH 07/76] target/arm: Use vfp.fp_status_a64 in A64-only helper functions Peter Maydell
@ 2025-01-25 15:15   ` Richard Henderson
  2025-01-28 12:35     ` Peter Maydell
  0 siblings, 1 reply; 167+ messages in thread
From: Richard Henderson @ 2025-01-25 15:15 UTC (permalink / raw)
  To: qemu-devel

On 1/24/25 08:27, Peter Maydell wrote:
> @@ -2808,7 +2808,7 @@ bool is_ebf(CPUARMState *env, float_status *statusp, float_status *oddstatusp)
>        */
>       bool ebf = is_a64(env) && env->vfp.fpcr & FPCR_EBF;
>   
> -    *statusp = env->vfp.fp_status;
> +    *statusp = env->vfp.fp_status_a64;
>       set_default_nan_mode(true, statusp);
>   
>       if (ebf) {

Is this really correct?  !ebf includes aa32.


r~



* Re: [PATCH 08/76] target/arm: Use fp_status_a32 in vjcvt helper
  2025-01-24 16:27 ` [PATCH 08/76] target/arm: Use fp_status_a32 in vjcvt helper Peter Maydell
@ 2025-01-25 15:16   ` Richard Henderson
  0 siblings, 0 replies; 167+ messages in thread
From: Richard Henderson @ 2025-01-25 15:16 UTC (permalink / raw)
  To: qemu-devel

On 1/24/25 08:27, Peter Maydell wrote:
> Use fp_status_a32 in the vjcvt helper function; this is called only
> from the A32/T32 decoder and is not used inside a
> set_rmode/restore_rmode sequence.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>   target/arm/vfp_helper.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
> index 7475f97e0ce..0671ba3a88b 100644
> --- a/target/arm/vfp_helper.c
> +++ b/target/arm/vfp_helper.c
> @@ -1144,7 +1144,7 @@ uint64_t HELPER(fjcvtzs)(float64 value, float_status *status)
>   
>   uint32_t HELPER(vjcvt)(float64 value, CPUARMState *env)
>   {
> -    uint64_t pair = HELPER(fjcvtzs)(value, &env->vfp.fp_status);
> +    uint64_t pair = HELPER(fjcvtzs)(value, &env->vfp.fp_status_a32);
>       uint32_t result = pair;
>       uint32_t z = (pair >> 32) == 0;
>   

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~



* Re: [PATCH 09/76] target/arm: Use fp_status_a32 in vfp_cmp helpers
  2025-01-24 16:27 ` [PATCH 09/76] target/arm: Use fp_status_a32 in vfp_cmp helpers Peter Maydell
@ 2025-01-25 15:18   ` Richard Henderson
  0 siblings, 0 replies; 167+ messages in thread
From: Richard Henderson @ 2025-01-25 15:18 UTC (permalink / raw)
  To: qemu-devel

On 1/24/25 08:27, Peter Maydell wrote:
> The helpers vfp_cmps, vfp_cmpes, vfp_cmpd, vfp_cmped are used only from
> the A32 decoder; the A64 decoder uses separate vfp_cmps_a64 etc helpers
> (because for A64 we update the main NZCV flags and for A32 we update
> the FPSCR NZCV flags). So we can make these helpers use the fp_status_a32
> field instead of fp_status.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
> We could in theory make A32 use the a64 helpers and do the setting
> of vfp.fpsr NZCV in the generated code from the helper return value,
> but it doesn't seem worthwhile to me.
> ---
>   target/arm/vfp_helper.c | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
> index 0671ba3a88b..034f26e5daa 100644
> --- a/target/arm/vfp_helper.c
> +++ b/target/arm/vfp_helper.c
> @@ -373,8 +373,8 @@ void VFP_HELPER(cmpe, P)(ARGTYPE a, ARGTYPE b, CPUARMState *env) \
>           FLOATTYPE ## _compare(a, b, &env->vfp.FPST)); \
>   }
>   DO_VFP_cmp(h, float16, dh_ctype_f16, fp_status_f16)
> -DO_VFP_cmp(s, float32, float32, fp_status)
> -DO_VFP_cmp(d, float64, float64, fp_status)
> +DO_VFP_cmp(s, float32, float32, fp_status_a32)
> +DO_VFP_cmp(d, float64, float64, fp_status_a32)
>   #undef DO_VFP_cmp
>   
>   /* Integer to float and float to integer conversions */

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~



* Re: [PATCH 10/76] target/arm: Use FPST_FPCR_A32 in A32 decoder
  2025-01-24 16:27 ` [PATCH 10/76] target/arm: Use FPST_FPCR_A32 in A32 decoder Peter Maydell
@ 2025-01-25 15:18   ` Richard Henderson
  0 siblings, 0 replies; 167+ messages in thread
From: Richard Henderson @ 2025-01-25 15:18 UTC (permalink / raw)
  To: qemu-devel

On 1/24/25 08:27, Peter Maydell wrote:
> In the A32 decoder, use FPST_FPCR_A32 rather than FPST_FPCR.  By
> doing an automated conversion of the whole file we avoid possibly
> using more than one fpst value in a set_rmode/op/restore_rmode
> sequence.
> 
> Patch created with
>    perl -p -i -e 's/FPST_FPCR(?!_)/FPST_FPCR_A32/g' target/arm/tcg/translate-vfp.c
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>   target/arm/tcg/translate-vfp.c | 54 +++++++++++++++++-----------------
>   1 file changed, 27 insertions(+), 27 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~
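
For illustration only: the negative lookahead (?!_) in the perl
one-liner is what stops the conversion from touching longer names such
as FPST_FPCR_F16. A rough C sketch of the same matching rule follows;
rename_token is a made-up name, and this is not how the patch was
generated.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy 'src' to 'dst', replacing each occurrence of 'from' with 'to'
 * unless the occurrence is immediately followed by '_'. This mimics
 * the perl pattern s/FROM(?!_)/TO/g. Output is truncated if 'dst'
 * (of size 'dstlen', which must be at least 1) is too small.
 * Returns the length of the string written to 'dst'.
 */
static size_t rename_token(const char *src, const char *from,
                           const char *to, char *dst, size_t dstlen)
{
    size_t flen = strlen(from), tlen = strlen(to), out = 0;

    while (*src) {
        if (strncmp(src, from, flen) == 0 && src[flen] != '_') {
            if (out + tlen >= dstlen) {
                break;
            }
            memcpy(dst + out, to, tlen);
            out += tlen;
            src += flen;
        } else {
            if (out + 1 >= dstlen) {
                break;
            }
            dst[out++] = *src++;
        }
    }
    dst[out] = '\0';
    return out;
}
```

Running it over "get(FPST_FPCR); get(FPST_FPCR_F16);" with from
"FPST_FPCR" and to "FPST_FPCR_A32" rewrites only the first name.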



* Re: [PATCH 11/76] target/arm: Use FPST_FPCR_A64 in A64 decoder
  2025-01-24 16:27 ` [PATCH 11/76] target/arm: Use FPST_FPCR_A64 in A64 decoder Peter Maydell
@ 2025-01-25 15:19   ` Richard Henderson
  0 siblings, 0 replies; 167+ messages in thread
From: Richard Henderson @ 2025-01-25 15:19 UTC (permalink / raw)
  To: qemu-devel

On 1/24/25 08:27, Peter Maydell wrote:
> In the A64 decoder, use FPST_FPCR_A64 rather than FPST_FPCR.  By
> doing an automated conversion of the whole file we avoid possibly
> using more than one fpst value in a set_rmode/op/restore_rmode
> sequence.
> 
> Patch created with
> 
>    perl -p -i -e 's/FPST_FPCR(?!_)/FPST_FPCR_A64/g' target/arm/tcg/translate-{a64,sve,sme}.c
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>   target/arm/tcg/translate-a64.c |  70 +++++++++++-----------
>   target/arm/tcg/translate-sme.c |   4 +-
>   target/arm/tcg/translate-sve.c | 106 ++++++++++++++++-----------------
>   3 files changed, 90 insertions(+), 90 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~



* Re: [PATCH 12/76] target/arm: Remove now-unused vfp.fp_status and FPST_FPCR
  2025-01-24 16:27 ` [PATCH 12/76] target/arm: Remove now-unused vfp.fp_status and FPST_FPCR Peter Maydell
@ 2025-01-25 15:20   ` Richard Henderson
  0 siblings, 0 replies; 167+ messages in thread
From: Richard Henderson @ 2025-01-25 15:20 UTC (permalink / raw)
  To: qemu-devel

On 1/24/25 08:27, Peter Maydell wrote:
> Now we have moved all the uses of vfp.fp_status and FPST_FPCR
> to either the A32 or A64 fields, we can remove these.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>   target/arm/cpu.h           | 2 --
>   target/arm/tcg/translate.h | 6 ------
>   target/arm/cpu.c           | 1 -
>   target/arm/vfp_helper.c    | 8 +-------
>   4 files changed, 1 insertion(+), 16 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~



* Re: [PATCH 13/76] target/arm: Define new fp_status_f16_a32 and fp_status_f16_a64
  2025-01-24 16:27 ` [PATCH 13/76] target/arm: Define new fp_status_f16_a32 and fp_status_f16_a64 Peter Maydell
@ 2025-01-25 15:21   ` Richard Henderson
  2025-01-27  5:00   ` Richard Henderson
  1 sibling, 0 replies; 167+ messages in thread
From: Richard Henderson @ 2025-01-25 15:21 UTC (permalink / raw)
  To: qemu-devel

On 1/24/25 08:27, Peter Maydell wrote:
> As the first part of splitting the existing fp_status_f16
> into separate float_status fields for AArch32 and AArch64
> (so that we can make FEAT_AFP control bits apply only
> for AArch64), define the two new fp_status_f16_a32 and
> fp_status_f16_a64 fields, but don't use them yet.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>   target/arm/cpu.h           |  4 ++++
>   target/arm/tcg/translate.h | 12 ++++++++++++
>   target/arm/cpu.c           |  2 ++
>   target/arm/vfp_helper.c    | 14 ++++++++++++++
>   4 files changed, 32 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~



* Re: [PATCH 14/76] target/arm: Use fp_status_f16_a32 in AArch32-only helpers
  2025-01-24 16:27 ` [PATCH 14/76] target/arm: Use fp_status_f16_a32 in AArch32-only helpers Peter Maydell
@ 2025-01-25 15:21   ` Richard Henderson
  0 siblings, 0 replies; 167+ messages in thread
From: Richard Henderson @ 2025-01-25 15:21 UTC (permalink / raw)
  To: qemu-devel

On 1/24/25 08:27, Peter Maydell wrote:
> We directly use fp_status_f16 in a handful of helpers that
> are AArch32-specific; switch to fp_status_f16_a32 for these.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>   target/arm/tcg/vec_helper.c | 4 ++--
>   target/arm/vfp_helper.c     | 2 +-
>   2 files changed, 3 insertions(+), 3 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~



* Re: [PATCH 15/76] target/arm: Use fp_status_f16_a64 in AArch64-only helpers
  2025-01-24 16:27 ` [PATCH 15/76] target/arm: Use fp_status_f16_a64 in AArch64-only helpers Peter Maydell
@ 2025-01-25 15:22   ` Richard Henderson
  0 siblings, 0 replies; 167+ messages in thread
From: Richard Henderson @ 2025-01-25 15:22 UTC (permalink / raw)
  To: qemu-devel

On 1/24/25 08:27, Peter Maydell wrote:
> We directly use fp_status_f16 in a handful of helpers that are
> AArch64-specific; switch to fp_status_f16_a64 for these.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>   target/arm/tcg/sme_helper.c | 4 ++--
>   target/arm/tcg/vec_helper.c | 8 ++++----
>   2 files changed, 6 insertions(+), 6 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~



* Re: [PATCH 16/76] target/arm: Use FPST_FPCR_F16_A32 in A32 decoder
  2025-01-24 16:27 ` [PATCH 16/76] target/arm: Use FPST_FPCR_F16_A32 in A32 decoder Peter Maydell
@ 2025-01-25 15:23   ` Richard Henderson
  0 siblings, 0 replies; 167+ messages in thread
From: Richard Henderson @ 2025-01-25 15:23 UTC (permalink / raw)
  To: qemu-devel

On 1/24/25 08:27, Peter Maydell wrote:
> In the A32 decoder, use FPST_FPCR_F16_A32 rather than FPST_FPCR_F16.
> By doing an automated conversion of the whole file we avoid possibly
> using more than one fpst value in a set_rmode/op/restore_rmode
> sequence.
> 
> Patch created with
>    perl -p -i -e 's/FPST_FPCR_F16(?!_)/FPST_FPCR_F16_A32/g' target/arm/tcg/translate-vfp.c
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>   target/arm/tcg/translate-vfp.c | 24 ++++++++++++------------
>   1 file changed, 12 insertions(+), 12 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~



* Re: [PATCH 17/76] target/arm: Use FPST_FPCR_F16_A64 in A64 decoder
  2025-01-24 16:27 ` [PATCH 17/76] target/arm: Use FPST_FPCR_F16_A64 in A64 decoder Peter Maydell
@ 2025-01-25 15:23   ` Richard Henderson
  0 siblings, 0 replies; 167+ messages in thread
From: Richard Henderson @ 2025-01-25 15:23 UTC (permalink / raw)
  To: qemu-devel

On 1/24/25 08:27, Peter Maydell wrote:
> In the A64 decoder, use FPST_FPCR_F16_A64 rather than FPST_FPCR_F16.
> By doing an automated conversion of the whole file we avoid possibly
> using more than one fpst value in a set_rmode/op/restore_rmode
> sequence.
> 
> Patch created with
>    perl -p -i -e 's/FPST_FPCR_F16(?!_)/FPST_FPCR_F16_A64/g' target/arm/tcg/translate-{a64,sve,sme}.c
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>   target/arm/tcg/translate-a64.c | 32 ++++++++---------
>   target/arm/tcg/translate-sve.c | 66 +++++++++++++++++-----------------
>   2 files changed, 49 insertions(+), 49 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~



* Re: [PATCH 18/76] target/arm: Remove now-unused vfp.fp_status_f16 and FPST_FPCR_F16
  2025-01-24 16:27 ` [PATCH 18/76] target/arm: Remove now-unused vfp.fp_status_f16 and FPST_FPCR_F16 Peter Maydell
@ 2025-01-25 15:23   ` Richard Henderson
  0 siblings, 0 replies; 167+ messages in thread
From: Richard Henderson @ 2025-01-25 15:23 UTC (permalink / raw)
  To: qemu-devel

On 1/24/25 08:27, Peter Maydell wrote:
> Now we have moved all the uses of vfp.fp_status_f16 and FPST_FPCR_F16
> to the new A32 or A64 fields, we can remove these.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>   target/arm/cpu.h           | 2 --
>   target/arm/tcg/translate.h | 6 ------
>   target/arm/cpu.c           | 1 -
>   target/arm/vfp_helper.c    | 7 -------
>   4 files changed, 16 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~



* Re: [PATCH 19/76] fpu: Rename float_flag_input_denormal to float_flag_input_denormal_flushed
  2025-01-24 16:27 ` [PATCH 19/76] fpu: Rename float_flag_input_denormal to float_flag_input_denormal_flushed Peter Maydell
@ 2025-01-25 15:25   ` Richard Henderson
  0 siblings, 0 replies; 167+ messages in thread
From: Richard Henderson @ 2025-01-25 15:25 UTC (permalink / raw)
  To: qemu-devel

On 1/24/25 08:27, Peter Maydell wrote:
> Our float_flag_input_denormal exception flag is set when the fpu code
> flushes an input denormal to zero.  This is what many guest
> architectures (eg classic Arm behaviour) require, but it is not the
> only denormal-related reason we might want to set an exception flag.
> The x86 behaviour (which we do not currently model correctly) wants
> to see an exception flag when a denormal input is *not* flushed to
> zero and is actually used in an arithmetic operation. Arm's FEAT_AFP
> also wants these semantics.
> 
> Rename float_flag_input_denormal to float_flag_input_denormal_flushed
> to make it clearer when it is set and to allow us to add a new
> float_flag_input_denormal_used next to it for the x86/FEAT_AFP
> semantics.
> 
> Commit created with
>   for f in `git grep -l float_flag_input_denormal`; do sed -i -e 's/float_flag_input_denormal/float_flag_input_denormal_flushed/' $f; done
> 
> and manual editing of softfloat-types.h and softfloat.c to clean
> up the indentation afterwards and to fix a comment which wasn't
> using the full name of the flag.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>   include/fpu/softfloat-types.h |  5 +++--
>   fpu/softfloat.c               |  4 ++--
>   target/arm/tcg/sve_helper.c   |  6 +++---
>   target/arm/vfp_helper.c       | 10 +++++-----
>   target/i386/tcg/fpu_helper.c  |  6 +++---
>   target/mips/tcg/msa_helper.c  |  2 +-
>   target/rx/op_helper.c         |  2 +-
>   fpu/softfloat-parts.c.inc     |  2 +-
>   8 files changed, 19 insertions(+), 18 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~



* Re: [PATCH 20/76] fpu: Rename float_flag_output_denormal to float_flag_output_denormal_flushed
  2025-01-24 16:27 ` [PATCH 20/76] fpu: Rename float_flag_output_denormal to float_flag_output_denormal_flushed Peter Maydell
@ 2025-01-25 15:26   ` Richard Henderson
  0 siblings, 0 replies; 167+ messages in thread
From: Richard Henderson @ 2025-01-25 15:26 UTC (permalink / raw)
  To: qemu-devel

On 1/24/25 08:27, Peter Maydell wrote:
> Our float_flag_output_denormal exception flag is set when
> the fpu code flushes an output denormal to zero. Rename
> it to float_flag_output_denormal_flushed:
>   * this keeps it parallel with the flag for flushing
>     input denormals, which we just renamed
>   * it makes it clearer that it doesn't mean "set when
>     the output is a denormal"
> 
> Commit created with
>   for f in `git grep -l float_flag_output_denormal`; do sed -i -e 's/float_flag_output_denormal/float_flag_output_denormal_flushed/' $f; done
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>   include/fpu/softfloat-types.h | 3 ++-
>   fpu/softfloat.c               | 2 +-
>   target/arm/vfp_helper.c       | 2 +-
>   target/i386/tcg/fpu_helper.c  | 2 +-
>   target/m68k/fpu_helper.c      | 2 +-
>   target/mips/tcg/msa_helper.c  | 2 +-
>   target/rx/op_helper.c         | 2 +-
>   target/tricore/fpu_helper.c   | 6 +++---
>   fpu/softfloat-parts.c.inc     | 2 +-
>   9 files changed, 12 insertions(+), 11 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~



* Re: [PATCH 21/76] fpu: Fix a comment in softfloat-types.h
  2025-01-24 16:27 ` [PATCH 21/76] fpu: Fix a comment in softfloat-types.h Peter Maydell
@ 2025-01-25 15:27   ` Richard Henderson
  0 siblings, 0 replies; 167+ messages in thread
From: Richard Henderson @ 2025-01-25 15:27 UTC (permalink / raw)
  To: qemu-devel

On 1/24/25 08:27, Peter Maydell wrote:
> In softfloat-types.h a comment documents that if the float_status
> field flush_to_zero is set then we flush denormalised results to 0
> and set the inexact flag.  This isn't correct: the status flag that
> we set when flush_to_zero causes us to flush an output to zero is
> float_flag_output_denormal_flushed.
> 
> Correct the comment.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>   include/fpu/softfloat-types.h | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~



* Re: [PATCH 22/76] fpu: Add float_class_denormal
  2025-01-24 16:27 ` [PATCH 22/76] fpu: Add float_class_denormal Peter Maydell
@ 2025-01-25 15:31   ` Richard Henderson
  0 siblings, 0 replies; 167+ messages in thread
From: Richard Henderson @ 2025-01-25 15:31 UTC (permalink / raw)
  To: qemu-devel

On 1/24/25 08:27, Peter Maydell wrote:
> Currently in softfloat we canonicalize input denormals and so the
> code that implements floating point operations does not need to care
> whether the input value was originally normal or denormal.  However,
> both x86 and Arm FEAT_AFP require that an exception flag is set if:
>   * an input is denormal
>   * that input is not squashed to zero
>   * that input is actually used in the calculation (e.g. we
>     did not find the other input was a NaN)
> 
> So we need to track that the input was a non-squashed denormal.  To
> do this we add a new value to the FloatClass enum.  In this commit we
> add the value and adjust the code everywhere that looks at FloatClass
> values so that the new float_class_denormal behaves identically to
> float_class_normal.  We will add the code that does the "raise a new
> float exception flag if an input was an unsquashed denormal and we
> used it" in a subsequent commit.
> 
> There should be no behavioural change in this commit.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>   fpu/softfloat.c           | 32 ++++++++++++++++++++++++++++---
>   fpu/softfloat-parts.c.inc | 40 ++++++++++++++++++++++++---------------
>   2 files changed, 54 insertions(+), 18 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~
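
For illustration only: the three conditions in the commit message, plus
the flushed/used split from the earlier rename patch, can be modelled
as a small decision function. Everything here is a toy with
hypothetical names and flag values; the real flags are defined in
softfloat-types.h and the real logic lives in the softfloat parts code.

```c
#include <stdbool.h>

/* Toy flag bits standing in for the two softfloat exception flags. */
enum {
    TOY_FLAG_INPUT_DENORMAL_FLUSHED = 1 << 0,
    TOY_FLAG_INPUT_DENORMAL_USED    = 1 << 1,
};

/*
 * Decide which flag (if any) an input raises.
 *  - flushed: the input was denormal and was squashed to zero
 *  - used:    the input was denormal, was not squashed, and actually
 *             took part in the calculation (for example the other
 *             operand did not turn out to be a NaN)
 */
static int toy_input_denormal_flags(bool is_denormal,
                                    bool flush_inputs_to_zero,
                                    bool used_in_result)
{
    if (!is_denormal) {
        return 0;
    }
    if (flush_inputs_to_zero) {
        return TOY_FLAG_INPUT_DENORMAL_FLUSHED;
    }
    return used_in_result ? TOY_FLAG_INPUT_DENORMAL_USED : 0;
}
```

In this model the two flags are mutually exclusive for any one input,
which is why a separate float class is needed to remember that an
unsquashed input was denormal.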



* Re: [PATCH 23/76] fpu: Implement float_flag_input_denormal_used
  2025-01-24 16:27 ` [PATCH 23/76] fpu: Implement float_flag_input_denormal_used Peter Maydell
@ 2025-01-25 15:42   ` Richard Henderson
  0 siblings, 0 replies; 167+ messages in thread
From: Richard Henderson @ 2025-01-25 15:42 UTC (permalink / raw)
  To: qemu-devel

On 1/24/25 08:27, Peter Maydell wrote:
> @@ -4411,6 +4431,11 @@ float32_hs_compare(float32 xa, float32 xb, float_status *s, bool is_quiet)
>           goto soft;
>       }
>   
> +    if (unlikely(float32_is_denormal(ua.s) || float32_is_denormal(ub.s))) {
> +        /* We may need to set the input_denormal_used flag */
> +        goto soft;
> +    }
> +
>       float32_input_flush2(&ua.s, &ub.s, s);
>       if (isgreaterequal(ua.h, ub.h)) {
>           if (isgreater(ua.h, ub.h)) {

This obviates the float32_input_flush2 check.

> @@ -4462,6 +4487,12 @@ float64_hs_compare(float64 xa, float64 xb, float_status *s, bool is_quiet)
>      }
>  
>      float64_input_flush2(&ua.s, &ub.s, s);
> +
> +    if (unlikely(float64_is_denormal(ua.s) || float64_is_denormal(ub.s))) {
> +        /* We may need to set the input_denormal_used flag */
> +        goto soft;
> +    }
> +
>      if (isgreaterequal(ua.h, ub.h)) {
>          if (isgreater(ua.h, ub.h)) {
>              return float_relation_greater;

Likewise, though you're shadowing in the wrong direction this time.

Otherwise it looks ok.

r~
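
For illustration only: the fast-path checks above bail out to the soft
path whenever either operand is denormal. On raw IEEE binary32 bits
that test is "exponent field zero, fraction non-zero". The helper names
below are hypothetical and are not the softfloat functions themselves.

```c
#include <stdbool.h>
#include <stdint.h>

/* binary32: 1 sign bit, 8 exponent bits, 23 fraction bits. */
static bool toy_f32_is_denormal(uint32_t bits)
{
    return (bits & 0x7f800000u) == 0 && (bits & 0x007fffffu) != 0;
}

/* True if a two-operand fast path should defer to the soft path. */
static bool toy_f32_needs_soft_path(uint32_t a, uint32_t b)
{
    return toy_f32_is_denormal(a) || toy_f32_is_denormal(b);
}
```

Zero (either sign) and the smallest normal value 0x00800000 are not
denormal under this test, so they stay on the fast path.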



* Re: [PATCH 24/76] fpu: allow flushing of output denormals to be after rounding
  2025-01-24 16:27 ` [PATCH 24/76] fpu: allow flushing of output denormals to be after rounding Peter Maydell
@ 2025-01-25 16:41   ` Richard Henderson
  2025-01-27 10:01     ` Peter Maydell
  2025-01-29 13:04     ` Peter Maydell
  0 siblings, 2 replies; 167+ messages in thread
From: Richard Henderson @ 2025-01-25 16:41 UTC (permalink / raw)
  To: qemu-devel

On 1/24/25 08:27, Peter Maydell wrote:
> Currently we handle flushing of output denormals in uncanon_normal
> always before we deal with rounding.  This works for architectures
> that detect tininess before rounding, but is usually not the right
> place when the architecture detects tininess after rounding.  For
> example, for x86 the SDM states that the MXCSR FTZ control bit causes
> outputs to be flushed to zero "when it detects a floating-point
> underflow condition".  This means that we mustn't flush to zero if
> the input is such that after rounding it is no longer tiny.
> 
> At least one of our guest architectures does underflow detection
> after rounding but flushing of denormals before rounding (MIPS MSA);

Whacky, but yes, I see that in the msa docs.

> Add an ftz_detection flag.  For consistency with
> tininess_before_rounding, we make it default to "detect ftz after
> rounding"; this means that we need to explicitly set the flag to
> "detect ftz before rounding" on every existing architecture that sets
> flush_to_zero, so that this commit has no behaviour change.
> (This means more code change here but for the long term a less
> confusing API.)

Do we really want flush_to_zero to be separate from ftz_detection?

E.g.

enum {
   float_ftz_disabled,
   float_ftz_after_rounding,
   float_ftz_before_rounding,
}

BTW, I'm not keen on your "detect_*" names, without "float_" prefix like (almost?) 
everything else.


> diff --git a/fpu/softfloat-parts.c.inc b/fpu/softfloat-parts.c.inc
> index 0122b35008a..324e67de259 100644
> --- a/fpu/softfloat-parts.c.inc
> +++ b/fpu/softfloat-parts.c.inc
> @@ -334,7 +334,8 @@ static void partsN(uncanon_normal)(FloatPartsN *p, float_status *s,
>               p->frac_lo &= ~round_mask;
>           }
>           frac_shr(p, frac_shift);
> -    } else if (s->flush_to_zero) {
> +    } else if (s->flush_to_zero &&
> +               s->ftz_detection == detect_ftz_before_rounding) {

else if (s->flush_to_zero == float_flush_to_zero_before_rounding)


>           flags |= float_flag_output_denormal_flushed;
>           p->cls = float_class_zero;
>           exp = 0;
> @@ -381,11 +382,19 @@ static void partsN(uncanon_normal)(FloatPartsN *p, float_status *s,
>           exp = (p->frac_hi & DECOMPOSED_IMPLICIT_BIT) && !fmt->m68k_denormal;
>           frac_shr(p, frac_shift);
>   
> -        if (is_tiny && (flags & float_flag_inexact)) {
> -            flags |= float_flag_underflow;
> -        }
> -        if (exp == 0 && frac_eqz(p)) {
> -            p->cls = float_class_zero;
> +        if (is_tiny) {
> +            if (s->flush_to_zero) {
> +                assert(s->ftz_detection == detect_ftz_after_rounding);

if (s->flush_to_zero == float_flush_to_zero_after_rounding)

and no assert.

> +                flags |= float_flag_output_denormal_flushed;
> +                p->cls = float_class_zero;
> +                exp = 0;
> +                frac_clear(p);
> +            } else if (flags & float_flag_inexact) {
> +                flags |= float_flag_underflow;
> +            }
> +            if (exp == 0 && frac_eqz(p)) {
> +                p->cls = float_class_zero;
> +            }
>           }
>       }
>       p->exp = exp;
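
For illustration only: a sketch of the single tri-state setting
suggested above, replacing the separate flush_to_zero bool and
ftz_detection field. The type and function names are hypothetical, and
the two "tiny" inputs stand in for the checks uncanon_normal makes at
the two points in the quoted diff.

```c
#include <stdbool.h>

/* Hypothetical merged setting: off, or on with a detection point. */
typedef enum {
    toy_ftz_disabled,
    toy_ftz_after_rounding,
    toy_ftz_before_rounding,
} ToyFTZMode;

/* Returns true if the output should be flushed to zero. */
static bool toy_should_flush(ToyFTZMode mode,
                             bool tiny_before_rounding,
                             bool tiny_after_rounding)
{
    switch (mode) {
    case toy_ftz_before_rounding:
        return tiny_before_rounding;
    case toy_ftz_after_rounding:
        return tiny_after_rounding;
    default:
        return false;
    }
}
```

The interesting case is a value that is tiny before rounding but rounds
up to the smallest normal: the before-rounding mode flushes it, the
after-rounding mode does not. With one enum there is no state in which
flushing is enabled but the detection point is inconsistent, so the
assert in the second hunk would not be needed.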




* Re: [PATCH 25/76] target/arm: Remove redundant advsimd float16 helpers
  2025-01-24 16:27 ` [PATCH 25/76] target/arm: Remove redundant advsimd float16 helpers Peter Maydell
@ 2025-01-25 16:59   ` Richard Henderson
  0 siblings, 0 replies; 167+ messages in thread
From: Richard Henderson @ 2025-01-25 16:59 UTC (permalink / raw)
  To: qemu-devel

On 1/24/25 08:27, Peter Maydell wrote:
> The advsimd_addh etc helpers defined in helper-a64.c are identical to
> the vfp_addh etc helpers defined in helper-vfp.c: both take two
> float16 inputs (in a uint32_t type) plus a float_status* and are
> simple wrappers around the softfloat float16_* functions.
> 
> (The duplication seems to be a historical accident: we added the
> advsimd helpers in 2018 as part of the A64 implementation, and at
> that time there was no f16 emulation in A32.  Then later we added the
> A32 f16 handling by extending the existing VFP helper macros to
> generate f16 versions as well as f32 and f64, and didn't realise we
> could clean things up.)
> 
> Remove the now-unnecessary advsimd helpers and make the places that
> generated calls to them use the vfp helpers instead. Many of the
> helper functions were already unused.
> 
> (The remaining advsimd_ helpers are those which don't have vfp
> versions.)
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>   target/arm/tcg/helper-a64.h    |  8 --------
>   target/arm/tcg/helper-a64.c    |  9 ---------
>   target/arm/tcg/translate-a64.c | 16 ++++++++--------
>   3 files changed, 8 insertions(+), 25 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~



* Re: [PATCH 26/76] target/arm: Use FPST_FPCR_F16_A64 for halfprec-to-other conversions
  2025-01-24 16:27 ` [PATCH 26/76] target/arm: Use FPST_FPCR_F16_A64 for halfprec-to-other conversions Peter Maydell
@ 2025-01-25 17:01   ` Richard Henderson
  0 siblings, 0 replies; 167+ messages in thread
From: Richard Henderson @ 2025-01-25 17:01 UTC (permalink / raw)
  To: qemu-devel

On 1/24/25 08:27, Peter Maydell wrote:
> We should be using the F16-specific float_status for conversions from
> half-precision, because halfprec inputs never set Input Denormal.
> 
> Without FEAT_AFP, using the wrong fpst here had no effect, because
> the only difference between the F16_A64 and A64 fpst is its handling
> of flush-to-zero on input and output, and the helper functions
> vfp_fcvt_f16_to_* and vfp_fcvt_*_to_f16 all explicitly squash the
> relevant flushing flags, and flush_inputs_to_zero was the only way
> that IDC could be set.
> 
> With FEAT_AFP, the FPCR.AH=1 behaviour sets IDC for
> input_denormal_used, which we will only ignore in
> vfp_get_fpsr_from_host() for the F16_A64 fpst; so it matters that we
> use that one for f16 inputs (and the normal one for single/double to
> f16 conversions).
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>   target/arm/tcg/translate-a64.c | 9 ++++++---
>   target/arm/tcg/translate-sve.c | 4 ++--
>   2 files changed, 8 insertions(+), 5 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~



* Re: [PATCH 27/76] target/arm: Define FPCR AH, FIZ, NEP bits
  2025-01-24 16:27 ` [PATCH 27/76] target/arm: Define FPCR AH, FIZ, NEP bits Peter Maydell
@ 2025-01-25 17:08   ` Richard Henderson
  2025-01-31 17:05     ` Peter Maydell
  0 siblings, 1 reply; 167+ messages in thread
From: Richard Henderson @ 2025-01-25 17:08 UTC (permalink / raw)
  To: qemu-devel

On 1/24/25 08:27, Peter Maydell wrote:
> The Armv8.7 FEAT_AFP feature defines three new control bits in
> the FPCR:
>   * FPCR.AH: "alternate floating point mode"; this changes floating
>     point behaviour in a variety of ways, including:
>      - the sign of a default NaN is 1, not 0
>      - if FPCR.FZ is also 1, denormals detected after rounding
>        with an unbounded exponent has been applied are flushed to zero
>      - FPCR.FZ does not cause denormalized inputs to be flushed to zero
>      - miscellaneous other corner-case behaviour changes
>   * FPCR.FIZ: flush denormalized numbers to zero on input for
>     most instructions
>   * FPCR.NEP: makes scalar SIMD operations merge the result with
>     higher vector elements in one of the source registers, instead
>     of zeroing the higher elements of the destination
> 
> This commit defines the new bits in the FPCR, and allows them to be
> read or written when FEAT_AFP is implemented.  Actual behaviour
> changes will be implemented in subsequent commits.
> 
> Note that these are the first FPCR bits which don't appear in the
> AArch32 FPSCR view of the register, and which share bit positions
> with FPSR bits.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>   target/arm/cpu-features.h |  5 +++++
>   target/arm/cpu.h          |  3 +++
>   target/arm/vfp_helper.c   | 11 ++++++++---
>   3 files changed, 16 insertions(+), 3 deletions(-)
> 
> diff --git a/target/arm/cpu-features.h b/target/arm/cpu-features.h
> index 30302d6c5b4..7bf24c506b3 100644
> --- a/target/arm/cpu-features.h
> +++ b/target/arm/cpu-features.h
> @@ -802,6 +802,11 @@ static inline bool isar_feature_aa64_hcx(const ARMISARegisters *id)
>       return FIELD_EX64(id->id_aa64mmfr1, ID_AA64MMFR1, HCX) != 0;
>   }
>   
> +static inline bool isar_feature_aa64_afp(const ARMISARegisters *id)
> +{
> +    return FIELD_EX64(id->id_aa64mmfr1, ID_AA64MMFR1, AFP) != 0;
> +}
> +
>   static inline bool isar_feature_aa64_tidcp1(const ARMISARegisters *id)
>   {
>       return FIELD_EX64(id->id_aa64mmfr1, ID_AA64MMFR1, TIDCP1) != 0;
> diff --git a/target/arm/cpu.h b/target/arm/cpu.h
> index 2213c277348..7ba227ac4c5 100644
> --- a/target/arm/cpu.h
> +++ b/target/arm/cpu.h
> @@ -1713,6 +1713,9 @@ void vfp_set_fpscr(CPUARMState *env, uint32_t val);
>    */
>   
>   /* FPCR bits */
> +#define FPCR_FIZ    (1 << 0)    /* Flush Inputs to Zero (FEAT_AFP) */
> +#define FPCR_AH     (1 << 1)    /* Alternate Handling (FEAT_AFP) */
> +#define FPCR_NEP    (1 << 2)    /* SIMD scalar ops preserve elts (FEAT_AFP) */
>   #define FPCR_IOE    (1 << 8)    /* Invalid Operation exception trap enable */
>   #define FPCR_DZE    (1 << 9)    /* Divide by Zero exception trap enable */
>   #define FPCR_OFE    (1 << 10)   /* Overflow exception trap enable */
> diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
> index 3c8f3e65887..8c79ab4fc8a 100644
> --- a/target/arm/vfp_helper.c
> +++ b/target/arm/vfp_helper.c
> @@ -242,6 +242,9 @@ static void vfp_set_fpcr_masked(CPUARMState *env, uint32_t val, uint32_t mask)
>       if (!cpu_isar_feature(any_fp16, cpu)) {
>           val &= ~FPCR_FZ16;
>       }
> +    if (!cpu_isar_feature(aa64_afp, cpu)) {
> +        val &= ~(FPCR_FIZ | FPCR_AH | FPCR_NEP);
> +    }

I suppose this aa64 check, without is_a64(), is ok because the a32 caller has already 
applied FPSCR_FPCR_MASK.  And similarly for the ebf16 check below.

>   
>       if (!cpu_isar_feature(aa64_ebf16, cpu)) {
>           val &= ~FPCR_EBF;

But it does feel like we could usefully move these to vfp_set_fpcr, or such?


r~
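
For illustration only: the masking in the quoted hunk makes the three
new bits read-as-zero, writes-ignored when the CPU lacks FEAT_AFP. The
bit positions are the ones from the cpu.h hunk; the TOY_ names and the
function are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions as defined in the quoted cpu.h hunk. */
#define TOY_FPCR_FIZ (1u << 0)
#define TOY_FPCR_AH  (1u << 1)
#define TOY_FPCR_NEP (1u << 2)

/* Drop the FEAT_AFP control bits from a write if the feature is absent. */
static uint32_t toy_fpcr_apply_afp_mask(uint32_t val, bool have_afp)
{
    if (!have_afp) {
        val &= ~(TOY_FPCR_FIZ | TOY_FPCR_AH | TOY_FPCR_NEP);
    }
    return val;
}
```

Other bits in the written value pass through unchanged in both cases.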



* Re: [PATCH 28/76] target/arm: Implement FPCR.FIZ handling
  2025-01-24 16:27 ` [PATCH 28/76] target/arm: Implement FPCR.FIZ handling Peter Maydell
@ 2025-01-25 17:25   ` Richard Henderson
  0 siblings, 0 replies; 167+ messages in thread
From: Richard Henderson @ 2025-01-25 17:25 UTC (permalink / raw)
  To: qemu-devel

On 1/24/25 08:27, Peter Maydell wrote:
> Part of FEAT_AFP is the new control bit FPCR.FIZ.  This bit affects
> flushing of single and double precision denormal inputs to zero for
> AArch64 floating point instructions.  (For half-precision, the
> existing FPCR.FZ16 control remains the only one.)
> 
> FPCR.FIZ differs from FPCR.FZ in that if we flush an input denormal
> only because of FPCR.FIZ then we should *not* set the cumulative
> exception bit FPSR.IDC.
> 
> FEAT_AFP also defines that in AArch64 the existing FPCR.FZ only
> applies when FPCR.AH is 0.
> 
> We can implement this by setting the "flush inputs to zero" state
> appropriately when FPCR is written, and by not reflecting the
> float_flag_input_denormal_flushed status flag into FPSR reads when it is
> the result only of FPCR.FIZ.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>   target/arm/vfp_helper.c | 58 ++++++++++++++++++++++++++++++++++-------
>   1 file changed, 48 insertions(+), 10 deletions(-)
> 
> diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
> index 8c79ab4fc8a..5a0b389f7a3 100644
> --- a/target/arm/vfp_helper.c
> +++ b/target/arm/vfp_helper.c
> @@ -61,19 +61,29 @@ static inline uint32_t vfp_exceptbits_from_host(int host_bits)
>   
>   static uint32_t vfp_get_fpsr_from_host(CPUARMState *env)
>   {
> -    uint32_t i = 0;
> +    uint32_t a32_flags = 0, a64_flags = 0;
>   
> -    i |= get_float_exception_flags(&env->vfp.fp_status_a32);
> -    i |= get_float_exception_flags(&env->vfp.fp_status_a64);
> -    i |= get_float_exception_flags(&env->vfp.standard_fp_status);
> +    a32_flags |= get_float_exception_flags(&env->vfp.fp_status_a32);
> +    a32_flags |= get_float_exception_flags(&env->vfp.standard_fp_status);
>       /* FZ16 does not generate an input denormal exception.  */
> -    i |= (get_float_exception_flags(&env->vfp.fp_status_f16_a32)
> +    a32_flags |= (get_float_exception_flags(&env->vfp.fp_status_f16_a32)
>             & ~float_flag_input_denormal_flushed);
> -    i |= (get_float_exception_flags(&env->vfp.fp_status_f16_a64)
> +    a32_flags |= (get_float_exception_flags(&env->vfp.standard_fp_status_f16)
>             & ~float_flag_input_denormal_flushed);
> -    i |= (get_float_exception_flags(&env->vfp.standard_fp_status_f16)
> +
> +    a64_flags |= get_float_exception_flags(&env->vfp.fp_status_a64);
> +    a64_flags |= (get_float_exception_flags(&env->vfp.fp_status_f16_a64)
>             & ~float_flag_input_denormal_flushed);
> -    return vfp_exceptbits_from_host(i);
> +    /*
> +     * Flushing an input denormal only because FPCR.FIZ == 1 does
> +     * not set FPSR.IDC. So squash it unless (FPCR.AH == 0 && FPCR.FZ == 1).
> +     * We only do this for the a64 flags because FIZ has no effect
> +     * on AArch32 even if it is set.
> +     */
> +    if ((env->vfp.fpcr & (FPCR_FZ | FPCR_AH)) != FPCR_FZ) {
> +        a64_flags &= ~float_flag_input_denormal_flushed;
> +    }

It might be worth pointing to the FPUnpackBase pseudocode to note that if both FZ and FIZ
are set, FZ takes precedence for setting IDC.

Anyway,
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~
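
For readers following the thread, the IDC rule discussed above can be
sketched standalone as below. This is only an illustration, not the QEMU
code: the SK_* names and the flag values are made up for the sketch, and
only the FPCR bit positions (FIZ bit 0, AH bit 1, FZ bit 24) are taken from
the architecture.

```c
#include <stdint.h>

/* FPCR bit positions; the SK_ names are local to this sketch. */
#define SK_FPCR_FIZ (1u << 0)
#define SK_FPCR_AH  (1u << 1)
#define SK_FPCR_FZ  (1u << 24)

/* Hypothetical stand-ins for softfloat exception flags. */
#define SK_FLAG_INPUT_DENORMAL_FLUSHED (1u << 0)
#define SK_FLAG_INEXACT                (1u << 1)

/*
 * Drop the "input denormal flushed" flag unless the flush can be
 * attributed to FPCR.FZ, i.e. unless FZ == 1 and AH == 0.
 * A flush caused only by FIZ must not become visible as FPSR.IDC.
 */
static uint32_t sk_filter_a64_flags(uint32_t fpcr, uint32_t flags)
{
    if ((fpcr & (SK_FPCR_FZ | SK_FPCR_AH)) != SK_FPCR_FZ) {
        flags &= ~SK_FLAG_INPUT_DENORMAL_FLUSHED;
    }
    return flags;
}
```

With both FZ and FIZ set (and AH clear) the flag is kept, which matches the
"FZ takes precedence" remark above.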


^ permalink raw reply	[flat|nested] 167+ messages in thread

* Re: [PATCH 29/76] target/arm: Adjust FP behaviour for FPCR.AH = 1
  2025-01-24 16:27 ` [PATCH 29/76] target/arm: Adjust FP behaviour for FPCR.AH = 1 Peter Maydell
@ 2025-01-25 17:27   ` Richard Henderson
  0 siblings, 0 replies; 167+ messages in thread
From: Richard Henderson @ 2025-01-25 17:27 UTC (permalink / raw)
  To: qemu-devel

On 1/24/25 08:27, Peter Maydell wrote:
> When FPCR.AH is set, various behaviours of AArch64 floating point
> operations which are controlled by softfloat config settings change:
>   * tininess and ftz detection before/after rounding
>   * NaN propagation order
>   * result of 0 * Inf + NaN
>   * default NaN value
> 
> When the guest changes the value of the AH bit, switch these config
> settings on the fp_status_a64 and fp_status_f16_a64 float_status
> fields.
> 
> This requires us to make the arm_set_default_fp_behaviours() function
> global, since we now need to call it from cpu.c and vfp_helper.c; we
> move it to vfp_helper.c so it can be next to the new
> arm_set_ah_fp_behaviours().
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>   target/arm/internals.h  |  4 +++
>   target/arm/cpu.c        | 23 -----------------
>   target/arm/vfp_helper.c | 56 +++++++++++++++++++++++++++++++++++++++++
>   3 files changed, 60 insertions(+), 23 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~
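
As a small illustration of one item in the list above (the default NaN
value), here is a sketch for single precision. It assumes the usual Arm
default NaN encoding 0x7fc00000 and the cover letter's statement that the
sign of the default NaN is 1 when FPCR.AH is set; the function name is
invented for the example and is not a QEMU or softfloat API.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Return the float32 default NaN bit pattern.
 * With AH == 0 the sign bit is clear; with AH == 1 it is set.
 */
static uint32_t sk_default_nan_f32(bool ah)
{
    uint32_t quiet_nan = 0x7fc00000u;   /* exponent all ones, top fraction bit set */

    return ah ? (quiet_nan | 0x80000000u) : quiet_nan;
}
```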


^ permalink raw reply	[flat|nested] 167+ messages in thread

* Re: [PATCH 30/76] target/arm: Adjust exception flag handling for AH = 1
  2025-01-24 16:27 ` [PATCH 30/76] target/arm: Adjust exception flag handling for AH " Peter Maydell
@ 2025-01-25 17:29   ` Richard Henderson
  0 siblings, 0 replies; 167+ messages in thread
From: Richard Henderson @ 2025-01-25 17:29 UTC (permalink / raw)
  To: qemu-devel

On 1/24/25 08:27, Peter Maydell wrote:
> When FPCR.AH = 1, some of the cumulative exception flags in the FPSR
> behave slightly differently for A64 operations:
>   * IDC is set when a denormal input is used without flushing
>   * IXC (Inexact) is set when an output denormal is flushed to zero
> 
> Update vfp_get_fpsr_from_host() to do this.
> 
> Note that because half-precision operations never set IDC, we now
> need to add float_flag_input_denormal_used to the set we mask out of
> fp_status_f16_a64.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>   target/arm/vfp_helper.c | 17 ++++++++++++++---
>   1 file changed, 14 insertions(+), 3 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~
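
A rough standalone sketch of the flag mapping described above follows. The
SK_* flag names are hypothetical stand-ins for the softfloat flags, and the
AH == 0 mapping of a flushed output denormal to UFC is my reading of the
architecture rather than something stated in the quoted message. The FPSR
bit positions used are UFC bit 3, IXC bit 4 and IDC bit 7.

```c
#include <stdbool.h>
#include <stdint.h>

/* FPSR cumulative exception bits (names local to this sketch). */
#define SK_FPSR_UFC (1u << 3)
#define SK_FPSR_IXC (1u << 4)
#define SK_FPSR_IDC (1u << 7)

/* Hypothetical host-side flags, loosely modelled on softfloat's. */
#define SK_IN_DENORMAL_FLUSHED  (1u << 0)
#define SK_IN_DENORMAL_USED     (1u << 1)
#define SK_OUT_DENORMAL_FLUSHED (1u << 2)

static uint32_t sk_fpsr_from_host(uint32_t host, bool ah)
{
    uint32_t fpsr = 0;

    if (host & SK_IN_DENORMAL_FLUSHED) {
        fpsr |= SK_FPSR_IDC;
    }
    if (ah) {
        /* AH == 1: IDC also reports denormal inputs used without flushing */
        if (host & SK_IN_DENORMAL_USED) {
            fpsr |= SK_FPSR_IDC;
        }
        /* AH == 1: a flushed output denormal is reported as Inexact */
        if (host & SK_OUT_DENORMAL_FLUSHED) {
            fpsr |= SK_FPSR_IXC;
        }
    } else {
        /* AH == 0 (assumption): a flushed output denormal sets UFC */
        if (host & SK_OUT_DENORMAL_FLUSHED) {
            fpsr |= SK_FPSR_UFC;
        }
    }
    return fpsr;
}
```

The half-precision masking mentioned in the commit message is left out of
this sketch.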


^ permalink raw reply	[flat|nested] 167+ messages in thread

* Re: [PATCH 31/76] target/arm: Add FPCR.AH to tbflags
  2025-01-24 16:27 ` [PATCH 31/76] target/arm: Add FPCR.AH to tbflags Peter Maydell
@ 2025-01-25 17:30   ` Richard Henderson
  0 siblings, 0 replies; 167+ messages in thread
From: Richard Henderson @ 2025-01-25 17:30 UTC (permalink / raw)
  To: qemu-devel

On 1/24/25 08:27, Peter Maydell wrote:
> We are going to need to generate different code in some cases when
> FPCR.AH is 1.  For example:
>   * Floating point neg and abs must not flip the sign bit of NaNs
>   * some insns (FRECPE, FRECPS, FRECPX, FRSQRTE, FRSQRTS, and various
>     BFCVT and BFM bfloat16 ops) need to use a different float_status
>     to the usual one
> 
> Encode FPCR.AH into the A64 tbflags, so we can refer to it at
> translate time.
> 
> Because we now have a bit in FPCR that affects codegen, we can't mark
> the AArch64 FPCR register as being SUPPRESS_TB_END any more; writes
> to it will now end the TB and trigger a regeneration of hflags.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>   target/arm/cpu.h               | 1 +
>   target/arm/tcg/translate.h     | 2 ++
>   target/arm/helper.c            | 2 +-
>   target/arm/tcg/hflags.c        | 4 ++++
>   target/arm/tcg/translate-a64.c | 1 +
>   5 files changed, 9 insertions(+), 1 deletion(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~


^ permalink raw reply	[flat|nested] 167+ messages in thread

* Re: [PATCH 32/76] target/arm: Set up float_status to use for FPCR.AH=1 behaviour
  2025-01-24 16:27 ` [PATCH 32/76] target/arm: Set up float_status to use for FPCR.AH=1 behaviour Peter Maydell
@ 2025-01-25 17:36   ` Richard Henderson
  0 siblings, 0 replies; 167+ messages in thread
From: Richard Henderson @ 2025-01-25 17:36 UTC (permalink / raw)
  To: qemu-devel

On 1/24/25 08:27, Peter Maydell wrote:
> When FPCR.AH is 1, the behaviour of some instructions changes:
>   * AdvSIMD BFCVT, BFCVTN, BFCVTN2, BFMLALB, BFMLALT
>   * SVE BFCVT, BFCVTNT, BFMLALB, BFMLALT, BFMLSLB, BFMLSLT
>   * SME BFCVT, BFCVTN, BFMLAL, BFMLSL (these are all in SME2 which
>     QEMU does not yet implement)
>   * FRECPE, FRECPS, FRECPX, FRSQRTE, FRSQRTS
> 
> The behaviour change is:
>   * the instructions do not update the FPSR cumulative exception flags
>   * trapped floating point exceptions are disabled (a no-op for QEMU,
>     which doesn't implement FPCR.{IDE,IXE,UFE,OFE,DZE,IOE})
>   * rounding is always round-to-nearest-even regardless of FPCR.RMode
>   * denormalized inputs and outputs are always flushed to zero, as if
>     FPCR.{FZ,FIZ} is {1,1}
>   * FPCR.FZ16 is still honoured for half-precision inputs
> 
> (See the Arm ARM DDI0487L.a section A1.5.9.)
> 
> We can provide all these behaviours with another pair of float_status fields
> which we use only for these insns, when FPCR.AH is 1. These float_status
> fields will always have:
>   * flush_to_zero and flush_inputs_to_zero set for the non-F16 field
>   * rounding mode set to round-to-nearest-even
> and so the only FPCR fields they need to honour are DN and FZ16.
> 
> In this commit we only define the new fp_status fields and give them
> the required behaviour when FPCR is updated.  In subsequent commits
> we will arrange to use this new fp_status field for the instructions
> that should be affected by FPCR.AH in this way.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
> I'm not super enthusiastic about the ah_fp_status naming, which sort
> of suggests it's always to be used when AH=1, rather than "for this
> specific group of insns when AH=1". But I couldn't think of a better
> name that was still reasonably short...
> ---

Hmm.  I should really compare this vs the new pair of fp_status that I add for SME2, which 
also do not write back exception flags.

Anyway,
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~
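
To make the "only DN and FZ16 matter" point above concrete, here is a
standalone sketch of how such a configuration could be derived from an FPCR
value. The struct and function are invented for the illustration (they are
not the softfloat float_status API), and having the F16 variant take both
flush settings from FZ16 is an assumption of the sketch. FPCR bit positions
used: FZ16 bit 19, RMode bits 23:22, DN bit 25.

```c
#include <stdbool.h>
#include <stdint.h>

#define SK_FPCR_FZ16 (1u << 19)
#define SK_FPCR_DN   (1u << 25)

enum sk_rounding { SK_ROUND_NEAREST_EVEN, SK_ROUND_OTHER };

/* Hypothetical, simplified model of the relevant float_status settings. */
struct sk_fp_config {
    bool flush_to_zero;
    bool flush_inputs_to_zero;
    bool default_nan;
    enum sk_rounding rounding;
};

static struct sk_fp_config sk_ah_config(uint32_t fpcr, bool is_f16)
{
    struct sk_fp_config cfg;
    /* Non-F16: always flush.  F16: follow FPCR.FZ16 (assumption). */
    bool flush = is_f16 ? (fpcr & SK_FPCR_FZ16) != 0 : true;

    cfg.flush_to_zero = flush;
    cfg.flush_inputs_to_zero = flush;
    cfg.default_nan = (fpcr & SK_FPCR_DN) != 0;
    /* FPCR.RMode is deliberately ignored: always round-to-nearest-even. */
    cfg.rounding = SK_ROUND_NEAREST_EVEN;
    return cfg;
}
```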


^ permalink raw reply	[flat|nested] 167+ messages in thread

* Re: [PATCH 33/76] target/arm: Use FPST_FPCR_AH for FRECPE, FRECPS, FRECPX, FRSQRTE, FRSQRTS
  2025-01-24 16:27 ` [PATCH 33/76] target/arm: Use FPST_FPCR_AH for FRECPE, FRECPS, FRECPX, FRSQRTE, FRSQRTS Peter Maydell
@ 2025-01-25 17:40   ` Richard Henderson
  0 siblings, 0 replies; 167+ messages in thread
From: Richard Henderson @ 2025-01-25 17:40 UTC (permalink / raw)
  To: qemu-devel

On 1/24/25 08:27, Peter Maydell wrote:
> For the instructions FRECPE, FRECPS, FRECPX, FRSQRTE, FRSQRTS, use
> FPST_FPCR_AH or FPST_FPCR_AH_F16 when FPCR.AH is 1, so that they get
> the required behaviour changes.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
> select_fpst() is another function I'm not super happy with the
> naming of, because again it should only be used for the subset
> of insns which have this particular behaviour, but the current
> name kind of implies more generality than that. Suggestions welcome.
> ---
>   target/arm/tcg/translate.h     |  13 ++++
>   target/arm/tcg/translate-a64.c | 119 +++++++++++++++++++++++++--------
>   target/arm/tcg/translate-sve.c |  30 ++++++---
>   3 files changed, 127 insertions(+), 35 deletions(-)
> 
> diff --git a/target/arm/tcg/translate.h b/target/arm/tcg/translate.h
> index d6edd8db76b..680ca52a181 100644
> --- a/target/arm/tcg/translate.h
> +++ b/target/arm/tcg/translate.h
> @@ -746,6 +746,19 @@ static inline TCGv_ptr fpstatus_ptr(ARMFPStatusFlavour flavour)
>       return statusptr;
>   }
>   
> +/*
> + * Return the ARMFPStatusFlavour to use based on element size and
> + * whether FPCR.AH is set.
> + */
> +static inline ARMFPStatusFlavour select_fpst(DisasContext *s, MemOp esz)
> +{
> +    if (s->fpcr_ah) {
> +        return esz == MO_16 ? FPST_FPCR_AH_F16 : FPST_FPCR_AH;
> +    } else {
> +        return esz == MO_16 ? FPST_FPCR_F16_A64 : FPST_FPCR_A64;
> +    }
> +}
> +

translate-a64.h, I think.

Otherwise,
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~


^ permalink raw reply	[flat|nested] 167+ messages in thread

* Re: [PATCH 34/76] target/arm: Use FPST_FPCR_AH for BFCVT* insns
  2025-01-24 16:27 ` [PATCH 34/76] target/arm: Use FPST_FPCR_AH for BFCVT* insns Peter Maydell
@ 2025-01-25 17:42   ` Richard Henderson
  0 siblings, 0 replies; 167+ messages in thread
From: Richard Henderson @ 2025-01-25 17:42 UTC (permalink / raw)
  To: qemu-devel

On 1/24/25 08:27, Peter Maydell wrote:
> When FPCR.AH is 1, use FPST_FPCR_AH for:
>   * AdvSIMD BFCVT, BFCVTN, BFCVTN2
>   * SVE BFCVT, BFCVTNT
> 
> so that they get the required behaviour changes.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>   target/arm/tcg/translate-a64.c | 27 +++++++++++++++++++++------
>   target/arm/tcg/translate-sve.c |  6 ++++--
>   2 files changed, 25 insertions(+), 8 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~


^ permalink raw reply	[flat|nested] 167+ messages in thread

* Re: [PATCH 35/76] target/arm: Use FPST_FPCR_AH for BFMLAL*, BFMLSL* insns
  2025-01-24 16:27 ` [PATCH 35/76] target/arm: Use FPST_FPCR_AH for BFMLAL*, BFMLSL* insns Peter Maydell
@ 2025-01-25 17:44   ` Richard Henderson
  0 siblings, 0 replies; 167+ messages in thread
From: Richard Henderson @ 2025-01-25 17:44 UTC (permalink / raw)
  To: qemu-devel

On 1/24/25 08:27, Peter Maydell wrote:
> When FPCR.AH is 1, use FPST_FPCR_AH for:
>   * AdvSIMD BFMLALB, BFMLALT
>   * SVE BFMLALB, BFMLALT, BFMLSLB, BFMLSLT
> 
> so that they get the required behaviour changes.
> 
> We do this by making gen_gvec_op4_fpst() take an ARMFPStatusFlavour
> rather than a bool is_fp16; existing callsites now select
> FPST_FPCR_F16_A64 vs FPST_FPCR_A64 themselves rather than passing in
> the boolean.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>   target/arm/tcg/translate-a64.c | 20 +++++++++++++-------
>   target/arm/tcg/translate-sve.c |  6 ++++--
>   2 files changed, 17 insertions(+), 9 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~


^ permalink raw reply	[flat|nested] 167+ messages in thread

* Re: [PATCH 36/76] target/arm: Add FPCR.NEP to TBFLAGS
  2025-01-24 16:27 ` [PATCH 36/76] target/arm: Add FPCR.NEP to TBFLAGS Peter Maydell
@ 2025-01-25 17:45   ` Richard Henderson
  0 siblings, 0 replies; 167+ messages in thread
From: Richard Henderson @ 2025-01-25 17:45 UTC (permalink / raw)
  To: qemu-devel

On 1/24/25 08:27, Peter Maydell wrote:
> For FEAT_AFP, we want to emit different code when FPCR.NEP is set, so
> that instead of zeroing the high elements of a vector register when
> we write the output of a scalar operation to it, we instead merge in
> those elements from one of the source registers.  Since this affects
> the generated code, we need to put FPCR.NEP into the TBFLAGS.
> 
> FPCR.NEP is treated as 0 when in streaming SVE mode and FEAT_SME_FA64
> is not implemented or not enabled; we can implement this logic in
> rebuild_hflags_a64().
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>   target/arm/cpu.h               | 1 +
>   target/arm/tcg/translate.h     | 2 ++
>   target/arm/tcg/hflags.c        | 9 +++++++++
>   target/arm/tcg/translate-a64.c | 1 +
>   4 files changed, 13 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~


^ permalink raw reply	[flat|nested] 167+ messages in thread

* Re: [PATCH 37/76] target/arm: Define and use new write_fp_*reg_merging() functions
  2025-01-24 16:27 ` [PATCH 37/76] target/arm: Define and use new write_fp_*reg_merging() functions Peter Maydell
@ 2025-01-25 17:52   ` Richard Henderson
  0 siblings, 0 replies; 167+ messages in thread
From: Richard Henderson @ 2025-01-25 17:52 UTC (permalink / raw)
  To: qemu-devel

On 1/24/25 08:27, Peter Maydell wrote:
> For FEAT_AFP's FPCR.NEP bit, we need to programmatically change the
> behaviour of the writeback of the result for most SIMD scalar
> operations, so that instead of zeroing the upper part of the result
> register it merges the upper elements from one of the input
> registers.
> 
> Provide new functions write_fp_*reg_merging() which can be used
> instead of the existing write_fp_*reg() functions when we want this
> "merge the result with one of the input registers if FPCR.NEP is
> enabled" handling, and use them in do_fp3_scalar_with_fpsttype().
> 
> Note that (as documented in the description of the FPCR.NEP bit)
> which input register to use as the merge source varies by
> instruction: for these 2-input scalar operations, the comparison
> instructions take from Rm, not Rn.
> 
> We'll extend this to also provide the merging behaviour for
> the remaining scalar insns in subsequent commits.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>   target/arm/tcg/translate-a64.c | 117 +++++++++++++++++++++++++--------
>   1 file changed, 91 insertions(+), 26 deletions(-)
> 
> diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
> index d34672a8ba6..19a4ae14c15 100644
> --- a/target/arm/tcg/translate-a64.c
> +++ b/target/arm/tcg/translate-a64.c
> @@ -665,6 +665,68 @@ static void write_fp_sreg(DisasContext *s, int reg, TCGv_i32 v)
>       write_fp_dreg(s, reg, tmp);
>   }
>   
> +/*
> + * Write a double result to 128 bit vector register reg, honouring FPCR.NEP:
> + * - if FPCR.NEP == 0, clear the high elements of reg
> + * - if FPCR.NEP == 1, set the high elements of reg from mergereg
> + *   (i.e. merge the result with those high elements)
> + * In either case, SVE register bits above 128 are zeroed (per R_WKYLB).
> + */
> +static void write_fp_dreg_merging(DisasContext *s, int reg, int mergereg,
> +                                  TCGv_i64 v)
> +{
> +    if (!s->fpcr_nep) {
> +        write_fp_dreg(s, reg, v);
> +        return;
> +    }
> +
> +    /*
> +     * Move from mergereg to reg; this sets the high elements and
> +     * clears the bits above 128 as a side effect.
> +     */
> +    tcg_gen_gvec_mov(MO_64, fp_reg_offset(s, reg, MO_64),
> +                     fp_reg_offset(s, mergereg, MO_64),
> +                     16, vec_full_reg_size(s));

I think this would be clearer with vec_full_reg_offset(), though the result is correct 
either way.

Otherwise,
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~
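
The merging writeback described in the quoted comment can be modelled
outside TCG roughly as follows. This is a plain C sketch over a
hypothetical two-word register model (struct sk_vreg), not the
translate-a64.c code; it ignores the SVE bits above 128 and passes NEP in
as a parameter instead of reading it from a DisasContext.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a 128-bit vector register as two 64-bit halves. */
struct sk_vreg {
    uint64_t lo;
    uint64_t hi;
};

/*
 * Write a 64-bit scalar result to regs[rd]:
 * - NEP == 0: the high element is zeroed
 * - NEP == 1: the high element is taken from regs[mergereg]
 * The merge value is read before the write, so rd == mergereg is safe.
 */
static void sk_write_dreg_merging(struct sk_vreg *regs, int rd, int mergereg,
                                  uint64_t v, bool nep)
{
    uint64_t hi = nep ? regs[mergereg].hi : 0;

    regs[rd].lo = v;
    regs[rd].hi = hi;
}
```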


^ permalink raw reply	[flat|nested] 167+ messages in thread

* Re: [PATCH 38/76] target/arm: Handle FPCR.NEP for 3-input scalar operations
  2025-01-24 16:27 ` [PATCH 38/76] target/arm: Handle FPCR.NEP for 3-input scalar operations Peter Maydell
@ 2025-01-25 17:53   ` Richard Henderson
  0 siblings, 0 replies; 167+ messages in thread
From: Richard Henderson @ 2025-01-25 17:53 UTC (permalink / raw)
  To: qemu-devel

On 1/24/25 08:27, Peter Maydell wrote:
> Handle FPCR.NEP for the 3-input scalar operations which use
> do_fmla_scalar_idx() and do_fmadd(), by making them call the
> appropriate write_fp_*reg_merging() functions.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>   target/arm/tcg/translate-a64.c | 12 ++++++------
>   1 file changed, 6 insertions(+), 6 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~


^ permalink raw reply	[flat|nested] 167+ messages in thread

* Re: [PATCH 39/76] target/arm: Handle FPCR.NEP for BFCVT scalar
  2025-01-24 16:27 ` [PATCH 39/76] target/arm: Handle FPCR.NEP for BFCVT scalar Peter Maydell
@ 2025-01-25 17:55   ` Richard Henderson
  0 siblings, 0 replies; 167+ messages in thread
From: Richard Henderson @ 2025-01-25 17:55 UTC (permalink / raw)
  To: qemu-devel

On 1/24/25 08:27, Peter Maydell wrote:
> Currently we implement BFCVT scalar via do_fp1_scalar().  This works
> even though BFCVT is a narrowing operation from 32 to 16 bits,
> because we can use write_fp_sreg() for float16. However, FPCR.NEP
> support requires that we use write_fp_hreg_merging() for float16
> outputs, so we can't continue to borrow the non-narrowing
> do_fp1_scalar() function for this. Split out trans_BFCVT_s()
> into its own implementation that honours FPCR.NEP.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>   target/arm/tcg/translate-a64.c | 25 +++++++++++++++++++++----
>   1 file changed, 21 insertions(+), 4 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~


^ permalink raw reply	[flat|nested] 167+ messages in thread

* Re: [PATCH 40/76] target/arm: Handle FPCR.NEP for 1-input scalar operations
  2025-01-24 16:28 ` [PATCH 40/76] target/arm: Handle FPCR.NEP for 1-input scalar operations Peter Maydell
@ 2025-01-26 12:33   ` Richard Henderson
  0 siblings, 0 replies; 167+ messages in thread
From: Richard Henderson @ 2025-01-26 12:33 UTC (permalink / raw)
  To: qemu-devel

On 1/24/25 08:28, Peter Maydell wrote:
> Handle FPCR.NEP for the 1-input scalar operations.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>   target/arm/tcg/translate-a64.c | 26 ++++++++++++++------------
>   1 file changed, 14 insertions(+), 12 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~


^ permalink raw reply	[flat|nested] 167+ messages in thread

* Re: [PATCH 41/76] target/arm: Handle FPCR.NEP in do_cvtf_scalar()
  2025-01-24 16:28 ` [PATCH 41/76] target/arm: Handle FPCR.NEP in do_cvtf_scalar() Peter Maydell
@ 2025-01-26 12:33   ` Richard Henderson
  0 siblings, 0 replies; 167+ messages in thread
From: Richard Henderson @ 2025-01-26 12:33 UTC (permalink / raw)
  To: qemu-devel

On 1/24/25 08:28, Peter Maydell wrote:
> Handle FPCR.NEP in the operations handled by do_cvtf_scalar().
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>   target/arm/tcg/translate-a64.c | 6 +++---
>   1 file changed, 3 insertions(+), 3 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~


^ permalink raw reply	[flat|nested] 167+ messages in thread

* Re: [PATCH 42/76] target/arm: Handle FPCR.NEP for scalar FABS and FNEG
  2025-01-24 16:28 ` [PATCH 42/76] target/arm: Handle FPCR.NEP for scalar FABS and FNEG Peter Maydell
@ 2025-01-26 12:34   ` Richard Henderson
  0 siblings, 0 replies; 167+ messages in thread
From: Richard Henderson @ 2025-01-26 12:34 UTC (permalink / raw)
  To: qemu-devel

On 1/24/25 08:28, Peter Maydell wrote:
> Handle FPCR.NEP merging for scalar FABS and FNEG; this requires
> an extra parameter to do_fp1_scalar_int(), since FMOV scalar
> does not have the merging behaviour.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>   target/arm/tcg/translate-a64.c | 27 ++++++++++++++++++++-------
>   1 file changed, 20 insertions(+), 7 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~


^ permalink raw reply	[flat|nested] 167+ messages in thread

* Re: [PATCH 43/76] target/arm: Handle FPCR.NEP for FCVTXN (scalar)
  2025-01-24 16:28 ` [PATCH 43/76] target/arm: Handle FPCR.NEP for FCVTXN (scalar) Peter Maydell
@ 2025-01-26 12:36   ` Richard Henderson
  0 siblings, 0 replies; 167+ messages in thread
From: Richard Henderson @ 2025-01-26 12:36 UTC (permalink / raw)
  To: qemu-devel

On 1/24/25 08:28, Peter Maydell wrote:
> Unlike the other users of do_2misc_narrow_scalar(), FCVTXN (scalar)
> is always double-to-single and must honour FPCR.NEP.  Implement this
> directly in a trans function rather than using
> do_2misc_narrow_scalar().
> 
> We still need gen_fcvtxn_sd() and the f_scalar_fcvtxn[] array for
> the FCVTXN (vector) insn, so we move those down in the file to
> where they are used.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>   target/arm/tcg/translate-a64.c | 43 ++++++++++++++++++++++------------
>   1 file changed, 28 insertions(+), 15 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~


^ permalink raw reply	[flat|nested] 167+ messages in thread

* Re: [PATCH 44/76] target/arm: Handle FPCR.NEP for NEP for FMUL, FMULX scalar by element
  2025-01-24 16:28 ` [PATCH 44/76] target/arm: Handle FPCR.NEP for NEP for FMUL, FMULX scalar by element Peter Maydell
@ 2025-01-26 12:36   ` Richard Henderson
  0 siblings, 0 replies; 167+ messages in thread
From: Richard Henderson @ 2025-01-26 12:36 UTC (permalink / raw)
  To: qemu-devel

On 1/24/25 08:28, Peter Maydell wrote:
> do_fp3_scalar_idx() is used only for the FMUL and FMULX scalar by
> element instructions; these both need to merge the result with the Rn
> register when FPCR.NEP is set.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>   target/arm/tcg/translate-a64.c | 6 +++---
>   1 file changed, 3 insertions(+), 3 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~


^ permalink raw reply	[flat|nested] 167+ messages in thread

* Re: [PATCH 45/76] target/arm: Implement FPCR.AH semantics for scalar FMIN/FMAX
  2025-01-24 16:28 ` [PATCH 45/76] target/arm: Implement FPCR.AH semantics for scalar FMIN/FMAX Peter Maydell
@ 2025-01-26 12:43   ` Richard Henderson
  2025-01-31 13:09     ` Peter Maydell
  0 siblings, 1 reply; 167+ messages in thread
From: Richard Henderson @ 2025-01-26 12:43 UTC (permalink / raw)
  To: qemu-devel

On 1/24/25 08:28, Peter Maydell wrote:
> diff --git a/target/arm/tcg/helper-a64.c b/target/arm/tcg/helper-a64.c
> index 05036089dd7..406d76e1129 100644
> --- a/target/arm/tcg/helper-a64.c
> +++ b/target/arm/tcg/helper-a64.c
> @@ -399,6 +399,42 @@ float32 HELPER(fcvtx_f64_to_f32)(float64 a, float_status *fpst)
>       return r;
>   }
>   
> +/*
> + * AH=1 min/max have some odd special cases:
> + * comparing two zeroes (even of different sign), (NaN, anything),
> + * or (anything, NaN) should return the second argument (possibly
> + * squashed to zero).
> + * Also, denormal outputs are not squashed to zero regardless of FZ or FZ16.
> + */
> +#define AH_MINMAX_HELPER(NAME, CTYPE, FLOATTYPE, MINMAX)                \
> +    CTYPE HELPER(NAME)(CTYPE a, CTYPE b, float_status *fpst)            \
> +    {                                                                   \
> +        bool save;                                                      \
> +        CTYPE r;                                                        \
> +        a = FLOATTYPE ## _squash_input_denormal(a, fpst);               \
> +        b = FLOATTYPE ## _squash_input_denormal(b, fpst);               \
> +        if (FLOATTYPE ## _is_zero(a) && FLOATTYPE ## _is_zero(b)) {     \

The comment says "even of different sign", the pseudocode explicitly checks different 
sign.  But of course if they're the same sign a and b are indistinguishable.  Perhaps 
slightly different wording?

Anyway,
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~
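
The special cases in the quoted helper comment can be illustrated with host
floats as below. This is a sketch only: it uses C float and <math.h> rather
than softfloat, the function names are made up, and it leaves out the input
denormal squashing and the exception flag handling that the real helper
performs.

```c
#include <math.h>

/*
 * AH == 1 style minimum: if both inputs are zero (of either sign), or
 * either input is a NaN, return the second argument; otherwise return
 * the smaller value.
 */
static float sk_fmin_ah(float a, float b)
{
    if ((a == 0.0f && b == 0.0f) || isnan(a) || isnan(b)) {
        return b;
    }
    return a < b ? a : b;
}

/* AH == 1 style maximum, with the same special cases. */
static float sk_fmax_ah(float a, float b)
{
    if ((a == 0.0f && b == 0.0f) || isnan(a) || isnan(b)) {
        return b;
    }
    return a > b ? a : b;
}
```

Note that when both zeroes have the same sign, returning the second
argument is indistinguishable from returning the first, which is the point
made in the review comment above.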


^ permalink raw reply	[flat|nested] 167+ messages in thread

* Re: [PATCH 46/76] target/arm: Implement FPCR.AH semantics for vector FMIN/FMAX
  2025-01-24 16:28 ` [PATCH 46/76] target/arm: Implement FPCR.AH semantics for vector FMIN/FMAX Peter Maydell
@ 2025-01-26 12:45   ` Richard Henderson
  0 siblings, 0 replies; 167+ messages in thread
From: Richard Henderson @ 2025-01-26 12:45 UTC (permalink / raw)
  To: qemu-devel

On 1/24/25 08:28, Peter Maydell wrote:
> Implement the FPCR.AH == 1 semantics for vector FMIN/FMAX, by
> creating new _ah_ versions of the gvec helpers which invoke the
> scalar fmin_ah and fmax_ah helpers on each element.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>   target/arm/tcg/helper-sve.h    | 14 ++++++++++++++
>   target/arm/tcg/translate-a64.c | 21 +++++++++++++++++++--
>   target/arm/tcg/vec_helper.c    |  8 ++++++++
>   3 files changed, 41 insertions(+), 2 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~


^ permalink raw reply	[flat|nested] 167+ messages in thread

* Re: [PATCH 47/76] target/arm: Implement FPCR.AH semantics for FMAXV and FMINV
  2025-01-24 16:28 ` [PATCH 47/76] target/arm: Implement FPCR.AH semantics for FMAXV and FMINV Peter Maydell
@ 2025-01-26 12:47   ` Richard Henderson
  0 siblings, 0 replies; 167+ messages in thread
From: Richard Henderson @ 2025-01-26 12:47 UTC (permalink / raw)
  To: qemu-devel

On 1/24/25 08:28, Peter Maydell wrote:
> Implement the FPCR.AH semantics for FMAXV and FMINV.  These are the
> "recursively reduce all lanes of a vector to a scalar result" insns;
> we just need to use the _ah_ helper for the reduction step when
> FPCR.AH == 1.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>   target/arm/tcg/translate-a64.c | 28 ++++++++++++++++++----------
>   1 file changed, 18 insertions(+), 10 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~
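
As an illustration of "use the _ah_ helper for the reduction step", here is
a host-float sketch of a recursive lane reduction. The split-in-halves
order is an assumption of the sketch, not taken from the patch, and the
sk_* functions are invented stand-ins for the real helpers (no denormal
handling, no exception flags).

```c
#include <math.h>
#include <stddef.h>

/* AH == 1 style maximum: zeroes or any NaN input return the second argument. */
static float sk_fmax_ah(float a, float b)
{
    if ((a == 0.0f && b == 0.0f) || isnan(a) || isnan(b)) {
        return b;
    }
    return a > b ? a : b;
}

/*
 * Reduce n lanes (n >= 1) to one value by recursively reducing each
 * half and combining the two partial results with the pairwise helper.
 */
static float sk_fmaxv_ah(const float *lanes, size_t n)
{
    size_t half;

    if (n == 1) {
        return lanes[0];
    }
    half = n / 2;
    return sk_fmax_ah(sk_fmaxv_ah(lanes, half),
                      sk_fmaxv_ah(lanes + half, n - half));
}
```

Because the pairwise helper is not symmetric in its arguments, the order in
which lanes are combined is visible in the result when NaNs or mixed-sign
zeroes are present.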


^ permalink raw reply	[flat|nested] 167+ messages in thread

* Re: [PATCH 48/76] target/arm: Implement FPCR.AH semantics for FMINP and FMAXP
  2025-01-24 16:28 ` [PATCH 48/76] target/arm: Implement FPCR.AH semantics for FMINP and FMAXP Peter Maydell
@ 2025-01-26 12:49   ` Richard Henderson
  0 siblings, 0 replies; 167+ messages in thread
From: Richard Henderson @ 2025-01-26 12:49 UTC (permalink / raw)
  To: qemu-devel

On 1/24/25 08:28, Peter Maydell wrote:
> Implement the FPCR.AH semantics for the pairwise floating
> point minimum/maximum insns FMINP and FMAXP.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>   target/arm/tcg/helper-sve.h    | 14 ++++++++++++++
>   target/arm/tcg/translate-a64.c | 25 +++++++++++++++++++++----
>   target/arm/tcg/vec_helper.c    | 10 ++++++++++
>   3 files changed, 45 insertions(+), 4 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~


^ permalink raw reply	[flat|nested] 167+ messages in thread

* Re: [PATCH 49/76] target/arm: Implement FPCR.AH semantics for SVE FMAXV and FMINV
  2025-01-24 16:28 ` [PATCH 49/76] target/arm: Implement FPCR.AH semantics for SVE FMAXV and FMINV Peter Maydell
@ 2025-01-26 12:51   ` Richard Henderson
  0 siblings, 0 replies; 167+ messages in thread
From: Richard Henderson @ 2025-01-26 12:51 UTC (permalink / raw)
  To: qemu-devel

On 1/24/25 08:28, Peter Maydell wrote:
> Implement the FPCR.AH semantics for the SVE FMAXV and FMINV
> vector-reduction-to-scalar max/min operations.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>   target/arm/tcg/helper-sve.h    | 14 +++++++++++
>   target/arm/tcg/sve_helper.c    | 43 +++++++++++++++++++++-------------
>   target/arm/tcg/translate-sve.c | 16 +++++++++++--
>   3 files changed, 55 insertions(+), 18 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~


^ permalink raw reply	[flat|nested] 167+ messages in thread

* Re: [PATCH 50/76] target/arm: Implement FPCR.AH semantics for SVE FMIN/FMAX immediate
  2025-01-24 16:28 ` [PATCH 50/76] target/arm: Implement FPCR.AH semantics for SVE FMIN/FMAX immediate Peter Maydell
@ 2025-01-26 12:54   ` Richard Henderson
  0 siblings, 0 replies; 167+ messages in thread
From: Richard Henderson @ 2025-01-26 12:54 UTC (permalink / raw)
  To: qemu-devel

On 1/24/25 08:28, Peter Maydell wrote:
> Implement the FPCR.AH semantics for the SVE FMAX and FMIN operations
> that take an immediate as the second operand.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>   target/arm/tcg/helper-sve.h    | 14 ++++++++++++++
>   target/arm/tcg/sve_helper.c    |  8 ++++++++
>   target/arm/tcg/translate-sve.c | 25 +++++++++++++++++++++++--
>   3 files changed, 45 insertions(+), 2 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~


^ permalink raw reply	[flat|nested] 167+ messages in thread

* Re: [PATCH 51/76] target/arm: Implement FPCR.AH semantics for SVE FMIN/FMAX vector
  2025-01-24 16:28 ` [PATCH 51/76] target/arm: Implement FPCR.AH semantics for SVE FMIN/FMAX vector Peter Maydell
@ 2025-01-26 12:55   ` Richard Henderson
  0 siblings, 0 replies; 167+ messages in thread
From: Richard Henderson @ 2025-01-26 12:55 UTC (permalink / raw)
  To: qemu-devel

On 1/24/25 08:28, Peter Maydell wrote:
> Implement the FPCR.AH semantics for the SVE FMAX and FMIN
> operations that take two vector operands.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>   target/arm/tcg/helper-sve.h    | 14 ++++++++++++++
>   target/arm/tcg/sve_helper.c    |  8 ++++++++
>   target/arm/tcg/translate-sve.c | 17 +++++++++++++++--
>   3 files changed, 37 insertions(+), 2 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~


^ permalink raw reply	[flat|nested] 167+ messages in thread

* Re: [PATCH 52/76] target/arm: Implement FPCR.AH handling of negation of NaN
  2025-01-24 16:28 ` [PATCH 52/76] target/arm: Implement FPCR.AH handling of negation of NaN Peter Maydell
@ 2025-01-26 13:00   ` Richard Henderson
  0 siblings, 0 replies; 167+ messages in thread
From: Richard Henderson @ 2025-01-26 13:00 UTC (permalink / raw)
  To: qemu-devel

On 1/24/25 08:28, Peter Maydell wrote:
> FPCR.AH == 1 mandates that negation of a NaN value should not flip
> its sign bit.  This means we can no longer use gen_vfp_neg*()
> everywhere but must instead generate slightly more complex code when
> FPCR.AH is set.
> 
> Make this change for the scalar FNEG and for those places in
> translate-a64.c which were previously directly calling
> gen_vfp_neg*().
> 
> This change in semantics also affects any other instruction whose
> pseudocode calls FPNeg(); in following commits we extend this
> change to the other affected instructions.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>   target/arm/tcg/translate-a64.c | 125 ++++++++++++++++++++++++++++++---
>   1 file changed, 114 insertions(+), 11 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~
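
The "slightly more complex code" amounts to a NaN check before the sign
flip. A bit-level sketch for single precision, operating on the raw IEEE
encoding, could look like this; the function name is made up, and the real
change emits TCG ops rather than calling C like this.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * FPCR.AH == 1 style negation of a float32 given as raw bits:
 * NaNs are returned unchanged, everything else has its sign flipped.
 */
static uint32_t sk_fneg_ah_f32(uint32_t bits)
{
    /* NaN: exponent all ones and a non-zero fraction */
    bool is_nan = (bits & 0x7fffffffu) > 0x7f800000u;

    return is_nan ? bits : bits ^ 0x80000000u;
}
```

Infinities are not NaNs, so they still change sign.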


^ permalink raw reply	[flat|nested] 167+ messages in thread

* Re: [PATCH 53/76] target/arm: Implement FPCR.AH handling for scalar FABS and FABD
  2025-01-24 16:28 ` [PATCH 53/76] target/arm: Implement FPCR.AH handling for scalar FABS and FABD Peter Maydell
@ 2025-01-26 13:01   ` Richard Henderson
  0 siblings, 0 replies; 167+ messages in thread
From: Richard Henderson @ 2025-01-26 13:01 UTC (permalink / raw)
  To: qemu-devel

On 1/24/25 08:28, Peter Maydell wrote:
> FPCR.AH == 1 mandates that taking the absolute value of a NaN should
> not change its sign bit.  This means we can no longer use
> gen_vfp_abs*() everywhere but must instead generate slightly more
> complex code when FPCR.AH is set.
> 
> Implement these semantics for scalar FABS and FABD.  This change also
> affects all other instructions whose pseudocode calls FPAbs(); we
> will extend the change to those instructions in following commits.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>   target/arm/tcg/translate-a64.c | 69 +++++++++++++++++++++++++++++++++-
>   1 file changed, 67 insertions(+), 2 deletions(-)
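
The FPCR.AH rule for FABS described in the quoted commit message can be sketched on raw IEEE binary32 bit patterns. This is only an illustration of the semantics, not the TCG code the patch generates; every demo_* name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if the binary32 bit pattern encodes any NaN
 * (exponent all ones, fraction non-zero). */
static bool demo_f32_is_any_nan(uint32_t bits)
{
    return (bits & 0x7fffffffu) > 0x7f800000u;
}

/* Absolute value on raw bits. With fpcr_ah set, a NaN input is returned
 * unchanged so that its sign bit is preserved. */
static uint32_t demo_f32_abs(uint32_t bits, bool fpcr_ah)
{
    if (fpcr_ah && demo_f32_is_any_nan(bits)) {
        return bits;
    }
    return bits & 0x7fffffffu;
}

/* Small self-check covering the interesting cases. */
static void demo_f32_abs_selfcheck(void)
{
    /* -1.0 becomes +1.0 in either mode */
    assert(demo_f32_abs(0xbf800000u, false) == 0x3f800000u);
    assert(demo_f32_abs(0xbf800000u, true) == 0x3f800000u);
    /* a negative NaN loses its sign only when AH is clear */
    assert(demo_f32_abs(0xffc00000u, false) == 0x7fc00000u);
    assert(demo_f32_abs(0xffc00000u, true) == 0xffc00000u);
    /* -Inf is not a NaN, so AH does not protect its sign */
    assert(demo_f32_abs(0xff800000u, true) == 0x7f800000u);
}
```

Calling demo_f32_abs_selfcheck() from a main() exercises the cases above.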

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~


^ permalink raw reply	[flat|nested] 167+ messages in thread

* Re: [PATCH 54/76] target/arm: Handle FPCR.AH in vector FABD
  2025-01-24 16:28 ` [PATCH 54/76] target/arm: Handle FPCR.AH in vector FABD Peter Maydell
@ 2025-01-26 13:03   ` Richard Henderson
  0 siblings, 0 replies; 167+ messages in thread
From: Richard Henderson @ 2025-01-26 13:03 UTC (permalink / raw)
  To: qemu-devel

On 1/24/25 08:28, Peter Maydell wrote:
> Split the handling of vector FABD so that it calls a different set
> of helpers when FPCR.AH is 1, which implement the "no negation of
> the sign of a NaN" semantics.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>   target/arm/helper.h            |  4 ++++
>   target/arm/tcg/translate-a64.c |  7 ++++++-
>   target/arm/tcg/vec_helper.c    | 23 +++++++++++++++++++++++
>   3 files changed, 33 insertions(+), 1 deletion(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~


^ permalink raw reply	[flat|nested] 167+ messages in thread

* Re: [PATCH 55/76] target/arm: Handle FPCR.AH in SVE FNEG
  2025-01-24 16:28 ` [PATCH 55/76] target/arm: Handle FPCR.AH in SVE FNEG Peter Maydell
@ 2025-01-26 13:05   ` Richard Henderson
  0 siblings, 0 replies; 167+ messages in thread
From: Richard Henderson @ 2025-01-26 13:05 UTC (permalink / raw)
  To: qemu-devel

On 1/24/25 08:28, Peter Maydell wrote:
> Make SVE FNEG honour the FPCR.AH "don't negate the sign of a NaN" semantics.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>   target/arm/tcg/helper-sve.h    | 4 ++++
>   target/arm/tcg/sve_helper.c    | 8 ++++++++
>   target/arm/tcg/translate-sve.c | 7 ++++++-
>   3 files changed, 18 insertions(+), 1 deletion(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~


^ permalink raw reply	[flat|nested] 167+ messages in thread

* Re: [PATCH 56/76] target/arm: Handle FPCR.AH in SVE FABS
  2025-01-24 16:28 ` [PATCH 56/76] target/arm: Handle FPCR.AH in SVE FABS Peter Maydell
@ 2025-01-26 13:05   ` Richard Henderson
  0 siblings, 0 replies; 167+ messages in thread
From: Richard Henderson @ 2025-01-26 13:05 UTC (permalink / raw)
  To: qemu-devel

On 1/24/25 08:28, Peter Maydell wrote:
> Make SVE FABS honour the FPCR.AH "don't negate the sign of a NaN" semantics.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>   target/arm/tcg/helper-sve.h    | 4 ++++
>   target/arm/tcg/sve_helper.c    | 8 ++++++++
>   target/arm/tcg/translate-sve.c | 7 ++++++-
>   3 files changed, 18 insertions(+), 1 deletion(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~


^ permalink raw reply	[flat|nested] 167+ messages in thread

* Re: [PATCH 57/76] target/arm: Handle FPCR.AH in SVE FABD
  2025-01-24 16:28 ` [PATCH 57/76] target/arm: Handle FPCR.AH in SVE FABD Peter Maydell
@ 2025-01-26 13:06   ` Richard Henderson
  0 siblings, 0 replies; 167+ messages in thread
From: Richard Henderson @ 2025-01-26 13:06 UTC (permalink / raw)
  To: qemu-devel

On 1/24/25 08:28, Peter Maydell wrote:
> Make the SVE FABD insn honour the FPCR.AH "don't negate the sign
> of a NaN" semantics.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>   target/arm/tcg/helper-sve.h    |  7 +++++++
>   target/arm/tcg/sve_helper.c    | 22 ++++++++++++++++++++++
>   target/arm/tcg/translate-sve.c |  2 +-
>   3 files changed, 30 insertions(+), 1 deletion(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~


^ permalink raw reply	[flat|nested] 167+ messages in thread

* Re: [PATCH 58/76] target/arm: Handle FPCR.AH in negation steps in FCADD
  2025-01-24 16:28 ` [PATCH 58/76] target/arm: Handle FPCR.AH in negation steps in FCADD Peter Maydell
@ 2025-01-26 13:08   ` Richard Henderson
  0 siblings, 0 replies; 167+ messages in thread
From: Richard Henderson @ 2025-01-26 13:08 UTC (permalink / raw)
  To: qemu-devel

On 1/24/25 08:28, Peter Maydell wrote:
> The negation steps in FCADD must honour FPCR.AH's "don't change the
> sign of a NaN" semantics.  Implement this by encoding FPCR.AH into
> the SIMD data field passed to the helper and using that to decide
> whether to negate the values.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>   target/arm/tcg/translate-a64.c | 10 +++++++--
>   target/arm/tcg/vec_helper.c    | 39 ++++++++++++++++++++++++++++------
>   2 files changed, 41 insertions(+), 8 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~


^ permalink raw reply	[flat|nested] 167+ messages in thread

* Re: [PATCH 59/76] target/arm: Handle FPCR.AH in negation steps in SVE FCADD
  2025-01-24 16:28 ` [PATCH 59/76] target/arm: Handle FPCR.AH in negation steps in SVE FCADD Peter Maydell
@ 2025-01-26 13:10   ` Richard Henderson
  0 siblings, 0 replies; 167+ messages in thread
From: Richard Henderson @ 2025-01-26 13:10 UTC (permalink / raw)
  To: qemu-devel

On 1/24/25 08:28, Peter Maydell wrote:
> +            if (neg_real && !(fpcr_ah && float16_is_any_nan(e1))) {
> +                e1 ^= neg_real;
> +            }
> +            if (neg_imag && !(fpcr_ah && float16_is_any_nan(e3))) {
> +                e3 ^= neg_imag;
> +            }

Drop the neg_real/neg_imag check?
Or do you imagine the is_any_nan check to be more expensive?
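
A minimal sketch of why the extra check is functionally redundant, assuming neg_real and neg_imag are sign-bit masks that are either 0 or 0x8000 (as the XOR in the quoted hunk suggests): XOR with a zero mask is a no-op, so dropping the check changes only how much work is done, not the result. The demo_* names are hypothetical stand-ins for the helper's internals.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint16_t demo_f16;   /* raw IEEE binary16 bit pattern */

/* Hypothetical helper: true if the binary16 bit pattern encodes any NaN. */
static bool demo_f16_is_any_nan(demo_f16 v)
{
    return (v & 0x7fff) > 0x7c00;
}

/* Shape of the quoted code: test the mask first, then the NaN condition. */
static demo_f16 demo_neg_checked(demo_f16 e, demo_f16 neg_mask, bool fpcr_ah)
{
    if (neg_mask && !(fpcr_ah && demo_f16_is_any_nan(e))) {
        e ^= neg_mask;
    }
    return e;
}

/* Shape suggested in the review: rely on XOR with 0 being a no-op. */
static demo_f16 demo_neg_unchecked(demo_f16 e, demo_f16 neg_mask, bool fpcr_ah)
{
    if (!(fpcr_ah && demo_f16_is_any_nan(e))) {
        e ^= neg_mask;
    }
    return e;
}

/* Exhaustively compare both shapes over every 16-bit input pattern. */
static void demo_check_equivalence(void)
{
    static const demo_f16 masks[] = { 0x0000, 0x8000 };

    for (uint32_t bits = 0; bits <= 0xffff; bits++) {
        for (int m = 0; m < 2; m++) {
            for (int ah = 0; ah < 2; ah++) {
                assert(demo_neg_checked((demo_f16)bits, masks[m], ah != 0) ==
                       demo_neg_unchecked((demo_f16)bits, masks[m], ah != 0));
            }
        }
    }
}
```

Which form is cheaper in practice depends on how the compiler schedules the NaN test, which this sketch does not measure.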


r~


^ permalink raw reply	[flat|nested] 167+ messages in thread

* Re: [PATCH 60/76] target/arm: Handle FPCR.AH in FMLSL
  2025-01-24 16:28 ` [PATCH 60/76] target/arm: Handle FPCR.AH in FMLSL Peter Maydell
@ 2025-01-26 13:13   ` Richard Henderson
  0 siblings, 0 replies; 167+ messages in thread
From: Richard Henderson @ 2025-01-26 13:13 UTC (permalink / raw)
  To: qemu-devel

On 1/24/25 08:28, Peter Maydell wrote:
> Honour the FPCR.AH "don't negate the sign of a NaN" semantics in
> FMLSL. We pass in the value of FPCR.AH in the SIMD data field, and
> use this to determine whether we should suppress the negation for
> NaN inputs.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>   target/arm/tcg/translate-a64.c |  4 ++--
>   target/arm/tcg/vec_helper.c    | 28 ++++++++++++++++++++++++----
>   2 files changed, 26 insertions(+), 6 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~


^ permalink raw reply	[flat|nested] 167+ messages in thread

* Re: [PATCH 61/76] target/arm: Handle FPCR.AH in FRECPS and FRSQRTS scalar insns
  2025-01-24 16:28 ` [PATCH 61/76] target/arm: Handle FPCR.AH in FRECPS and FRSQRTS scalar insns Peter Maydell
@ 2025-01-26 13:14   ` Richard Henderson
  0 siblings, 0 replies; 167+ messages in thread
From: Richard Henderson @ 2025-01-26 13:14 UTC (permalink / raw)
  To: qemu-devel

On 1/24/25 08:28, Peter Maydell wrote:
> Handle the FPCR.AH semantics that we do not change the sign of an
> input NaN in the FRECPS and FRSQRTS scalar insns, by providing
> new helper functions that do the CHS part of the operation
> differently.
> 
> Since the extra helper functions would be very repetitive if written
> out longhand, we condense them and the existing non-AH helpers into
> being emitted via macros.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>   target/arm/tcg/helper-a64.h    |   6 ++
>   target/arm/tcg/helper-a64.c    | 128 ++++++++++++++-------------------
>   target/arm/tcg/translate-a64.c |  25 +++++--
>   3 files changed, 78 insertions(+), 81 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~


^ permalink raw reply	[flat|nested] 167+ messages in thread

* Re: [PATCH 62/76] target/arm: Handle FPCR.AH in FRECPS and FRSQRTS vector insns
  2025-01-24 16:28 ` [PATCH 62/76] target/arm: Handle FPCR.AH in FRECPS and FRSQRTS vector insns Peter Maydell
@ 2025-01-26 13:15   ` Richard Henderson
  0 siblings, 0 replies; 167+ messages in thread
From: Richard Henderson @ 2025-01-26 13:15 UTC (permalink / raw)
  To: qemu-devel

On 1/24/25 08:28, Peter Maydell wrote:
> Handle the FPCR.AH "don't negate the sign of a NaN" semantics
> in the vector versions of FRECPS and FRSQRTS, by implementing
> new vector wrappers that call the _ah_ scalar helpers.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>   target/arm/tcg/helper-sve.h    | 14 ++++++++++++++
>   target/arm/tcg/translate-a64.c | 21 ++++++++++++++++-----
>   target/arm/tcg/translate-sve.c |  7 ++++++-
>   target/arm/tcg/vec_helper.c    |  8 ++++++++
>   4 files changed, 44 insertions(+), 6 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~


^ permalink raw reply	[flat|nested] 167+ messages in thread

* Re: [PATCH 63/76] target/arm: Handle FPCR.AH in negation step in FMLS (indexed)
  2025-01-24 16:28 ` [PATCH 63/76] target/arm: Handle FPCR.AH in negation step in FMLS (indexed) Peter Maydell
@ 2025-01-26 13:16   ` Richard Henderson
  0 siblings, 0 replies; 167+ messages in thread
From: Richard Henderson @ 2025-01-26 13:16 UTC (permalink / raw)
  To: qemu-devel

On 1/24/25 08:28, Peter Maydell wrote:
> Handle the FPCR.AH "don't negate the sign of a NaN" semantics in FMLS
> (indexed), by passing through FPCR.AH in the SIMD data word, for the
> helper to use to determine whether to negate.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>   target/arm/tcg/translate-a64.c | 2 +-
>   target/arm/tcg/translate-sve.c | 2 +-
>   target/arm/tcg/vec_helper.c    | 9 +++++++--
>   3 files changed, 9 insertions(+), 4 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~


^ permalink raw reply	[flat|nested] 167+ messages in thread

* Re: [PATCH 64/76] target/arm: Handle FPCR.AH in negation in FMLS (vector)
  2025-01-24 16:28 ` [PATCH 64/76] target/arm: Handle FPCR.AH in negation in FMLS (vector) Peter Maydell
@ 2025-01-26 13:17   ` Richard Henderson
  0 siblings, 0 replies; 167+ messages in thread
From: Richard Henderson @ 2025-01-26 13:17 UTC (permalink / raw)
  To: qemu-devel

On 1/24/25 08:28, Peter Maydell wrote:
> Handle the FPCR.AH "don't negate the sign of a NaN" semantics
> in FMLS (vector), by implementing a new set of helpers for
> the AH=1 case.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>   target/arm/helper.h            |  4 ++++
>   target/arm/tcg/translate-a64.c |  7 ++++++-
>   target/arm/tcg/vec_helper.c    | 25 +++++++++++++++++++++++++
>   3 files changed, 35 insertions(+), 1 deletion(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~


^ permalink raw reply	[flat|nested] 167+ messages in thread

* Re: [PATCH 65/76] target/arm: Handle FPCR.AH in negation step in SVE FMLS (vector)
  2025-01-24 16:28 ` [PATCH 65/76] target/arm: Handle FPCR.AH in negation step in SVE " Peter Maydell
@ 2025-01-26 13:19   ` Richard Henderson
  2025-01-27 20:41   ` Richard Henderson
  1 sibling, 0 replies; 167+ messages in thread
From: Richard Henderson @ 2025-01-26 13:19 UTC (permalink / raw)
  To: qemu-devel

On 1/24/25 08:28, Peter Maydell wrote:
> Handle the FPCR.AH "don't negate the sign of a NaN" semantics for the
> SVE FMLS (vector) insns, by providing new helpers for the AH=1 case
> which end up passing fpcr_ah = true to the do_fmla_zpzzz_* functions
> that do the work.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>   target/arm/tcg/helper-sve.h    |  21 ++++++
>   target/arm/tcg/sve_helper.c    | 114 +++++++++++++++++++++++++++------
>   target/arm/tcg/translate-sve.c |  18 ++++--
>   3 files changed, 126 insertions(+), 27 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~


^ permalink raw reply	[flat|nested] 167+ messages in thread

* Re: [PATCH 66/76] target/arm: Handle FPCR.AH in SVE FTSSEL
  2025-01-24 16:28 ` [PATCH 66/76] target/arm: Handle FPCR.AH in SVE FTSSEL Peter Maydell
@ 2025-01-26 13:20   ` Richard Henderson
  0 siblings, 0 replies; 167+ messages in thread
From: Richard Henderson @ 2025-01-26 13:20 UTC (permalink / raw)
  To: qemu-devel

On 1/24/25 08:28, Peter Maydell wrote:
> The negation step in the SVE FTSSEL insn mustn't negate a NaN when
> FPCR.AH is set.  Pass FPCR.AH to the helper via the SIMD data field
> and use that to determine whether to do the negation.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>   target/arm/tcg/sve_helper.c    | 18 +++++++++++++++---
>   target/arm/tcg/translate-sve.c |  4 ++--
>   2 files changed, 17 insertions(+), 5 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~


^ permalink raw reply	[flat|nested] 167+ messages in thread

* Re: [PATCH 67/76] target/arm: Handle FPCR.AH in SVE FTMAD
  2025-01-24 16:28 ` [PATCH 67/76] target/arm: Handle FPCR.AH in SVE FTMAD Peter Maydell
@ 2025-01-26 13:21   ` Richard Henderson
  0 siblings, 0 replies; 167+ messages in thread
From: Richard Henderson @ 2025-01-26 13:21 UTC (permalink / raw)
  To: qemu-devel

On 1/24/25 08:28, Peter Maydell wrote:
> The negation step in the SVE FTMAD insn mustn't negate a NaN when
> FPCR.AH is set.  Pass FPCR.AH to the helper via the SIMD data field
> and use that to determine whether to do the negation.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>   target/arm/tcg/sve_helper.c    | 21 +++++++++++++++------
>   target/arm/tcg/translate-sve.c |  3 ++-
>   2 files changed, 17 insertions(+), 7 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~


^ permalink raw reply	[flat|nested] 167+ messages in thread

* Re: [PATCH 68/76] target/arm: Enable FEAT_AFP for '-cpu max'
  2025-01-24 16:28 ` [PATCH 68/76] target/arm: Enable FEAT_AFP for '-cpu max' Peter Maydell
@ 2025-01-26 13:21   ` Richard Henderson
  0 siblings, 0 replies; 167+ messages in thread
From: Richard Henderson @ 2025-01-26 13:21 UTC (permalink / raw)
  To: qemu-devel

On 1/24/25 08:28, Peter Maydell wrote:
> Now that we have completed the handling for FPCR.{AH,FIZ,NEP}, we
> can enable FEAT_AFP for '-cpu max', and document that we support it.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>   docs/system/arm/emulation.rst | 1 +
>   target/arm/tcg/cpu64.c        | 1 +
>   2 files changed, 2 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~


^ permalink raw reply	[flat|nested] 167+ messages in thread

* Re: [PATCH 69/76] target/arm: Plumb FEAT_RPRES frecpe and frsqrte through to new helper
  2025-01-24 16:28 ` [PATCH 69/76] target/arm: Plumb FEAT_RPRES frecpe and frsqrte through to new helper Peter Maydell
@ 2025-01-26 13:23   ` Richard Henderson
  0 siblings, 0 replies; 167+ messages in thread
From: Richard Henderson @ 2025-01-26 13:23 UTC (permalink / raw)
  To: qemu-devel

On 1/24/25 08:28, Peter Maydell wrote:
> FEAT_RPRES implements an "increased precision" variant of the single
> precision FRECPE and FRSQRTE instructions, raising the result from an
> 8 bit to a 12 bit mantissa. This applies only when FPCR.AH == 1. Note
> that the halfprec and double versions of these insns retain the 8 bit
> precision regardless.
> 
> In this commit we add all the plumbing to make these instructions
> call a new helper function when increased precision is in effect.
> In the following commit we will provide the actual change in
> behaviour in the helpers.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>   target/arm/cpu-features.h      |  5 +++++
>   target/arm/helper.h            |  4 ++++
>   target/arm/tcg/translate-a64.c | 34 ++++++++++++++++++++++++++++++----
>   target/arm/tcg/translate-sve.c | 16 ++++++++++++++--
>   target/arm/tcg/vec_helper.c    |  2 ++
>   target/arm/vfp_helper.c        | 32 ++++++++++++++++++++++++++++++--
>   6 files changed, 85 insertions(+), 8 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~


^ permalink raw reply	[flat|nested] 167+ messages in thread

* Re: [PATCH 70/76] target/arm: Implement increased precision FRECPE
  2025-01-24 16:28 ` [PATCH 70/76] target/arm: Implement increased precision FRECPE Peter Maydell
@ 2025-01-26 13:26   ` Richard Henderson
  0 siblings, 0 replies; 167+ messages in thread
From: Richard Henderson @ 2025-01-26 13:26 UTC (permalink / raw)
  To: qemu-devel

On 1/24/25 08:28, Peter Maydell wrote:
> Implement the increased precision variation of FRECPE.  In the
> pseudocode this corresponds to the handling of the
> "increasedprecision" boolean in the FPRecipEstimate() and
> RecipEstimate() functions.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>   target/arm/vfp_helper.c | 54 +++++++++++++++++++++++++++++++++++------
>   1 file changed, 46 insertions(+), 8 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~


^ permalink raw reply	[flat|nested] 167+ messages in thread

* Re: [PATCH 71/76] target/arm: Implement increased precision FRSQRTE
  2025-01-24 16:28 ` [PATCH 71/76] target/arm: Implement increased precision FRSQRTE Peter Maydell
@ 2025-01-26 13:28   ` Richard Henderson
  0 siblings, 0 replies; 167+ messages in thread
From: Richard Henderson @ 2025-01-26 13:28 UTC (permalink / raw)
  To: qemu-devel

On 1/24/25 08:28, Peter Maydell wrote:
> Implement the increased precision variation of FRSQRTE.  In the
> pseudocode this corresponds to the handling of the
> "increasedprecision" boolean in the FPRSqrtEstimate() and
> RecipSqrtEstimate() functions.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>   target/arm/vfp_helper.c | 77 ++++++++++++++++++++++++++++++++++-------
>   1 file changed, 64 insertions(+), 13 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~


^ permalink raw reply	[flat|nested] 167+ messages in thread

* Re: [PATCH 72/76] target/arm: Enable FEAT_RPRES for -cpu max
  2025-01-24 16:28 ` [PATCH 72/76] target/arm: Enable FEAT_RPRES for -cpu max Peter Maydell
@ 2025-01-26 13:29   ` Richard Henderson
  0 siblings, 0 replies; 167+ messages in thread
From: Richard Henderson @ 2025-01-26 13:29 UTC (permalink / raw)
  To: qemu-devel

On 1/24/25 08:28, Peter Maydell wrote:
> Now the emulation is complete, we can enable FEAT_RPRES for the 'max'
> CPU type.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>   docs/system/arm/emulation.rst | 1 +
>   target/arm/tcg/cpu64.c        | 1 +
>   2 files changed, 2 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~


^ permalink raw reply	[flat|nested] 167+ messages in thread

* Re: [PATCH 73/76] target/i386: Detect flush-to-zero after rounding
  2025-01-24 16:28 ` [PATCH 73/76] target/i386: Detect flush-to-zero after rounding Peter Maydell
@ 2025-01-26 13:30   ` Richard Henderson
  0 siblings, 0 replies; 167+ messages in thread
From: Richard Henderson @ 2025-01-26 13:30 UTC (permalink / raw)
  To: qemu-devel

On 1/24/25 08:28, Peter Maydell wrote:
> The Intel SDM section 10.2.3.3 on the MXCSR.FTZ bit says that we
> flush outputs to zero when we detect underflow, which is after
> rounding.  Set the detect_ftz flag accordingly.
> 
> This allows us to enable the test in fma.c which checks this
> behaviour.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>   target/i386/tcg/fpu_helper.c | 8 ++++----
>   tests/tcg/x86_64/fma.c       | 5 -----
>   2 files changed, 4 insertions(+), 9 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~


^ permalink raw reply	[flat|nested] 167+ messages in thread

* Re: [PATCH 74/76] target/i386: Use correct type for get_float_exception_flags() values
  2025-01-24 16:28 ` [PATCH 74/76] target/i386: Use correct type for get_float_exception_flags() values Peter Maydell
@ 2025-01-26 13:30   ` Richard Henderson
  0 siblings, 0 replies; 167+ messages in thread
From: Richard Henderson @ 2025-01-26 13:30 UTC (permalink / raw)
  To: qemu-devel

On 1/24/25 08:28, Peter Maydell wrote:
> The softfloat get_float_exception_flags() function returns 'int', but
> in various places in target/i386 we incorrectly store the returned
> value into a uint8_t.  This currently has no ill effects because i386
> doesn't care about any of the float_flag enum values above 0x40.
> However, we want to start using float_flag_input_denormal_used, which
> is 0x4000.
> 
> Switch to using 'int' so that we can handle all the possible valid
> float_flag_* values. This includes changing the return type of
> save_exception_flags() and the argument to merge_exception_flags().
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>   target/i386/ops_sse.h        | 16 +++----
>   target/i386/tcg/fpu_helper.c | 82 ++++++++++++++++++------------------
>   2 files changed, 49 insertions(+), 49 deletions(-)
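
The truncation hazard described in the quoted commit message can be sketched as follows. The 0x4000 value is the one the message gives for float_flag_input_denormal_used; the getter and the other demo_* names are hypothetical stand-ins, not the softfloat API.

```c
#include <assert.h>
#include <stdint.h>

/* Value quoted in the commit message for float_flag_input_denormal_used. */
#define DEMO_FLAG_INPUT_DENORMAL_USED 0x4000
/* An arbitrary low flag bit, chosen only for this illustration. */
#define DEMO_FLAG_LOW                 0x20

/* Hypothetical stand-in for a getter that returns the flags as 'int'. */
static int demo_get_flags(void)
{
    return DEMO_FLAG_INPUT_DENORMAL_USED | DEMO_FLAG_LOW;
}

/* The problematic pattern: the conversion to uint8_t silently drops
 * every flag bit above bit 7. */
static int demo_saved_in_uint8(void)
{
    uint8_t saved = demo_get_flags();
    return saved;
}

/* The fixed pattern: keep the full 'int' value. */
static int demo_saved_in_int(void)
{
    int saved = demo_get_flags();
    return saved;
}

static void demo_truncation_selfcheck(void)
{
    /* The high flag is lost in the narrow variable ... */
    assert((demo_saved_in_uint8() & DEMO_FLAG_INPUT_DENORMAL_USED) == 0);
    assert(demo_saved_in_uint8() == DEMO_FLAG_LOW);
    /* ... and survives in the 'int' one. */
    assert(demo_saved_in_int() ==
           (DEMO_FLAG_INPUT_DENORMAL_USED | DEMO_FLAG_LOW));
}
```

Calling demo_truncation_selfcheck() from a main() runs the comparison.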

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~


^ permalink raw reply	[flat|nested] 167+ messages in thread

* Re: [PATCH 75/76] target/i386: Wire up MXCSR.DE and FPUS.DE correctly
  2025-01-24 16:28 ` [PATCH 75/76] target/i386: Wire up MXCSR.DE and FPUS.DE correctly Peter Maydell
@ 2025-01-26 13:31   ` Richard Henderson
  0 siblings, 0 replies; 167+ messages in thread
From: Richard Henderson @ 2025-01-26 13:31 UTC (permalink / raw)
  To: qemu-devel

On 1/24/25 08:28, Peter Maydell wrote:
> The x86 DE bit in the FPU and MXCSR status is supposed to be set
> when an input denormal is consumed. We didn't previously report
> this from softfloat, so the x86 code either simply didn't set
> the DE bit or else incorrectly wired it up to denormal_flushed,
> depending on which register you looked at.
> 
> Now we have input_denormal_used we can wire up these DE bits
> with the semantics they are supposed to have.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>   target/i386/tcg/fpu_helper.c | 11 +++--------
>   1 file changed, 3 insertions(+), 8 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~


^ permalink raw reply	[flat|nested] 167+ messages in thread

* Re: [PATCH 76/76] tests/tcg/x86_64/fma: add test for exact-denormal output
  2025-01-24 16:28 ` [PATCH 76/76] tests/tcg/x86_64/fma: add test for exact-denormal output Peter Maydell
@ 2025-01-26 13:32   ` Richard Henderson
  0 siblings, 0 replies; 167+ messages in thread
From: Richard Henderson @ 2025-01-26 13:32 UTC (permalink / raw)
  To: qemu-devel

On 1/24/25 08:28, Peter Maydell wrote:
> Add some fma test cases that check for correct handling of FTZ and
> for the flag that indicates that the input denormal was consumed.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>   tests/tcg/x86_64/fma.c | 12 ++++++++++++
>   1 file changed, 12 insertions(+)
> 
> diff --git a/tests/tcg/x86_64/fma.c b/tests/tcg/x86_64/fma.c
> index 46f863005ed..34219614c0a 100644
> --- a/tests/tcg/x86_64/fma.c
> +++ b/tests/tcg/x86_64/fma.c
> @@ -82,6 +82,18 @@ static testdata tests[] = {
>        */
>       { 0x3fdfffffffffffff, 0x001fffffffffffff, 0x801fffffffffffff, true,
>         0x8010000000000000, 0x20 }, /* Enabling FTZ shouldn't change flags */
> +    /*
> +     * normal * 0 + a denormal. With FTZ disabled this gives an exact
> +     * result (equal to the input denormal) that has consumed the denormal.
> +     */
> +    { 0x3cc8000000000000, 0x0000000000000000, 0x8008000000000000, false,
> +      0x8008000000000000, 0x2 }, /* Denormal */
> +    /*
> +     * With FTZ enabled, this consumes the denormal, returns zero (because
> +     * flushed) and indicates also Underflow and Precision.
> +     */
> +    { 0x3cc8000000000000, 0x0000000000000000, 0x8008000000000000, true,
> +      0x8000000000000000, 0x32 }, /* Precision, Underflow, Denormal */
>   };
>   
>   int main(void)
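
The expected-flags column in the quoted test data (0x2, 0x20, 0x32) uses the MXCSR exception flag bits. A small decoder, sketched here with hypothetical demo_* names, makes the comments in the test table easy to cross-check.

```c
#include <assert.h>
#include <string.h>

/* MXCSR exception flag bits (bits 0 to 5 of the register). */
enum {
    DEMO_MXCSR_IE = 0x01, /* Invalid */
    DEMO_MXCSR_DE = 0x02, /* Denormal */
    DEMO_MXCSR_ZE = 0x04, /* Divide-by-zero */
    DEMO_MXCSR_OE = 0x08, /* Overflow */
    DEMO_MXCSR_UE = 0x10, /* Underflow */
    DEMO_MXCSR_PE = 0x20, /* Precision */
};

/* Write a space-separated list of the set flag names into buf.
 * Output is truncated if buf is too small for the full list. */
static void demo_decode_flags(unsigned flags, char *buf, size_t len)
{
    static const char *const names[] = {
        "Invalid", "Denormal", "DivideByZero",
        "Overflow", "Underflow", "Precision",
    };

    assert(len > 0);
    buf[0] = '\0';
    for (int bit = 0; bit < 6; bit++) {
        if (flags & (1u << bit)) {
            if (buf[0] != '\0') {
                strncat(buf, " ", len - strlen(buf) - 1);
            }
            strncat(buf, names[bit], len - strlen(buf) - 1);
        }
    }
}
```

For the FTZ-enabled case in the new test, demo_decode_flags(0x32, ...) yields "Denormal Underflow Precision", matching the comment in the table.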

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~


^ permalink raw reply	[flat|nested] 167+ messages in thread

* Re: [PATCH 06/76] target/arm: Define new fp_status_a32 and fp_status_a64
  2025-01-24 16:27 ` [PATCH 06/76] target/arm: Define new fp_status_a32 and fp_status_a64 Peter Maydell
  2025-01-25 15:12   ` Richard Henderson
@ 2025-01-27  4:59   ` Richard Henderson
  1 sibling, 0 replies; 167+ messages in thread
From: Richard Henderson @ 2025-01-27  4:59 UTC (permalink / raw)
  To: qemu-devel

On 1/24/25 08:27, Peter Maydell wrote:
> diff --git a/target/arm/tcg/translate.h b/target/arm/tcg/translate.h
> index 2d37d7c9f21..a7509b314b0 100644
> --- a/target/arm/tcg/translate.h
> +++ b/target/arm/tcg/translate.h
> @@ -671,6 +671,8 @@ static inline CPUARMTBFlags arm_tbflags_from_tb(const TranslationBlock *tb)
>    */
>   typedef enum ARMFPStatusFlavour {
>       FPST_FPCR,
> +    FPST_FPCR_A32,
> +    FPST_FPCR_A64,

May I suggest calling these FPST_A32 and FPST_A64?
Just a bit less typing in the common case, and it's
not so different from FPST_STD and FPST_AH.


r~


^ permalink raw reply	[flat|nested] 167+ messages in thread

* Re: [PATCH 13/76] target/arm: Define new fp_status_f16_a32 and fp_status_f16_a64
  2025-01-24 16:27 ` [PATCH 13/76] target/arm: Define new fp_status_f16_a32 and fp_status_f16_a64 Peter Maydell
  2025-01-25 15:21   ` Richard Henderson
@ 2025-01-27  5:00   ` Richard Henderson
  1 sibling, 0 replies; 167+ messages in thread
From: Richard Henderson @ 2025-01-27  5:00 UTC (permalink / raw)
  To: qemu-devel

On 1/24/25 08:27, Peter Maydell wrote:
> --- a/target/arm/tcg/translate.h
> +++ b/target/arm/tcg/translate.h
> @@ -673,6 +673,8 @@ typedef enum ARMFPStatusFlavour {
>       FPST_FPCR_A32,
>       FPST_FPCR_A64,
>       FPST_FPCR_F16,
> +    FPST_FPCR_F16_A32,
> +    FPST_FPCR_F16_A64,
>       FPST_STD,
>       FPST_STD_F16,
>   } ARMFPStatusFlavour;

May I suggest calling these FPST_A32_F16 and FPST_A64_F16.
This matches FPST_STD_F16 and FPST_AH_F16.


r~


^ permalink raw reply	[flat|nested] 167+ messages in thread

* Re: [PATCH 02/76] tests/tcg/x86_64/fma: Test some x86 fused-multiply-add cases
  2025-01-24 17:15   ` Alex Bennée
@ 2025-01-27  9:54     ` Peter Maydell
  0 siblings, 0 replies; 167+ messages in thread
From: Peter Maydell @ 2025-01-27  9:54 UTC (permalink / raw)
  To: Alex Bennée; +Cc: qemu-arm, qemu-devel

On Fri, 24 Jan 2025 at 17:15, Alex Bennée <alex.bennee@linaro.org> wrote:
>
> Peter Maydell <peter.maydell@linaro.org> writes:
>
> > Add a test case which tests some corner case behaviour of
> > fused-multiply-add on x86:
> >  * 0 * Inf + SNaN should raise Invalid
> >  * 0 * Inf + QNaN should not raise Invalid
> >  * tininess should be detected after rounding

> > +static testdata tests[] = {
> > +    { 0, 0x7ff0000000000000, 0x7ff000000000aaaa, false, /* 0 * Inf + SNaN */
> > +      0x7ff800000000aaaa, 1 }, /* Should be QNaN and does raise Invalid */
> > +    { 0, 0x7ff0000000000000, 0x7ff800000000aaaa, false, /* 0 * Inf + QNaN */
> > +      0x7ff800000000aaaa, 0 }, /* Should be QNaN and does *not* raise Invalid */
> > +    /*
> > +     * These inputs give a result which is tiny before rounding but which
> > +     * becomes non-tiny after rounding. x86 is a "detect tininess after
> > +     * rounding" architecture, so it should give a non-denormal result and
> > +     * not set the Underflow flag (only the Precision flag for an inexact
> > +     * result).
> > +     */
> > +    { 0x3fdfffffffffffff, 0x001fffffffffffff, 0x801fffffffffffff, false,
> > +      0x8010000000000000, 0x20 },
> > +    /*
> > +     * Flushing of denormal outputs to zero should also happen after
> > +     * rounding, so setting FTZ should not affect the result or the flags.
> > +     * QEMU currently does not emulate this correctly because we do the
> > +     * flush-to-zero check before rounding, so we incorrectly produce a
> > +     * zero result and set Underflow as well as Precision.
> > +     */
> > +#ifdef ENABLE_FAILING_TESTS
> > +    { 0x3fdfffffffffffff, 0x001fffffffffffff, 0x801fffffffffffff, true,
> > +      0x8010000000000000, 0x20 }, /* Enabling FTZ shouldn't change flags */
> > +#endif
>
> We could extend the multiarch/float_madds test to handle doubles as well
> (or create a new multiarch test).

This test case is specifically testing a corner case of x86
semantics -- on Arm, for instance, you would not get the same
result/flags, because Arm does tininess and flushing-of-denormal
before rounding, and Arm does raise Invalid for 0 * Inf + QNaN.
So I'm not sure that a multiarch test would be possible.
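
For reference, the two NaN addends in the quoted test data differ only in the quiet bit (bit 51 of a binary64 value), which is what makes one signalling and the other quiet on both x86 and Arm. A minimal sketch of that classification, with hypothetical demo_* names:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_F64_QUIET_BIT (1ULL << 51)

/* True if the binary64 bit pattern encodes any NaN
 * (exponent all ones, fraction non-zero). */
static bool demo_f64_is_nan(uint64_t bits)
{
    return (bits & 0x7fffffffffffffffULL) > 0x7ff0000000000000ULL;
}

/* A signalling NaN is a NaN with the quiet bit clear. */
static bool demo_f64_is_snan(uint64_t bits)
{
    return demo_f64_is_nan(bits) && !(bits & DEMO_F64_QUIET_BIT);
}

/* Quieten a NaN by setting the quiet bit, keeping the payload. */
static uint64_t demo_f64_quieten(uint64_t bits)
{
    assert(demo_f64_is_nan(bits));
    return bits | DEMO_F64_QUIET_BIT;
}
```

Quietening the SNaN input 0x7ff000000000aaaa gives 0x7ff800000000aaaa, the result the first test row expects.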

-- PMM


^ permalink raw reply	[flat|nested] 167+ messages in thread

* Re: [PATCH 24/76] fpu: allow flushing of output denormals to be after rounding
  2025-01-25 16:41   ` Richard Henderson
@ 2025-01-27 10:01     ` Peter Maydell
  2025-01-27 16:09       ` Richard Henderson
  2025-01-29 13:04     ` Peter Maydell
  1 sibling, 1 reply; 167+ messages in thread
From: Peter Maydell @ 2025-01-27 10:01 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel

On Sat, 25 Jan 2025 at 16:42, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> On 1/24/25 08:27, Peter Maydell wrote:
> > Currently we handle flushing of output denormals in uncanon_normal
> > always before we deal with rounding.  This works for architectures
> > that detect tininess before rounding, but is usually not the right
> > place when the architecture detects tininess after rounding.  For
> > example, for x86 the SDM states that the MXCSR FTZ control bit causes
> > outputs to be flushed to zero "when it detects a floating-point
> > underflow condition".  This means that we mustn't flush to zero if
> > the input is such that after rounding it is no longer tiny.
> >
> > At least one of our guest architectures does underflow detection
> > after rounding but flushing of denormals before rounding (MIPS MSA);
>
> Whacky, but yes, I see that in the msa docs.
>
> > Add an ftz_detection flag.  For consistency with
> > tininess_before_rounding, we make it default to "detect ftz after
> > rounding"; this means that we need to explicitly set the flag to
> > "detect ftz before rounding" on every existing architecture that sets
> > flush_to_zero, so that this commit has no behaviour change.
> > (This means more code change here but for the long term a less
> > confusing API.)
>
> Do we really want flush_to_zero to be separate from ftz_detection?
>
> E.g.
>
> enum {
>    float_ftz_disabled,
>    float_ftz_after_rounding,
>    float_ftz_before_rounding,
> }

I did consider that, but on almost all targets the "before
or after rounding" setting is constant for the life of the
emulation, whereas turning ftz on and off via a status register
bit is common. I preferred to leave it so that you could continue
to write:
 set_flush_to_zero((vscr >> VSCR_NJ) & 1, &env->vec_status)
or whatever, rather than having to switch to
  (vscr >> VSCR_NJ) ? float_ftz_before_rounding : float_ftz_disabled.
which in addition to being more longwinded also means that the
"is this architecture ftz before or after rounding" setting is
scattered in multiple places, wherever it turns FTZ on or off.
And for Arm it gets more awkward, because the FZ bit is
"turn FTZ on or off, whatever its current semantics are", so
you end up needing "FZ ? AH ? after_rounding : before_rounding : disabled".

Keeping the on/off and the "what semantics is your architecture"
separate questions I think is simpler.
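
A rough sketch of the two API shapes being compared, using the Arm FZ/AH case as the example. All names here are hypothetical and are not the softfloat API; the point is only that the split form lets each control bit update its own field, while the combined form needs both bits at every write.

```c
#include <assert.h>
#include <stdbool.h>

/* Split form: an on/off switch plus a separate "when to detect" setting. */
typedef enum {
    DEMO_DETECT_BEFORE_ROUNDING,
    DEMO_DETECT_AFTER_ROUNDING,
} demo_ftz_detection;

typedef struct {
    bool flush_to_zero;            /* toggled by a guest control bit */
    demo_ftz_detection detection;  /* usually fixed per architecture */
} demo_status_split;

/* Combined form, following the enum proposed in the review. */
typedef enum {
    DEMO_FTZ_DISABLED,
    DEMO_FTZ_AFTER_ROUNDING,
    DEMO_FTZ_BEFORE_ROUNDING,
} demo_ftz_mode;

/* Split form: each register bit touches only its own field. */
static void demo_split_set_fz(demo_status_split *s, bool fz)
{
    s->flush_to_zero = fz;
}

static void demo_split_set_ah(demo_status_split *s, bool ah)
{
    s->detection = ah ? DEMO_DETECT_AFTER_ROUNDING
                      : DEMO_DETECT_BEFORE_ROUNDING;
}

/* Combined form: any write to FZ or AH must recompute from both bits. */
static demo_ftz_mode demo_combined_mode(bool fz, bool ah)
{
    if (!fz) {
        return DEMO_FTZ_DISABLED;
    }
    return ah ? DEMO_FTZ_AFTER_ROUNDING : DEMO_FTZ_BEFORE_ROUNDING;
}

/* Both forms carry the same information about the effective behaviour. */
static demo_ftz_mode demo_split_to_mode(const demo_status_split *s)
{
    if (!s->flush_to_zero) {
        return DEMO_FTZ_DISABLED;
    }
    return s->detection == DEMO_DETECT_AFTER_ROUNDING
        ? DEMO_FTZ_AFTER_ROUNDING : DEMO_FTZ_BEFORE_ROUNDING;
}

static void demo_ftz_selfcheck(void)
{
    for (int fz = 0; fz < 2; fz++) {
        for (int ah = 0; ah < 2; ah++) {
            demo_status_split s = { false, DEMO_DETECT_BEFORE_ROUNDING };

            demo_split_set_fz(&s, fz != 0);
            demo_split_set_ah(&s, ah != 0);
            assert(demo_split_to_mode(&s) ==
                   demo_combined_mode(fz != 0, ah != 0));
        }
    }
}
```

The two forms are equivalent in what they express; the difference is where the "before or after rounding" knowledge has to live in the target code.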

> BTW, I'm not keen on your "detect_*" names, without "float_" prefix like (almost?)
> everything else.

Yes, I'm not super enthused about them either. Happy to switch
to something else. We're not very consistent about 'float',
though: eg set_flush_to_zero, set_snan_bit_is_one,
set_flush_inputs_to_zero.

-- PMM

^ permalink raw reply	[flat|nested] 167+ messages in thread

* Re: [PATCH 24/76] fpu: allow flushing of output denormals to be after rounding
  2025-01-27 10:01     ` Peter Maydell
@ 2025-01-27 16:09       ` Richard Henderson
  0 siblings, 0 replies; 167+ messages in thread
From: Richard Henderson @ 2025-01-27 16:09 UTC (permalink / raw)
  To: Peter Maydell; +Cc: qemu-devel

On 1/27/25 02:01, Peter Maydell wrote:
> On Sat, 25 Jan 2025 at 16:42, Richard Henderson
> <richard.henderson@linaro.org> wrote:
>>
>> On 1/24/25 08:27, Peter Maydell wrote:
>>> Currently we handle flushing of output denormals in uncanon_normal
>>> always before we deal with rounding.  This works for architectures
>>> that detect tininess before rounding, but is usually not the right
>>> place when the architecture detects tininess after rounding.  For
>>> example, for x86 the SDM states that the MXCSR FTZ control bit causes
>>> outputs to be flushed to zero "when it detects a floating-point
>>> underflow condition".  This means that we mustn't flush to zero if
>>> the input is such that after rounding it is no longer tiny.
>>>
>>> At least one of our guest architectures does underflow detection
>>> after rounding but flushing of denormals before rounding (MIPS MSA);
>>
>> Whacky, but yes, I see that in the msa docs.
>>
>>> Add an ftz_detection flag.  For consistency with
>>> tininess_before_rounding, we make it default to "detect ftz after
>>> rounding"; this means that we need to explicitly set the flag to
>>> "detect ftz before rounding" on every existing architecture that sets
>>> flush_to_zero, so that this commit has no behaviour change.
>>> (This means more code change here but for the long term a less
>>> confusing API.)
>>
>> Do we really want flush_to_zero to be separate from ftz_detection?
>>
>> E.g.
>>
>> enum {
>>     float_ftz_disabled,
>>     float_ftz_after_rounding,
>>     float_ftz_before_rounding,
>> }
> 
> I did consider that, but on almost all targets the "before
> or after rounding" setting is constant for the life of the
> emulation, whereas turning ftz on and off via a status register
> bit is common. I preferred to leave it so that you could continue
> to write:
>   set_flush_to_zero((vscr >> VSCR_NJ) & 1, &env->vec_status)
> or whatever, rather than having to switch to
>    (vscr >> VSCR_NJ) ? float_ftz_before_rounding : float_ftz_disabled.
> which in addition to being more longwinded also means that the
> "is this architecture ftz before or after rounding" setting is
> scattered in multiple places, wherever it turns FTZ on or off.

Fair.

> And for Arm it gets more awkward, because the FZ bit is
> "turn FTZ on or off, whatever its current semantics are", so
> you end up needing "FZ ? AH ? after_rounding : before_rounding : disabled".

Ah, yes.


r~


* Re: [PATCH 65/76] target/arm: Handle FPCR.AH in negation step in SVE FMLS (vector)
  2025-01-24 16:28 ` [PATCH 65/76] target/arm: Handle FPCR.AH in negation step in SVE " Peter Maydell
  2025-01-26 13:19   ` Richard Henderson
@ 2025-01-27 20:41   ` Richard Henderson
  1 sibling, 0 replies; 167+ messages in thread
From: Richard Henderson @ 2025-01-27 20:41 UTC (permalink / raw)
  To: qemu-devel

On 1/24/25 08:28, Peter Maydell wrote:
> @@ -4910,9 +4958,15 @@ static void do_fmla_zpzzz_d(void *vd, void *vn, void *vm, void *va, void *vg,
>               if (likely((pg >> (i & 63)) & 1)) {
>                   float64 e1, e2, e3, r;
>   
> -                e1 = *(uint64_t *)(vn + i) ^ neg1;
> +                e1 = *(uint64_t *)(vn + i);
>                   e2 = *(uint64_t *)(vm + i);
> -                e3 = *(uint64_t *)(va + i) ^ neg3;
> +                e3 = *(uint64_t *)(va + i);
> +                if (neg1 && !(fpcr_ah && float64_is_any_nan(e1))) {
> +                    e1 ^= neg1;
> +                }
> +                if (neg3 && !(fpcr_ah && float64_is_any_nan(e3))) {
> +                    e3 ^= neg3;
> +                }
>                   r = float64_muladd(e1, e2, e3, 0, status);

It occurs to me that with AH=1, we can use the float_muladd_* flags.
We couldn't use those for AH=0, because there we *need* to negate NaNs.


r~


* Re: [PATCH 07/76] target/arm: Use vfp.fp_status_a64 in A64-only helper functions
  2025-01-25 15:15   ` Richard Henderson
@ 2025-01-28 12:35     ` Peter Maydell
  2025-01-28 15:54       ` Richard Henderson
  0 siblings, 1 reply; 167+ messages in thread
From: Peter Maydell @ 2025-01-28 12:35 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel

On Sat, 25 Jan 2025 at 15:16, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> On 1/24/25 08:27, Peter Maydell wrote:
> > @@ -2808,7 +2808,7 @@ bool is_ebf(CPUARMState *env, float_status *statusp, float_status *oddstatusp)
> >        */
> >       bool ebf = is_a64(env) && env->vfp.fpcr & FPCR_EBF;
> >
> > -    *statusp = env->vfp.fp_status;
> > +    *statusp = env->vfp.fp_status_a64;
> >       set_default_nan_mode(true, statusp);
> >
> >       if (ebf) {
>
> Is this really correct?  !ebf includes aa32.

Whoops, yes. I'll drop this hunk of the patch and put in this
patch afterwards:

Author: Peter Maydell <peter.maydell@linaro.org>
Date:   Tue Jan 28 11:40:13 2025 +0000

    target/arm: Use fp_status_a64 or fp_status_a32 in is_ebf()

    In is_ebf(), we might be called for A64 or A32, but we have
    the CPUARMState* so we can select fp_status_a64 or
    fp_status_a32 accordingly.

    Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
index 011726a72d4..2ba1f7cb32e 100644
--- a/target/arm/tcg/vec_helper.c
+++ b/target/arm/tcg/vec_helper.c
@@ -2808,7 +2808,7 @@ bool is_ebf(CPUARMState *env, float_status *statusp, float_status *oddstatusp)
      */
     bool ebf = is_a64(env) && env->vfp.fpcr & FPCR_EBF;

-    *statusp = env->vfp.fp_status;
+    *statusp = is_a64(env) ? env->vfp.fp_status_a64 : env->vfp.fp_status_a32;
     set_default_nan_mode(true, statusp);

     if (ebf) {
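
A minimal sketch of the pattern in the corrected hunk: pick the
status block for the current execution state, copy it by value, and
adjust only the copy. The struct layouts and names here are
hypothetical stand-ins, not the real CPUARMState or float_status.

```c
#include <stdbool.h>

/* Hypothetical, much-reduced stand-in for float_status. */
typedef struct {
    bool default_nan_mode;
    int rounding_mode;
} SketchStatus;

/* Hypothetical, much-reduced stand-in for CPUARMState. */
typedef struct {
    bool aarch64;
    SketchStatus fp_status_a32;
    SketchStatus fp_status_a64;
} SketchEnv;

/* Return a private copy of the right status with default-NaN forced on. */
static SketchStatus sketch_pick_status(const SketchEnv *env)
{
    SketchStatus s = env->aarch64 ? env->fp_status_a64
                                  : env->fp_status_a32;
    s.default_nan_mode = true;   /* changes the copy only */
    return s;
}
```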

thanks
-- PMM


* Re: [PATCH 00/76] target/arm: Implement FEAT_AFP and FEAT_RPRES
  2025-01-24 16:27 [PATCH 00/76] target/arm: Implement FEAT_AFP and FEAT_RPRES Peter Maydell
                   ` (76 preceding siblings ...)
  2025-01-24 16:35 ` [PATCH 00/76] target/arm: Implement FEAT_AFP and FEAT_RPRES Peter Maydell
@ 2025-01-28 13:23 ` Peter Maydell
  77 siblings, 0 replies; 167+ messages in thread
From: Peter Maydell @ 2025-01-28 13:23 UTC (permalink / raw)
  To: qemu-arm, qemu-devel

On Fri, 24 Jan 2025 at 16:28, Peter Maydell <peter.maydell@linaro.org> wrote:
>
> This patchset implements emulation of the Arm FEAT_AFP and FEAT_RPRES
> extensions, which are floating-point related.

I plan to take into target-arm.next the following patches
from the start of this series: 3-21, 25, 26. (Paolo has
already picked up 1 and 2.)

I'm making the renaming of the FPCR_* constants RTH suggested
as I do that (i.e. not reposting the series just for that).
I'm also making the correction to the is_ebf() function
discussed in the review subthread on patch 7.

Handling the review comments on the rest of the series
will take a bit more time/work, but at least this way
v2 will be a bit smaller...

thanks
-- PMM

* Re: [PATCH 07/76] target/arm: Use vfp.fp_status_a64 in A64-only helper functions
  2025-01-28 12:35     ` Peter Maydell
@ 2025-01-28 15:54       ` Richard Henderson
  0 siblings, 0 replies; 167+ messages in thread
From: Richard Henderson @ 2025-01-28 15:54 UTC (permalink / raw)
  To: Peter Maydell; +Cc: qemu-devel

On 1/28/25 04:35, Peter Maydell wrote:
> On Sat, 25 Jan 2025 at 15:16, Richard Henderson
> <richard.henderson@linaro.org> wrote:
>>
>> On 1/24/25 08:27, Peter Maydell wrote:
>>> @@ -2808,7 +2808,7 @@ bool is_ebf(CPUARMState *env, float_status *statusp, float_status *oddstatusp)
>>>         */
>>>        bool ebf = is_a64(env) && env->vfp.fpcr & FPCR_EBF;
>>>
>>> -    *statusp = env->vfp.fp_status;
>>> +    *statusp = env->vfp.fp_status_a64;
>>>        set_default_nan_mode(true, statusp);
>>>
>>>        if (ebf) {
>>
>> Is this really correct?  !ebf includes aa32.
> 
> Whoops, yes. I'll drop this hunk of the patch and put in this
> patch afterwards:
> 
> Author: Peter Maydell <peter.maydell@linaro.org>
> Date:   Tue Jan 28 11:40:13 2025 +0000
> 
>      target/arm: Use fp_status_a64 or fp_status_a32 in is_ebf()
> 
>      In is_ebf(), we might be called for A64 or A32, but we have
>      the CPUARMState* so we can select fp_status_a64 or
>      fp_status_a32 accordingly.
> 
>      Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> 
> diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
> index 011726a72d4..2ba1f7cb32e 100644
> --- a/target/arm/tcg/vec_helper.c
> +++ b/target/arm/tcg/vec_helper.c
> @@ -2808,7 +2808,7 @@ bool is_ebf(CPUARMState *env, float_status *statusp, float_status *oddstatusp)
>        */
>       bool ebf = is_a64(env) && env->vfp.fpcr & FPCR_EBF;
> 
> -    *statusp = env->vfp.fp_status;
> +    *statusp = is_a64(env) ? env->vfp.fp_status_a64 : env->vfp.fp_status_a32;
>       set_default_nan_mode(true, statusp);

That'll do.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~


* Re: [PATCH 24/76] fpu: allow flushing of output denormals to be after rounding
  2025-01-25 16:41   ` Richard Henderson
  2025-01-27 10:01     ` Peter Maydell
@ 2025-01-29 13:04     ` Peter Maydell
  2025-01-31 13:36       ` Richard Henderson
  1 sibling, 1 reply; 167+ messages in thread
From: Peter Maydell @ 2025-01-29 13:04 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel

On Sat, 25 Jan 2025 at 16:42, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> On 1/24/25 08:27, Peter Maydell wrote:
> > Currently we handle flushing of output denormals in uncanon_normal
> > always before we deal with rounding.  This works for architectures
> > that detect tininess before rounding, but is usually not the right
> > place when the architecture detects tininess after rounding.  For
> > example, for x86 the SDM states that the MXCSR FTZ control bit causes
> > outputs to be flushed to zero "when it detects a floating-point
> > underflow condition".  This means that we mustn't flush to zero if
> > the input is such that after rounding it is no longer tiny.
> >
> > At least one of our guest architectures does underflow detection
> > after rounding but flushing of denormals before rounding (MIPS MSA);
>
> Whacky, but yes, I see that in the msa docs.

> BTW, I'm not keen on your "detect_*" names, without "float_" prefix like (almost?)
> everything else.

Do you have a suggestion for better naming? Maybe
 set_float_detect_ftz()
 get_float_detect_ftz()
to match set/get_float_detect_tininess()? Though "detect"
isn't quite the right verb, I feel...

And for the enum

typedef enum __attribute__((__packed__)) {
    float_ftz_after_rounding = 0,
    float_ftz_before_rounding = 1,
} FloatFTZDetection;

?

(the detect_tininess functions work on a 'bool tininess_before_rounding'
field in float_status, but I think I prefer the enum here, since
what we're setting doesn't have an obvious "on/off" that a bool
would be the natural representation for, unlike e.g. flush_to_zero.)
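
To illustrate what the two enum values would select between, here is
a rough sketch that rounds an exact double result to float32 under
each setting. The enum is the one proposed above; round_with_ftz()
is a hypothetical helper, not softfloat code. It assumes
round-to-nearest-even and inputs small enough that scaling by 2^64
cannot overflow.

```c
#include <float.h>
#include <stdbool.h>

typedef enum {
    float_ftz_after_rounding = 0,
    float_ftz_before_rounding = 1,
} FloatFTZDetection;

/* Round 'exact' to float32, flushing tiny results to zero if enabled. */
static float round_with_ftz(double exact, bool flush_to_zero,
                            FloatFTZDetection mode)
{
    double mag = exact < 0 ? -exact : exact;
    bool tiny;

    if (mode == float_ftz_before_rounding) {
        /* Tiny if the unrounded value is below the smallest normal. */
        tiny = mag != 0 && mag < FLT_MIN;
    } else {
        /*
         * Scaling by 2^64 is exact here, so the cast rounds to 24 bits
         * as if the exponent range were unbounded. Tiny only if the
         * rounded value is still below the (scaled) smallest normal.
         */
        float scaled = (float)(mag * 0x1p64);
        tiny = mag != 0 && scaled < FLT_MIN * 0x1p64f;
    }
    if (flush_to_zero && tiny) {
        return exact < 0 ? -0.0f : 0.0f;
    }
    return (float)exact;
}
```

For an input just below FLT_MIN that rounds up to FLT_MIN, such as
0x1p-126 * (1.0 - 0x1p-26), the before-rounding setting flushes to
zero while the after-rounding setting returns FLT_MIN.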

thanks
-- PMM


* Re: [PATCH 45/76] target/arm: Implement FPCR.AH semantics for scalar FMIN/FMAX
  2025-01-26 12:43   ` Richard Henderson
@ 2025-01-31 13:09     ` Peter Maydell
  2025-01-31 13:37       ` Richard Henderson
  0 siblings, 1 reply; 167+ messages in thread
From: Peter Maydell @ 2025-01-31 13:09 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel

On Sun, 26 Jan 2025 at 12:44, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> On 1/24/25 08:28, Peter Maydell wrote:
> > diff --git a/target/arm/tcg/helper-a64.c b/target/arm/tcg/helper-a64.c
> > index 05036089dd7..406d76e1129 100644
> > --- a/target/arm/tcg/helper-a64.c
> > +++ b/target/arm/tcg/helper-a64.c
> > @@ -399,6 +399,42 @@ float32 HELPER(fcvtx_f64_to_f32)(float64 a, float_status *fpst)
> >       return r;
> >   }
> >
> > +/*
> > + * AH=1 min/max have some odd special cases:
> > + * comparing two zeroes (even of different sign), (NaN, anything),
> > + * or (anything, NaN) should return the second argument (possibly
> > + * squashed to zero).
> > + * Also, denormal outputs are not squashed to zero regardless of FZ or FZ16.
> > + */
> > +#define AH_MINMAX_HELPER(NAME, CTYPE, FLOATTYPE, MINMAX)                \
> > +    CTYPE HELPER(NAME)(CTYPE a, CTYPE b, float_status *fpst)            \
> > +    {                                                                   \
> > +        bool save;                                                      \
> > +        CTYPE r;                                                        \
> > +        a = FLOATTYPE ## _squash_input_denormal(a, fpst);               \
> > +        b = FLOATTYPE ## _squash_input_denormal(b, fpst);               \
> > +        if (FLOATTYPE ## _is_zero(a) && FLOATTYPE ## _is_zero(b)) {     \
>
> The comment says "even of different sign", the pseudocode explicitly checks different
> sign.  But of course if they're the same sign a and b are indistinguishable.  Perhaps
> slightly different wording?

Sure. I changed from "(even of different sign)" to
"(regardless of sign)". Let me know if you have a
more specific tweak you'd like.
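
A minimal sketch of the special cases described in that comment,
using host floats. The function names are hypothetical, and the
input-denormal squashing and exception-flag handling of the real
helper are deliberately left out.

```c
#include <stdbool.h>

/*
 * AH=1 style minimum: two zeroes (regardless of sign) or any NaN
 * operand return the second argument; otherwise the usual minimum.
 */
static float ah_min_sketch(float a, float b)
{
    bool a_nan = a != a;
    bool b_nan = b != b;

    if (a_nan || b_nan || (a == 0.0f && b == 0.0f)) {
        return b;
    }
    return a < b ? a : b;
}

/* Same special cases, maximum otherwise. */
static float ah_max_sketch(float a, float b)
{
    bool a_nan = a != a;
    bool b_nan = b != b;

    if (a_nan || b_nan || (a == 0.0f && b == 0.0f)) {
        return b;
    }
    return a > b ? a : b;
}
```

If both zeroes have the same sign, returning the second argument is
indistinguishable from returning the first, which is the point made
in the review comment.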

thanks
-- PMM


* Re: [PATCH 24/76] fpu: allow flushing of output denormals to be after rounding
  2025-01-29 13:04     ` Peter Maydell
@ 2025-01-31 13:36       ` Richard Henderson
  0 siblings, 0 replies; 167+ messages in thread
From: Richard Henderson @ 2025-01-31 13:36 UTC (permalink / raw)
  To: Peter Maydell; +Cc: qemu-devel

On 1/29/25 05:04, Peter Maydell wrote:
> On Sat, 25 Jan 2025 at 16:42, Richard Henderson
> <richard.henderson@linaro.org> wrote:
>>
>> On 1/24/25 08:27, Peter Maydell wrote:
>>> Currently we handle flushing of output denormals in uncanon_normal
>>> always before we deal with rounding.  This works for architectures
>>> that detect tininess before rounding, but is usually not the right
>>> place when the architecture detects tininess after rounding.  For
>>> example, for x86 the SDM states that the MXCSR FTZ control bit causes
>>> outputs to be flushed to zero "when it detects a floating-point
>>> underflow condition".  This means that we mustn't flush to zero if
>>> the input is such that after rounding it is no longer tiny.
>>>
>>> At least one of our guest architectures does underflow detection
>>> after rounding but flushing of denormals before rounding (MIPS MSA);
>>
>> Whacky, but yes, I see that in the msa docs.
> 
>> BTW, I'm not keen on your "detect_*" names, without "float_" prefix like (almost?)
>> everything else.
> 
> Do you have a suggestion for better naming? Maybe
>   set_float_detect_ftz()
>   get_float_detect_ftz()
> to match set/get_float_detect_tininess()? Though "detect"
> isn't quite the right verb, I feel...
> 
> And for the enum
> 
> typedef enum __attribute__((__packed__)) {
>      float_ftz_after_rounding = 0,
>      float_ftz_before_rounding = 1,
> } FloatFTZDetection;

The enum looks good.  The accessors are harder.  Maybe set_ftz_detection, matching the enum?


r~


* Re: [PATCH 45/76] target/arm: Implement FPCR.AH semantics for scalar FMIN/FMAX
  2025-01-31 13:09     ` Peter Maydell
@ 2025-01-31 13:37       ` Richard Henderson
  0 siblings, 0 replies; 167+ messages in thread
From: Richard Henderson @ 2025-01-31 13:37 UTC (permalink / raw)
  To: Peter Maydell; +Cc: qemu-devel

On 1/31/25 05:09, Peter Maydell wrote:
>>> +/*
>>> + * AH=1 min/max have some odd special cases:
>>> + * comparing two zeroes (even of different sign), (NaN, anything),
>>> + * or (anything, NaN) should return the second argument (possibly
>>> + * squashed to zero).
>>> + * Also, denormal outputs are not squashed to zero regardless of FZ or FZ16.
>>> + */
>>> +#define AH_MINMAX_HELPER(NAME, CTYPE, FLOATTYPE, MINMAX)                \
>>> +    CTYPE HELPER(NAME)(CTYPE a, CTYPE b, float_status *fpst)            \
>>> +    {                                                                   \
>>> +        bool save;                                                      \
>>> +        CTYPE r;                                                        \
>>> +        a = FLOATTYPE ## _squash_input_denormal(a, fpst);               \
>>> +        b = FLOATTYPE ## _squash_input_denormal(b, fpst);               \
>>> +        if (FLOATTYPE ## _is_zero(a) && FLOATTYPE ## _is_zero(b)) {     \
>>
>> The comment says "even of different sign", the pseudocode explicitly checks different
>> sign.  But of course if they're the same sign a and b are indistinguishable.  Perhaps
>> slightly different wording?
> 
> Sure. I changed from "(even of different sign)" to
> "(regardless of sign)". Let me know if you have a
> more specific tweak you'd like.

Sounds good.


r~


* Re: [PATCH 27/76] target/arm: Define FPCR AH, FIZ, NEP bits
  2025-01-25 17:08   ` Richard Henderson
@ 2025-01-31 17:05     ` Peter Maydell
  0 siblings, 0 replies; 167+ messages in thread
From: Peter Maydell @ 2025-01-31 17:05 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel

On Sat, 25 Jan 2025 at 17:08, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> On 1/24/25 08:27, Peter Maydell wrote:
> > diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
> > index 3c8f3e65887..8c79ab4fc8a 100644
> > --- a/target/arm/vfp_helper.c
> > +++ b/target/arm/vfp_helper.c
> > @@ -242,6 +242,9 @@ static void vfp_set_fpcr_masked(CPUARMState *env, uint32_t val, uint32_t mask)
> >       if (!cpu_isar_feature(any_fp16, cpu)) {
> >           val &= ~FPCR_FZ16;
> >       }
> > +    if (!cpu_isar_feature(aa64_afp, cpu)) {
> > +        val &= ~(FPCR_FIZ | FPCR_AH | FPCR_NEP);
> > +    }
>
> I suppose this aa64 check, without is_a64(), is ok because the a32 caller has already
> applied FPSCR_FPCR_MASK.  And similarly for the ebf16 check below.
>
> >
> >       if (!cpu_isar_feature(aa64_ebf16, cpu)) {
> >           val &= ~FPCR_EBF;
>
> But it does feel like we could usefully move these to vfp_set_fpcr, or such?

I dunno, having all the feature tests in one place makes
sense to me. Since we're already doing it here for aa64_ebf16,
I think I prefer to keep the aa64_afp check the same way.
This series is big enough as it is without adding another
cleanup...
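
A minimal sketch of the masking pattern under discussion: bits that
belong to an unimplemented feature are cleared before the value is
stored. The SK_ names and the have_afp parameter are hypothetical,
and the bit positions are assumptions made for this sketch rather
than values taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions, for illustration only. */
#define SK_FPCR_FIZ (1u << 0)
#define SK_FPCR_AH  (1u << 1)
#define SK_FPCR_NEP (1u << 2)

/* Clear the FEAT_AFP control bits when the feature is absent. */
static uint32_t sk_mask_fpcr(uint32_t val, bool have_afp)
{
    if (!have_afp) {
        val &= ~(SK_FPCR_FIZ | SK_FPCR_AH | SK_FPCR_NEP);
    }
    return val;
}
```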

thanks
-- PMM


Thread overview: 167+ messages
2025-01-24 16:27 [PATCH 00/76] target/arm: Implement FEAT_AFP and FEAT_RPRES Peter Maydell
2025-01-24 16:27 ` [PATCH 01/76] target/i386: Do not raise Invalid for 0 * Inf + QNaN Peter Maydell
2025-01-24 16:27 ` [PATCH 02/76] tests/tcg/x86_64/fma: Test some x86 fused-multiply-add cases Peter Maydell
2025-01-24 17:15   ` Alex Bennée
2025-01-27  9:54     ` Peter Maydell
2025-01-24 16:27 ` [PATCH 03/76] target/arm: arm_reset_sve_state() should set FPSR, not FPCR Peter Maydell
2025-01-25 15:07   ` Richard Henderson
2025-01-24 16:27 ` [PATCH 04/76] target/arm: Use FPSR_ constants in vfp_exceptbits_from_host() Peter Maydell
2025-01-25 15:07   ` Richard Henderson
2025-01-24 16:27 ` [PATCH 05/76] target/arm: Use uint32_t " Peter Maydell
2025-01-25 15:08   ` Richard Henderson
2025-01-24 16:27 ` [PATCH 06/76] target/arm: Define new fp_status_a32 and fp_status_a64 Peter Maydell
2025-01-25 15:12   ` Richard Henderson
2025-01-27  4:59   ` Richard Henderson
2025-01-24 16:27 ` [PATCH 07/76] target/arm: Use vfp.fp_status_a64 in A64-only helper functions Peter Maydell
2025-01-25 15:15   ` Richard Henderson
2025-01-28 12:35     ` Peter Maydell
2025-01-28 15:54       ` Richard Henderson
2025-01-24 16:27 ` [PATCH 08/76] target/arm: Use fp_status_a32 in vjvct helper Peter Maydell
2025-01-25 15:16   ` Richard Henderson
2025-01-24 16:27 ` [PATCH 09/76] target/arm: Use fp_status_a32 in vfp_cmp helpers Peter Maydell
2025-01-25 15:18   ` Richard Henderson
2025-01-24 16:27 ` [PATCH 10/76] target/arm: Use FPST_FPCR_A32 in A32 decoder Peter Maydell
2025-01-25 15:18   ` Richard Henderson
2025-01-24 16:27 ` [PATCH 11/76] target/arm: Use FPST_FPCR_A64 in A64 decoder Peter Maydell
2025-01-25 15:19   ` Richard Henderson
2025-01-24 16:27 ` [PATCH 12/76] target/arm: Remove now-unused vfp.fp_status and FPST_FPCR Peter Maydell
2025-01-25 15:20   ` Richard Henderson
2025-01-24 16:27 ` [PATCH 13/76] target/arm: Define new fp_status_f16_a32 and fp_status_f16_a64 Peter Maydell
2025-01-25 15:21   ` Richard Henderson
2025-01-27  5:00   ` Richard Henderson
2025-01-24 16:27 ` [PATCH 14/76] target/arm: Use fp_status_f16_a32 in AArch32-only helpers Peter Maydell
2025-01-25 15:21   ` Richard Henderson
2025-01-24 16:27 ` [PATCH 15/76] target/arm: Use fp_status_f16_a64 in AArch64-only helpers Peter Maydell
2025-01-25 15:22   ` Richard Henderson
2025-01-24 16:27 ` [PATCH 16/76] target/arm: Use FPST_FPCR_F16_A32 in A32 decoder Peter Maydell
2025-01-25 15:23   ` Richard Henderson
2025-01-24 16:27 ` [PATCH 17/76] target/arm: Use FPST_FPCR_F16_A64 in A64 decoder Peter Maydell
2025-01-25 15:23   ` Richard Henderson
2025-01-24 16:27 ` [PATCH 18/76] target/arm: Remove now-unused vfp.fp_status_f16 and FPST_FPCR_F16 Peter Maydell
2025-01-25 15:23   ` Richard Henderson
2025-01-24 16:27 ` [PATCH 19/76] fpu: Rename float_flag_input_denormal to float_flag_input_denormal_flushed Peter Maydell
2025-01-25 15:25   ` Richard Henderson
2025-01-24 16:27 ` [PATCH 20/76] fpu: Rename float_flag_output_denormal to float_flag_output_denormal_flushed Peter Maydell
2025-01-25 15:26   ` Richard Henderson
2025-01-24 16:27 ` [PATCH 21/76] fpu: Fix a comment in softfloat-types.h Peter Maydell
2025-01-25 15:27   ` Richard Henderson
2025-01-24 16:27 ` [PATCH 22/76] fpu: Add float_class_denormal Peter Maydell
2025-01-25 15:31   ` Richard Henderson
2025-01-24 16:27 ` [PATCH 23/76] fpu: Implement float_flag_input_denormal_used Peter Maydell
2025-01-25 15:42   ` Richard Henderson
2025-01-24 16:27 ` [PATCH 24/76] fpu: allow flushing of output denormals to be after rounding Peter Maydell
2025-01-25 16:41   ` Richard Henderson
2025-01-27 10:01     ` Peter Maydell
2025-01-27 16:09       ` Richard Henderson
2025-01-29 13:04     ` Peter Maydell
2025-01-31 13:36       ` Richard Henderson
2025-01-24 16:27 ` [PATCH 25/76] target/arm: Remove redundant advsimd float16 helpers Peter Maydell
2025-01-25 16:59   ` Richard Henderson
2025-01-24 16:27 ` [PATCH 26/76] target/arm: Use FPST_FPCR_F16_A64 for halfprec-to-other conversions Peter Maydell
2025-01-25 17:01   ` Richard Henderson
2025-01-24 16:27 ` [PATCH 27/76] target/arm: Define FPCR AH, FIZ, NEP bits Peter Maydell
2025-01-25 17:08   ` Richard Henderson
2025-01-31 17:05     ` Peter Maydell
2025-01-24 16:27 ` [PATCH 28/76] target/arm: Implement FPCR.FIZ handling Peter Maydell
2025-01-25 17:25   ` Richard Henderson
2025-01-24 16:27 ` [PATCH 29/76] target/arm: Adjust FP behaviour for FPCR.AH = 1 Peter Maydell
2025-01-25 17:27   ` Richard Henderson
2025-01-24 16:27 ` [PATCH 30/76] target/arm: Adjust exception flag handling for AH " Peter Maydell
2025-01-25 17:29   ` Richard Henderson
2025-01-24 16:27 ` [PATCH 31/76] target/arm: Add FPCR.AH to tbflags Peter Maydell
2025-01-25 17:30   ` Richard Henderson
2025-01-24 16:27 ` [PATCH 32/76] target/arm: Set up float_status to use for FPCR.AH=1 behaviour Peter Maydell
2025-01-25 17:36   ` Richard Henderson
2025-01-24 16:27 ` [PATCH 33/76] target/arm: Use FPST_FPCR_AH for FRECPE, FRECPS, FRECPX, FRSQRTE, FRSQRTS Peter Maydell
2025-01-25 17:40   ` Richard Henderson
2025-01-24 16:27 ` [PATCH 34/76] target/arm: Use FPST_FPCR_AH for BFCVT* insns Peter Maydell
2025-01-25 17:42   ` Richard Henderson
2025-01-24 16:27 ` [PATCH 35/76] target/arm: Use FPST_FPCR_AH for BFMLAL*, BFMLSL* insns Peter Maydell
2025-01-25 17:44   ` Richard Henderson
2025-01-24 16:27 ` [PATCH 36/76] target/arm: Add FPCR.NEP to TBFLAGS Peter Maydell
2025-01-25 17:45   ` Richard Henderson
2025-01-24 16:27 ` [PATCH 37/76] target/arm: Define and use new write_fp_*reg_merging() functions Peter Maydell
2025-01-25 17:52   ` Richard Henderson
2025-01-24 16:27 ` [PATCH 38/76] target/arm: Handle FPCR.NEP for 3-input scalar operations Peter Maydell
2025-01-25 17:53   ` Richard Henderson
2025-01-24 16:27 ` [PATCH 39/76] target/arm: Handle FPCR.NEP for BFCVT scalar Peter Maydell
2025-01-25 17:55   ` Richard Henderson
2025-01-24 16:28 ` [PATCH 40/76] target/arm: Handle FPCR.NEP for 1-input scalar operations Peter Maydell
2025-01-26 12:33   ` Richard Henderson
2025-01-24 16:28 ` [PATCH 41/76] target/arm: Handle FPCR.NEP in do_cvtf_scalar() Peter Maydell
2025-01-26 12:33   ` Richard Henderson
2025-01-24 16:28 ` [PATCH 42/76] target/arm: Handle FPCR.NEP for scalar FABS and FNEG Peter Maydell
2025-01-26 12:34   ` Richard Henderson
2025-01-24 16:28 ` [PATCH 43/76] target/arm: Handle FPCR.NEP for FCVTXN (scalar) Peter Maydell
2025-01-26 12:36   ` Richard Henderson
2025-01-24 16:28 ` [PATCH 44/76] target/arm: Handle FPCR.NEP for NEP for FMUL, FMULX scalar by element Peter Maydell
2025-01-26 12:36   ` Richard Henderson
2025-01-24 16:28 ` [PATCH 45/76] target/arm: Implement FPCR.AH semantics for scalar FMIN/FMAX Peter Maydell
2025-01-26 12:43   ` Richard Henderson
2025-01-31 13:09     ` Peter Maydell
2025-01-31 13:37       ` Richard Henderson
2025-01-24 16:28 ` [PATCH 46/76] target/arm: Implement FPCR.AH semantics for vector FMIN/FMAX Peter Maydell
2025-01-26 12:45   ` Richard Henderson
2025-01-24 16:28 ` [PATCH 47/76] target/arm: Implement FPCR.AH semantics for FMAXV and FMINV Peter Maydell
2025-01-26 12:47   ` Richard Henderson
2025-01-24 16:28 ` [PATCH 48/76] target/arm: Implement FPCR.AH semantics for FMINP and FMAXP Peter Maydell
2025-01-26 12:49   ` Richard Henderson
2025-01-24 16:28 ` [PATCH 49/76] target/arm: Implement FPCR.AH semantics for SVE FMAXV and FMINV Peter Maydell
2025-01-26 12:51   ` Richard Henderson
2025-01-24 16:28 ` [PATCH 50/76] target/arm: Implement FPCR.AH semantics for SVE FMIN/FMAX immediate Peter Maydell
2025-01-26 12:54   ` Richard Henderson
2025-01-24 16:28 ` [PATCH 51/76] target/arm: Implement FPCR.AH semantics for SVE FMIN/FMAX vector Peter Maydell
2025-01-26 12:55   ` Richard Henderson
2025-01-24 16:28 ` [PATCH 52/76] target/arm: Implement FPCR.AH handling of negation of NaN Peter Maydell
2025-01-26 13:00   ` Richard Henderson
2025-01-24 16:28 ` [PATCH 53/76] target/arm: Implement FPCR.AH handling for scalar FABS and FABD Peter Maydell
2025-01-26 13:01   ` Richard Henderson
2025-01-24 16:28 ` [PATCH 54/76] target/arm: Handle FPCR.AH in vector FABD Peter Maydell
2025-01-26 13:03   ` Richard Henderson
2025-01-24 16:28 ` [PATCH 55/76] target/arm: Handle FPCR.AH in SVE FNEG Peter Maydell
2025-01-26 13:05   ` Richard Henderson
2025-01-24 16:28 ` [PATCH 56/76] target/arm: Handle FPCR.AH in SVE FABS Peter Maydell
2025-01-26 13:05   ` Richard Henderson
2025-01-24 16:28 ` [PATCH 57/76] target/arm: Handle FPCR.AH in SVE FABD Peter Maydell
2025-01-26 13:06   ` Richard Henderson
2025-01-24 16:28 ` [PATCH 58/76] target/arm: Handle FPCR.AH in negation steps in FCADD Peter Maydell
2025-01-26 13:08   ` Richard Henderson
2025-01-24 16:28 ` [PATCH 59/76] target/arm: Handle FPCR.AH in negation steps in SVE FCADD Peter Maydell
2025-01-26 13:10   ` Richard Henderson
2025-01-24 16:28 ` [PATCH 60/76] target/arm: Handle FPCR.AH in FMLSL Peter Maydell
2025-01-26 13:13   ` Richard Henderson
2025-01-24 16:28 ` [PATCH 61/76] target/arm: Handle FPCR.AH in FRECPS and FRSQRTS scalar insns Peter Maydell
2025-01-26 13:14   ` Richard Henderson
2025-01-24 16:28 ` [PATCH 62/76] target/arm: Handle FPCR.AH in FRECPS and FRSQRTS vector insns Peter Maydell
2025-01-26 13:15   ` Richard Henderson
2025-01-24 16:28 ` [PATCH 63/76] target/arm: Handle FPCR.AH in negation step in FMLS (indexed) Peter Maydell
2025-01-26 13:16   ` Richard Henderson
2025-01-24 16:28 ` [PATCH 64/76] target/arm: Handle FPCR.AH in negation in FMLS (vector) Peter Maydell
2025-01-26 13:17   ` Richard Henderson
2025-01-24 16:28 ` [PATCH 65/76] target/arm: Handle FPCR.AH in negation step in SVE " Peter Maydell
2025-01-26 13:19   ` Richard Henderson
2025-01-27 20:41   ` Richard Henderson
2025-01-24 16:28 ` [PATCH 66/76] target/arm: Handle FPCR.AH in SVE FTSSEL Peter Maydell
2025-01-26 13:20   ` Richard Henderson
2025-01-24 16:28 ` [PATCH 67/76] target/arm: Handle FPCR.AH in SVE FTMAD Peter Maydell
2025-01-26 13:21   ` Richard Henderson
2025-01-24 16:28 ` [PATCH 68/76] target/arm: Enable FEAT_AFP for '-cpu max' Peter Maydell
2025-01-26 13:21   ` Richard Henderson
2025-01-24 16:28 ` [PATCH 69/76] target/arm: Plumb FEAT_RPRES frecpe and frsqrte through to new helper Peter Maydell
2025-01-26 13:23   ` Richard Henderson
2025-01-24 16:28 ` [PATCH 70/76] target/arm: Implement increased precision FRECPE Peter Maydell
2025-01-26 13:26   ` Richard Henderson
2025-01-24 16:28 ` [PATCH 71/76] target/arm: Implement increased precision FRSQRTE Peter Maydell
2025-01-26 13:28   ` Richard Henderson
2025-01-24 16:28 ` [PATCH 72/76] target/arm: Enable FEAT_RPRES for -cpu max Peter Maydell
2025-01-26 13:29   ` Richard Henderson
2025-01-24 16:28 ` [PATCH 73/76] target/i386: Detect flush-to-zero after rounding Peter Maydell
2025-01-26 13:30   ` Richard Henderson
2025-01-24 16:28 ` [PATCH 74/76] target/i386: Use correct type for get_float_exception_flags() values Peter Maydell
2025-01-26 13:30   ` Richard Henderson
2025-01-24 16:28 ` [PATCH 75/76] target/i386: Wire up MXCSR.DE and FPUS.DE correctly Peter Maydell
2025-01-26 13:31   ` Richard Henderson
2025-01-24 16:28 ` [PATCH 76/76] tests/tcg/x86_64/fma: add test for exact-denormal output Peter Maydell
2025-01-26 13:32   ` Richard Henderson
2025-01-24 16:35 ` [PATCH 00/76] target/arm: Implement FEAT_AFP and FEAT_RPRES Peter Maydell
2025-01-28 13:23 ` Peter Maydell
