* [PATCH 0/9] target/arm: Refactor FPCR/FPSR handling to prepare for FEAT_AFP
@ 2024-06-28 14:23 Peter Maydell
2024-06-28 14:23 ` [PATCH 1/9] target/arm: Correct comments about M-profile FPSCR Peter Maydell
` (8 more replies)
0 siblings, 9 replies; 20+ messages in thread
From: Peter Maydell @ 2024-06-28 14:23 UTC (permalink / raw)
To: qemu-arm, qemu-devel
In AArch32, the floating point control and status bits are all in a
single register, FPSCR. In AArch64, these were split into separate
FPCR and FPSR registers, but the bit layouts remained the same, with
no overlaps, so that you could construct an FPSCR value by ORing FPCR
and FPSR, or equivalently could produce FPSR and FPCR by masking an
FPSCR value. For QEMU's implementation, we opted to use masking to
produce FPSR and FPCR, because we started with an AArch32
implementation of FPSCR.
The addition of the (AArch64-only) FEAT_AFP adds new bits to the FPCR
which overlap with some bits in the FPSR. This means we'll no longer
be able to consider the FPSCR-encoded value as the primary one, but
instead need to treat FPSR/FPCR as the primary encoding and construct
the FPSCR from those. (This remains possible because the FEAT_AFP
bits in FPCR don't appear in the FPSCR.)
This patchseries does the necessary refactoring to our handling
of FPCR and FPSR to allow us to implement the new FPCR-only bits:
* we flip vfp_{get,set}_fpscr() and vfp_{get,set}_{fpcr,fpsr}()
so that the former call the latter rather than the other way round
* we make the migration code send FPSR and FPCR separately when there's
a bit set which can't be represented in the FPSCR format
(we continue to use the FPSCR format on the wire when possible,
for migration backwards-compatibility)
* we store the FPCR and FPSR in the CPU state struct in new
vfp.fpcr and vfp.fpsr fields, rather than in vfp.xregs[ARM_VFP_FPSCR]
* we make sure that writes to FPSCR leave bits in FPCR that aren't
in FPSCR alone
The series also has a bit of tidyup of some of the macro naming etc.
thanks
-- PMM
Peter Maydell (9):
target/arm: Correct comments about M-profile FPSCR
target/arm: Make vfp_get_fpscr() call vfp_get_{fpcr,fpsr}
target/arm: Make vfp_set_fpscr() call vfp_set_{fpcr,fpsr}
target/arm: Support migration when FPSR/FPCR won't fit in the FPSCR
target/arm: Implement store_cpu_field_low32() macro
target/arm: Store FPSR and FPCR in separate CPU state fields
target/arm: Rename FPCR_ QC, NZCV macros to FPSR_
target/arm: Rename FPSR_MASK and FPCR_MASK and define them
symbolically
target/arm: Allow FPCR bits that aren't in FPSCR
target/arm/cpu.h | 109 ++++++++++++-----
target/arm/tcg/translate-a32.h | 7 ++
target/arm/tcg/translate.h | 3 +-
target/arm/machine.c | 135 ++++++++++++++++++++-
target/arm/tcg/mve_helper.c | 12 +-
target/arm/tcg/translate-m-nocp.c | 22 ++--
target/arm/tcg/translate-vfp.c | 4 +-
target/arm/vfp_helper.c | 187 +++++++++++++++++++-----------
8 files changed, 356 insertions(+), 123 deletions(-)
--
2.34.1
^ permalink raw reply [flat|nested] 20+ messages in thread
* [PATCH 1/9] target/arm: Correct comments about M-profile FPSCR
2024-06-28 14:23 [PATCH 0/9] target/arm: Refactor FPCR/FPSR handling to prepare for FEAT_AFP Peter Maydell
@ 2024-06-28 14:23 ` Peter Maydell
2024-06-28 14:57 ` Richard Henderson
2024-06-28 14:23 ` [PATCH 2/9] target/arm: Make vfp_get_fpscr() call vfp_get_{fpcr, fpsr} Peter Maydell
` (7 subsequent siblings)
8 siblings, 1 reply; 20+ messages in thread
From: Peter Maydell @ 2024-06-28 14:23 UTC (permalink / raw)
To: qemu-arm, qemu-devel
The M-profile FPSCR LTPSIZE is bits [18:16]; this is the same
field as A-profile FPSCR Len, not Stride. Correct the comment
in vfp_get_fpscr().
We also implemented M-profile FPSCR.QC, but forgot to delete
a TODO comment from vfp_set_fpscr(); remove it now.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
target/arm/vfp_helper.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
index ce26b8a71a1..dd67825270b 100644
--- a/target/arm/vfp_helper.c
+++ b/target/arm/vfp_helper.c
@@ -176,8 +176,8 @@ uint32_t HELPER(vfp_get_fpscr)(CPUARMState *env)
| (env->vfp.vec_stride << 20);
/*
- * M-profile LTPSIZE overlaps A-profile Stride; whichever of the
- * two is not applicable to this CPU will always be zero.
+ * M-profile LTPSIZE is the same bits [18:16] as A-profile Len; whichever
+ * of the two is not applicable to this CPU will always be zero.
*/
fpscr |= env->v7m.ltpsize << 16;
@@ -226,7 +226,6 @@ void HELPER(vfp_set_fpscr)(CPUARMState *env, uint32_t val)
/*
* The bit we set within fpscr_q is arbitrary; the register as a
* whole being zero/non-zero is what counts.
- * TODO: M-profile MVE also has a QC bit.
*/
env->vfp.qc[0] = val & FPCR_QC;
env->vfp.qc[1] = 0;
--
2.34.1
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [PATCH 2/9] target/arm: Make vfp_get_fpscr() call vfp_get_{fpcr, fpsr}
2024-06-28 14:23 [PATCH 0/9] target/arm: Refactor FPCR/FPSR handling to prepare for FEAT_AFP Peter Maydell
2024-06-28 14:23 ` [PATCH 1/9] target/arm: Correct comments about M-profile FPSCR Peter Maydell
@ 2024-06-28 14:23 ` Peter Maydell
2024-06-28 15:37 ` Richard Henderson
2024-06-28 14:23 ` [PATCH 3/9] target/arm: Make vfp_set_fpscr() call vfp_set_{fpcr, fpsr} Peter Maydell
` (6 subsequent siblings)
8 siblings, 1 reply; 20+ messages in thread
From: Peter Maydell @ 2024-06-28 14:23 UTC (permalink / raw)
To: qemu-arm, qemu-devel
In AArch32, the floating point control and status bits are all in a
single register, FPSCR. In AArch64, these were split into separate
FPCR and FPSR registers, but the bit layouts remained the same, with
no overlaps, so that you could construct an FPSCR value by ORing FPCR
and FPSR, or equivalently could produce FPSR and FPCR by masking an
FPSCR value. For QEMU's implementation, we opted to use masking to
produce FPSR and FPCR, because we started with an AArch32
implementation of FPSCR.
The addition of the (AArch64-only) FEAT_AFP adds new bits to the FPCR
which overlap with some bits in the FPSR. This means we'll no longer
be able to consider the FPSCR-encoded value as the primary one, but
instead need to treat FPSR/FPCR as the primary encoding and construct
the FPSCR from those. (This remains possible because the FEAT_AFP
bits in FPCR don't appear in the FPSCR.)
As the first step in this refactoring, make vfp_get_fpscr() call
vfp_get_fpcr() and vfp_get_fpsr(), instead of the other way around.
Note that vfp_get_fpcsr_from_host() returns only bits in the FPSR
(for the cumulative fp exception bits), so we can simply rename
it without needing to add a new function for getting FPCR bits.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
target/arm/cpu.h | 24 +++++++++++++++---------
target/arm/vfp_helper.c | 34 ++++++++++++++++++++++------------
2 files changed, 37 insertions(+), 21 deletions(-)
diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 3841359d0f1..68a9922f88e 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -1714,10 +1714,21 @@ void vfp_set_fpscr(CPUARMState *env, uint32_t val);
#define FPCR_NZCV_MASK (FPCR_N | FPCR_Z | FPCR_C | FPCR_V)
#define FPCR_NZCVQC_MASK (FPCR_NZCV_MASK | FPCR_QC)
-static inline uint32_t vfp_get_fpsr(CPUARMState *env)
-{
- return vfp_get_fpscr(env) & FPSR_MASK;
-}
+/**
+ * vfp_get_fpsr: read the AArch64 FPSR
+ * @env: CPU context
+ *
+ * Return the current AArch64 FPSR value
+ */
+uint32_t vfp_get_fpsr(CPUARMState *env);
+
+/**
+ * vfp_get_fpcr: read the AArch64 FPCR
+ * @env: CPU context
+ *
+ * Return the current AArch64 FPCR value
+ */
+uint32_t vfp_get_fpcr(CPUARMState *env);
static inline void vfp_set_fpsr(CPUARMState *env, uint32_t val)
{
@@ -1725,11 +1736,6 @@ static inline void vfp_set_fpsr(CPUARMState *env, uint32_t val)
vfp_set_fpscr(env, new_fpscr);
}
-static inline uint32_t vfp_get_fpcr(CPUARMState *env)
-{
- return vfp_get_fpscr(env) & FPCR_MASK;
-}
-
static inline void vfp_set_fpcr(CPUARMState *env, uint32_t val)
{
uint32_t new_fpscr = (vfp_get_fpscr(env) & ~FPCR_MASK) | (val & FPCR_MASK);
diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
index dd67825270b..a87d39e4d9b 100644
--- a/target/arm/vfp_helper.c
+++ b/target/arm/vfp_helper.c
@@ -85,7 +85,7 @@ static inline int vfp_exceptbits_to_host(int target_bits)
return host_bits;
}
-static uint32_t vfp_get_fpscr_from_host(CPUARMState *env)
+static uint32_t vfp_get_fpsr_from_host(CPUARMState *env)
{
uint32_t i;
@@ -156,7 +156,7 @@ static void vfp_set_fpscr_to_host(CPUARMState *env, uint32_t val)
#else
-static uint32_t vfp_get_fpscr_from_host(CPUARMState *env)
+static uint32_t vfp_get_fpsr_from_host(CPUARMState *env)
{
return 0;
}
@@ -167,26 +167,36 @@ static void vfp_set_fpscr_to_host(CPUARMState *env, uint32_t val)
#endif
-uint32_t HELPER(vfp_get_fpscr)(CPUARMState *env)
+uint32_t vfp_get_fpcr(CPUARMState *env)
{
- uint32_t i, fpscr;
-
- fpscr = env->vfp.xregs[ARM_VFP_FPSCR]
- | (env->vfp.vec_len << 16)
- | (env->vfp.vec_stride << 20);
+ uint32_t fpcr = (env->vfp.xregs[ARM_VFP_FPSCR] & FPCR_MASK)
+ | (env->vfp.vec_len << 16)
+ | (env->vfp.vec_stride << 20);
/*
* M-profile LTPSIZE is the same bits [18:16] as A-profile Len; whichever
* of the two is not applicable to this CPU will always be zero.
*/
- fpscr |= env->v7m.ltpsize << 16;
+ fpcr |= env->v7m.ltpsize << 16;
- fpscr |= vfp_get_fpscr_from_host(env);
+ return fpcr;
+}
+
+uint32_t vfp_get_fpsr(CPUARMState *env)
+{
+ uint32_t fpsr = env->vfp.xregs[ARM_VFP_FPSCR] & FPSR_MASK;
+ uint32_t i;
+
+ fpsr |= vfp_get_fpsr_from_host(env);
i = env->vfp.qc[0] | env->vfp.qc[1] | env->vfp.qc[2] | env->vfp.qc[3];
- fpscr |= i ? FPCR_QC : 0;
+ fpsr |= i ? FPCR_QC : 0;
+ return fpsr;
+}
- return fpscr;
+uint32_t HELPER(vfp_get_fpscr)(CPUARMState *env)
+{
+ return (vfp_get_fpcr(env) & FPCR_MASK) | (vfp_get_fpsr(env) & FPSR_MASK);
}
uint32_t vfp_get_fpscr(CPUARMState *env)
--
2.34.1
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [PATCH 3/9] target/arm: Make vfp_set_fpscr() call vfp_set_{fpcr, fpsr}
2024-06-28 14:23 [PATCH 0/9] target/arm: Refactor FPCR/FPSR handling to prepare for FEAT_AFP Peter Maydell
2024-06-28 14:23 ` [PATCH 1/9] target/arm: Correct comments about M-profile FPSCR Peter Maydell
2024-06-28 14:23 ` [PATCH 2/9] target/arm: Make vfp_get_fpscr() call vfp_get_{fpcr, fpsr} Peter Maydell
@ 2024-06-28 14:23 ` Peter Maydell
2024-06-28 15:50 ` Richard Henderson
2024-06-28 14:23 ` [PATCH 4/9] target/arm: Support migration when FPSR/FPCR won't fit in the FPSCR Peter Maydell
` (5 subsequent siblings)
8 siblings, 1 reply; 20+ messages in thread
From: Peter Maydell @ 2024-06-28 14:23 UTC (permalink / raw)
To: qemu-arm, qemu-devel
Make vfp_set_fpscr() call vfp_set_fpsr() and vfp_set_fpcr()
instead of the other way around.
The masking we do when getting and setting vfp.xregs[ARM_VFP_FPSCR]
is a little awkward, but we are going to change where we store the
underlying FPSR and FPCR information in a later commit, so it will
go away then.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
target/arm/cpu.h | 22 +++++----
target/arm/vfp_helper.c | 100 ++++++++++++++++++++++++++--------------
2 files changed, 78 insertions(+), 44 deletions(-)
diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 68a9922f88e..0a570afcab4 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -1730,17 +1730,19 @@ uint32_t vfp_get_fpsr(CPUARMState *env);
*/
uint32_t vfp_get_fpcr(CPUARMState *env);
-static inline void vfp_set_fpsr(CPUARMState *env, uint32_t val)
-{
- uint32_t new_fpscr = (vfp_get_fpscr(env) & ~FPSR_MASK) | (val & FPSR_MASK);
- vfp_set_fpscr(env, new_fpscr);
-}
+/**
+ * vfp_set_fpsr: write the AArch64 FPSR
+ * @env: CPU context
+ * @value: new value
+ */
+void vfp_set_fpsr(CPUARMState *env, uint32_t value);
-static inline void vfp_set_fpcr(CPUARMState *env, uint32_t val)
-{
- uint32_t new_fpscr = (vfp_get_fpscr(env) & ~FPCR_MASK) | (val & FPCR_MASK);
- vfp_set_fpscr(env, new_fpscr);
-}
+/**
+ * vfp_set_fpcr: write the AArch64 FPCR
+ * @env: CPU context
+ * @value: new value
+ */
+void vfp_set_fpcr(CPUARMState *env, uint32_t value);
enum arm_cpu_mode {
ARM_CPU_MODE_USR = 0x10,
diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
index a87d39e4d9b..38c8aadf9b4 100644
--- a/target/arm/vfp_helper.c
+++ b/target/arm/vfp_helper.c
@@ -99,14 +99,27 @@ static uint32_t vfp_get_fpsr_from_host(CPUARMState *env)
return vfp_exceptbits_from_host(i);
}
-static void vfp_set_fpscr_to_host(CPUARMState *env, uint32_t val)
+static void vfp_set_fpsr_to_host(CPUARMState *env, uint32_t val)
+{
+ /*
+ * The exception flags are ORed together when we read fpscr so we
+ * only need to preserve the current state in one of our
+ * float_status values.
+ */
+ int i = vfp_exceptbits_to_host(val);
+ set_float_exception_flags(i, &env->vfp.fp_status);
+ set_float_exception_flags(0, &env->vfp.fp_status_f16);
+ set_float_exception_flags(0, &env->vfp.standard_fp_status);
+ set_float_exception_flags(0, &env->vfp.standard_fp_status_f16);
+}
+
+static void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val)
{
- int i;
uint32_t changed = env->vfp.xregs[ARM_VFP_FPSCR];
changed ^= val;
if (changed & (3 << 22)) {
- i = (val >> 22) & 3;
+ int i = (val >> 22) & 3;
switch (i) {
case FPROUNDING_TIEEVEN:
i = float_round_nearest_even;
@@ -141,17 +154,6 @@ static void vfp_set_fpscr_to_host(CPUARMState *env, uint32_t val)
set_default_nan_mode(dnan_enabled, &env->vfp.fp_status);
set_default_nan_mode(dnan_enabled, &env->vfp.fp_status_f16);
}
-
- /*
- * The exception flags are ORed together when we read fpscr so we
- * only need to preserve the current state in one of our
- * float_status values.
- */
- i = vfp_exceptbits_to_host(val);
- set_float_exception_flags(i, &env->vfp.fp_status);
- set_float_exception_flags(0, &env->vfp.fp_status_f16);
- set_float_exception_flags(0, &env->vfp.standard_fp_status);
- set_float_exception_flags(0, &env->vfp.standard_fp_status_f16);
}
#else
@@ -161,7 +163,11 @@ static uint32_t vfp_get_fpsr_from_host(CPUARMState *env)
return 0;
}
-static void vfp_set_fpscr_to_host(CPUARMState *env, uint32_t val)
+static void vfp_set_fpsr_to_host(CPUARMState *env, uint32_t val)
+{
+}
+
+static void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val)
{
}
@@ -204,7 +210,37 @@ uint32_t vfp_get_fpscr(CPUARMState *env)
return HELPER(vfp_get_fpscr)(env);
}
-void HELPER(vfp_set_fpscr)(CPUARMState *env, uint32_t val)
+void vfp_set_fpsr(CPUARMState *env, uint32_t val)
+{
+ ARMCPU *cpu = env_archcpu(env);
+
+ vfp_set_fpsr_to_host(env, val);
+
+ if (arm_feature(env, ARM_FEATURE_NEON) ||
+ cpu_isar_feature(aa32_mve, cpu)) {
+ /*
+ * The bit we set within fpscr_q is arbitrary; the register as a
+ * whole being zero/non-zero is what counts.
+ */
+ env->vfp.qc[0] = val & FPCR_QC;
+ env->vfp.qc[1] = 0;
+ env->vfp.qc[2] = 0;
+ env->vfp.qc[3] = 0;
+ }
+
+ /*
+ * The only FPSR bits we keep in vfp.xregs[FPSCR] are NZCV:
+ * the exception flags IOC|DZC|OFC|UFC|IXC|IDC are stored in
+ * fp_status, and QC is in vfp.qc[]. Store the NZCV bits there,
+ * and zero any of the other FPSR bits (but preserve the FPCR
+ * bits).
+ */
+ val &= FPCR_NZCV_MASK;
+ env->vfp.xregs[ARM_VFP_FPSCR] &= ~FPSR_MASK;
+ env->vfp.xregs[ARM_VFP_FPSCR] |= val;
+}
+
+void vfp_set_fpcr(CPUARMState *env, uint32_t val)
{
ARMCPU *cpu = env_archcpu(env);
@@ -213,7 +249,7 @@ void HELPER(vfp_set_fpscr)(CPUARMState *env, uint32_t val)
val &= ~FPCR_FZ16;
}
- vfp_set_fpscr_to_host(env, val);
+ vfp_set_fpcr_to_host(env, val);
if (!arm_feature(env, ARM_FEATURE_M)) {
/*
@@ -231,28 +267,24 @@ void HELPER(vfp_set_fpscr)(CPUARMState *env, uint32_t val)
FPCR_LTPSIZE_LENGTH);
}
- if (arm_feature(env, ARM_FEATURE_NEON) ||
- cpu_isar_feature(aa32_mve, cpu)) {
- /*
- * The bit we set within fpscr_q is arbitrary; the register as a
- * whole being zero/non-zero is what counts.
- */
- env->vfp.qc[0] = val & FPCR_QC;
- env->vfp.qc[1] = 0;
- env->vfp.qc[2] = 0;
- env->vfp.qc[3] = 0;
- }
-
/*
* We don't implement trapped exception handling, so the
* trap enable bits, IDE|IXE|UFE|OFE|DZE|IOE are all RAZ/WI (not RES0!)
*
- * The exception flags IOC|DZC|OFC|UFC|IXC|IDC are stored in
- * fp_status; QC, Len and Stride are stored separately earlier.
- * Clear out all of those and the RES0 bits: only NZCV, AHP, DN,
- * FZ, RMode and FZ16 are kept in vfp.xregs[FPSCR].
+ * The FPCR bits we keep in vfp.xregs[FPSCR] are AHP, DN, FZ, RMode
+ * and FZ16. Len, Stride and LTPSIZE we just handled. Store those bits
+ * there, and zero any of the other FPCR bits and the RES0 and RAZ/WI
+ * bits.
*/
- env->vfp.xregs[ARM_VFP_FPSCR] = val & 0xf7c80000;
+ val &= FPCR_AHP | FPCR_DN | FPCR_FZ | FPCR_RMODE_MASK | FPCR_FZ16;
+ env->vfp.xregs[ARM_VFP_FPSCR] &= ~FPCR_MASK;
+ env->vfp.xregs[ARM_VFP_FPSCR] |= val;
+}
+
+void HELPER(vfp_set_fpscr)(CPUARMState *env, uint32_t val)
+{
+ vfp_set_fpcr(env, val & FPCR_MASK);
+ vfp_set_fpsr(env, val & FPSR_MASK);
}
void vfp_set_fpscr(CPUARMState *env, uint32_t val)
--
2.34.1
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [PATCH 4/9] target/arm: Support migration when FPSR/FPCR won't fit in the FPSCR
2024-06-28 14:23 [PATCH 0/9] target/arm: Refactor FPCR/FPSR handling to prepare for FEAT_AFP Peter Maydell
` (2 preceding siblings ...)
2024-06-28 14:23 ` [PATCH 3/9] target/arm: Make vfp_set_fpscr() call vfp_set_{fpcr, fpsr} Peter Maydell
@ 2024-06-28 14:23 ` Peter Maydell
2024-06-28 16:01 ` Richard Henderson
2024-06-28 14:23 ` [PATCH 5/9] target/arm: Implement store_cpu_field_low32() macro Peter Maydell
` (4 subsequent siblings)
8 siblings, 1 reply; 20+ messages in thread
From: Peter Maydell @ 2024-06-28 14:23 UTC (permalink / raw)
To: qemu-arm, qemu-devel
To support FPSR and FPCR bits that don't exist in the AArch32 FPSCR
view of floating point control and status (such as the FEAT_AFP ones),
we need to make sure those bits can be migrated. This commit allows
that, whilst maintaining backwards and forwards migration compatibility
for CPUs where there are no such bits:
On sending:
* If either the FPCR or the FPSR include set bits that are not
visible in the AArch32 FPSCR view of floating point control/status
then we send the FPCR and FPSR as two separate fields in a new
cpu/vfp/fpcr_fpsr subsection, and we send a 0 for the old
FPSCR field in cpu/vfp
* Otherwise, we don't send the fpcr_fpsr subsection, and we send
an FPSCR-format value in cpu/vfp as we did previously
On receiving:
* if we see a non-zero FPSCR field, that is the right information
* if we see a fpcr_fpsr subsection then that has the information
* if we see neither, then FPSCR/FPCR/FPSR are all zero on the source;
cpu_pre_load() ensures the CPU state defaults to that
* if we see both, then the migration source is buggy or malicious;
either the fpcr_fpsr or the FPSCR will "win" depending which
is first in the migration stream; we don't care which that is
We make the new FPCR and FPSR on-the-wire data be 64 bits, because
architecturally these registers are that wide, and this avoids the
need to engage in further migration-compatibility contortions in
future if some new architecture revision defines bits in the high
half of either register.
(We won't ever send the new migration subsection until we add support
for a CPU feature which enables setting overlapping FPCR bits, like
FEAT_AFP.)
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
target/arm/machine.c | 134 ++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 132 insertions(+), 2 deletions(-)
diff --git a/target/arm/machine.c b/target/arm/machine.c
index 0a722ca7e75..8c820955d95 100644
--- a/target/arm/machine.c
+++ b/target/arm/machine.c
@@ -18,6 +18,34 @@ static bool vfp_needed(void *opaque)
: cpu_isar_feature(aa32_vfp_simd, cpu));
}
+static bool vfp_fpcr_fpsr_needed(void *opaque)
+{
+ /*
+ * If either the FPCR or the FPSR include set bits that are not
+ * visible in the AArch32 FPSCR view of floating point control/status
+ * then we must send the FPCR and FPSR as two separate fields in the
+ * cpu/vfp/fpcr_fpsr subsection, and we will send a 0 for the old
+ * FPSCR field in cpu/vfp.
+ *
+ * If all the set bits are representable in an AArch32 FPSCR then we
+ * send that value as the cpu/vfp FPSCR field, and don't send the
+ * cpu/vfp/fpcr_fpsr subsection.
+ *
+ * On incoming migration, if the cpu/vfp FPSCR field is non-zero we
+ * use it, and if the fpcr_fpsr subsection is present we use that.
+ * (The subsection will never be present with a non-zero FPSCR field,
+ * and if FPSCR is zero and the subsection is not present that means
+ * that FPSCR/FPSR/FPCR are zero.)
+ *
+ * This preserves migration compatibility with older QEMU versions,
+ * in both directions.
+ */
+ ARMCPU *cpu = opaque;
+ CPUARMState *env = &cpu->env;
+
+ return (vfp_get_fpcr(env) & ~FPCR_MASK) || (vfp_get_fpsr(env) & ~FPSR_MASK);
+}
+
static int get_fpscr(QEMUFile *f, void *opaque, size_t size,
const VMStateField *field)
{
@@ -25,7 +53,10 @@ static int get_fpscr(QEMUFile *f, void *opaque, size_t size,
CPUARMState *env = &cpu->env;
uint32_t val = qemu_get_be32(f);
- vfp_set_fpscr(env, val);
+ if (val) {
+ /* 0 means we might have the data in the fpcr_fpsr subsection */
+ vfp_set_fpscr(env, val);
+ }
return 0;
}
@@ -34,8 +65,9 @@ static int put_fpscr(QEMUFile *f, void *opaque, size_t size,
{
ARMCPU *cpu = opaque;
CPUARMState *env = &cpu->env;
+ uint32_t fpscr = vfp_fpcr_fpsr_needed(opaque) ? 0 : vfp_get_fpscr(env);
- qemu_put_be32(f, vfp_get_fpscr(env));
+ qemu_put_be32(f, fpscr);
return 0;
}
@@ -45,6 +77,86 @@ static const VMStateInfo vmstate_fpscr = {
.put = put_fpscr,
};
+static int get_fpcr(QEMUFile *f, void *opaque, size_t size,
+ const VMStateField *field)
+{
+ ARMCPU *cpu = opaque;
+ CPUARMState *env = &cpu->env;
+ uint64_t val = qemu_get_be64(f);
+
+ vfp_set_fpcr(env, val);
+ return 0;
+}
+
+static int put_fpcr(QEMUFile *f, void *opaque, size_t size,
+ const VMStateField *field, JSONWriter *vmdesc)
+{
+ ARMCPU *cpu = opaque;
+ CPUARMState *env = &cpu->env;
+
+ qemu_put_be64(f, vfp_get_fpcr(env));
+ return 0;
+}
+
+static const VMStateInfo vmstate_fpcr = {
+ .name = "fpcr",
+ .get = get_fpcr,
+ .put = put_fpcr,
+};
+
+static int get_fpsr(QEMUFile *f, void *opaque, size_t size,
+ const VMStateField *field)
+{
+ ARMCPU *cpu = opaque;
+ CPUARMState *env = &cpu->env;
+ uint64_t val = qemu_get_be64(f);
+
+ vfp_set_fpsr(env, val);
+ return 0;
+}
+
+static int put_fpsr(QEMUFile *f, void *opaque, size_t size,
+ const VMStateField *field, JSONWriter *vmdesc)
+{
+ ARMCPU *cpu = opaque;
+ CPUARMState *env = &cpu->env;
+
+ qemu_put_be64(f, vfp_get_fpsr(env));
+ return 0;
+}
+
+static const VMStateInfo vmstate_fpsr = {
+ .name = "fpsr",
+ .get = get_fpsr,
+ .put = put_fpsr,
+};
+
+static const VMStateDescription vmstate_vfp_fpcr_fpsr = {
+ .name = "cpu/vfp/fpcr_fpsr",
+ .version_id = 1,
+ .minimum_version_id = 1,
+ .needed = vfp_fpcr_fpsr_needed,
+ .fields = (const VMStateField[]) {
+ {
+ .name = "fpcr",
+ .version_id = 0,
+ .size = sizeof(uint64_t),
+ .info = &vmstate_fpcr,
+ .flags = VMS_SINGLE,
+ .offset = 0,
+ },
+ {
+ .name = "fpsr",
+ .version_id = 0,
+ .size = sizeof(uint64_t),
+ .info = &vmstate_fpsr,
+ .flags = VMS_SINGLE,
+ .offset = 0,
+ },
+ VMSTATE_END_OF_LIST()
+ },
+};
+
static const VMStateDescription vmstate_vfp = {
.name = "cpu/vfp",
.version_id = 3,
@@ -100,6 +212,10 @@ static const VMStateDescription vmstate_vfp = {
.offset = 0,
},
VMSTATE_END_OF_LIST()
+ },
+ .subsections = (const VMStateDescription * const []) {
+ &vmstate_vfp_fpcr_fpsr,
+ NULL
}
};
@@ -784,6 +900,20 @@ static int cpu_pre_load(void *opaque)
ARMCPU *cpu = opaque;
CPUARMState *env = &cpu->env;
+ /*
+ * In an inbound migration where on the source FPSCR/FPSR/FPCR are 0,
+ * there will be no fpcr_fpsr subsection so we won't call vfp_set_fpcr()
+ * and vfp_set_fpsr() from get_fpcr() and get_fpsr(); also the get_fpscr()
+ * function will not call vfp_set_fpscr() because it will see a 0 in the
+ * inbound data. Ensure that in this case we have a correctly set up
+ * zero FPSCR/FPCR/FPSR.
+ *
+ * This is not strictly needed because FPSCR is zero out of reset, but
+ * it avoids the possibility of future confusing migration bugs if some
+ * future architecture change makes the reset value non-zero.
+ */
+ vfp_set_fpscr(env, 0);
+
/*
* Pre-initialize irq_line_state to a value that's never valid as
* real data, so cpu_post_load() can tell whether we've seen the
--
2.34.1
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [PATCH 5/9] target/arm: Implement store_cpu_field_low32() macro
2024-06-28 14:23 [PATCH 0/9] target/arm: Refactor FPCR/FPSR handling to prepare for FEAT_AFP Peter Maydell
` (3 preceding siblings ...)
2024-06-28 14:23 ` [PATCH 4/9] target/arm: Support migration when FPSR/FPCR won't fit in the FPSCR Peter Maydell
@ 2024-06-28 14:23 ` Peter Maydell
2024-06-28 16:02 ` Richard Henderson
2024-06-28 14:23 ` [PATCH 6/9] target/arm: Store FPSR and FPCR in separate CPU state fields Peter Maydell
` (3 subsequent siblings)
8 siblings, 1 reply; 20+ messages in thread
From: Peter Maydell @ 2024-06-28 14:23 UTC (permalink / raw)
To: qemu-arm, qemu-devel
We already have a load_cpu_field_low32() to load the low half of a
64-bit CPU struct field to a TCGv_i32; however we haven't yet needed
the store equivalent. We'll want that in the next patch, so
implement it.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
target/arm/tcg/translate-a32.h | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/target/arm/tcg/translate-a32.h b/target/arm/tcg/translate-a32.h
index 19de6e0a1a9..0b1fa57965c 100644
--- a/target/arm/tcg/translate-a32.h
+++ b/target/arm/tcg/translate-a32.h
@@ -83,6 +83,13 @@ void store_cpu_offset(TCGv_i32 var, int offset, int size);
sizeof_field(CPUARMState, name)); \
})
+/* Store to the low half of a 64-bit field from a TCGv_i32 */
+#define store_cpu_field_low32(val, name) \
+ ({ \
+ QEMU_BUILD_BUG_ON(sizeof_field(CPUARMState, name) != 8); \
+ store_cpu_offset(val, offsetoflow32(CPUARMState, name), 4); \
+ })
+
#define store_cpu_field_constant(val, name) \
store_cpu_field(tcg_constant_i32(val), name)
--
2.34.1
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [PATCH 6/9] target/arm: Store FPSR and FPCR in separate CPU state fields
2024-06-28 14:23 [PATCH 0/9] target/arm: Refactor FPCR/FPSR handling to prepare for FEAT_AFP Peter Maydell
` (4 preceding siblings ...)
2024-06-28 14:23 ` [PATCH 5/9] target/arm: Implement store_cpu_field_low32() macro Peter Maydell
@ 2024-06-28 14:23 ` Peter Maydell
2024-06-28 16:06 ` Richard Henderson
2024-06-28 14:23 ` [PATCH 7/9] target/arm: Rename FPCR_ QC, NZCV macros to FPSR_ Peter Maydell
` (2 subsequent siblings)
8 siblings, 1 reply; 20+ messages in thread
From: Peter Maydell @ 2024-06-28 14:23 UTC (permalink / raw)
To: qemu-arm, qemu-devel
Now that we have refactored the set/get functions so that the FPSCR
format is no longer the authoritative one, we can keep FPSR and FPCR
in separate CPU state fields.
As well as the get and set functions, we also have a scattering of
places in the code which directly access vfp.xregs[ARM_VFP_FPSCR] to
extract single fields which are stored there. These all change to
directly access either vfp.fpsr or vfp.fpcr, depending on the
location of the field. (Most commonly, this is the NZCV flags.)
We make the field in the CPU state struct 64 bits, because
architecturally FPSR and FPCR are 64 bits. However we leave the
types of the arguments and return values of the get/set functions as
32 bits, since we don't need to make that change with the current
architecture and various callsites would be unable to handle
set bits in the high half (for instance the gdbstub protocol
assumes they're only 32 bit registers).
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
target/arm/cpu.h | 7 +++++++
target/arm/tcg/translate.h | 3 +--
target/arm/tcg/mve_helper.c | 12 ++++++------
target/arm/tcg/translate-m-nocp.c | 6 +++---
target/arm/tcg/translate-vfp.c | 2 +-
target/arm/vfp_helper.c | 25 ++++++++++---------------
6 files changed, 28 insertions(+), 27 deletions(-)
diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 0a570afcab4..2eb7fc3bc39 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -619,6 +619,13 @@ typedef struct CPUArchState {
int vec_len;
int vec_stride;
+ /*
+ * Floating point status and control registers. Some bits are
+ * stored separately in other fields or in the float_status below.
+ */
+ uint64_t fpsr;
+ uint64_t fpcr;
+
uint32_t xregs[16];
/* Scratch space for aa32 neon expansion. */
diff --git a/target/arm/tcg/translate.h b/target/arm/tcg/translate.h
index aba21f730fe..a8672c857c1 100644
--- a/target/arm/tcg/translate.h
+++ b/target/arm/tcg/translate.h
@@ -351,8 +351,7 @@ static inline TCGv_i32 get_ahp_flag(void)
{
TCGv_i32 ret = tcg_temp_new_i32();
- tcg_gen_ld_i32(ret, tcg_env,
- offsetof(CPUARMState, vfp.xregs[ARM_VFP_FPSCR]));
+ tcg_gen_ld_i32(ret, tcg_env, offsetoflow32(CPUARMState, vfp.fpcr));
tcg_gen_extract_i32(ret, ret, 26, 1);
return ret;
diff --git a/target/arm/tcg/mve_helper.c b/target/arm/tcg/mve_helper.c
index 8b99736aad1..234f395b093 100644
--- a/target/arm/tcg/mve_helper.c
+++ b/target/arm/tcg/mve_helper.c
@@ -1115,21 +1115,21 @@ static void do_vadc(CPUARMState *env, uint32_t *d, uint32_t *n, uint32_t *m,
if (update_flags) {
/* Store C, clear NZV. */
- env->vfp.xregs[ARM_VFP_FPSCR] &= ~FPCR_NZCV_MASK;
- env->vfp.xregs[ARM_VFP_FPSCR] |= carry_in * FPCR_C;
+ env->vfp.fpsr &= ~FPCR_NZCV_MASK;
+ env->vfp.fpsr |= carry_in * FPCR_C;
}
mve_advance_vpt(env);
}
void HELPER(mve_vadc)(CPUARMState *env, void *vd, void *vn, void *vm)
{
- bool carry_in = env->vfp.xregs[ARM_VFP_FPSCR] & FPCR_C;
+ bool carry_in = env->vfp.fpsr & FPCR_C;
do_vadc(env, vd, vn, vm, 0, carry_in, false);
}
void HELPER(mve_vsbc)(CPUARMState *env, void *vd, void *vn, void *vm)
{
- bool carry_in = env->vfp.xregs[ARM_VFP_FPSCR] & FPCR_C;
+ bool carry_in = env->vfp.fpsr & FPCR_C;
do_vadc(env, vd, vn, vm, -1, carry_in, false);
}
@@ -3343,7 +3343,7 @@ static void do_vcvt_sh(CPUARMState *env, void *vd, void *vm, int top)
uint32_t *m = vm;
uint16_t r;
uint16_t mask = mve_element_mask(env);
- bool ieee = !(env->vfp.xregs[ARM_VFP_FPSCR] & FPCR_AHP);
+ bool ieee = !(env->vfp.fpcr & FPCR_AHP);
unsigned e;
float_status *fpst;
float_status scratch_fpst;
@@ -3373,7 +3373,7 @@ static void do_vcvt_hs(CPUARMState *env, void *vd, void *vm, int top)
uint16_t *m = vm;
uint32_t r;
uint16_t mask = mve_element_mask(env);
- bool ieee = !(env->vfp.xregs[ARM_VFP_FPSCR] & FPCR_AHP);
+ bool ieee = !(env->vfp.fpcr & FPCR_AHP);
unsigned e;
float_status *fpst;
float_status scratch_fpst;
diff --git a/target/arm/tcg/translate-m-nocp.c b/target/arm/tcg/translate-m-nocp.c
index f564d06ccf1..875f6a8725d 100644
--- a/target/arm/tcg/translate-m-nocp.c
+++ b/target/arm/tcg/translate-m-nocp.c
@@ -341,10 +341,10 @@ static bool gen_M_fp_sysreg_write(DisasContext *s, int regno,
16, 16, qc);
}
tcg_gen_andi_i32(tmp, tmp, FPCR_NZCV_MASK);
- fpscr = load_cpu_field(vfp.xregs[ARM_VFP_FPSCR]);
+ fpscr = load_cpu_field_low32(vfp.fpsr);
tcg_gen_andi_i32(fpscr, fpscr, ~FPCR_NZCV_MASK);
tcg_gen_or_i32(fpscr, fpscr, tmp);
- store_cpu_field(fpscr, vfp.xregs[ARM_VFP_FPSCR]);
+ store_cpu_field_low32(fpscr, vfp.fpsr);
break;
}
case ARM_VFP_FPCXT_NS:
@@ -465,7 +465,7 @@ static bool gen_M_fp_sysreg_read(DisasContext *s, int regno,
* Read just NZCV; this is a special case to avoid the
* helper call for the "VMRS to CPSR.NZCV" insn.
*/
- tmp = load_cpu_field(vfp.xregs[ARM_VFP_FPSCR]);
+ tmp = load_cpu_field_low32(vfp.fpsr);
tcg_gen_andi_i32(tmp, tmp, FPCR_NZCV_MASK);
storefn(s, opaque, tmp, true);
break;
diff --git a/target/arm/tcg/translate-vfp.c b/target/arm/tcg/translate-vfp.c
index 39ec971ff70..0d9788e8103 100644
--- a/target/arm/tcg/translate-vfp.c
+++ b/target/arm/tcg/translate-vfp.c
@@ -833,7 +833,7 @@ static bool trans_VMSR_VMRS(DisasContext *s, arg_VMSR_VMRS *a)
break;
case ARM_VFP_FPSCR:
if (a->rt == 15) {
- tmp = load_cpu_field(vfp.xregs[ARM_VFP_FPSCR]);
+ tmp = load_cpu_field_low32(vfp.fpsr);
tcg_gen_andi_i32(tmp, tmp, FPCR_NZCV_MASK);
} else {
tmp = tcg_temp_new_i32();
diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
index 38c8aadf9b4..f7a3da7f68b 100644
--- a/target/arm/vfp_helper.c
+++ b/target/arm/vfp_helper.c
@@ -115,7 +115,7 @@ static void vfp_set_fpsr_to_host(CPUARMState *env, uint32_t val)
static void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val)
{
- uint32_t changed = env->vfp.xregs[ARM_VFP_FPSCR];
+ uint64_t changed = env->vfp.fpcr;
changed ^= val;
if (changed & (3 << 22)) {
@@ -175,7 +175,7 @@ static void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val)
uint32_t vfp_get_fpcr(CPUARMState *env)
{
- uint32_t fpcr = (env->vfp.xregs[ARM_VFP_FPSCR] & FPCR_MASK)
+ uint32_t fpcr = env->vfp.fpcr
| (env->vfp.vec_len << 16)
| (env->vfp.vec_stride << 20);
@@ -190,7 +190,7 @@ uint32_t vfp_get_fpcr(CPUARMState *env)
uint32_t vfp_get_fpsr(CPUARMState *env)
{
- uint32_t fpsr = env->vfp.xregs[ARM_VFP_FPSCR] & FPSR_MASK;
+ uint32_t fpsr = env->vfp.fpsr;
uint32_t i;
fpsr |= vfp_get_fpsr_from_host(env);
@@ -229,15 +229,13 @@ void vfp_set_fpsr(CPUARMState *env, uint32_t val)
}
/*
- * The only FPSR bits we keep in vfp.xregs[FPSCR] are NZCV:
+ * The only FPSR bits we keep in vfp.fpsr are NZCV:
* the exception flags IOC|DZC|OFC|UFC|IXC|IDC are stored in
* fp_status, and QC is in vfp.qc[]. Store the NZCV bits there,
- * and zero any of the other FPSR bits (but preserve the FPCR
- * bits).
+ * and zero any of the other FPSR bits.
*/
val &= FPCR_NZCV_MASK;
- env->vfp.xregs[ARM_VFP_FPSCR] &= ~FPSR_MASK;
- env->vfp.xregs[ARM_VFP_FPSCR] |= val;
+ env->vfp.fpsr = val;
}
void vfp_set_fpcr(CPUARMState *env, uint32_t val)
@@ -271,14 +269,13 @@ void vfp_set_fpcr(CPUARMState *env, uint32_t val)
* We don't implement trapped exception handling, so the
* trap enable bits, IDE|IXE|UFE|OFE|DZE|IOE are all RAZ/WI (not RES0!)
*
- * The FPCR bits we keep in vfp.xregs[FPSCR] are AHP, DN, FZ, RMode
+ * The FPCR bits we keep in vfp.fpcr are AHP, DN, FZ, RMode
* and FZ16. Len, Stride and LTPSIZE we just handled. Store those bits
* there, and zero any of the other FPCR bits and the RES0 and RAZ/WI
* bits.
*/
val &= FPCR_AHP | FPCR_DN | FPCR_FZ | FPCR_RMODE_MASK | FPCR_FZ16;
- env->vfp.xregs[ARM_VFP_FPSCR] &= ~FPCR_MASK;
- env->vfp.xregs[ARM_VFP_FPSCR] |= val;
+ env->vfp.fpcr = val;
}
void HELPER(vfp_set_fpscr)(CPUARMState *env, uint32_t val)
@@ -356,8 +353,7 @@ static void softfloat_to_vfp_compare(CPUARMState *env, FloatRelation cmp)
default:
g_assert_not_reached();
}
- env->vfp.xregs[ARM_VFP_FPSCR] =
- deposit32(env->vfp.xregs[ARM_VFP_FPSCR], 28, 4, flags);
+ env->vfp.fpsr = deposit64(env->vfp.fpsr, 28, 4, flags); /* NZCV */
}
/* XXX: check quiet/signaling case */
@@ -1160,8 +1156,7 @@ uint32_t HELPER(vjcvt)(float64 value, CPUARMState *env)
uint32_t z = (pair >> 32) == 0;
/* Store Z, clear NCV, in FPSCR.NZCV. */
- env->vfp.xregs[ARM_VFP_FPSCR]
- = (env->vfp.xregs[ARM_VFP_FPSCR] & ~CPSR_NZCV) | (z * CPSR_Z);
+ env->vfp.fpsr = (env->vfp.fpsr & ~FPCR_NZCV_MASK) | (z * FPCR_Z);
return result;
}
--
2.34.1
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [PATCH 7/9] target/arm: Rename FPCR_ QC, NZCV macros to FPSR_
2024-06-28 14:23 [PATCH 0/9] target/arm: Refactor FPCR/FPSR handling to prepare for FEAT_AFP Peter Maydell
` (5 preceding siblings ...)
2024-06-28 14:23 ` [PATCH 6/9] target/arm: Store FPSR and FPCR in separate CPU state fields Peter Maydell
@ 2024-06-28 14:23 ` Peter Maydell
2024-06-28 16:07 ` Richard Henderson
2024-06-28 14:23 ` [PATCH 8/9] target/arm: Rename FPSR_MASK and FPCR_MASK and define them symbolically Peter Maydell
2024-06-28 14:23 ` [PATCH 9/9] target/arm: Allow FPCR bits that aren't in FPSCR Peter Maydell
8 siblings, 1 reply; 20+ messages in thread
From: Peter Maydell @ 2024-06-28 14:23 UTC (permalink / raw)
To: qemu-arm, qemu-devel
The QC, N, Z, C, V bits live in the FPSR, not the FPCR. Rename the
macros that define these bits accordingly.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
target/arm/cpu.h | 17 ++++++++++-------
target/arm/tcg/mve_helper.c | 8 ++++----
target/arm/tcg/translate-m-nocp.c | 16 ++++++++--------
target/arm/tcg/translate-vfp.c | 2 +-
target/arm/vfp_helper.c | 8 ++++----
5 files changed, 27 insertions(+), 24 deletions(-)
diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 2eb7fc3bc39..9d226c474d2 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -1697,6 +1697,7 @@ void vfp_set_fpscr(CPUARMState *env, uint32_t val);
#define FPSR_MASK 0xf800009f
#define FPCR_MASK 0x07ff9f00
+/* FPCR bits */
#define FPCR_IOE (1 << 8) /* Invalid Operation exception trap enable */
#define FPCR_DZE (1 << 9) /* Divide by Zero exception trap enable */
#define FPCR_OFE (1 << 10) /* Overflow exception trap enable */
@@ -1708,18 +1709,20 @@ void vfp_set_fpscr(CPUARMState *env, uint32_t val);
#define FPCR_FZ (1 << 24) /* Flush-to-zero enable bit */
#define FPCR_DN (1 << 25) /* Default NaN enable bit */
#define FPCR_AHP (1 << 26) /* Alternative half-precision */
-#define FPCR_QC (1 << 27) /* Cumulative saturation bit */
-#define FPCR_V (1 << 28) /* FP overflow flag */
-#define FPCR_C (1 << 29) /* FP carry flag */
-#define FPCR_Z (1 << 30) /* FP zero flag */
-#define FPCR_N (1 << 31) /* FP negative flag */
#define FPCR_LTPSIZE_SHIFT 16 /* LTPSIZE, M-profile only */
#define FPCR_LTPSIZE_MASK (7 << FPCR_LTPSIZE_SHIFT)
#define FPCR_LTPSIZE_LENGTH 3
-#define FPCR_NZCV_MASK (FPCR_N | FPCR_Z | FPCR_C | FPCR_V)
-#define FPCR_NZCVQC_MASK (FPCR_NZCV_MASK | FPCR_QC)
+/* FPSR bits */
+#define FPSR_QC (1 << 27) /* Cumulative saturation bit */
+#define FPSR_V (1 << 28) /* FP overflow flag */
+#define FPSR_C (1 << 29) /* FP carry flag */
+#define FPSR_Z (1 << 30) /* FP zero flag */
+#define FPSR_N (1 << 31) /* FP negative flag */
+
+#define FPSR_NZCV_MASK (FPSR_N | FPSR_Z | FPSR_C | FPSR_V)
+#define FPSR_NZCVQC_MASK (FPSR_NZCV_MASK | FPSR_QC)
/**
* vfp_get_fpsr: read the AArch64 FPSR
diff --git a/target/arm/tcg/mve_helper.c b/target/arm/tcg/mve_helper.c
index 234f395b093..03ebef5ef21 100644
--- a/target/arm/tcg/mve_helper.c
+++ b/target/arm/tcg/mve_helper.c
@@ -1115,21 +1115,21 @@ static void do_vadc(CPUARMState *env, uint32_t *d, uint32_t *n, uint32_t *m,
if (update_flags) {
/* Store C, clear NZV. */
- env->vfp.fpsr &= ~FPCR_NZCV_MASK;
- env->vfp.fpsr |= carry_in * FPCR_C;
+ env->vfp.fpsr &= ~FPSR_NZCV_MASK;
+ env->vfp.fpsr |= carry_in * FPSR_C;
}
mve_advance_vpt(env);
}
void HELPER(mve_vadc)(CPUARMState *env, void *vd, void *vn, void *vm)
{
- bool carry_in = env->vfp.fpsr & FPCR_C;
+ bool carry_in = env->vfp.fpsr & FPSR_C;
do_vadc(env, vd, vn, vm, 0, carry_in, false);
}
void HELPER(mve_vsbc)(CPUARMState *env, void *vd, void *vn, void *vm)
{
- bool carry_in = env->vfp.fpsr & FPCR_C;
+ bool carry_in = env->vfp.fpsr & FPSR_C;
do_vadc(env, vd, vn, vm, -1, carry_in, false);
}
diff --git a/target/arm/tcg/translate-m-nocp.c b/target/arm/tcg/translate-m-nocp.c
index 875f6a8725d..b92773b4af5 100644
--- a/target/arm/tcg/translate-m-nocp.c
+++ b/target/arm/tcg/translate-m-nocp.c
@@ -332,7 +332,7 @@ static bool gen_M_fp_sysreg_write(DisasContext *s, int regno,
if (dc_isar_feature(aa32_mve, s)) {
/* QC is only present for MVE; otherwise RES0 */
TCGv_i32 qc = tcg_temp_new_i32();
- tcg_gen_andi_i32(qc, tmp, FPCR_QC);
+ tcg_gen_andi_i32(qc, tmp, FPSR_QC);
/*
* The 4 vfp.qc[] fields need only be "zero" vs "non-zero";
* here writing the same value into all elements is simplest.
@@ -340,9 +340,9 @@ static bool gen_M_fp_sysreg_write(DisasContext *s, int regno,
tcg_gen_gvec_dup_i32(MO_32, offsetof(CPUARMState, vfp.qc),
16, 16, qc);
}
- tcg_gen_andi_i32(tmp, tmp, FPCR_NZCV_MASK);
+ tcg_gen_andi_i32(tmp, tmp, FPSR_NZCV_MASK);
fpscr = load_cpu_field_low32(vfp.fpsr);
- tcg_gen_andi_i32(fpscr, fpscr, ~FPCR_NZCV_MASK);
+ tcg_gen_andi_i32(fpscr, fpscr, ~FPSR_NZCV_MASK);
tcg_gen_or_i32(fpscr, fpscr, tmp);
store_cpu_field_low32(fpscr, vfp.fpsr);
break;
@@ -390,7 +390,7 @@ static bool gen_M_fp_sysreg_write(DisasContext *s, int regno,
tcg_gen_deposit_i32(control, control, sfpa,
R_V7M_CONTROL_SFPA_SHIFT, 1);
store_cpu_field(control, v7m.control[M_REG_S]);
- tcg_gen_andi_i32(tmp, tmp, ~FPCR_NZCV_MASK);
+ tcg_gen_andi_i32(tmp, tmp, ~FPSR_NZCV_MASK);
gen_helper_vfp_set_fpscr(tcg_env, tmp);
s->base.is_jmp = DISAS_UPDATE_NOCHAIN;
break;
@@ -457,7 +457,7 @@ static bool gen_M_fp_sysreg_read(DisasContext *s, int regno,
case ARM_VFP_FPSCR_NZCVQC:
tmp = tcg_temp_new_i32();
gen_helper_vfp_get_fpscr(tmp, tcg_env);
- tcg_gen_andi_i32(tmp, tmp, FPCR_NZCVQC_MASK);
+ tcg_gen_andi_i32(tmp, tmp, FPSR_NZCVQC_MASK);
storefn(s, opaque, tmp, true);
break;
case QEMU_VFP_FPSCR_NZCV:
@@ -466,7 +466,7 @@ static bool gen_M_fp_sysreg_read(DisasContext *s, int regno,
* helper call for the "VMRS to CPSR.NZCV" insn.
*/
tmp = load_cpu_field_low32(vfp.fpsr);
- tcg_gen_andi_i32(tmp, tmp, FPCR_NZCV_MASK);
+ tcg_gen_andi_i32(tmp, tmp, FPSR_NZCV_MASK);
storefn(s, opaque, tmp, true);
break;
case ARM_VFP_FPCXT_S:
@@ -476,7 +476,7 @@ static bool gen_M_fp_sysreg_read(DisasContext *s, int regno,
tmp = tcg_temp_new_i32();
sfpa = tcg_temp_new_i32();
gen_helper_vfp_get_fpscr(tmp, tcg_env);
- tcg_gen_andi_i32(tmp, tmp, ~FPCR_NZCV_MASK);
+ tcg_gen_andi_i32(tmp, tmp, ~FPSR_NZCV_MASK);
control = load_cpu_field(v7m.control[M_REG_S]);
tcg_gen_andi_i32(sfpa, control, R_V7M_CONTROL_SFPA_MASK);
tcg_gen_shli_i32(sfpa, sfpa, 31 - R_V7M_CONTROL_SFPA_SHIFT);
@@ -529,7 +529,7 @@ static bool gen_M_fp_sysreg_read(DisasContext *s, int regno,
sfpa = tcg_temp_new_i32();
fpscr = tcg_temp_new_i32();
gen_helper_vfp_get_fpscr(fpscr, tcg_env);
- tcg_gen_andi_i32(tmp, fpscr, ~FPCR_NZCV_MASK);
+ tcg_gen_andi_i32(tmp, fpscr, ~FPSR_NZCV_MASK);
control = load_cpu_field(v7m.control[M_REG_S]);
tcg_gen_andi_i32(sfpa, control, R_V7M_CONTROL_SFPA_MASK);
tcg_gen_shli_i32(sfpa, sfpa, 31 - R_V7M_CONTROL_SFPA_SHIFT);
diff --git a/target/arm/tcg/translate-vfp.c b/target/arm/tcg/translate-vfp.c
index 0d9788e8103..cd5b8483576 100644
--- a/target/arm/tcg/translate-vfp.c
+++ b/target/arm/tcg/translate-vfp.c
@@ -834,7 +834,7 @@ static bool trans_VMSR_VMRS(DisasContext *s, arg_VMSR_VMRS *a)
case ARM_VFP_FPSCR:
if (a->rt == 15) {
tmp = load_cpu_field_low32(vfp.fpsr);
- tcg_gen_andi_i32(tmp, tmp, FPCR_NZCV_MASK);
+ tcg_gen_andi_i32(tmp, tmp, FPSR_NZCV_MASK);
} else {
tmp = tcg_temp_new_i32();
gen_helper_vfp_get_fpscr(tmp, tcg_env);
diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
index f7a3da7f68b..1ac0142e10f 100644
--- a/target/arm/vfp_helper.c
+++ b/target/arm/vfp_helper.c
@@ -196,7 +196,7 @@ uint32_t vfp_get_fpsr(CPUARMState *env)
fpsr |= vfp_get_fpsr_from_host(env);
i = env->vfp.qc[0] | env->vfp.qc[1] | env->vfp.qc[2] | env->vfp.qc[3];
- fpsr |= i ? FPCR_QC : 0;
+ fpsr |= i ? FPSR_QC : 0;
return fpsr;
}
@@ -222,7 +222,7 @@ void vfp_set_fpsr(CPUARMState *env, uint32_t val)
* The bit we set within fpscr_q is arbitrary; the register as a
* whole being zero/non-zero is what counts.
*/
- env->vfp.qc[0] = val & FPCR_QC;
+ env->vfp.qc[0] = val & FPSR_QC;
env->vfp.qc[1] = 0;
env->vfp.qc[2] = 0;
env->vfp.qc[3] = 0;
@@ -234,7 +234,7 @@ void vfp_set_fpsr(CPUARMState *env, uint32_t val)
* fp_status, and QC is in vfp.qc[]. Store the NZCV bits there,
* and zero any of the other FPSR bits.
*/
- val &= FPCR_NZCV_MASK;
+ val &= FPSR_NZCV_MASK;
env->vfp.fpsr = val;
}
@@ -1156,7 +1156,7 @@ uint32_t HELPER(vjcvt)(float64 value, CPUARMState *env)
uint32_t z = (pair >> 32) == 0;
/* Store Z, clear NCV, in FPSCR.NZCV. */
- env->vfp.fpsr = (env->vfp.fpsr & ~FPCR_NZCV_MASK) | (z * FPCR_Z);
+ env->vfp.fpsr = (env->vfp.fpsr & ~FPSR_NZCV_MASK) | (z * FPSR_Z);
return result;
}
--
2.34.1
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [PATCH 8/9] target/arm: Rename FPSR_MASK and FPCR_MASK and define them symbolically
2024-06-28 14:23 [PATCH 0/9] target/arm: Refactor FPCR/FPSR handling to prepare for FEAT_AFP Peter Maydell
` (6 preceding siblings ...)
2024-06-28 14:23 ` [PATCH 7/9] target/arm: Rename FPCR_ QC, NZCV macros to FPSR_ Peter Maydell
@ 2024-06-28 14:23 ` Peter Maydell
2024-06-28 16:12 ` Richard Henderson
2024-06-28 14:23 ` [PATCH 9/9] target/arm: Allow FPCR bits that aren't in FPSCR Peter Maydell
8 siblings, 1 reply; 20+ messages in thread
From: Peter Maydell @ 2024-06-28 14:23 UTC (permalink / raw)
To: qemu-arm, qemu-devel
Now that we store FPSR and FPCR separately, the FPSR_MASK and
FPCR_MASK macros are slightly confusingly named and the comment
describing them is out of date. Rename them to FPSCR_FPSR_MASK and
FPSCR_FPCR_MASK, document that they are the mask of which FPSCR bits
are architecturally mapped to which AArch64 register, and define them
symbolically rather than as hex values. (This latter requires
defining some extra macros for bits which we haven't previously
defined.)
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
target/arm/cpu.h | 41 ++++++++++++++++++++++++++++++++++-------
target/arm/machine.c | 3 ++-
target/arm/vfp_helper.c | 7 ++++---
3 files changed, 40 insertions(+), 11 deletions(-)
diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 9d226c474d2..f6339bde216 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -1687,15 +1687,19 @@ static inline void xpsr_write(CPUARMState *env, uint32_t val, uint32_t mask)
uint32_t vfp_get_fpscr(CPUARMState *env);
void vfp_set_fpscr(CPUARMState *env, uint32_t val);
-/* FPCR, Floating Point Control Register
- * FPSR, Floating Poiht Status Register
+/*
+ * FPCR, Floating Point Control Register
+ * FPSR, Floating Point Status Register
*
- * For A64 the FPSCR is split into two logically distinct registers,
- * FPCR and FPSR. However since they still use non-overlapping bits
- * we store the underlying state in fpscr and just mask on read/write.
+ * For A64 floating point control and status bits are stored in
+ * two logically distinct registers, FPCR and FPSR. We store these
+ * in QEMU in vfp.fpcr and vfp.fpsr.
+ * For A32 there was only one register, FPSCR. The bits are arranged
+ * such that FPSCR bits map to FPCR or FPSR bits in the same bit positions,
+ * so we can use appropriate masking to handle FPSCR reads and writes.
+ * Note that the FPCR has some bits which are not visible in the
+ * AArch32 view (for FEAT_AFP). Writing the FPSCR leaves these unchanged.
*/
-#define FPSR_MASK 0xf800009f
-#define FPCR_MASK 0x07ff9f00
/* FPCR bits */
#define FPCR_IOE (1 << 8) /* Invalid Operation exception trap enable */
@@ -1704,7 +1708,9 @@ void vfp_set_fpscr(CPUARMState *env, uint32_t val);
#define FPCR_UFE (1 << 11) /* Underflow exception trap enable */
#define FPCR_IXE (1 << 12) /* Inexact exception trap enable */
#define FPCR_IDE (1 << 15) /* Input Denormal exception trap enable */
+#define FPCR_LEN_MASK (7 << 16) /* LEN, A-profile only */
#define FPCR_FZ16 (1 << 19) /* ARMv8.2+, FP16 flush-to-zero */
+#define FPCR_STRIDE_MASK (3 << 20) /* Stride */
#define FPCR_RMODE_MASK (3 << 22) /* Rounding mode */
#define FPCR_FZ (1 << 24) /* Flush-to-zero enable bit */
#define FPCR_DN (1 << 25) /* Default NaN enable bit */
@@ -1714,16 +1720,37 @@ void vfp_set_fpscr(CPUARMState *env, uint32_t val);
#define FPCR_LTPSIZE_MASK (7 << FPCR_LTPSIZE_SHIFT)
#define FPCR_LTPSIZE_LENGTH 3
+/* Cumulative exception trap enable bits */
+#define FPCR_EEXC_MASK (FPCR_IOE | FPCR_DZE | FPCR_OFE | FPCR_UFE | FPCR_IXE | FPCR_IDE)
+
/* FPSR bits */
+#define FPSR_IOC (1 << 0) /* Invalid Operation cumulative exception */
+#define FPSR_DZC (1 << 1) /* Divide by Zero cumulative exception */
+#define FPSR_OFC (1 << 2) /* Overflow cumulative exception */
+#define FPSR_UFC (1 << 3) /* Underflow cumulative exception */
+#define FPSR_IXC (1 << 4) /* Inexact cumulative exception */
+#define FPSR_IDC (1 << 7) /* Input Denormal cumulative exception */
#define FPSR_QC (1 << 27) /* Cumulative saturation bit */
#define FPSR_V (1 << 28) /* FP overflow flag */
#define FPSR_C (1 << 29) /* FP carry flag */
#define FPSR_Z (1 << 30) /* FP zero flag */
#define FPSR_N (1 << 31) /* FP negative flag */
+/* Cumulative exception status bits */
+#define FPSR_CEXC_MASK (FPSR_IOC | FPSR_DZC | FPSR_OFC | FPSR_UFC | FPSR_IXC | FPSR_IDC)
+
#define FPSR_NZCV_MASK (FPSR_N | FPSR_Z | FPSR_C | FPSR_V)
#define FPSR_NZCVQC_MASK (FPSR_NZCV_MASK | FPSR_QC)
+/* A32 FPSCR bits which architecturally map to FPSR bits */
+#define FPSCR_FPSR_MASK (FPSR_NZCVQC_MASK | FPSR_CEXC_MASK)
+/* A32 FPSCR bits which architecturally map to FPCR bits */
+#define FPSCR_FPCR_MASK (FPCR_EEXC_MASK | FPCR_LEN_MASK | FPCR_FZ16 | \
+ FPCR_STRIDE_MASK | FPCR_RMODE_MASK | \
+ FPCR_FZ | FPCR_DN | FPCR_AHP)
+/* These masks don't overlap: each bit lives in only one place */
+QEMU_BUILD_BUG_ON(FPSCR_FPSR_MASK & FPSCR_FPCR_MASK);
+
/**
* vfp_get_fpsr: read the AArch64 FPSR
* @env: CPU context
diff --git a/target/arm/machine.c b/target/arm/machine.c
index 8c820955d95..a3c1e05e65d 100644
--- a/target/arm/machine.c
+++ b/target/arm/machine.c
@@ -43,7 +43,8 @@ static bool vfp_fpcr_fpsr_needed(void *opaque)
ARMCPU *cpu = opaque;
CPUARMState *env = &cpu->env;
- return (vfp_get_fpcr(env) & ~FPCR_MASK) || (vfp_get_fpsr(env) & ~FPSR_MASK);
+ return (vfp_get_fpcr(env) & ~FPSCR_FPCR_MASK) ||
+ (vfp_get_fpsr(env) & ~FPSCR_FPSR_MASK);
}
static int get_fpscr(QEMUFile *f, void *opaque, size_t size,
diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
index 1ac0142e10f..586c33e9460 100644
--- a/target/arm/vfp_helper.c
+++ b/target/arm/vfp_helper.c
@@ -202,7 +202,8 @@ uint32_t vfp_get_fpsr(CPUARMState *env)
uint32_t HELPER(vfp_get_fpscr)(CPUARMState *env)
{
- return (vfp_get_fpcr(env) & FPCR_MASK) | (vfp_get_fpsr(env) & FPSR_MASK);
+ return (vfp_get_fpcr(env) & FPSCR_FPCR_MASK) |
+ (vfp_get_fpsr(env) & FPSCR_FPSR_MASK);
}
uint32_t vfp_get_fpscr(CPUARMState *env)
@@ -280,8 +281,8 @@ void vfp_set_fpcr(CPUARMState *env, uint32_t val)
void HELPER(vfp_set_fpscr)(CPUARMState *env, uint32_t val)
{
- vfp_set_fpcr(env, val & FPCR_MASK);
- vfp_set_fpsr(env, val & FPSR_MASK);
+ vfp_set_fpcr(env, val & FPSCR_FPCR_MASK);
+ vfp_set_fpsr(env, val & FPSCR_FPSR_MASK);
}
void vfp_set_fpscr(CPUARMState *env, uint32_t val)
--
2.34.1
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [PATCH 9/9] target/arm: Allow FPCR bits that aren't in FPSCR
2024-06-28 14:23 [PATCH 0/9] target/arm: Refactor FPCR/FPSR handling to prepare for FEAT_AFP Peter Maydell
` (7 preceding siblings ...)
2024-06-28 14:23 ` [PATCH 8/9] target/arm: Rename FPSR_MASK and FPCR_MASK and define them symbolically Peter Maydell
@ 2024-06-28 14:23 ` Peter Maydell
2024-06-28 16:14 ` Richard Henderson
8 siblings, 1 reply; 20+ messages in thread
From: Peter Maydell @ 2024-06-28 14:23 UTC (permalink / raw)
To: qemu-arm, qemu-devel
In order to allow FPCR bits that aren't in the FPSCR (like the new
bits that are defined for FEAT_AFP), we need to make sure that writes
to the FPSCR only write to the bits of FPCR that are architecturally
mapped, and not the others.
Implement this with a new function vfp_set_fpcr_masked() which
takes a mask of which bits to update.
(We could do the same for FPSR, but we leave that until we actually
are likely to need it.)
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
target/arm/vfp_helper.c | 54 ++++++++++++++++++++++++++---------------
1 file changed, 34 insertions(+), 20 deletions(-)
diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
index 586c33e9460..9406e32f3da 100644
--- a/target/arm/vfp_helper.c
+++ b/target/arm/vfp_helper.c
@@ -113,11 +113,12 @@ static void vfp_set_fpsr_to_host(CPUARMState *env, uint32_t val)
set_float_exception_flags(0, &env->vfp.standard_fp_status_f16);
}
-static void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val)
+static void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val, uint32_t mask)
{
uint64_t changed = env->vfp.fpcr;
changed ^= val;
+ changed &= mask;
if (changed & (3 << 22)) {
int i = (val >> 22) & 3;
switch (i) {
@@ -167,7 +168,7 @@ static void vfp_set_fpsr_to_host(CPUARMState *env, uint32_t val)
{
}
-static void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val)
+static void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val, uint32_t mask)
{
}
@@ -239,8 +240,13 @@ void vfp_set_fpsr(CPUARMState *env, uint32_t val)
env->vfp.fpsr = val;
}
-void vfp_set_fpcr(CPUARMState *env, uint32_t val)
+static void vfp_set_fpcr_masked(CPUARMState *env, uint32_t val, uint32_t mask)
{
+ /*
+ * We only set FPCR bits defined by mask, and leave the others alone.
+ * We assume the mask is sensible (e.g. doesn't try to set only
+ * part of a field)
+ */
ARMCPU *cpu = env_archcpu(env);
/* When ARMv8.2-FP16 is not supported, FZ16 is RES0. */
@@ -248,22 +254,24 @@ void vfp_set_fpcr(CPUARMState *env, uint32_t val)
val &= ~FPCR_FZ16;
}
- vfp_set_fpcr_to_host(env, val);
+ vfp_set_fpcr_to_host(env, val, mask);
- if (!arm_feature(env, ARM_FEATURE_M)) {
- /*
- * Short-vector length and stride; on M-profile these bits
- * are used for different purposes.
- * We can't make this conditional be "if MVFR0.FPShVec != 0",
- * because in v7A no-short-vector-support cores still had to
- * allow Stride/Len to be written with the only effect that
- * some insns are required to UNDEF if the guest sets them.
- */
- env->vfp.vec_len = extract32(val, 16, 3);
- env->vfp.vec_stride = extract32(val, 20, 2);
- } else if (cpu_isar_feature(aa32_mve, cpu)) {
- env->v7m.ltpsize = extract32(val, FPCR_LTPSIZE_SHIFT,
- FPCR_LTPSIZE_LENGTH);
+ if (mask & (FPCR_LEN_MASK | FPCR_STRIDE_MASK)) {
+ if (!arm_feature(env, ARM_FEATURE_M)) {
+ /*
+ * Short-vector length and stride; on M-profile these bits
+ * are used for different purposes.
+ * We can't make this conditional be "if MVFR0.FPShVec != 0",
+ * because in v7A no-short-vector-support cores still had to
+ * allow Stride/Len to be written with the only effect that
+ * some insns are required to UNDEF if the guest sets them.
+ */
+ env->vfp.vec_len = extract32(val, 16, 3);
+ env->vfp.vec_stride = extract32(val, 20, 2);
+ } else if (cpu_isar_feature(aa32_mve, cpu)) {
+ env->v7m.ltpsize = extract32(val, FPCR_LTPSIZE_SHIFT,
+ FPCR_LTPSIZE_LENGTH);
+ }
}
/*
@@ -276,12 +284,18 @@ void vfp_set_fpcr(CPUARMState *env, uint32_t val)
* bits.
*/
val &= FPCR_AHP | FPCR_DN | FPCR_FZ | FPCR_RMODE_MASK | FPCR_FZ16;
- env->vfp.fpcr = val;
+ env->vfp.fpcr &= ~mask;
+ env->vfp.fpcr |= val;
+}
+
+void vfp_set_fpcr(CPUARMState *env, uint32_t val)
+{
+ vfp_set_fpcr_masked(env, val, MAKE_64BIT_MASK(0, 32));
}
void HELPER(vfp_set_fpscr)(CPUARMState *env, uint32_t val)
{
- vfp_set_fpcr(env, val & FPSCR_FPCR_MASK);
+ vfp_set_fpcr_masked(env, val, FPSCR_FPCR_MASK);
vfp_set_fpsr(env, val & FPSCR_FPSR_MASK);
}
--
2.34.1
^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [PATCH 1/9] target/arm: Correct comments about M-profile FPSCR
2024-06-28 14:23 ` [PATCH 1/9] target/arm: Correct comments about M-profile FPSCR Peter Maydell
@ 2024-06-28 14:57 ` Richard Henderson
0 siblings, 0 replies; 20+ messages in thread
From: Richard Henderson @ 2024-06-28 14:57 UTC (permalink / raw)
To: Peter Maydell, qemu-arm, qemu-devel
On 6/28/24 07:23, Peter Maydell wrote:
> The M-profile FPSCR LTPSIZE is bits [18:16]; this is the same
> field as A-profile FPSCR Len, not Stride. Correct the comment
> in vfp_get_fpscr().
>
> We also implemented M-profile FPSCR.QC, but forgot to delete
> a TODO comment from vfp_set_fpscr(); remove it now.
>
> Signed-off-by: Peter Maydell<peter.maydell@linaro.org>
> ---
> target/arm/vfp_helper.c | 5 ++---
> 1 file changed, 2 insertions(+), 3 deletions(-)
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
r~
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 2/9] target/arm: Make vfp_get_fpscr() call vfp_get_{fpcr, fpsr}
2024-06-28 14:23 ` [PATCH 2/9] target/arm: Make vfp_get_fpscr() call vfp_get_{fpcr, fpsr} Peter Maydell
@ 2024-06-28 15:37 ` Richard Henderson
0 siblings, 0 replies; 20+ messages in thread
From: Richard Henderson @ 2024-06-28 15:37 UTC (permalink / raw)
To: Peter Maydell, qemu-arm, qemu-devel
On 6/28/24 07:23, Peter Maydell wrote:
> In AArch32, the floating point control and status bits are all in a
> single register, FPSCR. In AArch64, these were split into separate
> FPCR and FPSR registers, but the bit layouts remained the same, with
> no overlaps, so that you could construct an FPSCR value by ORing FPCR
> and FPSR, or equivalently could produce FPSR and FPCR by masking an
> FPSCR value. For QEMU's implementation, we opted to use masking to
> produce FPSR and FPCR, because we started with an AArch32
> implementation of FPSCR.
>
> The addition of the (AArch64-only) FEAT_AFP adds new bits to the FPCR
> which overlap with some bits in the FPSR. This means we'll no longer
> be able to consider the FPSCR-encoded value as the primary one, but
> instead need to treat FPSR/FPCR as the primary encoding and construct
> the FPSCR from those. (This remains possible because the FEAT_AFP
> bits in FPCR don't appear in the FPSCR.)
>
> As the first step in this refactoring, make vfp_get_fpscr() call
> vfp_get_fpcr() and vfp_get_fpsr(), instead of the other way around.
>
> Note that vfp_get_fpcsr_from_host() returns only bits in the FPSR
> (for the cumulative fp exception bits), so we can simply rename
> it without needing to add a new function for getting FPCR bits.
>
> Signed-off-by: Peter Maydell<peter.maydell@linaro.org>
> ---
> target/arm/cpu.h | 24 +++++++++++++++---------
> target/arm/vfp_helper.c | 34 ++++++++++++++++++++++------------
> 2 files changed, 37 insertions(+), 21 deletions(-)
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
r~
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 3/9] target/arm: Make vfp_set_fpscr() call vfp_set_{fpcr, fpsr}
2024-06-28 14:23 ` [PATCH 3/9] target/arm: Make vfp_set_fpscr() call vfp_set_{fpcr, fpsr} Peter Maydell
@ 2024-06-28 15:50 ` Richard Henderson
0 siblings, 0 replies; 20+ messages in thread
From: Richard Henderson @ 2024-06-28 15:50 UTC (permalink / raw)
To: Peter Maydell, qemu-arm, qemu-devel
On 6/28/24 07:23, Peter Maydell wrote:
> +void vfp_set_fpsr(CPUARMState *env, uint32_t val)
> +{
> + ARMCPU *cpu = env_archcpu(env);
> +
> + vfp_set_fpsr_to_host(env, val);
> +
> + if (arm_feature(env, ARM_FEATURE_NEON) ||
> + cpu_isar_feature(aa32_mve, cpu)) {
> + /*
> + * The bit we set within fpscr_q is arbitrary; the register as a
> + * whole being zero/non-zero is what counts.
> + */
> + env->vfp.qc[0] = val & FPCR_QC;
While it's code movement, the comment is out of date.
Update s/fpscr_q/vfp.qc[]/, possibly as a follow-up.
Otherwise,
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
r~
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 4/9] target/arm: Support migration when FPSR/FPCR won't fit in the FPSCR
2024-06-28 14:23 ` [PATCH 4/9] target/arm: Support migration when FPSR/FPCR won't fit in the FPSCR Peter Maydell
@ 2024-06-28 16:01 ` Richard Henderson
2024-06-28 16:27 ` Peter Maydell
0 siblings, 1 reply; 20+ messages in thread
From: Richard Henderson @ 2024-06-28 16:01 UTC (permalink / raw)
To: Peter Maydell, qemu-arm, qemu-devel
On 6/28/24 07:23, Peter Maydell wrote:
> To support FPSR and FPCR bits that don't exist in the AArch32 FPSCR
> view of floating point control and status (such as the FEAT_AFP ones),
> we need to make sure those bits can be migrated. This commit allows
> that, whilst maintaining backwards and forwards migration compatibility
> for CPUs where there are no such bits:
>
> On sending:
> * If either the FPCR or the FPSR include set bits that are not
> visible in the AArch32 FPSCR view of floating point control/status
> then we send the FPCR and FPSR as two separate fields in a new
> cpu/vfp/fpcr_fpsr subsection, and we send a 0 for the old
> FPSCR field in cpu/vfp
> * Otherwise, we don't send the fpcr_fpsr subsection, and we send
> an FPSCR-format value in cpu/vfp as we did previously
>
> On receiving:
> * if we see a non-zero FPSCR field, that is the right information
> * if we see a fpcr_fpsr subsection then that has the information
> * if we see neither, then FPSCR/FPCR/FPSR are all zero on the source;
> cpu_pre_load() ensures the CPU state defaults to that
> * if we see both, then the migration source is buggy or malicious;
> either the fpcr_fpsr or the FPSCR will "win" depending which
> is first in the migration stream; we don't care which that is
>
> We make the new FPCR and FPSR on-the-wire data be 64 bits, because
> architecturally these registers are that wide, and this avoids the
> need to engage in further migration-compatibility contortions in
> future if some new architecture revision defines bits in the high
> half of either register.
>
> (We won't ever send the new migration subsection until we add support
> for a CPU feature which enables setting overlapping FPCR bits, like
> FEAT_AFP.)
>
> Signed-off-by: Peter Maydell<peter.maydell@linaro.org>
> ---
> target/arm/machine.c | 134 ++++++++++++++++++++++++++++++++++++++++++-
> 1 file changed, 132 insertions(+), 2 deletions(-)
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Not ideal, as vfp_get_{fpcr,fpsr} are called 3 or 4 times during migration. But unless we
have separate 'fp*r_migrate' fields in cpu state, initialized in pre_save, there's no
getting around it. And I suppose migration isn't exactly performance critical.
r~
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 5/9] target/arm: Implement store_cpu_field_low32() macro
2024-06-28 14:23 ` [PATCH 5/9] target/arm: Implement store_cpu_field_low32() macro Peter Maydell
@ 2024-06-28 16:02 ` Richard Henderson
0 siblings, 0 replies; 20+ messages in thread
From: Richard Henderson @ 2024-06-28 16:02 UTC (permalink / raw)
To: Peter Maydell, qemu-arm, qemu-devel
On 6/28/24 07:23, Peter Maydell wrote:
> We already have a load_cpu_field_low32() to load the low half of a
> 64-bit CPU struct field to a TCGv_i32; however we haven't yet needed
> the store equivalent. We'll want that in the next patch, so
> implement it.
>
> Signed-off-by: Peter Maydell<peter.maydell@linaro.org>
> ---
> target/arm/tcg/translate-a32.h | 7 +++++++
> 1 file changed, 7 insertions(+)
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
r~
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 6/9] target/arm: Store FPSR and FPCR in separate CPU state fields
2024-06-28 14:23 ` [PATCH 6/9] target/arm: Store FPSR and FPCR in separate CPU state fields Peter Maydell
@ 2024-06-28 16:06 ` Richard Henderson
0 siblings, 0 replies; 20+ messages in thread
From: Richard Henderson @ 2024-06-28 16:06 UTC (permalink / raw)
To: Peter Maydell, qemu-arm, qemu-devel
On 6/28/24 07:23, Peter Maydell wrote:
> Now that we have refactored the set/get functions so that the FPSCR
> format is no longer the authoritative one, we can keep FPSR and FPCR
> in separate CPU state fields.
>
> As well as the get and set functions, we also have a scattering of
> places in the code which directly access vfp.xregs[ARM_VFP_FPSCR] to
> extract single fields which are stored there. These all change to
> directly access either vfp.fpsr or vfp.fpcr, depending on the
> location of the field. (Most commonly, this is the NZCV flags.)
>
> We make the field in the CPU state struct 64 bits, because
> architecturally FPSR and FPCR are 64 bits. However we leave the
> types of the arguments and return values of the get/set functions as
> 32 bits, since we don't need to make that change with the current
> architecture and various callsites would be unable to handle
> set bits in the high half (for instance the gdbstub protocol
> assumes they're only 32 bit registers).
>
> Signed-off-by: Peter Maydell<peter.maydell@linaro.org>
> ---
> target/arm/cpu.h | 7 +++++++
> target/arm/tcg/translate.h | 3 +--
> target/arm/tcg/mve_helper.c | 12 ++++++------
> target/arm/tcg/translate-m-nocp.c | 6 +++---
> target/arm/tcg/translate-vfp.c | 2 +-
> target/arm/vfp_helper.c | 25 ++++++++++---------------
> 6 files changed, 28 insertions(+), 27 deletions(-)
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
r~
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 7/9] target/arm: Rename FPCR_ QC, NZCV macros to FPSR_
2024-06-28 14:23 ` [PATCH 7/9] target/arm: Rename FPCR_ QC, NZCV macros to FPSR_ Peter Maydell
@ 2024-06-28 16:07 ` Richard Henderson
0 siblings, 0 replies; 20+ messages in thread
From: Richard Henderson @ 2024-06-28 16:07 UTC (permalink / raw)
To: Peter Maydell, qemu-arm, qemu-devel
On 6/28/24 07:23, Peter Maydell wrote:
> The QC, N, Z, C, V bits live in the FPSR, not the FPCR. Rename the
> macros that define these bits accordingly.
>
> Signed-off-by: Peter Maydell<peter.maydell@linaro.org>
> ---
> target/arm/cpu.h | 17 ++++++++++-------
> target/arm/tcg/mve_helper.c | 8 ++++----
> target/arm/tcg/translate-m-nocp.c | 16 ++++++++--------
> target/arm/tcg/translate-vfp.c | 2 +-
> target/arm/vfp_helper.c | 8 ++++----
> 5 files changed, 27 insertions(+), 24 deletions(-)
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
r~
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 8/9] target/arm: Rename FPSR_MASK and FPCR_MASK and define them symbolically
2024-06-28 14:23 ` [PATCH 8/9] target/arm: Rename FPSR_MASK and FPCR_MASK and define them symbolically Peter Maydell
@ 2024-06-28 16:12 ` Richard Henderson
0 siblings, 0 replies; 20+ messages in thread
From: Richard Henderson @ 2024-06-28 16:12 UTC (permalink / raw)
To: Peter Maydell, qemu-arm, qemu-devel
On 6/28/24 07:23, Peter Maydell wrote:
> Now that we store FPSR and FPCR separately, the FPSR_MASK and
> FPCR_MASK macros are slightly confusingly named and the comment
> describing them is out of date. Rename them to FPSCR_FPSR_MASK and
> FPSCR_FPCR_MASK, document that they are the mask of which FPSCR bits
> are architecturally mapped to which AArch64 register, and define them
> symbolically rather than as hex values. (This latter requires
> defining some extra macros for bits which we haven't previously
> defined.)
>
> Signed-off-by: Peter Maydell<peter.maydell@linaro.org>
> ---
> target/arm/cpu.h | 41 ++++++++++++++++++++++++++++++++++-------
> target/arm/machine.c | 3 ++-
> target/arm/vfp_helper.c | 7 ++++---
> 3 files changed, 40 insertions(+), 11 deletions(-)
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
r~
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 9/9] target/arm: Allow FPCR bits that aren't in FPSCR
2024-06-28 14:23 ` [PATCH 9/9] target/arm: Allow FPCR bits that aren't in FPSCR Peter Maydell
@ 2024-06-28 16:14 ` Richard Henderson
0 siblings, 0 replies; 20+ messages in thread
From: Richard Henderson @ 2024-06-28 16:14 UTC (permalink / raw)
To: Peter Maydell, qemu-arm, qemu-devel
On 6/28/24 07:23, Peter Maydell wrote:
> In order to allow FPCR bits that aren't in the FPSCR (like the new
> bits that are defined for FEAT_AFP), we need to make sure that writes
> to the FPSCR only write to the bits of FPCR that are architecturally
> mapped, and not the others.
>
> Implement this with a new function vfp_set_fpcr_masked() which
> takes a mask of which bits to update.
>
> (We could do the same for FPSR, but we leave that until we actually
> are likely to need it.)
>
> Signed-off-by: Peter Maydell<peter.maydell@linaro.org>
> ---
> target/arm/vfp_helper.c | 54 ++++++++++++++++++++++++++---------------
> 1 file changed, 34 insertions(+), 20 deletions(-)
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
r~
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 4/9] target/arm: Support migration when FPSR/FPCR won't fit in the FPSCR
2024-06-28 16:01 ` Richard Henderson
@ 2024-06-28 16:27 ` Peter Maydell
0 siblings, 0 replies; 20+ messages in thread
From: Peter Maydell @ 2024-06-28 16:27 UTC (permalink / raw)
To: Richard Henderson; +Cc: qemu-arm, qemu-devel
On Fri, 28 Jun 2024 at 17:01, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> On 6/28/24 07:23, Peter Maydell wrote:
> > To support FPSR and FPCR bits that don't exist in the AArch32 FPSCR
> > view of floating point control and status (such as the FEAT_AFP ones),
> > we need to make sure those bits can be migrated. This commit allows
> > that, whilst maintaining backwards and forwards migration compatibility
> > for CPUs where there are no such bits:
> >
> > On sending:
> > * If either the FPCR or the FPSR include set bits that are not
> > visible in the AArch32 FPSCR view of floating point control/status
> > then we send the FPCR and FPSR as two separate fields in a new
> > cpu/vfp/fpcr_fpsr subsection, and we send a 0 for the old
> > FPSCR field in cpu/vfp
> > * Otherwise, we don't send the fpcr_fpsr subsection, and we send
> > an FPSCR-format value in cpu/vfp as we did previously
> >
> > On receiving:
> > * if we see a non-zero FPSCR field, that is the right information
> > * if we see a fpcr_fpsr subsection then that has the information
> > * if we see neither, then FPSCR/FPCR/FPSR are all zero on the source;
> > cpu_pre_load() ensures the CPU state defaults to that
> > * if we see both, then the migration source is buggy or malicious;
> > either the fpcr_fpsr or the FPSCR will "win" depending which
> > is first in the migration stream; we don't care which that is
> >
> > We make the new FPCR and FPSR on-the-wire data be 64 bits, because
> > architecturally these registers are that wide, and this avoids the
> > need to engage in further migration-compatibility contortions in
> > future if some new architecture revision defines bits in the high
> > half of either register.
> >
> > (We won't ever send the new migration subsection until we add support
> > for a CPU feature which enables setting overlapping FPCR bits, like
> > FEAT_AFP.)
> >
> > Signed-off-by: Peter Maydell<peter.maydell@linaro.org>
> > ---
> > target/arm/machine.c | 134 ++++++++++++++++++++++++++++++++++++++++++-
> > 1 file changed, 132 insertions(+), 2 deletions(-)
>
> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
>
> Not ideal, as vfp_get_{fpcr,fpsr} are called 3 or 4 times during migration. But unless we
> have separate 'fp*r_migrate' fields in cpu state, initialized in pre_save, there's no
> getting around it. And I suppose migration isn't exactly performance critical.
Yeah, we could have done it that way, but I am assuming that
the time taken for this is pretty miniscule in the general
scheme of how long migration takes, so I preferred the
way that doesn't clutter up the CPU state struct with
migration-only fields.
If somebody cares about migration downtime performance (which
does actually matter for some workload/use cases AIUI) they
can do some benchmarking and tell us what the actually
slow parts are :-)
thanks
-- PMM
^ permalink raw reply [flat|nested] 20+ messages in thread
end of thread, other threads:[~2024-06-28 16:28 UTC | newest]
Thread overview: 20+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-06-28 14:23 [PATCH 0/9] target/arm: Refactor FPCR/FPSR handling to prepare for FEAT_AFP Peter Maydell
2024-06-28 14:23 ` [PATCH 1/9] target/arm: Correct comments about M-profile FPSCR Peter Maydell
2024-06-28 14:57 ` Richard Henderson
2024-06-28 14:23 ` [PATCH 2/9] target/arm: Make vfp_get_fpscr() call vfp_get_{fpcr, fpsr} Peter Maydell
2024-06-28 15:37 ` Richard Henderson
2024-06-28 14:23 ` [PATCH 3/9] target/arm: Make vfp_set_fpscr() call vfp_set_{fpcr, fpsr} Peter Maydell
2024-06-28 15:50 ` Richard Henderson
2024-06-28 14:23 ` [PATCH 4/9] target/arm: Support migration when FPSR/FPCR won't fit in the FPSCR Peter Maydell
2024-06-28 16:01 ` Richard Henderson
2024-06-28 16:27 ` Peter Maydell
2024-06-28 14:23 ` [PATCH 5/9] target/arm: Implement store_cpu_field_low32() macro Peter Maydell
2024-06-28 16:02 ` Richard Henderson
2024-06-28 14:23 ` [PATCH 6/9] target/arm: Store FPSR and FPCR in separate CPU state fields Peter Maydell
2024-06-28 16:06 ` Richard Henderson
2024-06-28 14:23 ` [PATCH 7/9] target/arm: Rename FPCR_ QC, NZCV macros to FPSR_ Peter Maydell
2024-06-28 16:07 ` Richard Henderson
2024-06-28 14:23 ` [PATCH 8/9] target/arm: Rename FPSR_MASK and FPCR_MASK and define them symbolically Peter Maydell
2024-06-28 16:12 ` Richard Henderson
2024-06-28 14:23 ` [PATCH 9/9] target/arm: Allow FPCR bits that aren't in FPSCR Peter Maydell
2024-06-28 16:14 ` Richard Henderson
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).