From: Peter Maydell <peter.maydell@linaro.org>
To: qemu-arm@nongnu.org, qemu-devel@nongnu.org
Cc: Richard Henderson <richard.henderson@linaro.org>
Subject: [Qemu-devel] [PATCH v2 16/42] target/arm: Convert the VFP load/store multiple insns to decodetree
Date: Tue, 11 Jun 2019 11:53:25 +0100
Message-ID: <20190611105351.9871-17-peter.maydell@linaro.org>
In-Reply-To: <20190611105351.9871-1-peter.maydell@linaro.org>
Convert the VFP load/store multiple insns to decodetree.
This includes tightening up the UNDEF checking for pre-VFPv3
CPUs which only have D0-D15: they now UNDEF for any access
to D16-D31, not merely when the lowest-numbered register in the
transfer list is in D16-D31.
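
To illustrate the tightened check (a minimal standalone sketch, not
the patch code itself; 'have_d32', 'vd' and 'n' are placeholder names
for the feature flag, first register and register count):

    #include <stdbool.h>

    /*
     * Sketch: the whole transfer UNDEFs if it would touch any register
     * above D15 on a CPU that does not implement D16-D31, rather than
     * only when the lowest-numbered register in the list is above D15.
     */
    static bool dp_transfer_permitted(bool have_d32, int vd, int n)
    {
        if (!have_d32 && vd + n > 16) {
            return false;   /* UNDEF the whole transfer */
        }
        return true;
    }
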
This conversion does not try to share code between the
single-precision and double-precision versions; this results in
some duplicated code, but it leaves the door open for a future
refactoring which gets rid of the use of the "F0" registers
by inlining the various functions like gen_vfp_ld() and
gen_mov_F0_reg(), which currently hide "if (dp) { ... } else { ... }"
conditionalisation.
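
As a rough sketch of the conditionalisation being described (again not
QEMU code; load_f32()/load_f64() are hypothetical stand-ins for the
precision-specific loads):

    #include <stdbool.h>
    #include <stdint.h>

    /* Stand-ins: the real helpers emit TCG loads into the F0 scratch regs. */
    static void load_f32(uint32_t addr) { (void)addr; /* 32-bit load into F0s */ }
    static void load_f64(uint32_t addr) { (void)addr; /* 64-bit load into F0d */ }

    /* The hidden dp/sp switch that the future refactoring would inline away. */
    static void sketch_vfp_ld(bool dp, uint32_t addr)
    {
        if (dp) {
            load_f64(addr);
        } else {
            load_f32(addr);
        }
    }
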
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-vfp.inc.c | 162 +++++++++++++++++++++++++++++++++
 target/arm/translate.c         |  97 +-------------------
 target/arm/vfp.decode          |  18 ++++
 3 files changed, 183 insertions(+), 94 deletions(-)
diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
index 40f2cac3e2e..32a1805e582 100644
--- a/target/arm/translate-vfp.inc.c
+++ b/target/arm/translate-vfp.inc.c
@@ -926,3 +926,165 @@ static bool trans_VLDR_VSTR_dp(DisasContext *s, arg_VLDR_VSTR_sp *a)
return true;
}
+
+static bool trans_VLDM_VSTM_sp(DisasContext *s, arg_VLDM_VSTM_sp *a)
+{
+ uint32_t offset;
+ TCGv_i32 addr;
+ int i, n;
+
+ n = a->imm;
+
+ if (n == 0 || (a->vd + n) > 32) {
+ /*
+ * UNPREDICTABLE cases for bad immediates: we choose to
+ * UNDEF to avoid generating huge numbers of TCG ops
+ */
+ return false;
+ }
+ if (a->rn == 15 && a->w) {
+ /* writeback to PC is UNPREDICTABLE, we choose to UNDEF */
+ return false;
+ }
+
+ if (!vfp_access_check(s)) {
+ return true;
+ }
+
+ if (s->thumb && a->rn == 15) {
+ /* This is actually UNPREDICTABLE */
+ addr = tcg_temp_new_i32();
+ tcg_gen_movi_i32(addr, s->pc & ~2);
+ } else {
+ addr = load_reg(s, a->rn);
+ }
+ if (a->p) {
+ /* pre-decrement */
+ tcg_gen_addi_i32(addr, addr, -(a->imm << 2));
+ }
+
+ if (s->v8m_stackcheck && a->rn == 13 && a->w) {
+ /*
+ * Here 'addr' is the lowest address we will store to,
+ * and is either the old SP (if post-increment) or
+ * the new SP (if pre-decrement). For post-increment
+ * where the old value is below the limit and the new
+ * value is above, it is UNKNOWN whether the limit check
+ * triggers; we choose to trigger.
+ */
+ gen_helper_v8m_stackcheck(cpu_env, addr);
+ }
+
+ offset = 4;
+ for (i = 0; i < n; i++) {
+ if (a->l) {
+ /* load */
+ gen_vfp_ld(s, false, addr);
+ gen_mov_vreg_F0(false, a->vd + i);
+ } else {
+ /* store */
+ gen_mov_F0_vreg(false, a->vd + i);
+ gen_vfp_st(s, false, addr);
+ }
+ tcg_gen_addi_i32(addr, addr, offset);
+ }
+ if (a->w) {
+ /* writeback */
+ if (a->p) {
+ offset = -offset * n;
+ tcg_gen_addi_i32(addr, addr, offset);
+ }
+ store_reg(s, a->rn, addr);
+ } else {
+ tcg_temp_free_i32(addr);
+ }
+
+ return true;
+}
+
+static bool trans_VLDM_VSTM_dp(DisasContext *s, arg_VLDM_VSTM_dp *a)
+{
+ uint32_t offset;
+ TCGv_i32 addr;
+ int i, n;
+
+ n = a->imm >> 1;
+
+ if (n == 0 || (a->vd + n) > 32 || n > 16) {
+ /*
+ * UNPREDICTABLE cases for bad immediates: we choose to
+ * UNDEF to avoid generating huge numbers of TCG ops
+ */
+ return false;
+ }
+ if (a->rn == 15 && a->w) {
+ /* writeback to PC is UNPREDICTABLE, we choose to UNDEF */
+ return false;
+ }
+
+ /* UNDEF accesses to D16-D31 if they don't exist */
+ if (!dc_isar_feature(aa32_fp_d32, s) && (a->vd + n) > 16) {
+ return false;
+ }
+
+ if (!vfp_access_check(s)) {
+ return true;
+ }
+
+ if (s->thumb && a->rn == 15) {
+ /* This is actually UNPREDICTABLE */
+ addr = tcg_temp_new_i32();
+ tcg_gen_movi_i32(addr, s->pc & ~2);
+ } else {
+ addr = load_reg(s, a->rn);
+ }
+ if (a->p) {
+ /* pre-decrement */
+ tcg_gen_addi_i32(addr, addr, -(a->imm << 2));
+ }
+
+ if (s->v8m_stackcheck && a->rn == 13 && a->w) {
+ /*
+ * Here 'addr' is the lowest address we will store to,
+ * and is either the old SP (if post-increment) or
+ * the new SP (if pre-decrement). For post-increment
+ * where the old value is below the limit and the new
+ * value is above, it is UNKNOWN whether the limit check
+ * triggers; we choose to trigger.
+ */
+ gen_helper_v8m_stackcheck(cpu_env, addr);
+ }
+
+ offset = 8;
+ for (i = 0; i < n; i++) {
+ if (a->l) {
+ /* load */
+ gen_vfp_ld(s, true, addr);
+ gen_mov_vreg_F0(true, a->vd + i);
+ } else {
+ /* store */
+ gen_mov_F0_vreg(true, a->vd + i);
+ gen_vfp_st(s, true, addr);
+ }
+ tcg_gen_addi_i32(addr, addr, offset);
+ }
+ if (a->w) {
+ /* writeback */
+ if (a->p) {
+ offset = -offset * n;
+ } else if (a->imm & 1) {
+ offset = 4;
+ } else {
+ offset = 0;
+ }
+
+ if (offset != 0) {
+ tcg_gen_addi_i32(addr, addr, offset);
+ }
+ store_reg(s, a->rn, addr);
+ } else {
+ tcg_temp_free_i32(addr);
+ }
+
+ return true;
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index d954e8de1eb..5a9d0c30d3d 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -3092,9 +3092,8 @@ static void gen_neon_dup_high16(TCGv_i32 var)
*/
static int disas_vfp_insn(DisasContext *s, uint32_t insn)
{
- uint32_t rd, rn, rm, op, i, n, offset, delta_d, delta_m, bank_mask;
+ uint32_t rd, rn, rm, op, i, n, delta_d, delta_m, bank_mask;
int dp, veclen;
- TCGv_i32 addr;
TCGv_i32 tmp;
TCGv_i32 tmp2;
@@ -3702,98 +3701,8 @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
break;
case 0xc:
case 0xd:
- if ((insn & 0x03e00000) == 0x00400000) {
- /* Already handled by decodetree */
- return 1;
- } else {
- /* Load/store */
- rn = (insn >> 16) & 0xf;
- if (dp)
- VFP_DREG_D(rd, insn);
- else
- rd = VFP_SREG_D(insn);
- if ((insn & 0x01200000) == 0x01000000) {
- /* Already handled by decodetree */
- return 1;
- } else {
- /* load/store multiple */
- int w = insn & (1 << 21);
- if (dp)
- n = (insn >> 1) & 0x7f;
- else
- n = insn & 0xff;
-
- if (w && !(((insn >> 23) ^ (insn >> 24)) & 1)) {
- /* P == U , W == 1 => UNDEF */
- return 1;
- }
- if (n == 0 || (rd + n) > 32 || (dp && n > 16)) {
- /* UNPREDICTABLE cases for bad immediates: we choose to
- * UNDEF to avoid generating huge numbers of TCG ops
- */
- return 1;
- }
- if (rn == 15 && w) {
- /* writeback to PC is UNPREDICTABLE, we choose to UNDEF */
- return 1;
- }
-
- if (s->thumb && rn == 15) {
- /* This is actually UNPREDICTABLE */
- addr = tcg_temp_new_i32();
- tcg_gen_movi_i32(addr, s->pc & ~2);
- } else {
- addr = load_reg(s, rn);
- }
- if (insn & (1 << 24)) /* pre-decrement */
- tcg_gen_addi_i32(addr, addr, -((insn & 0xff) << 2));
-
- if (s->v8m_stackcheck && rn == 13 && w) {
- /*
- * Here 'addr' is the lowest address we will store to,
- * and is either the old SP (if post-increment) or
- * the new SP (if pre-decrement). For post-increment
- * where the old value is below the limit and the new
- * value is above, it is UNKNOWN whether the limit check
- * triggers; we choose to trigger.
- */
- gen_helper_v8m_stackcheck(cpu_env, addr);
- }
-
- if (dp)
- offset = 8;
- else
- offset = 4;
- for (i = 0; i < n; i++) {
- if (insn & ARM_CP_RW_BIT) {
- /* load */
- gen_vfp_ld(s, dp, addr);
- gen_mov_vreg_F0(dp, rd + i);
- } else {
- /* store */
- gen_mov_F0_vreg(dp, rd + i);
- gen_vfp_st(s, dp, addr);
- }
- tcg_gen_addi_i32(addr, addr, offset);
- }
- if (w) {
- /* writeback */
- if (insn & (1 << 24))
- offset = -offset * n;
- else if (dp && (insn & 1))
- offset = 4;
- else
- offset = 0;
-
- if (offset != 0)
- tcg_gen_addi_i32(addr, addr, offset);
- store_reg(s, rn, addr);
- } else {
- tcg_temp_free_i32(addr);
- }
- }
- }
- break;
+ /* Already handled by decodetree */
+ return 1;
default:
/* Should never happen. */
return 1;
diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
index 8fa7fa0bead..68c9ffcfd3c 100644
--- a/target/arm/vfp.decode
+++ b/target/arm/vfp.decode
@@ -78,3 +78,21 @@ VLDR_VSTR_sp ---- 1101 u:1 .0 l:1 rn:4 .... 1010 imm:8 \
vd=%vd_sp
VLDR_VSTR_dp ---- 1101 u:1 .0 l:1 rn:4 .... 1011 imm:8 \
vd=%vd_dp
+
+# We split the load/store multiple up into two patterns to avoid
+# overlap with other insns in the "Advanced SIMD load/store and 64-bit move"
+# grouping:
+# P=0 U=0 W=0 is 64-bit VMOV
+# P=1 W=0 is VLDR/VSTR
+# P=U W=1 is UNDEF
+# leaving P=0 U=1 W=x and P=1 U=0 W=1 for load/store multiple.
+# These include FSTM/FLDM.
+VLDM_VSTM_sp ---- 1100 1 . w:1 l:1 rn:4 .... 1010 imm:8 \
+ vd=%vd_sp p=0 u=1
+VLDM_VSTM_dp ---- 1100 1 . w:1 l:1 rn:4 .... 1011 imm:8 \
+ vd=%vd_dp p=0 u=1
+
+VLDM_VSTM_sp ---- 1101 0.1 l:1 rn:4 .... 1010 imm:8 \
+ vd=%vd_sp p=1 u=0 w=1
+VLDM_VSTM_dp ---- 1101 0.1 l:1 rn:4 .... 1011 imm:8 \
+ vd=%vd_dp p=1 u=0 w=1
--
2.20.1