From: Peter Maydell <peter.maydell@linaro.org>
To: qemu-devel@nongnu.org
Cc: patches@linaro.org, "Michael Matz" <matz@suse.de>,
"Alexander Graf" <agraf@suse.de>,
"Claudio Fontana" <claudio.fontana@linaro.org>,
"Dirk Mueller" <dmueller@suse.de>,
"Will Newton" <will.newton@linaro.org>,
"Laurent Desnogues" <laurent.desnogues@gmail.com>,
"Alex Bennée" <alex.bennee@linaro.org>,
kvmarm@lists.cs.columbia.edu,
"Christoffer Dall" <christoffer.dall@linaro.org>,
"Richard Henderson" <rth@twiddle.net>
Subject: [Qemu-devel] [PATCH 01/10] target-arm: A64: Add SIMD ld/st multiple
Date: Fri, 10 Jan 2014 17:12:43 +0000 [thread overview]
Message-ID: <1389373972-27686-2-git-send-email-peter.maydell@linaro.org> (raw)
In-Reply-To: <1389373972-27686-1-git-send-email-peter.maydell@linaro.org>
From: Alex Bennée <alex.bennee@linaro.org>
This adds support support for the SIMD load/store
multiple category of instructions.
This also brings in a couple of helper functions for manipulating
sections of the SIMD registers:
* do_vec_get - fetch value from a slice of a vector register
* do_vec_set - set a slice of a vector register
which use vec_reg_offset for consistent processing of offsets in an
endian aware manner. There are also additional helpers:
* do_vec_ld - load value into SIMD
* do_vec_st - store value from SIMD
which load or store a slice of a vector register to memory.
These don't zero extend like the fp variants.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
v2 -> v3:
- use extract32/sextract32 instead of get_bits and get_sbits
v3 -> v4 (ajb):
- move into new decoder structure
- use new API for loading temp addr
- push various variables to local blocks
- fix semantics of clearing V reg on load
- tested with risu
v4 -> v5 (ajb):
- catch more unallocated values
- add missing returns
- use do_fp_ld for offset==0 instead of explicit clear_reg
v5 -> v6 (ajb):
- merge all the various vector helpers into one commit
---
target-arm/translate-a64.c | 247 ++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 245 insertions(+), 2 deletions(-)
diff --git a/target-arm/translate-a64.c b/target-arm/translate-a64.c
index cf80c46..4482e73 100644
--- a/target-arm/translate-a64.c
+++ b/target-arm/translate-a64.c
@@ -308,6 +308,28 @@ static TCGv_i64 read_cpu_reg_sp(DisasContext *s, int reg, int sf)
return v;
}
+/* Return the offset into CPUARMState of an element of specified
+ * size, 'element' places in from the least significant end of
+ * the FP/vector register Qn.
+ */
+static inline int vec_reg_offset(int regno, int element, TCGMemOp size)
+{
+ int offs = offsetof(CPUARMState, vfp.regs[regno * 2]);
+#ifdef HOST_WORDS_BIGENDIAN
+ /* This is complicated slightly because vfp.regs[2n] is
+ * still the low half and vfp.regs[2n+1] the high half
+ * of the 128 bit vector, even on big endian systems.
+ * Calculate the offset assuming a fully bigendian 128 bits,
+ * then XOR to account for the order of the two 64 bit halves.
+ */
+ offs += (16 - ((element + 1) * (1 << size)));
+ offs ^= 8;
+#else
+ offs += element * (1 << size);
+#endif
+ return offs;
+}
+
/* Return the offset into CPUARMState of a slice (from
* the least significant end) of FP register Qn (ie
* Dn, Sn, Hn or Bn).
@@ -661,6 +683,108 @@ static void do_fp_ld(DisasContext *s, int destidx, TCGv_i64 tcg_addr, int size)
}
/*
+ * Vector load/store helpers.
+ *
+ * The principal difference between this and a FP load is that we don't
+ * zero extend as we are filling a partial chunk of the vector register.
+ * These functions don't support 128 bit loads/stores, which would be
+ * normal load/store operations.
+ */
+
+/* Get value of an element within a vector register */
+static void read_vec_element(DisasContext *s, TCGv_i64 tcg_dest, int srcidx,
+ int element, TCGMemOp memop)
+{
+ int vect_off = vec_reg_offset(srcidx, element, memop & MO_SIZE);
+ switch (memop) {
+ case MO_8:
+ tcg_gen_ld8u_i64(tcg_dest, cpu_env, vect_off);
+ break;
+ case MO_16:
+ tcg_gen_ld16u_i64(tcg_dest, cpu_env, vect_off);
+ break;
+ case MO_32:
+ tcg_gen_ld32u_i64(tcg_dest, cpu_env, vect_off);
+ break;
+ case MO_8|MO_SIGN:
+ tcg_gen_ld8s_i64(tcg_dest, cpu_env, vect_off);
+ break;
+ case MO_16|MO_SIGN:
+ tcg_gen_ld16s_i64(tcg_dest, cpu_env, vect_off);
+ break;
+ case MO_32|MO_SIGN:
+ tcg_gen_ld32s_i64(tcg_dest, cpu_env, vect_off);
+ break;
+ case MO_64:
+ case MO_64|MO_SIGN:
+ tcg_gen_ld_i64(tcg_dest, cpu_env, vect_off);
+ break;
+ default:
+ g_assert_not_reached();
+ }
+}
+
+/* Set value of an element within a vector register */
+static void write_vec_element(DisasContext *s, TCGv_i64 tcg_src, int destidx,
+ int element, TCGMemOp memop)
+{
+ int vect_off = vec_reg_offset(destidx, element, memop & MO_SIZE);
+ switch (memop) {
+ case MO_8:
+ tcg_gen_st8_i64(tcg_src, cpu_env, vect_off);
+ break;
+ case MO_16:
+ tcg_gen_st16_i64(tcg_src, cpu_env, vect_off);
+ break;
+ case MO_32:
+ tcg_gen_st32_i64(tcg_src, cpu_env, vect_off);
+ break;
+ case MO_64:
+ tcg_gen_st_i64(tcg_src, cpu_env, vect_off);
+ break;
+ default:
+ g_assert_not_reached();
+ }
+}
+
+/* Clear the high 64 bits of a 128 bit vector (in general non-quad
+ * vector ops all need to do this).
+ */
+static void clear_vec_high(DisasContext *s, int rd)
+{
+ TCGv_i64 tcg_zero = tcg_const_i64(0);
+
+ write_vec_element(s, tcg_zero, rd, 1, MO_64);
+ tcg_temp_free_i64(tcg_zero);
+}
+
+/* Store from vector register to memory */
+static void do_vec_st(DisasContext *s, int srcidx, int element,
+ TCGv_i64 tcg_addr, int size)
+{
+ TCGMemOp memop = MO_TE + size;
+ TCGv_i64 tcg_tmp = tcg_temp_new_i64();
+
+ read_vec_element(s, tcg_tmp, srcidx, element, size);
+ tcg_gen_qemu_st_i64(tcg_tmp, tcg_addr, get_mem_index(s), memop);
+
+ tcg_temp_free_i64(tcg_tmp);
+}
+
+/* Load from memory to vector register */
+static void do_vec_ld(DisasContext *s, int destidx, int element,
+ TCGv_i64 tcg_addr, int size)
+{
+ TCGMemOp memop = MO_TE + size;
+ TCGv_i64 tcg_tmp = tcg_temp_new_i64();
+
+ tcg_gen_qemu_ld_i64(tcg_tmp, tcg_addr, get_mem_index(s), memop);
+ write_vec_element(s, tcg_tmp, destidx, element, size);
+
+ tcg_temp_free_i64(tcg_tmp);
+}
+
+/*
* This utility function is for doing register extension with an
* optional shift. You will likely want to pass a temporary for the
* destination register. See DecodeRegExtend() in the ARM ARM.
@@ -1835,10 +1959,129 @@ static void disas_ldst_reg(DisasContext *s, uint32_t insn)
}
}
-/* AdvSIMD load/store multiple structures */
+/* C3.3.1 AdvSIMD load/store multiple structures
+ *
+ * 31 30 29 23 22 21 16 15 12 11 10 9 5 4 0
+ * +---+---+---------------+---+-------------+--------+------+------+------+
+ * | 0 | Q | 0 0 1 1 0 0 0 | L | 0 0 0 0 0 0 | opcode | size | Rn | Rt |
+ * +---+---+---------------+---+-------------+--------+------+------+------+
+ *
+ * C3.3.2 AdvSIMD load/store multiple structures (post-indexed)
+ *
+ * 31 30 29 23 22 21 20 16 15 12 11 10 9 5 4 0
+ * +---+---+---------------+---+---+---------+--------+------+------+------+
+ * | 0 | Q | 0 0 1 1 0 0 1 | L | 0 | Rm | opcode | size | Rn | Rt |
+ * +---+---+---------------+---+---+---------+--------+------+------+------+
+ *
+ * Rt: first (or only) SIMD&FP register to be transferred
+ * Rn: base address or SP
+ * Rm (post-index only): post-index register (when !31) or size dependent #imm
+ */
static void disas_ldst_multiple_struct(DisasContext *s, uint32_t insn)
{
- unsupported_encoding(s, insn);
+ int rt = extract32(insn, 0, 5);
+ int rn = extract32(insn, 5, 5);
+ int size = extract32(insn, 10, 2);
+ int opcode = extract32(insn, 12, 4);
+ bool is_store = !extract32(insn, 22, 1);
+ bool is_postidx = extract32(insn, 23, 1);
+ bool is_q = extract32(insn, 30, 1);
+ TCGv_i64 tcg_addr;
+
+ int ebytes = 1 << size;
+ int elements = (is_q ? 128 : 64) / (8 << size);
+ int rpt; /* num iterations */
+ int selem; /* structure elements */
+ int r;
+
+ if (extract32(insn, 31, 1) || extract32(insn, 21, 1)) {
+ unallocated_encoding(s);
+ return;
+ }
+
+ /* From the shared decode logic */
+ switch (opcode) {
+ case 0x0:
+ rpt = 1;
+ selem = 4;
+ break;
+ case 0x2:
+ rpt = 4;
+ selem = 1;
+ break;
+ case 0x4:
+ rpt = 1;
+ selem = 3;
+ break;
+ case 0x6:
+ rpt = 3;
+ selem = 1;
+ break;
+ case 0x7:
+ rpt = 1;
+ selem = 1;
+ break;
+ case 0x8:
+ rpt = 1;
+ selem = 2;
+ break;
+ case 0xa:
+ rpt = 2;
+ selem = 1;
+ break;
+ default:
+ unallocated_encoding(s);
+ return;
+ }
+
+ if (size == 3 && !is_q && selem != 1) {
+ /* reserved */
+ unallocated_encoding(s);
+ return;
+ }
+
+ tcg_addr = read_cpu_reg_sp(s, rn, 1);
+
+ if (rn == 31) {
+ gen_check_sp_alignment(s);
+ }
+
+ for (r = 0; r < rpt; r++) {
+ int e;
+ for (e = 0; e < elements; e++) {
+ int tt = (rt + r) % 32;
+ int xs;
+ for (xs = 0; xs < selem; xs++) {
+ if (is_store) {
+ do_vec_st(s, tt, e, tcg_addr, size);
+ } else {
+ do_vec_ld(s, tt, e, tcg_addr, size);
+
+ /* For non-quad operations, setting a slice of the low
+ * 64 bits of the register clears the high 64 bits (in
+ * the ARM ARM pseudocode this is implicit in the fact
+ * that 'rval' is a 64 bit wide variable). We optimize
+ * by noticing that we only need to do this the first
+ * time we touch a register.
+ */
+ if (!is_q && e == 0 && (r == 0 || xs == selem - 1)) {
+ clear_vec_high(s, tt);
+ }
+ }
+ tcg_gen_addi_i64(tcg_addr, tcg_addr, ebytes);
+ tt = (tt + 1) % 32;
+ }
+ }
+ }
+
+ if (is_postidx) {
+ int rm = extract32(insn, 16, 5);
+ if (rm == 31) {
+ tcg_gen_mov_i64(cpu_reg_sp(s, rn), tcg_addr);
+ } else {
+ tcg_gen_add_i64(cpu_reg_sp(s, rn), cpu_reg(s, rn), cpu_reg(s, rm));
+ }
+ }
}
/* AdvSIMD load/store single structure */
--
1.8.5
next prev parent reply other threads:[~2014-01-10 17:27 UTC|newest]
Thread overview: 30+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-01-10 17:12 [Qemu-devel] [PATCH 00/10] A64 SIMD patchset one: ld/st, C3.6.1..C3.6.7 Peter Maydell
2014-01-10 17:12 ` Peter Maydell [this message]
2014-01-10 18:05 ` [Qemu-devel] [PATCH 01/10] target-arm: A64: Add SIMD ld/st multiple Richard Henderson
2014-01-10 18:18 ` Peter Maydell
2014-01-10 18:28 ` Richard Henderson
2014-01-10 18:37 ` Peter Maydell
2014-01-10 19:00 ` Richard Henderson
2014-01-10 17:12 ` [Qemu-devel] [PATCH 02/10] target-arm: A64: Add SIMD ld/st single Peter Maydell
2014-01-10 18:12 ` Richard Henderson
2014-01-10 17:12 ` [Qemu-devel] [PATCH 03/10] target-arm: A64: Add decode skeleton for SIMD data processing insns Peter Maydell
2014-01-10 18:55 ` Richard Henderson
2014-01-10 19:05 ` Richard Henderson
2014-01-11 0:01 ` Peter Maydell
2014-01-10 17:12 ` [Qemu-devel] [PATCH 04/10] target-arm: A64: Add SIMD EXT Peter Maydell
2014-01-10 19:13 ` Richard Henderson
2014-01-10 17:12 ` [Qemu-devel] [PATCH 05/10] target-arm: A64: Add SIMD TBL/TBLX Peter Maydell
2014-01-10 19:19 ` Richard Henderson
2014-01-10 17:12 ` [Qemu-devel] [PATCH 06/10] target-arm: A64: Add SIMD ZIP/UZP/TRN Peter Maydell
2014-01-10 19:29 ` Richard Henderson
2014-01-11 8:30 ` Alex Bennée
2014-01-10 17:12 ` [Qemu-devel] [PATCH 07/10] target-arm: A64: Add SIMD across-lanes instructions Peter Maydell
2014-01-10 19:38 ` Richard Henderson
2014-01-10 17:12 ` [Qemu-devel] [PATCH 08/10] target-arm: A64: Add SIMD copy operations Peter Maydell
2014-01-10 19:50 ` Richard Henderson
2014-01-10 17:12 ` [Qemu-devel] [PATCH 09/10] target-arm: A64: Add SIMD modified immediate group Peter Maydell
2014-01-10 20:00 ` Richard Henderson
2014-01-10 17:12 ` [Qemu-devel] [PATCH 10/10] target-arm: A64: Add SIMD scalar copy instructions Peter Maydell
2014-01-10 20:03 ` Richard Henderson
2014-01-15 15:10 ` Claudio Fontana
2014-01-15 18:01 ` Peter Maydell
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1389373972-27686-2-git-send-email-peter.maydell@linaro.org \
--to=peter.maydell@linaro.org \
--cc=agraf@suse.de \
--cc=alex.bennee@linaro.org \
--cc=christoffer.dall@linaro.org \
--cc=claudio.fontana@linaro.org \
--cc=dmueller@suse.de \
--cc=kvmarm@lists.cs.columbia.edu \
--cc=laurent.desnogues@gmail.com \
--cc=matz@suse.de \
--cc=patches@linaro.org \
--cc=qemu-devel@nongnu.org \
--cc=rth@twiddle.net \
--cc=will.newton@linaro.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).