[Qemu-devel] [PATCH v2 0/7] s390x/tcg: Cleanups and refactorings for vector instructions

* [Qemu-devel] [PATCH v2 0/7] s390x/tcg: Cleanups and refactorings for vector instructions
@ 2019-02-25 20:03 David Hildenbrand
  2019-02-25 20:03 ` [Qemu-devel] [PATCH v2 1/7] s390x/tcg: RXE has an optional M3 field David Hildenbrand
                   ` (8 more replies)
  0 siblings, 9 replies; 13+ messages in thread
From: David Hildenbrand @ 2019-02-25 20:03 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Thomas Huth, Cornelia Huck, Richard Henderson,
	David Hildenbrand

Before we start with the real magic, some more cleanups and refactorings.
This series does not depend on other patches not yet in master.

Also add a variant of "LOAD LENGTHENED" that is used along with
vector instructions in Linux (HFP instructions that can be used
without HFP). Implement "LOAD COUNT TO BLOCK BOUNDARY", which was
introduced with the vector facility but does not operate on vectors.

v1 -> v2:
- "s390x/tcg: Simplify disassembler operands initialization"
-- s/simply/simplify/ in description
- "s390x/tcg: Implement LOAD COUNT TO BLOCK BOUNDARY"
-- Use bit magic without a helper to calculate the count

David Hildenbrand (7):
  s390x/tcg: RXE has an optional M3 field
  s390x/tcg: Simplify disassembler operands initialization
  s390x/tcg: Clarify terminology in vec_reg_offset()
  s390x/tcg: Factor out vec_full_reg_offset()
  s390x/tcg: Factor out gen_addi_and_wrap_i64() from get_address()
  s390x/tcg: Implement LOAD LENGTHENED short HFP to long HFP
  s390x/tcg: Implement LOAD COUNT TO BLOCK BOUNDARY

 target/s390x/cc_helper.c     |  8 +++
 target/s390x/helper.c        |  1 +
 target/s390x/insn-data.def   |  4 ++
 target/s390x/insn-format.def |  2 +-
 target/s390x/internal.h      |  1 +
 target/s390x/translate.c     | 94 +++++++++++++++++++++++++-----------
 6 files changed, 80 insertions(+), 30 deletions(-)

-- 
2.17.2

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [Qemu-devel] [PATCH v2 1/7] s390x/tcg: RXE has an optional M3 field
  2019-02-25 20:03 [Qemu-devel] [PATCH v2 0/7] s390x/tcg: Cleanups and refactorings for vector instructions David Hildenbrand
@ 2019-02-25 20:03 ` David Hildenbrand
  2019-02-25 20:03 ` [Qemu-devel] [PATCH v2 2/7] s390x/tcg: Simplify disassembler operands initialization David Hildenbrand
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 13+ messages in thread
From: David Hildenbrand @ 2019-02-25 20:03 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Thomas Huth, Cornelia Huck, Richard Henderson,
	David Hildenbrand

The M3 field will be needed later in this series, so add it to the RXE
format description.

Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/insn-format.def | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/s390x/insn-format.def b/target/s390x/insn-format.def
index a412d90fb7..4297ff4165 100644
--- a/target/s390x/insn-format.def
+++ b/target/s390x/insn-format.def
@@ -36,7 +36,7 @@ F3(RSY_a, R(1, 8),     BDL(2),      R(3,12))
 F3(RSY_b, R(1, 8),     BDL(2),      M(3,12))
 F2(RX_a,  R(1, 8),     BXD(2))
 F2(RX_b,  M(1, 8),     BXD(2))
-F2(RXE,   R(1, 8),     BXD(2))
+F3(RXE,   R(1, 8),     BXD(2),      M(3,32))
 F3(RXF,   R(1,32),     BXD(2),      R(3, 8))
 F2(RXY_a, R(1, 8),     BXDL(2))
 F2(RXY_b, M(1, 8),     BXDL(2))
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [Qemu-devel] [PATCH v2 2/7] s390x/tcg: Simplify disassembler operands initialization
  2019-02-25 20:03 [Qemu-devel] [PATCH v2 0/7] s390x/tcg: Cleanups and refactorings for vector instructions David Hildenbrand
  2019-02-25 20:03 ` [Qemu-devel] [PATCH v2 1/7] s390x/tcg: RXE has an optional M3 field David Hildenbrand
@ 2019-02-25 20:03 ` David Hildenbrand
  2019-02-25 20:03 ` [Qemu-devel] [PATCH v2 3/7] s390x/tcg: Clarify terminology in vec_reg_offset() David Hildenbrand
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 13+ messages in thread
From: David Hildenbrand @ 2019-02-25 20:03 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Thomas Huth, Cornelia Huck, Richard Henderson,
	David Hildenbrand

Let's simplify initialization to 0.

Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/translate.c | 8 +-------
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/target/s390x/translate.c b/target/s390x/translate.c
index 19072efec6..c646e50eb3 100644
--- a/target/s390x/translate.c
+++ b/target/s390x/translate.c
@@ -6091,7 +6091,7 @@ static DisasJumpType translate_one(CPUS390XState *env, DisasContext *s)
     const DisasInsn *insn;
     DisasJumpType ret = DISAS_NEXT;
     DisasFields f;
-    DisasOps o;
+    DisasOps o = {};
 
     /* Search for the insn in the table.  */
     insn = extract_insn(env, s, &f);
@@ -6161,12 +6161,6 @@ static DisasJumpType translate_one(CPUS390XState *env, DisasContext *s)
     /* Set up the strutures we use to communicate with the helpers. */
     s->insn = insn;
     s->fields = &f;
-    o.g_out = o.g_out2 = o.g_in1 = o.g_in2 = false;
-    o.out = NULL;
-    o.out2 = NULL;
-    o.in1 = NULL;
-    o.in2 = NULL;
-    o.addr1 = NULL;
 
     /* Implement the instruction.  */
     if (insn->help_in1) {
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [Qemu-devel] [PATCH v2 3/7] s390x/tcg: Clarify terminology in vec_reg_offset()
  2019-02-25 20:03 [Qemu-devel] [PATCH v2 0/7] s390x/tcg: Cleanups and refactorings for vector instructions David Hildenbrand
  2019-02-25 20:03 ` [Qemu-devel] [PATCH v2 1/7] s390x/tcg: RXE has an optional M3 field David Hildenbrand
  2019-02-25 20:03 ` [Qemu-devel] [PATCH v2 2/7] s390x/tcg: Simplify disassembler operands initialization David Hildenbrand
@ 2019-02-25 20:03 ` David Hildenbrand
  2019-02-25 22:28   ` David Hildenbrand
  2019-02-25 20:03 ` [Qemu-devel] [PATCH v2 4/7] s390x/tcg: Factor out vec_full_reg_offset() David Hildenbrand
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 13+ messages in thread
From: David Hildenbrand @ 2019-02-25 20:03 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Thomas Huth, Cornelia Huck, Richard Henderson,
	David Hildenbrand

We will use s390x speak "Element Size" (es) for MO_8 == 0, MO_16 == 1
... Simple rename of variables.

Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/translate.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/target/s390x/translate.c b/target/s390x/translate.c
index c646e50eb3..5e3955e4d7 100644
--- a/target/s390x/translate.c
+++ b/target/s390x/translate.c
@@ -145,10 +145,11 @@ void s390x_translate_init(void)
     }
 }
 
-static inline int vec_reg_offset(uint8_t reg, uint8_t enr, TCGMemOp size)
+static inline int vec_reg_offset(uint8_t reg, uint8_t enr, TCGMemOp es)
 {
-    const uint8_t es = 1 << size;
-    int offs = enr * es;
+    /* Convert element size (es) - e.g. MO_U8 - to bytes */
+    const uint8_t bytes = 1 << es;
+    int offs = enr * bytes;
 
     g_assert(reg < 32);
     /*
@@ -173,9 +174,9 @@ static inline int vec_reg_offset(uint8_t reg, uint8_t enr, TCGMemOp size)
      * the two 8 byte elements have to be loaded separately. Let's force all
      * 16 byte operations to handle it in a special way.
      */
-    g_assert(size <= MO_64);
+    g_assert(es <= MO_64);
 #ifndef HOST_WORDS_BIGENDIAN
-    offs ^= (8 - es);
+    offs ^= (8 - bytes);
 #endif
     return offs + offsetof(CPUS390XState, vregs[reg][0].d);
 }
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 13+ messages in thread
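
To illustrate the offset calculation in vec_reg_offset(), here is a minimal
standalone sketch. It is not the QEMU code: elem_offset() and its
host_big_endian parameter are hypothetical names, the element size is a plain
integer (0 for 1 byte up to 3 for 8 bytes) instead of a TCGMemOp, and the
offsetof() base is left out. It assumes, as the comment in the patch suggests,
that each 16 byte vector is stored as two host-endian 8 byte halves.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model: byte offset of element `enr` inside one 16 byte
 * vector register that is stored as two host-endian 8 byte halves.
 * `es` is the element size as a log2 value (0 = 1 byte ... 3 = 8 bytes).
 */
static inline int elem_offset(uint8_t enr, unsigned es, int host_big_endian)
{
    assert(es <= 3);

    /* Convert element size (es) to bytes */
    const int bytes = 1 << es;
    int offs = enr * bytes;

    /* The element has to fit into the 16 byte register */
    assert(offs + bytes <= 16);

    /*
     * On a little-endian host the bytes within each 8 byte half are
     * reversed, so mirror the position inside the half.
     */
    if (!host_big_endian) {
        offs ^= (8 - bytes);
    }
    return offs;
}
```

On a little-endian host, byte element 0 lands at offset 7 of the first 8 byte
half, while 8 byte elements stay where they are because the XOR mask is 0.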

* Re: [Qemu-devel] [PATCH v2 3/7] s390x/tcg: Clarify terminology in vec_reg_offset()
  2019-02-25 20:03 ` [Qemu-devel] [PATCH v2 3/7] s390x/tcg: Clarify terminology in vec_reg_offset() David Hildenbrand
@ 2019-02-25 22:28   ` David Hildenbrand
  2019-02-25 22:44     ` Cornelia Huck
  0 siblings, 1 reply; 13+ messages in thread
From: David Hildenbrand @ 2019-02-25 22:28 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-s390x, Thomas Huth, Cornelia Huck, Richard Henderson

On 25.02.19 21:03, David Hildenbrand wrote:
> We will use s390x speak "Element Size" (es) for MO_8 == 0, MO_16 == 1
> ... Simple rename of variables.
> 
> Reviewed-by: Thomas Huth <thuth@redhat.com>
> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  target/s390x/translate.c | 11 ++++++-----
>  1 file changed, 6 insertions(+), 5 deletions(-)
> 
> diff --git a/target/s390x/translate.c b/target/s390x/translate.c
> index c646e50eb3..5e3955e4d7 100644
> --- a/target/s390x/translate.c
> +++ b/target/s390x/translate.c
> @@ -145,10 +145,11 @@ void s390x_translate_init(void)
>      }
>  }
>  
> -static inline int vec_reg_offset(uint8_t reg, uint8_t enr, TCGMemOp size)
> +static inline int vec_reg_offset(uint8_t reg, uint8_t enr, TCGMemOp es)
>  {
> -    const uint8_t es = 1 << size;
> -    int offs = enr * es;
> +    /* Convert element size (es) - e.g. MO_U8 - to bytes */

s/MO_U8/MO_8/  :(

Conny, I assume you can fix that up in case there are no other comments.
Thanks!

> +    const uint8_t bytes = 1 << es;
> +    int offs = enr * bytes;
>  
>      g_assert(reg < 32);
>      /*
> @@ -173,9 +174,9 @@ static inline int vec_reg_offset(uint8_t reg, uint8_t enr, TCGMemOp size)
>       * the two 8 byte elements have to be loaded separately. Let's force all
>       * 16 byte operations to handle it in a special way.
>       */
> -    g_assert(size <= MO_64);
> +    g_assert(es <= MO_64);
>  #ifndef HOST_WORDS_BIGENDIAN
> -    offs ^= (8 - es);
> +    offs ^= (8 - bytes);
>  #endif
>      return offs + offsetof(CPUS390XState, vregs[reg][0].d);
>  }
> 


-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] [PATCH v2 3/7] s390x/tcg: Clarify terminology in vec_reg_offset()
  2019-02-25 22:28   ` David Hildenbrand
@ 2019-02-25 22:44     ` Cornelia Huck
  0 siblings, 0 replies; 13+ messages in thread
From: Cornelia Huck @ 2019-02-25 22:44 UTC (permalink / raw)
  To: David Hildenbrand; +Cc: qemu-devel, qemu-s390x, Thomas Huth, Richard Henderson

On Mon, 25 Feb 2019 23:28:27 +0100
David Hildenbrand <david@redhat.com> wrote:

> On 25.02.19 21:03, David Hildenbrand wrote:
> > We will use s390x speak "Element Size" (es) for MO_8 == 0, MO_16 == 1
> > ... Simple rename of variables.
> > 
> > Reviewed-by: Thomas Huth <thuth@redhat.com>
> > Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
> > Signed-off-by: David Hildenbrand <david@redhat.com>
> > ---
> >  target/s390x/translate.c | 11 ++++++-----
> >  1 file changed, 6 insertions(+), 5 deletions(-)
> > 
> > diff --git a/target/s390x/translate.c b/target/s390x/translate.c
> > index c646e50eb3..5e3955e4d7 100644
> > --- a/target/s390x/translate.c
> > +++ b/target/s390x/translate.c
> > @@ -145,10 +145,11 @@ void s390x_translate_init(void)
> >      }
> >  }
> >  
> > -static inline int vec_reg_offset(uint8_t reg, uint8_t enr, TCGMemOp size)
> > +static inline int vec_reg_offset(uint8_t reg, uint8_t enr, TCGMemOp es)
> >  {
> > -    const uint8_t es = 1 << size;
> > -    int offs = enr * es;
> > +    /* Convert element size (es) - e.g. MO_U8 - to bytes */  
> 
> s/MO_U8/MO_8/  :(
> 
> Conny, I assume you can fix that up in case there are no other comments.
> Thanks!

Sure, no problem.

> 
> > +    const uint8_t bytes = 1 << es;
> > +    int offs = enr * bytes;
> >  
> >      g_assert(reg < 32);
> >      /*
> > @@ -173,9 +174,9 @@ static inline int vec_reg_offset(uint8_t reg, uint8_t enr, TCGMemOp size)
> >       * the two 8 byte elements have to be loaded separately. Let's force all
> >       * 16 byte operations to handle it in a special way.
> >       */
> > -    g_assert(size <= MO_64);
> > +    g_assert(es <= MO_64);
> >  #ifndef HOST_WORDS_BIGENDIAN
> > -    offs ^= (8 - es);
> > +    offs ^= (8 - bytes);
> >  #endif
> >      return offs + offsetof(CPUS390XState, vregs[reg][0].d);
> >  }
> >   
> 
> 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [Qemu-devel] [PATCH v2 4/7] s390x/tcg: Factor out vec_full_reg_offset()
  2019-02-25 20:03 [Qemu-devel] [PATCH v2 0/7] s390x/tcg: Cleanups and refactorings for vector instructions David Hildenbrand
                   ` (2 preceding siblings ...)
  2019-02-25 20:03 ` [Qemu-devel] [PATCH v2 3/7] s390x/tcg: Clarify terminology in vec_reg_offset() David Hildenbrand
@ 2019-02-25 20:03 ` David Hildenbrand
  2019-02-25 20:03 ` [Qemu-devel] [PATCH v2 5/7] s390x/tcg: Factor out gen_addi_and_wrap_i64() from get_address() David Hildenbrand
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 13+ messages in thread
From: David Hildenbrand @ 2019-02-25 20:03 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Thomas Huth, Cornelia Huck, Richard Henderson,
	David Hildenbrand

We'll use that a lot along with gvec helpers, to calculate the start
address of a vector.

Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/translate.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/target/s390x/translate.c b/target/s390x/translate.c
index 5e3955e4d7..6d36cfed2a 100644
--- a/target/s390x/translate.c
+++ b/target/s390x/translate.c
@@ -145,13 +145,18 @@ void s390x_translate_init(void)
     }
 }
 
+static inline int vec_full_reg_offset(uint8_t reg)
+{
+    g_assert(reg < 32);
+    return offsetof(CPUS390XState, vregs[reg][0].d);
+}
+
 static inline int vec_reg_offset(uint8_t reg, uint8_t enr, TCGMemOp es)
 {
     /* Convert element size (es) - e.g. MO_U8 - to bytes */
     const uint8_t bytes = 1 << es;
     int offs = enr * bytes;
 
-    g_assert(reg < 32);
     /*
      * vregs[n][0] is the lowest 8 byte and vregs[n][1] the highest 8 byte
      * of the 16 byte vector, on both, little and big endian systems.
@@ -178,7 +183,7 @@ static inline int vec_reg_offset(uint8_t reg, uint8_t enr, TCGMemOp es)
 #ifndef HOST_WORDS_BIGENDIAN
     offs ^= (8 - bytes);
 #endif
-    return offs + offsetof(CPUS390XState, vregs[reg][0].d);
+    return offs + vec_full_reg_offset(reg);
 }
 
 static inline int freg64_offset(uint8_t reg)
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [Qemu-devel] [PATCH v2 5/7] s390x/tcg: Factor out gen_addi_and_wrap_i64() from get_address()
  2019-02-25 20:03 [Qemu-devel] [PATCH v2 0/7] s390x/tcg: Cleanups and refactorings for vector instructions David Hildenbrand
                   ` (3 preceding siblings ...)
  2019-02-25 20:03 ` [Qemu-devel] [PATCH v2 4/7] s390x/tcg: Factor out vec_full_reg_offset() David Hildenbrand
@ 2019-02-25 20:03 ` David Hildenbrand
  2019-02-25 20:03 ` [Qemu-devel] [PATCH v2 6/7] s390x/tcg: Implement LOAD LENGTHENED short HFP to long HFP David Hildenbrand
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 13+ messages in thread
From: David Hildenbrand @ 2019-02-25 20:03 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Thomas Huth, Cornelia Huck, Richard Henderson,
	David Hildenbrand

Also properly wrap in 24bit mode. While at it, convert the comment (and
drop the comment about fundamental TCG optimizations).

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/translate.c | 41 +++++++++++++++++++++++++---------------
 1 file changed, 26 insertions(+), 15 deletions(-)

diff --git a/target/s390x/translate.c b/target/s390x/translate.c
index 6d36cfed2a..cc52300334 100644
--- a/target/s390x/translate.c
+++ b/target/s390x/translate.c
@@ -382,32 +382,43 @@ static inline void gen_trap(DisasContext *s)
     gen_data_exception(0xff);
 }
 
+static void gen_addi_and_wrap_i64(DisasContext *s, TCGv_i64 dst, TCGv_i64 src,
+                                  int64_t imm)
+{
+    tcg_gen_addi_i64(dst, src, imm);
+    if (!(s->base.tb->flags & FLAG_MASK_64)) {
+        if (s->base.tb->flags & FLAG_MASK_32) {
+            tcg_gen_andi_i64(dst, dst, 0x7fffffff);
+        } else {
+            tcg_gen_andi_i64(dst, dst, 0x00ffffff);
+        }
+    }
+}
+
 static TCGv_i64 get_address(DisasContext *s, int x2, int b2, int d2)
 {
     TCGv_i64 tmp = tcg_temp_new_i64();
-    bool need_31 = !(s->base.tb->flags & FLAG_MASK_64);
-
-    /* Note that d2 is limited to 20 bits, signed.  If we crop negative
-       displacements early we create larger immedate addends.  */
 
-    /* Note that addi optimizes the imm==0 case.  */
+    /*
+     * Note that d2 is limited to 20 bits, signed.  If we crop negative
+     * displacements early we create larger immedate addends.
+     */
     if (b2 && x2) {
         tcg_gen_add_i64(tmp, regs[b2], regs[x2]);
-        tcg_gen_addi_i64(tmp, tmp, d2);
+        gen_addi_and_wrap_i64(s, tmp, tmp, d2);
     } else if (b2) {
-        tcg_gen_addi_i64(tmp, regs[b2], d2);
+        gen_addi_and_wrap_i64(s, tmp, regs[b2], d2);
     } else if (x2) {
-        tcg_gen_addi_i64(tmp, regs[x2], d2);
-    } else {
-        if (need_31) {
-            d2 &= 0x7fffffff;
-            need_31 = false;
+        gen_addi_and_wrap_i64(s, tmp, regs[x2], d2);
+    } else if (!(s->base.tb->flags & FLAG_MASK_64)) {
+        if (s->base.tb->flags & FLAG_MASK_32) {
+            tcg_gen_movi_i64(tmp, d2 & 0x7fffffff);
+        } else {
+            tcg_gen_movi_i64(tmp, d2 & 0x00ffffff);
         }
+    } else {
         tcg_gen_movi_i64(tmp, d2);
     }
-    if (need_31) {
-        tcg_gen_andi_i64(tmp, tmp, 0x7fffffff);
-    }
 
     return tmp;
 }
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 13+ messages in thread
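
The wrapping done by gen_addi_and_wrap_i64() can be modelled on plain
integers. This is only a sketch: addi_and_wrap() and the addr_mode enum are
hypothetical stand-ins for the FLAG_MASK_64 / FLAG_MASK_32 checks on the TB
flags, and the function computes the value directly instead of emitting TCG
ops.

```c
#include <stdint.h>

/* Hypothetical stand-in for the addressing mode encoded in the TB flags */
enum addr_mode { MODE_24, MODE_31, MODE_64 };

/* Add a signed immediate and wrap the result to the addressing mode */
static inline uint64_t addi_and_wrap(uint64_t src, int64_t imm,
                                     enum addr_mode mode)
{
    /* Unsigned addition, so wraparound is well defined */
    uint64_t dst = src + (uint64_t)imm;

    switch (mode) {
    case MODE_24:
        return dst & 0x00ffffff;    /* keep 24 address bits */
    case MODE_31:
        return dst & 0x7fffffff;    /* keep 31 address bits */
    default:
        return dst;                 /* 64-bit mode: no masking */
    }
}
```

A negative displacement applied to address 0 in 31-bit mode wraps to
0x7fffffff, and 0x00ffffff + 1 wraps to 0 in 24-bit mode.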

* [Qemu-devel] [PATCH v2 6/7] s390x/tcg: Implement LOAD LENGTHENED short HFP to long HFP
  2019-02-25 20:03 [Qemu-devel] [PATCH v2 0/7] s390x/tcg: Cleanups and refactorings for vector instructions David Hildenbrand
                   ` (4 preceding siblings ...)
  2019-02-25 20:03 ` [Qemu-devel] [PATCH v2 5/7] s390x/tcg: Factor out gen_addi_and_wrap_i64() from get_address() David Hildenbrand
@ 2019-02-25 20:03 ` David Hildenbrand
  2019-02-25 20:03 ` [Qemu-devel] [PATCH v2 7/7] s390x/tcg: Implement LOAD COUNT TO BLOCK BOUNDARY David Hildenbrand
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 13+ messages in thread
From: David Hildenbrand @ 2019-02-25 20:03 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Thomas Huth, Cornelia Huck, Richard Henderson,
	David Hildenbrand

Nice trick to load a 32 bit value into vector element 0 (32 bit element
size) from memory, zeroing out element 1. The short HFP to long HFP
conversion is really just a shift.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/insn-data.def | 2 ++
 target/s390x/translate.c   | 6 ++++++
 2 files changed, 8 insertions(+)

diff --git a/target/s390x/insn-data.def b/target/s390x/insn-data.def
index 61582372ab..fb6ee18650 100644
--- a/target/s390x/insn-data.def
+++ b/target/s390x/insn-data.def
@@ -598,6 +598,8 @@
     F(0xed04, LDEB,    RXE,   Z,   0, m2_32u, new, f1, ldeb, 0, IF_BFP)
     F(0xed05, LXDB,    RXE,   Z,   0, m2_64, new_P, x1, lxdb, 0, IF_BFP)
     F(0xed06, LXEB,    RXE,   Z,   0, m2_32u, new_P, x1, lxeb, 0, IF_BFP)
+    F(0xb324, LDER,    RXE,   Z,   0, e2, new, f1, lde, 0, IF_AFP1)
+    F(0xed24, LDE,     RXE,   Z,   0, m2_32u, new, f1, lde, 0, IF_AFP1)
 /* LOAD ROUNDED */
     F(0xb344, LEDBR,   RRE,   Z,   0, f2, new, e1, ledb, 0, IF_BFP)
     F(0xb345, LDXBR,   RRE,   Z,   x2h, x2l, new, f1, ldxb, 0, IF_BFP)
diff --git a/target/s390x/translate.c b/target/s390x/translate.c
index cc52300334..6515aa028a 100644
--- a/target/s390x/translate.c
+++ b/target/s390x/translate.c
@@ -2725,6 +2725,12 @@ static DisasJumpType op_lxeb(DisasContext *s, DisasOps *o)
     return DISAS_NEXT;
 }
 
+static DisasJumpType op_lde(DisasContext *s, DisasOps *o)
+{
+    tcg_gen_shli_i64(o->out, o->in2, 32);
+    return DISAS_NEXT;
+}
+
 static DisasJumpType op_llgt(DisasContext *s, DisasOps *o)
 {
     tcg_gen_andi_i64(o->out, o->in2, 0x7fffffff);
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 13+ messages in thread
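
That the conversion is only a shift can be checked on plain integers. A
sketch, assuming the usual HFP layout (sign bit, 7 bit exponent, then a
24 bit fraction for short and a 56 bit fraction for long);
hfp_short_to_long() is a hypothetical name, not a QEMU function.

```c
#include <stdint.h>

/*
 * Lengthen a short HFP value to long HFP. Sign and exponent keep their
 * position at the top, the fraction is extended with 32 zero bits.
 */
static inline uint64_t hfp_short_to_long(uint32_t short_hfp)
{
    return (uint64_t)short_hfp << 32;
}
```

For example, 0x41100000 (1.0 in short HFP) becomes 0x4110000000000000.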

* [Qemu-devel] [PATCH v2 7/7] s390x/tcg: Implement LOAD COUNT TO BLOCK BOUNDARY
  2019-02-25 20:03 [Qemu-devel] [PATCH v2 0/7] s390x/tcg: Cleanups and refactorings for vector instructions David Hildenbrand
                   ` (5 preceding siblings ...)
  2019-02-25 20:03 ` [Qemu-devel] [PATCH v2 6/7] s390x/tcg: Implement LOAD LENGTHENED short HFP to long HFP David Hildenbrand
@ 2019-02-25 20:03 ` David Hildenbrand
  2019-02-25 20:18   ` Richard Henderson
  2019-02-25 20:04 ` [Qemu-devel] [PATCH v2 0/7] s390x/tcg: Cleanups and refactorings for vector instructions David Hildenbrand
  2019-02-26  9:15 ` Cornelia Huck
  8 siblings, 1 reply; 13+ messages in thread
From: David Hildenbrand @ 2019-02-25 20:03 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Thomas Huth, Cornelia Huck, Richard Henderson,
	David Hildenbrand

Use a new CC helper to calculate the CC lazily if needed. While the
PoP mentions that "A 32-bit unsigned binary integer" is placed into the
first operand, it does not say whether the other 32 bits (high part)
are left untouched. Maybe the other 32 bits are unpredictable.
So store 64 bits for now.

Bit magic courtesy of Richard.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/cc_helper.c   |  8 ++++++++
 target/s390x/helper.c      |  1 +
 target/s390x/insn-data.def |  2 ++
 target/s390x/internal.h    |  1 +
 target/s390x/translate.c   | 19 +++++++++++++++++++
 5 files changed, 31 insertions(+)

diff --git a/target/s390x/cc_helper.c b/target/s390x/cc_helper.c
index 307ad61aee..0e467bf2b6 100644
--- a/target/s390x/cc_helper.c
+++ b/target/s390x/cc_helper.c
@@ -397,6 +397,11 @@ static uint32_t cc_calc_flogr(uint64_t dst)
     return dst ? 2 : 0;
 }
 
+static uint32_t cc_calc_lcbb(uint64_t dst)
+{
+    return dst == 16 ? 0 : 3;
+}
+
 static uint32_t do_calc_cc(CPUS390XState *env, uint32_t cc_op,
                                   uint64_t src, uint64_t dst, uint64_t vr)
 {
@@ -506,6 +511,9 @@ static uint32_t do_calc_cc(CPUS390XState *env, uint32_t cc_op,
     case CC_OP_FLOGR:
         r = cc_calc_flogr(dst);
         break;
+    case CC_OP_LCBB:
+        r = cc_calc_lcbb(dst);
+        break;
 
     case CC_OP_NZ_F32:
         r = set_cc_nz_f32(dst);
diff --git a/target/s390x/helper.c b/target/s390x/helper.c
index a7edd5df7d..8e9573221c 100644
--- a/target/s390x/helper.c
+++ b/target/s390x/helper.c
@@ -417,6 +417,7 @@ const char *cc_name(enum cc_op cc_op)
         [CC_OP_SLA_32]    = "CC_OP_SLA_32",
         [CC_OP_SLA_64]    = "CC_OP_SLA_64",
         [CC_OP_FLOGR]     = "CC_OP_FLOGR",
+        [CC_OP_LCBB]      = "CC_OP_LCBB",
     };
 
     return cc_names[cc_op];
diff --git a/target/s390x/insn-data.def b/target/s390x/insn-data.def
index fb6ee18650..f4f1d63ab4 100644
--- a/target/s390x/insn-data.def
+++ b/target/s390x/insn-data.def
@@ -479,6 +479,8 @@
     F(0xb313, LCDBR,   RRE,   Z,   0, f2, new, f1, negf64, f64, IF_BFP)
     F(0xb343, LCXBR,   RRE,   Z,   x2h, x2l, new_P, x1, negf128, f128, IF_BFP)
     F(0xb373, LCDFR,   RRE,   FPSSH, 0, f2, new, f1, negf64, 0, IF_AFP1 | IF_AFP2)
+/* LOAD COUNT TO BLOCK BOUNDARY */
+    C(0xe727, LCBB,    RXE,   V,   la2, 0, r1, 0, lcbb, 0)
 /* LOAD HALFWORD */
     C(0xb927, LHR,     RRE,   EI,  0, r2_16s, 0, r1_32, mov2, 0)
     C(0xb907, LGHR,    RRE,   EI,  0, r2_16s, 0, r1, mov2, 0)
diff --git a/target/s390x/internal.h b/target/s390x/internal.h
index b2966a3adc..9d0a45d1fe 100644
--- a/target/s390x/internal.h
+++ b/target/s390x/internal.h
@@ -236,6 +236,7 @@ enum cc_op {
     CC_OP_SLA_32,               /* Calculate shift left signed (32bit) */
     CC_OP_SLA_64,               /* Calculate shift left signed (64bit) */
     CC_OP_FLOGR,                /* find leftmost one */
+    CC_OP_LCBB,                 /* load count to block boundary */
     CC_OP_MAX
 };
 
diff --git a/target/s390x/translate.c b/target/s390x/translate.c
index 6515aa028a..170fbb8cd6 100644
--- a/target/s390x/translate.c
+++ b/target/s390x/translate.c
@@ -557,6 +557,7 @@ static void gen_op_calc_cc(DisasContext *s)
     case CC_OP_NZ_F32:
     case CC_OP_NZ_F64:
     case CC_OP_FLOGR:
+    case CC_OP_LCBB:
         /* 1 argument */
         gen_helper_calc_cc(cc_op, cpu_env, local_cc_op, dummy, cc_dst, dummy);
         break;
@@ -3142,6 +3143,23 @@ static DisasJumpType op_lzrb(DisasContext *s, DisasOps *o)
     return DISAS_NEXT;
 }
 
+static DisasJumpType op_lcbb(DisasContext *s, DisasOps *o)
+{
+    const int64_t block_size = (1ull << (get_field(s->fields, m3) + 6));
+
+    if (get_field(s->fields, m3) > 6) {
+        gen_program_exception(s, PGM_SPECIFICATION);
+        return DISAS_NORETURN;
+    }
+
+    tcg_gen_ori_i64(o->addr1, o->addr1, -block_size);
+    tcg_gen_neg_i64(o->addr1, o->addr1);
+    tcg_gen_movi_i64(o->out, 16);
+    tcg_gen_umin_i64(o->out, o->out, o->addr1);
+    gen_op_update1_cc_i64(s, CC_OP_LCBB, o->out);
+    return DISAS_NEXT;
+}
+
 static DisasJumpType op_mov2(DisasContext *s, DisasOps *o)
 {
     o->out = o->in2;
@@ -5931,6 +5949,7 @@ enum DisasInsnEnum {
 #define FAC_ECT         S390_FEAT_EXTRACT_CPU_TIME
 #define FAC_PCI         S390_FEAT_ZPCI /* z/PCI facility */
 #define FAC_AIS         S390_FEAT_ADAPTER_INT_SUPPRESSION
+#define FAC_V           S390_FEAT_VECTOR /* vector facility */
 
 static const DisasInsn insn_info[] = {
 #include "insn-data.def"
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 13+ messages in thread
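
The bit magic in op_lcbb() can be followed on plain integers. This is a
sketch only: lcbb_count() and lcbb_cc() are hypothetical names that mirror
the TCG sequence (ori, neg, umin) and cc_calc_lcbb(), and the caller is
assumed to have rejected m3 > 6 already.

```c
#include <stdint.h>

/* Number of bytes from addr to the next block boundary, capped at 16 */
static inline uint64_t lcbb_count(uint64_t addr, unsigned m3)
{
    /* m3 = 0..6 selects a block size of 64 bytes .. 4K */
    const uint64_t block_size = 1ull << (m3 + 6);

    /*
     * addr | -block_size sets all bits above the in-block offset, which
     * yields -(block_size - offset) in two's complement. Negating that
     * gives the distance to the next boundary (1 .. block_size).
     */
    const uint64_t ored = addr | (0 - block_size);
    const uint64_t to_boundary = 0 - ored;

    return to_boundary < 16 ? to_boundary : 16;
}

/* CC 0 if a full 16 bytes are available, CC 3 otherwise */
static inline unsigned lcbb_cc(uint64_t count)
{
    return count == 16 ? 0 : 3;
}
```

With a 64 byte block, an address with in-block offset 60 gives a count of 4
and CC 3, while an address on the boundary gives 16 and CC 0.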

* Re: [Qemu-devel] [PATCH v2 7/7] s390x/tcg: Implement LOAD COUNT TO BLOCK BOUNDARY
  2019-02-25 20:03 ` [Qemu-devel] [PATCH v2 7/7] s390x/tcg: Implement LOAD COUNT TO BLOCK BOUNDARY David Hildenbrand
@ 2019-02-25 20:18   ` Richard Henderson
  0 siblings, 0 replies; 13+ messages in thread
From: Richard Henderson @ 2019-02-25 20:18 UTC (permalink / raw)
  To: David Hildenbrand, qemu-devel; +Cc: qemu-s390x, Thomas Huth, Cornelia Huck

On 2/25/19 12:03 PM, David Hildenbrand wrote:
> Use a new CC helper to calculate the CC lazily if needed. While the
> PoP mentions that "A 32-bit unsigned binary integer" is placed into the
> first operand, it does not say whether the other 32 bits (high part)
> are left untouched. Maybe the other 32 bits are unpredictable.
> So store 64 bits for now.
> 
> Bit magic courtesy of Richard.
> 
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  target/s390x/cc_helper.c   |  8 ++++++++
>  target/s390x/helper.c      |  1 +
>  target/s390x/insn-data.def |  2 ++
>  target/s390x/internal.h    |  1 +
>  target/s390x/translate.c   | 19 +++++++++++++++++++
>  5 files changed, 31 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] [PATCH v2 0/7] s390x/tcg: Cleanups and refactorings for vector instructions
  2019-02-25 20:03 [Qemu-devel] [PATCH v2 0/7] s390x/tcg: Cleanups and refactorings for vector instructions David Hildenbrand
                   ` (6 preceding siblings ...)
  2019-02-25 20:03 ` [Qemu-devel] [PATCH v2 7/7] s390x/tcg: Implement LOAD COUNT TO BLOCK BOUNDARY David Hildenbrand
@ 2019-02-25 20:04 ` David Hildenbrand
  2019-02-26  9:15 ` Cornelia Huck
  8 siblings, 0 replies; 13+ messages in thread
From: David Hildenbrand @ 2019-02-25 20:04 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-s390x, Thomas Huth, Cornelia Huck, Richard Henderson

On 25.02.19 21:03, David Hildenbrand wrote:
> Before we start with the real magic, some more cleanups and refactorings.
> This series does not depend on other patches not yet in master.
> 
> Also add a variant of "LOAD LENGTHENED" that is used along with
> vector instructions in Linux (HFP instructions that can be used
> without HFP). Implement "LOAD COUNT TO BLOCK BOUNDARY", which was
> introduced with the vector facility but does not operate on vectors.
> 
> v1 -> v2:
> - "s390x/tcg: Simplify disassembler operands initialization"
> -- s/simply/simplify/ in description
> - "s390x/tcg: Implement LOAD COUNT TO BLOCK BOUNDARY"
> -- Use bit magic without a helper to calculate the count

Oh, and dropped the "inline" from "s390x/tcg: Factor out
gen_addi_and_wrap_i64() from get_address()"

> 
> David Hildenbrand (7):
>   s390x/tcg: RXE has an optional M3 field
>   s390x/tcg: Simplify disassembler operands initialization
>   s390x/tcg: Clarify terminology in vec_reg_offset()
>   s390x/tcg: Factor out vec_full_reg_offset()
>   s390x/tcg: Factor out gen_addi_and_wrap_i64() from get_address()
>   s390x/tcg: Implement LOAD LENGTHENED short HFP to long HFP
>   s390x/tcg: Implement LOAD COUNT TO BLOCK BOUNDARY
> 
>  target/s390x/cc_helper.c     |  8 +++
>  target/s390x/helper.c        |  1 +
>  target/s390x/insn-data.def   |  4 ++
>  target/s390x/insn-format.def |  2 +-
>  target/s390x/internal.h      |  1 +
>  target/s390x/translate.c     | 94 +++++++++++++++++++++++++-----------
>  6 files changed, 80 insertions(+), 30 deletions(-)
> 


-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] [PATCH v2 0/7] s390x/tcg: Cleanups and refactorings for vector instructions
  2019-02-25 20:03 [Qemu-devel] [PATCH v2 0/7] s390x/tcg: Cleanups and refactorings for vector instructions David Hildenbrand
                   ` (7 preceding siblings ...)
  2019-02-25 20:04 ` [Qemu-devel] [PATCH v2 0/7] s390x/tcg: Cleanups and refactorings for vector instructions David Hildenbrand
@ 2019-02-26  9:15 ` Cornelia Huck
  8 siblings, 0 replies; 13+ messages in thread
From: Cornelia Huck @ 2019-02-26  9:15 UTC (permalink / raw)
  To: David Hildenbrand; +Cc: qemu-devel, qemu-s390x, Thomas Huth, Richard Henderson

On Mon, 25 Feb 2019 21:03:11 +0100
David Hildenbrand <david@redhat.com> wrote:

> Before we start with the real magic, some more cleanups and refactorings.
> This series does not depend on other patches not yet in master.
> 
> Also add a variant of "LOAD LENGTHENED" that is used along with
> vector instructions in Linux (HFP instructions that can be used
> without HFP). Implement "LOAD COUNT TO BLOCK BOUNDARY", which was
> introduced with the vector facility but does not operate on vectors.
> 
> v1 -> v2:
> - "s390x/tcg: Simplify disassembler operands initialization"
> -- s/simply/simplify/ in description
> - "s390x/tcg: Implement LOAD COUNT TO BLOCK BOUNDARY"
> -- Use bit magic without a helper to calculate the count
> 
> David Hildenbrand (7):
>   s390x/tcg: RXE has an optional M3 field
>   s390x/tcg: Simplify disassembler operands initialization
>   s390x/tcg: Clarify terminology in vec_reg_offset()
>   s390x/tcg: Factor out vec_full_reg_offset()
>   s390x/tcg: Factor out gen_addi_and_wrap_i64() from get_address()
>   s390x/tcg: Implement LOAD LENGTHENED short HFP to long HFP
>   s390x/tcg: Implement LOAD COUNT TO BLOCK BOUNDARY
> 
>  target/s390x/cc_helper.c     |  8 +++
>  target/s390x/helper.c        |  1 +
>  target/s390x/insn-data.def   |  4 ++
>  target/s390x/insn-format.def |  2 +-
>  target/s390x/internal.h      |  1 +
>  target/s390x/translate.c     | 94 +++++++++++++++++++++++++-----------
>  6 files changed, 80 insertions(+), 30 deletions(-)
> 

Thanks, applied.

^ permalink raw reply	[flat|nested] 13+ messages in thread
