qemu-devel.nongnu.org archive mirror
* [PATCH v2 0/8] Hexagon (target/hexagon) Improve change-of-flow
@ 2022-10-24 23:51 Taylor Simpson
  2022-10-24 23:51 ` [PATCH v2 1/8] Hexagon (target/hexagon) Only use branch_taken when packet has multi cof Taylor Simpson
                   ` (7 more replies)
  0 siblings, 8 replies; 9+ messages in thread
From: Taylor Simpson @ 2022-10-24 23:51 UTC
  To: qemu-devel
  Cc: tsimpson, richard.henderson, philmd, ale, anjo, bcain,
	quic_mathbern

This patch series improves change-of-flow handling.

Currently, we set the PC to a new address before exiting a TB.  The
ultimate goal is to use direct block chaining.  However, several steps
are needed along the way.

1)
When a packet has more than one change-of-flow (COF) instruction, only
the first one taken is considered.  The runtime bookkeeping is only
needed when there is more than one COF instruction in a packet.

2, 3)
Remove PC and next_PC from the runtime state and always use a
translation-time constant.  Note that next_PC is used by call instructions
to set LR and by conditional COF instructions to set the fall-through
address.

4, 5, 6)
Add helper overrides for COF instructions.  In particular, we must
distinguish those that use a PC-relative address for the destination.
These are candidates for direct block chaining later.

7)
Use direct block chaining for packets that have a single PC-relative
COF instruction.  Instead of generating the code while processing the
instruction, we record the effect in DisasContext and generate the code
during gen_end_tb.

8)
Use direct block chaining for tight loops.  We look for TBs that end
with an endloop0 that will branch back to the TB start address.
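
As a rough illustration of steps 2 and 3, the sketch below models how the
fall-through address becomes a translation-time constant.  It mirrors the
expressions used in gen_start_packet, fREAD_NPC and need_next_PC in the
patches, but it is a Python model rather than the QEMU code: the function
names, the per-instruction attribute sets, and the 32-bit wraparound are
assumptions made for the example.

```python
NPC_MASK = 0xfffffffe       # same mask that fREAD_NPC applies
ADDR_MASK = 0xffffffff      # assumption: 32-bit target addresses


def compute_next_pc(pkt_pc, encod_pkt_size_in_bytes):
    """Fall-through address: packet start plus encoded packet size."""
    return (pkt_pc + encod_pkt_size_in_bytes) & ADDR_MASK


def read_npc(next_pc):
    """Model of fREAD_NPC: clear the low bit of next_PC."""
    return next_pc & NPC_MASK


def need_next_pc(insn_attribs):
    """True if the packet must preload PC with the fall-through address.

    insn_attribs is a list of attribute sets, one per instruction
    (a hypothetical representation chosen for this sketch).
    """
    for attribs in insn_attribs:
        # Conditional change-of-flow may fall through.
        if 'A_CONDEXEC' in attribs and 'A_COF' in attribs:
            return True
        # A hardware loop end may fall through as well.
        if 'A_HWLOOP0_END' in attribs or 'A_HWLOOP1_END' in attribs:
            return True
    return False
```

Under this model, only packets for which need_next_pc returns True have to
write the fall-through address into PC at packet start.  An unconditional
jump always overwrites PC, so the preload can be skipped.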


**** Changes in V2 ****
Simplify test in need_pkt_has_multi_cof
Address feedback from Matheus Tavares Bernardino <quic_mathbern@quicinc.com>
    Rearrange new-value-jump overrides
    Simplify gen_write_new_pc_addr



Taylor Simpson (8):
  Hexagon (target/hexagon) Only use branch_taken when packet has multi
    cof
  Hexagon (target/hexagon) Remove PC from the runtime state
  Hexagon (target/hexagon) Remove next_PC from runtime state
  Hexagon (target/hexagon) Add overrides for direct call instructions
  Hexagon (target/hexagon) Add overrides for compound compare and jump
  Hexagon (target/hexagon) Add overrides for various forms of jump
  Hexagon (target/hexagon) Use direct block chaining for direct
    jump/branch
  Hexagon (target/hexagon) Use direct block chaining for tight loops

 target/hexagon/cpu.h                |  18 +-
 target/hexagon/gen_tcg.h            | 390 ++++++++++++++++++++++++++++
 target/hexagon/insn.h               |   2 +
 target/hexagon/macros.h             |   6 +-
 target/hexagon/translate.h          |   6 +-
 target/hexagon/decode.c             |  15 +-
 target/hexagon/genptr.c             | 260 +++++++++++++++++++
 target/hexagon/op_helper.c          |  28 +-
 target/hexagon/translate.c          | 120 +++++++--
 target/hexagon/gen_helper_funcs.py  |  11 +
 target/hexagon/gen_helper_protos.py |  12 +-
 target/hexagon/gen_tcg_funcs.py     |  11 +
 target/hexagon/hex_common.py        |  29 ++-
 13 files changed, 863 insertions(+), 45 deletions(-)

-- 
2.17.1



* [PATCH v2 1/8] Hexagon (target/hexagon) Only use branch_taken when packet has multi cof
  2022-10-24 23:51 [PATCH v2 0/8] Hexagon (target/hexagon) Improve change-of-flow Taylor Simpson
@ 2022-10-24 23:51 ` Taylor Simpson
  2022-10-24 23:51 ` [PATCH v2 2/8] Hexagon (target/hexagon) Remove PC from the runtime state Taylor Simpson
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: Taylor Simpson @ 2022-10-24 23:51 UTC
  To: qemu-devel
  Cc: tsimpson, richard.henderson, philmd, ale, anjo, bcain,
	quic_mathbern

When a packet has more than one change-of-flow instruction, only the first
one to branch is considered.  We use the branch_taken variable to keep
track of this.

However, when there is a single cof instruction, we don't need the same
amount of bookkeeping.

We add the pkt_has_multi_cof member to the Packet structure, and pass this
information to the needed functions.

When a generated helper function contains a cof instruction, the generator
passes pkt_has_multi_cof to the helper as a runtime argument.
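
The bookkeeping described above can be sketched as a small Python model.
This is not the QEMU code: the Env class and function names are made up
for illustration, and the endloop handling in decode.c is left out.  It
only shows why branch_taken is needed when a packet has more than one cof
instruction, and why it can be skipped otherwise.

```python
from dataclasses import dataclass


@dataclass
class Env:
    """Hypothetical stand-in for the runtime state touched by write_new_pc."""
    next_pc: int = 0
    branch_taken: bool = False


def write_new_pc(env, pkt_has_multi_cof, addr):
    """Model of the patched helper: first taken branch wins."""
    if pkt_has_multi_cof:
        if env.branch_taken:
            return          # an earlier cof in this packet already branched
        env.next_pc = addr
        env.branch_taken = True
    else:
        # Single cof: no need to read or write branch_taken.
        env.next_pc = addr


def cof_flags(can_jump_flags):
    """Model of the decode loop: derive (has_cof, has_multi_cof)."""
    has_cof = False
    has_multi_cof = False
    for can_jump in can_jump_flags:
        if can_jump:
            if has_cof:
                has_multi_cof = True
            has_cof = True
    return has_cof, has_multi_cof
```

For example, two taken branches in a multi-cof packet leave next_pc at the
first target, while a single-cof packet never touches branch_taken.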

Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
---
 target/hexagon/insn.h               |  1 +
 target/hexagon/macros.h             |  2 +-
 target/hexagon/decode.c             | 15 +++++++++++++--
 target/hexagon/op_helper.c          | 24 +++++++++++++++---------
 target/hexagon/translate.c          |  4 +++-
 target/hexagon/gen_helper_funcs.py  |  3 +++
 target/hexagon/gen_helper_protos.py |  6 +++++-
 target/hexagon/gen_tcg_funcs.py     |  5 +++++
 target/hexagon/hex_common.py        |  3 +++
 9 files changed, 49 insertions(+), 14 deletions(-)

diff --git a/target/hexagon/insn.h b/target/hexagon/insn.h
index aa26389147..857a7ceb75 100644
--- a/target/hexagon/insn.h
+++ b/target/hexagon/insn.h
@@ -60,6 +60,7 @@ struct Packet {
 
     /* Pre-decodes about COF */
     bool pkt_has_cof;          /* Has any change-of-flow */
+    bool pkt_has_multi_cof;    /* Has more than one change-of-flow */
     bool pkt_has_endloop;
 
     bool pkt_has_dczeroa;
diff --git a/target/hexagon/macros.h b/target/hexagon/macros.h
index c8805bdaeb..e908405d82 100644
--- a/target/hexagon/macros.h
+++ b/target/hexagon/macros.h
@@ -407,7 +407,7 @@ static inline TCGv gen_read_ireg(TCGv result, TCGv val, int shift)
 
 #define fCHECK_PCALIGN(A)
 
-#define fWRITE_NPC(A) write_new_pc(env, A)
+#define fWRITE_NPC(A) write_new_pc(env, pkt_has_multi_cof != 0, A)
 
 #define fBRANCH(LOC, TYPE)          fWRITE_NPC(LOC)
 #define fJUMPR(REGNO, TARGET, TYPE) fBRANCH(TARGET, COF_TYPE_JUMPR)
diff --git a/target/hexagon/decode.c b/target/hexagon/decode.c
index 6b73b5c60c..041c8de751 100644
--- a/target/hexagon/decode.c
+++ b/target/hexagon/decode.c
@@ -388,6 +388,7 @@ static void decode_set_insn_attr_fields(Packet *pkt)
     uint16_t opcode;
 
     pkt->pkt_has_cof = false;
+    pkt->pkt_has_multi_cof = false;
     pkt->pkt_has_endloop = false;
     pkt->pkt_has_dczeroa = false;
 
@@ -412,13 +413,23 @@ static void decode_set_insn_attr_fields(Packet *pkt)
             }
         }
 
-        pkt->pkt_has_cof |= decode_opcode_can_jump(opcode);
+        if (decode_opcode_can_jump(opcode)) {
+            if (pkt->pkt_has_cof) {
+                pkt->pkt_has_multi_cof = true;
+            }
+            pkt->pkt_has_cof = true;
+        }
 
         pkt->insn[i].is_endloop = decode_opcode_ends_loop(opcode);
 
         pkt->pkt_has_endloop |= pkt->insn[i].is_endloop;
 
-        pkt->pkt_has_cof |= pkt->pkt_has_endloop;
+        if (pkt->pkt_has_endloop) {
+            if (pkt->pkt_has_cof) {
+                pkt->pkt_has_multi_cof = true;
+            }
+            pkt->pkt_has_cof = true;
+        }
     }
 }
 
diff --git a/target/hexagon/op_helper.c b/target/hexagon/op_helper.c
index 085afc3274..84391e25eb 100644
--- a/target/hexagon/op_helper.c
+++ b/target/hexagon/op_helper.c
@@ -104,20 +104,26 @@ static void log_store64(CPUHexagonState *env, target_ulong addr,
     env->mem_log_stores[slot].data64 = val;
 }
 
-static void write_new_pc(CPUHexagonState *env, target_ulong addr)
+static void write_new_pc(CPUHexagonState *env, bool pkt_has_multi_cof,
+                         target_ulong addr)
 {
     HEX_DEBUG_LOG("write_new_pc(0x" TARGET_FMT_lx ")\n", addr);
 
-    /*
-     * If more than one branch is taken in a packet, only the first one
-     * is actually done.
-     */
-    if (env->branch_taken) {
-        HEX_DEBUG_LOG("INFO: multiple branches taken in same packet, "
-                      "ignoring the second one\n");
+    if (pkt_has_multi_cof) {
+        /*
+         * If more than one branch is taken in a packet, only the first one
+         * is actually done.
+         */
+        if (env->branch_taken) {
+            HEX_DEBUG_LOG("INFO: multiple branches taken in same packet, "
+                          "ignoring the second one\n");
+        } else {
+            fCHECK_PCALIGN(addr);
+            env->next_PC = addr;
+            env->branch_taken = 1;
+        }
     } else {
         fCHECK_PCALIGN(addr);
-        env->branch_taken = 1;
         env->next_PC = addr;
     }
 }
diff --git a/target/hexagon/translate.c b/target/hexagon/translate.c
index 2329177537..2e46cc0680 100644
--- a/target/hexagon/translate.c
+++ b/target/hexagon/translate.c
@@ -247,7 +247,9 @@ static void gen_start_packet(DisasContext *ctx, Packet *pkt)
         tcg_gen_movi_tl(hex_slot_cancelled, 0);
     }
     if (pkt->pkt_has_cof) {
-        tcg_gen_movi_tl(hex_branch_taken, 0);
+        if (pkt->pkt_has_multi_cof) {
+            tcg_gen_movi_tl(hex_branch_taken, 0);
+        }
         tcg_gen_movi_tl(hex_next_PC, next_PC);
     }
     if (need_pred_written(pkt)) {
diff --git a/target/hexagon/gen_helper_funcs.py b/target/hexagon/gen_helper_funcs.py
index a446c45384..f7c1a82e9f 100755
--- a/target/hexagon/gen_helper_funcs.py
+++ b/target/hexagon/gen_helper_funcs.py
@@ -238,6 +238,9 @@ def gen_helper_function(f, tag, tagregs, tagimms):
             gen_helper_arg_imm(f,immlett)
             i += 1
 
+        if (hex_common.need_pkt_has_multi_cof(tag)):
+            f.write(", uint32_t pkt_has_multi_cof")
+
         if hex_common.need_slot(tag):
             if i > 0: f.write(", ")
             f.write("uint32_t slot")
diff --git a/target/hexagon/gen_helper_protos.py b/target/hexagon/gen_helper_protos.py
index 3b4e993fd1..4530d7ba8d 100755
--- a/target/hexagon/gen_helper_protos.py
+++ b/target/hexagon/gen_helper_protos.py
@@ -82,6 +82,7 @@ def gen_helper_prototype(f, tag, tagregs, tagimms):
         ## Figure out how many arguments the helper will take
         if (numscalarresults == 0):
             def_helper_size = len(regs)+len(imms)+numscalarreadwrite+1
+            if hex_common.need_pkt_has_multi_cof(tag): def_helper_size += 1
             if hex_common.need_part1(tag): def_helper_size += 1
             if hex_common.need_slot(tag): def_helper_size += 1
             f.write('DEF_HELPER_%s(%s' % (def_helper_size, tag))
@@ -89,6 +90,7 @@ def gen_helper_prototype(f, tag, tagregs, tagimms):
             f.write(', void' )
         else:
             def_helper_size = len(regs)+len(imms)+numscalarreadwrite
+            if hex_common.need_pkt_has_multi_cof(tag): def_helper_size += 1
             if hex_common.need_part1(tag): def_helper_size += 1
             if hex_common.need_slot(tag): def_helper_size += 1
             f.write('DEF_HELPER_%s(%s' % (def_helper_size, tag))
@@ -126,7 +128,9 @@ def gen_helper_prototype(f, tag, tagregs, tagimms):
         for immlett,bits,immshift in imms:
             f.write(", s32")
 
-        ## Add the arguments for the instruction slot and part1 (if needed)
+        ## Add the arguments for the instruction pkt_has_multi_cof, slot and
+        ## part1 (if needed)
+        if hex_common.need_pkt_has_multi_cof(tag): f.write(', i32')
         if hex_common.need_slot(tag): f.write(', i32' )
         if hex_common.need_part1(tag): f.write(' , i32' )
         f.write(')\n')
diff --git a/target/hexagon/gen_tcg_funcs.py b/target/hexagon/gen_tcg_funcs.py
index 6dea02b0b9..67045c80bb 100755
--- a/target/hexagon/gen_tcg_funcs.py
+++ b/target/hexagon/gen_tcg_funcs.py
@@ -615,6 +615,9 @@ def gen_tcg_func(f, tag, regs, imms):
         ## Generate the call to the helper
         for immlett,bits,immshift in imms:
             gen_helper_decl_imm(f,immlett)
+        if hex_common.need_pkt_has_multi_cof(tag):
+            f.write("    TCGv pkt_has_multi_cof = ")
+            f.write("tcg_constant_tl(pkt->pkt_has_multi_cof);\n")
         if hex_common.need_part1(tag):
             f.write("    TCGv part1 = tcg_constant_tl(insn->part1);\n")
         if hex_common.need_slot(tag):
@@ -647,6 +650,8 @@ def gen_tcg_func(f, tag, regs, imms):
         for immlett,bits,immshift in imms:
             gen_helper_call_imm(f,immlett)
 
+        if hex_common.need_pkt_has_multi_cof(tag):
+            f.write(", pkt_has_multi_cof")
         if hex_common.need_slot(tag): f.write(", slot")
         if hex_common.need_part1(tag): f.write(", part1" )
         f.write(");\n")
diff --git a/target/hexagon/hex_common.py b/target/hexagon/hex_common.py
index d9ba7df786..f5b58501db 100755
--- a/target/hexagon/hex_common.py
+++ b/target/hexagon/hex_common.py
@@ -207,6 +207,9 @@ def need_part1(tag):
 def need_ea(tag):
     return re.compile(r"\bEA\b").search(semdict[tag])
 
+def need_pkt_has_multi_cof(tag):
+    return 'A_COF' in attribdict[tag]
+
 def skip_qemu_helper(tag):
     return tag in overrides.keys()
 
-- 
2.17.1



* [PATCH v2 2/8] Hexagon (target/hexagon) Remove PC from the runtime state
  2022-10-24 23:51 [PATCH v2 0/8] Hexagon (target/hexagon) Improve change-of-flow Taylor Simpson
  2022-10-24 23:51 ` [PATCH v2 1/8] Hexagon (target/hexagon) Only use branch_taken when packet has multi cof Taylor Simpson
@ 2022-10-24 23:51 ` Taylor Simpson
  2022-10-24 23:51 ` [PATCH v2 3/8] Hexagon (target/hexagon) Remove next_PC from " Taylor Simpson
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: Taylor Simpson @ 2022-10-24 23:51 UTC
  To: qemu-devel
  Cc: tsimpson, richard.henderson, philmd, ale, anjo, bcain,
	quic_mathbern

Add pc field to Packet structure
For helpers that need PC, pass an extra argument
Remove slot arg from conditional jump helpers
On a trap0, copy pkt->pc into hex_gpr[HEX_REG_PC]
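
To illustrate which extra arguments a helper ends up with after this
change, here is a minimal Python sketch of the predicates in
hex_common.py.  It takes an attribute set directly instead of looking up
attribdict[tag], the attribute sets in the usage note are hypothetical,
and the part1 argument is omitted.

```python
def need_slot(attribs):
    """Conditional jumps no longer take a slot argument."""
    cond_non_jump = 'A_CONDEXEC' in attribs and 'A_JUMP' not in attribs
    return cond_non_jump or 'A_STORE' in attribs or 'A_LOAD' in attribs


def need_pc(attribs):
    """Helpers that read the PC receive it as an argument."""
    return 'A_IMPLICIT_READS_PC' in attribs


def need_pkt_has_multi_cof(attribs):
    return 'A_COF' in attribs


def extra_helper_args(attribs):
    """Extra helper arguments, in the order the generators emit them."""
    args = []
    if need_pkt_has_multi_cof(attribs):
        args.append('pkt_has_multi_cof')
    if need_pc(attribs):
        args.append('PC')
    if need_slot(attribs):
        args.append('slot')
    return args
```

With these definitions, a conditional jump that reads the PC gets
pkt_has_multi_cof and PC but no slot, while a conditional non-jump
instruction still gets slot.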

Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
---
 target/hexagon/gen_tcg.h            | 7 +++++++
 target/hexagon/insn.h               | 1 +
 target/hexagon/macros.h             | 2 +-
 target/hexagon/translate.c          | 9 +--------
 target/hexagon/gen_helper_funcs.py  | 4 ++++
 target/hexagon/gen_helper_protos.py | 3 +++
 target/hexagon/gen_tcg_funcs.py     | 3 +++
 target/hexagon/hex_common.py        | 6 +++++-
 8 files changed, 25 insertions(+), 10 deletions(-)

diff --git a/target/hexagon/gen_tcg.h b/target/hexagon/gen_tcg.h
index 50634ac459..7f0ba27eb6 100644
--- a/target/hexagon/gen_tcg.h
+++ b/target/hexagon/gen_tcg.h
@@ -742,4 +742,11 @@
         RsV = RsV; \
     } while (0)
 
+#define fGEN_TCG_J2_trap0(SHORTCODE) \
+    do { \
+        uiV = uiV; \
+        tcg_gen_movi_tl(hex_gpr[HEX_REG_PC], pkt->pc); \
+        TCGv excp = tcg_constant_tl(HEX_EXCP_TRAP0); \
+        gen_helper_raise_exception(cpu_env, excp); \
+    } while (0)
 #endif
diff --git a/target/hexagon/insn.h b/target/hexagon/insn.h
index 857a7ceb75..b3260d1f0b 100644
--- a/target/hexagon/insn.h
+++ b/target/hexagon/insn.h
@@ -57,6 +57,7 @@ typedef struct Instruction Insn;
 struct Packet {
     uint16_t num_insns;
     uint16_t encod_pkt_size_in_bytes;
+    uint32_t pc;
 
     /* Pre-decodes about COF */
     bool pkt_has_cof;          /* Has any change-of-flow */
diff --git a/target/hexagon/macros.h b/target/hexagon/macros.h
index e908405d82..469dfa5571 100644
--- a/target/hexagon/macros.h
+++ b/target/hexagon/macros.h
@@ -398,7 +398,7 @@ static inline TCGv gen_read_ireg(TCGv result, TCGv val, int shift)
 #else
 #define fREAD_GP() READ_REG(HEX_REG_GP)
 #endif
-#define fREAD_PC() (READ_REG(HEX_REG_PC))
+#define fREAD_PC() (PC)
 
 #define fREAD_NPC() (env->next_PC & (0xfffffffe))
 
diff --git a/target/hexagon/translate.c b/target/hexagon/translate.c
index 2e46cc0680..fd4f0efa26 100644
--- a/target/hexagon/translate.c
+++ b/target/hexagon/translate.c
@@ -194,11 +194,6 @@ static bool check_for_attrib(Packet *pkt, int attrib)
     return false;
 }
 
-static bool need_pc(Packet *pkt)
-{
-    return check_for_attrib(pkt, A_IMPLICIT_READS_PC);
-}
-
 static bool need_slot_cancelled(Packet *pkt)
 {
     return check_for_attrib(pkt, A_CONDEXEC);
@@ -240,9 +235,6 @@ static void gen_start_packet(DisasContext *ctx, Packet *pkt)
     }
 
     /* Initialize the runtime state for packet semantics */
-    if (need_pc(pkt)) {
-        tcg_gen_movi_tl(hex_gpr[HEX_REG_PC], ctx->base.pc_next);
-    }
     if (need_slot_cancelled(pkt)) {
         tcg_gen_movi_tl(hex_slot_cancelled, 0);
     }
@@ -768,6 +760,7 @@ static void decode_and_translate_packet(CPUHexagonState *env, DisasContext *ctx)
     }
 
     if (decode_packet(nwords, words, &pkt, false) > 0) {
+        pkt.pc = ctx->base.pc_next;
         HEX_DEBUG_PRINT_PKT(&pkt);
         gen_start_packet(ctx, &pkt);
         for (i = 0; i < pkt.num_insns; i++) {
diff --git a/target/hexagon/gen_helper_funcs.py b/target/hexagon/gen_helper_funcs.py
index f7c1a82e9f..8ab144b20a 100755
--- a/target/hexagon/gen_helper_funcs.py
+++ b/target/hexagon/gen_helper_funcs.py
@@ -241,6 +241,10 @@ def gen_helper_function(f, tag, tagregs, tagimms):
         if (hex_common.need_pkt_has_multi_cof(tag)):
             f.write(", uint32_t pkt_has_multi_cof")
 
+        if hex_common.need_PC(tag):
+            if i > 0: f.write(", ")
+            f.write("target_ulong PC")
+            i += 1
         if hex_common.need_slot(tag):
             if i > 0: f.write(", ")
             f.write("uint32_t slot")
diff --git a/target/hexagon/gen_helper_protos.py b/target/hexagon/gen_helper_protos.py
index 4530d7ba8d..2385717dda 100755
--- a/target/hexagon/gen_helper_protos.py
+++ b/target/hexagon/gen_helper_protos.py
@@ -85,6 +85,7 @@ def gen_helper_prototype(f, tag, tagregs, tagimms):
             if hex_common.need_pkt_has_multi_cof(tag): def_helper_size += 1
             if hex_common.need_part1(tag): def_helper_size += 1
             if hex_common.need_slot(tag): def_helper_size += 1
+            if hex_common.need_PC(tag): def_helper_size += 1
             f.write('DEF_HELPER_%s(%s' % (def_helper_size, tag))
             ## The return type is void
             f.write(', void' )
@@ -93,6 +94,7 @@ def gen_helper_prototype(f, tag, tagregs, tagimms):
             if hex_common.need_pkt_has_multi_cof(tag): def_helper_size += 1
             if hex_common.need_part1(tag): def_helper_size += 1
             if hex_common.need_slot(tag): def_helper_size += 1
+            if hex_common.need_PC(tag): def_helper_size += 1
             f.write('DEF_HELPER_%s(%s' % (def_helper_size, tag))
 
         ## Generate the qemu DEF_HELPER type for each result
@@ -131,6 +133,7 @@ def gen_helper_prototype(f, tag, tagregs, tagimms):
         ## Add the arguments for the instruction pkt_has_multi_cof, slot and
         ## part1 (if needed)
         if hex_common.need_pkt_has_multi_cof(tag): f.write(', i32')
+        if hex_common.need_PC(tag): f.write(', i32')
         if hex_common.need_slot(tag): f.write(', i32' )
         if hex_common.need_part1(tag): f.write(' , i32' )
         f.write(')\n')
diff --git a/target/hexagon/gen_tcg_funcs.py b/target/hexagon/gen_tcg_funcs.py
index 67045c80bb..2225bb08da 100755
--- a/target/hexagon/gen_tcg_funcs.py
+++ b/target/hexagon/gen_tcg_funcs.py
@@ -622,6 +622,8 @@ def gen_tcg_func(f, tag, regs, imms):
             f.write("    TCGv part1 = tcg_constant_tl(insn->part1);\n")
         if hex_common.need_slot(tag):
             f.write("    TCGv slot = tcg_constant_tl(insn->slot);\n")
+        if hex_common.need_PC(tag):
+            f.write("    TCGv PC = tcg_constant_tl(pkt->pc);\n")
         f.write("    gen_helper_%s(" % (tag))
         i=0
         ## If there is a scalar result, it is the return type
@@ -652,6 +654,7 @@ def gen_tcg_func(f, tag, regs, imms):
 
         if hex_common.need_pkt_has_multi_cof(tag):
             f.write(", pkt_has_multi_cof")
+        if hex_common.need_PC(tag): f.write(", PC")
         if hex_common.need_slot(tag): f.write(", slot")
         if hex_common.need_part1(tag): f.write(", part1" )
         f.write(");\n")
diff --git a/target/hexagon/hex_common.py b/target/hexagon/hex_common.py
index f5b58501db..cfe5fe7b35 100755
--- a/target/hexagon/hex_common.py
+++ b/target/hexagon/hex_common.py
@@ -194,7 +194,8 @@ def is_new_val(regtype, regid, tag):
     return regtype+regid+'N' in semdict[tag]
 
 def need_slot(tag):
-    if ('A_CONDEXEC' in attribdict[tag] or
+    if (('A_CONDEXEC' in attribdict[tag] and
+         'A_JUMP' not in attribdict[tag]) or
         'A_STORE' in attribdict[tag] or
         'A_LOAD' in attribdict[tag]):
         return 1
@@ -207,6 +208,9 @@ def need_part1(tag):
 def need_ea(tag):
     return re.compile(r"\bEA\b").search(semdict[tag])
 
+def need_PC(tag):
+    return 'A_IMPLICIT_READS_PC' in attribdict[tag]
+
 def need_pkt_has_multi_cof(tag):
     return 'A_COF' in attribdict[tag]
 
-- 
2.17.1



* [PATCH v2 3/8] Hexagon (target/hexagon) Remove next_PC from runtime state
  2022-10-24 23:51 [PATCH v2 0/8] Hexagon (target/hexagon) Improve change-of-flow Taylor Simpson
  2022-10-24 23:51 ` [PATCH v2 1/8] Hexagon (target/hexagon) Only use branch_taken when packet has multi cof Taylor Simpson
  2022-10-24 23:51 ` [PATCH v2 2/8] Hexagon (target/hexagon) Remove PC from the runtime state Taylor Simpson
@ 2022-10-24 23:51 ` Taylor Simpson
  2022-10-24 23:51 ` [PATCH v2 4/8] Hexagon (target/hexagon) Add overrides for direct call instructions Taylor Simpson
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: Taylor Simpson @ 2022-10-24 23:51 UTC
  To: qemu-devel
  Cc: tsimpson, richard.henderson, philmd, ale, anjo, bcain,
	quic_mathbern

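Remove next_PC from CPUHexagonState and track it in DisasContext as a
translation-time constant.  Helpers for call instructions receive it as
an extra argument.

The imported files don't properly mark all CONDEXEC instructions, so
we add some logic to hex_common.py to add the attribute.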
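
The attribute fix-up can be sketched in Python as follows.  The regular
expressions and the J2_rte / hardware-loop exclusions are the ones used in
hex_common.py in this patch; the function signatures (which take the
semantics string and attribute set directly) and the semantics strings in
the usage note are simplified assumptions, not text from the imported
files.

```python
import re

COND_JUMP_RE = re.compile(r"(if.*fBRANCH)|(if.*fJUMPR)")
COND_CALL_RE = re.compile(r"(if.*fCALL)")


def is_cond_jump(tag, semantics, attribs):
    """A jump guarded by an 'if', excluding rte and hardware loop ends."""
    if tag == 'J2_rte':
        return False
    if 'A_HWLOOP0_END' in attribs or 'A_HWLOOP1_END' in attribs:
        return False
    return COND_JUMP_RE.search(semantics) is not None


def is_cond_call(semantics):
    """A call guarded by an 'if'."""
    return COND_CALL_RE.search(semantics) is not None


def mark_condexec(tag, semantics, attribs):
    """Return a copy of attribs with A_CONDEXEC added where it applies."""
    result = set(attribs)
    if is_cond_jump(tag, semantics, attribs) or is_cond_call(semantics):
        result.add('A_CONDEXEC')
    return result
```

For instance, a semantics string such as "if (pred) { fBRANCH(addr, T); }"
gains A_CONDEXEC, whereas an unguarded "fBRANCH(addr, T);" is left
unchanged.  Because the match is textual, it depends on the 'if' appearing
before the macro on the same line of the semantics string.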

Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
---
 target/hexagon/cpu.h                |  1 -
 target/hexagon/gen_tcg.h            |  6 ++++++
 target/hexagon/macros.h             |  2 +-
 target/hexagon/translate.h          |  2 +-
 target/hexagon/op_helper.c          |  6 +++---
 target/hexagon/translate.c          | 27 +++++++++++++++++++++------
 target/hexagon/gen_helper_funcs.py  |  4 ++++
 target/hexagon/gen_helper_protos.py |  3 +++
 target/hexagon/gen_tcg_funcs.py     |  3 +++
 target/hexagon/hex_common.py        | 20 ++++++++++++++++++++
 10 files changed, 62 insertions(+), 12 deletions(-)

diff --git a/target/hexagon/cpu.h b/target/hexagon/cpu.h
index 2a65a57bab..ff8c26272d 100644
--- a/target/hexagon/cpu.h
+++ b/target/hexagon/cpu.h
@@ -78,7 +78,6 @@ typedef struct CPUArchState {
     target_ulong gpr[TOTAL_PER_THREAD_REGS];
     target_ulong pred[NUM_PREGS];
     target_ulong branch_taken;
-    target_ulong next_PC;
 
     /* For comparing with LLDB on target - see adjust_stack_ptrs function */
     target_ulong last_pc_dumped;
diff --git a/target/hexagon/gen_tcg.h b/target/hexagon/gen_tcg.h
index 7f0ba27eb6..e6fc7d97d2 100644
--- a/target/hexagon/gen_tcg.h
+++ b/target/hexagon/gen_tcg.h
@@ -612,6 +612,12 @@
         tcg_temp_free(tmp); \
     } while (0)
 
+#define fGEN_TCG_J2_pause(SHORTCODE) \
+    do { \
+        uiV = uiV; \
+        tcg_gen_movi_tl(hex_gpr[HEX_REG_PC], ctx->next_PC); \
+    } while (0)
+
 /* Floating point */
 #define fGEN_TCG_F2_conv_sf2df(SHORTCODE) \
     gen_helper_conv_sf2df(RddV, cpu_env, RsV)
diff --git a/target/hexagon/macros.h b/target/hexagon/macros.h
index 469dfa5571..2fc549c37e 100644
--- a/target/hexagon/macros.h
+++ b/target/hexagon/macros.h
@@ -400,7 +400,7 @@ static inline TCGv gen_read_ireg(TCGv result, TCGv val, int shift)
 #endif
 #define fREAD_PC() (PC)
 
-#define fREAD_NPC() (env->next_PC & (0xfffffffe))
+#define fREAD_NPC() (next_PC & (0xfffffffe))
 
 #define fREAD_P0() (READ_PREG(0))
 #define fREAD_P3() (READ_PREG(3))
diff --git a/target/hexagon/translate.h b/target/hexagon/translate.h
index a245172827..eae358cf33 100644
--- a/target/hexagon/translate.h
+++ b/target/hexagon/translate.h
@@ -27,6 +27,7 @@
 
 typedef struct DisasContext {
     DisasContextBase base;
+    uint32_t next_PC;
     uint32_t mem_idx;
     uint32_t num_packets;
     uint32_t num_insns;
@@ -125,7 +126,6 @@ static inline void ctx_log_qreg_write(DisasContext *ctx,
 
 extern TCGv hex_gpr[TOTAL_PER_THREAD_REGS];
 extern TCGv hex_pred[NUM_PREGS];
-extern TCGv hex_next_PC;
 extern TCGv hex_this_PC;
 extern TCGv hex_slot_cancelled;
 extern TCGv hex_branch_taken;
diff --git a/target/hexagon/op_helper.c b/target/hexagon/op_helper.c
index 84391e25eb..aad0195eb6 100644
--- a/target/hexagon/op_helper.c
+++ b/target/hexagon/op_helper.c
@@ -119,12 +119,12 @@ static void write_new_pc(CPUHexagonState *env, bool pkt_has_multi_cof,
                           "ignoring the second one\n");
         } else {
             fCHECK_PCALIGN(addr);
-            env->next_PC = addr;
+            env->gpr[HEX_REG_PC] = addr;
             env->branch_taken = 1;
         }
     } else {
         fCHECK_PCALIGN(addr);
-        env->next_PC = addr;
+        env->gpr[HEX_REG_PC] = addr;
     }
 }
 
@@ -299,7 +299,7 @@ void HELPER(debug_commit_end)(CPUHexagonState *env, int has_st0, int has_st1)
         }
     }
 
-    HEX_DEBUG_LOG("Next PC = " TARGET_FMT_lx "\n", env->next_PC);
+    HEX_DEBUG_LOG("Next PC = " TARGET_FMT_lx "\n", env->gpr[HEX_REG_PC]);
     HEX_DEBUG_LOG("Exec counters: pkt = " TARGET_FMT_lx
                   ", insn = " TARGET_FMT_lx
                   ", hvx = " TARGET_FMT_lx "\n",
diff --git a/target/hexagon/translate.c b/target/hexagon/translate.c
index fd4f0efa26..71ad2da682 100644
--- a/target/hexagon/translate.c
+++ b/target/hexagon/translate.c
@@ -31,7 +31,6 @@
 
 TCGv hex_gpr[TOTAL_PER_THREAD_REGS];
 TCGv hex_pred[NUM_PREGS];
-TCGv hex_next_PC;
 TCGv hex_this_PC;
 TCGv hex_slot_cancelled;
 TCGv hex_branch_taken;
@@ -120,7 +119,6 @@ static void gen_exec_counters(DisasContext *ctx)
 static void gen_end_tb(DisasContext *ctx)
 {
     gen_exec_counters(ctx);
-    tcg_gen_mov_tl(hex_gpr[HEX_REG_PC], hex_next_PC);
     tcg_gen_exit_tb(NULL, 0);
     ctx->base.is_jmp = DISAS_NORETURN;
 }
@@ -128,7 +126,7 @@ static void gen_end_tb(DisasContext *ctx)
 static void gen_exception_end_tb(DisasContext *ctx, int excp)
 {
     gen_exec_counters(ctx);
-    tcg_gen_mov_tl(hex_gpr[HEX_REG_PC], hex_next_PC);
+    tcg_gen_movi_tl(hex_gpr[HEX_REG_PC], ctx->next_PC);
     gen_exception_raw(excp);
     ctx->base.is_jmp = DISAS_NORETURN;
 
@@ -204,12 +202,29 @@ static bool need_pred_written(Packet *pkt)
     return check_for_attrib(pkt, A_WRITES_PRED_REG);
 }
 
+static bool need_next_PC(Packet *pkt)
+{
+    /* Check for conditional control flow or HW loop end */
+    for (int i = 0; i < pkt->num_insns; i++) {
+        uint16_t opcode = pkt->insn[i].opcode;
+        if (GET_ATTRIB(opcode, A_CONDEXEC) && GET_ATTRIB(opcode, A_COF)) {
+            return true;
+        }
+        if (GET_ATTRIB(opcode, A_HWLOOP0_END) ||
+            GET_ATTRIB(opcode, A_HWLOOP1_END)) {
+            return true;
+        }
+    }
+    return false;
+}
+
 static void gen_start_packet(DisasContext *ctx, Packet *pkt)
 {
     target_ulong next_PC = ctx->base.pc_next + pkt->encod_pkt_size_in_bytes;
     int i;
 
     /* Clear out the disassembly context */
+    ctx->next_PC = next_PC;
     ctx->reg_log_idx = 0;
     bitmap_zero(ctx->regs_written, TOTAL_PER_THREAD_REGS);
     ctx->preg_log_idx = 0;
@@ -242,7 +257,9 @@ static void gen_start_packet(DisasContext *ctx, Packet *pkt)
         if (pkt->pkt_has_multi_cof) {
             tcg_gen_movi_tl(hex_branch_taken, 0);
         }
-        tcg_gen_movi_tl(hex_next_PC, next_PC);
+        if (need_next_PC(pkt)) {
+            tcg_gen_movi_tl(hex_gpr[HEX_REG_PC], next_PC);
+        }
     }
     if (need_pred_written(pkt)) {
         tcg_gen_movi_tl(hex_pred_written, 0);
@@ -930,8 +947,6 @@ void hexagon_translate_init(void)
     }
     hex_pred_written = tcg_global_mem_new(cpu_env,
         offsetof(CPUHexagonState, pred_written), "pred_written");
-    hex_next_PC = tcg_global_mem_new(cpu_env,
-        offsetof(CPUHexagonState, next_PC), "next_PC");
     hex_this_PC = tcg_global_mem_new(cpu_env,
         offsetof(CPUHexagonState, this_PC), "this_PC");
     hex_slot_cancelled = tcg_global_mem_new(cpu_env,
diff --git a/target/hexagon/gen_helper_funcs.py b/target/hexagon/gen_helper_funcs.py
index 8ab144b20a..00fc4471e2 100755
--- a/target/hexagon/gen_helper_funcs.py
+++ b/target/hexagon/gen_helper_funcs.py
@@ -245,6 +245,10 @@ def gen_helper_function(f, tag, tagregs, tagimms):
             if i > 0: f.write(", ")
             f.write("target_ulong PC")
             i += 1
+        if hex_common.helper_needs_next_PC(tag):
+            if i > 0: f.write(", ")
+            f.write("target_ulong next_PC")
+            i += 1
         if hex_common.need_slot(tag):
             if i > 0: f.write(", ")
             f.write("uint32_t slot")
diff --git a/target/hexagon/gen_helper_protos.py b/target/hexagon/gen_helper_protos.py
index 2385717dda..ada9302be1 100755
--- a/target/hexagon/gen_helper_protos.py
+++ b/target/hexagon/gen_helper_protos.py
@@ -86,6 +86,7 @@ def gen_helper_prototype(f, tag, tagregs, tagimms):
             if hex_common.need_part1(tag): def_helper_size += 1
             if hex_common.need_slot(tag): def_helper_size += 1
             if hex_common.need_PC(tag): def_helper_size += 1
+            if hex_common.helper_needs_next_PC(tag): def_helper_size += 1
             f.write('DEF_HELPER_%s(%s' % (def_helper_size, tag))
             ## The return type is void
             f.write(', void' )
@@ -95,6 +96,7 @@ def gen_helper_prototype(f, tag, tagregs, tagimms):
             if hex_common.need_part1(tag): def_helper_size += 1
             if hex_common.need_slot(tag): def_helper_size += 1
             if hex_common.need_PC(tag): def_helper_size += 1
+            if hex_common.helper_needs_next_PC(tag): def_helper_size += 1
             f.write('DEF_HELPER_%s(%s' % (def_helper_size, tag))
 
         ## Generate the qemu DEF_HELPER type for each result
@@ -134,6 +136,7 @@ def gen_helper_prototype(f, tag, tagregs, tagimms):
         ## part1 (if needed)
         if hex_common.need_pkt_has_multi_cof(tag): f.write(', i32')
         if hex_common.need_PC(tag): f.write(', i32')
+        if hex_common.helper_needs_next_PC(tag): f.write(', i32')
         if hex_common.need_slot(tag): f.write(', i32' )
         if hex_common.need_part1(tag): f.write(' , i32' )
         f.write(')\n')
diff --git a/target/hexagon/gen_tcg_funcs.py b/target/hexagon/gen_tcg_funcs.py
index 2225bb08da..699dd605fa 100755
--- a/target/hexagon/gen_tcg_funcs.py
+++ b/target/hexagon/gen_tcg_funcs.py
@@ -624,6 +624,8 @@ def gen_tcg_func(f, tag, regs, imms):
             f.write("    TCGv slot = tcg_constant_tl(insn->slot);\n")
         if hex_common.need_PC(tag):
             f.write("    TCGv PC = tcg_constant_tl(pkt->pc);\n")
+        if hex_common.helper_needs_next_PC(tag):
+            f.write("    TCGv next_PC = tcg_constant_tl(ctx->next_PC);\n")
         f.write("    gen_helper_%s(" % (tag))
         i=0
         ## If there is a scalar result, it is the return type
@@ -655,6 +657,7 @@ def gen_tcg_func(f, tag, regs, imms):
         if hex_common.need_pkt_has_multi_cof(tag):
             f.write(", pkt_has_multi_cof")
         if hex_common.need_PC(tag): f.write(", PC")
+        if hex_common.helper_needs_next_PC(tag): f.write(", next_PC")
         if hex_common.need_slot(tag): f.write(", slot")
         if hex_common.need_part1(tag): f.write(", part1" )
         f.write(");\n")
diff --git a/target/hexagon/hex_common.py b/target/hexagon/hex_common.py
index cfe5fe7b35..da8e75fbc7 100755
--- a/target/hexagon/hex_common.py
+++ b/target/hexagon/hex_common.py
@@ -66,6 +66,18 @@ def add_qemu_macro_attrib(name, attrib):
     macros[name].attribs.add(attrib)
 
 immextre = re.compile(r'f(MUST_)?IMMEXT[(]([UuSsRr])')
+
+def is_cond_jump(tag):
+    if tag == 'J2_rte':
+        return False
+    if ('A_HWLOOP0_END' in attribdict[tag] or
+        'A_HWLOOP1_END' in attribdict[tag]):
+        return False
+    return re.compile(r"(if.*fBRANCH)|(if.*fJUMPR)").search(semdict[tag])
+
+def is_cond_call(tag):
+    return re.compile(r"(if.*fCALL)").search(semdict[tag])
+
 def calculate_attribs():
     add_qemu_macro_attrib('fREAD_PC', 'A_IMPLICIT_READS_PC')
     add_qemu_macro_attrib('fTRAP', 'A_IMPLICIT_READS_PC')
@@ -96,6 +108,11 @@ def calculate_attribs():
         for regtype, regid, toss, numregs in regs:
             if regtype == "P" and is_written(regid):
                 attribdict[tag].add('A_WRITES_PRED_REG')
+    # Mark conditional jumps and calls
+    #     Not all instructions are properly marked with A_CONDEXEC
+    for tag in tags:
+        if is_cond_jump(tag) or is_cond_call(tag):
+            attribdict[tag].add('A_CONDEXEC')
 
 def SEMANTICS(tag, beh, sem):
     #print tag,beh,sem
@@ -211,6 +228,9 @@ def need_ea(tag):
 def need_PC(tag):
     return 'A_IMPLICIT_READS_PC' in attribdict[tag]
 
+def helper_needs_next_PC(tag):
+    return 'A_CALL' in attribdict[tag]
+
 def need_pkt_has_multi_cof(tag):
     return 'A_COF' in attribdict[tag]
 
-- 
2.17.1



* [PATCH v2 4/8] Hexagon (target/hexagon) Add overrides for direct call instructions
  2022-10-24 23:51 [PATCH v2 0/8] Hexagon (target/hexagon) Improve change-of-flow Taylor Simpson
                   ` (2 preceding siblings ...)
  2022-10-24 23:51 ` [PATCH v2 3/8] Hexagon (target/hexagon) Remove next_PC from " Taylor Simpson
@ 2022-10-24 23:51 ` Taylor Simpson
  2022-10-24 23:51 ` [PATCH v2 5/8] Hexagon (target/hexagon) Add overrides for compound compare and jump Taylor Simpson
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: Taylor Simpson @ 2022-10-24 23:51 UTC (permalink / raw)
  To: qemu-devel
  Cc: tsimpson, richard.henderson, philmd, ale, anjo, bcain,
	quic_mathbern

Add overrides for
    J2_call
    J2_callt
    J2_callf
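
The intended semantics of the three overrides can be summarized with a small
Python model.  This is only an illustrative sketch, not code from the patch:
model_call and its parameters are hypothetical names, a 32-bit target is
assumed, and the packet is assumed to contain a single change-of-flow
instruction (so the hex_branch_taken bookkeeping is left out).  It returns
the values that the generated TCG would write to PC and LR.

```python
from typing import Optional, Tuple

MASK32 = 0xFFFFFFFF  # assumption: 32-bit target_ulong


def model_call(pc: int, pkt_size: int, pc_off: int,
               pred: Optional[int] = None,
               sense: bool = True) -> Tuple[Optional[int], Optional[int]]:
    """Return (new_pc, new_lr); None means the register is not written.

    pred=None models J2_call (unconditional).
    sense=True models J2_callt, sense=False models J2_callf.
    """
    if pred is not None:
        lsb = pred & 1          # only the least significant bit is tested
        if not sense:
            lsb ^= 1            # callf: call when the predicate is false
        if lsb == 0:
            return None, None   # call not taken: PC and LR are untouched
    next_pc = (pc + pkt_size) & MASK32  # return address goes to LR
    dest = (pc + pc_off) & MASK32       # PC-relative destination
    return dest, next_pc
```

For example, model_call(0x1000, 8, 0x40) gives (0x1040, 0x1008): the call
targets pc + offset and LR receives the address of the next packet.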

Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
---
 target/hexagon/gen_tcg.h |  8 ++++++
 target/hexagon/genptr.c  | 58 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 66 insertions(+)

diff --git a/target/hexagon/gen_tcg.h b/target/hexagon/gen_tcg.h
index e6fc7d97d2..ad149adbe1 100644
--- a/target/hexagon/gen_tcg.h
+++ b/target/hexagon/gen_tcg.h
@@ -612,6 +612,14 @@
         tcg_temp_free(tmp); \
     } while (0)
 
+#define fGEN_TCG_J2_call(SHORTCODE) \
+    gen_call(ctx, pkt, riV)
+
+#define fGEN_TCG_J2_callt(SHORTCODE) \
+    gen_cond_call(ctx, pkt, PuV, true, riV)
+#define fGEN_TCG_J2_callf(SHORTCODE) \
+    gen_cond_call(ctx, pkt, PuV, false, riV)
+
 #define fGEN_TCG_J2_pause(SHORTCODE) \
     do { \
         uiV = uiV; \
diff --git a/target/hexagon/genptr.c b/target/hexagon/genptr.c
index 806d0974ff..2784b84041 100644
--- a/target/hexagon/genptr.c
+++ b/target/hexagon/genptr.c
@@ -456,6 +456,64 @@ static TCGv gen_8bitsof(TCGv result, TCGv value)
     return result;
 }
 
+static void gen_write_new_pc_addr(DisasContext *ctx, Packet *pkt,
+                                  TCGv addr, TCGv pred)
+{
+    TCGLabel *pred_false = NULL;
+    if (pred != NULL) {
+        pred_false = gen_new_label();
+        tcg_gen_brcondi_tl(TCG_COND_EQ, pred, 0, pred_false);
+    }
+
+    if (pkt->pkt_has_multi_cof) {
+        /* If there are multiple branches in a packet, ignore the second one */
+        tcg_gen_movcond_tl(TCG_COND_NE, hex_gpr[HEX_REG_PC],
+                           hex_branch_taken, tcg_constant_tl(0),
+                           hex_gpr[HEX_REG_PC], addr);
+        tcg_gen_movi_tl(hex_branch_taken, 1);
+    } else {
+        tcg_gen_mov_tl(hex_gpr[HEX_REG_PC], addr);
+    }
+
+    if (pred != NULL) {
+        gen_set_label(pred_false);
+    }
+}
+
+static void gen_write_new_pc_pcrel(DisasContext *ctx, Packet *pkt,
+                                   int pc_off, TCGv pred)
+{
+    target_ulong dest = pkt->pc + pc_off;
+    gen_write_new_pc_addr(ctx, pkt, tcg_constant_tl(dest), pred);
+}
+
+static void gen_call(DisasContext *ctx, Packet *pkt, int pc_off)
+{
+    TCGv next_PC =
+        tcg_constant_tl(pkt->pc + pkt->encod_pkt_size_in_bytes);
+    gen_log_reg_write(HEX_REG_LR, next_PC);
+    gen_write_new_pc_pcrel(ctx, pkt, pc_off, NULL);
+}
+
+static void gen_cond_call(DisasContext *ctx, Packet *pkt,
+                          TCGv pred, bool sense, int pc_off)
+{
+    TCGv next_PC;
+    TCGv lsb = tcg_temp_local_new();
+    TCGLabel *skip = gen_new_label();
+    tcg_gen_andi_tl(lsb, pred, 1);
+    if (!sense) {
+        tcg_gen_xori_tl(lsb, lsb, 1);
+    }
+    gen_write_new_pc_pcrel(ctx, pkt, pc_off, lsb);
+    tcg_gen_brcondi_tl(TCG_COND_EQ, lsb, 0, skip);
+    tcg_temp_free(lsb);
+    next_PC =
+        tcg_constant_tl(pkt->pc + pkt->encod_pkt_size_in_bytes);
+    gen_log_reg_write(HEX_REG_LR, next_PC);
+    gen_set_label(skip);
+}
+
 static intptr_t vreg_src_off(DisasContext *ctx, int num)
 {
     intptr_t offset = offsetof(CPUHexagonState, VRegs[num]);
-- 
2.17.1



* [PATCH v2 5/8] Hexagon (target/hexagon) Add overrides for compound compare and jump
  2022-10-24 23:51 [PATCH v2 0/8] Hexagon (target/hexagon) Improve change-of-flow Taylor Simpson
                   ` (3 preceding siblings ...)
  2022-10-24 23:51 ` [PATCH v2 4/8] Hexagon (target/hexagon) Add overrides for direct call instructions Taylor Simpson
@ 2022-10-24 23:51 ` Taylor Simpson
  2022-10-24 23:51 ` [PATCH v2 6/8] Hexagon (target/hexagon) Add overrides for various forms of jump Taylor Simpson
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: Taylor Simpson @ 2022-10-24 23:51 UTC (permalink / raw)
  To: qemu-devel
  Cc: tsimpson, richard.henderson, philmd, ale, anjo, bcain,
	quic_mathbern

Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
---
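Note for reviewers: the primer comment added to gen_tcg.h below describes the
tag names; the two-part behaviour itself can be modelled as follows.  This is
a hedged Python sketch, not code from the patch.  The names compare and
compound_cmp_jump are hypothetical, a 32-bit target is assumed, and both
parts of the compound instruction are folded into one function even though
the translator handles them separately (insn->part1 and the jump part).

```python
from typing import Optional, Tuple

MASK32 = 0xFFFFFFFF  # assumption: 32-bit registers


def to_signed32(x: int) -> int:
    """Interpret the low 32 bits of x as a signed value."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def compare(cond: str, a: int, b: int) -> int:
    """Model of gen_compare: 0xff when the condition holds, else 0."""
    if cond == "eq":
        result = (a & MASK32) == (b & MASK32)
    elif cond == "gt":
        result = to_signed32(a) > to_signed32(b)
    elif cond == "gtu":
        result = (a & MASK32) > (b & MASK32)
    else:
        raise ValueError("unknown condition: " + cond)
    return 0xff if result else 0


def compound_cmp_jump(cond: str, sense: bool, rs: int, rt: int,
                      pc: int, pc_off: int) -> Tuple[int, Optional[int]]:
    """Return (new predicate value, new PC or None if not taken)."""
    new_pred = compare(cond, rs, rt)                 # part 1: write predicate
    pred = new_pred if sense else new_pred ^ 0xff    # part 2: tp* vs fp*
    new_pc = (pc + pc_off) & MASK32 if pred != 0 else None
    return new_pred, new_pc
```

The cmpeqn1/cmpgtn1 forms correspond to passing rt = -1, and the immediate
forms to passing the immediate as rt.
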
 target/hexagon/gen_tcg.h | 177 +++++++++++++++++++++++++++++++++++++++
 target/hexagon/genptr.c  |  74 ++++++++++++++++
 2 files changed, 251 insertions(+)

diff --git a/target/hexagon/gen_tcg.h b/target/hexagon/gen_tcg.h
index ad149adbe1..b56b216110 100644
--- a/target/hexagon/gen_tcg.h
+++ b/target/hexagon/gen_tcg.h
@@ -620,6 +620,183 @@
 #define fGEN_TCG_J2_callf(SHORTCODE) \
     gen_cond_call(ctx, pkt, PuV, false, riV)
 
+/*
+ * Compound compare and jump instructions
+ * Here is a primer to understand the tag names
+ *
+ * Comparison
+ *      cmpeqi   compare equal to an immediate
+ *      cmpgti   compare greater than an immediate
+ *      cmpgtiu  compare greater than an unsigned immediate
+ *      cmpeqn1  compare equal to negative 1
+ *      cmpgtn1  compare greater than negative 1
+ *      cmpeq    compare equal (two registers)
+ *      cmpgtu   compare greater than unsigned (two registers)
+ *      tstbit0  test bit zero
+ *
+ * Condition
+ *      tp0      p0 is true     p0 = cmp.eq(r0,#5); if (p0.new) jump:nt address
+ *      fp0      p0 is false    p0 = cmp.eq(r0,#5); if (!p0.new) jump:nt address
+ *      tp1      p1 is true     p1 = cmp.eq(r0,#5); if (p1.new) jump:nt address
+ *      fp1      p1 is false    p1 = cmp.eq(r0,#5); if (!p1.new) jump:nt address
+ *
+ * Prediction (not modelled in qemu)
+ *      _nt      not taken
+ *      _t       taken
+ */
+#define fGEN_TCG_J4_cmpeq_tp0_jump_t(SHORTCODE) \
+    gen_cmpnd_cmp_jmp(ctx, pkt, insn, 0, TCG_COND_EQ, true, RsV, RtV, riV)
+#define fGEN_TCG_J4_cmpeq_tp0_jump_nt(SHORTCODE) \
+    gen_cmpnd_cmp_jmp(ctx, pkt, insn, 0, TCG_COND_EQ, true, RsV, RtV, riV)
+#define fGEN_TCG_J4_cmpeq_fp0_jump_t(SHORTCODE) \
+    gen_cmpnd_cmp_jmp(ctx, pkt, insn, 0, TCG_COND_EQ, false, RsV, RtV, riV)
+#define fGEN_TCG_J4_cmpeq_fp0_jump_nt(SHORTCODE) \
+    gen_cmpnd_cmp_jmp(ctx, pkt, insn, 0, TCG_COND_EQ, false, RsV, RtV, riV)
+#define fGEN_TCG_J4_cmpeq_tp1_jump_t(SHORTCODE) \
+    gen_cmpnd_cmp_jmp(ctx, pkt, insn, 1, TCG_COND_EQ, true, RsV, RtV, riV)
+#define fGEN_TCG_J4_cmpeq_tp1_jump_nt(SHORTCODE) \
+    gen_cmpnd_cmp_jmp(ctx, pkt, insn, 1, TCG_COND_EQ, true, RsV, RtV, riV)
+#define fGEN_TCG_J4_cmpeq_fp1_jump_t(SHORTCODE) \
+    gen_cmpnd_cmp_jmp(ctx, pkt, insn, 1, TCG_COND_EQ, false, RsV, RtV, riV)
+#define fGEN_TCG_J4_cmpeq_fp1_jump_nt(SHORTCODE) \
+    gen_cmpnd_cmp_jmp(ctx, pkt, insn, 1, TCG_COND_EQ, false, RsV, RtV, riV)
+
+#define fGEN_TCG_J4_cmpgt_tp0_jump_t(SHORTCODE) \
+    gen_cmpnd_cmp_jmp(ctx, pkt, insn, 0, TCG_COND_GT, true, RsV, RtV, riV)
+#define fGEN_TCG_J4_cmpgt_tp0_jump_nt(SHORTCODE) \
+    gen_cmpnd_cmp_jmp(ctx, pkt, insn, 0, TCG_COND_GT, true, RsV, RtV, riV)
+#define fGEN_TCG_J4_cmpgt_fp0_jump_t(SHORTCODE) \
+    gen_cmpnd_cmp_jmp(ctx, pkt, insn, 0, TCG_COND_GT, false, RsV, RtV, riV)
+#define fGEN_TCG_J4_cmpgt_fp0_jump_nt(SHORTCODE) \
+    gen_cmpnd_cmp_jmp(ctx, pkt, insn, 0, TCG_COND_GT, false, RsV, RtV, riV)
+#define fGEN_TCG_J4_cmpgt_tp1_jump_t(SHORTCODE) \
+    gen_cmpnd_cmp_jmp(ctx, pkt, insn, 1, TCG_COND_GT, true, RsV, RtV, riV)
+#define fGEN_TCG_J4_cmpgt_tp1_jump_nt(SHORTCODE) \
+    gen_cmpnd_cmp_jmp(ctx, pkt, insn, 1, TCG_COND_GT, true, RsV, RtV, riV)
+#define fGEN_TCG_J4_cmpgt_fp1_jump_t(SHORTCODE) \
+    gen_cmpnd_cmp_jmp(ctx, pkt, insn, 1, TCG_COND_GT, false, RsV, RtV, riV)
+#define fGEN_TCG_J4_cmpgt_fp1_jump_nt(SHORTCODE) \
+    gen_cmpnd_cmp_jmp(ctx, pkt, insn, 1, TCG_COND_GT, false, RsV, RtV, riV)
+
+#define fGEN_TCG_J4_cmpgtu_tp0_jump_t(SHORTCODE) \
+    gen_cmpnd_cmp_jmp(ctx, pkt, insn, 0, TCG_COND_GTU, true, RsV, RtV, riV)
+#define fGEN_TCG_J4_cmpgtu_tp0_jump_nt(SHORTCODE) \
+    gen_cmpnd_cmp_jmp(ctx, pkt, insn, 0, TCG_COND_GTU, true, RsV, RtV, riV)
+#define fGEN_TCG_J4_cmpgtu_fp0_jump_t(SHORTCODE) \
+    gen_cmpnd_cmp_jmp(ctx, pkt, insn, 0, TCG_COND_GTU, false, RsV, RtV, riV)
+#define fGEN_TCG_J4_cmpgtu_fp0_jump_nt(SHORTCODE) \
+    gen_cmpnd_cmp_jmp(ctx, pkt, insn, 0, TCG_COND_GTU, false, RsV, RtV, riV)
+#define fGEN_TCG_J4_cmpgtu_tp1_jump_t(SHORTCODE) \
+    gen_cmpnd_cmp_jmp(ctx, pkt, insn, 1, TCG_COND_GTU, true, RsV, RtV, riV)
+#define fGEN_TCG_J4_cmpgtu_tp1_jump_nt(SHORTCODE) \
+    gen_cmpnd_cmp_jmp(ctx, pkt, insn, 1, TCG_COND_GTU, true, RsV, RtV, riV)
+#define fGEN_TCG_J4_cmpgtu_fp1_jump_t(SHORTCODE) \
+    gen_cmpnd_cmp_jmp(ctx, pkt, insn, 1, TCG_COND_GTU, false, RsV, RtV, riV)
+#define fGEN_TCG_J4_cmpgtu_fp1_jump_nt(SHORTCODE) \
+    gen_cmpnd_cmp_jmp(ctx, pkt, insn, 1, TCG_COND_GTU, false, RsV, RtV, riV)
+
+#define fGEN_TCG_J4_cmpeqi_tp0_jump_t(SHORTCODE) \
+    gen_cmpnd_cmpi_jmp(ctx, pkt, insn, 0, TCG_COND_EQ, true, RsV, UiV, riV)
+#define fGEN_TCG_J4_cmpeqi_tp0_jump_nt(SHORTCODE) \
+    gen_cmpnd_cmpi_jmp(ctx, pkt, insn, 0, TCG_COND_EQ, true, RsV, UiV, riV)
+#define fGEN_TCG_J4_cmpeqi_fp0_jump_t(SHORTCODE) \
+    gen_cmpnd_cmpi_jmp(ctx, pkt, insn, 0, TCG_COND_EQ, false, RsV, UiV, riV)
+#define fGEN_TCG_J4_cmpeqi_fp0_jump_nt(SHORTCODE) \
+    gen_cmpnd_cmpi_jmp(ctx, pkt, insn, 0, TCG_COND_EQ, false, RsV, UiV, riV)
+#define fGEN_TCG_J4_cmpeqi_tp1_jump_t(SHORTCODE) \
+    gen_cmpnd_cmpi_jmp(ctx, pkt, insn, 1, TCG_COND_EQ, true, RsV, UiV, riV)
+#define fGEN_TCG_J4_cmpeqi_tp1_jump_nt(SHORTCODE) \
+    gen_cmpnd_cmpi_jmp(ctx, pkt, insn, 1, TCG_COND_EQ, true, RsV, UiV, riV)
+#define fGEN_TCG_J4_cmpeqi_fp1_jump_t(SHORTCODE) \
+    gen_cmpnd_cmpi_jmp(ctx, pkt, insn, 1, TCG_COND_EQ, false, RsV, UiV, riV)
+#define fGEN_TCG_J4_cmpeqi_fp1_jump_nt(SHORTCODE) \
+    gen_cmpnd_cmpi_jmp(ctx, pkt, insn, 1, TCG_COND_EQ, false, RsV, UiV, riV)
+
+#define fGEN_TCG_J4_cmpgti_tp0_jump_t(SHORTCODE) \
+    gen_cmpnd_cmpi_jmp(ctx, pkt, insn, 0, TCG_COND_GT, true, RsV, UiV, riV)
+#define fGEN_TCG_J4_cmpgti_tp0_jump_nt(SHORTCODE) \
+    gen_cmpnd_cmpi_jmp(ctx, pkt, insn, 0, TCG_COND_GT, true, RsV, UiV, riV)
+#define fGEN_TCG_J4_cmpgti_fp0_jump_t(SHORTCODE) \
+    gen_cmpnd_cmpi_jmp(ctx, pkt, insn, 0, TCG_COND_GT, false, RsV, UiV, riV)
+#define fGEN_TCG_J4_cmpgti_fp0_jump_nt(SHORTCODE) \
+    gen_cmpnd_cmpi_jmp(ctx, pkt, insn, 0, TCG_COND_GT, false, RsV, UiV, riV)
+#define fGEN_TCG_J4_cmpgti_tp1_jump_t(SHORTCODE) \
+    gen_cmpnd_cmpi_jmp(ctx, pkt, insn, 1, TCG_COND_GT, true, RsV, UiV, riV)
+#define fGEN_TCG_J4_cmpgti_tp1_jump_nt(SHORTCODE) \
+    gen_cmpnd_cmpi_jmp(ctx, pkt, insn, 1, TCG_COND_GT, true, RsV, UiV, riV)
+#define fGEN_TCG_J4_cmpgti_fp1_jump_t(SHORTCODE) \
+    gen_cmpnd_cmpi_jmp(ctx, pkt, insn, 1, TCG_COND_GT, false, RsV, UiV, riV)
+#define fGEN_TCG_J4_cmpgti_fp1_jump_nt(SHORTCODE) \
+    gen_cmpnd_cmpi_jmp(ctx, pkt, insn, 1, TCG_COND_GT, false, RsV, UiV, riV)
+
+#define fGEN_TCG_J4_cmpgtui_tp0_jump_t(SHORTCODE) \
+    gen_cmpnd_cmpi_jmp(ctx, pkt, insn, 0, TCG_COND_GTU, true, RsV, UiV, riV)
+#define fGEN_TCG_J4_cmpgtui_tp0_jump_nt(SHORTCODE) \
+    gen_cmpnd_cmpi_jmp(ctx, pkt, insn, 0, TCG_COND_GTU, true, RsV, UiV, riV)
+#define fGEN_TCG_J4_cmpgtui_fp0_jump_t(SHORTCODE) \
+    gen_cmpnd_cmpi_jmp(ctx, pkt, insn, 0, TCG_COND_GTU, false, RsV, UiV, riV)
+#define fGEN_TCG_J4_cmpgtui_fp0_jump_nt(SHORTCODE) \
+    gen_cmpnd_cmpi_jmp(ctx, pkt, insn, 0, TCG_COND_GTU, false, RsV, UiV, riV)
+#define fGEN_TCG_J4_cmpgtui_tp1_jump_t(SHORTCODE) \
+    gen_cmpnd_cmpi_jmp(ctx, pkt, insn, 1, TCG_COND_GTU, true, RsV, UiV, riV)
+#define fGEN_TCG_J4_cmpgtui_tp1_jump_nt(SHORTCODE) \
+    gen_cmpnd_cmpi_jmp(ctx, pkt, insn, 1, TCG_COND_GTU, true, RsV, UiV, riV)
+#define fGEN_TCG_J4_cmpgtui_fp1_jump_t(SHORTCODE) \
+    gen_cmpnd_cmpi_jmp(ctx, pkt, insn, 1, TCG_COND_GTU, false, RsV, UiV, riV)
+#define fGEN_TCG_J4_cmpgtui_fp1_jump_nt(SHORTCODE) \
+    gen_cmpnd_cmpi_jmp(ctx, pkt, insn, 1, TCG_COND_GTU, false, RsV, UiV, riV)
+
+#define fGEN_TCG_J4_cmpeqn1_tp0_jump_t(SHORTCODE) \
+    gen_cmpnd_cmp_n1_jmp(ctx, pkt, insn, 0, TCG_COND_EQ, true, RsV, riV)
+#define fGEN_TCG_J4_cmpeqn1_tp0_jump_nt(SHORTCODE) \
+    gen_cmpnd_cmp_n1_jmp(ctx, pkt, insn, 0, TCG_COND_EQ, true, RsV, riV)
+#define fGEN_TCG_J4_cmpeqn1_fp0_jump_t(SHORTCODE) \
+    gen_cmpnd_cmp_n1_jmp(ctx, pkt, insn, 0, TCG_COND_EQ, false, RsV, riV)
+#define fGEN_TCG_J4_cmpeqn1_fp0_jump_nt(SHORTCODE) \
+    gen_cmpnd_cmp_n1_jmp(ctx, pkt, insn, 0, TCG_COND_EQ, false, RsV, riV)
+#define fGEN_TCG_J4_cmpeqn1_tp1_jump_t(SHORTCODE) \
+    gen_cmpnd_cmp_n1_jmp(ctx, pkt, insn, 1, TCG_COND_EQ, true, RsV, riV)
+#define fGEN_TCG_J4_cmpeqn1_tp1_jump_nt(SHORTCODE) \
+    gen_cmpnd_cmp_n1_jmp(ctx, pkt, insn, 1, TCG_COND_EQ, true, RsV, riV)
+#define fGEN_TCG_J4_cmpeqn1_fp1_jump_t(SHORTCODE) \
+    gen_cmpnd_cmp_n1_jmp(ctx, pkt, insn, 1, TCG_COND_EQ, false, RsV, riV)
+#define fGEN_TCG_J4_cmpeqn1_fp1_jump_nt(SHORTCODE) \
+    gen_cmpnd_cmp_n1_jmp(ctx, pkt, insn, 1, TCG_COND_EQ, false, RsV, riV)
+
+#define fGEN_TCG_J4_cmpgtn1_tp0_jump_t(SHORTCODE) \
+    gen_cmpnd_cmp_n1_jmp(ctx, pkt, insn, 0, TCG_COND_GT, true, RsV, riV)
+#define fGEN_TCG_J4_cmpgtn1_tp0_jump_nt(SHORTCODE) \
+    gen_cmpnd_cmp_n1_jmp(ctx, pkt, insn, 0, TCG_COND_GT, true, RsV, riV)
+#define fGEN_TCG_J4_cmpgtn1_fp0_jump_t(SHORTCODE) \
+    gen_cmpnd_cmp_n1_jmp(ctx, pkt, insn, 0, TCG_COND_GT, false, RsV, riV)
+#define fGEN_TCG_J4_cmpgtn1_fp0_jump_nt(SHORTCODE) \
+    gen_cmpnd_cmp_n1_jmp(ctx, pkt, insn, 0, TCG_COND_GT, false, RsV, riV)
+#define fGEN_TCG_J4_cmpgtn1_tp1_jump_t(SHORTCODE) \
+    gen_cmpnd_cmp_n1_jmp(ctx, pkt, insn, 1, TCG_COND_GT, true, RsV, riV)
+#define fGEN_TCG_J4_cmpgtn1_tp1_jump_nt(SHORTCODE) \
+    gen_cmpnd_cmp_n1_jmp(ctx, pkt, insn, 1, TCG_COND_GT, true, RsV, riV)
+#define fGEN_TCG_J4_cmpgtn1_fp1_jump_t(SHORTCODE) \
+    gen_cmpnd_cmp_n1_jmp(ctx, pkt, insn, 1, TCG_COND_GT, false, RsV, riV)
+#define fGEN_TCG_J4_cmpgtn1_fp1_jump_nt(SHORTCODE) \
+    gen_cmpnd_cmp_n1_jmp(ctx, pkt, insn, 1, TCG_COND_GT, false, RsV, riV)
+
+#define fGEN_TCG_J4_tstbit0_tp0_jump_nt(SHORTCODE) \
+    gen_cmpnd_tstbit0_jmp(ctx, pkt, insn, 0, true, RsV, riV)
+#define fGEN_TCG_J4_tstbit0_tp0_jump_t(SHORTCODE) \
+    gen_cmpnd_tstbit0_jmp(ctx, pkt, insn, 0, true, RsV, riV)
+#define fGEN_TCG_J4_tstbit0_fp0_jump_nt(SHORTCODE) \
+    gen_cmpnd_tstbit0_jmp(ctx, pkt, insn, 0, false, RsV, riV)
+#define fGEN_TCG_J4_tstbit0_fp0_jump_t(SHORTCODE) \
+    gen_cmpnd_tstbit0_jmp(ctx, pkt, insn, 0, false, RsV, riV)
+#define fGEN_TCG_J4_tstbit0_tp1_jump_nt(SHORTCODE) \
+    gen_cmpnd_tstbit0_jmp(ctx, pkt, insn, 1, true, RsV, riV)
+#define fGEN_TCG_J4_tstbit0_tp1_jump_t(SHORTCODE) \
+    gen_cmpnd_tstbit0_jmp(ctx, pkt, insn, 1, true, RsV, riV)
+#define fGEN_TCG_J4_tstbit0_fp1_jump_nt(SHORTCODE) \
+    gen_cmpnd_tstbit0_jmp(ctx, pkt, insn, 1, false, RsV, riV)
+#define fGEN_TCG_J4_tstbit0_fp1_jump_t(SHORTCODE) \
+    gen_cmpnd_tstbit0_jmp(ctx, pkt, insn, 1, false, RsV, riV)
+
 #define fGEN_TCG_J2_pause(SHORTCODE) \
     do { \
         uiV = uiV; \
diff --git a/target/hexagon/genptr.c b/target/hexagon/genptr.c
index 2784b84041..db8d771054 100644
--- a/target/hexagon/genptr.c
+++ b/target/hexagon/genptr.c
@@ -487,6 +487,80 @@ static void gen_write_new_pc_pcrel(DisasContext *ctx, Packet *pkt,
     gen_write_new_pc_addr(ctx, pkt, tcg_constant_tl(dest), pred);
 }
 
+static void gen_compare(TCGCond cond, TCGv res, TCGv arg1, TCGv arg2)
+{
+    TCGv one = tcg_constant_tl(0xff);
+    TCGv zero = tcg_constant_tl(0);
+
+    tcg_gen_movcond_tl(cond, res, arg1, arg2, one, zero);
+}
+
+static void gen_cond_jump(DisasContext *ctx, Packet *pkt, TCGv pred, int pc_off)
+{
+    gen_write_new_pc_pcrel(ctx, pkt, pc_off, pred);
+}
+
+static void gen_cmpnd_cmp_jmp(DisasContext *ctx, Packet *pkt, Insn *insn,
+                              int pnum, TCGCond cond,
+                              bool sense, TCGv arg1, TCGv arg2,
+                              int pc_off)
+{
+    if (insn->part1) {
+        TCGv pred = tcg_temp_new();
+        gen_compare(cond, pred, arg1, arg2);
+        gen_log_pred_write(ctx, pnum, pred);
+        tcg_temp_free(pred);
+    } else {
+        TCGv pred = tcg_temp_new();
+
+        tcg_gen_mov_tl(pred, hex_new_pred_value[pnum]);
+        if (!sense) {
+            tcg_gen_xori_tl(pred, pred, 0xff);
+        }
+
+        gen_cond_jump(ctx, pkt, pred, pc_off);
+
+        tcg_temp_free(pred);
+    }
+}
+
+static void gen_cmpnd_cmpi_jmp(DisasContext *ctx, Packet *pkt, Insn *insn,
+                               int pnum, TCGCond cond,
+                               bool sense, TCGv arg1, int arg2,
+                               int pc_off)
+{
+    TCGv tmp = tcg_constant_tl(arg2);
+    gen_cmpnd_cmp_jmp(ctx, pkt, insn, pnum, cond, sense, arg1, tmp, pc_off);
+
+}
+
+static void gen_cmpnd_cmp_n1_jmp(DisasContext *ctx, Packet *pkt, Insn *insn,
+                                 int pnum, TCGCond cond,
+                                 bool sense, TCGv arg, int pc_off)
+{
+    gen_cmpnd_cmpi_jmp(ctx, pkt, insn, pnum, cond, sense, arg, -1, pc_off);
+}
+
+static void gen_cmpnd_tstbit0_jmp(DisasContext *ctx, Packet *pkt, Insn *insn,
+                                  int pnum, bool sense, TCGv arg, int pc_off)
+{
+    if (insn->part1) {
+        TCGv pred = tcg_temp_new();
+        tcg_gen_andi_tl(pred, arg, 1);
+        gen_8bitsof(pred, pred);
+        gen_log_pred_write(ctx, pnum, pred);
+        tcg_temp_free(pred);
+    } else {
+        TCGv pred = tcg_temp_new();
+        tcg_gen_mov_tl(pred, hex_new_pred_value[pnum]);
+        if (!sense) {
+            tcg_gen_xori_tl(pred, pred, 0xff);
+        }
+        gen_cond_jump(ctx, pkt, pred, pc_off);
+        tcg_temp_free(pred);
+    }
+}
+
 static void gen_call(DisasContext *ctx, Packet *pkt, int pc_off)
 {
     TCGv next_PC =
-- 
2.17.1



* [PATCH v2 6/8] Hexagon (target/hexagon) Add overrides for various forms of jump
  2022-10-24 23:51 [PATCH v2 0/8] Hexagon (target/hexagon) Improve change-of-flow Taylor Simpson
                   ` (4 preceding siblings ...)
  2022-10-24 23:51 ` [PATCH v2 5/8] Hexagon (target/hexagon) Add overrides for compound compare and jump Taylor Simpson
@ 2022-10-24 23:51 ` Taylor Simpson
  2022-10-24 23:51 ` [PATCH v2 7/8] Hexagon (target/hexagon) Use direct block chaining for direct jump/branch Taylor Simpson
  2022-10-24 23:51 ` [PATCH v2 8/8] Hexagon (target/hexagon) Use direct block chaining for tight loops Taylor Simpson
  7 siblings, 0 replies; 9+ messages in thread
From: Taylor Simpson @ 2022-10-24 23:51 UTC (permalink / raw)
  To: qemu-devel
  Cc: tsimpson, richard.henderson, philmd, ale, anjo, bcain,
	quic_mathbern

Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
---
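Note for reviewers: for the new-value compare and jump overrides below, the
"_f_" variants pass the inverse TCG condition (for example GT becomes LE)
instead of negating the result afterwards.  The Python sketch below checks
that equivalence for the signed conditions.  It is illustrative only: CONDS,
setcond and invert are hypothetical names, and the unsigned pairs (LTU/GEU,
GTU/LEU) follow the same pattern but are omitted for brevity.

```python
import operator

# condition name -> (comparison function, name of the inverse condition)
CONDS = {
    "EQ": (operator.eq, "NE"),
    "NE": (operator.ne, "EQ"),
    "GT": (operator.gt, "LE"),
    "LE": (operator.le, "GT"),
    "LT": (operator.lt, "GE"),
    "GE": (operator.ge, "LT"),
}


def setcond(cond: str, a: int, b: int) -> int:
    """Model of a setcond operation: 1 if the condition holds, else 0."""
    return int(CONDS[cond][0](a, b))


def invert(cond: str) -> str:
    """Return the condition that holds exactly when cond does not."""
    return CONDS[cond][1]
```

With this model, setcond(invert("GT"), a, b) equals 1 - setcond("GT", a, b)
for any a and b, which is why J4_cmpgt_f_jumpnv_* can use TCG_COND_LE.
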
 target/hexagon/gen_tcg.h | 189 +++++++++++++++++++++++++++++++++++++++
 target/hexagon/genptr.c  |  46 ++++++++++
 2 files changed, 235 insertions(+)

diff --git a/target/hexagon/gen_tcg.h b/target/hexagon/gen_tcg.h
index b56b216110..216862352c 100644
--- a/target/hexagon/gen_tcg.h
+++ b/target/hexagon/gen_tcg.h
@@ -797,6 +797,195 @@
 #define fGEN_TCG_J4_tstbit0_fp1_jump_t(SHORTCODE) \
     gen_cmpnd_tstbit0_jmp(ctx, pkt, insn, 1, false, RsV, riV)
 
+#define fGEN_TCG_J2_jump(SHORTCODE) \
+    gen_jump(ctx, pkt, riV)
+#define fGEN_TCG_J2_jumpr(SHORTCODE) \
+    gen_jumpr(ctx, pkt, RsV)
+#define fGEN_TCG_J4_jumpseti(SHORTCODE) \
+    do { \
+        tcg_gen_movi_tl(RdV, UiV); \
+        gen_jump(ctx, pkt, riV); \
+    } while (0)
+
+#define fGEN_TCG_cond_jump(COND) \
+    do { \
+        TCGv LSB = tcg_temp_new(); \
+        COND; \
+        gen_cond_jump(ctx, pkt, LSB, riV); \
+        tcg_temp_free(LSB); \
+    } while (0)
+
+#define fGEN_TCG_J2_jumpt(SHORTCODE) \
+    fGEN_TCG_cond_jump(fLSBOLD(PuV))
+#define fGEN_TCG_J2_jumptpt(SHORTCODE) \
+    fGEN_TCG_cond_jump(fLSBOLD(PuV))
+#define fGEN_TCG_J2_jumpf(SHORTCODE) \
+    fGEN_TCG_cond_jump(fLSBOLDNOT(PuV))
+#define fGEN_TCG_J2_jumpfpt(SHORTCODE) \
+    fGEN_TCG_cond_jump(fLSBOLDNOT(PuV))
+#define fGEN_TCG_J2_jumptnew(SHORTCODE) \
+    gen_cond_jump(ctx, pkt, PuN, riV)
+#define fGEN_TCG_J2_jumptnewpt(SHORTCODE) \
+    gen_cond_jump(ctx, pkt, PuN, riV)
+#define fGEN_TCG_J2_jumpfnewpt(SHORTCODE) \
+    fGEN_TCG_cond_jump(fLSBNEWNOT(PuN))
+#define fGEN_TCG_J2_jumpfnew(SHORTCODE) \
+    fGEN_TCG_cond_jump(fLSBNEWNOT(PuN))
+#define fGEN_TCG_J2_jumprz(SHORTCODE) \
+    fGEN_TCG_cond_jump(tcg_gen_setcondi_tl(TCG_COND_NE, LSB, RsV, 0))
+#define fGEN_TCG_J2_jumprzpt(SHORTCODE) \
+    fGEN_TCG_cond_jump(tcg_gen_setcondi_tl(TCG_COND_NE, LSB, RsV, 0))
+#define fGEN_TCG_J2_jumprnz(SHORTCODE) \
+    fGEN_TCG_cond_jump(tcg_gen_setcondi_tl(TCG_COND_EQ, LSB, RsV, 0))
+#define fGEN_TCG_J2_jumprnzpt(SHORTCODE) \
+    fGEN_TCG_cond_jump(tcg_gen_setcondi_tl(TCG_COND_EQ, LSB, RsV, 0))
+#define fGEN_TCG_J2_jumprgtez(SHORTCODE) \
+    fGEN_TCG_cond_jump(tcg_gen_setcondi_tl(TCG_COND_GE, LSB, RsV, 0))
+#define fGEN_TCG_J2_jumprgtezpt(SHORTCODE) \
+    fGEN_TCG_cond_jump(tcg_gen_setcondi_tl(TCG_COND_GE, LSB, RsV, 0))
+#define fGEN_TCG_J2_jumprltez(SHORTCODE) \
+    fGEN_TCG_cond_jump(tcg_gen_setcondi_tl(TCG_COND_LE, LSB, RsV, 0))
+#define fGEN_TCG_J2_jumprltezpt(SHORTCODE) \
+    fGEN_TCG_cond_jump(tcg_gen_setcondi_tl(TCG_COND_LE, LSB, RsV, 0))
+
+#define fGEN_TCG_cond_jumpr(COND) \
+    do { \
+        TCGv LSB = tcg_temp_new(); \
+        COND; \
+        gen_cond_jumpr(ctx, pkt, LSB, RsV); \
+        tcg_temp_free(LSB); \
+    } while (0)
+
+#define fGEN_TCG_J2_jumprt(SHORTCODE) \
+    fGEN_TCG_cond_jumpr(fLSBOLD(PuV))
+#define fGEN_TCG_J2_jumprtpt(SHORTCODE) \
+    fGEN_TCG_cond_jumpr(fLSBOLD(PuV))
+#define fGEN_TCG_J2_jumprf(SHORTCODE) \
+    fGEN_TCG_cond_jumpr(fLSBOLDNOT(PuV))
+#define fGEN_TCG_J2_jumprfpt(SHORTCODE) \
+    fGEN_TCG_cond_jumpr(fLSBOLDNOT(PuV))
+#define fGEN_TCG_J2_jumprtnew(SHORTCODE) \
+    fGEN_TCG_cond_jumpr(fLSBNEW(PuN))
+#define fGEN_TCG_J2_jumprtnewpt(SHORTCODE) \
+    fGEN_TCG_cond_jumpr(fLSBNEW(PuN))
+#define fGEN_TCG_J2_jumprfnew(SHORTCODE) \
+    fGEN_TCG_cond_jumpr(fLSBNEWNOT(PuN))
+#define fGEN_TCG_J2_jumprfnewpt(SHORTCODE) \
+    fGEN_TCG_cond_jumpr(fLSBNEWNOT(PuN))
+#define fGEN_TCG_J2_jumprfnewpt(SHORTCODE) \
+    fGEN_TCG_cond_jumpr(fLSBNEWNOT(PuN))
+
+/*
+ * New value compare & jump instructions
+ * if ([!]COND(r0.new, r1)) jump:t address
+ * if ([!]COND(r0.new, #7)) jump:t address
+ */
+#define fGEN_TCG_J4_cmpgt_t_jumpnv_t(SHORTCODE) \
+    gen_cmp_jumpnv(ctx, pkt, TCG_COND_GT, NsN, RtV, riV)
+#define fGEN_TCG_J4_cmpgt_t_jumpnv_nt(SHORTCODE) \
+    gen_cmp_jumpnv(ctx, pkt, TCG_COND_GT, NsN, RtV, riV)
+#define fGEN_TCG_J4_cmpgt_f_jumpnv_t(SHORTCODE) \
+    gen_cmp_jumpnv(ctx, pkt, TCG_COND_LE, NsN, RtV, riV)
+#define fGEN_TCG_J4_cmpgt_f_jumpnv_nt(SHORTCODE) \
+    gen_cmp_jumpnv(ctx, pkt, TCG_COND_LE, NsN, RtV, riV)
+
+#define fGEN_TCG_J4_cmpeq_t_jumpnv_t(SHORTCODE) \
+    gen_cmp_jumpnv(ctx, pkt, TCG_COND_EQ, NsN, RtV, riV)
+#define fGEN_TCG_J4_cmpeq_t_jumpnv_nt(SHORTCODE) \
+    gen_cmp_jumpnv(ctx, pkt, TCG_COND_EQ, NsN, RtV, riV)
+#define fGEN_TCG_J4_cmpeq_f_jumpnv_t(SHORTCODE) \
+    gen_cmp_jumpnv(ctx, pkt, TCG_COND_NE, NsN, RtV, riV)
+#define fGEN_TCG_J4_cmpeq_f_jumpnv_nt(SHORTCODE) \
+    gen_cmp_jumpnv(ctx, pkt, TCG_COND_NE, NsN, RtV, riV)
+
+#define fGEN_TCG_J4_cmplt_t_jumpnv_t(SHORTCODE) \
+    gen_cmp_jumpnv(ctx, pkt, TCG_COND_LT, NsN, RtV, riV)
+#define fGEN_TCG_J4_cmplt_t_jumpnv_nt(SHORTCODE) \
+    gen_cmp_jumpnv(ctx, pkt, TCG_COND_LT, NsN, RtV, riV)
+#define fGEN_TCG_J4_cmplt_f_jumpnv_t(SHORTCODE) \
+    gen_cmp_jumpnv(ctx, pkt, TCG_COND_GE, NsN, RtV, riV)
+#define fGEN_TCG_J4_cmplt_f_jumpnv_nt(SHORTCODE) \
+    gen_cmp_jumpnv(ctx, pkt, TCG_COND_GE, NsN, RtV, riV)
+
+#define fGEN_TCG_J4_cmpeqi_t_jumpnv_t(SHORTCODE) \
+    gen_cmpi_jumpnv(ctx, pkt, TCG_COND_EQ, NsN, UiV, riV)
+#define fGEN_TCG_J4_cmpeqi_t_jumpnv_nt(SHORTCODE) \
+    gen_cmpi_jumpnv(ctx, pkt, TCG_COND_EQ, NsN, UiV, riV)
+#define fGEN_TCG_J4_cmpeqi_f_jumpnv_t(SHORTCODE) \
+    gen_cmpi_jumpnv(ctx, pkt, TCG_COND_NE, NsN, UiV, riV)
+#define fGEN_TCG_J4_cmpeqi_f_jumpnv_nt(SHORTCODE) \
+    gen_cmpi_jumpnv(ctx, pkt, TCG_COND_NE, NsN, UiV, riV)
+
+#define fGEN_TCG_J4_cmpgti_t_jumpnv_t(SHORTCODE) \
+    gen_cmpi_jumpnv(ctx, pkt, TCG_COND_GT, NsN, UiV, riV)
+#define fGEN_TCG_J4_cmpgti_t_jumpnv_nt(SHORTCODE) \
+    gen_cmpi_jumpnv(ctx, pkt, TCG_COND_GT, NsN, UiV, riV)
+#define fGEN_TCG_J4_cmpgti_f_jumpnv_t(SHORTCODE) \
+    gen_cmpi_jumpnv(ctx, pkt, TCG_COND_LE, NsN, UiV, riV)
+#define fGEN_TCG_J4_cmpgti_f_jumpnv_nt(SHORTCODE) \
+    gen_cmpi_jumpnv(ctx, pkt, TCG_COND_LE, NsN, UiV, riV)
+
+#define fGEN_TCG_J4_cmpltu_t_jumpnv_t(SHORTCODE) \
+    gen_cmp_jumpnv(ctx, pkt, TCG_COND_LTU, NsN, RtV, riV)
+#define fGEN_TCG_J4_cmpltu_t_jumpnv_nt(SHORTCODE) \
+    gen_cmp_jumpnv(ctx, pkt, TCG_COND_LTU, NsN, RtV, riV)
+#define fGEN_TCG_J4_cmpltu_f_jumpnv_t(SHORTCODE) \
+    gen_cmp_jumpnv(ctx, pkt, TCG_COND_GEU, NsN, RtV, riV)
+#define fGEN_TCG_J4_cmpltu_f_jumpnv_nt(SHORTCODE) \
+    gen_cmp_jumpnv(ctx, pkt, TCG_COND_GEU, NsN, RtV, riV)
+
+#define fGEN_TCG_J4_cmpgtui_t_jumpnv_t(SHORTCODE) \
+    gen_cmpi_jumpnv(ctx, pkt, TCG_COND_GTU, NsN, UiV, riV)
+#define fGEN_TCG_J4_cmpgtui_t_jumpnv_nt(SHORTCODE) \
+    gen_cmpi_jumpnv(ctx, pkt, TCG_COND_GTU, NsN, UiV, riV)
+#define fGEN_TCG_J4_cmpgtui_f_jumpnv_t(SHORTCODE) \
+    gen_cmpi_jumpnv(ctx, pkt, TCG_COND_LEU, NsN, UiV, riV)
+#define fGEN_TCG_J4_cmpgtui_f_jumpnv_nt(SHORTCODE) \
+    gen_cmpi_jumpnv(ctx, pkt, TCG_COND_LEU, NsN, UiV, riV)
+
+#define fGEN_TCG_J4_cmpgtu_t_jumpnv_t(SHORTCODE) \
+    gen_cmp_jumpnv(ctx, pkt, TCG_COND_GTU, NsN, RtV, riV)
+#define fGEN_TCG_J4_cmpgtu_t_jumpnv_nt(SHORTCODE) \
+    gen_cmp_jumpnv(ctx, pkt, TCG_COND_GTU, NsN, RtV, riV)
+#define fGEN_TCG_J4_cmpgtu_f_jumpnv_t(SHORTCODE) \
+    gen_cmp_jumpnv(ctx, pkt, TCG_COND_LEU, NsN, RtV, riV)
+#define fGEN_TCG_J4_cmpgtu_f_jumpnv_nt(SHORTCODE) \
+    gen_cmp_jumpnv(ctx, pkt, TCG_COND_LEU, NsN, RtV, riV)
+
+#define fGEN_TCG_J4_cmpeqn1_t_jumpnv_t(SHORTCODE) \
+    gen_cmpi_jumpnv(ctx, pkt, TCG_COND_EQ, NsN, -1, riV)
+#define fGEN_TCG_J4_cmpeqn1_t_jumpnv_nt(SHORTCODE) \
+    gen_cmpi_jumpnv(ctx, pkt, TCG_COND_EQ, NsN, -1, riV)
+#define fGEN_TCG_J4_cmpeqn1_f_jumpnv_t(SHORTCODE) \
+    gen_cmpi_jumpnv(ctx, pkt, TCG_COND_NE, NsN, -1, riV)
+#define fGEN_TCG_J4_cmpeqn1_f_jumpnv_nt(SHORTCODE) \
+    gen_cmpi_jumpnv(ctx, pkt, TCG_COND_NE, NsN, -1, riV)
+
+#define fGEN_TCG_J4_cmpgtn1_t_jumpnv_t(SHORTCODE) \
+    gen_cmpi_jumpnv(ctx, pkt, TCG_COND_GT, NsN, -1, riV)
+#define fGEN_TCG_J4_cmpgtn1_t_jumpnv_nt(SHORTCODE) \
+    gen_cmpi_jumpnv(ctx, pkt, TCG_COND_GT, NsN, -1, riV)
+#define fGEN_TCG_J4_cmpgtn1_f_jumpnv_t(SHORTCODE) \
+    gen_cmpi_jumpnv(ctx, pkt, TCG_COND_LE, NsN, -1, riV)
+#define fGEN_TCG_J4_cmpgtn1_f_jumpnv_nt(SHORTCODE) \
+    gen_cmpi_jumpnv(ctx, pkt, TCG_COND_LE, NsN, -1, riV)
+
+#define fGEN_TCG_J4_tstbit0_t_jumpnv_t(SHORTCODE) \
+    gen_testbit0_jumpnv(ctx, pkt, true, NsN, riV)
+#define fGEN_TCG_J4_tstbit0_t_jumpnv_nt(SHORTCODE) \
+    gen_testbit0_jumpnv(ctx, pkt, true, NsN, riV)
+#define fGEN_TCG_J4_tstbit0_f_jumpnv_t(SHORTCODE) \
+    gen_testbit0_jumpnv(ctx, pkt, false, NsN, riV)
+#define fGEN_TCG_J4_tstbit0_f_jumpnv_nt(SHORTCODE) \
+    gen_testbit0_jumpnv(ctx, pkt, false, NsN, riV)
+
+/* r0 = r1 ; jump address */
+#define fGEN_TCG_J4_jumpsetr(SHORTCODE) \
+    do { \
+        tcg_gen_mov_tl(RdV, RsV); \
+        gen_jump(ctx, pkt, riV); \
+    } while (0)
+
 #define fGEN_TCG_J2_pause(SHORTCODE) \
     do { \
         uiV = uiV; \
diff --git a/target/hexagon/genptr.c b/target/hexagon/genptr.c
index db8d771054..437250c0f9 100644
--- a/target/hexagon/genptr.c
+++ b/target/hexagon/genptr.c
@@ -495,6 +495,12 @@ static void gen_compare(TCGCond cond, TCGv res, TCGv arg1, TCGv arg2)
     tcg_gen_movcond_tl(cond, res, arg1, arg2, one, zero);
 }
 
+static void gen_cond_jumpr(DisasContext *ctx, Packet *pkt,
+                           TCGv pred, TCGv dst_pc)
+{
+    gen_write_new_pc_addr(ctx, pkt, dst_pc, pred);
+}
+
 static void gen_cond_jump(DisasContext *ctx, Packet *pkt, TCGv pred, int pc_off)
 {
     gen_write_new_pc_pcrel(ctx, pkt, pc_off, pred);
@@ -561,6 +567,28 @@ static void gen_cmpnd_tstbit0_jmp(DisasContext *ctx, Packet *pkt, Insn *insn,
     }
 }
 
+static void gen_testbit0_jumpnv(DisasContext *ctx, Packet *pkt,
+                                bool sense, TCGv arg, int pc_off)
+{
+    TCGv pred = tcg_temp_new();
+    tcg_gen_andi_tl(pred, arg, 1);
+    if (!sense) {
+        tcg_gen_xori_tl(pred, pred, 1);
+    }
+    gen_cond_jump(ctx, pkt, pred, pc_off);
+    tcg_temp_free(pred);
+}
+
+static void gen_jump(DisasContext *ctx, Packet *pkt, int pc_off)
+{
+    gen_write_new_pc_pcrel(ctx, pkt, pc_off, NULL);
+}
+
+static void gen_jumpr(DisasContext *ctx, Packet *pkt, TCGv new_pc)
+{
+    gen_write_new_pc_addr(ctx, pkt, new_pc, NULL);
+}
+
 static void gen_call(DisasContext *ctx, Packet *pkt, int pc_off)
 {
     TCGv next_PC =
@@ -588,6 +616,24 @@ static void gen_cond_call(DisasContext *ctx, Packet *pkt,
     gen_set_label(skip);
 }
 
+static void gen_cmp_jumpnv(DisasContext *ctx, Packet *pkt,
+                           TCGCond cond, TCGv val, TCGv src, int pc_off)
+{
+    TCGv pred = tcg_temp_new();
+    tcg_gen_setcond_tl(cond, pred, val, src);
+    gen_cond_jump(ctx, pkt, pred, pc_off);
+    tcg_temp_free(pred);
+}
+
+static void gen_cmpi_jumpnv(DisasContext *ctx, Packet *pkt,
+                            TCGCond cond, TCGv val, int src, int pc_off)
+{
+    TCGv pred = tcg_temp_new();
+    tcg_gen_setcondi_tl(cond, pred, val, src);
+    gen_cond_jump(ctx, pkt, pred, pc_off);
+    tcg_temp_free(pred);
+}
+
 static intptr_t vreg_src_off(DisasContext *ctx, int num)
 {
     intptr_t offset = offsetof(CPUHexagonState, VRegs[num]);
-- 
2.17.1



* [PATCH v2 7/8] Hexagon (target/hexagon) Use direct block chaining for direct jump/branch
  2022-10-24 23:51 [PATCH v2 0/8] Hexagon (target/hexagon) Improve change-of-flow Taylor Simpson
                   ` (5 preceding siblings ...)
  2022-10-24 23:51 ` [PATCH v2 6/8] Hexagon (target/hexagon) Add overrides for various forms of jump Taylor Simpson
@ 2022-10-24 23:51 ` Taylor Simpson
  2022-10-24 23:51 ` [PATCH v2 8/8] Hexagon (target/hexagon) Use direct block chaining for tight loops Taylor Simpson
  7 siblings, 0 replies; 9+ messages in thread
From: Taylor Simpson @ 2022-10-24 23:51 UTC (permalink / raw)
  To: qemu-devel
  Cc: tsimpson, richard.henderson, philmd, ale, anjo, bcain,
	quic_mathbern

Direct block chaining is documented here
https://qemu.readthedocs.io/en/latest/devel/tcg.html#direct-block-chaining

Recall that Hexagon allows packets with multiple jumps, where only the first
one with a true predicate will actually jump.  So, we can only use direct
block chaining when the packet contains a single PC-relative jump.  We add
the following fields to DisasContext in order to perform direct block
chaining at the end of packet commit (in gen_end_tb):
    has_single_direct_branch
        Indicates that we can use direct block chaining
    branch_cond
        The condition under which the branch is taken
    branch_dest
        The destination of the branch
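
The bookkeeping described above can be sketched as a small standalone model.
This is only an illustration under stated assumptions, not the code in the
patch: BranchModel, TbExit, model_record_pcrel and model_end_tb are
hypothetical names, the TCGv branch_cond is reduced to a flag plus a run-time
boolean, and the fallback that gen_goto_tb takes when translator_use_goto_tb
declines is ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the fields added to DisasContext */
typedef struct {
    bool has_single_direct_branch;
    bool is_conditional;   /* models branch_cond != NULL */
    uint32_t branch_dest;
    uint32_t next_pc;      /* fall-through address */
} BranchModel;

typedef enum {
    EXIT_LOOKUP,   /* PC already written; next TB is looked up at run time */
    EXIT_CHAIN_0,  /* chained exit through goto_tb slot 0 (branch taken) */
    EXIT_CHAIN_1   /* chained exit through goto_tb slot 1 (fall through) */
} ExitKind;

typedef struct {
    ExitKind kind;
    uint32_t pc;   /* only meaningful for the two chained exits */
} TbExit;

/* Models gen_write_new_pc_pcrel: defer only when the packet has one COF */
static void model_record_pcrel(BranchModel *m, bool pkt_has_multi_cof,
                               uint32_t pkt_pc, int32_t pc_off,
                               bool is_conditional)
{
    if (pkt_has_multi_cof) {
        return;  /* PC is written immediately; nothing is deferred */
    }
    assert(!m->has_single_direct_branch);
    m->has_single_direct_branch = true;
    m->is_conditional = is_conditional;
    m->branch_dest = pkt_pc + (uint32_t)pc_off;
}

/* Models gen_end_tb; pred is the run-time value of the branch condition */
static TbExit model_end_tb(const BranchModel *m, bool pred)
{
    TbExit e = { EXIT_LOOKUP, 0 };

    if (!m->has_single_direct_branch) {
        return e;
    }
    if (m->is_conditional && !pred) {
        e.kind = EXIT_CHAIN_1;
        e.pc = m->next_pc;
    } else {
        e.kind = EXIT_CHAIN_0;
        e.pc = m->branch_dest;
    }
    return e;
}
```

In this model both chained exits carry a destination that is known at
translation time, which is the property direct block chaining needs.  A
packet with multiple COF instructions keeps the run-time lookup path.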

Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
---
 target/hexagon/translate.h |  3 +++
 target/hexagon/genptr.c    | 13 ++++++++++++-
 target/hexagon/translate.c | 39 +++++++++++++++++++++++++++++++++++++-
 3 files changed, 53 insertions(+), 2 deletions(-)

diff --git a/target/hexagon/translate.h b/target/hexagon/translate.h
index eae358cf33..e60dbf0e7a 100644
--- a/target/hexagon/translate.h
+++ b/target/hexagon/translate.h
@@ -54,6 +54,9 @@ typedef struct DisasContext {
     bool qreg_is_predicated[NUM_QREGS];
     int qreg_log_idx;
     bool pre_commit;
+    bool has_single_direct_branch;
+    TCGv branch_cond;
+    target_ulong branch_dest;
 } DisasContext;
 
 static inline void ctx_log_reg_write(DisasContext *ctx, int rnum)
diff --git a/target/hexagon/genptr.c b/target/hexagon/genptr.c
index 437250c0f9..c75a6aae84 100644
--- a/target/hexagon/genptr.c
+++ b/target/hexagon/genptr.c
@@ -484,7 +484,18 @@ static void gen_write_new_pc_pcrel(DisasContext *ctx, Packet *pkt,
                                    int pc_off, TCGv pred)
 {
     target_ulong dest = pkt->pc + pc_off;
-    gen_write_new_pc_addr(ctx, pkt, tcg_constant_tl(dest), pred);
+    if (pkt->pkt_has_multi_cof) {
+        gen_write_new_pc_addr(ctx, pkt, tcg_constant_tl(dest), pred);
+    } else {
+        /* Defer this jump to the end of the TB */
+        g_assert(ctx->branch_cond == NULL);
+        ctx->has_single_direct_branch = true;
+        if (pred != NULL) {
+            ctx->branch_cond = tcg_temp_local_new();
+            tcg_gen_mov_tl(ctx->branch_cond, pred);
+        }
+        ctx->branch_dest = dest;
+    }
 }
 
 static void gen_compare(TCGCond cond, TCGv res, TCGv arg1, TCGv arg2)
diff --git a/target/hexagon/translate.c b/target/hexagon/translate.c
index 71ad2da682..29e2caaf0f 100644
--- a/target/hexagon/translate.c
+++ b/target/hexagon/translate.c
@@ -116,10 +116,44 @@ static void gen_exec_counters(DisasContext *ctx)
                     hex_gpr[HEX_REG_QEMU_HVX_CNT], ctx->num_hvx_insns);
 }
 
+static bool use_goto_tb(DisasContext *ctx, target_ulong dest)
+{
+    return translator_use_goto_tb(&ctx->base, dest);
+}
+
+static void gen_goto_tb(DisasContext *ctx, int idx, target_ulong dest)
+{
+    if (use_goto_tb(ctx, dest)) {
+        tcg_gen_goto_tb(idx);
+        tcg_gen_movi_tl(hex_gpr[HEX_REG_PC], dest);
+        tcg_gen_exit_tb(ctx->base.tb, idx);
+    } else {
+        tcg_gen_movi_tl(hex_gpr[HEX_REG_PC], dest);
+        tcg_gen_lookup_and_goto_ptr();
+    }
+}
+
 static void gen_end_tb(DisasContext *ctx)
 {
     gen_exec_counters(ctx);
-    tcg_gen_exit_tb(NULL, 0);
+
+    if (ctx->has_single_direct_branch) {
+        if (ctx->branch_cond != NULL) {
+            TCGLabel *skip = gen_new_label();
+            tcg_gen_brcondi_tl(TCG_COND_EQ, ctx->branch_cond, 0, skip);
+            gen_goto_tb(ctx, 0, ctx->branch_dest);
+            gen_set_label(skip);
+            gen_goto_tb(ctx, 1, ctx->next_PC);
+            tcg_temp_free(ctx->branch_cond);
+            ctx->branch_cond = NULL;
+        } else {
+            gen_goto_tb(ctx, 0, ctx->branch_dest);
+        }
+    } else {
+        tcg_gen_lookup_and_goto_ptr();
+    }
+
+    g_assert(ctx->branch_cond == NULL);
     ctx->base.is_jmp = DISAS_NORETURN;
 }
 
@@ -803,6 +837,9 @@ static void hexagon_tr_init_disas_context(DisasContextBase *dcbase,
 
 static void hexagon_tr_tb_start(DisasContextBase *db, CPUState *cpu)
 {
+    DisasContext *ctx = container_of(db, DisasContext, base);
+    ctx->has_single_direct_branch = false;
+    ctx->branch_cond = NULL;
 }
 
 static void hexagon_tr_insn_start(DisasContextBase *dcbase, CPUState *cpu)
-- 
2.17.1



* [PATCH v2 8/8] Hexagon (target/hexagon) Use direct block chaining for tight loops
  2022-10-24 23:51 [PATCH v2 0/8] Hexagon (target/hexagon) Improve change-of-flow Taylor Simpson
                   ` (6 preceding siblings ...)
  2022-10-24 23:51 ` [PATCH v2 7/8] Hexagon (target/hexagon) Use direct block chaining for direct jump/branch Taylor Simpson
@ 2022-10-24 23:51 ` Taylor Simpson
  7 siblings, 0 replies; 9+ messages in thread
From: Taylor Simpson @ 2022-10-24 23:51 UTC (permalink / raw)
  To: qemu-devel
  Cc: tsimpson, richard.henderson, philmd, ale, anjo, bcain,
	quic_mathbern

Direct block chaining is documented here
https://qemu.readthedocs.io/en/latest/devel/tcg.html#direct-block-chaining

Hexagon inner loops end with the endloop0 instruction.
To go back to the beginning of the loop, this instruction writes to PC
from register SA0 (start address 0).  To use direct block chaining, we
have to assign PC with a constant value.  So, we specialize the code
generation when the start of the translation block is equal to SA0.

When this is the case, we defer the compare/branch from endloop0 to
gen_end_tb.  When this is done, we can assign the start address of the TB
to PC.
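
As a rough illustration of the tight-loop case, the following standalone
model mirrors the compare/branch that is deferred to gen_end_tb.  It is a
sketch under stated assumptions, not the TCG code in the patch: LoopRegs,
model_is_tight_loop and model_tight_loop_end are hypothetical names, and
registers are plain integers rather than TCG values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of the register state relevant to endloop0 */
typedef struct {
    uint32_t pc;
    uint32_t sa0;  /* loop start address */
    uint32_t lc0;  /* loop count */
} LoopRegs;

/* Models the flag computed when TB state is sampled: the TB starts at SA0 */
static bool model_is_tight_loop(const LoopRegs *r)
{
    return r->pc == r->sa0;
}

/*
 * Models the deferred endloop0 compare/branch.  Because the TB was
 * translated with SA0 equal to its start address, the loop-back target
 * is the constant tb_start rather than a value read from SA0.
 */
static uint32_t model_tight_loop_end(LoopRegs *r, uint32_t tb_start,
                                     uint32_t next_pc)
{
    assert(r->sa0 == tb_start);
    if (r->lc0 > 1) {
        r->lc0 -= 1;
        r->pc = tb_start;  /* loop back: chained exit to this same TB */
    } else {
        r->pc = next_pc;   /* loop finished: chained exit to fall-through */
    }
    return r->pc;
}
```

With LC0 starting at 3, the model loops back twice and then falls through,
leaving LC0 at 1 on the final pass, matching the "LC0 > 1" test in the
endloop0 pseudocode.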

Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
---
 target/hexagon/cpu.h       | 17 ++++++---
 target/hexagon/gen_tcg.h   |  3 ++
 target/hexagon/translate.h |  1 +
 target/hexagon/genptr.c    | 71 ++++++++++++++++++++++++++++++++++++++
 target/hexagon/translate.c | 41 +++++++++++++++++++---
 5 files changed, 124 insertions(+), 9 deletions(-)

diff --git a/target/hexagon/cpu.h b/target/hexagon/cpu.h
index ff8c26272d..5260e0f127 100644
--- a/target/hexagon/cpu.h
+++ b/target/hexagon/cpu.h
@@ -152,16 +152,23 @@ struct ArchCPU {
 
 #include "cpu_bits.h"
 
+typedef union {
+    uint32_t i;
+    struct {
+        bool is_tight_loop:1;
+    };
+} HexStateFlags;
+
 static inline void cpu_get_tb_cpu_state(CPUHexagonState *env, target_ulong *pc,
                                         target_ulong *cs_base, uint32_t *flags)
 {
+    HexStateFlags hex_flags = { 0 };
     *pc = env->gpr[HEX_REG_PC];
     *cs_base = 0;
-#ifdef CONFIG_USER_ONLY
-    *flags = 0;
-#else
-#error System mode not supported on Hexagon yet
-#endif
+    if (*pc == env->gpr[HEX_REG_SA0]) {
+        hex_flags.is_tight_loop = true;
+    }
+    *flags = hex_flags.i;
 }
 
 static inline int cpu_mmu_index(CPUHexagonState *env, bool ifetch)
diff --git a/target/hexagon/gen_tcg.h b/target/hexagon/gen_tcg.h
index 216862352c..552258064b 100644
--- a/target/hexagon/gen_tcg.h
+++ b/target/hexagon/gen_tcg.h
@@ -620,6 +620,9 @@
 #define fGEN_TCG_J2_callf(SHORTCODE) \
     gen_cond_call(ctx, pkt, PuV, false, riV)
 
+#define fGEN_TCG_J2_endloop0(SHORTCODE) \
+    gen_endloop0(ctx, pkt)
+
 /*
  * Compound compare and jump instructions
  * Here is a primer to understand the tag names
diff --git a/target/hexagon/translate.h b/target/hexagon/translate.h
index e60dbf0e7a..34abe86b5c 100644
--- a/target/hexagon/translate.h
+++ b/target/hexagon/translate.h
@@ -57,6 +57,7 @@ typedef struct DisasContext {
     bool has_single_direct_branch;
     TCGv branch_cond;
     target_ulong branch_dest;
+    bool is_tight_loop;
 } DisasContext;
 
 static inline void ctx_log_reg_write(DisasContext *ctx, int rnum)
diff --git a/target/hexagon/genptr.c b/target/hexagon/genptr.c
index c75a6aae84..188be88a95 100644
--- a/target/hexagon/genptr.c
+++ b/target/hexagon/genptr.c
@@ -498,6 +498,20 @@ static void gen_write_new_pc_pcrel(DisasContext *ctx, Packet *pkt,
     }
 }
 
+static void gen_set_usr_field(int field, TCGv val)
+{
+    tcg_gen_deposit_tl(hex_new_value[HEX_REG_USR], hex_new_value[HEX_REG_USR],
+                       val,
+                       reg_field_info[field].offset,
+                       reg_field_info[field].width);
+}
+
+static void gen_set_usr_fieldi(int field, int x)
+{
+    TCGv val = tcg_constant_tl(x);
+    gen_set_usr_field(field, val);
+}
+
 static void gen_compare(TCGCond cond, TCGv res, TCGv arg1, TCGv arg2)
 {
     TCGv one = tcg_constant_tl(0xff);
@@ -627,6 +641,63 @@ static void gen_cond_call(DisasContext *ctx, Packet *pkt,
     gen_set_label(skip);
 }
 
+static void gen_endloop0(DisasContext *ctx, Packet *pkt)
+{
+    TCGv lpcfg = tcg_temp_local_new();
+
+    GET_USR_FIELD(USR_LPCFG, lpcfg);
+
+    /*
+     *    if (lpcfg == 1) {
+     *        hex_new_pred_value[3] = 0xff;
+     *        hex_pred_written |= 1 << 3;
+     *    }
+     */
+    TCGLabel *label1 = gen_new_label();
+    tcg_gen_brcondi_tl(TCG_COND_NE, lpcfg, 1, label1);
+    {
+        tcg_gen_movi_tl(hex_new_pred_value[3], 0xff);
+        tcg_gen_ori_tl(hex_pred_written, hex_pred_written, 1 << 3);
+    }
+    gen_set_label(label1);
+
+    /*
+     *    if (lpcfg) {
+     *        SET_USR_FIELD(USR_LPCFG, lpcfg - 1);
+     *    }
+     */
+    TCGLabel *label2 = gen_new_label();
+    tcg_gen_brcondi_tl(TCG_COND_EQ, lpcfg, 0, label2);
+    {
+        tcg_gen_subi_tl(lpcfg, lpcfg, 1);
+        SET_USR_FIELD(USR_LPCFG, lpcfg);
+    }
+    gen_set_label(label2);
+
+    /*
+     * If we're in a tight loop, we'll do this at the end of the TB to take
+     * advantage of direct block chaining.
+     */
+    if (!ctx->is_tight_loop) {
+        /*
+         *    if (hex_gpr[HEX_REG_LC0] > 1) {
+         *        PC = hex_gpr[HEX_REG_SA0];
+         *        hex_new_value[HEX_REG_LC0] = hex_gpr[HEX_REG_LC0] - 1;
+         *    }
+         */
+        TCGLabel *label3 = gen_new_label();
+        tcg_gen_brcondi_tl(TCG_COND_LEU, hex_gpr[HEX_REG_LC0], 1, label3);
+        {
+            gen_jumpr(ctx, pkt, hex_gpr[HEX_REG_SA0]);
+            tcg_gen_subi_tl(hex_new_value[HEX_REG_LC0],
+                            hex_gpr[HEX_REG_LC0], 1);
+        }
+        gen_set_label(label3);
+    }
+
+    tcg_temp_free(lpcfg);
+}
+
 static void gen_cmp_jumpnv(DisasContext *ctx, Packet *pkt,
                            TCGCond cond, TCGv val, TCGv src, int pc_off)
 {
diff --git a/target/hexagon/translate.c b/target/hexagon/translate.c
index 29e2caaf0f..18eb27c651 100644
--- a/target/hexagon/translate.c
+++ b/target/hexagon/translate.c
@@ -133,7 +133,7 @@ static void gen_goto_tb(DisasContext *ctx, int idx, target_ulong dest)
     }
 }
 
-static void gen_end_tb(DisasContext *ctx)
+static void gen_end_tb(DisasContext *ctx, Packet *pkt)
 {
     gen_exec_counters(ctx);
 
@@ -149,6 +149,18 @@ static void gen_end_tb(DisasContext *ctx)
         } else {
             gen_goto_tb(ctx, 0, ctx->branch_dest);
         }
+    } else if (ctx->is_tight_loop &&
+        pkt->insn[pkt->num_insns - 1].opcode == J2_endloop0) {
+        /*
+         * When we're in a tight loop, we defer the endloop0 processing
+         * to take advantage of direct block chaining
+         */
+        TCGLabel *skip = gen_new_label();
+        tcg_gen_brcondi_tl(TCG_COND_LEU, hex_gpr[HEX_REG_LC0], 1, skip);
+        tcg_gen_subi_tl(hex_gpr[HEX_REG_LC0], hex_gpr[HEX_REG_LC0], 1);
+        gen_goto_tb(ctx, 0, ctx->base.tb->pc);
+        gen_set_label(skip);
+        gen_goto_tb(ctx, 1, ctx->next_PC);
     } else {
         tcg_gen_lookup_and_goto_ptr();
     }
@@ -328,13 +340,23 @@ bool is_gather_store_insn(Insn *insn, Packet *pkt)
 static void mark_implicit_reg_write(DisasContext *ctx, Insn *insn,
                                     int attrib, int rnum)
 {
-    if (GET_ATTRIB(insn->opcode, attrib)) {
+    uint16_t opcode = insn->opcode;
+    if (GET_ATTRIB(opcode, attrib)) {
         /*
          * USR is used to set overflow and FP exceptions,
          * so treat it as conditional
          */
-        bool is_predicated = GET_ATTRIB(insn->opcode, A_CONDEXEC) ||
+        bool is_predicated = GET_ATTRIB(opcode, A_CONDEXEC) ||
                              rnum == HEX_REG_USR;
+
+        /* LC0/LC1 are conditionally written by endloop instructions */
+        if ((rnum == HEX_REG_LC0 || rnum == HEX_REG_LC1) &&
+            (opcode == J2_endloop0 ||
+             opcode == J2_endloop1 ||
+             opcode == J2_endloop01)) {
+            is_predicated = true;
+        }
+
         if (is_predicated && !is_preloaded(ctx, rnum)) {
             tcg_gen_mov_tl(hex_new_value[rnum], hex_gpr[rnum]);
         }
@@ -420,6 +442,14 @@ static void gen_reg_writes(DisasContext *ctx)
         int reg_num = ctx->reg_log[i];
 
         tcg_gen_mov_tl(hex_gpr[reg_num], hex_new_value[reg_num]);
+
+        /*
+         * ctx->is_tight_loop is set when SA0 points to the beginning of the TB.
+         * If we write to SA0, we have to turn off tight loop handling.
+         */
+        if (reg_num == HEX_REG_SA0) {
+            ctx->is_tight_loop = false;
+        }
     }
 }
 
@@ -793,7 +823,7 @@ static void gen_commit_packet(CPUHexagonState *env, DisasContext *ctx,
     }
 
     if (pkt->pkt_has_cof) {
-        gen_end_tb(ctx);
+        gen_end_tb(ctx, pkt);
     }
 }
 
@@ -838,8 +868,11 @@ static void hexagon_tr_init_disas_context(DisasContextBase *dcbase,
 static void hexagon_tr_tb_start(DisasContextBase *db, CPUState *cpu)
 {
     DisasContext *ctx = container_of(db, DisasContext, base);
+    HexStateFlags hex_flags = { db->tb->flags };
+
     ctx->has_single_direct_branch = false;
     ctx->branch_cond = NULL;
+    ctx->is_tight_loop = hex_flags.is_tight_loop;
 }
 
 static void hexagon_tr_insn_start(DisasContextBase *dcbase, CPUState *cpu)
-- 
2.17.1

