qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/8] target/sparc: Use tcg_gen_lookup_and_goto_ptr
@ 2023-06-21 18:05 Richard Henderson
  2023-06-21 18:06 ` [PATCH v2 1/8] target/sparc: Use tcg_gen_lookup_and_goto_ptr in gen_goto_tb Richard Henderson
                   ` (8 more replies)
  0 siblings, 9 replies; 24+ messages in thread
From: Richard Henderson @ 2023-06-21 18:05 UTC (permalink / raw)
  To: qemu-devel; +Cc: mark.cave-ayland

Changes from v1:
  * Split into teeny weeny pieces.

  * It turns out the sparc_tr_tb_stop hunk of v1 was buggy,
    in that things that are not simple branches use DYNAMIC_PC,
    e.g. the RETT (return from trap) instruction.

    Introduce DYNAMIC_PC_LOOKUP to distinguish the couple of
    places where we have a dynamic pc, but no other change
    of state (conditional branches, JMPL, RETURN).

  * Drop the change for WRFPRS, because it's too infrequent.
    The WRASI change affects memcpy/memset, so that's more important.

Boots Mark's sol8 install cdrom.  :-)

Top of the profile changes from

    41.55%  qemu-system-sparc              [.] cpu_exec_loop
    14.02%  qemu-system-sparc              [.] cpu_tb_exec
     8.74%  qemu-system-sparc              [.] tb_lookup
     2.11%  qemu-system-sparc              [.] tcg_splitwx_to_rw
     1.63%  memfd:tcg-jit (deleted)        [.] 0x0000000000000004

to

    31.59%  qemu-system-sparc              [.] helper_lookup_tb_ptr
    17.79%  qemu-system-sparc              [.] tb_lookup
     5.38%  qemu-system-sparc              [.] compute_all_sub
     2.38%  qemu-system-sparc              [.] helper_compute_psr
     2.36%  qemu-system-sparc              [.] helper_check_align
     1.79%  memfd:tcg-jit (deleted)        [.] 0x000000000063fc8e

This probably indicates that cpu_get_tb_cpu_state could be
improved to not consume so much overhead.


r~


Richard Henderson (8):
  target/sparc: Use tcg_gen_lookup_and_goto_ptr in gen_goto_tb
  target/sparc: Fix npc comparison in sparc_tr_insn_start
  target/sparc: Drop inline markers from translate.c
  target/sparc: Introduce DYNAMIC_PC_LOOKUP
  target/sparc: Use DYNAMIC_PC_LOOKUP for conditional branches
  target/sparc: Use DYNAMIC_PC_LOOKUP for JMPL
  target/sparc: Use DYNAMIC_PC_LOOKUP for v9 RETURN
  target/sparc: Use tcg_gen_lookup_and_goto_ptr for v9 WRASI

 target/sparc/translate.c | 410 ++++++++++++++++++++++-----------------
 1 file changed, 233 insertions(+), 177 deletions(-)

-- 
2.34.1



^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v2 1/8] target/sparc: Use tcg_gen_lookup_and_goto_ptr in gen_goto_tb
  2023-06-21 18:05 [PATCH v2 0/8] target/sparc: Use tcg_gen_lookup_and_goto_ptr Richard Henderson
@ 2023-06-21 18:06 ` Richard Henderson
  2023-06-22  9:03   ` Philippe Mathieu-Daudé
  2023-06-21 18:06 ` [PATCH v2 2/8] target/sparc: Fix npc comparison in sparc_tr_insn_start Richard Henderson
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 24+ messages in thread
From: Richard Henderson @ 2023-06-21 18:06 UTC (permalink / raw)
  To: qemu-devel; +Cc: mark.cave-ayland

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/sparc/translate.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/target/sparc/translate.c b/target/sparc/translate.c
index bad2ec90a0..28d4cdb8b4 100644
--- a/target/sparc/translate.c
+++ b/target/sparc/translate.c
@@ -318,10 +318,10 @@ static void gen_goto_tb(DisasContext *s, int tb_num,
         tcg_gen_movi_tl(cpu_npc, npc);
         tcg_gen_exit_tb(s->base.tb, tb_num);
     } else {
-        /* jump to another page: currently not optimized */
+        /* jump to another page: we can use an indirect jump */
         tcg_gen_movi_tl(cpu_pc, pc);
         tcg_gen_movi_tl(cpu_npc, npc);
-        tcg_gen_exit_tb(NULL, 0);
+        tcg_gen_lookup_and_goto_ptr();
     }
 }
 
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v2 2/8] target/sparc: Fix npc comparison in sparc_tr_insn_start
  2023-06-21 18:05 [PATCH v2 0/8] target/sparc: Use tcg_gen_lookup_and_goto_ptr Richard Henderson
  2023-06-21 18:06 ` [PATCH v2 1/8] target/sparc: Use tcg_gen_lookup_and_goto_ptr in gen_goto_tb Richard Henderson
@ 2023-06-21 18:06 ` Richard Henderson
  2023-06-27  9:11   ` Philippe Mathieu-Daudé
  2023-06-21 18:06 ` [PATCH v2 3/8] target/sparc: Drop inline markers from translate.c Richard Henderson
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 24+ messages in thread
From: Richard Henderson @ 2023-06-21 18:06 UTC (permalink / raw)
  To: qemu-devel; +Cc: mark.cave-ayland

During translation, npc == address, DYNAMIC_PC, or JUMP_PC.
It is only the encoding between here and sparc_restore_state_to_opc
that considers JUMP_PC to be a bit within a larger value.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/sparc/translate.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/sparc/translate.c b/target/sparc/translate.c
index 28d4cdb8b4..eec6f9ca67 100644
--- a/target/sparc/translate.c
+++ b/target/sparc/translate.c
@@ -5594,7 +5594,7 @@ static void sparc_tr_insn_start(DisasContextBase *dcbase, CPUState *cs)
 {
     DisasContext *dc = container_of(dcbase, DisasContext, base);
 
-    if (dc->npc & JUMP_PC) {
+    if (dc->npc == JUMP_PC) {
         assert(dc->jump_pc[1] == dc->pc + 4);
         tcg_gen_insn_start(dc->pc, dc->jump_pc[0] | JUMP_PC);
     } else {
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v2 3/8] target/sparc: Drop inline markers from translate.c
  2023-06-21 18:05 [PATCH v2 0/8] target/sparc: Use tcg_gen_lookup_and_goto_ptr Richard Henderson
  2023-06-21 18:06 ` [PATCH v2 1/8] target/sparc: Use tcg_gen_lookup_and_goto_ptr in gen_goto_tb Richard Henderson
  2023-06-21 18:06 ` [PATCH v2 2/8] target/sparc: Fix npc comparison in sparc_tr_insn_start Richard Henderson
@ 2023-06-21 18:06 ` Richard Henderson
  2023-06-27  8:58   ` Philippe Mathieu-Daudé
  2023-06-21 18:06 ` [PATCH v2 4/8] target/sparc: Introduce DYNAMIC_PC_LOOKUP Richard Henderson
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 24+ messages in thread
From: Richard Henderson @ 2023-06-21 18:06 UTC (permalink / raw)
  To: qemu-devel; +Cc: mark.cave-ayland

Let the compiler decide about inlining.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/sparc/translate.c | 237 ++++++++++++++++++---------------------
 1 file changed, 111 insertions(+), 126 deletions(-)

diff --git a/target/sparc/translate.c b/target/sparc/translate.c
index eec6f9ca67..1312c3e94d 100644
--- a/target/sparc/translate.c
+++ b/target/sparc/translate.c
@@ -125,7 +125,7 @@ static int sign_extend(int x, int len)
 
 #define IS_IMM (insn & (1<<13))
 
-static inline void gen_update_fprs_dirty(DisasContext *dc, int rd)
+static void gen_update_fprs_dirty(DisasContext *dc, int rd)
 {
 #if defined(TARGET_SPARC64)
     int bit = (rd < 32) ? 1 : 2;
@@ -264,7 +264,7 @@ static void gen_move_Q(DisasContext *dc, unsigned int rd, unsigned int rs)
 #endif
 #endif
 
-static inline void gen_address_mask(DisasContext *dc, TCGv addr)
+static void gen_address_mask(DisasContext *dc, TCGv addr)
 {
 #ifdef TARGET_SPARC64
     if (AM_CHECK(dc))
@@ -272,7 +272,7 @@ static inline void gen_address_mask(DisasContext *dc, TCGv addr)
 #endif
 }
 
-static inline TCGv gen_load_gpr(DisasContext *dc, int reg)
+static TCGv gen_load_gpr(DisasContext *dc, int reg)
 {
     if (reg > 0) {
         assert(reg < 32);
@@ -284,7 +284,7 @@ static inline TCGv gen_load_gpr(DisasContext *dc, int reg)
     }
 }
 
-static inline void gen_store_gpr(DisasContext *dc, int reg, TCGv v)
+static void gen_store_gpr(DisasContext *dc, int reg, TCGv v)
 {
     if (reg > 0) {
         assert(reg < 32);
@@ -292,7 +292,7 @@ static inline void gen_store_gpr(DisasContext *dc, int reg, TCGv v)
     }
 }
 
-static inline TCGv gen_dest_gpr(DisasContext *dc, int reg)
+static TCGv gen_dest_gpr(DisasContext *dc, int reg)
 {
     if (reg > 0) {
         assert(reg < 32);
@@ -326,31 +326,31 @@ static void gen_goto_tb(DisasContext *s, int tb_num,
 }
 
 // XXX suboptimal
-static inline void gen_mov_reg_N(TCGv reg, TCGv_i32 src)
+static void gen_mov_reg_N(TCGv reg, TCGv_i32 src)
 {
     tcg_gen_extu_i32_tl(reg, src);
     tcg_gen_extract_tl(reg, reg, PSR_NEG_SHIFT, 1);
 }
 
-static inline void gen_mov_reg_Z(TCGv reg, TCGv_i32 src)
+static void gen_mov_reg_Z(TCGv reg, TCGv_i32 src)
 {
     tcg_gen_extu_i32_tl(reg, src);
     tcg_gen_extract_tl(reg, reg, PSR_ZERO_SHIFT, 1);
 }
 
-static inline void gen_mov_reg_V(TCGv reg, TCGv_i32 src)
+static void gen_mov_reg_V(TCGv reg, TCGv_i32 src)
 {
     tcg_gen_extu_i32_tl(reg, src);
     tcg_gen_extract_tl(reg, reg, PSR_OVF_SHIFT, 1);
 }
 
-static inline void gen_mov_reg_C(TCGv reg, TCGv_i32 src)
+static void gen_mov_reg_C(TCGv reg, TCGv_i32 src)
 {
     tcg_gen_extu_i32_tl(reg, src);
     tcg_gen_extract_tl(reg, reg, PSR_CARRY_SHIFT, 1);
 }
 
-static inline void gen_op_add_cc(TCGv dst, TCGv src1, TCGv src2)
+static void gen_op_add_cc(TCGv dst, TCGv src1, TCGv src2)
 {
     tcg_gen_mov_tl(cpu_cc_src, src1);
     tcg_gen_mov_tl(cpu_cc_src2, src2);
@@ -465,7 +465,7 @@ static void gen_op_addx_int(DisasContext *dc, TCGv dst, TCGv src1,
     }
 }
 
-static inline void gen_op_sub_cc(TCGv dst, TCGv src1, TCGv src2)
+static void gen_op_sub_cc(TCGv dst, TCGv src1, TCGv src2)
 {
     tcg_gen_mov_tl(cpu_cc_src, src1);
     tcg_gen_mov_tl(cpu_cc_src2, src2);
@@ -538,7 +538,7 @@ static void gen_op_subx_int(DisasContext *dc, TCGv dst, TCGv src1,
     }
 }
 
-static inline void gen_op_mulscc(TCGv dst, TCGv src1, TCGv src2)
+static void gen_op_mulscc(TCGv dst, TCGv src1, TCGv src2)
 {
     TCGv r_temp, zero, t0;
 
@@ -577,7 +577,7 @@ static inline void gen_op_mulscc(TCGv dst, TCGv src1, TCGv src2)
     tcg_gen_mov_tl(dst, cpu_cc_dst);
 }
 
-static inline void gen_op_multiply(TCGv dst, TCGv src1, TCGv src2, int sign_ext)
+static void gen_op_multiply(TCGv dst, TCGv src1, TCGv src2, int sign_ext)
 {
 #if TARGET_LONG_BITS == 32
     if (sign_ext) {
@@ -602,32 +602,32 @@ static inline void gen_op_multiply(TCGv dst, TCGv src1, TCGv src2, int sign_ext)
 #endif
 }
 
-static inline void gen_op_umul(TCGv dst, TCGv src1, TCGv src2)
+static void gen_op_umul(TCGv dst, TCGv src1, TCGv src2)
 {
     /* zero-extend truncated operands before multiplication */
     gen_op_multiply(dst, src1, src2, 0);
 }
 
-static inline void gen_op_smul(TCGv dst, TCGv src1, TCGv src2)
+static void gen_op_smul(TCGv dst, TCGv src1, TCGv src2)
 {
     /* sign-extend truncated operands before multiplication */
     gen_op_multiply(dst, src1, src2, 1);
 }
 
 // 1
-static inline void gen_op_eval_ba(TCGv dst)
+static void gen_op_eval_ba(TCGv dst)
 {
     tcg_gen_movi_tl(dst, 1);
 }
 
 // Z
-static inline void gen_op_eval_be(TCGv dst, TCGv_i32 src)
+static void gen_op_eval_be(TCGv dst, TCGv_i32 src)
 {
     gen_mov_reg_Z(dst, src);
 }
 
 // Z | (N ^ V)
-static inline void gen_op_eval_ble(TCGv dst, TCGv_i32 src)
+static void gen_op_eval_ble(TCGv dst, TCGv_i32 src)
 {
     TCGv t0 = tcg_temp_new();
     gen_mov_reg_N(t0, src);
@@ -638,7 +638,7 @@ static inline void gen_op_eval_ble(TCGv dst, TCGv_i32 src)
 }
 
 // N ^ V
-static inline void gen_op_eval_bl(TCGv dst, TCGv_i32 src)
+static void gen_op_eval_bl(TCGv dst, TCGv_i32 src)
 {
     TCGv t0 = tcg_temp_new();
     gen_mov_reg_V(t0, src);
@@ -647,7 +647,7 @@ static inline void gen_op_eval_bl(TCGv dst, TCGv_i32 src)
 }
 
 // C | Z
-static inline void gen_op_eval_bleu(TCGv dst, TCGv_i32 src)
+static void gen_op_eval_bleu(TCGv dst, TCGv_i32 src)
 {
     TCGv t0 = tcg_temp_new();
     gen_mov_reg_Z(t0, src);
@@ -656,73 +656,73 @@ static inline void gen_op_eval_bleu(TCGv dst, TCGv_i32 src)
 }
 
 // C
-static inline void gen_op_eval_bcs(TCGv dst, TCGv_i32 src)
+static void gen_op_eval_bcs(TCGv dst, TCGv_i32 src)
 {
     gen_mov_reg_C(dst, src);
 }
 
 // V
-static inline void gen_op_eval_bvs(TCGv dst, TCGv_i32 src)
+static void gen_op_eval_bvs(TCGv dst, TCGv_i32 src)
 {
     gen_mov_reg_V(dst, src);
 }
 
 // 0
-static inline void gen_op_eval_bn(TCGv dst)
+static void gen_op_eval_bn(TCGv dst)
 {
     tcg_gen_movi_tl(dst, 0);
 }
 
 // N
-static inline void gen_op_eval_bneg(TCGv dst, TCGv_i32 src)
+static void gen_op_eval_bneg(TCGv dst, TCGv_i32 src)
 {
     gen_mov_reg_N(dst, src);
 }
 
 // !Z
-static inline void gen_op_eval_bne(TCGv dst, TCGv_i32 src)
+static void gen_op_eval_bne(TCGv dst, TCGv_i32 src)
 {
     gen_mov_reg_Z(dst, src);
     tcg_gen_xori_tl(dst, dst, 0x1);
 }
 
 // !(Z | (N ^ V))
-static inline void gen_op_eval_bg(TCGv dst, TCGv_i32 src)
+static void gen_op_eval_bg(TCGv dst, TCGv_i32 src)
 {
     gen_op_eval_ble(dst, src);
     tcg_gen_xori_tl(dst, dst, 0x1);
 }
 
 // !(N ^ V)
-static inline void gen_op_eval_bge(TCGv dst, TCGv_i32 src)
+static void gen_op_eval_bge(TCGv dst, TCGv_i32 src)
 {
     gen_op_eval_bl(dst, src);
     tcg_gen_xori_tl(dst, dst, 0x1);
 }
 
 // !(C | Z)
-static inline void gen_op_eval_bgu(TCGv dst, TCGv_i32 src)
+static void gen_op_eval_bgu(TCGv dst, TCGv_i32 src)
 {
     gen_op_eval_bleu(dst, src);
     tcg_gen_xori_tl(dst, dst, 0x1);
 }
 
 // !C
-static inline void gen_op_eval_bcc(TCGv dst, TCGv_i32 src)
+static void gen_op_eval_bcc(TCGv dst, TCGv_i32 src)
 {
     gen_mov_reg_C(dst, src);
     tcg_gen_xori_tl(dst, dst, 0x1);
 }
 
 // !N
-static inline void gen_op_eval_bpos(TCGv dst, TCGv_i32 src)
+static void gen_op_eval_bpos(TCGv dst, TCGv_i32 src)
 {
     gen_mov_reg_N(dst, src);
     tcg_gen_xori_tl(dst, dst, 0x1);
 }
 
 // !V
-static inline void gen_op_eval_bvc(TCGv dst, TCGv_i32 src)
+static void gen_op_eval_bvc(TCGv dst, TCGv_i32 src)
 {
     gen_mov_reg_V(dst, src);
     tcg_gen_xori_tl(dst, dst, 0x1);
@@ -735,23 +735,21 @@ static inline void gen_op_eval_bvc(TCGv dst, TCGv_i32 src)
    2 >
    3 unordered
 */
-static inline void gen_mov_reg_FCC0(TCGv reg, TCGv src,
+static void gen_mov_reg_FCC0(TCGv reg, TCGv src,
                                     unsigned int fcc_offset)
 {
     tcg_gen_shri_tl(reg, src, FSR_FCC0_SHIFT + fcc_offset);
     tcg_gen_andi_tl(reg, reg, 0x1);
 }
 
-static inline void gen_mov_reg_FCC1(TCGv reg, TCGv src,
-                                    unsigned int fcc_offset)
+static void gen_mov_reg_FCC1(TCGv reg, TCGv src, unsigned int fcc_offset)
 {
     tcg_gen_shri_tl(reg, src, FSR_FCC1_SHIFT + fcc_offset);
     tcg_gen_andi_tl(reg, reg, 0x1);
 }
 
 // !0: FCC0 | FCC1
-static inline void gen_op_eval_fbne(TCGv dst, TCGv src,
-                                    unsigned int fcc_offset)
+static void gen_op_eval_fbne(TCGv dst, TCGv src, unsigned int fcc_offset)
 {
     TCGv t0 = tcg_temp_new();
     gen_mov_reg_FCC0(dst, src, fcc_offset);
@@ -760,8 +758,7 @@ static inline void gen_op_eval_fbne(TCGv dst, TCGv src,
 }
 
 // 1 or 2: FCC0 ^ FCC1
-static inline void gen_op_eval_fblg(TCGv dst, TCGv src,
-                                    unsigned int fcc_offset)
+static void gen_op_eval_fblg(TCGv dst, TCGv src, unsigned int fcc_offset)
 {
     TCGv t0 = tcg_temp_new();
     gen_mov_reg_FCC0(dst, src, fcc_offset);
@@ -770,15 +767,13 @@ static inline void gen_op_eval_fblg(TCGv dst, TCGv src,
 }
 
 // 1 or 3: FCC0
-static inline void gen_op_eval_fbul(TCGv dst, TCGv src,
-                                    unsigned int fcc_offset)
+static void gen_op_eval_fbul(TCGv dst, TCGv src, unsigned int fcc_offset)
 {
     gen_mov_reg_FCC0(dst, src, fcc_offset);
 }
 
 // 1: FCC0 & !FCC1
-static inline void gen_op_eval_fbl(TCGv dst, TCGv src,
-                                    unsigned int fcc_offset)
+static void gen_op_eval_fbl(TCGv dst, TCGv src, unsigned int fcc_offset)
 {
     TCGv t0 = tcg_temp_new();
     gen_mov_reg_FCC0(dst, src, fcc_offset);
@@ -787,15 +782,13 @@ static inline void gen_op_eval_fbl(TCGv dst, TCGv src,
 }
 
 // 2 or 3: FCC1
-static inline void gen_op_eval_fbug(TCGv dst, TCGv src,
-                                    unsigned int fcc_offset)
+static void gen_op_eval_fbug(TCGv dst, TCGv src, unsigned int fcc_offset)
 {
     gen_mov_reg_FCC1(dst, src, fcc_offset);
 }
 
 // 2: !FCC0 & FCC1
-static inline void gen_op_eval_fbg(TCGv dst, TCGv src,
-                                    unsigned int fcc_offset)
+static void gen_op_eval_fbg(TCGv dst, TCGv src, unsigned int fcc_offset)
 {
     TCGv t0 = tcg_temp_new();
     gen_mov_reg_FCC0(dst, src, fcc_offset);
@@ -804,8 +797,7 @@ static inline void gen_op_eval_fbg(TCGv dst, TCGv src,
 }
 
 // 3: FCC0 & FCC1
-static inline void gen_op_eval_fbu(TCGv dst, TCGv src,
-                                    unsigned int fcc_offset)
+static void gen_op_eval_fbu(TCGv dst, TCGv src, unsigned int fcc_offset)
 {
     TCGv t0 = tcg_temp_new();
     gen_mov_reg_FCC0(dst, src, fcc_offset);
@@ -814,8 +806,7 @@ static inline void gen_op_eval_fbu(TCGv dst, TCGv src,
 }
 
 // 0: !(FCC0 | FCC1)
-static inline void gen_op_eval_fbe(TCGv dst, TCGv src,
-                                    unsigned int fcc_offset)
+static void gen_op_eval_fbe(TCGv dst, TCGv src, unsigned int fcc_offset)
 {
     TCGv t0 = tcg_temp_new();
     gen_mov_reg_FCC0(dst, src, fcc_offset);
@@ -825,8 +816,7 @@ static inline void gen_op_eval_fbe(TCGv dst, TCGv src,
 }
 
 // 0 or 3: !(FCC0 ^ FCC1)
-static inline void gen_op_eval_fbue(TCGv dst, TCGv src,
-                                    unsigned int fcc_offset)
+static void gen_op_eval_fbue(TCGv dst, TCGv src, unsigned int fcc_offset)
 {
     TCGv t0 = tcg_temp_new();
     gen_mov_reg_FCC0(dst, src, fcc_offset);
@@ -836,16 +826,14 @@ static inline void gen_op_eval_fbue(TCGv dst, TCGv src,
 }
 
 // 0 or 2: !FCC0
-static inline void gen_op_eval_fbge(TCGv dst, TCGv src,
-                                    unsigned int fcc_offset)
+static void gen_op_eval_fbge(TCGv dst, TCGv src, unsigned int fcc_offset)
 {
     gen_mov_reg_FCC0(dst, src, fcc_offset);
     tcg_gen_xori_tl(dst, dst, 0x1);
 }
 
 // !1: !(FCC0 & !FCC1)
-static inline void gen_op_eval_fbuge(TCGv dst, TCGv src,
-                                    unsigned int fcc_offset)
+static void gen_op_eval_fbuge(TCGv dst, TCGv src, unsigned int fcc_offset)
 {
     TCGv t0 = tcg_temp_new();
     gen_mov_reg_FCC0(dst, src, fcc_offset);
@@ -855,16 +843,14 @@ static inline void gen_op_eval_fbuge(TCGv dst, TCGv src,
 }
 
 // 0 or 1: !FCC1
-static inline void gen_op_eval_fble(TCGv dst, TCGv src,
-                                    unsigned int fcc_offset)
+static void gen_op_eval_fble(TCGv dst, TCGv src, unsigned int fcc_offset)
 {
     gen_mov_reg_FCC1(dst, src, fcc_offset);
     tcg_gen_xori_tl(dst, dst, 0x1);
 }
 
 // !2: !(!FCC0 & FCC1)
-static inline void gen_op_eval_fbule(TCGv dst, TCGv src,
-                                    unsigned int fcc_offset)
+static void gen_op_eval_fbule(TCGv dst, TCGv src, unsigned int fcc_offset)
 {
     TCGv t0 = tcg_temp_new();
     gen_mov_reg_FCC0(dst, src, fcc_offset);
@@ -874,8 +860,7 @@ static inline void gen_op_eval_fbule(TCGv dst, TCGv src,
 }
 
 // !3: !(FCC0 & FCC1)
-static inline void gen_op_eval_fbo(TCGv dst, TCGv src,
-                                    unsigned int fcc_offset)
+static void gen_op_eval_fbo(TCGv dst, TCGv src, unsigned int fcc_offset)
 {
     TCGv t0 = tcg_temp_new();
     gen_mov_reg_FCC0(dst, src, fcc_offset);
@@ -884,8 +869,8 @@ static inline void gen_op_eval_fbo(TCGv dst, TCGv src,
     tcg_gen_xori_tl(dst, dst, 0x1);
 }
 
-static inline void gen_branch2(DisasContext *dc, target_ulong pc1,
-                               target_ulong pc2, TCGv r_cond)
+static void gen_branch2(DisasContext *dc, target_ulong pc1,
+                        target_ulong pc2, TCGv r_cond)
 {
     TCGLabel *l1 = gen_new_label();
 
@@ -935,7 +920,7 @@ static void gen_branch_n(DisasContext *dc, target_ulong pc1)
     }
 }
 
-static inline void gen_generic_branch(DisasContext *dc)
+static void gen_generic_branch(DisasContext *dc)
 {
     TCGv npc0 = tcg_constant_tl(dc->jump_pc[0]);
     TCGv npc1 = tcg_constant_tl(dc->jump_pc[1]);
@@ -946,7 +931,7 @@ static inline void gen_generic_branch(DisasContext *dc)
 
 /* call this function before using the condition register as it may
    have been set for a jump */
-static inline void flush_cond(DisasContext *dc)
+static void flush_cond(DisasContext *dc)
 {
     if (dc->npc == JUMP_PC) {
         gen_generic_branch(dc);
@@ -954,7 +939,7 @@ static inline void flush_cond(DisasContext *dc)
     }
 }
 
-static inline void save_npc(DisasContext *dc)
+static void save_npc(DisasContext *dc)
 {
     if (dc->npc == JUMP_PC) {
         gen_generic_branch(dc);
@@ -964,7 +949,7 @@ static inline void save_npc(DisasContext *dc)
     }
 }
 
-static inline void update_psr(DisasContext *dc)
+static void update_psr(DisasContext *dc)
 {
     if (dc->cc_op != CC_OP_FLAGS) {
         dc->cc_op = CC_OP_FLAGS;
@@ -972,7 +957,7 @@ static inline void update_psr(DisasContext *dc)
     }
 }
 
-static inline void save_state(DisasContext *dc)
+static void save_state(DisasContext *dc)
 {
     tcg_gen_movi_tl(cpu_pc, dc->pc);
     save_npc(dc);
@@ -990,7 +975,7 @@ static void gen_check_align(TCGv addr, int mask)
     gen_helper_check_align(cpu_env, addr, tcg_constant_i32(mask));
 }
 
-static inline void gen_mov_pc_npc(DisasContext *dc)
+static void gen_mov_pc_npc(DisasContext *dc)
 {
     if (dc->npc == JUMP_PC) {
         gen_generic_branch(dc);
@@ -1004,7 +989,7 @@ static inline void gen_mov_pc_npc(DisasContext *dc)
     }
 }
 
-static inline void gen_op_next_insn(void)
+static void gen_op_next_insn(void)
 {
     tcg_gen_mov_tl(cpu_pc, cpu_npc);
     tcg_gen_addi_tl(cpu_npc, cpu_npc, 4);
@@ -1305,7 +1290,7 @@ static void gen_compare_reg(DisasCompare *cmp, int cond, TCGv r_src)
     cmp->c2 = tcg_constant_tl(0);
 }
 
-static inline void gen_cond_reg(TCGv r_dst, int cond, TCGv r_src)
+static void gen_cond_reg(TCGv r_dst, int cond, TCGv r_src)
 {
     DisasCompare cmp;
     gen_compare_reg(&cmp, cond, r_src);
@@ -1414,7 +1399,7 @@ static void do_branch_reg(DisasContext *dc, int32_t offset, uint32_t insn,
     }
 }
 
-static inline void gen_op_fcmps(int fccno, TCGv_i32 r_rs1, TCGv_i32 r_rs2)
+static void gen_op_fcmps(int fccno, TCGv_i32 r_rs1, TCGv_i32 r_rs2)
 {
     switch (fccno) {
     case 0:
@@ -1432,7 +1417,7 @@ static inline void gen_op_fcmps(int fccno, TCGv_i32 r_rs1, TCGv_i32 r_rs2)
     }
 }
 
-static inline void gen_op_fcmpd(int fccno, TCGv_i64 r_rs1, TCGv_i64 r_rs2)
+static void gen_op_fcmpd(int fccno, TCGv_i64 r_rs1, TCGv_i64 r_rs2)
 {
     switch (fccno) {
     case 0:
@@ -1450,7 +1435,7 @@ static inline void gen_op_fcmpd(int fccno, TCGv_i64 r_rs1, TCGv_i64 r_rs2)
     }
 }
 
-static inline void gen_op_fcmpq(int fccno)
+static void gen_op_fcmpq(int fccno)
 {
     switch (fccno) {
     case 0:
@@ -1468,7 +1453,7 @@ static inline void gen_op_fcmpq(int fccno)
     }
 }
 
-static inline void gen_op_fcmpes(int fccno, TCGv_i32 r_rs1, TCGv_i32 r_rs2)
+static void gen_op_fcmpes(int fccno, TCGv_i32 r_rs1, TCGv_i32 r_rs2)
 {
     switch (fccno) {
     case 0:
@@ -1486,7 +1471,7 @@ static inline void gen_op_fcmpes(int fccno, TCGv_i32 r_rs1, TCGv_i32 r_rs2)
     }
 }
 
-static inline void gen_op_fcmped(int fccno, TCGv_i64 r_rs1, TCGv_i64 r_rs2)
+static void gen_op_fcmped(int fccno, TCGv_i64 r_rs1, TCGv_i64 r_rs2)
 {
     switch (fccno) {
     case 0:
@@ -1504,7 +1489,7 @@ static inline void gen_op_fcmped(int fccno, TCGv_i64 r_rs1, TCGv_i64 r_rs2)
     }
 }
 
-static inline void gen_op_fcmpeq(int fccno)
+static void gen_op_fcmpeq(int fccno)
 {
     switch (fccno) {
     case 0:
@@ -1524,32 +1509,32 @@ static inline void gen_op_fcmpeq(int fccno)
 
 #else
 
-static inline void gen_op_fcmps(int fccno, TCGv r_rs1, TCGv r_rs2)
+static void gen_op_fcmps(int fccno, TCGv r_rs1, TCGv r_rs2)
 {
     gen_helper_fcmps(cpu_fsr, cpu_env, r_rs1, r_rs2);
 }
 
-static inline void gen_op_fcmpd(int fccno, TCGv_i64 r_rs1, TCGv_i64 r_rs2)
+static void gen_op_fcmpd(int fccno, TCGv_i64 r_rs1, TCGv_i64 r_rs2)
 {
     gen_helper_fcmpd(cpu_fsr, cpu_env, r_rs1, r_rs2);
 }
 
-static inline void gen_op_fcmpq(int fccno)
+static void gen_op_fcmpq(int fccno)
 {
     gen_helper_fcmpq(cpu_fsr, cpu_env);
 }
 
-static inline void gen_op_fcmpes(int fccno, TCGv r_rs1, TCGv r_rs2)
+static void gen_op_fcmpes(int fccno, TCGv r_rs1, TCGv r_rs2)
 {
     gen_helper_fcmpes(cpu_fsr, cpu_env, r_rs1, r_rs2);
 }
 
-static inline void gen_op_fcmped(int fccno, TCGv_i64 r_rs1, TCGv_i64 r_rs2)
+static void gen_op_fcmped(int fccno, TCGv_i64 r_rs1, TCGv_i64 r_rs2)
 {
     gen_helper_fcmped(cpu_fsr, cpu_env, r_rs1, r_rs2);
 }
 
-static inline void gen_op_fcmpeq(int fccno)
+static void gen_op_fcmpeq(int fccno)
 {
     gen_helper_fcmpeq(cpu_fsr, cpu_env);
 }
@@ -1573,12 +1558,12 @@ static int gen_trap_ifnofpu(DisasContext *dc)
     return 0;
 }
 
-static inline void gen_op_clear_ieee_excp_and_FTT(void)
+static void gen_op_clear_ieee_excp_and_FTT(void)
 {
     tcg_gen_andi_tl(cpu_fsr, cpu_fsr, FSR_FTT_CEXC_NMASK);
 }
 
-static inline void gen_fop_FF(DisasContext *dc, int rd, int rs,
+static void gen_fop_FF(DisasContext *dc, int rd, int rs,
                               void (*gen)(TCGv_i32, TCGv_ptr, TCGv_i32))
 {
     TCGv_i32 dst, src;
@@ -1592,8 +1577,8 @@ static inline void gen_fop_FF(DisasContext *dc, int rd, int rs,
     gen_store_fpr_F(dc, rd, dst);
 }
 
-static inline void gen_ne_fop_FF(DisasContext *dc, int rd, int rs,
-                                 void (*gen)(TCGv_i32, TCGv_i32))
+static void gen_ne_fop_FF(DisasContext *dc, int rd, int rs,
+                          void (*gen)(TCGv_i32, TCGv_i32))
 {
     TCGv_i32 dst, src;
 
@@ -1605,7 +1590,7 @@ static inline void gen_ne_fop_FF(DisasContext *dc, int rd, int rs,
     gen_store_fpr_F(dc, rd, dst);
 }
 
-static inline void gen_fop_FFF(DisasContext *dc, int rd, int rs1, int rs2,
+static void gen_fop_FFF(DisasContext *dc, int rd, int rs1, int rs2,
                         void (*gen)(TCGv_i32, TCGv_ptr, TCGv_i32, TCGv_i32))
 {
     TCGv_i32 dst, src1, src2;
@@ -1621,8 +1606,8 @@ static inline void gen_fop_FFF(DisasContext *dc, int rd, int rs1, int rs2,
 }
 
 #ifdef TARGET_SPARC64
-static inline void gen_ne_fop_FFF(DisasContext *dc, int rd, int rs1, int rs2,
-                                  void (*gen)(TCGv_i32, TCGv_i32, TCGv_i32))
+static void gen_ne_fop_FFF(DisasContext *dc, int rd, int rs1, int rs2,
+                           void (*gen)(TCGv_i32, TCGv_i32, TCGv_i32))
 {
     TCGv_i32 dst, src1, src2;
 
@@ -1636,8 +1621,8 @@ static inline void gen_ne_fop_FFF(DisasContext *dc, int rd, int rs1, int rs2,
 }
 #endif
 
-static inline void gen_fop_DD(DisasContext *dc, int rd, int rs,
-                              void (*gen)(TCGv_i64, TCGv_ptr, TCGv_i64))
+static void gen_fop_DD(DisasContext *dc, int rd, int rs,
+                       void (*gen)(TCGv_i64, TCGv_ptr, TCGv_i64))
 {
     TCGv_i64 dst, src;
 
@@ -1651,8 +1636,8 @@ static inline void gen_fop_DD(DisasContext *dc, int rd, int rs,
 }
 
 #ifdef TARGET_SPARC64
-static inline void gen_ne_fop_DD(DisasContext *dc, int rd, int rs,
-                                 void (*gen)(TCGv_i64, TCGv_i64))
+static void gen_ne_fop_DD(DisasContext *dc, int rd, int rs,
+                          void (*gen)(TCGv_i64, TCGv_i64))
 {
     TCGv_i64 dst, src;
 
@@ -1665,7 +1650,7 @@ static inline void gen_ne_fop_DD(DisasContext *dc, int rd, int rs,
 }
 #endif
 
-static inline void gen_fop_DDD(DisasContext *dc, int rd, int rs1, int rs2,
+static void gen_fop_DDD(DisasContext *dc, int rd, int rs1, int rs2,
                         void (*gen)(TCGv_i64, TCGv_ptr, TCGv_i64, TCGv_i64))
 {
     TCGv_i64 dst, src1, src2;
@@ -1681,8 +1666,8 @@ static inline void gen_fop_DDD(DisasContext *dc, int rd, int rs1, int rs2,
 }
 
 #ifdef TARGET_SPARC64
-static inline void gen_ne_fop_DDD(DisasContext *dc, int rd, int rs1, int rs2,
-                                  void (*gen)(TCGv_i64, TCGv_i64, TCGv_i64))
+static void gen_ne_fop_DDD(DisasContext *dc, int rd, int rs1, int rs2,
+                           void (*gen)(TCGv_i64, TCGv_i64, TCGv_i64))
 {
     TCGv_i64 dst, src1, src2;
 
@@ -1695,8 +1680,8 @@ static inline void gen_ne_fop_DDD(DisasContext *dc, int rd, int rs1, int rs2,
     gen_store_fpr_D(dc, rd, dst);
 }
 
-static inline void gen_gsr_fop_DDD(DisasContext *dc, int rd, int rs1, int rs2,
-                           void (*gen)(TCGv_i64, TCGv_i64, TCGv_i64, TCGv_i64))
+static void gen_gsr_fop_DDD(DisasContext *dc, int rd, int rs1, int rs2,
+                            void (*gen)(TCGv_i64, TCGv_i64, TCGv_i64, TCGv_i64))
 {
     TCGv_i64 dst, src1, src2;
 
@@ -1709,8 +1694,8 @@ static inline void gen_gsr_fop_DDD(DisasContext *dc, int rd, int rs1, int rs2,
     gen_store_fpr_D(dc, rd, dst);
 }
 
-static inline void gen_ne_fop_DDDD(DisasContext *dc, int rd, int rs1, int rs2,
-                           void (*gen)(TCGv_i64, TCGv_i64, TCGv_i64, TCGv_i64))
+static void gen_ne_fop_DDDD(DisasContext *dc, int rd, int rs1, int rs2,
+                            void (*gen)(TCGv_i64, TCGv_i64, TCGv_i64, TCGv_i64))
 {
     TCGv_i64 dst, src0, src1, src2;
 
@@ -1725,8 +1710,8 @@ static inline void gen_ne_fop_DDDD(DisasContext *dc, int rd, int rs1, int rs2,
 }
 #endif
 
-static inline void gen_fop_QQ(DisasContext *dc, int rd, int rs,
-                              void (*gen)(TCGv_ptr))
+static void gen_fop_QQ(DisasContext *dc, int rd, int rs,
+                       void (*gen)(TCGv_ptr))
 {
     gen_op_load_fpr_QT1(QFPREG(rs));
 
@@ -1738,8 +1723,8 @@ static inline void gen_fop_QQ(DisasContext *dc, int rd, int rs,
 }
 
 #ifdef TARGET_SPARC64
-static inline void gen_ne_fop_QQ(DisasContext *dc, int rd, int rs,
-                                 void (*gen)(TCGv_ptr))
+static void gen_ne_fop_QQ(DisasContext *dc, int rd, int rs,
+                          void (*gen)(TCGv_ptr))
 {
     gen_op_load_fpr_QT1(QFPREG(rs));
 
@@ -1750,8 +1735,8 @@ static inline void gen_ne_fop_QQ(DisasContext *dc, int rd, int rs,
 }
 #endif
 
-static inline void gen_fop_QQQ(DisasContext *dc, int rd, int rs1, int rs2,
-                               void (*gen)(TCGv_ptr))
+static void gen_fop_QQQ(DisasContext *dc, int rd, int rs1, int rs2,
+                        void (*gen)(TCGv_ptr))
 {
     gen_op_load_fpr_QT0(QFPREG(rs1));
     gen_op_load_fpr_QT1(QFPREG(rs2));
@@ -1763,7 +1748,7 @@ static inline void gen_fop_QQQ(DisasContext *dc, int rd, int rs1, int rs2,
     gen_update_fprs_dirty(dc, QFPREG(rd));
 }
 
-static inline void gen_fop_DFF(DisasContext *dc, int rd, int rs1, int rs2,
+static void gen_fop_DFF(DisasContext *dc, int rd, int rs1, int rs2,
                         void (*gen)(TCGv_i64, TCGv_ptr, TCGv_i32, TCGv_i32))
 {
     TCGv_i64 dst;
@@ -1779,8 +1764,8 @@ static inline void gen_fop_DFF(DisasContext *dc, int rd, int rs1, int rs2,
     gen_store_fpr_D(dc, rd, dst);
 }
 
-static inline void gen_fop_QDD(DisasContext *dc, int rd, int rs1, int rs2,
-                               void (*gen)(TCGv_ptr, TCGv_i64, TCGv_i64))
+static void gen_fop_QDD(DisasContext *dc, int rd, int rs1, int rs2,
+                        void (*gen)(TCGv_ptr, TCGv_i64, TCGv_i64))
 {
     TCGv_i64 src1, src2;
 
@@ -1795,8 +1780,8 @@ static inline void gen_fop_QDD(DisasContext *dc, int rd, int rs1, int rs2,
 }
 
 #ifdef TARGET_SPARC64
-static inline void gen_fop_DF(DisasContext *dc, int rd, int rs,
-                              void (*gen)(TCGv_i64, TCGv_ptr, TCGv_i32))
+static void gen_fop_DF(DisasContext *dc, int rd, int rs,
+                       void (*gen)(TCGv_i64, TCGv_ptr, TCGv_i32))
 {
     TCGv_i64 dst;
     TCGv_i32 src;
@@ -1811,8 +1796,8 @@ static inline void gen_fop_DF(DisasContext *dc, int rd, int rs,
 }
 #endif
 
-static inline void gen_ne_fop_DF(DisasContext *dc, int rd, int rs,
-                                 void (*gen)(TCGv_i64, TCGv_ptr, TCGv_i32))
+static void gen_ne_fop_DF(DisasContext *dc, int rd, int rs,
+                          void (*gen)(TCGv_i64, TCGv_ptr, TCGv_i32))
 {
     TCGv_i64 dst;
     TCGv_i32 src;
@@ -1825,8 +1810,8 @@ static inline void gen_ne_fop_DF(DisasContext *dc, int rd, int rs,
     gen_store_fpr_D(dc, rd, dst);
 }
 
-static inline void gen_fop_FD(DisasContext *dc, int rd, int rs,
-                              void (*gen)(TCGv_i32, TCGv_ptr, TCGv_i64))
+static void gen_fop_FD(DisasContext *dc, int rd, int rs,
+                       void (*gen)(TCGv_i32, TCGv_ptr, TCGv_i64))
 {
     TCGv_i32 dst;
     TCGv_i64 src;
@@ -1840,8 +1825,8 @@ static inline void gen_fop_FD(DisasContext *dc, int rd, int rs,
     gen_store_fpr_F(dc, rd, dst);
 }
 
-static inline void gen_fop_FQ(DisasContext *dc, int rd, int rs,
-                              void (*gen)(TCGv_i32, TCGv_ptr))
+static void gen_fop_FQ(DisasContext *dc, int rd, int rs,
+                       void (*gen)(TCGv_i32, TCGv_ptr))
 {
     TCGv_i32 dst;
 
@@ -1854,8 +1839,8 @@ static inline void gen_fop_FQ(DisasContext *dc, int rd, int rs,
     gen_store_fpr_F(dc, rd, dst);
 }
 
-static inline void gen_fop_DQ(DisasContext *dc, int rd, int rs,
-                              void (*gen)(TCGv_i64, TCGv_ptr))
+static void gen_fop_DQ(DisasContext *dc, int rd, int rs,
+                       void (*gen)(TCGv_i64, TCGv_ptr))
 {
     TCGv_i64 dst;
 
@@ -1868,8 +1853,8 @@ static inline void gen_fop_DQ(DisasContext *dc, int rd, int rs,
     gen_store_fpr_D(dc, rd, dst);
 }
 
-static inline void gen_ne_fop_QF(DisasContext *dc, int rd, int rs,
-                                 void (*gen)(TCGv_ptr, TCGv_i32))
+static void gen_ne_fop_QF(DisasContext *dc, int rd, int rs,
+                          void (*gen)(TCGv_ptr, TCGv_i32))
 {
     TCGv_i32 src;
 
@@ -1881,8 +1866,8 @@ static inline void gen_ne_fop_QF(DisasContext *dc, int rd, int rs,
     gen_update_fprs_dirty(dc, QFPREG(rd));
 }
 
-static inline void gen_ne_fop_QD(DisasContext *dc, int rd, int rs,
-                                 void (*gen)(TCGv_ptr, TCGv_i64))
+static void gen_ne_fop_QD(DisasContext *dc, int rd, int rs,
+                          void (*gen)(TCGv_ptr, TCGv_i64))
 {
     TCGv_i64 src;
 
@@ -2813,7 +2798,7 @@ static void gen_fmovq(DisasContext *dc, DisasCompare *cmp, int rd, int rs)
 }
 
 #ifndef CONFIG_USER_ONLY
-static inline void gen_load_trap_state_at_tl(TCGv_ptr r_tsptr, TCGv_env cpu_env)
+static void gen_load_trap_state_at_tl(TCGv_ptr r_tsptr, TCGv_env cpu_env)
 {
     TCGv_i32 r_tl = tcg_temp_new_i32();
 
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v2 4/8] target/sparc: Introduce DYNAMIC_PC_LOOKUP
  2023-06-21 18:05 [PATCH v2 0/8] target/sparc: Use tcg_gen_lookup_and_goto_ptr Richard Henderson
                   ` (2 preceding siblings ...)
  2023-06-21 18:06 ` [PATCH v2 3/8] target/sparc: Drop inline markers from translate.c Richard Henderson
@ 2023-06-21 18:06 ` Richard Henderson
  2023-06-27  9:08   ` Philippe Mathieu-Daudé
  2023-06-21 18:06 ` [PATCH v2 5/8] target/sparc: Use DYNAMIC_PC_LOOKUP for conditional branches Richard Henderson
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 24+ messages in thread
From: Richard Henderson @ 2023-06-21 18:06 UTC (permalink / raw)
  To: qemu-devel; +Cc: mark.cave-ayland

Create a new artificial "next pc" which also indicates
that nothing has changed within the cpu state which
requires returning to the main loop.

Pipe this new value though all pc/npc checks.
Do not produce this new value yet.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/sparc/translate.c | 155 ++++++++++++++++++++++++++++-----------
 1 file changed, 111 insertions(+), 44 deletions(-)

diff --git a/target/sparc/translate.c b/target/sparc/translate.c
index 1312c3e94d..d12f2ac423 100644
--- a/target/sparc/translate.c
+++ b/target/sparc/translate.c
@@ -37,9 +37,12 @@
 #include "exec/helper-info.c.inc"
 #undef  HELPER_H
 
-#define DYNAMIC_PC  1 /* dynamic pc value */
-#define JUMP_PC     2 /* dynamic pc value which takes only two values
-                         according to jump_pc[T2] */
+/* Dynamic PC, must exit to main loop. */
+#define DYNAMIC_PC         1
+/* Dynamic PC, one of two values according to jump_pc[T2]. */
+#define JUMP_PC            2
+/* Dynamic PC, may lookup next TB. */
+#define DYNAMIC_PC_LOOKUP  3
 
 #define DISAS_EXIT  DISAS_TARGET_0
 
@@ -901,22 +904,25 @@ static void gen_branch_n(DisasContext *dc, target_ulong pc1)
 {
     target_ulong npc = dc->npc;
 
-    if (likely(npc != DYNAMIC_PC)) {
+    if (npc & 3) {
+        switch (npc) {
+        case DYNAMIC_PC:
+        case DYNAMIC_PC_LOOKUP:
+            tcg_gen_mov_tl(cpu_pc, cpu_npc);
+            tcg_gen_addi_tl(cpu_npc, cpu_npc, 4);
+            tcg_gen_movcond_tl(TCG_COND_NE, cpu_npc,
+                               cpu_cond, tcg_constant_tl(0),
+                               tcg_constant_tl(pc1), cpu_npc);
+            dc->pc = npc;
+            break;
+        default:
+            g_assert_not_reached();
+        }
+    } else {
         dc->pc = npc;
         dc->jump_pc[0] = pc1;
         dc->jump_pc[1] = npc + 4;
         dc->npc = JUMP_PC;
-    } else {
-        TCGv t, z;
-
-        tcg_gen_mov_tl(cpu_pc, cpu_npc);
-
-        tcg_gen_addi_tl(cpu_npc, cpu_npc, 4);
-        t = tcg_constant_tl(pc1);
-        z = tcg_constant_tl(0);
-        tcg_gen_movcond_tl(TCG_COND_NE, cpu_npc, cpu_cond, z, t, cpu_npc);
-
-        dc->pc = DYNAMIC_PC;
     }
 }
 
@@ -941,10 +947,19 @@ static void flush_cond(DisasContext *dc)
 
 static void save_npc(DisasContext *dc)
 {
-    if (dc->npc == JUMP_PC) {
-        gen_generic_branch(dc);
-        dc->npc = DYNAMIC_PC;
-    } else if (dc->npc != DYNAMIC_PC) {
+    if (dc->npc & 3) {
+        switch (dc->npc) {
+        case JUMP_PC:
+            gen_generic_branch(dc);
+            dc->npc = DYNAMIC_PC;
+            break;
+        case DYNAMIC_PC:
+        case DYNAMIC_PC_LOOKUP:
+            break;
+        default:
+            g_assert_not_reached();
+        }
+    } else {
         tcg_gen_movi_tl(cpu_npc, dc->npc);
     }
 }
@@ -977,13 +992,21 @@ static void gen_check_align(TCGv addr, int mask)
 
 static void gen_mov_pc_npc(DisasContext *dc)
 {
-    if (dc->npc == JUMP_PC) {
-        gen_generic_branch(dc);
-        tcg_gen_mov_tl(cpu_pc, cpu_npc);
-        dc->pc = DYNAMIC_PC;
-    } else if (dc->npc == DYNAMIC_PC) {
-        tcg_gen_mov_tl(cpu_pc, cpu_npc);
-        dc->pc = DYNAMIC_PC;
+    if (dc->npc & 3) {
+        switch (dc->npc) {
+        case JUMP_PC:
+            gen_generic_branch(dc);
+            tcg_gen_mov_tl(cpu_pc, cpu_npc);
+            dc->pc = DYNAMIC_PC;
+            break;
+        case DYNAMIC_PC:
+        case DYNAMIC_PC_LOOKUP:
+            tcg_gen_mov_tl(cpu_pc, cpu_npc);
+            dc->pc = dc->npc;
+            break;
+        default:
+            g_assert_not_reached();
+        }
     } else {
         dc->pc = dc->npc;
     }
@@ -5501,13 +5524,21 @@ static void disas_sparc_insn(DisasContext * dc, unsigned int insn)
         break;
     }
     /* default case for non jump instructions */
-    if (dc->npc == DYNAMIC_PC) {
-        dc->pc = DYNAMIC_PC;
-        gen_op_next_insn();
-    } else if (dc->npc == JUMP_PC) {
-        /* we can do a static jump */
-        gen_branch2(dc, dc->jump_pc[0], dc->jump_pc[1], cpu_cond);
-        dc->base.is_jmp = DISAS_NORETURN;
+    if (dc->npc & 3) {
+        switch (dc->npc) {
+        case DYNAMIC_PC:
+        case DYNAMIC_PC_LOOKUP:
+            dc->pc = dc->npc;
+            gen_op_next_insn();
+            break;
+        case JUMP_PC:
+            /* we can do a static jump */
+            gen_branch2(dc, dc->jump_pc[0], dc->jump_pc[1], cpu_cond);
+            dc->base.is_jmp = DISAS_NORETURN;
+            break;
+        default:
+            g_assert_not_reached();
+        }
     } else {
         dc->pc = dc->npc;
         dc->npc = dc->npc + 4;
@@ -5578,13 +5609,23 @@ static void sparc_tr_tb_start(DisasContextBase *db, CPUState *cs)
 static void sparc_tr_insn_start(DisasContextBase *dcbase, CPUState *cs)
 {
     DisasContext *dc = container_of(dcbase, DisasContext, base);
+    target_ulong npc = dc->npc;
 
-    if (dc->npc == JUMP_PC) {
-        assert(dc->jump_pc[1] == dc->pc + 4);
-        tcg_gen_insn_start(dc->pc, dc->jump_pc[0] | JUMP_PC);
-    } else {
-        tcg_gen_insn_start(dc->pc, dc->npc);
+    if (npc & 3) {
+        switch (npc) {
+        case JUMP_PC:
+            assert(dc->jump_pc[1] == dc->pc + 4);
+            npc = dc->jump_pc[0] | JUMP_PC;
+            break;
+        case DYNAMIC_PC:
+        case DYNAMIC_PC_LOOKUP:
+            npc = DYNAMIC_PC;
+            break;
+        default:
+            g_assert_not_reached();
+        }
     }
+    tcg_gen_insn_start(dc->pc, npc);
 }
 
 static void sparc_tr_translate_insn(DisasContextBase *dcbase, CPUState *cs)
@@ -5608,20 +5649,46 @@ static void sparc_tr_translate_insn(DisasContextBase *dcbase, CPUState *cs)
 static void sparc_tr_tb_stop(DisasContextBase *dcbase, CPUState *cs)
 {
     DisasContext *dc = container_of(dcbase, DisasContext, base);
+    bool may_lookup;
 
     switch (dc->base.is_jmp) {
     case DISAS_NEXT:
     case DISAS_TOO_MANY:
-        if (dc->pc != DYNAMIC_PC &&
-            (dc->npc != DYNAMIC_PC && dc->npc != JUMP_PC)) {
+        if (((dc->pc | dc->npc) & 3) == 0) {
             /* static PC and NPC: we can use direct chaining */
             gen_goto_tb(dc, 0, dc->pc, dc->npc);
-        } else {
-            if (dc->pc != DYNAMIC_PC) {
-                tcg_gen_movi_tl(cpu_pc, dc->pc);
+            break;
+        }
+
+        if (dc->pc & 3) {
+            switch (dc->pc) {
+            case DYNAMIC_PC_LOOKUP:
+                may_lookup = true;
+                break;
+            case DYNAMIC_PC:
+                may_lookup = false;
+                break;
+            default:
+                g_assert_not_reached();
             }
-            save_npc(dc);
+        } else {
+            tcg_gen_movi_tl(cpu_pc, dc->pc);
+            may_lookup = true;
+        }
+
+        save_npc(dc);
+        switch (dc->npc) {
+        case DYNAMIC_PC_LOOKUP:
+            if (may_lookup) {
+                tcg_gen_lookup_and_goto_ptr();
+                break;
+            }
+            /* fall through */
+        case DYNAMIC_PC:
             tcg_gen_exit_tb(NULL, 0);
+            break;
+        default:
+            g_assert_not_reached();
         }
         break;
 
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v2 5/8] target/sparc: Use DYNAMIC_PC_LOOKUP for conditional branches
  2023-06-21 18:05 [PATCH v2 0/8] target/sparc: Use tcg_gen_lookup_and_goto_ptr Richard Henderson
                   ` (3 preceding siblings ...)
  2023-06-21 18:06 ` [PATCH v2 4/8] target/sparc: Introduce DYNAMIC_PC_LOOKUP Richard Henderson
@ 2023-06-21 18:06 ` Richard Henderson
  2023-06-27  9:09   ` Philippe Mathieu-Daudé
  2023-06-21 18:06 ` [PATCH v2 6/8] target/sparc: Use DYNAMIC_PC_LOOKUP for JMPL Richard Henderson
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 24+ messages in thread
From: Richard Henderson @ 2023-06-21 18:06 UTC (permalink / raw)
  To: qemu-devel; +Cc: mark.cave-ayland

When resolving JUMP_PC, we know this is for a plain branch
with no other side effects.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/sparc/translate.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/target/sparc/translate.c b/target/sparc/translate.c
index d12f2ac423..bd3feed72e 100644
--- a/target/sparc/translate.c
+++ b/target/sparc/translate.c
@@ -941,7 +941,7 @@ static void flush_cond(DisasContext *dc)
 {
     if (dc->npc == JUMP_PC) {
         gen_generic_branch(dc);
-        dc->npc = DYNAMIC_PC;
+        dc->npc = DYNAMIC_PC_LOOKUP;
     }
 }
 
@@ -951,7 +951,7 @@ static void save_npc(DisasContext *dc)
         switch (dc->npc) {
         case JUMP_PC:
             gen_generic_branch(dc);
-            dc->npc = DYNAMIC_PC;
+            dc->npc = DYNAMIC_PC_LOOKUP;
             break;
         case DYNAMIC_PC:
         case DYNAMIC_PC_LOOKUP:
@@ -997,7 +997,7 @@ static void gen_mov_pc_npc(DisasContext *dc)
         case JUMP_PC:
             gen_generic_branch(dc);
             tcg_gen_mov_tl(cpu_pc, cpu_npc);
-            dc->pc = DYNAMIC_PC;
+            dc->pc = DYNAMIC_PC_LOOKUP;
             break;
         case DYNAMIC_PC:
         case DYNAMIC_PC_LOOKUP:
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v2 6/8] target/sparc: Use DYNAMIC_PC_LOOKUP for JMPL
  2023-06-21 18:05 [PATCH v2 0/8] target/sparc: Use tcg_gen_lookup_and_goto_ptr Richard Henderson
                   ` (4 preceding siblings ...)
  2023-06-21 18:06 ` [PATCH v2 5/8] target/sparc: Use DYNAMIC_PC_LOOKUP for conditional branches Richard Henderson
@ 2023-06-21 18:06 ` Richard Henderson
  2023-06-27  9:12   ` Philippe Mathieu-Daudé
  2023-06-21 18:06 ` [PATCH v2 7/8] target/sparc: Use DYNAMIC_PC_LOOKUP for v9 RETURN Richard Henderson
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 24+ messages in thread
From: Richard Henderson @ 2023-06-21 18:06 UTC (permalink / raw)
  To: qemu-devel; +Cc: mark.cave-ayland

This is for a plain indirect branch with no other side effects.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/sparc/translate.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/sparc/translate.c b/target/sparc/translate.c
index bd3feed72e..9ed235d26d 100644
--- a/target/sparc/translate.c
+++ b/target/sparc/translate.c
@@ -5058,7 +5058,7 @@ static void disas_sparc_insn(DisasContext * dc, unsigned int insn)
                         gen_check_align(cpu_tmp0, 3);
                         gen_address_mask(dc, cpu_tmp0);
                         tcg_gen_mov_tl(cpu_npc, cpu_tmp0);
-                        dc->npc = DYNAMIC_PC;
+                        dc->npc = DYNAMIC_PC_LOOKUP;
                     }
                     goto jmp_insn;
 #if !defined(CONFIG_USER_ONLY) && !defined(TARGET_SPARC64)
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v2 7/8] target/sparc: Use DYNAMIC_PC_LOOKUP for v9 RETURN
  2023-06-21 18:05 [PATCH v2 0/8] target/sparc: Use tcg_gen_lookup_and_goto_ptr Richard Henderson
                   ` (5 preceding siblings ...)
  2023-06-21 18:06 ` [PATCH v2 6/8] target/sparc: Use DYNAMIC_PC_LOOKUP for JMPL Richard Henderson
@ 2023-06-21 18:06 ` Richard Henderson
  2023-06-27  9:10   ` Philippe Mathieu-Daudé
  2023-06-21 18:06 ` [PATCH v2 8/8] target/sparc: Use tcg_gen_lookup_and_goto_ptr for v9 WRASI Richard Henderson
  2023-06-22 12:26 ` [PATCH v2 0/8] target/sparc: Use tcg_gen_lookup_and_goto_ptr Mark Cave-Ayland
  8 siblings, 1 reply; 24+ messages in thread
From: Richard Henderson @ 2023-06-21 18:06 UTC (permalink / raw)
  To: qemu-devel; +Cc: mark.cave-ayland

After the register window unwind, this is for a plain indirect
branch with no further side effects.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/sparc/translate.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/sparc/translate.c b/target/sparc/translate.c
index 9ed235d26d..ab7054d4eb 100644
--- a/target/sparc/translate.c
+++ b/target/sparc/translate.c
@@ -5029,7 +5029,7 @@ static void disas_sparc_insn(DisasContext * dc, unsigned int insn)
                 gen_mov_pc_npc(dc);
                 gen_check_align(cpu_tmp0, 3);
                 tcg_gen_mov_tl(cpu_npc, cpu_tmp0);
-                dc->npc = DYNAMIC_PC;
+                dc->npc = DYNAMIC_PC_LOOKUP;
                 goto jmp_insn;
 #endif
             } else {
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v2 8/8] target/sparc: Use tcg_gen_lookup_and_goto_ptr for v9 WRASI
  2023-06-21 18:05 [PATCH v2 0/8] target/sparc: Use tcg_gen_lookup_and_goto_ptr Richard Henderson
                   ` (6 preceding siblings ...)
  2023-06-21 18:06 ` [PATCH v2 7/8] target/sparc: Use DYNAMIC_PC_LOOKUP for v9 RETURN Richard Henderson
@ 2023-06-21 18:06 ` Richard Henderson
  2023-06-22  9:04   ` Philippe Mathieu-Daudé
  2023-06-22 12:26 ` [PATCH v2 0/8] target/sparc: Use tcg_gen_lookup_and_goto_ptr Mark Cave-Ayland
  8 siblings, 1 reply; 24+ messages in thread
From: Richard Henderson @ 2023-06-21 18:06 UTC (permalink / raw)
  To: qemu-devel; +Cc: mark.cave-ayland

We incorporate %asi into tb->flags so that we may generate
inline code for the many ASIs for which it is easy to do so.
Setting %asi is common for e.g. memcpy and memset performing
block copy and clear, so it is worth noticing this case.

We must end the TB but do not need to return to the main loop.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/sparc/translate.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/target/sparc/translate.c b/target/sparc/translate.c
index ab7054d4eb..fcc1054943 100644
--- a/target/sparc/translate.c
+++ b/target/sparc/translate.c
@@ -4147,10 +4147,14 @@ static void disas_sparc_insn(DisasContext * dc, unsigned int insn)
                                 tcg_gen_andi_tl(cpu_tmp0, cpu_tmp0, 0xff);
                                 tcg_gen_st32_tl(cpu_tmp0, cpu_env,
                                                 offsetof(CPUSPARCState, asi));
-                                /* End TB to notice changed ASI.  */
+                                /*
+                                 * End TB to notice changed ASI.
+                                 * TODO: Could notice src1 = %g0 and IS_IMM,
+                                 * update DisasContext and not exit the TB.
+                                 */
                                 save_state(dc);
                                 gen_op_next_insn();
-                                tcg_gen_exit_tb(NULL, 0);
+                                tcg_gen_lookup_and_goto_ptr();
                                 dc->base.is_jmp = DISAS_NORETURN;
                                 break;
                             case 0x6: /* V9 wrfprs */
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 1/8] target/sparc: Use tcg_gen_lookup_and_goto_ptr in gen_goto_tb
  2023-06-21 18:06 ` [PATCH v2 1/8] target/sparc: Use tcg_gen_lookup_and_goto_ptr in gen_goto_tb Richard Henderson
@ 2023-06-22  9:03   ` Philippe Mathieu-Daudé
  0 siblings, 0 replies; 24+ messages in thread
From: Philippe Mathieu-Daudé @ 2023-06-22  9:03 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: mark.cave-ayland

On 21/6/23 20:06, Richard Henderson wrote:
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>   target/sparc/translate.c | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>




^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 8/8] target/sparc: Use tcg_gen_lookup_and_goto_ptr for v9 WRASI
  2023-06-21 18:06 ` [PATCH v2 8/8] target/sparc: Use tcg_gen_lookup_and_goto_ptr for v9 WRASI Richard Henderson
@ 2023-06-22  9:04   ` Philippe Mathieu-Daudé
  0 siblings, 0 replies; 24+ messages in thread
From: Philippe Mathieu-Daudé @ 2023-06-22  9:04 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: mark.cave-ayland

On 21/6/23 20:06, Richard Henderson wrote:
> We incorporate %asi into tb->flags so that we may generate
> inline code for the many ASIs for which it is easy to do so.
> Setting %asi is common for e.g. memcpy and memset performing
> block copy and clear, so it is worth noticing this case.
> 
> We must end the TB but do not need to return to the main loop.
> 
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>   target/sparc/translate.c | 8 ++++++--
>   1 file changed, 6 insertions(+), 2 deletions(-)

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>




^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 0/8] target/sparc: Use tcg_gen_lookup_and_goto_ptr
  2023-06-21 18:05 [PATCH v2 0/8] target/sparc: Use tcg_gen_lookup_and_goto_ptr Richard Henderson
                   ` (7 preceding siblings ...)
  2023-06-21 18:06 ` [PATCH v2 8/8] target/sparc: Use tcg_gen_lookup_and_goto_ptr for v9 WRASI Richard Henderson
@ 2023-06-22 12:26 ` Mark Cave-Ayland
  2023-06-27  6:46   ` Mark Cave-Ayland
  8 siblings, 1 reply; 24+ messages in thread
From: Mark Cave-Ayland @ 2023-06-22 12:26 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel

On 21/06/2023 19:05, Richard Henderson wrote:

> Changes from v1:
>    * Split into teeny weeny pieces.
> 
>    * It turns out the sparc_tr_tb_stop hunk of v1 was buggy,
>      in that things that are not simple branches use DYNAMIC_PC,
>      e.g. the RETT (return from trap) instruction.
> 
>      Introduce DYNAMIC_PC_LOOKUP to distinguish the couple of
>      places where we have a dynamic pc, but no other change
>      of state (conditional branches, JMPL, RETURN).
> 
>    * Drop the change for WRFPRS, because it's too infrequent.
>      The WRASI change affects memcpy/memset, so that's more important.
> 
> Boots Mark's sol8 install cdrom.  :-)
> 
> Top of the profile changes from
> 
>      41.55%  qemu-system-sparc              [.] cpu_exec_loop
>      14.02%  qemu-system-sparc              [.] cpu_tb_exec
>       8.74%  qemu-system-sparc              [.] tb_lookup
>       2.11%  qemu-system-sparc              [.] tcg_splitwx_to_rw
>       1.63%  memfd:tcg-jit (deleted)        [.] 0x0000000000000004
> 
> to
> 
>      31.59%  qemu-system-sparc              [.] helper_lookup_tb_ptr
>      17.79%  qemu-system-sparc              [.] tb_lookup
>       5.38%  qemu-system-sparc              [.] compute_all_sub
>       2.38%  qemu-system-sparc              [.] helper_compute_psr
>       2.36%  qemu-system-sparc              [.] helper_check_align
>       1.79%  memfd:tcg-jit (deleted)        [.] 0x000000000063fc8e
> 
> This probably indicates that cpu_get_tb_cpu_state could be
> improved to not consume so much overhead.

Nice! I've just run this through all of my sun4m/sun4u/sun4v test images and I don't 
see any regressions with v2. The guests feel noticeably more responsive too :)

Tested-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>

I've skimmed the patches and without looking in too much detail they seem to be okay 
so I'm happy to give:

Acked-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>

Side note: the niagara tests require the patch at 
https://lists.gnu.org/archive/html/qemu-devel/2023-05/msg03537.html which still 
hasn't been merged yet.

> Richard Henderson (8):
>    target/sparc: Use tcg_gen_lookup_and_goto_ptr in gen_goto_tb
>    target/sparc: Fix npc comparison in sparc_tr_insn_start
>    target/sparc: Drop inline markers from translate.c
>    target/sparc: Introduce DYNAMIC_PC_LOOKUP
>    target/sparc: Use DYNAMIC_PC_LOOKUP for conditional branches
>    target/sparc: Use DYNAMIC_PC_LOOKUP for JMPL
>    target/sparc: Use DYNAMIC_PC_LOOKUP for v9 RETURN
>    target/sparc: Use tcg_gen_lookup_and_goto_ptr for v9 WRASI
> 
>   target/sparc/translate.c | 410 ++++++++++++++++++++++-----------------
>   1 file changed, 233 insertions(+), 177 deletions(-)


ATB,

Mark.



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 0/8] target/sparc: Use tcg_gen_lookup_and_goto_ptr
  2023-06-22 12:26 ` [PATCH v2 0/8] target/sparc: Use tcg_gen_lookup_and_goto_ptr Mark Cave-Ayland
@ 2023-06-27  6:46   ` Mark Cave-Ayland
  2023-06-27  9:37     ` Philippe Mathieu-Daudé
  0 siblings, 1 reply; 24+ messages in thread
From: Mark Cave-Ayland @ 2023-06-27  6:46 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel

On 22/06/2023 13:26, Mark Cave-Ayland wrote:

> On 21/06/2023 19:05, Richard Henderson wrote:
> 
>> Changes from v1:
>>    * Split into teeny weeny pieces.
>>
>>    * It turns out the sparc_tr_tb_stop hunk of v1 was buggy,
>>      in that things that are not simple branches use DYNAMIC_PC,
>>      e.g. the RETT (return from trap) instruction.
>>
>>      Introduce DYNAMIC_PC_LOOKUP to distinguish the couple of
>>      places where we have a dynamic pc, but no other change
>>      of state (conditional branches, JMPL, RETURN).
>>
>>    * Drop the change for WRFPRS, because it's too infrequent.
>>      The WRASI change affects memcpy/memset, so that's more important.
>>
>> Boots Mark's sol8 install cdrom.  :-)
>>
>> Top of the profile changes from
>>
>>      41.55%  qemu-system-sparc              [.] cpu_exec_loop
>>      14.02%  qemu-system-sparc              [.] cpu_tb_exec
>>       8.74%  qemu-system-sparc              [.] tb_lookup
>>       2.11%  qemu-system-sparc              [.] tcg_splitwx_to_rw
>>       1.63%  memfd:tcg-jit (deleted)        [.] 0x0000000000000004
>>
>> to
>>
>>      31.59%  qemu-system-sparc              [.] helper_lookup_tb_ptr
>>      17.79%  qemu-system-sparc              [.] tb_lookup
>>       5.38%  qemu-system-sparc              [.] compute_all_sub
>>       2.38%  qemu-system-sparc              [.] helper_compute_psr
>>       2.36%  qemu-system-sparc              [.] helper_check_align
>>       1.79%  memfd:tcg-jit (deleted)        [.] 0x000000000063fc8e
>>
>> This probably indicates that cpu_get_tb_cpu_state could be
>> improved to not consume so much overhead.
> 
> Nice! I've just run this through all of my sun4m/sun4u/sun4v test images and I don't 
> see any regressions with v2. The guests feel noticeably more responsive too :)
> 
> Tested-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
> 
> I've skimmed the patches and without looking in too much detail they seem to be okay 
> so I'm happy to give:
> 
> Acked-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
> 
> Side note: the niagara tests require the patch at 
> https://lists.gnu.org/archive/html/qemu-devel/2023-05/msg03537.html which still 
> hasn't been merged yet.
> 
>> Richard Henderson (8):
>>    target/sparc: Use tcg_gen_lookup_and_goto_ptr in gen_goto_tb
>>    target/sparc: Fix npc comparison in sparc_tr_insn_start
>>    target/sparc: Drop inline markers from translate.c
>>    target/sparc: Introduce DYNAMIC_PC_LOOKUP
>>    target/sparc: Use DYNAMIC_PC_LOOKUP for conditional branches
>>    target/sparc: Use DYNAMIC_PC_LOOKUP for JMPL
>>    target/sparc: Use DYNAMIC_PC_LOOKUP for v9 RETURN
>>    target/sparc: Use tcg_gen_lookup_and_goto_ptr for v9 WRASI
>>
>>   target/sparc/translate.c | 410 ++++++++++++++++++++++-----------------
>>   1 file changed, 233 insertions(+), 177 deletions(-)

I've just noticed during testing there is an issue with this series when used with a 
real SS-5 PROM image (I was using OpenBIOS for my previous tests) which causes it to 
assert() almost immediately on startup:

$ ./qemu-system-sparc -bios ss5.bin
ERROR:../target/sparc/translate.c:5695:sparc_tr_tb_stop: code should not be reached
Bail out! ERROR:../target/sparc/translate.c:5695:sparc_tr_tb_stop: code should not be 
reached
Aborted


ATB,

Mark.



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 3/8] target/sparc: Drop inline markers from translate.c
  2023-06-21 18:06 ` [PATCH v2 3/8] target/sparc: Drop inline markers from translate.c Richard Henderson
@ 2023-06-27  8:58   ` Philippe Mathieu-Daudé
  2023-06-28  7:05     ` Richard Henderson
  0 siblings, 1 reply; 24+ messages in thread
From: Philippe Mathieu-Daudé @ 2023-06-27  8:58 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: mark.cave-ayland

On 21/6/23 20:06, Richard Henderson wrote:
> Let the compiler decide about inlining.

Should we clean that automatically with scripts and forbid via
checkpatch?

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>

> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>   target/sparc/translate.c | 237 ++++++++++++++++++---------------------
>   1 file changed, 111 insertions(+), 126 deletions(-)




^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 4/8] target/sparc: Introduce DYNAMIC_PC_LOOKUP
  2023-06-21 18:06 ` [PATCH v2 4/8] target/sparc: Introduce DYNAMIC_PC_LOOKUP Richard Henderson
@ 2023-06-27  9:08   ` Philippe Mathieu-Daudé
  0 siblings, 0 replies; 24+ messages in thread
From: Philippe Mathieu-Daudé @ 2023-06-27  9:08 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: mark.cave-ayland

On 21/6/23 20:06, Richard Henderson wrote:
> Create a new artificial "next pc" which also indicates
> that nothing has changed within the cpu state which
> requires returning to the main loop.
> 
> Pipe this new value though all pc/npc checks.
> Do not produce this new value yet.
> 
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>   target/sparc/translate.c | 155 ++++++++++++++++++++++++++++-----------
>   1 file changed, 111 insertions(+), 44 deletions(-)


> @@ -5608,20 +5649,46 @@ static void sparc_tr_translate_insn(DisasContextBase *dcbase, CPUState *cs)
>   static void sparc_tr_tb_stop(DisasContextBase *dcbase, CPUState *cs)
>   {
>       DisasContext *dc = container_of(dcbase, DisasContext, base);
> +    bool may_lookup;
>   
>       switch (dc->base.is_jmp) {
>       case DISAS_NEXT:
>       case DISAS_TOO_MANY:
> -        if (dc->pc != DYNAMIC_PC &&
> -            (dc->npc != DYNAMIC_PC && dc->npc != JUMP_PC)) {
> +        if (((dc->pc | dc->npc) & 3) == 0) {
>               /* static PC and NPC: we can use direct chaining */
>               gen_goto_tb(dc, 0, dc->pc, dc->npc);
> -        } else {
> -            if (dc->pc != DYNAMIC_PC) {
> -                tcg_gen_movi_tl(cpu_pc, dc->pc);
> +            break;
> +        }
> +
> +        if (dc->pc & 3) {
> +            switch (dc->pc) {
> +            case DYNAMIC_PC_LOOKUP:
> +                may_lookup = true;
> +                break;
> +            case DYNAMIC_PC:
> +                may_lookup = false;
> +                break;
> +            default:
> +                g_assert_not_reached();
>               }
> -            save_npc(dc);
> +        } else {
> +            tcg_gen_movi_tl(cpu_pc, dc->pc);
> +            may_lookup = true;
> +        }
> +
> +        save_npc(dc);
> +        switch (dc->npc) {

switch (dc->npc & 3)?

> +        case DYNAMIC_PC_LOOKUP:
> +            if (may_lookup) {
> +                tcg_gen_lookup_and_goto_ptr();
> +                break;
> +            }
> +            /* fall through */
> +        case DYNAMIC_PC:
>               tcg_gen_exit_tb(NULL, 0);
> +            break;
> +        default:
> +            g_assert_not_reached();
>           }
>           break;
>   



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 5/8] target/sparc: Use DYNAMIC_PC_LOOKUP for conditional branches
  2023-06-21 18:06 ` [PATCH v2 5/8] target/sparc: Use DYNAMIC_PC_LOOKUP for conditional branches Richard Henderson
@ 2023-06-27  9:09   ` Philippe Mathieu-Daudé
  0 siblings, 0 replies; 24+ messages in thread
From: Philippe Mathieu-Daudé @ 2023-06-27  9:09 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: mark.cave-ayland

On 21/6/23 20:06, Richard Henderson wrote:
> When resolving JUMP_PC, we know this is for a plain branch
> with no other side effects.
> 
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>   target/sparc/translate.c | 6 +++---
>   1 file changed, 3 insertions(+), 3 deletions(-)

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 7/8] target/sparc: Use DYNAMIC_PC_LOOKUP for v9 RETURN
  2023-06-21 18:06 ` [PATCH v2 7/8] target/sparc: Use DYNAMIC_PC_LOOKUP for v9 RETURN Richard Henderson
@ 2023-06-27  9:10   ` Philippe Mathieu-Daudé
  0 siblings, 0 replies; 24+ messages in thread
From: Philippe Mathieu-Daudé @ 2023-06-27  9:10 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: mark.cave-ayland

On 21/6/23 20:06, Richard Henderson wrote:
> After the register window unwind, this is for a plain indirect
> branch with no further side effects.
> 
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>   target/sparc/translate.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 2/8] target/sparc: Fix npc comparison in sparc_tr_insn_start
  2023-06-21 18:06 ` [PATCH v2 2/8] target/sparc: Fix npc comparison in sparc_tr_insn_start Richard Henderson
@ 2023-06-27  9:11   ` Philippe Mathieu-Daudé
  0 siblings, 0 replies; 24+ messages in thread
From: Philippe Mathieu-Daudé @ 2023-06-27  9:11 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: mark.cave-ayland

On 21/6/23 20:06, Richard Henderson wrote:
> During translation, npc == address, DYNAMIC_PC, or JUMP_PC.
> It is only the encoding between here and sparc_restore_state_to_opc
> that considers JUMP_PC to be a bit within a larger value.
> 
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>   target/sparc/translate.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 6/8] target/sparc: Use DYNAMIC_PC_LOOKUP for JMPL
  2023-06-21 18:06 ` [PATCH v2 6/8] target/sparc: Use DYNAMIC_PC_LOOKUP for JMPL Richard Henderson
@ 2023-06-27  9:12   ` Philippe Mathieu-Daudé
  0 siblings, 0 replies; 24+ messages in thread
From: Philippe Mathieu-Daudé @ 2023-06-27  9:12 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: mark.cave-ayland

On 21/6/23 20:06, Richard Henderson wrote:
> This is for a plain indirect branch with no other side effects.
> 
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>   target/sparc/translate.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>




^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 0/8] target/sparc: Use tcg_gen_lookup_and_goto_ptr
  2023-06-27  6:46   ` Mark Cave-Ayland
@ 2023-06-27  9:37     ` Philippe Mathieu-Daudé
  2023-06-27 10:12       ` Richard Henderson
  2023-06-27 11:19       ` Mark Cave-Ayland
  0 siblings, 2 replies; 24+ messages in thread
From: Philippe Mathieu-Daudé @ 2023-06-27  9:37 UTC (permalink / raw)
  To: Mark Cave-Ayland, Richard Henderson, qemu-devel

On 27/6/23 08:46, Mark Cave-Ayland wrote:
> On 22/06/2023 13:26, Mark Cave-Ayland wrote:
> 
>> On 21/06/2023 19:05, Richard Henderson wrote:
>>
>>> Changes from v1:
>>>    * Split into teeny weeny pieces.
>>>
>>>    * It turns out the sparc_tr_tb_stop hunk of v1 was buggy,
>>>      in that things that are not simple branches use DYNAMIC_PC,
>>>      e.g. the RETT (return from trap) instruction.
>>>
>>>      Introduce DYNAMIC_PC_LOOKUP to distinguish the couple of
>>>      places where we have a dynamic pc, but no other change
>>>      of state (conditional branches, JMPL, RETURN).
>>>
>>>    * Drop the change for WRFPRS, because it's too infrequent.
>>>      The WRASI change affects memcpy/memset, so that's more important.
>>>
>>> Boots Mark's sol8 install cdrom.  :-)
>>>
>>> Top of the profile changes from
>>>
>>>      41.55%  qemu-system-sparc              [.] cpu_exec_loop
>>>      14.02%  qemu-system-sparc              [.] cpu_tb_exec
>>>       8.74%  qemu-system-sparc              [.] tb_lookup
>>>       2.11%  qemu-system-sparc              [.] tcg_splitwx_to_rw
>>>       1.63%  memfd:tcg-jit (deleted)        [.] 0x0000000000000004
>>>
>>> to
>>>
>>>      31.59%  qemu-system-sparc              [.] helper_lookup_tb_ptr
>>>      17.79%  qemu-system-sparc              [.] tb_lookup
>>>       5.38%  qemu-system-sparc              [.] compute_all_sub
>>>       2.38%  qemu-system-sparc              [.] helper_compute_psr
>>>       2.36%  qemu-system-sparc              [.] helper_check_align
>>>       1.79%  memfd:tcg-jit (deleted)        [.] 0x000000000063fc8e
>>>
>>> This probably indicates that cpu_get_tb_cpu_state could be
>>> improved to not consume so much overhead.
>>
>> Nice! I've just run this through all of my sun4m/sun4u/sun4v test 
>> images and I don't see any regressions with v2. The guests feel 
>> noticeably more responsive too :)
>>
>> Tested-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
>>
>> I've skimmed the patches and without looking in too much detail they 
>> seem to be okay so I'm happy to give:
>>
>> Acked-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
>>
>> Side note: the niagara tests require the patch at 
>> https://lists.gnu.org/archive/html/qemu-devel/2023-05/msg03537.html 
>> which still hasn't been merged yet.
>>
>>> Richard Henderson (8):
>>>    target/sparc: Use tcg_gen_lookup_and_goto_ptr in gen_goto_tb
>>>    target/sparc: Fix npc comparison in sparc_tr_insn_start
>>>    target/sparc: Drop inline markers from translate.c
>>>    target/sparc: Introduce DYNAMIC_PC_LOOKUP
>>>    target/sparc: Use DYNAMIC_PC_LOOKUP for conditional branches
>>>    target/sparc: Use DYNAMIC_PC_LOOKUP for JMPL
>>>    target/sparc: Use DYNAMIC_PC_LOOKUP for v9 RETURN
>>>    target/sparc: Use tcg_gen_lookup_and_goto_ptr for v9 WRASI
>>>
>>>   target/sparc/translate.c | 410 ++++++++++++++++++++++-----------------
>>>   1 file changed, 233 insertions(+), 177 deletions(-)
> 
> I've just noticed during testing there is an issue with this series when 
> used with a real SS-5 PROM image (I was using OpenBIOS for my previous 
> tests) which causes it to assert() almost immediately on startup:
> 
> $ ./qemu-system-sparc -bios ss5.bin
> ERROR:../target/sparc/translate.c:5695:sparc_tr_tb_stop: code should not 
> be reached
> Bail out! ERROR:../target/sparc/translate.c:5695:sparc_tr_tb_stop: code 
> should not be reached
> Aborted

Could you try this fix:

-- >8 --
--- a/target/sparc/translate.c
+++ b/target/sparc/translate.c
@@ -5682,5 +5682,5 @@ static void sparc_tr_tb_stop(DisasContextBase 
*dcbase, CPUState *cs)

          save_npc(dc);
-        switch (dc->npc) {
+        switch (dc->npc & 3) {
          case DYNAMIC_PC_LOOKUP:
              if (may_lookup) {
---

?


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 0/8] target/sparc: Use tcg_gen_lookup_and_goto_ptr
  2023-06-27  9:37     ` Philippe Mathieu-Daudé
@ 2023-06-27 10:12       ` Richard Henderson
  2023-06-27 11:19       ` Mark Cave-Ayland
  1 sibling, 0 replies; 24+ messages in thread
From: Richard Henderson @ 2023-06-27 10:12 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé, Mark Cave-Ayland, qemu-devel

On 6/27/23 11:37, Philippe Mathieu-Daudé wrote:
> On 27/6/23 08:46, Mark Cave-Ayland wrote:
>> On 22/06/2023 13:26, Mark Cave-Ayland wrote:
>>
>>> On 21/06/2023 19:05, Richard Henderson wrote:
>>>
>>>> Changes from v1:
>>>>    * Split into teeny weeny pieces.
>>>>
>>>>    * It turns out the sparc_tr_tb_stop hunk of v1 was buggy,
>>>>      in that things that are not simple branches use DYNAMIC_PC,
>>>>      e.g. the RETT (return from trap) instruction.
>>>>
>>>>      Introduce DYNAMIC_PC_LOOKUP to distinguish the couple of
>>>>      places where we have a dynamic pc, but no other change
>>>>      of state (conditional branches, JMPL, RETURN).
>>>>
>>>>    * Drop the change for WRFPRS, because it's too infrequent.
>>>>      The WRASI change affects memcpy/memset, so that's more important.
>>>>
>>>> Boots Mark's sol8 install cdrom.  :-)
>>>>
>>>> Top of the profile changes from
>>>>
>>>>      41.55%  qemu-system-sparc              [.] cpu_exec_loop
>>>>      14.02%  qemu-system-sparc              [.] cpu_tb_exec
>>>>       8.74%  qemu-system-sparc              [.] tb_lookup
>>>>       2.11%  qemu-system-sparc              [.] tcg_splitwx_to_rw
>>>>       1.63%  memfd:tcg-jit (deleted)        [.] 0x0000000000000004
>>>>
>>>> to
>>>>
>>>>      31.59%  qemu-system-sparc              [.] helper_lookup_tb_ptr
>>>>      17.79%  qemu-system-sparc              [.] tb_lookup
>>>>       5.38%  qemu-system-sparc              [.] compute_all_sub
>>>>       2.38%  qemu-system-sparc              [.] helper_compute_psr
>>>>       2.36%  qemu-system-sparc              [.] helper_check_align
>>>>       1.79%  memfd:tcg-jit (deleted)        [.] 0x000000000063fc8e
>>>>
>>>> This probably indicates that cpu_get_tb_cpu_state could be
>>>> improved to not consume so much overhead.
>>>
>>> Nice! I've just run this through all of my sun4m/sun4u/sun4v test images and I don't 
>>> see any regressions with v2. The guests feel noticeably more responsive too :)
>>>
>>> Tested-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
>>>
>>> I've skimmed the patches and without looking in too much detail they seem to be okay so 
>>> I'm happy to give:
>>>
>>> Acked-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
>>>
>>> Side note: the niagara tests require the patch at 
>>> https://lists.gnu.org/archive/html/qemu-devel/2023-05/msg03537.html which still hasn't 
>>> been merged yet.
>>>
>>>> Richard Henderson (8):
>>>>    target/sparc: Use tcg_gen_lookup_and_goto_ptr in gen_goto_tb
>>>>    target/sparc: Fix npc comparison in sparc_tr_insn_start
>>>>    target/sparc: Drop inline markers from translate.c
>>>>    target/sparc: Introduce DYNAMIC_PC_LOOKUP
>>>>    target/sparc: Use DYNAMIC_PC_LOOKUP for conditional branches
>>>>    target/sparc: Use DYNAMIC_PC_LOOKUP for JMPL
>>>>    target/sparc: Use DYNAMIC_PC_LOOKUP for v9 RETURN
>>>>    target/sparc: Use tcg_gen_lookup_and_goto_ptr for v9 WRASI
>>>>
>>>>   target/sparc/translate.c | 410 ++++++++++++++++++++++-----------------
>>>>   1 file changed, 233 insertions(+), 177 deletions(-)
>>
>> I've just noticed during testing there is an issue with this series when used with a 
>> real SS-5 PROM image (I was using OpenBIOS for my previous tests) which causes it to 
>> assert() almost immediately on startup:
>>
>> $ ./qemu-system-sparc -bios ss5.bin
>> ERROR:../target/sparc/translate.c:5695:sparc_tr_tb_stop: code should not be reached
>> Bail out! ERROR:../target/sparc/translate.c:5695:sparc_tr_tb_stop: code should not be 
>> reached
>> Aborted
> 
> Could you try this fix:
> 
> -- >8 --
> --- a/target/sparc/translate.c
> +++ b/target/sparc/translate.c
> @@ -5682,5 +5682,5 @@ static void sparc_tr_tb_stop(DisasContextBase *dcbase, CPUState *cs)
> 
>           save_npc(dc);
> -        switch (dc->npc) {
> +        switch (dc->npc & 3) {
>           case DYNAMIC_PC_LOOKUP:
>               if (may_lookup) {
> ---
> 
> ?

No, we should have exact equality on those two values.


r~


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 0/8] target/sparc: Use tcg_gen_lookup_and_goto_ptr
  2023-06-27  9:37     ` Philippe Mathieu-Daudé
  2023-06-27 10:12       ` Richard Henderson
@ 2023-06-27 11:19       ` Mark Cave-Ayland
  2023-06-27 11:57         ` Richard Henderson
  1 sibling, 1 reply; 24+ messages in thread
From: Mark Cave-Ayland @ 2023-06-27 11:19 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé, Richard Henderson, qemu-devel

On 27/06/2023 10:37, Philippe Mathieu-Daudé wrote:

> On 27/6/23 08:46, Mark Cave-Ayland wrote:
>> On 22/06/2023 13:26, Mark Cave-Ayland wrote:
>>
>>> On 21/06/2023 19:05, Richard Henderson wrote:
>>>
>>>> Changes from v1:
>>>>    * Split into teeny weeny pieces.
>>>>
>>>>    * It turns out the sparc_tr_tb_stop hunk of v1 was buggy,
>>>>      in that things that are not simple branches use DYNAMIC_PC,
>>>>      e.g. the RETT (return from trap) instruction.
>>>>
>>>>      Introduce DYNAMIC_PC_LOOKUP to distinguish the couple of
>>>>      places where we have a dynamic pc, but no other change
>>>>      of state (conditional branches, JMPL, RETURN).
>>>>
>>>>    * Drop the change for WRFPRS, because it's too infrequent.
>>>>      The WRASI change affects memcpy/memset, so that's more important.
>>>>
>>>> Boots Mark's sol8 install cdrom.  :-)
>>>>
>>>> Top of the profile changes from
>>>>
>>>>      41.55%  qemu-system-sparc              [.] cpu_exec_loop
>>>>      14.02%  qemu-system-sparc              [.] cpu_tb_exec
>>>>       8.74%  qemu-system-sparc              [.] tb_lookup
>>>>       2.11%  qemu-system-sparc              [.] tcg_splitwx_to_rw
>>>>       1.63%  memfd:tcg-jit (deleted)        [.] 0x0000000000000004
>>>>
>>>> to
>>>>
>>>>      31.59%  qemu-system-sparc              [.] helper_lookup_tb_ptr
>>>>      17.79%  qemu-system-sparc              [.] tb_lookup
>>>>       5.38%  qemu-system-sparc              [.] compute_all_sub
>>>>       2.38%  qemu-system-sparc              [.] helper_compute_psr
>>>>       2.36%  qemu-system-sparc              [.] helper_check_align
>>>>       1.79%  memfd:tcg-jit (deleted)        [.] 0x000000000063fc8e
>>>>
>>>> This probably indicates that cpu_get_tb_cpu_state could be
>>>> improved to not consume so much overhead.
>>>
>>> Nice! I've just run this through all of my sun4m/sun4u/sun4v test images and I 
>>> don't see any regressions with v2. The guests feel noticeably more responsive too :)
>>>
>>> Tested-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
>>>
>>> I've skimmed the patches and without looking in too much detail they seem to be 
>>> okay so I'm happy to give:
>>>
>>> Acked-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
>>>
>>> Side note: the niagara tests require the patch at 
>>> https://lists.gnu.org/archive/html/qemu-devel/2023-05/msg03537.html which still 
>>> hasn't been merged yet.
>>>
>>>> Richard Henderson (8):
>>>>    target/sparc: Use tcg_gen_lookup_and_goto_ptr in gen_goto_tb
>>>>    target/sparc: Fix npc comparison in sparc_tr_insn_start
>>>>    target/sparc: Drop inline markers from translate.c
>>>>    target/sparc: Introduce DYNAMIC_PC_LOOKUP
>>>>    target/sparc: Use DYNAMIC_PC_LOOKUP for conditional branches
>>>>    target/sparc: Use DYNAMIC_PC_LOOKUP for JMPL
>>>>    target/sparc: Use DYNAMIC_PC_LOOKUP for v9 RETURN
>>>>    target/sparc: Use tcg_gen_lookup_and_goto_ptr for v9 WRASI
>>>>
>>>>   target/sparc/translate.c | 410 ++++++++++++++++++++++-----------------
>>>>   1 file changed, 233 insertions(+), 177 deletions(-)
>>
>> I've just noticed during testing there is an issue with this series when used with 
>> a real SS-5 PROM image (I was using OpenBIOS for my previous tests) which causes it 
>> to assert() almost immediately on startup:
>>
>> $ ./qemu-system-sparc -bios ss5.bin
>> ERROR:../target/sparc/translate.c:5695:sparc_tr_tb_stop: code should not be reached
>> Bail out! ERROR:../target/sparc/translate.c:5695:sparc_tr_tb_stop: code should not 
>> be reached
>> Aborted
> 
> Could you try this fix:
> 
> -- >8 --
> --- a/target/sparc/translate.c
> +++ b/target/sparc/translate.c
> @@ -5682,5 +5682,5 @@ static void sparc_tr_tb_stop(DisasContextBase *dcbase, CPUState 
> *cs)
> 
>           save_npc(dc);
> -        switch (dc->npc) {
> +        switch (dc->npc & 3) {
>           case DYNAMIC_PC_LOOKUP:
>               if (may_lookup) {
> ---
> 
> ?

Unfortunately that doesn't fix the issue. A quick lunchtime debugging session with 
printf() shows this just before the assert() fires:

### dc->pc: 0x3
### dc->npc: 0xffd26c70
**
ERROR:../target/sparc/translate.c:5699:sparc_tr_tb_stop: code should not be reached
Bail out! ERROR:../target/sparc/translate.c:5699:sparc_tr_tb_stop: code should not be 
reached
Aborted


ATB,

Mark.



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 0/8] target/sparc: Use tcg_gen_lookup_and_goto_ptr
  2023-06-27 11:19       ` Mark Cave-Ayland
@ 2023-06-27 11:57         ` Richard Henderson
  0 siblings, 0 replies; 24+ messages in thread
From: Richard Henderson @ 2023-06-27 11:57 UTC (permalink / raw)
  To: Mark Cave-Ayland, Philippe Mathieu-Daudé, qemu-devel

On 6/27/23 13:19, Mark Cave-Ayland wrote:
> On 27/06/2023 10:37, Philippe Mathieu-Daudé wrote:
> 
>> On 27/6/23 08:46, Mark Cave-Ayland wrote:
>>> On 22/06/2023 13:26, Mark Cave-Ayland wrote:
>>>
>>>> On 21/06/2023 19:05, Richard Henderson wrote:
>>>>
>>>>> Changes from v1:
>>>>>    * Split into teeny weeny pieces.
>>>>>
>>>>>    * It turns out the sparc_tr_tb_stop hunk of v1 was buggy,
>>>>>      in that things that are not simple branches use DYNAMIC_PC,
>>>>>      e.g. the RETT (return from trap) instruction.
>>>>>
>>>>>      Introduce DYNAMIC_PC_LOOKUP to distinguish the couple of
>>>>>      places where we have a dynamic pc, but no other change
>>>>>      of state (conditional branches, JMPL, RETURN).
>>>>>
>>>>>    * Drop the change for WRFPRS, because it's too infrequent.
>>>>>      The WRASI change affects memcpy/memset, so that's more important.
>>>>>
>>>>> Boots Mark's sol8 install cdrom.  :-)
>>>>>
>>>>> Top of the profile changes from
>>>>>
>>>>>      41.55%  qemu-system-sparc              [.] cpu_exec_loop
>>>>>      14.02%  qemu-system-sparc              [.] cpu_tb_exec
>>>>>       8.74%  qemu-system-sparc              [.] tb_lookup
>>>>>       2.11%  qemu-system-sparc              [.] tcg_splitwx_to_rw
>>>>>       1.63%  memfd:tcg-jit (deleted)        [.] 0x0000000000000004
>>>>>
>>>>> to
>>>>>
>>>>>      31.59%  qemu-system-sparc              [.] helper_lookup_tb_ptr
>>>>>      17.79%  qemu-system-sparc              [.] tb_lookup
>>>>>       5.38%  qemu-system-sparc              [.] compute_all_sub
>>>>>       2.38%  qemu-system-sparc              [.] helper_compute_psr
>>>>>       2.36%  qemu-system-sparc              [.] helper_check_align
>>>>>       1.79%  memfd:tcg-jit (deleted)        [.] 0x000000000063fc8e
>>>>>
>>>>> This probably indicates that cpu_get_tb_cpu_state could be
>>>>> improved to not consume so much overhead.
>>>>
>>>> Nice! I've just run this through all of my sun4m/sun4u/sun4v test images and I don't 
>>>> see any regressions with v2. The guests feel noticeably more responsive too :)
>>>>
>>>> Tested-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
>>>>
>>>> I've skimmed the patches and without looking in too much detail they seem to be okay 
>>>> so I'm happy to give:
>>>>
>>>> Acked-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
>>>>
>>>> Side note: the niagara tests require the patch at 
>>>> https://lists.gnu.org/archive/html/qemu-devel/2023-05/msg03537.html which still hasn't 
>>>> been merged yet.
>>>>
>>>>> Richard Henderson (8):
>>>>>    target/sparc: Use tcg_gen_lookup_and_goto_ptr in gen_goto_tb
>>>>>    target/sparc: Fix npc comparison in sparc_tr_insn_start
>>>>>    target/sparc: Drop inline markers from translate.c
>>>>>    target/sparc: Introduce DYNAMIC_PC_LOOKUP
>>>>>    target/sparc: Use DYNAMIC_PC_LOOKUP for conditional branches
>>>>>    target/sparc: Use DYNAMIC_PC_LOOKUP for JMPL
>>>>>    target/sparc: Use DYNAMIC_PC_LOOKUP for v9 RETURN
>>>>>    target/sparc: Use tcg_gen_lookup_and_goto_ptr for v9 WRASI
>>>>>
>>>>>   target/sparc/translate.c | 410 ++++++++++++++++++++++-----------------
>>>>>   1 file changed, 233 insertions(+), 177 deletions(-)
>>>
>>> I've just noticed during testing there is an issue with this series when used with a 
>>> real SS-5 PROM image (I was using OpenBIOS for my previous tests) which causes it to 
>>> assert() almost immediately on startup:
>>>
>>> $ ./qemu-system-sparc -bios ss5.bin
>>> ERROR:../target/sparc/translate.c:5695:sparc_tr_tb_stop: code should not be reached
>>> Bail out! ERROR:../target/sparc/translate.c:5695:sparc_tr_tb_stop: code should not be 
>>> reached
>>> Aborted
>>
>> Could you try this fix:
>>
>> -- >8 --
>> --- a/target/sparc/translate.c
>> +++ b/target/sparc/translate.c
>> @@ -5682,5 +5682,5 @@ static void sparc_tr_tb_stop(DisasContextBase *dcbase, CPUState *cs)
>>
>>           save_npc(dc);
>> -        switch (dc->npc) {
>> +        switch (dc->npc & 3) {
>>           case DYNAMIC_PC_LOOKUP:
>>               if (may_lookup) {
>> ---
>>
>> ?
> 
> Unfortunately that doesn't fix the issue. A quick lunchtime debugging session with 
> printf() shows this just before the assert() fires:
> 
> ### dc->pc: 0x3
> ### dc->npc: 0xffd26c70
> **
> ERROR:../target/sparc/translate.c:5699:sparc_tr_tb_stop: code should not be reached
> Bail out! ERROR:../target/sparc/translate.c:5699:sparc_tr_tb_stop: code should not be reached
> Aborted

That makes no sense -- dynamic lookup pc with static npc?

Of course this happens before in_asm dump, so can you use -singlestep to figure out what 
pc, and thence instruction for which this is happening?


r~



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 3/8] target/sparc: Drop inline markers from translate.c
  2023-06-27  8:58   ` Philippe Mathieu-Daudé
@ 2023-06-28  7:05     ` Richard Henderson
  0 siblings, 0 replies; 24+ messages in thread
From: Richard Henderson @ 2023-06-28  7:05 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé, qemu-devel; +Cc: mark.cave-ayland

On 6/27/23 10:58, Philippe Mathieu-Daudé wrote:
> On 21/6/23 20:06, Richard Henderson wrote:
>> Let the compiler decide about inlining.
> 
> Should we clean that automatically with scripts and forbid via
> checkpatch?

I'm sure there are some places that benefit from inline within C files.
And of course header files need them for unused function elimination.


r~



^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2023-06-28  7:06 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-06-21 18:05 [PATCH v2 0/8] target/sparc: Use tcg_gen_lookup_and_goto_ptr Richard Henderson
2023-06-21 18:06 ` [PATCH v2 1/8] target/sparc: Use tcg_gen_lookup_and_goto_ptr in gen_goto_tb Richard Henderson
2023-06-22  9:03   ` Philippe Mathieu-Daudé
2023-06-21 18:06 ` [PATCH v2 2/8] target/sparc: Fix npc comparison in sparc_tr_insn_start Richard Henderson
2023-06-27  9:11   ` Philippe Mathieu-Daudé
2023-06-21 18:06 ` [PATCH v2 3/8] target/sparc: Drop inline markers from translate.c Richard Henderson
2023-06-27  8:58   ` Philippe Mathieu-Daudé
2023-06-28  7:05     ` Richard Henderson
2023-06-21 18:06 ` [PATCH v2 4/8] target/sparc: Introduce DYNAMIC_PC_LOOKUP Richard Henderson
2023-06-27  9:08   ` Philippe Mathieu-Daudé
2023-06-21 18:06 ` [PATCH v2 5/8] target/sparc: Use DYNAMIC_PC_LOOKUP for conditional branches Richard Henderson
2023-06-27  9:09   ` Philippe Mathieu-Daudé
2023-06-21 18:06 ` [PATCH v2 6/8] target/sparc: Use DYNAMIC_PC_LOOKUP for JMPL Richard Henderson
2023-06-27  9:12   ` Philippe Mathieu-Daudé
2023-06-21 18:06 ` [PATCH v2 7/8] target/sparc: Use DYNAMIC_PC_LOOKUP for v9 RETURN Richard Henderson
2023-06-27  9:10   ` Philippe Mathieu-Daudé
2023-06-21 18:06 ` [PATCH v2 8/8] target/sparc: Use tcg_gen_lookup_and_goto_ptr for v9 WRASI Richard Henderson
2023-06-22  9:04   ` Philippe Mathieu-Daudé
2023-06-22 12:26 ` [PATCH v2 0/8] target/sparc: Use tcg_gen_lookup_and_goto_ptr Mark Cave-Ayland
2023-06-27  6:46   ` Mark Cave-Ayland
2023-06-27  9:37     ` Philippe Mathieu-Daudé
2023-06-27 10:12       ` Richard Henderson
2023-06-27 11:19       ` Mark Cave-Ayland
2023-06-27 11:57         ` Richard Henderson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).