qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/6] accel/tcg: Always require can_do_io (#1866)
@ 2023-09-14 17:44 Richard Henderson
  2023-09-14 17:44 ` [PATCH 1/6] accel/tcg: Avoid load of icount_decr if unused Richard Henderson
                   ` (5 more replies)
  0 siblings, 6 replies; 19+ messages in thread
From: Richard Henderson @ 2023-09-14 17:44 UTC (permalink / raw)
  To: qemu-devel; +Cc: philmd

The problem exposed by the fix for #1826 (et al) is that the TB that
contains the i/o instruction that alters the address space continues
on to issue other i/o instructions.

Since #1826 deferred the update to the address space, these subsequent
i/o instructions do not reference the correct address space, and things
go awry from there.

Ideally we would treat changes to the address space as specially, but
that knowledge is buried quite far down in the device models.  We don't
find out that such a change is coming until quite late such that we
cannot undo all of the other side effects and start over.

The only alternative would seem to be to treat all i/o pesimistically
and end the TB after any i/o, exactly as we do for icount.

I'm not pleased about this, because an eyeball of avocado times
suggests a slowdown.  No doubt caused by most i/o having to go
through cpu_io_recompile.

I begin to wonder if #1826 should be solved differently, like *not*
caching MemoryRegionSections within the cpu tlb, and looking up the
physical address within the address space during the i/o itself.
But even that seems like it would work only for the more common case
where the address space reorg only changes devices.  For the odd case
where an address space reorg changes RAM, we still have the result
of the phys addr lookup cached via the adjustment to host memory.

So this seems like the most reasonable solution.

Follow-up patches could optimize setting of can_do_io, and replace
previous TranslationBlocks so that we only go through cpu_io_recompile
once for a particular bit of guest code, rather than every single time.


r~


Richard Henderson (6):
  accel/tcg: Avoid load of icount_decr if unused
  accel/tcg: Hoist CF_MEMI_ONLY check outside translation loop
  accel/tcg: Track current value of can_do_io in the TB
  accel/tcg: Improve setting of can_do_io at start of TB
  accel/tcg: Always set CF_LAST_IO with CF_NOIRQ
  accel/tcg: Always require can_do_io

 include/exec/translator.h   |  2 ++
 accel/tcg/cpu-exec.c        |  2 +-
 accel/tcg/tb-maint.c        |  6 ++--
 accel/tcg/translator.c      | 72 ++++++++++++++++++-------------------
 target/mips/tcg/translate.c |  1 -
 5 files changed, 41 insertions(+), 42 deletions(-)

-- 
2.34.1



^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 1/6] accel/tcg: Avoid load of icount_decr if unused
  2023-09-14 17:44 [PATCH 0/6] accel/tcg: Always require can_do_io (#1866) Richard Henderson
@ 2023-09-14 17:44 ` Richard Henderson
  2023-09-15  9:23   ` Philippe Mathieu-Daudé
  2023-09-14 17:44 ` [PATCH 2/6] accel/tcg: Hoist CF_MEMI_ONLY check outside translation loop Richard Henderson
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 19+ messages in thread
From: Richard Henderson @ 2023-09-14 17:44 UTC (permalink / raw)
  To: qemu-devel; +Cc: philmd

With CF_NOIRQ and without !CF_USE_ICOUNT, the load isn't used.
Avoid emitting it.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 accel/tcg/translator.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/accel/tcg/translator.c b/accel/tcg/translator.c
index 1a6a5448c8..a3983019a5 100644
--- a/accel/tcg/translator.c
+++ b/accel/tcg/translator.c
@@ -49,12 +49,15 @@ bool translator_io_start(DisasContextBase *db)
 
 static TCGOp *gen_tb_start(uint32_t cflags)
 {
-    TCGv_i32 count = tcg_temp_new_i32();
+    TCGv_i32 count = NULL;
     TCGOp *icount_start_insn = NULL;
 
-    tcg_gen_ld_i32(count, cpu_env,
-                   offsetof(ArchCPU, neg.icount_decr.u32) -
-                   offsetof(ArchCPU, env));
+    if ((cflags & CF_USE_ICOUNT) || !(cflags & CF_NOIRQ)) {
+        count = tcg_temp_new_i32();
+        tcg_gen_ld_i32(count, cpu_env,
+                       offsetof(ArchCPU, neg.icount_decr.u32) -
+                       offsetof(ArchCPU, env));
+    }
 
     if (cflags & CF_USE_ICOUNT) {
         /*
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 2/6] accel/tcg: Hoist CF_MEMI_ONLY check outside translation loop
  2023-09-14 17:44 [PATCH 0/6] accel/tcg: Always require can_do_io (#1866) Richard Henderson
  2023-09-14 17:44 ` [PATCH 1/6] accel/tcg: Avoid load of icount_decr if unused Richard Henderson
@ 2023-09-14 17:44 ` Richard Henderson
  2023-09-15  9:26   ` Philippe Mathieu-Daudé
  2023-09-14 17:44 ` [PATCH 3/6] accel/tcg: Track current value of can_do_io in the TB Richard Henderson
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 19+ messages in thread
From: Richard Henderson @ 2023-09-14 17:44 UTC (permalink / raw)
  To: qemu-devel; +Cc: philmd

The condition checked is loop invariant; check it only once.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 accel/tcg/translator.c | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/accel/tcg/translator.c b/accel/tcg/translator.c
index a3983019a5..b6ab9f3d33 100644
--- a/accel/tcg/translator.c
+++ b/accel/tcg/translator.c
@@ -158,7 +158,13 @@ void translator_loop(CPUState *cpu, TranslationBlock *tb, int *max_insns,
     ops->tb_start(db, cpu);
     tcg_debug_assert(db->is_jmp == DISAS_NEXT);  /* no early exit */
 
-    plugin_enabled = plugin_gen_tb_start(cpu, db, cflags & CF_MEMI_ONLY);
+    if (cflags & CF_MEMI_ONLY) {
+        /* We should only see CF_MEMI_ONLY for io_recompile. */
+        assert(cflags & CF_LAST_IO);
+        plugin_enabled = plugin_gen_tb_start(cpu, db, true);
+    } else {
+        plugin_enabled = plugin_gen_tb_start(cpu, db, false);
+    }
 
     while (true) {
         *max_insns = ++db->num_insns;
@@ -176,12 +182,8 @@ void translator_loop(CPUState *cpu, TranslationBlock *tb, int *max_insns,
         if (db->num_insns == db->max_insns && (cflags & CF_LAST_IO)) {
             /* Accept I/O on the last instruction.  */
             gen_io_start();
-            ops->translate_insn(db, cpu);
-        } else {
-            /* we should only see CF_MEMI_ONLY for io_recompile */
-            tcg_debug_assert(!(cflags & CF_MEMI_ONLY));
-            ops->translate_insn(db, cpu);
         }
+        ops->translate_insn(db, cpu);
 
         /*
          * We can't instrument after instructions that change control
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 3/6] accel/tcg: Track current value of can_do_io in the TB
  2023-09-14 17:44 [PATCH 0/6] accel/tcg: Always require can_do_io (#1866) Richard Henderson
  2023-09-14 17:44 ` [PATCH 1/6] accel/tcg: Avoid load of icount_decr if unused Richard Henderson
  2023-09-14 17:44 ` [PATCH 2/6] accel/tcg: Hoist CF_MEMI_ONLY check outside translation loop Richard Henderson
@ 2023-09-14 17:44 ` Richard Henderson
  2023-09-15  9:41   ` [PATCH 3/6: 1/2] accel/tcg: Refactor gen_io_start() as set_can_do_io() Philippe Mathieu-Daudé
  2023-09-14 17:44 ` [PATCH 4/6] accel/tcg: Improve setting of can_do_io at start of TB Richard Henderson
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 19+ messages in thread
From: Richard Henderson @ 2023-09-14 17:44 UTC (permalink / raw)
  To: qemu-devel; +Cc: philmd

Simplify translator_io_start by recording the current
known value of can_do_io within DisasContextBase.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/exec/translator.h |  2 ++
 accel/tcg/translator.c    | 31 ++++++++++++++-----------------
 2 files changed, 16 insertions(+), 17 deletions(-)

diff --git a/include/exec/translator.h b/include/exec/translator.h
index 4e17c4f401..9d9e980819 100644
--- a/include/exec/translator.h
+++ b/include/exec/translator.h
@@ -72,6 +72,7 @@ typedef enum DisasJumpType {
  * @num_insns: Number of translated instructions (including current).
  * @max_insns: Maximum number of instructions to be translated in this TB.
  * @singlestep_enabled: "Hardware" single stepping enabled.
+ * @saved_can_do_io: Known value of cpu->neg.can_do_io, or -1 for unknown.
  *
  * Architecture-agnostic disassembly context.
  */
@@ -83,6 +84,7 @@ typedef struct DisasContextBase {
     int num_insns;
     int max_insns;
     bool singlestep_enabled;
+    int8_t saved_can_do_io;
     void *host_addr[2];
 } DisasContextBase;
 
diff --git a/accel/tcg/translator.c b/accel/tcg/translator.c
index b6ab9f3d33..850d82e26f 100644
--- a/accel/tcg/translator.c
+++ b/accel/tcg/translator.c
@@ -16,11 +16,14 @@
 #include "tcg/tcg-op-common.h"
 #include "internal.h"
 
-static void gen_io_start(void)
+static void set_can_do_io(DisasContextBase *db, bool val)
 {
-    tcg_gen_st_i32(tcg_constant_i32(1), cpu_env,
-                   offsetof(ArchCPU, parent_obj.can_do_io) -
-                   offsetof(ArchCPU, env));
+    if (db->saved_can_do_io != val) {
+        db->saved_can_do_io = val;
+        tcg_gen_st_i32(tcg_constant_i32(val), cpu_env,
+                       offsetof(ArchCPU, parent_obj.can_do_io) -
+                       offsetof(ArchCPU, env));
+    }
 }
 
 bool translator_io_start(DisasContextBase *db)
@@ -30,12 +33,8 @@ bool translator_io_start(DisasContextBase *db)
     if (!(cflags & CF_USE_ICOUNT)) {
         return false;
     }
-    if (db->num_insns == db->max_insns && (cflags & CF_LAST_IO)) {
-        /* Already started in translator_loop. */
-        return true;
-    }
 
-    gen_io_start();
+    set_can_do_io(db, true);
 
     /*
      * Ensure that this instruction will be the last in the TB.
@@ -47,7 +46,7 @@ bool translator_io_start(DisasContextBase *db)
     return true;
 }
 
-static TCGOp *gen_tb_start(uint32_t cflags)
+static TCGOp *gen_tb_start(DisasContextBase *db, uint32_t cflags)
 {
     TCGv_i32 count = NULL;
     TCGOp *icount_start_insn = NULL;
@@ -91,12 +90,9 @@ static TCGOp *gen_tb_start(uint32_t cflags)
          * cpu->can_do_io is cleared automatically here at the beginning of
          * each translation block.  The cost is minimal and only paid for
          * -icount, plus it would be very easy to forget doing it in the
-         * translator. Doing it here means we don't need a gen_io_end() to
-         * go with gen_io_start().
+         * translator.
          */
-        tcg_gen_st_i32(tcg_constant_i32(0), cpu_env,
-                       offsetof(ArchCPU, parent_obj.can_do_io) -
-                       offsetof(ArchCPU, env));
+        set_can_do_io(db, false);
     }
 
     return icount_start_insn;
@@ -147,6 +143,7 @@ void translator_loop(CPUState *cpu, TranslationBlock *tb, int *max_insns,
     db->num_insns = 0;
     db->max_insns = *max_insns;
     db->singlestep_enabled = cflags & CF_SINGLE_STEP;
+    db->saved_can_do_io = -1;
     db->host_addr[0] = host_pc;
     db->host_addr[1] = NULL;
 
@@ -154,7 +151,7 @@ void translator_loop(CPUState *cpu, TranslationBlock *tb, int *max_insns,
     tcg_debug_assert(db->is_jmp == DISAS_NEXT);  /* no early exit */
 
     /* Start translating.  */
-    icount_start_insn = gen_tb_start(cflags);
+    icount_start_insn = gen_tb_start(db, cflags);
     ops->tb_start(db, cpu);
     tcg_debug_assert(db->is_jmp == DISAS_NEXT);  /* no early exit */
 
@@ -181,7 +178,7 @@ void translator_loop(CPUState *cpu, TranslationBlock *tb, int *max_insns,
            the next instruction.  */
         if (db->num_insns == db->max_insns && (cflags & CF_LAST_IO)) {
             /* Accept I/O on the last instruction.  */
-            gen_io_start();
+            set_can_do_io(db, true);
         }
         ops->translate_insn(db, cpu);
 
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 4/6] accel/tcg: Improve setting of can_do_io at start of TB
  2023-09-14 17:44 [PATCH 0/6] accel/tcg: Always require can_do_io (#1866) Richard Henderson
                   ` (2 preceding siblings ...)
  2023-09-14 17:44 ` [PATCH 3/6] accel/tcg: Track current value of can_do_io in the TB Richard Henderson
@ 2023-09-14 17:44 ` Richard Henderson
  2023-09-15  9:47   ` Philippe Mathieu-Daudé
  2023-09-14 17:44 ` [PATCH 5/6] accel/tcg: Always set CF_LAST_IO with CF_NOIRQ Richard Henderson
  2023-09-14 17:44 ` [PATCH 6/6] accel/tcg: Always require can_do_io Richard Henderson
  5 siblings, 1 reply; 19+ messages in thread
From: Richard Henderson @ 2023-09-14 17:44 UTC (permalink / raw)
  To: qemu-devel; +Cc: philmd

Initialize can_do_io to true if this the TB has CF_LAST_IO
and will consist of a single instruction.  This avoids a
set to 0 followed immediately by a set to 1.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 accel/tcg/translator.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/accel/tcg/translator.c b/accel/tcg/translator.c
index 850d82e26f..dd507cd471 100644
--- a/accel/tcg/translator.c
+++ b/accel/tcg/translator.c
@@ -87,12 +87,12 @@ static TCGOp *gen_tb_start(DisasContextBase *db, uint32_t cflags)
                          offsetof(ArchCPU, neg.icount_decr.u16.low) -
                          offsetof(ArchCPU, env));
         /*
-         * cpu->can_do_io is cleared automatically here at the beginning of
+         * cpu->can_do_io is set automatically here at the beginning of
          * each translation block.  The cost is minimal and only paid for
          * -icount, plus it would be very easy to forget doing it in the
          * translator.
          */
-        set_can_do_io(db, false);
+        set_can_do_io(db, db->max_insns == 1 && (cflags & CF_LAST_IO));
     }
 
     return icount_start_insn;
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 5/6] accel/tcg: Always set CF_LAST_IO with CF_NOIRQ
  2023-09-14 17:44 [PATCH 0/6] accel/tcg: Always require can_do_io (#1866) Richard Henderson
                   ` (3 preceding siblings ...)
  2023-09-14 17:44 ` [PATCH 4/6] accel/tcg: Improve setting of can_do_io at start of TB Richard Henderson
@ 2023-09-14 17:44 ` Richard Henderson
  2023-09-19  9:26   ` Philippe Mathieu-Daudé
  2023-09-14 17:44 ` [PATCH 6/6] accel/tcg: Always require can_do_io Richard Henderson
  5 siblings, 1 reply; 19+ messages in thread
From: Richard Henderson @ 2023-09-14 17:44 UTC (permalink / raw)
  To: qemu-devel; +Cc: philmd

Without this we can get see loops through cpu_io_recompile,
in which the cpu makes no progress.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 accel/tcg/cpu-exec.c | 2 +-
 accel/tcg/tb-maint.c | 6 ++++--
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
index e2c494e75e..c724e8b6f1 100644
--- a/accel/tcg/cpu-exec.c
+++ b/accel/tcg/cpu-exec.c
@@ -720,7 +720,7 @@ static inline bool cpu_handle_exception(CPUState *cpu, int *ret)
             && cpu_neg(cpu)->icount_decr.u16.low + cpu->icount_extra == 0) {
             /* Execute just one insn to trigger exception pending in the log */
             cpu->cflags_next_tb = (curr_cflags(cpu) & ~CF_USE_ICOUNT)
-                | CF_NOIRQ | 1;
+                | CF_LAST_IO | CF_NOIRQ | 1;
         }
 #endif
         return false;
diff --git a/accel/tcg/tb-maint.c b/accel/tcg/tb-maint.c
index 32ae8af61c..0c3e227409 100644
--- a/accel/tcg/tb-maint.c
+++ b/accel/tcg/tb-maint.c
@@ -1083,7 +1083,8 @@ bool tb_invalidate_phys_page_unwind(tb_page_addr_t addr, uintptr_t pc)
     if (current_tb_modified) {
         /* Force execution of one insn next time.  */
         CPUState *cpu = current_cpu;
-        cpu->cflags_next_tb = 1 | CF_NOIRQ | curr_cflags(current_cpu);
+        cpu->cflags_next_tb =
+            1 | CF_LAST_IO | CF_NOIRQ | curr_cflags(current_cpu);
         return true;
     }
     return false;
@@ -1153,7 +1154,8 @@ tb_invalidate_phys_page_range__locked(struct page_collection *pages,
     if (current_tb_modified) {
         page_collection_unlock(pages);
         /* Force execution of one insn next time.  */
-        current_cpu->cflags_next_tb = 1 | CF_NOIRQ | curr_cflags(current_cpu);
+        current_cpu->cflags_next_tb =
+            1 | CF_LAST_IO | CF_NOIRQ | curr_cflags(current_cpu);
         mmap_unlock();
         cpu_loop_exit_noexc(current_cpu);
     }
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 6/6] accel/tcg: Always require can_do_io
  2023-09-14 17:44 [PATCH 0/6] accel/tcg: Always require can_do_io (#1866) Richard Henderson
                   ` (4 preceding siblings ...)
  2023-09-14 17:44 ` [PATCH 5/6] accel/tcg: Always set CF_LAST_IO with CF_NOIRQ Richard Henderson
@ 2023-09-14 17:44 ` Richard Henderson
  2023-09-19  9:37   ` Philippe Mathieu-Daudé
  2023-10-24  9:50   ` Clément Chigot
  5 siblings, 2 replies; 19+ messages in thread
From: Richard Henderson @ 2023-09-14 17:44 UTC (permalink / raw)
  To: qemu-devel; +Cc: philmd

Require i/o as the last insn of a TranslationBlock always,
not only with icount.  This is required for i/o that alters
the address space, such as a pci config space write.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1866
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 accel/tcg/translator.c      | 20 +++++++-------------
 target/mips/tcg/translate.c |  1 -
 2 files changed, 7 insertions(+), 14 deletions(-)

diff --git a/accel/tcg/translator.c b/accel/tcg/translator.c
index dd507cd471..358214d526 100644
--- a/accel/tcg/translator.c
+++ b/accel/tcg/translator.c
@@ -28,12 +28,6 @@ static void set_can_do_io(DisasContextBase *db, bool val)
 
 bool translator_io_start(DisasContextBase *db)
 {
-    uint32_t cflags = tb_cflags(db->tb);
-
-    if (!(cflags & CF_USE_ICOUNT)) {
-        return false;
-    }
-
     set_can_do_io(db, true);
 
     /*
@@ -86,15 +80,15 @@ static TCGOp *gen_tb_start(DisasContextBase *db, uint32_t cflags)
         tcg_gen_st16_i32(count, cpu_env,
                          offsetof(ArchCPU, neg.icount_decr.u16.low) -
                          offsetof(ArchCPU, env));
-        /*
-         * cpu->can_do_io is set automatically here at the beginning of
-         * each translation block.  The cost is minimal and only paid for
-         * -icount, plus it would be very easy to forget doing it in the
-         * translator.
-         */
-        set_can_do_io(db, db->max_insns == 1 && (cflags & CF_LAST_IO));
     }
 
+    /*
+     * cpu->can_do_io is set automatically here at the beginning of
+     * each translation block.  The cost is minimal, plus it would be
+     * very easy to forget doing it in the translator.
+     */
+    set_can_do_io(db, db->max_insns == 1 && (cflags & CF_LAST_IO));
+
     return icount_start_insn;
 }
 
diff --git a/target/mips/tcg/translate.c b/target/mips/tcg/translate.c
index 9bb40f1849..593fc80458 100644
--- a/target/mips/tcg/translate.c
+++ b/target/mips/tcg/translate.c
@@ -11212,7 +11212,6 @@ static void gen_branch(DisasContext *ctx, int insn_bytes)
         /* Branches completion */
         clear_branch_hflags(ctx);
         ctx->base.is_jmp = DISAS_NORETURN;
-        /* FIXME: Need to clear can_do_io.  */
         switch (proc_hflags & MIPS_HFLAG_BMASK_BASE) {
         case MIPS_HFLAG_FBNSLOT:
             gen_goto_tb(ctx, 0, ctx->base.pc_next + insn_bytes);
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/6] accel/tcg: Avoid load of icount_decr if unused
  2023-09-14 17:44 ` [PATCH 1/6] accel/tcg: Avoid load of icount_decr if unused Richard Henderson
@ 2023-09-15  9:23   ` Philippe Mathieu-Daudé
  0 siblings, 0 replies; 19+ messages in thread
From: Philippe Mathieu-Daudé @ 2023-09-15  9:23 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel

On 14/9/23 19:44, Richard Henderson wrote:
> With CF_NOIRQ and without !CF_USE_ICOUNT, the load isn't used.
> Avoid emitting it.
> 
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>   accel/tcg/translator.c | 11 +++++++----
>   1 file changed, 7 insertions(+), 4 deletions(-)

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/6] accel/tcg: Hoist CF_MEMI_ONLY check outside translation loop
  2023-09-14 17:44 ` [PATCH 2/6] accel/tcg: Hoist CF_MEMI_ONLY check outside translation loop Richard Henderson
@ 2023-09-15  9:26   ` Philippe Mathieu-Daudé
  0 siblings, 0 replies; 19+ messages in thread
From: Philippe Mathieu-Daudé @ 2023-09-15  9:26 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel

On 14/9/23 19:44, Richard Henderson wrote:
> The condition checked is loop invariant; check it only once.
> 
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>   accel/tcg/translator.c | 14 ++++++++------
>   1 file changed, 8 insertions(+), 6 deletions(-)

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>



^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 3/6: 1/2] accel/tcg: Refactor gen_io_start() as set_can_do_io()
  2023-09-14 17:44 ` [PATCH 3/6] accel/tcg: Track current value of can_do_io in the TB Richard Henderson
@ 2023-09-15  9:41   ` Philippe Mathieu-Daudé
  2023-09-15  9:41     ` [PATCH 3/6: 2/2] accel/tcg: Track current value of can_do_io in the TB Philippe Mathieu-Daudé
  2023-09-15  9:42     ` [PATCH 3/6: 1/2] accel/tcg: Refactor gen_io_start() as set_can_do_io() Philippe Mathieu-Daudé
  0 siblings, 2 replies; 19+ messages in thread
From: Philippe Mathieu-Daudé @ 2023-09-15  9:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: Paolo Bonzini, Richard Henderson, Philippe Mathieu-Daudé

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-ID: <20230914174436.1597356-4-richard.henderson@linaro.org>
[PMD: Split patch in 2, extracting set_can_do_io() first]
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
---
 accel/tcg/translator.c | 19 ++++++++-----------
 1 file changed, 8 insertions(+), 11 deletions(-)

diff --git a/accel/tcg/translator.c b/accel/tcg/translator.c
index b6ab9f3d33..cebf0a84cc 100644
--- a/accel/tcg/translator.c
+++ b/accel/tcg/translator.c
@@ -16,9 +16,9 @@
 #include "tcg/tcg-op-common.h"
 #include "internal.h"
 
-static void gen_io_start(void)
+static void set_can_do_io(DisasContextBase *db, bool val)
 {
-    tcg_gen_st_i32(tcg_constant_i32(1), cpu_env,
+    tcg_gen_st_i32(tcg_constant_i32(val), cpu_env,
                    offsetof(ArchCPU, parent_obj.can_do_io) -
                    offsetof(ArchCPU, env));
 }
@@ -35,7 +35,7 @@ bool translator_io_start(DisasContextBase *db)
         return true;
     }
 
-    gen_io_start();
+    set_can_do_io(db, true);
 
     /*
      * Ensure that this instruction will be the last in the TB.
@@ -47,7 +47,7 @@ bool translator_io_start(DisasContextBase *db)
     return true;
 }
 
-static TCGOp *gen_tb_start(uint32_t cflags)
+static TCGOp *gen_tb_start(DisasContextBase *db, uint32_t cflags)
 {
     TCGv_i32 count = NULL;
     TCGOp *icount_start_insn = NULL;
@@ -91,12 +91,9 @@ static TCGOp *gen_tb_start(uint32_t cflags)
          * cpu->can_do_io is cleared automatically here at the beginning of
          * each translation block.  The cost is minimal and only paid for
          * -icount, plus it would be very easy to forget doing it in the
-         * translator. Doing it here means we don't need a gen_io_end() to
-         * go with gen_io_start().
+         * translator.
          */
-        tcg_gen_st_i32(tcg_constant_i32(0), cpu_env,
-                       offsetof(ArchCPU, parent_obj.can_do_io) -
-                       offsetof(ArchCPU, env));
+        set_can_do_io(db, false);
     }
 
     return icount_start_insn;
@@ -154,7 +151,7 @@ void translator_loop(CPUState *cpu, TranslationBlock *tb, int *max_insns,
     tcg_debug_assert(db->is_jmp == DISAS_NEXT);  /* no early exit */
 
     /* Start translating.  */
-    icount_start_insn = gen_tb_start(cflags);
+    icount_start_insn = gen_tb_start(db, cflags);
     ops->tb_start(db, cpu);
     tcg_debug_assert(db->is_jmp == DISAS_NEXT);  /* no early exit */
 
@@ -181,7 +178,7 @@ void translator_loop(CPUState *cpu, TranslationBlock *tb, int *max_insns,
            the next instruction.  */
         if (db->num_insns == db->max_insns && (cflags & CF_LAST_IO)) {
             /* Accept I/O on the last instruction.  */
-            gen_io_start();
+            set_can_do_io(db, true);
         }
         ops->translate_insn(db, cpu);
 
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 3/6: 2/2] accel/tcg: Track current value of can_do_io in the TB
  2023-09-15  9:41   ` [PATCH 3/6: 1/2] accel/tcg: Refactor gen_io_start() as set_can_do_io() Philippe Mathieu-Daudé
@ 2023-09-15  9:41     ` Philippe Mathieu-Daudé
  2023-09-15  9:44       ` Philippe Mathieu-Daudé
  2023-09-15  9:42     ` [PATCH 3/6: 1/2] accel/tcg: Refactor gen_io_start() as set_can_do_io() Philippe Mathieu-Daudé
  1 sibling, 1 reply; 19+ messages in thread
From: Philippe Mathieu-Daudé @ 2023-09-15  9:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: Paolo Bonzini, Richard Henderson, Philippe Mathieu-Daudé

From: Richard Henderson <richard.henderson@linaro.org>

Simplify translator_io_start by recording the current
known value of can_do_io within DisasContextBase.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-ID: <20230914174436.1597356-4-richard.henderson@linaro.org>
[PMD: Split patch in 2, extracting set_can_do_io() first]
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
---
 include/exec/translator.h |  2 ++
 accel/tcg/translator.c    | 14 +++++++-------
 2 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/include/exec/translator.h b/include/exec/translator.h
index 4e17c4f401..9d9e980819 100644
--- a/include/exec/translator.h
+++ b/include/exec/translator.h
@@ -72,6 +72,7 @@ typedef enum DisasJumpType {
  * @num_insns: Number of translated instructions (including current).
  * @max_insns: Maximum number of instructions to be translated in this TB.
  * @singlestep_enabled: "Hardware" single stepping enabled.
+ * @saved_can_do_io: Known value of cpu->neg.can_do_io, or -1 for unknown.
  *
  * Architecture-agnostic disassembly context.
  */
@@ -83,6 +84,7 @@ typedef struct DisasContextBase {
     int num_insns;
     int max_insns;
     bool singlestep_enabled;
+    int8_t saved_can_do_io;
     void *host_addr[2];
 } DisasContextBase;
 
diff --git a/accel/tcg/translator.c b/accel/tcg/translator.c
index cebf0a84cc..850d82e26f 100644
--- a/accel/tcg/translator.c
+++ b/accel/tcg/translator.c
@@ -18,9 +18,12 @@
 
 static void set_can_do_io(DisasContextBase *db, bool val)
 {
-    tcg_gen_st_i32(tcg_constant_i32(val), cpu_env,
-                   offsetof(ArchCPU, parent_obj.can_do_io) -
-                   offsetof(ArchCPU, env));
+    if (db->saved_can_do_io != val) {
+        db->saved_can_do_io = val;
+        tcg_gen_st_i32(tcg_constant_i32(val), cpu_env,
+                       offsetof(ArchCPU, parent_obj.can_do_io) -
+                       offsetof(ArchCPU, env));
+    }
 }
 
 bool translator_io_start(DisasContextBase *db)
@@ -30,10 +33,6 @@ bool translator_io_start(DisasContextBase *db)
     if (!(cflags & CF_USE_ICOUNT)) {
         return false;
     }
-    if (db->num_insns == db->max_insns && (cflags & CF_LAST_IO)) {
-        /* Already started in translator_loop. */
-        return true;
-    }
 
     set_can_do_io(db, true);
 
@@ -144,6 +143,7 @@ void translator_loop(CPUState *cpu, TranslationBlock *tb, int *max_insns,
     db->num_insns = 0;
     db->max_insns = *max_insns;
     db->singlestep_enabled = cflags & CF_SINGLE_STEP;
+    db->saved_can_do_io = -1;
     db->host_addr[0] = host_pc;
     db->host_addr[1] = NULL;
 
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH 3/6: 1/2] accel/tcg: Refactor gen_io_start() as set_can_do_io()
  2023-09-15  9:41   ` [PATCH 3/6: 1/2] accel/tcg: Refactor gen_io_start() as set_can_do_io() Philippe Mathieu-Daudé
  2023-09-15  9:41     ` [PATCH 3/6: 2/2] accel/tcg: Track current value of can_do_io in the TB Philippe Mathieu-Daudé
@ 2023-09-15  9:42     ` Philippe Mathieu-Daudé
  1 sibling, 0 replies; 19+ messages in thread
From: Philippe Mathieu-Daudé @ 2023-09-15  9:42 UTC (permalink / raw)
  To: qemu-devel; +Cc: Paolo Bonzini, Richard Henderson

On 15/9/23 11:41, Philippe Mathieu-Daudé wrote:
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> Message-ID: <20230914174436.1597356-4-richard.henderson@linaro.org>
> [PMD: Split patch in 2, extracting set_can_do_io() first]
> Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
> ---
>   accel/tcg/translator.c | 19 ++++++++-----------
>   1 file changed, 8 insertions(+), 11 deletions(-)

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 3/6: 2/2] accel/tcg: Track current value of can_do_io in the TB
  2023-09-15  9:41     ` [PATCH 3/6: 2/2] accel/tcg: Track current value of can_do_io in the TB Philippe Mathieu-Daudé
@ 2023-09-15  9:44       ` Philippe Mathieu-Daudé
  0 siblings, 0 replies; 19+ messages in thread
From: Philippe Mathieu-Daudé @ 2023-09-15  9:44 UTC (permalink / raw)
  To: qemu-devel; +Cc: Paolo Bonzini, Richard Henderson

On 15/9/23 11:41, Philippe Mathieu-Daudé wrote:
> From: Richard Henderson <richard.henderson@linaro.org>
> 
> Simplify translator_io_start by recording the current
> known value of can_do_io within DisasContextBase.
> 
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> Message-ID: <20230914174436.1597356-4-richard.henderson@linaro.org>
> [PMD: Split patch in 2, extracting set_can_do_io() first]
> Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
> ---
>   include/exec/translator.h |  2 ++
>   accel/tcg/translator.c    | 14 +++++++-------
>   2 files changed, 9 insertions(+), 7 deletions(-)

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 4/6] accel/tcg: Improve setting of can_do_io at start of TB
  2023-09-14 17:44 ` [PATCH 4/6] accel/tcg: Improve setting of can_do_io at start of TB Richard Henderson
@ 2023-09-15  9:47   ` Philippe Mathieu-Daudé
  0 siblings, 0 replies; 19+ messages in thread
From: Philippe Mathieu-Daudé @ 2023-09-15  9:47 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel

On 14/9/23 19:44, Richard Henderson wrote:
> Initialize can_do_io to true if this the TB has CF_LAST_IO
> and will consist of a single instruction.  This avoids a
> set to 0 followed immediately by a set to 1.
> 
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>   accel/tcg/translator.c | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 5/6] accel/tcg: Always set CF_LAST_IO with CF_NOIRQ
  2023-09-14 17:44 ` [PATCH 5/6] accel/tcg: Always set CF_LAST_IO with CF_NOIRQ Richard Henderson
@ 2023-09-19  9:26   ` Philippe Mathieu-Daudé
  0 siblings, 0 replies; 19+ messages in thread
From: Philippe Mathieu-Daudé @ 2023-09-19  9:26 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel

On 14/9/23 19:44, Richard Henderson wrote:
> Without this we can get see loops through cpu_io_recompile,
> in which the cpu makes no progress.
> 
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>   accel/tcg/cpu-exec.c | 2 +-
>   accel/tcg/tb-maint.c | 6 ++++--
>   2 files changed, 5 insertions(+), 3 deletions(-)

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 6/6] accel/tcg: Always require can_do_io
  2023-09-14 17:44 ` [PATCH 6/6] accel/tcg: Always require can_do_io Richard Henderson
@ 2023-09-19  9:37   ` Philippe Mathieu-Daudé
  2023-10-24  9:50   ` Clément Chigot
  1 sibling, 0 replies; 19+ messages in thread
From: Philippe Mathieu-Daudé @ 2023-09-19  9:37 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel

On 14/9/23 19:44, Richard Henderson wrote:
> Require i/o as the last insn of a TranslationBlock always,
> not only with icount.  This is required for i/o that alters
> the address space, such as a pci config space write.
> 
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1866
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>   accel/tcg/translator.c      | 20 +++++++-------------
>   target/mips/tcg/translate.c |  1 -
>   2 files changed, 7 insertions(+), 14 deletions(-)

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 6/6] accel/tcg: Always require can_do_io
  2023-09-14 17:44 ` [PATCH 6/6] accel/tcg: Always require can_do_io Richard Henderson
  2023-09-19  9:37   ` Philippe Mathieu-Daudé
@ 2023-10-24  9:50   ` Clément Chigot
  2023-10-26  0:44     ` Richard Henderson
  1 sibling, 1 reply; 19+ messages in thread
From: Clément Chigot @ 2023-10-24  9:50 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, philmd

Hi Richard,

This commit has broken some of our internal bareboard testing on
Risc-V 64. At some point in our programs, there is an AMOSWAP (=
atomic swap) instruction on I/O. But since this commit, can_do_io is
set to false triggering an infinite loop.
IIUC the doc (cf [1]), atomic operations on I/O are allowed.

I think there is a CF_LAST_IO flag missing somewhere to allow it, but
I'm not sure where this should be. Do you have any ideas ?

Sadly I cannot provide a reproducer that easily, mainly because our
microchip has a few patches not yet merged making our binaries not
running on the upstream master.
But here is a bit of the in_asm backtrace:

  | IN: system__bb__riscv_plic__initialize
  | Priv: 3; Virt: 0
  | 0x80000eb4:  1141              addi                    sp,sp,-16
  | 0x80000eb6:  0c0027b7          lui                     a5,49154
  | 0x80000eba:  e406              sd                      ra,8(sp)
  | 0x80000ebc:  00010597          auipc                   a1,16
            # 0x80010ebc
  | 0x80000ec0:  47458593          addi                    a1,a1,1140
  | 0x80000ec4:  f3ffe637          lui                     a2,-49154
  | 0x80000ec8:  01878693          addi                    a3,a5,24
  | 0x80000ecc:  00f58733          add                     a4,a1,a5
  | 0x80000ed0:  9732              add                     a4,a4,a2
  | 0x80000ed2:  4318              lw                      a4,0(a4)
  | 0x80000ed4:  2701              sext.w                  a4,a4
  | 0x80000ed6:  08e7a02f          amoswap.w               zero,a4,(a5)
  | 0x80000eda:  0791              addi                    a5,a5,4
  | 0x80000edc:  fed798e3          bne                     a5,a3,-16
            # 0x80000ecc
  |
  | ----------------
  | IN: system__bb__riscv_plic__initialize
  | Priv: 3; Virt: 0
  | 0x80000ed6:  08e7a02f          amoswap.w               zero,a4,(a5)
  |
  | ----------------
  | IN: system__bb__riscv_plic__initialize
  | Priv: 3; Virt: 0
  | 0x80000ed6:  08e7a02f          amoswap.w               zero,a4,(a5)
  | * Freeze *

a5 (= 0xc002000) above is targeting an address inside sifive_plic HW.


Note IIUC the doc (cf [1]), atomic operations on I/O are allowed.
[1] https://github.com/riscv/riscv-isa-manual/blob/main/src/a-st-ext.adoc#atomic-memory-operations

Thanks in advance.

On Thu, Sep 14, 2023 at 7:45 PM Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> Require i/o as the last insn of a TranslationBlock always,
> not only with icount.  This is required for i/o that alters
> the address space, such as a pci config space write.
>
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1866
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  accel/tcg/translator.c      | 20 +++++++-------------
>  target/mips/tcg/translate.c |  1 -
>  2 files changed, 7 insertions(+), 14 deletions(-)
>
> diff --git a/accel/tcg/translator.c b/accel/tcg/translator.c
> index dd507cd471..358214d526 100644
> --- a/accel/tcg/translator.c
> +++ b/accel/tcg/translator.c
> @@ -28,12 +28,6 @@ static void set_can_do_io(DisasContextBase *db, bool val)
>
>  bool translator_io_start(DisasContextBase *db)
>  {
> -    uint32_t cflags = tb_cflags(db->tb);
> -
> -    if (!(cflags & CF_USE_ICOUNT)) {
> -        return false;
> -    }
> -
>      set_can_do_io(db, true);
>
>      /*
> @@ -86,15 +80,15 @@ static TCGOp *gen_tb_start(DisasContextBase *db, uint32_t cflags)
>          tcg_gen_st16_i32(count, cpu_env,
>                           offsetof(ArchCPU, neg.icount_decr.u16.low) -
>                           offsetof(ArchCPU, env));
> -        /*
> -         * cpu->can_do_io is set automatically here at the beginning of
> -         * each translation block.  The cost is minimal and only paid for
> -         * -icount, plus it would be very easy to forget doing it in the
> -         * translator.
> -         */
> -        set_can_do_io(db, db->max_insns == 1 && (cflags & CF_LAST_IO));
>      }
>
> +    /*
> +     * cpu->can_do_io is set automatically here at the beginning of
> +     * each translation block.  The cost is minimal, plus it would be
> +     * very easy to forget doing it in the translator.
> +     */
> +    set_can_do_io(db, db->max_insns == 1 && (cflags & CF_LAST_IO));
> +
>      return icount_start_insn;
>  }
>
> diff --git a/target/mips/tcg/translate.c b/target/mips/tcg/translate.c
> index 9bb40f1849..593fc80458 100644
> --- a/target/mips/tcg/translate.c
> +++ b/target/mips/tcg/translate.c
> @@ -11212,7 +11212,6 @@ static void gen_branch(DisasContext *ctx, int insn_bytes)
>          /* Branches completion */
>          clear_branch_hflags(ctx);
>          ctx->base.is_jmp = DISAS_NORETURN;
> -        /* FIXME: Need to clear can_do_io.  */
>          switch (proc_hflags & MIPS_HFLAG_BMASK_BASE) {
>          case MIPS_HFLAG_FBNSLOT:
>              gen_goto_tb(ctx, 0, ctx->base.pc_next + insn_bytes);
> --
> 2.34.1
>
>


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 6/6] accel/tcg: Always require can_do_io
  2023-10-24  9:50   ` Clément Chigot
@ 2023-10-26  0:44     ` Richard Henderson
  2023-10-26  8:10       ` Clément Chigot
  0 siblings, 1 reply; 19+ messages in thread
From: Richard Henderson @ 2023-10-26  0:44 UTC (permalink / raw)
  To: Clément Chigot; +Cc: qemu-devel, philmd

On 10/24/23 02:50, Clément Chigot wrote:
> Hi Richard,
> 
> This commit has broken some of our internal bareboard testing on
> Risc-V 64. At some point in our programs, there is an AMOSWAP (=
> atomic swap) instruction on I/O. But since this commit, can_do_io is
> set to false triggering an infinite loop.
> IIUC the doc (cf [1]), atomic operations on I/O are allowed.
> 
> I think there is a CF_LAST_IO flag missing somewhere to allow it, but
> I'm not sure where this should be. Do you have any ideas ?
> 
> Sadly I cannot provide a reproducer that easily, mainly because our
> microchip has a few patches not yet merged making our binaries not
> running on the upstream master.
> But here is a bit of the in_asm backtrace:
> 
>    | IN: system__bb__riscv_plic__initialize
>    | Priv: 3; Virt: 0
>    | 0x80000eb4:  1141              addi                    sp,sp,-16
>    | 0x80000eb6:  0c0027b7          lui                     a5,49154
>    | 0x80000eba:  e406              sd                      ra,8(sp)
>    | 0x80000ebc:  00010597          auipc                   a1,16
>              # 0x80010ebc
>    | 0x80000ec0:  47458593          addi                    a1,a1,1140
>    | 0x80000ec4:  f3ffe637          lui                     a2,-49154
>    | 0x80000ec8:  01878693          addi                    a3,a5,24
>    | 0x80000ecc:  00f58733          add                     a4,a1,a5
>    | 0x80000ed0:  9732              add                     a4,a4,a2
>    | 0x80000ed2:  4318              lw                      a4,0(a4)
>    | 0x80000ed4:  2701              sext.w                  a4,a4
>    | 0x80000ed6:  08e7a02f          amoswap.w               zero,a4,(a5)
>    | 0x80000eda:  0791              addi                    a5,a5,4
>    | 0x80000edc:  fed798e3          bne                     a5,a3,-16
>              # 0x80000ecc
>    |
>    | ----------------
>    | IN: system__bb__riscv_plic__initialize
>    | Priv: 3; Virt: 0
>    | 0x80000ed6:  08e7a02f          amoswap.w               zero,a4,(a5)
>    |
>    | ----------------
>    | IN: system__bb__riscv_plic__initialize
>    | Priv: 3; Virt: 0
>    | 0x80000ed6:  08e7a02f          amoswap.w               zero,a4,(a5)
>    | * Freeze *

I would expect two translations:

(1) with the original TB, aborts execution on !can_do_io.
(2) with the second TB, we get further into the actual execution and abort execution on 
TLB_MMIO.
(3) with the third TB, we clear CF_PARALLEL and execute under cpu_exec_step_atomic.

Both 2 and 3 should have had CF_LAST_IO set.
You can verify this with '-d exec' output.

As a trivial example from qemu-system-alpha bios startup:

> Trace 0: 0x7f2584008380 [00000000/fffffc0000003ee4/01000000/ff000000] uart_init_line
> cpu_io_recompile: rewound execution of TB to fffffc0000003ee4
> ----------------
> IN: uart_init_line
> 0xfffffc0000003f20:  stb        t8,0(t6)
> 
> Trace 0: 0x7f2584008a00 [00000000/fffffc0000003f20/01000000/ff018001] uart_init_line

Note that the final "/" field is cflags.  The first "Trace" corresponds to (1), where the 
store is in the middle of the TB.  You can see the io_recompile abort, then the second 
"Trace" contains {CF_COUNT=1, CF_LAST_IO, CF_MEMI_ONLY}.

In the short term, try adding CF_LAST_IO to cflags in cpu_exec_step_atomic.

I think probably the logic of CF_LAST_IO should always be applied now, since can_do_io is 
always live, and thus the flag itself should go away.


r~


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 6/6] accel/tcg: Always require can_do_io
  2023-10-26  0:44     ` Richard Henderson
@ 2023-10-26  8:10       ` Clément Chigot
  0 siblings, 0 replies; 19+ messages in thread
From: Clément Chigot @ 2023-10-26  8:10 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, philmd

On Thu, Oct 26, 2023 at 2:44 AM Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> On 10/24/23 02:50, Clément Chigot wrote:
> > Hi Richard,
> >
> > This commit has broken some of our internal bareboard testing on
> > Risc-V 64. At some point in our programs, there is an AMOSWAP (=
> > atomic swap) instruction on I/O. But since this commit, can_do_io is
> > set to false triggering an infinite loop.
> > IIUC the doc (cf [1]), atomic operations on I/O are allowed.
> >
> > I think there is a CF_LAST_IO flag missing somewhere to allow it, but
> > I'm not sure where this should be. Do you have any ideas ?
> >
> > Sadly I cannot provide a reproducer that easily, mainly because our
> > microchip has a few patches not yet merged making our binaries not
> > running on the upstream master.
> > But here is a bit of the in_asm backtrace:
> >
> >    | IN: system__bb__riscv_plic__initialize
> >    | Priv: 3; Virt: 0
> >    | 0x80000eb4:  1141              addi                    sp,sp,-16
> >    | 0x80000eb6:  0c0027b7          lui                     a5,49154
> >    | 0x80000eba:  e406              sd                      ra,8(sp)
> >    | 0x80000ebc:  00010597          auipc                   a1,16
> >              # 0x80010ebc
> >    | 0x80000ec0:  47458593          addi                    a1,a1,1140
> >    | 0x80000ec4:  f3ffe637          lui                     a2,-49154
> >    | 0x80000ec8:  01878693          addi                    a3,a5,24
> >    | 0x80000ecc:  00f58733          add                     a4,a1,a5
> >    | 0x80000ed0:  9732              add                     a4,a4,a2
> >    | 0x80000ed2:  4318              lw                      a4,0(a4)
> >    | 0x80000ed4:  2701              sext.w                  a4,a4
> >    | 0x80000ed6:  08e7a02f          amoswap.w               zero,a4,(a5)
> >    | 0x80000eda:  0791              addi                    a5,a5,4
> >    | 0x80000edc:  fed798e3          bne                     a5,a3,-16
> >              # 0x80000ecc
> >    |
> >    | ----------------
> >    | IN: system__bb__riscv_plic__initialize
> >    | Priv: 3; Virt: 0
> >    | 0x80000ed6:  08e7a02f          amoswap.w               zero,a4,(a5)
> >    |
> >    | ----------------
> >    | IN: system__bb__riscv_plic__initialize
> >    | Priv: 3; Virt: 0
> >    | 0x80000ed6:  08e7a02f          amoswap.w               zero,a4,(a5)
> >    | * Freeze *
>
> I would expect two translations:
>
> (1) with the original TB, aborts execution on !can_do_io.
> (2) with the second TB, we get further into the actual execution and abort execution on
> TLB_MMIO.
> (3) with the third TB, we clear CF_PARALLEL and execute under cpu_exec_step_atomic.
>
> Both 2 and 3 should have had CF_LAST_IO set.
> You can verify this with '-d exec' output.
>
> As a trivial example from qemu-system-alpha bios startup:
>
> > Trace 0: 0x7f2584008380 [00000000/fffffc0000003ee4/01000000/ff000000] uart_init_line
> > cpu_io_recompile: rewound execution of TB to fffffc0000003ee4
> > ----------------
> > IN: uart_init_line
> > 0xfffffc0000003f20:  stb        t8,0(t6)
> >
> > Trace 0: 0x7f2584008a00 [00000000/fffffc0000003f20/01000000/ff018001] uart_init_line
>
> Note that the final "/" field is cflags.  The first "Trace" corresponds to (1), where the
> store is in the middle of the TB.  You can see the io_recompile abort, then the second
> "Trace" contains {CF_COUNT=1, CF_LAST_IO, CF_MEMI_ONLY}.

With the exec, it's indeed a bit clearer.
I do have a new translation made by cpu_io_recompile which sets
CF_LAST_IO (2). But afterward, it somehow loops back to the previous
translation which doesn't have CF_LAST_IO which calls cpu_io_recompile
again, etc. This triggers an infinite loop between these two
translations
  | ----------------
  | IN: system__bb__riscv_plic__initialize
  | 0x80000eb4:  1141              addi                    sp,sp,-16
  | ...
  | 0x80000ed4:  2701              sext.w                  a4,a4
  | 0x80000ed6:  08e7a02f          amoswap.w               zero,a4,(a5)
  | 0x80000eda:  0791              addi                    a5,a5,4
  | 0x80000edc:  fed798e3          bne                     a5,a3,-16
            # 0x80000ecc
  |
  | Linking TBs 0x7f0d3199db40 index 0 -> 0x7f0d3199dd00
  | Trace 1: 0x7f0d3199dd00
[00000000/0000000080000eb4/0b02401b/ff280000]
system__bb__riscv_plic__initialize

(1) First exec of amoswap without CF_LAST_IO.
  | ----------------
  | IN: system__bb__riscv_plic__initialize
  | Priv: 3; Virt: 0
  | 0x80000ed6:  08e7a02f          amoswap.w               zero,a4,(a5)
  |
  | Trace 1: 0x7f0d3199df80
[00000000/0000000080000ed6/0b02401b/ff200601]
system__bb__riscv_plic__initialize

(2) Second exec with CF_LAST_IO.
  | cpu_io_recompile: rewound execution of TB to 0000000080000ed6
  | ----------------
  | IN: system__bb__riscv_plic__initialize
  | Priv: 3; Virt: 0
  | 0x80000ed6:  08e7a02f          amoswap.w               zero,a4,(a5)
  |
  | Trace 1: 0x7f0d3199e140
[00000000/0000000080000ed6/0b02401b/ff298001]
system__bb__riscv_plic__initialize

Loop back (1) and start the infinite loop between (1) and (2).
  | Trace 1: 0x7f0d3199df80
[00000000/0000000080000ed6/0b02401b/ff200601]
system__bb__riscv_plic__initialize
  | cpu_io_recompile: rewound execution of TB to 0000000080000ed6
  | Trace 1: 0x7f0d3199e140
[00000000/0000000080000ed6/0b02401b/ff298001]
system__bb__riscv_plic__initialize
  | Trace 1: 0x7f0d3199df80
[00000000/0000000080000ed6/0b02401b/ff200601]
system__bb__riscv_plic__initialize
  | cpu_io_recompile: rewound execution of TB to 0000000080000ed6

I'll continue to investigate. Even if the workaround you provided
works, there is something weird happening here.

> In the short term, try adding CF_LAST_IO to cflags in cpu_exec_step_atomic.
Thanks for that workaround.

Thanks,
Clément

> I think probably the logic of CF_LAST_IO should always be applied now, since can_do_io is
> always live, and thus the flag itself should go away.
>
>
> r~


^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2023-10-26  8:11 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-09-14 17:44 [PATCH 0/6] accel/tcg: Always require can_do_io (#1866) Richard Henderson
2023-09-14 17:44 ` [PATCH 1/6] accel/tcg: Avoid load of icount_decr if unused Richard Henderson
2023-09-15  9:23   ` Philippe Mathieu-Daudé
2023-09-14 17:44 ` [PATCH 2/6] accel/tcg: Hoist CF_MEMI_ONLY check outside translation loop Richard Henderson
2023-09-15  9:26   ` Philippe Mathieu-Daudé
2023-09-14 17:44 ` [PATCH 3/6] accel/tcg: Track current value of can_do_io in the TB Richard Henderson
2023-09-15  9:41   ` [PATCH 3/6: 1/2] accel/tcg: Refactor gen_io_start() as set_can_do_io() Philippe Mathieu-Daudé
2023-09-15  9:41     ` [PATCH 3/6: 2/2] accel/tcg: Track current value of can_do_io in the TB Philippe Mathieu-Daudé
2023-09-15  9:44       ` Philippe Mathieu-Daudé
2023-09-15  9:42     ` [PATCH 3/6: 1/2] accel/tcg: Refactor gen_io_start() as set_can_do_io() Philippe Mathieu-Daudé
2023-09-14 17:44 ` [PATCH 4/6] accel/tcg: Improve setting of can_do_io at start of TB Richard Henderson
2023-09-15  9:47   ` Philippe Mathieu-Daudé
2023-09-14 17:44 ` [PATCH 5/6] accel/tcg: Always set CF_LAST_IO with CF_NOIRQ Richard Henderson
2023-09-19  9:26   ` Philippe Mathieu-Daudé
2023-09-14 17:44 ` [PATCH 6/6] accel/tcg: Always require can_do_io Richard Henderson
2023-09-19  9:37   ` Philippe Mathieu-Daudé
2023-10-24  9:50   ` Clément Chigot
2023-10-26  0:44     ` Richard Henderson
2023-10-26  8:10       ` Clément Chigot

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).