qemu-devel.nongnu.org archive mirror
* [PATCH RFC 0/4] hvf: use TCG emulation to handle data aborts
@ 2025-02-09  3:32 Joelle van Dyne
  2025-02-09  3:32 ` [PATCH RFC 1/4] cpu-exec: support single-step without debug Joelle van Dyne
                   ` (4 more replies)
  0 siblings, 5 replies; 6+ messages in thread
From: Joelle van Dyne @ 2025-02-09  3:32 UTC (permalink / raw)
  To: qemu-devel

When the VM exits with a data abort, we check the ISV field in the ESR. When
ISV=1, the processor has filled the remaining fields with the information
needed to determine the access that caused the abort: address, access width, and
the register operand. However, the processor only decodes a limited set of the
instructions that can cause a data abort this way. Many instructions, such as
LDP/STP and SIMD loads/stores, can cause a data abort with ISV=0. In that case
the hypervisor needs to manually decode the instruction, find the operands,
and emulate the access.
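
As a rough illustration of the ISV=1 case (not part of this series), the
syndrome fields can be pulled out of the ESR as sketched below. The bit
positions follow the architectural data abort ISS layout (ISV bit 24, SAS bits
23:22, SSE bit 21, SRT bits 20:16, WnR bit 6). The struct and function names
are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoded view of a data-abort syndrome (illustration only). */
typedef struct {
    bool     isv;      /* ESR bit 24: the fields below are valid */
    bool     is_write; /* ESR bit 6 (WnR): 1 = write, 0 = read */
    bool     sign_ext; /* ESR bit 21 (SSE): sign-extend the loaded value */
    uint32_t len;      /* access width in bytes: 1 << SAS (ESR bits 23:22) */
    uint32_t srt;      /* ESR bits 20:16: index of the transfer register */
} DataAbortInfo;

/*
 * Fill *out from the syndrome value. Returns true when ISV=1, i.e. when the
 * hardware decoded the access and no instruction emulation is needed.
 */
static bool decode_data_abort(uint64_t esr, DataAbortInfo *out)
{
    assert(out != NULL);

    out->isv      = (esr >> 24) & 1;
    out->is_write = (esr >> 6) & 1;
    out->sign_ext = (esr >> 21) & 1;
    out->len      = 1u << ((esr >> 22) & 3);
    out->srt      = (esr >> 16) & 0x1f;

    return out->isv;
}
```

When the function returns false (ISV=0), the other fields carry no usable
information and the caller has to fall back to decoding the instruction
itself, which is the case this series addresses.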

QEMU already ships with the ability to do this: TCG. However, TCG currently
operates as a stand-alone accelerator. This patch set enables HVF to call into
TCG when needed in order to perform a memory access that caused the abort.

One thing this enables is the ability to use virtio-vga with Windows for ARM64.
Currently, graphics support for Windows is flaky because you must first boot
with ramfb to get to the desktop, where you can then install the virtio-gpu
drivers and then start up with virtio-gpu. Even then, there is a known issue
where Windows mistakenly thinks there are two monitors connected because the
boot display does not share a framebuffer with the GPU. This sometimes results
in a black screen when updating Windows.

Another issue is that the TPM driver uses LDP/STP to access the command buffer
and so the QEMU device which maps registers as MMIO will not work.

There are a few major issues with the patch as it currently stands. First of
all, it is very slow. Because we do not track writes to code pages, to be safe
we flush TLBs and TBs every time we switch to emulation mode. We also need to
synchronize the register states between HVF and QEMU each time we enter and
exit emulation mode. Since we enter and exit emulation mode for every
instruction that causes a data abort, in the case of the VGA buffer being
cleared in a loop, this means we need to enter and exit emulation mode to
execute a single instruction for every pixel. Second, we don't support plugins
at all. Lastly, some of the CPU state used by TCG is not properly synchronized
with HVF and so subtle issues can occur. We may want to constrain the emulator
to only run with a known allowlist of instructions we wish to handle in a data
abort.
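
To make the allowlist idea concrete, here is a minimal sketch that is not
implemented anywhere in this series. It accepts only the A64 load/store pair
encoding class (bits 29:27 = 101 and bit 25 = 0, covering LDP/STP and the
other forms in that class, integer and SIMD alike). The function and macro
names are hypothetical, and a real allowlist would need further entries and
review.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* A64 load/store pair class: bits 29:27 == 0b101 and bit 25 == 0. */
#define A64_LDST_PAIR_MASK  0x3a000000u
#define A64_LDST_PAIR_VALUE 0x28000000u

/* Sanity check: the expected value must only use bits covered by the mask. */
static_assert((A64_LDST_PAIR_VALUE & ~A64_LDST_PAIR_MASK) == 0,
              "pair class value must lie within its mask");

/* True for any instruction in the load/store pair encoding class. */
static bool insn_is_ldst_pair(uint32_t insn)
{
    return (insn & A64_LDST_PAIR_MASK) == A64_LDST_PAIR_VALUE;
}

/*
 * Hypothetical gate: only hand the faulting instruction to the emulator if
 * it belongs to a class we have explicitly decided to handle.
 */
static bool insn_emulation_allowed(uint32_t insn)
{
    return insn_is_ldst_pair(insn);
}
```

A caller would fetch the 32-bit instruction at the faulting PC and refuse to
emulate (for example by reporting an error) when the check fails, which keeps
the amount of TCG exercised by the guest small.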

I think these issues can be worked around but I want to know if people think
this approach is worth doing or if instead we should pursue alternatives such
as a more basic instruction decoder which only supports a subset of instructions
which are interesting for memory accesses.

Joelle van Dyne (4):
  cpu-exec: support single-step without debug
  cpu-target: support emulation from non-TCG accels
  hvf: arm: emulate instruction when ISV=0
  hw/arm/virt: enable VGA

 include/exec/cpu-common.h |   1 +
 include/hw/core/cpu.h     |  11 +++++
 include/system/hvf_int.h  |   2 +-
 target/arm/hvf_arm.h      |   5 ++
 target/arm/internals.h    |   3 +-
 accel/hvf/hvf-accel-ops.c |   2 +-
 accel/tcg/cpu-exec.c      |  35 +++++++++----
 accel/tcg/plugin-gen.c    |   4 ++
 accel/tcg/tb-maint.c      |   2 +-
 accel/tcg/tcg-accel-ops.c |   3 +-
 cpu-target.c              |  20 +++++++-
 plugins/core.c            |  12 +++++
 system/physmem.c          |   7 +--
 target/arm/hvf/hvf.c      | 100 ++++++++++++++++++++++++++++++++++++--
 target/i386/hvf/hvf.c     |   2 +-
 hw/arm/Kconfig            |   1 +
 16 files changed, 186 insertions(+), 24 deletions(-)

-- 
2.41.0




* [PATCH RFC 1/4] cpu-exec: support single-step without debug
  2025-02-09  3:32 [PATCH RFC 0/4] hvf: use TCG emulation to handle data aborts Joelle van Dyne
@ 2025-02-09  3:32 ` Joelle van Dyne
  2025-02-09  3:32 ` [PATCH RFC 2/4] cpu-target: support emulation from non-TCG accels Joelle van Dyne
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: Joelle van Dyne @ 2025-02-09  3:32 UTC (permalink / raw)
  To: qemu-devel
  Cc: Joelle van Dyne, Richard Henderson, Paolo Bonzini,
	Eduardo Habkost, Marcel Apfelbaum, Philippe Mathieu-Daudé,
	Yanan Wang, Zhao Liu, Peter Maydell, open list:ARM TCG CPUs

Currently, single-stepping is tied to GDB debugging. This means that when
EXCP_DEBUG is returned, a debug exception is triggered in many cases. We
define a new EXCP_SINGLESTEP to differentiate the case where we want a
single step to not be tied to a debug exception. We also define a new flag
for cpu->singlestep_enabled called SSTEP_NODEBUG which is set when we want
to use single-step for purposes other than debugging.
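
The selection rule can be summarized in isolation as follows. This is a
standalone sketch rather than the code in the diff: the helper names are
hypothetical, the flag and EXCP_SINGLESTEP values are the ones added by this
patch, and EXCP_DEBUG is assumed to be 0x10002 as in cpu-common.h.

```c
#include <assert.h>
#include <stdbool.h>

#define EXCP_DEBUG      0x10002 /* assumed value from cpu-common.h */
#define EXCP_SINGLESTEP 0x10006 /* added by this patch */

#define SSTEP_ENABLE  0x1
#define SSTEP_NODEBUG 0x8       /* added by this patch */

/* Exception index to report after one step, given the single-step flags. */
static int singlestep_exception(int singlestep_enabled)
{
    assert(singlestep_enabled != 0); /* only meaningful while stepping */
    return (singlestep_enabled & SSTEP_NODEBUG) ? EXCP_SINGLESTEP
                                                : EXCP_DEBUG;
}

/* True when the flags describe a debugger-driven (GDB) single step. */
static bool singlestep_is_debug(int singlestep_enabled)
{
    return singlestep_enabled && !(singlestep_enabled & SSTEP_NODEBUG);
}
```

The second helper mirrors the condition cpu_single_step() uses to decide
whether the accelerator's guest debug state has to be updated.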

Signed-off-by: Joelle van Dyne <j@getutm.app>
---
 include/exec/cpu-common.h |  1 +
 include/hw/core/cpu.h     |  1 +
 target/arm/internals.h    |  3 ++-
 accel/tcg/cpu-exec.c      | 35 +++++++++++++++++++++++++----------
 cpu-target.c              |  7 +++++--
 5 files changed, 34 insertions(+), 13 deletions(-)

diff --git a/include/exec/cpu-common.h b/include/exec/cpu-common.h
index b1d76d6985..e1c798b07d 100644
--- a/include/exec/cpu-common.h
+++ b/include/exec/cpu-common.h
@@ -22,6 +22,7 @@
 #define EXCP_HALTED     0x10003 /* cpu is halted (waiting for external event) */
 #define EXCP_YIELD      0x10004 /* cpu wants to yield timeslice to another */
 #define EXCP_ATOMIC     0x10005 /* stop-the-world and emulate atomic */
+#define EXCP_SINGLESTEP 0x10006 /* singlestep without debugging */
 
 void cpu_exec_init_all(void);
 void cpu_exec_step_atomic(CPUState *cpu);
diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
index fb397cdfc5..e3c8450f8f 100644
--- a/include/hw/core/cpu.h
+++ b/include/hw/core/cpu.h
@@ -1072,6 +1072,7 @@ void qemu_init_vcpu(CPUState *cpu);
 #define SSTEP_ENABLE  0x1  /* Enable simulated HW single stepping */
 #define SSTEP_NOIRQ   0x2  /* Do not use IRQ while single stepping */
 #define SSTEP_NOTIMER 0x4  /* Do not Timers while single stepping */
+#define SSTEP_NODEBUG 0x8  /* Single-stepping is not for debugging */
 
 /**
  * cpu_single_step:
diff --git a/target/arm/internals.h b/target/arm/internals.h
index 863a84edf8..961cd9927a 100644
--- a/target/arm/internals.h
+++ b/target/arm/internals.h
@@ -57,7 +57,8 @@ static inline bool excp_is_internal(int excp)
         || excp == EXCP_HALTED
         || excp == EXCP_EXCEPTION_EXIT
         || excp == EXCP_KERNEL_TRAP
-        || excp == EXCP_SEMIHOST;
+        || excp == EXCP_SEMIHOST
+        || excp == EXCP_SINGLESTEP;
 }
 
 /*
diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
index 8b773d8847..6b4e63e69e 100644
--- a/accel/tcg/cpu-exec.c
+++ b/accel/tcg/cpu-exec.c
@@ -349,7 +349,7 @@ static bool check_for_breakpoints_slow(CPUState *cpu, vaddr pc,
      * so that one could (gdb) singlestep into the guest kernel's
      * architectural breakpoint handler.
      */
-    if (cpu->singlestep_enabled) {
+    if (cpu->singlestep_enabled && !(cpu->singlestep_enabled & SSTEP_NODEBUG)) {
         return false;
     }
 
@@ -529,7 +529,11 @@ cpu_tb_exec(CPUState *cpu, TranslationBlock *itb, int *tb_exit)
      * is handled in cpu_handle_exception.
      */
     if (unlikely(cpu->singlestep_enabled) && cpu->exception_index == -1) {
-        cpu->exception_index = EXCP_DEBUG;
+        if (!(cpu->singlestep_enabled & SSTEP_NODEBUG)) {
+            cpu->exception_index = EXCP_DEBUG;
+        } else {
+            cpu->exception_index = EXCP_SINGLESTEP;
+        }
         cpu_loop_exit(cpu);
     }
 
@@ -781,13 +785,20 @@ static inline bool cpu_handle_exception(CPUState *cpu, int *ret)
         cpu->exception_index = -1;
 
         if (unlikely(cpu->singlestep_enabled)) {
-            /*
-             * After processing the exception, ensure an EXCP_DEBUG is
-             * raised when single-stepping so that GDB doesn't miss the
-             * next instruction.
-             */
-            *ret = EXCP_DEBUG;
-            cpu_handle_debug_exception(cpu);
+            if (!(cpu->singlestep_enabled & SSTEP_NODEBUG)) {
+                /*
+                 * After processing the exception, ensure an EXCP_DEBUG is
+                 * raised when single-stepping so that GDB doesn't miss the
+                 * next instruction.
+                 */
+                *ret = EXCP_DEBUG;
+                cpu_handle_debug_exception(cpu);
+            } else {
+                /*
+                 * In case of non-debug single step, just return
+                 */
+                *ret = EXCP_SINGLESTEP;
+            }
             return true;
         }
     } else if (!replay_has_interrupt()) {
@@ -892,7 +903,11 @@ static inline bool cpu_handle_interrupt(CPUState *cpu,
                  * next instruction.
                  */
                 if (unlikely(cpu->singlestep_enabled)) {
-                    cpu->exception_index = EXCP_DEBUG;
+                    if (!(cpu->singlestep_enabled & SSTEP_NODEBUG)) {
+                        cpu->exception_index = EXCP_DEBUG;
+                    } else {
+                        cpu->exception_index = EXCP_SINGLESTEP;
+                    }
                     bql_unlock();
                     return true;
                 }
diff --git a/cpu-target.c b/cpu-target.c
index 667688332c..6293477ed9 100644
--- a/cpu-target.c
+++ b/cpu-target.c
@@ -322,9 +322,12 @@ void list_cpus(void)
    CPU loop after each instruction */
 void cpu_single_step(CPUState *cpu, int enabled)
 {
-    if (cpu->singlestep_enabled != enabled) {
-        cpu->singlestep_enabled = enabled;
+    int previous = cpu->singlestep_enabled;
+    bool prev_debug_en = previous && !(previous & SSTEP_NODEBUG);
+    bool cur_debug_en = enabled && !(enabled & SSTEP_NODEBUG);
 
+    cpu->singlestep_enabled = enabled;
+    if (prev_debug_en != cur_debug_en) {
 #if !defined(CONFIG_USER_ONLY)
         const AccelOpsClass *ops = cpus_get_accel();
         if (ops->update_guest_debug) {
-- 
2.41.0




* [PATCH RFC 2/4] cpu-target: support emulation from non-TCG accels
  2025-02-09  3:32 [PATCH RFC 0/4] hvf: use TCG emulation to handle data aborts Joelle van Dyne
  2025-02-09  3:32 ` [PATCH RFC 1/4] cpu-exec: support single-step without debug Joelle van Dyne
@ 2025-02-09  3:32 ` Joelle van Dyne
  2025-02-09  3:32 ` [PATCH RFC 3/4] hvf: arm: emulate instruction when ISV=0 Joelle van Dyne
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: Joelle van Dyne @ 2025-02-09  3:32 UTC (permalink / raw)
  To: qemu-devel
  Cc: Joelle van Dyne, Richard Henderson, Paolo Bonzini,
	Eduardo Habkost, Marcel Apfelbaum, Philippe Mathieu-Daudé,
	Yanan Wang, Zhao Liu, Alex Bennée, Alexandre Iooss,
	Mahmoud Mandour, Pierrick Bouvier, Peter Xu, David Hildenbrand

We create a toggle to allow TCG emulation to be used dynamically when
running other accelerators. Tracking dirty code can be expensive so we
need to flush the TLBs and TBs every time we toggle emulation mode. Plugin
support is currently disabled when running in this mode.
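
The cost of the toggle can be seen in a small standalone model. This is a
sketch only: ModelCPU and its counters are hypothetical stand-ins for CPUState
and for the real tb_flush()/tlb_flush() calls, and it just shows that both
flushes are paid on every transition into emulation mode and never on the way
out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for CPUState, counting flushes instead of doing them. */
typedef struct {
    bool     emulation_enabled;
    unsigned tb_flushes;
    unsigned tlb_flushes;
} ModelCPU;

/* Same shape as cpu_emulate(): flush only when switching emulation on. */
static void model_cpu_emulate(ModelCPU *cpu, bool enabled)
{
    assert(cpu != NULL);

    if (cpu->emulation_enabled == enabled) {
        return; /* no state change, nothing to flush */
    }
    cpu->emulation_enabled = enabled;

    if (enabled) {
        /* Code pages may have changed while HVF was running the guest. */
        cpu->tb_flushes++;
        cpu->tlb_flushes++;
    }
}
```

Since each emulated instruction is bracketed by an enable and a disable, a
guest loop that faults N times pays for N full flushes in this scheme.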

Signed-off-by: Joelle van Dyne <j@getutm.app>
---
 include/hw/core/cpu.h     | 10 ++++++++++
 accel/tcg/plugin-gen.c    |  4 ++++
 accel/tcg/tb-maint.c      |  2 +-
 accel/tcg/tcg-accel-ops.c |  3 ++-
 cpu-target.c              | 13 +++++++++++++
 plugins/core.c            | 12 ++++++++++++
 system/physmem.c          |  5 +++--
 7 files changed, 45 insertions(+), 4 deletions(-)

diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
index e3c8450f8f..dbbaca06ee 100644
--- a/include/hw/core/cpu.h
+++ b/include/hw/core/cpu.h
@@ -569,6 +569,9 @@ struct CPUState {
     /* track IOMMUs whose translations we've cached in the TCG TLB */
     GArray *iommu_notifiers;
 
+    /* doing emulation when not in TCG backend */
+    bool emulation_enabled;
+
     /*
      * MUST BE LAST in order to minimize the displacement to CPUArchState.
      */
@@ -1083,6 +1086,13 @@ void qemu_init_vcpu(CPUState *cpu);
  */
 void cpu_single_step(CPUState *cpu, int enabled);
 
+/**
+ * cpu_emulate:
+ * @cpu: CPU to set to emulation mode
+ * @enabled: enable emulation mode
+ */
+void cpu_emulate(CPUState *cpu, bool enabled);
+
 /* Breakpoint/watchpoint flags */
 #define BP_MEM_READ           0x01
 #define BP_MEM_WRITE          0x02
diff --git a/accel/tcg/plugin-gen.c b/accel/tcg/plugin-gen.c
index 7e5f040bf7..e07dffeb00 100644
--- a/accel/tcg/plugin-gen.c
+++ b/accel/tcg/plugin-gen.c
@@ -388,6 +388,10 @@ bool plugin_gen_tb_start(CPUState *cpu, const DisasContextBase *db)
 {
     struct qemu_plugin_tb *ptb;
 
+    if (cpu->emulation_enabled) {
+        return false;
+    }
+
     if (!test_bit(QEMU_PLUGIN_EV_VCPU_TB_TRANS,
                   cpu->plugin_state->event_mask)) {
         return false;
diff --git a/accel/tcg/tb-maint.c b/accel/tcg/tb-maint.c
index 3f1bebf6ab..14d4bed347 100644
--- a/accel/tcg/tb-maint.c
+++ b/accel/tcg/tb-maint.c
@@ -791,7 +791,7 @@ done:
 
 void tb_flush(CPUState *cpu)
 {
-    if (tcg_enabled()) {
+    if (tcg_enabled() || unlikely(cpu->emulation_enabled)) {
         unsigned tb_flush_count = qatomic_read(&tb_ctx.tb_flush_count);
 
         if (cpu_in_serial_context(cpu)) {
diff --git a/accel/tcg/tcg-accel-ops.c b/accel/tcg/tcg-accel-ops.c
index 6e3f1fa92b..3c07407ccf 100644
--- a/accel/tcg/tcg-accel-ops.c
+++ b/accel/tcg/tcg-accel-ops.c
@@ -32,6 +32,7 @@
 #include "qemu/main-loop.h"
 #include "qemu/guest-random.h"
 #include "qemu/timer.h"
+#include "exec/cpu-common.h"
 #include "exec/exec-all.h"
 #include "exec/hwaddr.h"
 #include "exec/tb-flush.h"
@@ -74,7 +75,7 @@ void tcg_cpu_destroy(CPUState *cpu)
 int tcg_cpu_exec(CPUState *cpu)
 {
     int ret;
-    assert(tcg_enabled());
+    assert(tcg_enabled() || cpu->emulation_enabled);
     cpu_exec_start(cpu);
     ret = cpu_exec(cpu);
     cpu_exec_end(cpu);
diff --git a/cpu-target.c b/cpu-target.c
index 6293477ed9..8df75e915a 100644
--- a/cpu-target.c
+++ b/cpu-target.c
@@ -339,6 +339,19 @@ void cpu_single_step(CPUState *cpu, int enabled)
     }
 }
 
+void cpu_emulate(CPUState *cpu, bool enabled)
+{
+    if (cpu->emulation_enabled != enabled) {
+        cpu->emulation_enabled = enabled;
+
+        if (enabled) {
+            /* FIXME: track dirty code to improve performance */
+            tb_flush(cpu);
+            tlb_flush(cpu);
+        }
+    }
+}
+
 void cpu_abort(CPUState *cpu, const char *fmt, ...)
 {
     va_list ap;
diff --git a/plugins/core.c b/plugins/core.c
index bb105e8e68..dee6ffd722 100644
--- a/plugins/core.c
+++ b/plugins/core.c
@@ -55,6 +55,10 @@ struct qemu_plugin_ctx *plugin_id_to_ctx_locked(qemu_plugin_id_t id)
 
 static void plugin_cpu_update__async(CPUState *cpu, run_on_cpu_data data)
 {
+    if (cpu->emulation_enabled) {
+        return;
+    }
+
     bitmap_copy(cpu->plugin_state->event_mask,
                 &data.host_ulong, QEMU_PLUGIN_EV_MAX);
     tcg_flush_jmp_cache(cpu);
@@ -499,6 +503,10 @@ qemu_plugin_vcpu_syscall(CPUState *cpu, int64_t num, uint64_t a1, uint64_t a2,
     struct qemu_plugin_cb *cb, *next;
     enum qemu_plugin_event ev = QEMU_PLUGIN_EV_VCPU_SYSCALL;
 
+    if (cpu->emulation_enabled) {
+        return;
+    }
+
     if (!test_bit(ev, cpu->plugin_state->event_mask)) {
         return;
     }
@@ -521,6 +529,10 @@ void qemu_plugin_vcpu_syscall_ret(CPUState *cpu, int64_t num, int64_t ret)
     struct qemu_plugin_cb *cb, *next;
     enum qemu_plugin_event ev = QEMU_PLUGIN_EV_VCPU_SYSCALL_RET;
 
+    if (cpu->emulation_enabled) {
+        return;
+    }
+
     if (!test_bit(ev, cpu->plugin_state->event_mask)) {
         return;
     }
diff --git a/system/physmem.c b/system/physmem.c
index 67c9db9daa..4bb2976646 100644
--- a/system/physmem.c
+++ b/system/physmem.c
@@ -2696,7 +2696,9 @@ static void tcg_commit_cpu(CPUState *cpu, run_on_cpu_data data)
     CPUAddressSpace *cpuas = data.host_ptr;
 
     cpuas->memory_dispatch = address_space_to_dispatch(cpuas->as);
-    tlb_flush(cpu);
+    if (tcg_enabled() || cpu->emulation_enabled) {
+        tlb_flush(cpu);
+    }
 }
 
 static void tcg_commit(MemoryListener *listener)
@@ -2704,7 +2706,6 @@ static void tcg_commit(MemoryListener *listener)
     CPUAddressSpace *cpuas;
     CPUState *cpu;
 
-    assert(tcg_enabled());
     /* since each CPU stores ram addresses in its TLB cache, we must
        reset the modified entries */
     cpuas = container_of(listener, CPUAddressSpace, tcg_as_listener);
-- 
2.41.0




* [PATCH RFC 3/4] hvf: arm: emulate instruction when ISV=0
  2025-02-09  3:32 [PATCH RFC 0/4] hvf: use TCG emulation to handle data aborts Joelle van Dyne
  2025-02-09  3:32 ` [PATCH RFC 1/4] cpu-exec: support single-step without debug Joelle van Dyne
  2025-02-09  3:32 ` [PATCH RFC 2/4] cpu-target: support emulation from non-TCG accels Joelle van Dyne
@ 2025-02-09  3:32 ` Joelle van Dyne
  2025-02-09  3:32 ` [PATCH RFC 4/4] hw/arm/virt: enable VGA Joelle van Dyne
  2025-02-10 10:16 ` [PATCH RFC 0/4] hvf: use TCG emulation to handle data aborts Peter Maydell
  4 siblings, 0 replies; 6+ messages in thread
From: Joelle van Dyne @ 2025-02-09  3:32 UTC (permalink / raw)
  To: qemu-devel
  Cc: Joelle van Dyne, Cameron Esfahani, Roman Bolshakov,
	Phil Dennis-Jordan, Paolo Bonzini, Peter Xu, David Hildenbrand,
	Philippe Mathieu-Daudé, Alexander Graf, Peter Maydell,
	open list:ARM TCG CPUs

On a data abort, the processor will try to decode the faulting instruction
so the hypervisor can emulate the read/write. However, it is not always
able to do this, and it reports ISV=0 whenever the instruction is not decoded.
This is the case, for example, if the faulting instruction is SIMD or an
LDP/STP.

When this happens, we can use TCG to emulate the faulting instruction.
This is needed if the processor uses one of these instructions to access
memory that is currently unmapped such as with VGA VRAM.

Signed-off-by: Joelle van Dyne <j@getutm.app>
---
 include/system/hvf_int.h  |   2 +-
 target/arm/hvf_arm.h      |   5 ++
 accel/hvf/hvf-accel-ops.c |   2 +-
 system/physmem.c          |   2 +-
 target/arm/hvf/hvf.c      | 100 ++++++++++++++++++++++++++++++++++++--
 target/i386/hvf/hvf.c     |   2 +-
 6 files changed, 106 insertions(+), 7 deletions(-)

diff --git a/include/system/hvf_int.h b/include/system/hvf_int.h
index 42ae18433f..7b85dbc495 100644
--- a/include/system/hvf_int.h
+++ b/include/system/hvf_int.h
@@ -64,7 +64,7 @@ void assert_hvf_ok_impl(hv_return_t ret, const char *file, unsigned int line,
                         const char *exp);
 #define assert_hvf_ok(EX) assert_hvf_ok_impl((EX), __FILE__, __LINE__, #EX)
 const char *hvf_return_string(hv_return_t ret);
-int hvf_arch_init(void);
+int hvf_arch_init(MachineState *ms);
 hv_return_t hvf_arch_vm_create(MachineState *ms, uint32_t pa_range);
 int hvf_arch_init_vcpu(CPUState *cpu);
 void hvf_arch_vcpu_destroy(CPUState *cpu);
diff --git a/target/arm/hvf_arm.h b/target/arm/hvf_arm.h
index 26c717b382..6ebef31390 100644
--- a/target/arm/hvf_arm.h
+++ b/target/arm/hvf_arm.h
@@ -41,4 +41,9 @@ static inline uint32_t hvf_arm_get_max_ipa_bit_size(void)
 
 #endif
 
+/**
+ * hvf_arm_init_emulator() - initialize TCG emulator
+ */
+void hvf_arm_init_emulator(int splitwx, unsigned max_cpus);
+
 #endif
diff --git a/accel/hvf/hvf-accel-ops.c b/accel/hvf/hvf-accel-ops.c
index 945ba72051..1caf713118 100644
--- a/accel/hvf/hvf-accel-ops.c
+++ b/accel/hvf/hvf-accel-ops.c
@@ -346,7 +346,7 @@ static int hvf_accel_init(MachineState *ms)
     hvf_state = s;
     memory_listener_register(&hvf_memory_listener, &address_space_memory);
 
-    return hvf_arch_init();
+    return hvf_arch_init(ms);
 }
 
 static inline int hvf_gdbstub_sstep_flags(void)
diff --git a/system/physmem.c b/system/physmem.c
index 4bb2976646..950cac5971 100644
--- a/system/physmem.c
+++ b/system/physmem.c
@@ -771,7 +771,7 @@ void cpu_address_space_init(CPUState *cpu, int asidx,
     newas = &cpu->cpu_ases[asidx];
     newas->cpu = cpu;
     newas->as = as;
-    if (tcg_enabled()) {
+    if (tcg_enabled() || hvf_enabled()) {
         newas->tcg_as_listener.log_global_after_sync = tcg_log_global_after_sync;
         newas->tcg_as_listener.commit = tcg_commit;
         newas->tcg_as_listener.name = "tcg";
diff --git a/target/arm/hvf/hvf.c b/target/arm/hvf/hvf.c
index 28886970c9..2c70e691fb 100644
--- a/target/arm/hvf/hvf.c
+++ b/target/arm/hvf/hvf.c
@@ -37,6 +37,17 @@
 
 #include "gdbstub/enums.h"
 
+#if defined(CONFIG_TCG)
+#include "accel/tcg/internal-common.h"
+#include "accel/tcg/tcg-accel-ops.h"
+#include "exec/tb-flush.h"
+#include "hw/core/cpu.h"
+#include "qapi/error.h"
+#include "qemu/units.h"
+#include "system/tcg.h"
+#include "tcg/startup.h"
+#endif /* defined(CONFIG_TCG) */
+
 #define MDSCR_EL1_SS_SHIFT  0
 #define MDSCR_EL1_MDE_SHIFT 15
 
@@ -150,6 +161,17 @@ void hvf_arm_init_debug(void)
         g_array_sized_new(true, true, sizeof(HWWatchpoint), max_hw_wps);
 }
 
+#if defined(CONFIG_TCG)
+void hvf_arm_init_emulator(int splitwx, unsigned max_cpus)
+{
+    mttcg_enabled = true;
+    page_init();
+    tb_htable_init();
+    tcg_init(64 * MiB, splitwx, max_cpus);
+    tcg_prologue_init();
+}
+#endif /* defined(CONFIG_TCG) */
+
 #define HVF_SYSREG(crn, crm, op0, op1, op2) \
         ENCODE_AA64_CP_REG(CP_REG_ARM64_SYSREG_CP, crn, crm, op0, op1, op2)
 
@@ -968,6 +990,9 @@ void hvf_arm_set_cpu_features_from_host(ARMCPU *cpu)
 
 void hvf_arch_vcpu_destroy(CPUState *cpu)
 {
+#if defined(CONFIG_TCG)
+    tcg_exec_unrealizefn(cpu);
+#endif
 }
 
 hv_return_t hvf_arch_vm_create(MachineState *ms, uint32_t pa_range)
@@ -1060,13 +1085,26 @@ int hvf_arch_init_vcpu(CPUState *cpu)
                               arm_cpu->isar.id_aa64mmfr0);
     assert_hvf_ok(ret);
 
+    /* enable TCG emulator */
+#if defined(CONFIG_TCG)
+    tcg_register_thread();
+    tcg_cpu_init_cflags(cpu, current_machine->smp.max_cpus > 1);
+    tcg_exec_realizefn(cpu, &error_fatal);
+#endif
+
     return 0;
 }
 
 void hvf_kick_vcpu_thread(CPUState *cpu)
 {
-    cpus_kick_thread(cpu);
-    hv_vcpus_exit(&cpu->accel->fd, 1);
+    if (cpu->emulation_enabled) {
+        cpu_exit(cpu);
+    } else {
+        cpus_kick_thread(cpu);
+        if (cpu->accel) {
+            hv_vcpus_exit(&cpu->accel->fd, 1);
+        }
+    }
 }
 
 static void hvf_raise_exception(CPUState *cpu, uint32_t excp,
@@ -1881,6 +1919,50 @@ static inline uint64_t sign_extend(uint64_t value, uint32_t bits)
     return (uint64_t)((int64_t)(value << (64 - bits)) >> (64 - bits));
 }
 
+#if defined(CONFIG_TCG)
+static int emulate_single_instruction(CPUState *cpu)
+{
+    ARMCPU *arm_cpu = ARM_CPU(cpu);
+    CPUARMState *env = &arm_cpu->env;
+    int prev_ss_enable = cpu->singlestep_enabled;
+    int ret;
+
+    cpu_synchronize_state(cpu);
+    arm_rebuild_hflags(env);
+    cpu_emulate(cpu, true);
+    cpu_single_step(cpu, SSTEP_NODEBUG | SSTEP_ENABLE);
+    do {
+        if (cpu_can_run(cpu)) {
+            bql_unlock();
+            ret = tcg_cpu_exec(cpu);
+            bql_lock();
+            if (ret == EXCP_ATOMIC) {
+                bql_unlock();
+                cpu_exec_step_atomic(cpu);
+                bql_lock();
+                ret = 0;
+            }
+            /* retry if we got an interrupt */
+            if (ret != EXCP_INTERRUPT) {
+                break;
+            }
+        }
+
+        qatomic_set_mb(&cpu->exit_request, 0);
+        qemu_wait_io_event(cpu);
+    } while (!cpu->unplug || cpu_can_run(cpu));
+    cpu_single_step(cpu, prev_ss_enable);
+    cpu_emulate(cpu, false);
+    cpu->accel->dirty = true;
+    flush_cpu_state(cpu);
+    if (!ret && prev_ss_enable) {
+        /* if single-stepping, always return EXCP_DEBUG */
+        ret = EXCP_DEBUG;
+    }
+    return ret;
+}
+#endif
+
 int hvf_vcpu_exec(CPUState *cpu)
 {
     ARMCPU *arm_cpu = ARM_CPU(cpu);
@@ -1993,7 +2075,15 @@ int hvf_vcpu_exec(CPUState *cpu)
             break;
         }
 
+#if defined(CONFIG_TCG)
+        if (unlikely(!isv)) {
+            ret = emulate_single_instruction(cpu);
+            advance_pc = false;
+            break;
+        }
+#else
         assert(isv);
+#endif
 
         if (iswrite) {
             val = hvf_get_reg(cpu, srt);
@@ -2124,7 +2214,7 @@ static void hvf_vm_state_change(void *opaque, bool running, RunState state)
     }
 }
 
-int hvf_arch_init(void)
+int hvf_arch_init(MachineState *ms)
 {
     hvf_state->vtimer_offset = mach_absolute_time();
     vmstate_register(NULL, 0, &vmstate_hvf_vtimer, &vtimer);
@@ -2132,6 +2222,10 @@ int hvf_arch_init(void)
 
     hvf_arm_init_debug();
 
+#if defined(CONFIG_TCG)
+    hvf_arm_init_emulator(0, ms->smp.max_cpus);
+#endif
+
     return 0;
 }
 
diff --git a/target/i386/hvf/hvf.c b/target/i386/hvf/hvf.c
index ca08f0753f..bcf9433d33 100644
--- a/target/i386/hvf/hvf.c
+++ b/target/i386/hvf/hvf.c
@@ -218,7 +218,7 @@ void hvf_kick_vcpu_thread(CPUState *cpu)
     hv_vcpu_interrupt(&cpu->accel->fd, 1);
 }
 
-int hvf_arch_init(void)
+int hvf_arch_init(MachineState *ms)
 {
     return 0;
 }
-- 
2.41.0




* [PATCH RFC 4/4] hw/arm/virt: enable VGA
  2025-02-09  3:32 [PATCH RFC 0/4] hvf: use TCG emulation to handle data aborts Joelle van Dyne
                   ` (2 preceding siblings ...)
  2025-02-09  3:32 ` [PATCH RFC 3/4] hvf: arm: emulate instruction when ISV=0 Joelle van Dyne
@ 2025-02-09  3:32 ` Joelle van Dyne
  2025-02-10 10:16 ` [PATCH RFC 0/4] hvf: use TCG emulation to handle data aborts Peter Maydell
  4 siblings, 0 replies; 6+ messages in thread
From: Joelle van Dyne @ 2025-02-09  3:32 UTC (permalink / raw)
  To: qemu-devel
  Cc: Joelle van Dyne, Paolo Bonzini, Peter Maydell,
	open list:ARM TCG CPUs

Signed-off-by: Joelle van Dyne <j@getutm.app>
---
 hw/arm/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/arm/Kconfig b/hw/arm/Kconfig
index 256013ca80..6818c54787 100644
--- a/hw/arm/Kconfig
+++ b/hw/arm/Kconfig
@@ -11,6 +11,7 @@ config ARM_VIRT
     imply TPM_TIS_I2C
     imply NVDIMM
     imply IOMMUFD
+    imply VIRTIO_VGA
     select ARM_GIC
     select ACPI
     select ARM_SMMUV3
-- 
2.41.0




* Re: [PATCH RFC 0/4] hvf: use TCG emulation to handle data aborts
  2025-02-09  3:32 [PATCH RFC 0/4] hvf: use TCG emulation to handle data aborts Joelle van Dyne
                   ` (3 preceding siblings ...)
  2025-02-09  3:32 ` [PATCH RFC 4/4] hw/arm/virt: enable VGA Joelle van Dyne
@ 2025-02-10 10:16 ` Peter Maydell
  4 siblings, 0 replies; 6+ messages in thread
From: Peter Maydell @ 2025-02-10 10:16 UTC (permalink / raw)
  To: Joelle van Dyne; +Cc: qemu-devel

On Sun, 9 Feb 2025 at 03:33, Joelle van Dyne <j@getutm.app> wrote:
>
> When the VM exits with a data abort, we check the ISV field in the ESR and when
> ISV=1, that means the processor has filled the remaining fields with information
> needed to determine the access that caused the abort: address, access width, and
> the register operand. However, only a limited set of instructions which can
> cause a data abort is nice enough for the processor to decode this way. Many
> instructions such as LDP/STP and SIMD can cause a data abort with ISV=0 and for
> that the hypervisor needs to manually decode the instruction, find the operands,
> and emulate the access.
>
> QEMU already ships with the ability to do this: TCG. However, TCG currently
> operates as a stand-alone accelerator. This patch set enables HVF to call into
> TCG when needed in order to perform a memory access that caused the abort.

So one problem with this is that it immediately puts all of TCG onto
the security boundary with the VM. We don't claim any kind of security
or can't-escape guarantees for TCG, and it's a lot of code, some of
which is old and some of which wasn't written with security as
a top-of-mind concern.

Our approach to these instructions with KVM on Arm is to say "don't
do those in the guest to MMIO regions". Most sensible guest code
doesn't do weird instruction forms for device accesses, and the
performance is going to be bad anyway if you need to fully emulate them.
(In at least one past case, Windows was fixed so that it no longer
uses this kind of insn on a device.)

> One thing this enables is the ability to use virtio-vga with Windows for ARM64.
> Currently, graphics support for Windows is flaky because you must first boot
> with ramfb to get to the desktop where you can then install the virtio-gpu
> drivers and then start up with virtio-gpu. Even then, there is a known issue
> where Windows mistakenly thinks there are two monitors connected because the
> boot display does not share a framebuffer with the GPU. This sometimes results
> in a black screen when updating Windows.

It's not really a good idea to use virtio-vga in an Arm VM,
because it requires FEAT_S2FWB in the host CPU to make it
work properly, and not every CPU has that, at least in the
KVM world. So you need to use virtio-gpu anyhow.

thanks
-- PMM


