* [PATCH] target/arm/hvf: emulate ISV=0 data abort instructions
@ 2026-03-09 21:48 Lucas Amaral
2026-03-10 1:28 ` Mohamed Mediouni
2026-03-13 2:18 ` [PATCH v2 0/3] target/arm: ISV=0 data abort emulation library Lucas Amaral
0 siblings, 2 replies; 25+ messages in thread
From: Lucas Amaral @ 2026-03-09 21:48 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-arm, agraf, Lucas Amaral
On Apple Silicon, HVF exits with ISV=0 (no syndrome information)
for STP/LDP/STNP/LDNP, SIMD/FP load/stores, and DC cache
maintenance instructions that access MMIO regions. The existing
code asserted ISV!=0, crashing the VM.
Decode the faulting instruction from guest memory and emulate:
- Load/Store Pair (STP/LDP/STNP/LDNP) for GPR and SIMD registers
- Single Load/Store with writeback and SIMD/FP variants
- DC system instructions (NOP on MMIO regions)
- LDPSW (sign-extending load pair)
For pair instructions, compute the effective virtual address from
the base register to handle page-straddling accesses correctly.
HPFAR_EL2 reports the faulting page, not the effective address,
so using the IPA directly would produce wrong results when an
STP straddles an HVF-mapped / MMIO boundary.
Tested with virtio-gpu Venus blob resources on macOS ARM64.
Signed-off-by: Lucas Amaral <lucaaamaral@gmail.com>
---
target/arm/hvf/hvf.c | 309 ++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 306 insertions(+), 3 deletions(-)
diff --git a/target/arm/hvf/hvf.c b/target/arm/hvf/hvf.c
index d79469c..87ddcdb 100644
--- a/target/arm/hvf/hvf.c
+++ b/target/arm/hvf/hvf.c
@@ -1871,10 +1871,313 @@ static int hvf_handle_exception(CPUState *cpu, hv_vcpu_exit_exception_t *excp)
assert(!s1ptw);
/*
- * TODO: ISV will be 0 for SIMD or SVE accesses.
- * Inject the exception into the guest.
+ * ISV=0: syndrome doesn't carry access size/register info.
+ * This happens for STP/LDP/STNP/LDNP, SIMD/SVE load/stores,
+ * and DC (data cache) maintenance instructions.
+ *
+ * Sync all CPU state (including TTBR/TCR/SCTLR for page table
+ * walk) and decode the faulting instruction from guest memory.
*/
- assert(isv);
+ if (!isv) {
+ ARMCPU *arm_cpu = ARM_CPU(cpu);
+ CPUARMState *env = &arm_cpu->env;
+ uint32_t insn;
+
+ /*
+ * Sync system registers (TTBR, TCR, SCTLR, etc.) from HVF
+ * so cpu_memory_rw_debug can walk guest page tables.
+ */
+ cpu_synchronize_state(cpu);
+
+ if (cpu_memory_rw_debug(cpu, env->pc,
+ (uint8_t *)&insn, 4, false) != 0) {
+ error_report("HVF: ISV=0 at ipa=0x%" PRIx64
+ " -- cannot read insn at pc=0x%" PRIx64,
+ ipa, (uint64_t)env->pc);
+ goto isv0_inject_fault;
+ }
+ insn = le32_to_cpu(insn);
+
+ /*
+ * System instructions (DC CIVAC, DC CVAC, etc.):
+ * bits [31:22] = 1101010100 identifies MRS/MSR/SYS class.
+ * Cache maintenance on MMIO regions is a harmless NOP.
+ */
+ if ((insn & 0xFFC00000) == 0xD5000000) {
+ advance_pc = true;
+ break;
+ }
+
+ /*
+ * Load/Store Pair (STP/LDP/STNP/LDNP):
+ * bits [29:27] = 101 identifies this instruction class.
+ * Supports both integer (GPR) and SIMD/FP register pairs.
+ */
+ if ((insn & 0x38000000) == 0x28000000) {
+ uint32_t opc = extract32(insn, 30, 2);
+ bool is_vec = extract32(insn, 26, 1);
+ bool is_load = extract32(insn, 22, 1);
+ uint32_t rt = extract32(insn, 0, 5);
+ uint32_t rt2 = extract32(insn, 10, 5);
+ uint32_t rn = extract32(insn, 5, 5);
+ uint32_t type = extract32(insn, 23, 3);
+ bool writeback = (type == 1 || type == 3);
+ uint32_t esize;
+ int32_t imm7 = sextract32(insn, 15, 7);
+
+ if (!is_vec) {
+ esize = (opc & 2) ? 8 : 4;
+ } else {
+ esize = 4u << opc; /* 4, 8, or 16 bytes */
+ }
+
+ int64_t stp_offset = (int64_t)imm7 * esize;
+
+ /*
+ * Compute the effective virtual address from the base
+ * register and immediate offset. HPFAR_EL2 reports
+ * the faulting page, not the effective address, so we
+ * must derive the VA from the instruction encoding.
+ *
+ * After cpu_synchronize_state(), env->xregs[0..30] are
+ * GPRs and env->xregs[31] is the current SP (restored
+ * via aarch64_restore_sp).
+ *
+ * Using the VA with cpu_memory_rw_debug() correctly
+ * splits page-straddling accesses via guest page tables.
+ */
+ uint64_t rn_va = env->xregs[rn];
+ /* post-index: access at unmodified base */
+ uint64_t va = (type == 1) ? rn_va : rn_va + stp_offset;
+
+ if (is_load == iswrite) {
+ error_report("HVF: ISV=0 load/write mismatch at "
+ "ipa=0x%" PRIx64, ipa);
+ goto isv0_inject_fault;
+ }
+
+ if (iswrite) {
+ /* Store pair */
+ if (!is_vec) {
+ uint64_t val1 = env->xregs[rt];
+ uint64_t val2 = env->xregs[rt2];
+ uint8_t buf[16]; /* max 2 x 8 bytes */
+ memcpy(buf, &val1, esize);
+ memcpy(buf + esize, &val2, esize);
+ cpu_memory_rw_debug(cpu, va, buf,
+ 2 * esize, true);
+ } else {
+ /*
+ * SIMD STP: register data is in env->vfp.zregs[]
+ * after cpu_synchronize_state().
+ * esize=4: S reg, esize=8: D reg, esize=16: Q reg
+ */
+ uint8_t buf[32]; /* max 2 x 16 bytes */
+ memcpy(buf, &env->vfp.zregs[rt], esize);
+ memcpy(buf + esize,
+ &env->vfp.zregs[rt2], esize);
+ cpu_memory_rw_debug(cpu, va, buf,
+ 2 * esize, true);
+ }
+ } else {
+ /* Load pair */
+ if (!is_vec) {
+ uint64_t val1 = 0, val2 = 0;
+ uint8_t buf[16];
+ memset(buf, 0, sizeof(buf));
+ cpu_memory_rw_debug(cpu, va, buf,
+ 2 * esize, false);
+ memcpy(&val1, buf, esize);
+ memcpy(&val2, buf + esize, esize);
+ if (opc == 1 && !is_vec) {
+ /* LDPSW: sign-extend 32-bit to 64-bit */
+ val1 = (int64_t)(int32_t)val1;
+ val2 = (int64_t)(int32_t)val2;
+ }
+ hvf_set_reg(cpu, rt, val1);
+ hvf_set_reg(cpu, rt2, val2);
+ } else {
+ /* SIMD LDP */
+ uint8_t buf[32];
+ memset(buf, 0, sizeof(buf));
+ cpu_memory_rw_debug(cpu, va, buf,
+ 2 * esize, false);
+ memset(&env->vfp.zregs[rt], 0,
+ sizeof(env->vfp.zregs[rt]));
+ memset(&env->vfp.zregs[rt2], 0,
+ sizeof(env->vfp.zregs[rt2]));
+ memcpy(&env->vfp.zregs[rt], buf, esize);
+ memcpy(&env->vfp.zregs[rt2],
+ buf + esize, esize);
+ cpu->vcpu_dirty = true;
+ }
+ }
+
+ /* Handle base register writeback (pre/post-index) */
+ if (writeback) {
+ env->xregs[rn] = env->xregs[rn] + stp_offset;
+ cpu->vcpu_dirty = true;
+ }
+
+ advance_pc = true;
+ break;
+ }
+
+ /*
+ * Load/Store Register (single):
+ * bits [29:27] = 111, bit [25] = 0.
+ * Covers immediate (unscaled, post-index, pre-index),
+ * unsigned offset, and register offset variants.
+ *
+ * ISV=0 for: writeback variants (pre/post-indexed) and
+ * all SIMD/FP loads/stores.
+ */
+ if ((insn & 0x3A000000) == 0x38000000) {
+ uint32_t size_field = extract32(insn, 30, 2);
+ bool is_vec = extract32(insn, 26, 1);
+ uint32_t opc = extract32(insn, 22, 2);
+ bool is_unsigned = extract32(insn, 24, 1);
+ bool bit21 = extract32(insn, 21, 1);
+ uint32_t rn = extract32(insn, 5, 5);
+ uint32_t rt = extract32(insn, 0, 5);
+ uint32_t sub_type = extract32(insn, 10, 2);
+ bool is_reg = !is_unsigned && bit21;
+
+ /*
+ * [24]=0, [21]=1, [11:10]!=10 could be atomic ops
+ * (LDADD, SWP, CAS, etc.) -- not handled.
+ */
+ if (is_reg && sub_type != 2) {
+ goto isv0_inject_fault;
+ }
+
+ bool writeback = !is_unsigned && !is_reg
+ && (sub_type == 1 || sub_type == 3);
+
+ uint32_t esize;
+ bool is_load;
+ bool is_signed = false;
+ uint32_t sign_extend_to = 0;
+
+ if (!is_vec) {
+ esize = 1u << size_field;
+ switch (opc) {
+ case 0: /* STR */
+ is_load = false;
+ break;
+ case 1: /* LDR */
+ is_load = true;
+ break;
+ case 2:
+ is_load = true; /* LDRS->64 */
+ is_signed = true;
+ sign_extend_to = 8;
+ break;
+ case 3:
+ if (size_field == 3) {
+ /* PRFM -- prefetch is NOP on MMIO */
+ advance_pc = true;
+ goto isv0_done;
+ }
+ is_load = true; /* LDRS->32 */
+ is_signed = true;
+ sign_extend_to = 4;
+ break;
+ }
+ } else {
+ /* SIMD/FP: size+opc determines element width */
+ is_load = (opc & 1);
+ if (opc >= 2 && size_field == 0) {
+ esize = 16; /* Q register (128-bit) */
+ } else if (opc < 2) {
+ esize = 1u << size_field;
+ } else {
+ goto isv0_inject_fault;
+ }
+ }
+
+ if (is_load == iswrite) {
+ error_report("HVF: ISV=0 LDR/STR load/write mismatch "
+ "at ipa=0x%" PRIx64, ipa);
+ goto isv0_inject_fault;
+ }
+
+ /* Perform memory access */
+ if (!is_load) {
+ if (!is_vec) {
+ uint64_t val = hvf_get_reg(cpu, rt);
+ address_space_write(as, ipa,
+ MEMTXATTRS_UNSPECIFIED,
+ &val, esize);
+ } else {
+ address_space_write(as, ipa,
+ MEMTXATTRS_UNSPECIFIED,
+ &env->vfp.zregs[rt], esize);
+ }
+ } else {
+ if (!is_vec) {
+ uint64_t val = 0;
+ address_space_read(as, ipa,
+ MEMTXATTRS_UNSPECIFIED,
+ &val, esize);
+ if (is_signed) {
+ switch (esize) {
+ case 1:
+ val = (int64_t)(int8_t)val;
+ break;
+ case 2:
+ val = (int64_t)(int16_t)val;
+ break;
+ case 4:
+ val = (int64_t)(int32_t)val;
+ break;
+ }
+ if (sign_extend_to == 4) {
+ val &= 0xFFFFFFFF;
+ }
+ }
+ hvf_set_reg(cpu, rt, val);
+ } else {
+ /* SIMD/FP load */
+ memset(&env->vfp.zregs[rt], 0,
+ sizeof(env->vfp.zregs[rt]));
+ address_space_read(as, ipa,
+ MEMTXATTRS_UNSPECIFIED,
+ &env->vfp.zregs[rt], esize);
+ cpu->vcpu_dirty = true;
+ }
+ }
+
+ /* Base register writeback (post/pre-indexed) */
+ if (writeback) {
+ int32_t imm9 = sextract32(insn, 12, 9);
+ env->xregs[rn] = env->xregs[rn] + imm9;
+ cpu->vcpu_dirty = true;
+ }
+
+ advance_pc = true;
+ goto isv0_done;
+ }
+
+isv0_inject_fault:
+ /*
+ * Inject data abort into guest for unrecognized or
+ * inconsistent ISV=0 instructions. The guest kernel
+ * will deliver SIGBUS to the faulting process.
+ */
+ {
+ int target_el = 1;
+ bool same_el = arm_current_el(env) == target_el;
+ uint32_t esr = syn_data_abort_no_iss(same_el,
+ /*fnv=*/1, /*ea=*/0, /*cm=*/0,
+ /*s1ptw=*/0, iswrite, /*fsc=*/0x10);
+ env->exception.vaddress = ipa;
+ hvf_raise_exception(cpu, EXCP_DATA_ABORT,
+ esr, target_el);
+ }
+isv0_done:
+ break;
+ }
/*
* Emulate MMIO.
--
2.52.0
^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: [PATCH] target/arm/hvf: emulate ISV=0 data abort instructions
2026-03-09 21:48 [PATCH] target/arm/hvf: emulate ISV=0 data abort instructions Lucas Amaral
@ 2026-03-10 1:28 ` Mohamed Mediouni
2026-03-10 9:23 ` Peter Maydell
2026-03-13 2:18 ` [PATCH v2 0/3] target/arm: ISV=0 data abort emulation library Lucas Amaral
1 sibling, 1 reply; 25+ messages in thread
From: Mohamed Mediouni @ 2026-03-10 1:28 UTC (permalink / raw)
To: Lucas Amaral; +Cc: qemu-devel, qemu-arm, agraf
> On 9. Mar 2026, at 22:48, Lucas Amaral <lucaaamaral@gmail.com> wrote:
>
> On Apple Silicon, HVF exits with ISV=0 (no syndrome information)
> for STP/LDP/STNP/LDNP, SIMD/FP load/stores, and DC cache
> maintenance instructions that access MMIO regions. The existing
> code asserted ISV!=0, crashing the VM.
>
> Decode the faulting instruction from guest memory and emulate:
> - Load/Store Pair (STP/LDP/STNP/LDNP) for GPR and SIMD registers
> - Single Load/Store with writeback and SIMD/FP variants
> - DC system instructions (NOP on MMIO regions)
> - LDPSW (sign-extending load pair)
>
> For pair instructions, compute the effective virtual address from
> the base register to handle page-straddling accesses correctly.
> HPFAR_EL2 reports the faulting page, not the effective address,
> so using the IPA directly would produce wrong results when an
> STP straddles an HVF-mapped / MMIO boundary.
>
> Tested with virtio-gpu Venus blob resources on macOS ARM64.
>
> Signed-off-by: Lucas Amaral <lucaaamaral@gmail.com>
> ---
Hello,
The idea is good, but this specific implementation is NAK for me.
This ought to be in common code, target/i386/emulate style, instead of in HVF-specific code.
Oh, and using something that looks more like a regular decoder perhaps.
Thank you,
-Mohamed
> [...]
* Re: [PATCH] target/arm/hvf: emulate ISV=0 data abort instructions
2026-03-10 1:28 ` Mohamed Mediouni
@ 2026-03-10 9:23 ` Peter Maydell
0 siblings, 0 replies; 25+ messages in thread
From: Peter Maydell @ 2026-03-10 9:23 UTC (permalink / raw)
To: Mohamed Mediouni; +Cc: Lucas Amaral, qemu-devel, qemu-arm, agraf
On Tue, 10 Mar 2026 at 01:29, Mohamed Mediouni <mohamed@unpredictable.fr> wrote:
>
> > [...]
> Hello,
>
> The idea is good, but this specific implementation is NAK for me.
>
> This ought to be in common code target/i386/emulate style instead of in specific HVF code.
>
> Oh, and using something that looks more like a regular decoder perhaps.
Yes, if we're going to do this rather than simply saying "fix your
guest to be more virtualization friendly" (which is the approach
we have so far taken with KVM) then we must:
* be hypervisor agnostic -- KVM also could use this code,
and I suspect whpx would like it
* use a decodetree file to do the initial decode
(you can probably borrow the patterns from the TCG
decodetree files; don't try to actually share the files,
that will be too complicated)
thanks
-- PMM
* [PATCH v2 0/3] target/arm: ISV=0 data abort emulation library
2026-03-09 21:48 [PATCH] target/arm/hvf: emulate ISV=0 data abort instructions Lucas Amaral
2026-03-10 1:28 ` Mohamed Mediouni
@ 2026-03-13 2:18 ` Lucas Amaral
2026-03-13 2:18 ` [PATCH v2 1/3] target/arm: add AArch64 ISV=0 instruction " Lucas Amaral
` (3 more replies)
1 sibling, 4 replies; 25+ messages in thread
From: Lucas Amaral @ 2026-03-13 2:18 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-arm, agraf, Lucas Amaral
When a guest triggers a data abort with ISV=0 (e.g. STP, LDP, SIMD/FP
load/store, writeback addressing, atomics, exclusives), the ESR syndrome
does not carry the access size or target register, so the hypervisor
cannot emulate MMIO without decoding the faulting instruction.
v1 handled this inside HVF with a hand-written decoder. Based on review
feedback from Mohamed Mediouni and Peter Maydell, v2 restructures the
implementation as:
- A shared emulation library in target/arm/emulate/ with a decodetree
decoder (a64-ldst.decode), usable by any hypervisor backend.
- A callback-based interface (struct arm_emul_ops) that abstracts
register and memory access, keeping the library hypervisor-agnostic.
- HVF and WHPX backends wired as the first two consumers.
Instruction classes handled (DDI 0487):
- Load/store pair: STP, LDP, STNP, LDNP, STGP, LDPSW (C3.3.14-16)
- SIMD/FP load/store pair and single (C3.3.10, C3.3.14-16)
- All immediate addressing: unscaled, post/pre-index, unsigned offset
- Register offset addressing with extend (C3.3.9)
- Exclusives: STXR, LDXR, STXP, LDXP (C3.3.6)
- Atomics: LDADD, LDCLR, LDEOR, LDSET, LDSMAX/MIN, LDUMAX/MIN, SWP
- Compare-and-swap: CAS, CASP (C3.3.1)
- LDRAA/LDRAB with FEAT_PAuth (C6.2.121)
- PRFM, DC maintenance (as NOPs)
Intentionally omitted (not observed in ISV=0 MMIO traps during testing):
- AdvSIMD structure loads/stores (LD1/ST1 etc.)
- MTE load/stores (FEAT_MTE)
- 128-bit atomics (FEAT_LSE128)
- MOPS (FEAT_MOPS)
KVM NISV handling is a natural follow-up -- it requires similar
arm_emul_ops callbacks using KVM vcpu ioctls.
v1 -> v2:
- Moved from HVF-specific inline decoder to shared library
in target/arm/emulate/ (Mohamed Mediouni)
- Added decodetree decoder for structured instruction parsing
(Peter Maydell)
- Made hypervisor-agnostic; wired HVF and WHPX (Peter Maydell)
- Added CASP register-pair validation (odd/r31 -> UNHANDLED)
- Added unit tests (19 test cases)
- Split into 3 patches for reviewability
Lucas Amaral (3):
target/arm: add AArch64 ISV=0 instruction emulation library
tests: add unit tests for ISV=0 emulation library
target/arm: wire ISV=0 emulation into HVF and WHPX
target/arm/emulate/a64-ldst.decode | 293 ++++++++++++
target/arm/emulate/arm_emulate.c | 738 +++++++++++++++++++++++++++++
target/arm/emulate/arm_emulate.h | 55 +++
target/arm/emulate/meson.build | 16 +
target/arm/hvf/hvf.c | 94 +++-
target/arm/meson.build | 1 +
target/arm/whpx/whpx-all.c | 86 +++-
tests/unit/meson.build | 1 +
tests/unit/test-arm-emulate.c | 540 +++++++++++++++++++++
9 files changed, 1820 insertions(+), 4 deletions(-)
create mode 100644 target/arm/emulate/a64-ldst.decode
create mode 100644 target/arm/emulate/arm_emulate.c
create mode 100644 target/arm/emulate/arm_emulate.h
create mode 100644 target/arm/emulate/meson.build
create mode 100644 tests/unit/test-arm-emulate.c
--
2.52.0
* [PATCH v2 1/3] target/arm: add AArch64 ISV=0 instruction emulation library
2026-03-13 2:18 ` [PATCH v2 0/3] target/arm: ISV=0 data abort emulation library Lucas Amaral
@ 2026-03-13 2:18 ` Lucas Amaral
2026-03-13 6:33 ` Mohamed Mediouni
2026-03-13 8:59 ` Peter Maydell
2026-03-13 2:18 ` [PATCH v2 2/3] tests: add unit tests for ISV=0 " Lucas Amaral
` (2 subsequent siblings)
3 siblings, 2 replies; 25+ messages in thread
From: Lucas Amaral @ 2026-03-13 2:18 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-arm, agraf, Lucas Amaral
Add a shared emulation library in target/arm/emulate/ using a
decodetree decoder (a64-ldst.decode) and a callback-based interface
(struct arm_emul_ops) that any hypervisor backend can implement.
The hypervisor cannot emulate ISV=0 data aborts without decoding the
faulting instruction, since the ESR syndrome does not carry the access
size or target register.
Signed-off-by: Lucas Amaral <lucaaamaral@gmail.com>
---
target/arm/emulate/a64-ldst.decode | 293 ++++++++++++
target/arm/emulate/arm_emulate.c | 738 +++++++++++++++++++++++++++++
target/arm/emulate/arm_emulate.h | 55 +++
target/arm/emulate/meson.build | 16 +
target/arm/meson.build | 1 +
5 files changed, 1103 insertions(+)
create mode 100644 target/arm/emulate/a64-ldst.decode
create mode 100644 target/arm/emulate/arm_emulate.c
create mode 100644 target/arm/emulate/arm_emulate.h
create mode 100644 target/arm/emulate/meson.build
diff --git a/target/arm/emulate/a64-ldst.decode b/target/arm/emulate/a64-ldst.decode
new file mode 100644
index 0000000..9a7b697
--- /dev/null
+++ b/target/arm/emulate/a64-ldst.decode
@@ -0,0 +1,293 @@
+# AArch64 load/store instruction patterns for ISV=0 emulation
+#
+# Copyright (c) 2026 Lucas Amaral <lucaaamaral@gmail.com>
+#
+# SPDX-License-Identifier: GPL-2.0-or-later
+
+### Argument sets
+
+# Load/store exclusive
+&stxr rn rt rt2 rs sz lasr
+
+# Load/store pair (GPR and SIMD/FP)
+&ldstpair rt2 rt rn imm sz sign w p
+
+# Load/store immediate (unscaled, pre/post-index, unprivileged, unsigned offset)
+# 'u' flag: 0 = 9-bit signed immediate (byte offset), 1 = 12-bit unsigned (needs << sz)
+&ldst_imm rt rn imm sz sign w p unpriv ext u
+
+# Load/store register offset
+&ldst rm rn rt sign ext sz opt s
+
+# Atomic memory operations
+&atomic rs rn rt a r sz
+
+# Compare-and-swap
+&cas rs rn rt sz a r
+
+# Load with PAC (LDRAA/LDRAB, FEAT_PAuth)
+%ldra_imm 22:s1 12:9
+&ldra rt rn imm m w
+
+### Format templates
+
+# Exclusives
+@stxr sz:2 ...... ... rs:5 lasr:1 rt2:5 rn:5 rt:5 &stxr
+
+# Load/store pair: imm7 is signed, scaled by element size in handler
+@ldstpair .. ... . ... . imm:s7 rt2:5 rn:5 rt:5 &ldstpair
+
+# Load/store immediate (9-bit signed)
+@ldst_imm .. ... . .. .. . imm:s9 .. rn:5 rt:5 &ldst_imm u=0 unpriv=0 p=0 w=0
+@ldst_imm_pre .. ... . .. .. . imm:s9 .. rn:5 rt:5 &ldst_imm u=0 unpriv=0 p=0 w=1
+@ldst_imm_post .. ... . .. .. . imm:s9 .. rn:5 rt:5 &ldst_imm u=0 unpriv=0 p=1 w=1
+@ldst_imm_user .. ... . .. .. . imm:s9 .. rn:5 rt:5 &ldst_imm u=0 unpriv=1 p=0 w=0
+
+# Load/store unsigned offset (12-bit, handler scales by << sz)
+@ldst_uimm .. ... . .. .. imm:12 rn:5 rt:5 &ldst_imm u=1 unpriv=0 p=0 w=0
+
+# Load/store register offset
+@ldst .. ... . .. .. . rm:5 opt:3 s:1 .. rn:5 rt:5 &ldst
+
+# Atomics
+@atomic sz:2 ... . .. a:1 r:1 . rs:5 . ... .. rn:5 rt:5 &atomic
+
+# Compare-and-swap: sz extracted by pattern (CAS) or set constant (CASP)
+@cas .. ...... . a:1 . rs:5 r:1 ..... rn:5 rt:5 &cas
+
+# Load with PAC
+@ldra .. ... . .. m:1 . . ......... w:1 . rn:5 rt:5 &ldra imm=%ldra_imm
+
+### Load/store exclusive
+
+# STXR / STLXR (sz encodes 8/16/32/64-bit)
+STXR .. 001000 000 ..... . ..... ..... ..... @stxr
+
+# LDXR / LDAXR
+LDXR .. 001000 010 ..... . ..... ..... ..... @stxr
+
+# STXP / STLXP (bit[31]=1, bit[30]=sf -> sz=2 for 32-bit, sz=3 for 64-bit)
+STXP 10 001000 001 rs:5 lasr:1 rt2:5 rn:5 rt:5 &stxr sz=2
+STXP 11 001000 001 rs:5 lasr:1 rt2:5 rn:5 rt:5 &stxr sz=3
+
+# LDXP / LDAXP
+LDXP 10 001000 011 rs:5 lasr:1 rt2:5 rn:5 rt:5 &stxr sz=2
+LDXP 11 001000 011 rs:5 lasr:1 rt2:5 rn:5 rt:5 &stxr sz=3
+
+### Compare-and-swap
+
+# CAS / CASA / CASAL / CASL
+CAS sz:2 001000 1 . 1 ..... . 11111 ..... ..... @cas
+
+# CASP / CASPA / CASPAL / CASPL (pair: Rt,Rt+1 and Rs,Rs+1)
+CASP 00 001000 0 . 1 ..... . 11111 ..... ..... @cas sz=2
+CASP 01 001000 0 . 1 ..... . 11111 ..... ..... @cas sz=3
+
+### Load/store pair — non-temporal (STNP/LDNP)
+
+# STNP/LDNP: offset only, no writeback. Non-temporal hint ignored.
+STP 00 101 0 000 0 ....... ..... ..... ..... @ldstpair sz=2 sign=0 p=0 w=0
+LDP 00 101 0 000 1 ....... ..... ..... ..... @ldstpair sz=2 sign=0 p=0 w=0
+STP 10 101 0 000 0 ....... ..... ..... ..... @ldstpair sz=3 sign=0 p=0 w=0
+LDP 10 101 0 000 1 ....... ..... ..... ..... @ldstpair sz=3 sign=0 p=0 w=0
+STP_v 00 101 1 000 0 ....... ..... ..... ..... @ldstpair sz=2 sign=0 p=0 w=0
+LDP_v 00 101 1 000 1 ....... ..... ..... ..... @ldstpair sz=2 sign=0 p=0 w=0
+STP_v 01 101 1 000 0 ....... ..... ..... ..... @ldstpair sz=3 sign=0 p=0 w=0
+LDP_v 01 101 1 000 1 ....... ..... ..... ..... @ldstpair sz=3 sign=0 p=0 w=0
+STP_v 10 101 1 000 0 ....... ..... ..... ..... @ldstpair sz=4 sign=0 p=0 w=0
+LDP_v 10 101 1 000 1 ....... ..... ..... ..... @ldstpair sz=4 sign=0 p=0 w=0
+
+### Load/store pair — post-indexed
+
+STP 00 101 0 001 0 ....... ..... ..... ..... @ldstpair sz=2 sign=0 p=1 w=1
+LDP 00 101 0 001 1 ....... ..... ..... ..... @ldstpair sz=2 sign=0 p=1 w=1
+LDP 01 101 0 001 1 ....... ..... ..... ..... @ldstpair sz=2 sign=1 p=1 w=1
+STP 10 101 0 001 0 ....... ..... ..... ..... @ldstpair sz=3 sign=0 p=1 w=1
+LDP 10 101 0 001 1 ....... ..... ..... ..... @ldstpair sz=3 sign=0 p=1 w=1
+STP_v 00 101 1 001 0 ....... ..... ..... ..... @ldstpair sz=2 sign=0 p=1 w=1
+LDP_v 00 101 1 001 1 ....... ..... ..... ..... @ldstpair sz=2 sign=0 p=1 w=1
+STP_v 01 101 1 001 0 ....... ..... ..... ..... @ldstpair sz=3 sign=0 p=1 w=1
+LDP_v 01 101 1 001 1 ....... ..... ..... ..... @ldstpair sz=3 sign=0 p=1 w=1
+STP_v 10 101 1 001 0 ....... ..... ..... ..... @ldstpair sz=4 sign=0 p=1 w=1
+LDP_v 10 101 1 001 1 ....... ..... ..... ..... @ldstpair sz=4 sign=0 p=1 w=1
+
+### Load/store pair — signed offset
+
+STP 00 101 0 010 0 ....... ..... ..... ..... @ldstpair sz=2 sign=0 p=0 w=0
+LDP 00 101 0 010 1 ....... ..... ..... ..... @ldstpair sz=2 sign=0 p=0 w=0
+LDP 01 101 0 010 1 ....... ..... ..... ..... @ldstpair sz=2 sign=1 p=0 w=0
+STP 10 101 0 010 0 ....... ..... ..... ..... @ldstpair sz=3 sign=0 p=0 w=0
+LDP 10 101 0 010 1 ....... ..... ..... ..... @ldstpair sz=3 sign=0 p=0 w=0
+STP_v 00 101 1 010 0 ....... ..... ..... ..... @ldstpair sz=2 sign=0 p=0 w=0
+LDP_v 00 101 1 010 1 ....... ..... ..... ..... @ldstpair sz=2 sign=0 p=0 w=0
+STP_v 01 101 1 010 0 ....... ..... ..... ..... @ldstpair sz=3 sign=0 p=0 w=0
+LDP_v 01 101 1 010 1 ....... ..... ..... ..... @ldstpair sz=3 sign=0 p=0 w=0
+STP_v 10 101 1 010 0 ....... ..... ..... ..... @ldstpair sz=4 sign=0 p=0 w=0
+LDP_v 10 101 1 010 1 ....... ..... ..... ..... @ldstpair sz=4 sign=0 p=0 w=0
+
+### Load/store pair — pre-indexed
+
+STP 00 101 0 011 0 ....... ..... ..... ..... @ldstpair sz=2 sign=0 p=0 w=1
+LDP 00 101 0 011 1 ....... ..... ..... ..... @ldstpair sz=2 sign=0 p=0 w=1
+LDP 01 101 0 011 1 ....... ..... ..... ..... @ldstpair sz=2 sign=1 p=0 w=1
+STP 10 101 0 011 0 ....... ..... ..... ..... @ldstpair sz=3 sign=0 p=0 w=1
+LDP 10 101 0 011 1 ....... ..... ..... ..... @ldstpair sz=3 sign=0 p=0 w=1
+STP_v 00 101 1 011 0 ....... ..... ..... ..... @ldstpair sz=2 sign=0 p=0 w=1
+LDP_v 00 101 1 011 1 ....... ..... ..... ..... @ldstpair sz=2 sign=0 p=0 w=1
+STP_v 01 101 1 011 0 ....... ..... ..... ..... @ldstpair sz=3 sign=0 p=0 w=1
+LDP_v 01 101 1 011 1 ....... ..... ..... ..... @ldstpair sz=3 sign=0 p=0 w=1
+STP_v 10 101 1 011 0 ....... ..... ..... ..... @ldstpair sz=4 sign=0 p=0 w=1
+LDP_v 10 101 1 011 1 ....... ..... ..... ..... @ldstpair sz=4 sign=0 p=0 w=1
+
+### Load/store pair — STGP (store allocation tag + pair)
+
+STGP 01 101 0 001 0 ....... ..... ..... ..... @ldstpair sz=3 sign=0 p=1 w=1
+STGP 01 101 0 010 0 ....... ..... ..... ..... @ldstpair sz=3 sign=0 p=0 w=0
+STGP 01 101 0 011 0 ....... ..... ..... ..... @ldstpair sz=3 sign=0 p=0 w=1
+
+### Load/store register — unscaled immediate (LDUR/STUR)
+
+# GPR
+STR_i sz:2 111 0 00 00 0 ......... 00 ..... ..... @ldst_imm sign=0 ext=0
+LDR_i 00 111 0 00 01 0 ......... 00 ..... ..... @ldst_imm sign=0 ext=1 sz=0
+LDR_i 01 111 0 00 01 0 ......... 00 ..... ..... @ldst_imm sign=0 ext=1 sz=1
+LDR_i 10 111 0 00 01 0 ......... 00 ..... ..... @ldst_imm sign=0 ext=1 sz=2
+LDR_i 11 111 0 00 01 0 ......... 00 ..... ..... @ldst_imm sign=0 ext=0 sz=3
+LDR_i 00 111 0 00 10 0 ......... 00 ..... ..... @ldst_imm sign=1 ext=0 sz=0
+LDR_i 01 111 0 00 10 0 ......... 00 ..... ..... @ldst_imm sign=1 ext=0 sz=1
+LDR_i 10 111 0 00 10 0 ......... 00 ..... ..... @ldst_imm sign=1 ext=0 sz=2
+LDR_i 00 111 0 00 11 0 ......... 00 ..... ..... @ldst_imm sign=1 ext=1 sz=0
+LDR_i 01 111 0 00 11 0 ......... 00 ..... ..... @ldst_imm sign=1 ext=1 sz=1
+
+# SIMD/FP
+STR_v_i sz:2 111 1 00 00 0 ......... 00 ..... ..... @ldst_imm sign=0 ext=0
+STR_v_i 00 111 1 00 10 0 ......... 00 ..... ..... @ldst_imm sign=0 ext=0 sz=4
+LDR_v_i sz:2 111 1 00 01 0 ......... 00 ..... ..... @ldst_imm sign=0 ext=0
+LDR_v_i 00 111 1 00 11 0 ......... 00 ..... ..... @ldst_imm sign=0 ext=0 sz=4
+
+### Load/store register — post-indexed
+
+# GPR
+STR_i sz:2 111 0 00 00 0 ......... 01 ..... ..... @ldst_imm_post sign=0 ext=0
+LDR_i 00 111 0 00 01 0 ......... 01 ..... ..... @ldst_imm_post sign=0 ext=1 sz=0
+LDR_i 01 111 0 00 01 0 ......... 01 ..... ..... @ldst_imm_post sign=0 ext=1 sz=1
+LDR_i 10 111 0 00 01 0 ......... 01 ..... ..... @ldst_imm_post sign=0 ext=1 sz=2
+LDR_i 11 111 0 00 01 0 ......... 01 ..... ..... @ldst_imm_post sign=0 ext=0 sz=3
+LDR_i 00 111 0 00 10 0 ......... 01 ..... ..... @ldst_imm_post sign=1 ext=0 sz=0
+LDR_i 01 111 0 00 10 0 ......... 01 ..... ..... @ldst_imm_post sign=1 ext=0 sz=1
+LDR_i 10 111 0 00 10 0 ......... 01 ..... ..... @ldst_imm_post sign=1 ext=0 sz=2
+LDR_i 00 111 0 00 11 0 ......... 01 ..... ..... @ldst_imm_post sign=1 ext=1 sz=0
+LDR_i 01 111 0 00 11 0 ......... 01 ..... ..... @ldst_imm_post sign=1 ext=1 sz=1
+
+# SIMD/FP
+STR_v_i sz:2 111 1 00 00 0 ......... 01 ..... ..... @ldst_imm_post sign=0 ext=0
+STR_v_i 00 111 1 00 10 0 ......... 01 ..... ..... @ldst_imm_post sign=0 ext=0 sz=4
+LDR_v_i sz:2 111 1 00 01 0 ......... 01 ..... ..... @ldst_imm_post sign=0 ext=0
+LDR_v_i 00 111 1 00 11 0 ......... 01 ..... ..... @ldst_imm_post sign=0 ext=0 sz=4
+
+### Load/store register — unprivileged
+
+# GPR only (no SIMD/FP unprivileged forms)
+STR_i sz:2 111 0 00 00 0 ......... 10 ..... ..... @ldst_imm_user sign=0 ext=0
+LDR_i 00 111 0 00 01 0 ......... 10 ..... ..... @ldst_imm_user sign=0 ext=1 sz=0
+LDR_i 01 111 0 00 01 0 ......... 10 ..... ..... @ldst_imm_user sign=0 ext=1 sz=1
+LDR_i 10 111 0 00 01 0 ......... 10 ..... ..... @ldst_imm_user sign=0 ext=1 sz=2
+LDR_i 11 111 0 00 01 0 ......... 10 ..... ..... @ldst_imm_user sign=0 ext=0 sz=3
+LDR_i 00 111 0 00 10 0 ......... 10 ..... ..... @ldst_imm_user sign=1 ext=0 sz=0
+LDR_i 01 111 0 00 10 0 ......... 10 ..... ..... @ldst_imm_user sign=1 ext=0 sz=1
+LDR_i 10 111 0 00 10 0 ......... 10 ..... ..... @ldst_imm_user sign=1 ext=0 sz=2
+LDR_i 00 111 0 00 11 0 ......... 10 ..... ..... @ldst_imm_user sign=1 ext=1 sz=0
+LDR_i 01 111 0 00 11 0 ......... 10 ..... ..... @ldst_imm_user sign=1 ext=1 sz=1
+
+### Load/store register — pre-indexed
+
+# GPR
+STR_i sz:2 111 0 00 00 0 ......... 11 ..... ..... @ldst_imm_pre sign=0 ext=0
+LDR_i 00 111 0 00 01 0 ......... 11 ..... ..... @ldst_imm_pre sign=0 ext=1 sz=0
+LDR_i 01 111 0 00 01 0 ......... 11 ..... ..... @ldst_imm_pre sign=0 ext=1 sz=1
+LDR_i 10 111 0 00 01 0 ......... 11 ..... ..... @ldst_imm_pre sign=0 ext=1 sz=2
+LDR_i 11 111 0 00 01 0 ......... 11 ..... ..... @ldst_imm_pre sign=0 ext=0 sz=3
+LDR_i 00 111 0 00 10 0 ......... 11 ..... ..... @ldst_imm_pre sign=1 ext=0 sz=0
+LDR_i 01 111 0 00 10 0 ......... 11 ..... ..... @ldst_imm_pre sign=1 ext=0 sz=1
+LDR_i 10 111 0 00 10 0 ......... 11 ..... ..... @ldst_imm_pre sign=1 ext=0 sz=2
+LDR_i 00 111 0 00 11 0 ......... 11 ..... ..... @ldst_imm_pre sign=1 ext=1 sz=0
+LDR_i 01 111 0 00 11 0 ......... 11 ..... ..... @ldst_imm_pre sign=1 ext=1 sz=1
+
+# SIMD/FP
+STR_v_i sz:2 111 1 00 00 0 ......... 11 ..... ..... @ldst_imm_pre sign=0 ext=0
+STR_v_i 00 111 1 00 10 0 ......... 11 ..... ..... @ldst_imm_pre sign=0 ext=0 sz=4
+LDR_v_i sz:2 111 1 00 01 0 ......... 11 ..... ..... @ldst_imm_pre sign=0 ext=0
+LDR_v_i 00 111 1 00 11 0 ......... 11 ..... ..... @ldst_imm_pre sign=0 ext=0 sz=4
+
+### PRFM — unscaled immediate: prefetch is a NOP
+
+NOP 11 111 0 00 10 0 --------- 00 ----- -----
+
+### Load/store register — unsigned offset
+
+# GPR
+STR_i sz:2 111 0 01 00 ............ ..... ..... @ldst_uimm sign=0 ext=0
+LDR_i 00 111 0 01 01 ............ ..... ..... @ldst_uimm sign=0 ext=1 sz=0
+LDR_i 01 111 0 01 01 ............ ..... ..... @ldst_uimm sign=0 ext=1 sz=1
+LDR_i 10 111 0 01 01 ............ ..... ..... @ldst_uimm sign=0 ext=1 sz=2
+LDR_i 11 111 0 01 01 ............ ..... ..... @ldst_uimm sign=0 ext=0 sz=3
+LDR_i 00 111 0 01 10 ............ ..... ..... @ldst_uimm sign=1 ext=0 sz=0
+LDR_i 01 111 0 01 10 ............ ..... ..... @ldst_uimm sign=1 ext=0 sz=1
+LDR_i 10 111 0 01 10 ............ ..... ..... @ldst_uimm sign=1 ext=0 sz=2
+LDR_i 00 111 0 01 11 ............ ..... ..... @ldst_uimm sign=1 ext=1 sz=0
+LDR_i 01 111 0 01 11 ............ ..... ..... @ldst_uimm sign=1 ext=1 sz=1
+
+# PRFM — unsigned offset
+NOP 11 111 0 01 10 ------------ ----- -----
+
+# SIMD/FP
+STR_v_i sz:2 111 1 01 00 ............ ..... ..... @ldst_uimm sign=0 ext=0
+STR_v_i 00 111 1 01 10 ............ ..... ..... @ldst_uimm sign=0 ext=0 sz=4
+LDR_v_i sz:2 111 1 01 01 ............ ..... ..... @ldst_uimm sign=0 ext=0
+LDR_v_i 00 111 1 01 11 ............ ..... ..... @ldst_uimm sign=0 ext=0 sz=4
+
+### Load/store register — register offset
+
+# GPR
+STR sz:2 111 0 00 00 1 ..... ... . 10 ..... ..... @ldst sign=0 ext=0
+LDR 00 111 0 00 01 1 ..... ... . 10 ..... ..... @ldst sign=0 ext=1 sz=0
+LDR 01 111 0 00 01 1 ..... ... . 10 ..... ..... @ldst sign=0 ext=1 sz=1
+LDR 10 111 0 00 01 1 ..... ... . 10 ..... ..... @ldst sign=0 ext=1 sz=2
+LDR 11 111 0 00 01 1 ..... ... . 10 ..... ..... @ldst sign=0 ext=0 sz=3
+LDR 00 111 0 00 10 1 ..... ... . 10 ..... ..... @ldst sign=1 ext=0 sz=0
+LDR 01 111 0 00 10 1 ..... ... . 10 ..... ..... @ldst sign=1 ext=0 sz=1
+LDR 10 111 0 00 10 1 ..... ... . 10 ..... ..... @ldst sign=1 ext=0 sz=2
+LDR 00 111 0 00 11 1 ..... ... . 10 ..... ..... @ldst sign=1 ext=1 sz=0
+LDR 01 111 0 00 11 1 ..... ... . 10 ..... ..... @ldst sign=1 ext=1 sz=1
+
+# PRFM — register offset
+NOP 11 111 0 00 10 1 ----- -1- - 10 ----- -----
+
+# SIMD/FP
+STR_v sz:2 111 1 00 00 1 ..... ... . 10 ..... ..... @ldst sign=0 ext=0
+STR_v 00 111 1 00 10 1 ..... ... . 10 ..... ..... @ldst sign=0 ext=0 sz=4
+LDR_v sz:2 111 1 00 01 1 ..... ... . 10 ..... ..... @ldst sign=0 ext=0
+LDR_v 00 111 1 00 11 1 ..... ... . 10 ..... ..... @ldst sign=0 ext=0 sz=4
+
+### Atomic memory operations
+
+LDADD .. 111 0 00 . . 1 ..... 0000 00 ..... ..... @atomic
+LDCLR .. 111 0 00 . . 1 ..... 0001 00 ..... ..... @atomic
+LDEOR .. 111 0 00 . . 1 ..... 0010 00 ..... ..... @atomic
+LDSET .. 111 0 00 . . 1 ..... 0011 00 ..... ..... @atomic
+LDSMAX .. 111 0 00 . . 1 ..... 0100 00 ..... ..... @atomic
+LDSMIN .. 111 0 00 . . 1 ..... 0101 00 ..... ..... @atomic
+LDUMAX .. 111 0 00 . . 1 ..... 0110 00 ..... ..... @atomic
+LDUMIN .. 111 0 00 . . 1 ..... 0111 00 ..... ..... @atomic
+SWP .. 111 0 00 . . 1 ..... 1000 00 ..... ..... @atomic
+
+### Load with PAC (FEAT_PAuth)
+
+# LDRAA (M=0) / LDRAB (M=1), offset (W=0) / pre-indexed (W=1)
+LDRA 11 111 0 00 . . 1 ......... . 1 ..... ..... @ldra
+
+### System instructions — DC cache maintenance
+
+# SYS with CRn=C7 covers all data cache operations (DC CIVAC, CVAC, etc.).
+# On MMIO regions, cache maintenance is a harmless no-op.
+NOP 1101 0101 0000 1 --- 0111 ---- --- -----
diff --git a/target/arm/emulate/arm_emulate.c b/target/arm/emulate/arm_emulate.c
new file mode 100644
index 0000000..cd8f44d
--- /dev/null
+++ b/target/arm/emulate/arm_emulate.c
@@ -0,0 +1,738 @@
+/*
+ * AArch64 instruction emulation for ISV=0 data aborts
+ *
+ * Copyright (c) 2026 Lucas Amaral <lucaaamaral@gmail.com>
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#include "arm_emulate.h"
+#include "qemu/bitops.h"
+#include "qemu/error-report.h"
+
+/* Named "DisasContext" as required by the decodetree code generator */
+typedef struct {
+ CPUState *cpu;
+ const struct arm_emul_ops *ops;
+ ArmEmulResult result;
+} DisasContext;
+
+#include "decode-a64-ldst.c.inc"
+
+/* GPR data access (Rt, Rs, Rt2) -- register 31 = XZR */
+
+static uint64_t gpr_read(DisasContext *ctx, int reg)
+{
+ if (reg == 31) {
+ return 0; /* XZR */
+ }
+ return ctx->ops->read_gpr(ctx->cpu, reg);
+}
+
+static void gpr_write(DisasContext *ctx, int reg, uint64_t val)
+{
+ if (reg == 31) {
+ return; /* XZR -- discard */
+ }
+ ctx->ops->write_gpr(ctx->cpu, reg, val);
+}
+
+/* Base register access (Rn) -- register 31 = SP */
+
+static uint64_t base_read(DisasContext *ctx, int rn)
+{
+ return ctx->ops->read_gpr(ctx->cpu, rn);
+}
+
+static void base_write(DisasContext *ctx, int rn, uint64_t val)
+{
+ ctx->ops->write_gpr(ctx->cpu, rn, val);
+}
+
+/* Memory access wrappers */
+
+static int mem_read(DisasContext *ctx, uint64_t va, void *buf, int size)
+{
+ int ret = ctx->ops->read_mem(ctx->cpu, va, buf, size);
+ if (ret != 0) {
+ ctx->result = ARM_EMUL_ERR_MEM;
+ }
+ return ret;
+}
+
+static int mem_write(DisasContext *ctx, uint64_t va, const void *buf, int size)
+{
+ int ret = ctx->ops->write_mem(ctx->cpu, va, buf, size);
+ if (ret != 0) {
+ ctx->result = ARM_EMUL_ERR_MEM;
+ }
+ return ret;
+}
+
+/* Sign/zero extension helpers */
+
+static uint64_t sign_extend(uint64_t val, int from_bits)
+{
+ int shift = 64 - from_bits;
+ return (int64_t)(val << shift) >> shift;
+}
+
+/* Apply sign/zero extension */
+static uint64_t load_extend(uint64_t val, int sz, int sign, int ext)
+{
+ int data_bits = 8 << sz;
+
+ if (sign) {
+ val = sign_extend(val, data_bits);
+ if (ext) {
+ /* Sign-extend to 32 bits (W register) */
+ val &= 0xFFFFFFFF;
+ }
+ } else if (ext) {
+ /* Zero-extend to 32 bits (W register) */
+ val &= 0xFFFFFFFF;
+ }
+ return val;
+}
+
+/* Register offset extension (DDI 0487 C6.2.131) */
+
+static uint64_t extend_reg(uint64_t val, int option, int shift)
+{
+ switch (option) {
+ case 0: /* UXTB */
+ val = (uint8_t)val;
+ break;
+ case 1: /* UXTH */
+ val = (uint16_t)val;
+ break;
+ case 2: /* UXTW */
+ val = (uint32_t)val;
+ break;
+ case 3: /* UXTX / LSL */
+ break;
+ case 4: /* SXTB */
+ val = (int64_t)(int8_t)val;
+ break;
+ case 5: /* SXTH */
+ val = (int64_t)(int16_t)val;
+ break;
+ case 6: /* SXTW */
+ val = (int64_t)(int32_t)val;
+ break;
+ case 7: /* SXTX */
+ break;
+ }
+ return val << shift;
+}
+
+/*
+ * Load/store pair: STP, LDP, STNP, LDNP, STGP, LDPSW
+ * (DDI 0487 C3.3.14 -- C3.3.16)
+ */
+
+static bool trans_STP(DisasContext *ctx, arg_ldstpair *a)
+{
+ int esize = 1 << a->sz; /* 4 or 8 bytes */
+ int64_t offset = (int64_t)a->imm << a->sz;
+ uint64_t base = base_read(ctx, a->rn);
+ uint64_t va = a->p ? base : base + offset; /* post-index: unmodified base */
+ uint8_t buf[16]; /* max 2 x 8 bytes */
+
+ uint64_t v1 = gpr_read(ctx, a->rt);
+ uint64_t v2 = gpr_read(ctx, a->rt2);
+ memcpy(buf, &v1, esize);
+ memcpy(buf + esize, &v2, esize);
+
+ if (mem_write(ctx, va, buf, 2 * esize) != 0) {
+ return true;
+ }
+
+ if (a->w) {
+ base_write(ctx, a->rn, base + offset);
+ }
+ return true;
+}
+
+static bool trans_LDP(DisasContext *ctx, arg_ldstpair *a)
+{
+ int esize = 1 << a->sz;
+ int64_t offset = (int64_t)a->imm << a->sz;
+ uint64_t base = base_read(ctx, a->rn);
+ uint64_t va = a->p ? base : base + offset;
+ uint8_t buf[16];
+ uint64_t v1 = 0, v2 = 0;
+
+ memset(buf, 0, sizeof(buf));
+ if (mem_read(ctx, va, buf, 2 * esize) != 0) {
+ return true;
+ }
+ memcpy(&v1, buf, esize);
+ memcpy(&v2, buf + esize, esize);
+
+ /* LDPSW: sign-extend 32-bit values to 64-bit (sign=1, sz=2) */
+ if (a->sign) {
+ v1 = sign_extend(v1, 8 * esize);
+ v2 = sign_extend(v2, 8 * esize);
+ }
+
+ gpr_write(ctx, a->rt, v1);
+ gpr_write(ctx, a->rt2, v2);
+
+ if (a->w) {
+ base_write(ctx, a->rn, base + offset);
+ }
+ return true;
+}
+
+/* STGP: tag operation is a NOP for emulation; data stored via STP */
+static bool trans_STGP(DisasContext *ctx, arg_ldstpair *a)
+{
+ return trans_STP(ctx, a);
+}
+
+/*
+ * SIMD/FP load/store pair: STP_v, LDP_v
+ * (DDI 0487 C3.3.14 -- C3.3.16)
+ */
+
+static bool trans_STP_v(DisasContext *ctx, arg_ldstpair *a)
+{
+ int esize = 1 << a->sz; /* 4, 8, or 16 bytes */
+ int64_t offset = (int64_t)a->imm << a->sz;
+ uint64_t base = base_read(ctx, a->rn);
+ uint64_t va = a->p ? base : base + offset;
+ uint8_t buf[32]; /* max 2 x 16 bytes */
+
+ ctx->ops->read_fpreg(ctx->cpu, a->rt, buf, esize);
+ ctx->ops->read_fpreg(ctx->cpu, a->rt2, buf + esize, esize);
+
+ if (mem_write(ctx, va, buf, 2 * esize) != 0) {
+ return true;
+ }
+
+ if (a->w) {
+ base_write(ctx, a->rn, base + offset);
+ }
+ return true;
+}
+
+static bool trans_LDP_v(DisasContext *ctx, arg_ldstpair *a)
+{
+ int esize = 1 << a->sz;
+ int64_t offset = (int64_t)a->imm << a->sz;
+ uint64_t base = base_read(ctx, a->rn);
+ uint64_t va = a->p ? base : base + offset;
+ uint8_t buf[32];
+
+ memset(buf, 0, sizeof(buf));
+ if (mem_read(ctx, va, buf, 2 * esize) != 0) {
+ return true;
+ }
+
+ ctx->ops->write_fpreg(ctx->cpu, a->rt, buf, esize);
+ ctx->ops->write_fpreg(ctx->cpu, a->rt2, buf + esize, esize);
+
+ if (a->w) {
+ base_write(ctx, a->rn, base + offset);
+ }
+ return true;
+}
+
+/* Load/store single -- immediate (GPR) (DDI 0487 C3.3.8 -- C3.3.13) */
+
+static bool trans_STR_i(DisasContext *ctx, arg_ldst_imm *a)
+{
+ int esize = (a->sz <= 3) ? (1 << a->sz) : 16;
+ int64_t offset = a->u ? ((int64_t)(uint64_t)a->imm << a->sz)
+ : (int64_t)a->imm;
+ uint64_t base = base_read(ctx, a->rn);
+ uint64_t va = a->p ? base : base + offset;
+
+ uint64_t val = gpr_read(ctx, a->rt);
+ if (mem_write(ctx, va, &val, esize) != 0) {
+ return true;
+ }
+
+ if (a->w) {
+ base_write(ctx, a->rn, base + offset);
+ }
+ return true;
+}
+
+static bool trans_LDR_i(DisasContext *ctx, arg_ldst_imm *a)
+{
+ int esize = (a->sz <= 3) ? (1 << a->sz) : 16;
+ int64_t offset = a->u ? ((int64_t)(uint64_t)a->imm << a->sz)
+ : (int64_t)a->imm;
+ uint64_t base = base_read(ctx, a->rn);
+ uint64_t va = a->p ? base : base + offset;
+ uint64_t val = 0;
+
+ if (mem_read(ctx, va, &val, esize) != 0) {
+ return true;
+ }
+
+ val = load_extend(val, a->sz, a->sign, a->ext);
+ gpr_write(ctx, a->rt, val);
+
+ if (a->w) {
+ base_write(ctx, a->rn, base + offset);
+ }
+ return true;
+}
+
+/*
+ * Load/store single -- immediate (SIMD/FP)
+ * STR_v_i / LDR_v_i (DDI 0487 C3.3.10)
+ */
+
+static bool trans_STR_v_i(DisasContext *ctx, arg_ldst_imm *a)
+{
+ int esize = (a->sz <= 3) ? (1 << a->sz) : 16;
+ int64_t offset = a->u ? ((int64_t)(uint64_t)a->imm << a->sz)
+ : (int64_t)a->imm;
+ uint64_t base = base_read(ctx, a->rn);
+ uint64_t va = a->p ? base : base + offset;
+ uint8_t buf[16];
+
+ ctx->ops->read_fpreg(ctx->cpu, a->rt, buf, esize);
+ if (mem_write(ctx, va, buf, esize) != 0) {
+ return true;
+ }
+
+ if (a->w) {
+ base_write(ctx, a->rn, base + offset);
+ }
+ return true;
+}
+
+static bool trans_LDR_v_i(DisasContext *ctx, arg_ldst_imm *a)
+{
+ int esize = (a->sz <= 3) ? (1 << a->sz) : 16;
+ int64_t offset = a->u ? ((int64_t)(uint64_t)a->imm << a->sz)
+ : (int64_t)a->imm;
+ uint64_t base = base_read(ctx, a->rn);
+ uint64_t va = a->p ? base : base + offset;
+ uint8_t buf[16];
+
+ memset(buf, 0, sizeof(buf));
+ if (mem_read(ctx, va, buf, esize) != 0) {
+ return true;
+ }
+
+ ctx->ops->write_fpreg(ctx->cpu, a->rt, buf, esize);
+
+ if (a->w) {
+ base_write(ctx, a->rn, base + offset);
+ }
+ return true;
+}
+
+/*
+ * Load/store single -- register offset (GPR)
+ * STR / LDR (DDI 0487 C3.3.9)
+ */
+
+static bool trans_STR(DisasContext *ctx, arg_ldst *a)
+{
+ int esize = (a->sz <= 3) ? (1 << a->sz) : 16;
+ int shift = a->s ? a->sz : 0;
+ uint64_t rm_val = gpr_read(ctx, a->rm);
+ uint64_t offset = extend_reg(rm_val, a->opt, shift);
+ uint64_t va = base_read(ctx, a->rn) + offset;
+
+ uint64_t val = gpr_read(ctx, a->rt);
+ mem_write(ctx, va, &val, esize);
+ return true;
+}
+
+static bool trans_LDR(DisasContext *ctx, arg_ldst *a)
+{
+ int esize = (a->sz <= 3) ? (1 << a->sz) : 16;
+ int shift = a->s ? a->sz : 0;
+ uint64_t rm_val = gpr_read(ctx, a->rm);
+ uint64_t offset = extend_reg(rm_val, a->opt, shift);
+ uint64_t va = base_read(ctx, a->rn) + offset;
+ uint64_t val = 0;
+
+ if (mem_read(ctx, va, &val, esize) != 0) {
+ return true;
+ }
+
+ val = load_extend(val, a->sz, a->sign, a->ext);
+ gpr_write(ctx, a->rt, val);
+ return true;
+}
+
+/*
+ * Load/store single -- register offset (SIMD/FP)
+ * STR_v / LDR_v (DDI 0487 C3.3.10)
+ */
+
+static bool trans_STR_v(DisasContext *ctx, arg_ldst *a)
+{
+ int esize = (a->sz <= 3) ? (1 << a->sz) : 16;
+ int shift = a->s ? a->sz : 0;
+ uint64_t rm_val = gpr_read(ctx, a->rm);
+ uint64_t offset = extend_reg(rm_val, a->opt, shift);
+ uint64_t va = base_read(ctx, a->rn) + offset;
+ uint8_t buf[16];
+
+ ctx->ops->read_fpreg(ctx->cpu, a->rt, buf, esize);
+ mem_write(ctx, va, buf, esize);
+ return true;
+}
+
+static bool trans_LDR_v(DisasContext *ctx, arg_ldst *a)
+{
+ int esize = (a->sz <= 3) ? (1 << a->sz) : 16;
+ int shift = a->s ? a->sz : 0;
+ uint64_t rm_val = gpr_read(ctx, a->rm);
+ uint64_t offset = extend_reg(rm_val, a->opt, shift);
+ uint64_t va = base_read(ctx, a->rn) + offset;
+ uint8_t buf[16];
+
+ memset(buf, 0, sizeof(buf));
+ if (mem_read(ctx, va, buf, esize) != 0) {
+ return true;
+ }
+
+ ctx->ops->write_fpreg(ctx->cpu, a->rt, buf, esize);
+ return true;
+}
+
+/*
+ * Load/store exclusive: STXR, LDXR, STXP, LDXP
+ * (DDI 0487 C3.3.6)
+ *
+ * Exclusive monitors have no meaning on MMIO. STXR always reports
+ * success (Rs=0) and LDXR does not set an exclusive monitor.
+ */
+
+static bool trans_STXR(DisasContext *ctx, arg_stxr *a)
+{
+ int esize = 1 << a->sz;
+ uint64_t va = base_read(ctx, a->rn);
+ uint64_t val = gpr_read(ctx, a->rt);
+
+ if (mem_write(ctx, va, &val, esize) != 0) {
+ return true;
+ }
+
+ /* Report success -- no exclusive monitor on emulated access */
+ gpr_write(ctx, a->rs, 0);
+ return true;
+}
+
+static bool trans_LDXR(DisasContext *ctx, arg_stxr *a)
+{
+ int esize = 1 << a->sz;
+ uint64_t va = base_read(ctx, a->rn);
+ uint64_t val = 0;
+
+ if (mem_read(ctx, va, &val, esize) != 0) {
+ return true;
+ }
+
+ gpr_write(ctx, a->rt, val);
+ return true;
+}
+
+static bool trans_STXP(DisasContext *ctx, arg_stxr *a)
+{
+ int esize = 1 << a->sz; /* sz=2->4, sz=3->8 */
+ uint64_t va = base_read(ctx, a->rn);
+ uint8_t buf[16];
+
+ uint64_t v1 = gpr_read(ctx, a->rt);
+ uint64_t v2 = gpr_read(ctx, a->rt2);
+ memcpy(buf, &v1, esize);
+ memcpy(buf + esize, &v2, esize);
+
+ if (mem_write(ctx, va, buf, 2 * esize) != 0) {
+ return true;
+ }
+
+ gpr_write(ctx, a->rs, 0); /* success */
+ return true;
+}
+
+static bool trans_LDXP(DisasContext *ctx, arg_stxr *a)
+{
+ int esize = 1 << a->sz;
+ uint64_t va = base_read(ctx, a->rn);
+ uint8_t buf[16];
+ uint64_t v1 = 0, v2 = 0;
+
+ memset(buf, 0, sizeof(buf));
+ if (mem_read(ctx, va, buf, 2 * esize) != 0) {
+ return true;
+ }
+
+ memcpy(&v1, buf, esize);
+ memcpy(&v2, buf + esize, esize);
+ gpr_write(ctx, a->rt, v1);
+ gpr_write(ctx, a->rt2, v2);
+ return true;
+}
+
+/*
+ * Atomic memory operations (DDI 0487 C3.3.2)
+ *
+ * Emulated as a non-atomic read-modify-write, which is sufficient for
+ * MMIO. Acquire/release semantics are ignored; the single emulated
+ * access is sequentially consistent in any case.
+ */
+
+typedef uint64_t (*atomic_op_fn)(uint64_t old, uint64_t operand, int bits);
+
+static uint64_t atomic_add(uint64_t old, uint64_t op, int bits)
+{
+ (void)bits;
+ return old + op;
+}
+
+static uint64_t atomic_clr(uint64_t old, uint64_t op, int bits)
+{
+ (void)bits;
+ return old & ~op;
+}
+
+static uint64_t atomic_eor(uint64_t old, uint64_t op, int bits)
+{
+ (void)bits;
+ return old ^ op;
+}
+
+static uint64_t atomic_set(uint64_t old, uint64_t op, int bits)
+{
+ (void)bits;
+ return old | op;
+}
+
+static uint64_t atomic_smax(uint64_t old, uint64_t op, int bits)
+{
+ int64_t a = sign_extend(old, bits);
+ int64_t b = sign_extend(op, bits);
+ return (a >= b) ? old : op;
+}
+
+static uint64_t atomic_smin(uint64_t old, uint64_t op, int bits)
+{
+ int64_t a = sign_extend(old, bits);
+ int64_t b = sign_extend(op, bits);
+ return (a <= b) ? old : op;
+}
+
+static uint64_t atomic_umax(uint64_t old, uint64_t op, int bits)
+{
+ uint64_t mask = (bits == 64) ? UINT64_MAX : (1ULL << bits) - 1;
+ return ((old & mask) >= (op & mask)) ? old : op;
+}
+
+static uint64_t atomic_umin(uint64_t old, uint64_t op, int bits)
+{
+ uint64_t mask = (bits == 64) ? UINT64_MAX : (1ULL << bits) - 1;
+ return ((old & mask) <= (op & mask)) ? old : op;
+}
+
+static bool do_atomic(DisasContext *ctx, arg_atomic *a, atomic_op_fn fn)
+{
+ int esize = 1 << a->sz;
+ int bits = 8 * esize;
+ uint64_t va = base_read(ctx, a->rn);
+ uint64_t old = 0;
+
+ if (mem_read(ctx, va, &old, esize) != 0) {
+ return true;
+ }
+
+ uint64_t operand = gpr_read(ctx, a->rs);
+ uint64_t result = fn(old, operand, bits);
+
+ if (mem_write(ctx, va, &result, esize) != 0) {
+ return true;
+ }
+
+ /* Rt receives the old value (before modification) */
+ gpr_write(ctx, a->rt, old);
+ return true;
+}
+
+static bool trans_LDADD(DisasContext *ctx, arg_atomic *a)
+{
+ return do_atomic(ctx, a, atomic_add);
+}
+
+static bool trans_LDCLR(DisasContext *ctx, arg_atomic *a)
+{
+ return do_atomic(ctx, a, atomic_clr);
+}
+
+static bool trans_LDEOR(DisasContext *ctx, arg_atomic *a)
+{
+ return do_atomic(ctx, a, atomic_eor);
+}
+
+static bool trans_LDSET(DisasContext *ctx, arg_atomic *a)
+{
+ return do_atomic(ctx, a, atomic_set);
+}
+
+static bool trans_LDSMAX(DisasContext *ctx, arg_atomic *a)
+{
+ return do_atomic(ctx, a, atomic_smax);
+}
+
+static bool trans_LDSMIN(DisasContext *ctx, arg_atomic *a)
+{
+ return do_atomic(ctx, a, atomic_smin);
+}
+
+static bool trans_LDUMAX(DisasContext *ctx, arg_atomic *a)
+{
+ return do_atomic(ctx, a, atomic_umax);
+}
+
+static bool trans_LDUMIN(DisasContext *ctx, arg_atomic *a)
+{
+ return do_atomic(ctx, a, atomic_umin);
+}
+
+static bool trans_SWP(DisasContext *ctx, arg_atomic *a)
+{
+ int esize = 1 << a->sz;
+ uint64_t va = base_read(ctx, a->rn);
+ uint64_t old = 0;
+
+ if (mem_read(ctx, va, &old, esize) != 0) {
+ return true;
+ }
+
+ uint64_t newval = gpr_read(ctx, a->rs);
+ if (mem_write(ctx, va, &newval, esize) != 0) {
+ return true;
+ }
+
+ gpr_write(ctx, a->rt, old);
+ return true;
+}
+
+/* Compare-and-swap: CAS, CASP (DDI 0487 C3.3.1) */
+
+static bool trans_CAS(DisasContext *ctx, arg_cas *a)
+{
+ int esize = 1 << a->sz;
+ uint64_t va = base_read(ctx, a->rn);
+ uint64_t current = 0;
+
+ if (mem_read(ctx, va, &current, esize) != 0) {
+ return true;
+ }
+
+ uint64_t mask = (esize == 8) ? UINT64_MAX : (1ULL << (8 * esize)) - 1;
+ uint64_t compare = gpr_read(ctx, a->rs) & mask;
+
+ if ((current & mask) == compare) {
+ uint64_t newval = gpr_read(ctx, a->rt) & mask;
+ if (mem_write(ctx, va, &newval, esize) != 0) {
+ return true;
+ }
+ }
+
+ /* Rs receives the old memory value (whether or not swap occurred) */
+ gpr_write(ctx, a->rs, current);
+ return true;
+}
+
+/* CASP: compare-and-swap pair (Rs,Rs+1 compared; Rt,Rt+1 stored) */
+static bool trans_CASP(DisasContext *ctx, arg_cas *a)
+{
+ /* CASP requires even register pairs; odd or r31 is UNPREDICTABLE */
+ if ((a->rs & 1) || a->rs >= 31 || (a->rt & 1) || a->rt >= 31) {
+ return false;
+ }
+
+ int esize = 1 << a->sz; /* per-register size */
+ uint64_t va = base_read(ctx, a->rn);
+ uint8_t buf[16];
+ uint64_t cur1 = 0, cur2 = 0;
+
+ memset(buf, 0, sizeof(buf));
+ if (mem_read(ctx, va, buf, 2 * esize) != 0) {
+ return true;
+ }
+ memcpy(&cur1, buf, esize);
+ memcpy(&cur2, buf + esize, esize);
+
+ uint64_t mask = (esize == 8) ? UINT64_MAX : (1ULL << (8 * esize)) - 1;
+ uint64_t cmp1 = gpr_read(ctx, a->rs) & mask;
+ uint64_t cmp2 = gpr_read(ctx, a->rs + 1) & mask;
+
+ if ((cur1 & mask) == cmp1 && (cur2 & mask) == cmp2) {
+ uint64_t new1 = gpr_read(ctx, a->rt) & mask;
+ uint64_t new2 = gpr_read(ctx, a->rt + 1) & mask;
+ memcpy(buf, &new1, esize);
+ memcpy(buf + esize, &new2, esize);
+ if (mem_write(ctx, va, buf, 2 * esize) != 0) {
+ return true;
+ }
+ }
+
+ gpr_write(ctx, a->rs, cur1);
+ gpr_write(ctx, a->rs + 1, cur2);
+ return true;
+}
+
+/*
+ * Load with PAC: LDRAA / LDRAB (FEAT_PAuth)
+ * (DDI 0487 C6.2.121)
+ *
+ * Pointer authentication is not emulated -- the base register is used
+ * directly (equivalent to auth always succeeding).
+ */
+
+static bool trans_LDRA(DisasContext *ctx, arg_ldra *a)
+{
+ int64_t offset = (int64_t)a->imm << 3; /* S:imm9, scaled by 8 */
+ uint64_t base = base_read(ctx, a->rn);
+ uint64_t va = base + offset; /* auth not emulated */
+ uint64_t val = 0;
+
+ if (mem_read(ctx, va, &val, 8) != 0) {
+ return true;
+ }
+
+ gpr_write(ctx, a->rt, val);
+
+ if (a->w) {
+ base_write(ctx, a->rn, va);
+ }
+ return true;
+}
+
+/* PRFM, DC cache maintenance -- treated as NOP */
+static bool trans_NOP(DisasContext *ctx, arg_NOP *a)
+{
+ (void)ctx;
+ (void)a;
+ return true;
+}
+
+/* Entry point */
+
+ArmEmulResult arm_emul_insn(CPUState *cpu, const struct arm_emul_ops *ops,
+ uint32_t insn)
+{
+ DisasContext ctx = {
+ .cpu = cpu,
+ .ops = ops,
+ .result = ARM_EMUL_OK,
+ };
+
+ if (!decode_a64_ldst(&ctx, insn)) {
+ return ARM_EMUL_UNHANDLED;
+ }
+
+ return ctx.result;
+}
diff --git a/target/arm/emulate/arm_emulate.h b/target/arm/emulate/arm_emulate.h
new file mode 100644
index 0000000..eef8a37
--- /dev/null
+++ b/target/arm/emulate/arm_emulate.h
@@ -0,0 +1,55 @@
+/*
+ * AArch64 instruction emulation library
+ *
+ * Copyright (c) 2026 Lucas Amaral <lucaaamaral@gmail.com>
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#ifndef ARM_EMULATE_H
+#define ARM_EMULATE_H
+
+#include "qemu/osdep.h"
+
+/*
+ * CPUState is only used as an opaque pointer (via qemu/typedefs.h).
+ * Callers that dereference CPUState include hw/core/cpu.h themselves.
+ */
+
+/**
+ * ArmEmulResult - return status from arm_emul_insn()
+ */
+typedef enum {
+ ARM_EMUL_OK, /* Instruction emulated successfully */
+ ARM_EMUL_UNHANDLED, /* Instruction not recognized by decoder */
+ ARM_EMUL_ERR_MEM, /* Memory access callback failed */
+} ArmEmulResult;
+
+/**
+ * struct arm_emul_ops - hypervisor register/memory callbacks
+ *
+ * GPR reg 31 = SP (the XZR/SP distinction is handled internally).
+ * Memory callbacks use guest virtual addresses.
+ */
+struct arm_emul_ops {
+ uint64_t (*read_gpr)(CPUState *cpu, int reg);
+ void (*write_gpr)(CPUState *cpu, int reg, uint64_t val);
+
+ /* @size: access width in bytes (4, 8, or 16) */
+ void (*read_fpreg)(CPUState *cpu, int reg, void *buf, int size);
+ void (*write_fpreg)(CPUState *cpu, int reg, const void *buf, int size);
+
+ /* Returns 0 on success, non-zero on failure */
+ int (*read_mem)(CPUState *cpu, uint64_t va, void *buf, int size);
+ int (*write_mem)(CPUState *cpu, uint64_t va, const void *buf, int size);
+};
+
+/**
+ * arm_emul_insn - decode and emulate one AArch64 instruction
+ *
+ * Caller must synchronize CPU state and fetch @insn before calling.
+ */
+ArmEmulResult arm_emul_insn(CPUState *cpu, const struct arm_emul_ops *ops,
+ uint32_t insn);
+
+#endif /* ARM_EMULATE_H */
diff --git a/target/arm/emulate/meson.build b/target/arm/emulate/meson.build
new file mode 100644
index 0000000..29b7879
--- /dev/null
+++ b/target/arm/emulate/meson.build
@@ -0,0 +1,16 @@
+gen_a64_ldst = decodetree.process('a64-ldst.decode',
+ extra_args: ['--static-decode=decode_a64_ldst'])
+
+arm_common_system_ss.add(when: 'TARGET_AARCH64', if_true: [
+ gen_a64_ldst, files('arm_emulate.c')
+])
+
+# Static library for unit testing (links emulation code + decodetree decoder)
+arm_emulate_test_lib = static_library('arm-emulate-test',
+ sources: [files('arm_emulate.c'), gen_a64_ldst],
+ dependencies: [qemuutil],
+ include_directories: include_directories('.'))
+
+arm_emulate_test = declare_dependency(
+ link_with: arm_emulate_test_lib,
+ include_directories: include_directories('.'))
diff --git a/target/arm/meson.build b/target/arm/meson.build
index 6e0e504..a4b2291 100644
--- a/target/arm/meson.build
+++ b/target/arm/meson.build
@@ -57,6 +57,7 @@ arm_common_system_ss.add(files(
'vfp_fpscr.c',
))
+subdir('emulate')
subdir('hvf')
subdir('whpx')
--
2.52.0
* [PATCH v2 2/3] tests: add unit tests for ISV=0 emulation library
2026-03-13 2:18 ` [PATCH v2 0/3] target/arm: ISV=0 data abort emulation library Lucas Amaral
2026-03-13 2:18 ` [PATCH v2 1/3] target/arm: add AArch64 ISV=0 instruction " Lucas Amaral
@ 2026-03-13 2:18 ` Lucas Amaral
2026-03-13 2:18 ` [PATCH v2 3/3] target/arm: wire ISV=0 emulation into HVF and WHPX Lucas Amaral
2026-03-15 3:41 ` [PATCH v3 0/6] target/arm: ISV=0 data abort emulation library Lucas Amaral
3 siblings, 0 replies; 25+ messages in thread
From: Lucas Amaral @ 2026-03-13 2:18 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-arm, agraf, Lucas Amaral
Add test-arm-emulate with 19 test cases that exercise the emulation
library through the arm_emul_ops callback interface, using a mock
register/memory environment. Instruction encodings and expected
values were checked against the Arm ARM (DDI 0487).
Signed-off-by: Lucas Amaral <lucaaamaral@gmail.com>
---
tests/unit/meson.build | 1 +
tests/unit/test-arm-emulate.c | 540 ++++++++++++++++++++++++++++++++++
2 files changed, 541 insertions(+)
create mode 100644 tests/unit/test-arm-emulate.c
diff --git a/tests/unit/meson.build b/tests/unit/meson.build
index 41e8b06..27a515b 100644
--- a/tests/unit/meson.build
+++ b/tests/unit/meson.build
@@ -157,6 +157,7 @@ if have_system
}
endif
tests += {'test-qdev': [qom, hwcore]}
+ tests += {'test-arm-emulate': [arm_emulate_test]}
endif
if have_ga and host_os == 'linux'
diff --git a/tests/unit/test-arm-emulate.c b/tests/unit/test-arm-emulate.c
new file mode 100644
index 0000000..5ab7f04
--- /dev/null
+++ b/tests/unit/test-arm-emulate.c
@@ -0,0 +1,540 @@
+/*
+ * Unit tests for AArch64 ISV=0 instruction emulation library
+ *
+ * Copyright (c) 2026 Lucas Amaral <lucaaamaral@gmail.com>
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#include "qemu/osdep.h"
+#include "arm_emulate.h"
+
+/* Mock environment: GPR, FPR, and flat memory */
+
+typedef struct MockEnv {
+ uint64_t gpr[32]; /* X0-X30, X31=SP */
+ uint8_t fpr[32][16]; /* V0-V31, 128 bits each */
+ uint8_t mem[0x1000]; /* 4 KiB flat address space */
+ bool mem_fail; /* if true, memory ops return -1 */
+} MockEnv;
+
+static MockEnv *env_from_cpu(CPUState *cpu)
+{
+ return (MockEnv *)cpu;
+}
+
+static uint64_t mock_read_gpr(CPUState *cpu, int reg)
+{
+ return env_from_cpu(cpu)->gpr[reg];
+}
+
+static void mock_write_gpr(CPUState *cpu, int reg, uint64_t val)
+{
+ env_from_cpu(cpu)->gpr[reg] = val;
+}
+
+static void mock_read_fpreg(CPUState *cpu, int reg, void *buf, int size)
+{
+ memcpy(buf, env_from_cpu(cpu)->fpr[reg], size);
+}
+
+static void mock_write_fpreg(CPUState *cpu, int reg, const void *buf, int size)
+{
+ MockEnv *env = env_from_cpu(cpu);
+ memset(env->fpr[reg], 0, 16);
+ memcpy(env->fpr[reg], buf, size);
+}
+
+static int mock_read_mem(CPUState *cpu, uint64_t va, void *buf, int size)
+{
+ MockEnv *env = env_from_cpu(cpu);
+ if (env->mem_fail || va + size > sizeof(env->mem)) {
+ return -1;
+ }
+ memcpy(buf, env->mem + va, size);
+ return 0;
+}
+
+static int mock_write_mem(CPUState *cpu, uint64_t va, const void *buf, int size)
+{
+ MockEnv *env = env_from_cpu(cpu);
+ if (env->mem_fail || va + size > sizeof(env->mem)) {
+ return -1;
+ }
+ memcpy(env->mem + va, buf, size);
+ return 0;
+}
+
+static const struct arm_emul_ops mock_ops = {
+ .read_gpr = mock_read_gpr,
+ .write_gpr = mock_write_gpr,
+ .read_fpreg = mock_read_fpreg,
+ .write_fpreg = mock_write_fpreg,
+ .read_mem = mock_read_mem,
+ .write_mem = mock_write_mem,
+};
+
+/* Helper: reset mock environment */
+static MockEnv *fresh_env(MockEnv *env)
+{
+ memset(env, 0, sizeof(*env));
+ return env;
+}
+
+/* Helper: call arm_emul_insn with the mock environment */
+static ArmEmulResult emul(MockEnv *env, uint32_t insn)
+{
+ return arm_emul_insn((CPUState *)env, &mock_ops, insn);
+}
+
+/* Helper: write a uint64_t to mock memory at a given VA */
+static void mem_write64(MockEnv *env, uint64_t va, uint64_t val)
+{
+ memcpy(env->mem + va, &val, 8);
+}
+
+/* Helper: read a uint64_t from mock memory */
+static uint64_t mem_read64(MockEnv *env, uint64_t va)
+{
+ uint64_t val = 0;
+ memcpy(&val, env->mem + va, 8);
+ return val;
+}
+
+/* Helper: read a uint32_t from mock memory */
+static uint32_t mem_read32(MockEnv *env, uint64_t va)
+{
+ uint32_t val = 0;
+ memcpy(&val, env->mem + va, 4);
+ return val;
+}
+
+/* STP / LDP (64-bit store/load pair, signed offset) */
+
+/*
+ * STP X0, X1, [X2]
+ * 10 101 0 010 0 0000000 00001 00010 00000
+ * = 0xA9000440
+ */
+static void test_stp_offset(void)
+{
+ MockEnv env;
+ fresh_env(&env);
+
+ env.gpr[0] = 0xDEADBEEF;
+ env.gpr[1] = 0xCAFEBABE;
+ env.gpr[2] = 0x100; /* base address */
+
+ g_assert_cmpint(emul(&env, 0xA9000440), ==, ARM_EMUL_OK);
+
+ g_assert_cmphex(mem_read64(&env, 0x100), ==, 0xDEADBEEF);
+ g_assert_cmphex(mem_read64(&env, 0x108), ==, 0xCAFEBABE);
+ /* No writeback — base unchanged */
+ g_assert_cmphex(env.gpr[2], ==, 0x100);
+}
+
+/*
+ * LDP X3, X4, [X5]
+ * 10 101 0 010 1 0000000 00100 00101 00011
+ * = 0xA94010A3
+ */
+static void test_ldp_offset(void)
+{
+ MockEnv env;
+ fresh_env(&env);
+
+ env.gpr[5] = 0x200;
+ mem_write64(&env, 0x200, 0x1111111111111111ULL);
+ mem_write64(&env, 0x208, 0x2222222222222222ULL);
+
+ g_assert_cmpint(emul(&env, 0xA94010A3), ==, ARM_EMUL_OK);
+
+ g_assert_cmphex(env.gpr[3], ==, 0x1111111111111111ULL);
+ g_assert_cmphex(env.gpr[4], ==, 0x2222222222222222ULL);
+}
+
+/* STP pre-indexed (writeback) */
+
+/*
+ * STP X0, X1, [X2, #16]!
+ * 10 101 0 011 0 0000010 00001 00010 00000
+ * = 0xA9810440
+ * imm7=+2, scaled by 8 = offset +16
+ */
+static void test_stp_preindex(void)
+{
+ MockEnv env;
+ fresh_env(&env);
+
+ env.gpr[0] = 0xAAAA;
+ env.gpr[1] = 0xBBBB;
+ env.gpr[2] = 0x100;
+
+ g_assert_cmpint(emul(&env, 0xA9810440), ==, ARM_EMUL_OK);
+
+ /* Data stored at base+16 = 0x110 */
+ g_assert_cmphex(mem_read64(&env, 0x110), ==, 0xAAAA);
+ g_assert_cmphex(mem_read64(&env, 0x118), ==, 0xBBBB);
+ /* Writeback: base updated to base+16 */
+ g_assert_cmphex(env.gpr[2], ==, 0x110);
+}
+
+/* STR / LDR unsigned offset (64-bit) */
+
+/*
+ * STR X0, [X1]
+ * 11 111 0 01 00 000000000000 00001 00000
+ * = 0xF9000020
+ */
+static void test_str_uoffset(void)
+{
+ MockEnv env;
+ fresh_env(&env);
+
+ env.gpr[0] = 0x42;
+ env.gpr[1] = 0x80;
+
+ g_assert_cmpint(emul(&env, 0xF9000020), ==, ARM_EMUL_OK);
+ g_assert_cmphex(mem_read64(&env, 0x80), ==, 0x42);
+}
+
+/*
+ * LDR X2, [X1]
+ * 11 111 0 01 01 000000000000 00001 00010
+ * = 0xF9400022
+ */
+static void test_ldr_uoffset(void)
+{
+ MockEnv env;
+ fresh_env(&env);
+
+ env.gpr[1] = 0x80;
+ mem_write64(&env, 0x80, 0xFEDCBA9876543210ULL);
+
+ g_assert_cmpint(emul(&env, 0xF9400022), ==, ARM_EMUL_OK);
+ g_assert_cmphex(env.gpr[2], ==, 0xFEDCBA9876543210ULL);
+}
+
+/* LDRB zero-extend / LDRSB sign-extend */
+
+/*
+ * LDRB W2, [X1] (zero-extend byte to 32-bit)
+ * 00 111 0 01 01 000000000000 00001 00010
+ * = 0x39400022
+ */
+static void test_ldrb_zero_extend(void)
+{
+ MockEnv env;
+ fresh_env(&env);
+
+ env.gpr[1] = 0x80;
+ env.mem[0x80] = 0xFF;
+
+ g_assert_cmpint(emul(&env, 0x39400022), ==, ARM_EMUL_OK);
+ g_assert_cmphex(env.gpr[2], ==, 0xFF);
+}
+
+/*
+ * LDRSB X2, [X1] (sign-extend byte to 64-bit)
+ * 00 111 0 01 10 000000000000 00001 00010
+ * = 0x39800022
+ */
+static void test_ldrsb_sign_extend(void)
+{
+ MockEnv env;
+ fresh_env(&env);
+
+ env.gpr[1] = 0x80;
+ env.mem[0x80] = 0x80; /* -128 as signed byte */
+
+ g_assert_cmpint(emul(&env, 0x39800022), ==, ARM_EMUL_OK);
+ g_assert_cmphex(env.gpr[2], ==, 0xFFFFFFFFFFFFFF80ULL);
+}
+
+/* XZR -- register 31 reads as zero for GPR data */
+
+/*
+ * STR XZR, [X1] (store zero register)
+ * 11 111 0 01 00 000000000000 00001 11111
+ * = 0xF900003F
+ */
+static void test_xzr_reads_zero(void)
+{
+ MockEnv env;
+ fresh_env(&env);
+
+ env.gpr[31] = 0x9999; /* SP value — should NOT be stored */
+ env.gpr[1] = 0x80;
+ mem_write64(&env, 0x80, 0xFFFF); /* pre-fill to verify overwrite */
+
+ g_assert_cmpint(emul(&env, 0xF900003F), ==, ARM_EMUL_OK);
+ /* XZR is zero, not SP */
+ g_assert_cmphex(mem_read64(&env, 0x80), ==, 0);
+}
+
+/* Atomic -- LDADD */
+
+/*
+ * LDADD X0, X1, [X2]
+ * 11 111 0 00 00 1 00000 0000 00 00010 00001
+ * = 0xF8200041
+ * sz=3, a=0, r=0, rs=0, opc=0000 (ADD), rn=2, rt=1
+ */
+static void test_ldadd(void)
+{
+ MockEnv env;
+ fresh_env(&env);
+
+ env.gpr[0] = 5; /* operand (Rs) */
+ env.gpr[2] = 0x100; /* base address */
+ mem_write64(&env, 0x100, 10); /* old value */
+
+ g_assert_cmpint(emul(&env, 0xF8200041), ==, ARM_EMUL_OK);
+
+ /* Rt gets old value */
+ g_assert_cmphex(env.gpr[1], ==, 10);
+ /* Memory gets old + operand */
+ g_assert_cmphex(mem_read64(&env, 0x100), ==, 15);
+}
+
+/* SWP */
+
+/*
+ * SWP X0, X1, [X2]
+ * 11 111 0 00 00 1 00000 1000 00 00010 00001
+ * = 0xF8208041
+ */
+static void test_swp(void)
+{
+ MockEnv env;
+ fresh_env(&env);
+
+ env.gpr[0] = 42; /* new value (Rs) */
+ env.gpr[2] = 0x100; /* base address */
+ mem_write64(&env, 0x100, 99); /* old value */
+
+ g_assert_cmpint(emul(&env, 0xF8208041), ==, ARM_EMUL_OK);
+
+ /* Rt gets old value */
+ g_assert_cmphex(env.gpr[1], ==, 99);
+ /* Memory gets new value */
+ g_assert_cmphex(mem_read64(&env, 0x100), ==, 42);
+}
+
+/* CAS (compare-and-swap, 64-bit) */
+
+/*
+ * CAS X0, X2, [X4]
+ * 11 001000 1 0 1 00000 0 11111 00100 00010
+ * = 0xC8A07C82
+ * sz=3, a=0, r=0, rs=0 (compare), rt=2 (new), rn=4 (base)
+ */
+static void test_cas_match(void)
+{
+ MockEnv env;
+ fresh_env(&env);
+
+ env.gpr[0] = 100; /* Rs: compare value */
+ env.gpr[2] = 200; /* Rt: new value */
+ env.gpr[4] = 0x100; /* Rn: base address */
+ mem_write64(&env, 0x100, 100); /* memory == compare, swap occurs */
+
+ g_assert_cmpint(emul(&env, 0xC8A07C82), ==, ARM_EMUL_OK);
+
+ /* Rs gets old memory value */
+ g_assert_cmphex(env.gpr[0], ==, 100);
+ /* Memory updated to new value */
+ g_assert_cmphex(mem_read64(&env, 0x100), ==, 200);
+}
+
+/* CAS with no match — memory unchanged */
+static void test_cas_nomatch(void)
+{
+ MockEnv env;
+ fresh_env(&env);
+
+ env.gpr[0] = 100; /* compare */
+ env.gpr[2] = 200; /* new */
+ env.gpr[4] = 0x100;
+ mem_write64(&env, 0x100, 999); /* memory != compare, no swap */
+
+ g_assert_cmpint(emul(&env, 0xC8A07C82), ==, ARM_EMUL_OK);
+
+ g_assert_cmphex(env.gpr[0], ==, 999); /* Rs gets old value */
+ g_assert_cmphex(mem_read64(&env, 0x100), ==, 999); /* unchanged */
+}
+
+/* CASP (pair compare-and-swap) */
+
+/*
+ * CASP X0, X1, X2, X3, [X4] (64-bit pair)
+ * 01 001000 0 0 1 00000 0 11111 00100 00010
+ * = 0x48207C82
+ */
+static void test_casp_match(void)
+{
+ MockEnv env;
+ fresh_env(&env);
+
+ env.gpr[0] = 0xAA; env.gpr[1] = 0xBB; /* Rs pair: compare */
+ env.gpr[2] = 0xCC; env.gpr[3] = 0xDD; /* Rt pair: new */
+ env.gpr[4] = 0x100;
+ mem_write64(&env, 0x100, 0xAA);
+ mem_write64(&env, 0x108, 0xBB);
+
+ g_assert_cmpint(emul(&env, 0x48207C82), ==, ARM_EMUL_OK);
+
+ g_assert_cmphex(mem_read64(&env, 0x100), ==, 0xCC);
+ g_assert_cmphex(mem_read64(&env, 0x108), ==, 0xDD);
+}
+
+/* CASP with odd register -- validation rejects */
+
+/*
+ * CASP with Rs=X1 (odd) -- trans_CASP returns false
+ * 01 001000 0 0 1 00001 0 11111 00100 00010
+ * = 0x48217C82
+ */
+static void test_casp_odd_register(void)
+{
+ MockEnv env;
+ fresh_env(&env);
+
+ env.gpr[4] = 0x100;
+
+ /* Odd Rs -- decoder returns false -- ARM_EMUL_UNHANDLED */
+ g_assert_cmpint(emul(&env, 0x48217C82), ==, ARM_EMUL_UNHANDLED);
+}
+
+/* Unrecognized instruction -- ARM_EMUL_UNHANDLED */
+
+static void test_unhandled(void)
+{
+ MockEnv env;
+ fresh_env(&env);
+
+ /* B #0 = 0x14000000 — a branch, not a load/store */
+ g_assert_cmpint(emul(&env, 0x14000000), ==, ARM_EMUL_UNHANDLED);
+}
+
+/* Memory error -- ARM_EMUL_ERR_MEM */
+
+static void test_mem_error(void)
+{
+ MockEnv env;
+ fresh_env(&env);
+
+ env.gpr[1] = 0x80;
+ env.mem_fail = true;
+
+ /* LDR X2, [X1] — memory read will fail */
+ g_assert_cmpint(emul(&env, 0xF9400022), ==, ARM_EMUL_ERR_MEM);
+}
+
+/* PRFM (prefetch) -- NOP, returns OK */
+
+/*
+ * PRFM #0, [X1] (unsigned offset, imm=0)
+ * 11 111 0 01 10 000000000000 00001 00000
+ * = 0xF9800020
+ */
+static void test_prfm_nop(void)
+{
+ MockEnv env;
+ fresh_env(&env);
+
+ env.gpr[1] = 0x80;
+
+ g_assert_cmpint(emul(&env, 0xF9800020), ==, ARM_EMUL_OK);
+ /* No state change — it's a NOP */
+}
+
+/* SIMD/FP store/load pair */
+
+/*
+ * STP S0, S1, [X2] (32-bit FP pair, signed offset, imm=0)
+ * 00 101 1 010 0 0000000 00001 00010 00000
+ * = 0x2D000440
+ */
+static void test_stp_fp(void)
+{
+ MockEnv env;
+ fresh_env(&env);
+
+ uint32_t f0 = 0x3F800000; /* 1.0f */
+ uint32_t f1 = 0x40000000; /* 2.0f */
+ memcpy(env.fpr[0], &f0, 4);
+ memcpy(env.fpr[1], &f1, 4);
+ env.gpr[2] = 0x200;
+
+ g_assert_cmpint(emul(&env, 0x2D000440), ==, ARM_EMUL_OK);
+
+ g_assert_cmphex(mem_read32(&env, 0x200), ==, 0x3F800000);
+ g_assert_cmphex(mem_read32(&env, 0x204), ==, 0x40000000);
+}
+
+/* LDR post-indexed (writeback after load) */
+
+/*
+ * LDR X3, [X1], #8 (post-indexed, imm=+8)
+ * 11 111 0 00 01 0 000001000 01 00001 00011
+ * = 0xF8408423
+ * sz=3, opc=01, imm9=+8, type=01 (post-index), rn=1, rt=3
+ */
+static void test_ldr_postindex(void)
+{
+ MockEnv env;
+ fresh_env(&env);
+
+ env.gpr[1] = 0x100;
+ mem_write64(&env, 0x100, 0x55AA55AA55AA55AAULL);
+
+ g_assert_cmpint(emul(&env, 0xF8408423), ==, ARM_EMUL_OK);
+
+ /* Load from original base */
+ g_assert_cmphex(env.gpr[3], ==, 0x55AA55AA55AA55AAULL);
+ /* Writeback: base += 8 */
+ g_assert_cmphex(env.gpr[1], ==, 0x108);
+}
+
+/* Entry point */
+
+int main(int argc, char **argv)
+{
+ g_test_init(&argc, &argv, NULL);
+
+ /* Load/store pair */
+ g_test_add_func("/arm-emulate/stp-offset", test_stp_offset);
+ g_test_add_func("/arm-emulate/ldp-offset", test_ldp_offset);
+ g_test_add_func("/arm-emulate/stp-preindex", test_stp_preindex);
+ g_test_add_func("/arm-emulate/stp-fp", test_stp_fp);
+
+ /* Load/store single */
+ g_test_add_func("/arm-emulate/str-uoffset", test_str_uoffset);
+ g_test_add_func("/arm-emulate/ldr-uoffset", test_ldr_uoffset);
+ g_test_add_func("/arm-emulate/ldr-postindex", test_ldr_postindex);
+ g_test_add_func("/arm-emulate/ldrb-zero-extend", test_ldrb_zero_extend);
+ g_test_add_func("/arm-emulate/ldrsb-sign-extend", test_ldrsb_sign_extend);
+
+ /* XZR */
+ g_test_add_func("/arm-emulate/xzr-reads-zero", test_xzr_reads_zero);
+
+ /* Atomics */
+ g_test_add_func("/arm-emulate/ldadd", test_ldadd);
+ g_test_add_func("/arm-emulate/swp", test_swp);
+
+ /* Compare-and-swap */
+ g_test_add_func("/arm-emulate/cas-match", test_cas_match);
+ g_test_add_func("/arm-emulate/cas-nomatch", test_cas_nomatch);
+ g_test_add_func("/arm-emulate/casp-match", test_casp_match);
+ g_test_add_func("/arm-emulate/casp-odd-register", test_casp_odd_register);
+
+ /* NOP */
+ g_test_add_func("/arm-emulate/prfm-nop", test_prfm_nop);
+
+ /* Error handling */
+ g_test_add_func("/arm-emulate/unhandled", test_unhandled);
+ g_test_add_func("/arm-emulate/mem-error", test_mem_error);
+
+ return g_test_run();
+}
--
2.52.0
* [PATCH v2 3/3] target/arm: wire ISV=0 emulation into HVF and WHPX
2026-03-13 2:18 ` [PATCH v2 0/3] target/arm: ISV=0 data abort emulation library Lucas Amaral
2026-03-13 2:18 ` [PATCH v2 1/3] target/arm: add AArch64 ISV=0 instruction " Lucas Amaral
2026-03-13 2:18 ` [PATCH v2 2/3] tests: add unit tests for ISV=0 " Lucas Amaral
@ 2026-03-13 2:18 ` Lucas Amaral
2026-03-15 3:41 ` [PATCH v3 0/6] target/arm: ISV=0 data abort emulation library Lucas Amaral
3 siblings, 0 replies; 25+ messages in thread
From: Lucas Amaral @ 2026-03-13 2:18 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-arm, agraf, Lucas Amaral
Connect the ISV=0 emulation library to the HVF and WHPX backends.
Each implements arm_emul_ops callbacks over CPUARMState and
cpu_memory_rw_debug(). This replaces the assert(isv) with instruction
fetch, decode, and emulation via arm_emul_insn().
Signed-off-by: Lucas Amaral <lucaaamaral@gmail.com>
---
target/arm/hvf/hvf.c | 94 ++++++++++++++++++++++++++++++++++++--
target/arm/whpx/whpx-all.c | 86 +++++++++++++++++++++++++++++++++-
2 files changed, 176 insertions(+), 4 deletions(-)
diff --git a/target/arm/hvf/hvf.c b/target/arm/hvf/hvf.c
index d79469c..2a57b97 100644
--- a/target/arm/hvf/hvf.c
+++ b/target/arm/hvf/hvf.c
@@ -30,6 +30,7 @@
#include "qemu/main-loop.h"
#include "system/cpus.h"
#include "arm-powerctl.h"
+#include "emulate/arm_emulate.h"
#include "target/arm/cpu.h"
#include "target/arm/internals.h"
#include "target/arm/multiprocessing.h"
@@ -797,6 +798,59 @@ static uint64_t hvf_get_reg(CPUState *cpu, int rt)
return val;
}
+/*
+ * arm_emul_ops callbacks for HVF
+ *
+ * State must already be synchronized (cpu_synchronize_state) before
+ * calling arm_emul_insn(). Reads/writes env->xregs[] directly to
+ * correctly handle register 31 as SP and avoid redundant HVF API calls.
+ */
+
+static uint64_t hvf_emul_read_gpr(CPUState *cpu, int reg)
+{
+ return ARM_CPU(cpu)->env.xregs[reg];
+}
+
+static void hvf_emul_write_gpr(CPUState *cpu, int reg, uint64_t val)
+{
+ ARM_CPU(cpu)->env.xregs[reg] = val;
+ cpu->vcpu_dirty = true;
+}
+
+static void hvf_emul_read_fpreg(CPUState *cpu, int reg, void *buf, int size)
+{
+ memcpy(buf, &ARM_CPU(cpu)->env.vfp.zregs[reg], size);
+}
+
+static void hvf_emul_write_fpreg(CPUState *cpu, int reg,
+ const void *buf, int size)
+{
+ CPUARMState *env = &ARM_CPU(cpu)->env;
+ memset(&env->vfp.zregs[reg], 0, sizeof(env->vfp.zregs[reg]));
+ memcpy(&env->vfp.zregs[reg], buf, size);
+ cpu->vcpu_dirty = true;
+}
+
+static int hvf_emul_read_mem(CPUState *cpu, uint64_t va, void *buf, int size)
+{
+ return cpu_memory_rw_debug(cpu, va, buf, size, false);
+}
+
+static int hvf_emul_write_mem(CPUState *cpu, uint64_t va,
+ const void *buf, int size)
+{
+ return cpu_memory_rw_debug(cpu, va, (void *)buf, size, true);
+}
+
+static const struct arm_emul_ops hvf_arm_emul_ops = {
+ .read_gpr = hvf_emul_read_gpr,
+ .write_gpr = hvf_emul_write_gpr,
+ .read_fpreg = hvf_emul_read_fpreg,
+ .write_fpreg = hvf_emul_write_fpreg,
+ .read_mem = hvf_emul_read_mem,
+ .write_mem = hvf_emul_write_mem,
+};
+
static void clamp_id_aa64mmfr0_parange_to_ipa_size(ARMISARegisters *isar)
{
uint32_t ipa_size = chosen_ipa_bit_size ?
@@ -1871,10 +1925,44 @@ static int hvf_handle_exception(CPUState *cpu, hv_vcpu_exit_exception_t *excp)
assert(!s1ptw);
/*
- * TODO: ISV will be 0 for SIMD or SVE accesses.
- * Inject the exception into the guest.
+ * ISV=0: syndrome doesn't carry access size/register info.
+ * Fetch and emulate via target/arm/emulate/.
+ * Unhandled instructions log an error and advance PC.
*/
- assert(isv);
+ if (!isv) {
+ ARMCPU *arm_cpu = ARM_CPU(cpu);
+ CPUARMState *env = &arm_cpu->env;
+ uint32_t insn;
+ ArmEmulResult r;
+
+ cpu_synchronize_state(cpu);
+
+ if (cpu_memory_rw_debug(cpu, env->pc,
+ (uint8_t *)&insn, 4, false) != 0) {
+ error_report("HVF: cannot read insn at pc=0x%" PRIx64,
+ (uint64_t)env->pc);
+ advance_pc = true;
+ break;
+ }
+
+ r = arm_emul_insn(cpu, &hvf_arm_emul_ops, insn);
+ if (r == ARM_EMUL_UNHANDLED) {
+ /*
+ * TODO: Inject data abort into guest instead of
+ * advancing PC. Requires setting ESR_EL1/FAR_EL1/
+ * ELR_EL1/SPSR_EL1 and redirecting to VBAR_EL1.
+ */
+ error_report("HVF: ISV=0 unhandled insn 0x%08x at "
+ "pc=0x%" PRIx64, insn, (uint64_t)env->pc);
+ } else if (r == ARM_EMUL_ERR_MEM) {
+ error_report("HVF: ISV=0 memory error emulating "
+ "insn 0x%08x at pc=0x%" PRIx64,
+ insn, (uint64_t)env->pc);
+ }
+
+ advance_pc = true;
+ break;
+ }
/*
* Emulate MMIO.
diff --git a/target/arm/whpx/whpx-all.c b/target/arm/whpx/whpx-all.c
index 40ada2d..c57abef 100644
--- a/target/arm/whpx/whpx-all.c
+++ b/target/arm/whpx/whpx-all.c
@@ -37,6 +37,7 @@
#include "whpx_arm.h"
#include "hw/arm/bsa.h"
#include "arm-powerctl.h"
+#include "emulate/arm_emulate.h"
#include <winhvplatform.h>
#include <winhvplatformdefs.h>
@@ -377,6 +378,53 @@ static void whpx_set_gp_reg(CPUState *cpu, int rt, uint64_t val)
whpx_set_reg(cpu, reg, reg_val);
}
+/* arm_emul_ops callbacks for WHPX */
+
+static uint64_t whpx_emul_read_gpr(CPUState *cpu, int reg)
+{
+ return ARM_CPU(cpu)->env.xregs[reg];
+}
+
+static void whpx_emul_write_gpr(CPUState *cpu, int reg, uint64_t val)
+{
+ ARM_CPU(cpu)->env.xregs[reg] = val;
+ cpu->vcpu_dirty = true;
+}
+
+static void whpx_emul_read_fpreg(CPUState *cpu, int reg, void *buf, int size)
+{
+ memcpy(buf, &ARM_CPU(cpu)->env.vfp.zregs[reg], size);
+}
+
+static void whpx_emul_write_fpreg(CPUState *cpu, int reg,
+ const void *buf, int size)
+{
+ CPUARMState *env = &ARM_CPU(cpu)->env;
+ memset(&env->vfp.zregs[reg], 0, sizeof(env->vfp.zregs[reg]));
+ memcpy(&env->vfp.zregs[reg], buf, size);
+ cpu->vcpu_dirty = true;
+}
+
+static int whpx_emul_read_mem(CPUState *cpu, uint64_t va, void *buf, int size)
+{
+ return cpu_memory_rw_debug(cpu, va, buf, size, false);
+}
+
+static int whpx_emul_write_mem(CPUState *cpu, uint64_t va,
+ const void *buf, int size)
+{
+ return cpu_memory_rw_debug(cpu, va, (void *)buf, size, true);
+}
+
+static const struct arm_emul_ops whpx_arm_emul_ops = {
+ .read_gpr = whpx_emul_read_gpr,
+ .write_gpr = whpx_emul_write_gpr,
+ .read_fpreg = whpx_emul_read_fpreg,
+ .write_fpreg = whpx_emul_write_fpreg,
+ .read_mem = whpx_emul_read_mem,
+ .write_mem = whpx_emul_write_mem,
+};
+
static int whpx_handle_mmio(CPUState *cpu, WHV_MEMORY_ACCESS_CONTEXT *ctx)
{
uint64_t syndrome = ctx->Syndrome;
@@ -391,7 +439,43 @@ static int whpx_handle_mmio(CPUState *cpu, WHV_MEMORY_ACCESS_CONTEXT *ctx)
uint64_t val = 0;
assert(!cm);
- assert(isv);
+
+ /*
+ * ISV=0: syndrome doesn't carry access size/register info.
+ * Fetch and decode the faulting instruction via the emulation library.
+ */
+ if (!isv) {
+ ARMCPU *arm_cpu = ARM_CPU(cpu);
+ CPUARMState *env = &arm_cpu->env;
+ uint32_t insn;
+ ArmEmulResult r;
+
+ cpu_synchronize_state(cpu);
+
+ if (cpu_memory_rw_debug(cpu, env->pc,
+ (uint8_t *)&insn, 4, false) != 0) {
+ error_report("WHPX: cannot read insn at pc=0x%" PRIx64,
+ (uint64_t)env->pc);
+ return 0;
+ }
+
+ r = arm_emul_insn(cpu, &whpx_arm_emul_ops, insn);
+ if (r == ARM_EMUL_UNHANDLED) {
+ /*
+ * TODO: Inject data abort into guest instead of
+ * advancing PC. Requires setting ESR_EL1/FAR_EL1/
+ * ELR_EL1/SPSR_EL1 and redirecting to VBAR_EL1.
+ */
+ error_report("WHPX: ISV=0 unhandled insn 0x%08x at "
+ "pc=0x%" PRIx64, insn, (uint64_t)env->pc);
+ } else if (r == ARM_EMUL_ERR_MEM) {
+ error_report("WHPX: ISV=0 memory error emulating "
+ "insn 0x%08x at pc=0x%" PRIx64,
+ insn, (uint64_t)env->pc);
+ }
+
+ return 0;
+ }
if (iswrite) {
val = whpx_get_gp_reg(cpu, srt);
--
2.52.0
* Re: [PATCH v2 1/3] target/arm: add AArch64 ISV=0 instruction emulation library
2026-03-13 2:18 ` [PATCH v2 1/3] target/arm: add AArch64 ISV=0 instruction " Lucas Amaral
@ 2026-03-13 6:33 ` Mohamed Mediouni
2026-03-13 8:59 ` Peter Maydell
1 sibling, 0 replies; 25+ messages in thread
From: Mohamed Mediouni @ 2026-03-13 6:33 UTC (permalink / raw)
To: Lucas Amaral; +Cc: qemu-devel, qemu-arm, agraf
> On 13. Mar 2026, at 03:18, Lucas Amaral <lucaaamaral@gmail.com> wrote:
>
> Add a shared emulation library in target/arm/emulate/ using a
> decodetree decoder (a64-ldst.decode) and a callback-based interface
> (struct arm_emul_ops) that any hypervisor backend can implement.
>
> The hypervisor cannot emulate ISV=0 data aborts without decoding the
> faulting instruction, since the ESR syndrome does not carry the access
> size or target register.
>
> Signed-off-by: Lucas Amaral <lucaaamaral@gmail.com>
[…]
> +/**
> + * struct arm_emul_ops - hypervisor register/memory callbacks
> + *
> + * GPR reg 31 = SP (the XZR/SP distinction is handled internally).
> + * Memory callbacks use guest virtual addresses.
> + */
> +struct arm_emul_ops {
> + uint64_t (*read_gpr)(CPUState *cpu, int reg);
> + void (*write_gpr)(CPUState *cpu, int reg, uint64_t val);
> +
> + /* @size: access width in bytes (4, 8, or 16) */
> + void (*read_fpreg)(CPUState *cpu, int reg, void *buf, int size);
> + void (*write_fpreg)(CPUState *cpu, int reg, const void *buf, int size);
Hello,
This can be good to have, but you should have a default implementation using CPUState in an
arm_helpers file, to not duplicate the callbacks across each backend, and then do
if (ctx->ops->read_gpr) { use override } else { use default }.
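A minimal sketch of the suggested fallback pattern (ToyCPUState, ToyEmulOps, and the
function names here are illustrative stand-ins, not QEMU's actual types):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative stand-ins for CPUState and the ops struct. */
typedef struct { uint64_t xregs[32]; } ToyCPUState;

typedef struct {
    /* NULL means "use the shared default implementation". */
    uint64_t (*read_gpr)(ToyCPUState *cpu, int reg);
} ToyEmulOps;

/* Shared default that reads the common register file directly. */
static uint64_t default_read_gpr(ToyCPUState *cpu, int reg)
{
    return cpu->xregs[reg];
}

/* Dispatch helper: backend override if present, else the default. */
static uint64_t emul_read_gpr(ToyCPUState *cpu, const ToyEmulOps *ops, int reg)
{
    if (ops && ops->read_gpr) {
        return ops->read_gpr(cpu, reg);
    }
    return default_read_gpr(cpu, reg);
}
```

With this shape a backend only fills in the callbacks whose semantics actually
differ from reading CPUState directly.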
> +
> + /* Returns 0 on success, non-zero on failure */
> + int (*read_mem)(CPUState *cpu, uint64_t va, void *buf, int size);
> + int (*write_mem)(CPUState *cpu, uint64_t va, const void *buf, int size);
> +};
A memory access - especially one that will be emulated - can span multiple (physical) pages under
the hood. If everything is mapped you're fine, but that relies on a bit of luck, especially
as the AArch64 glibc does unaligned accesses in memcpy.
On the x86 side of things, I was able to run Windows (NT), Linux, and Win9x without handling
such a fault case, but not Haiku (and the Hurd needs more complexity than I even handle yet for x86).
And there are memory-to-memory instructions on the way (FEAT_MOPS) where that's even more likely to happen.
The downside of read_mem/write_mem is that even if you return a fault code, you don't know which of the
two pages (or potentially more, for memory-to-memory instructions) raised the fault.
Because of that, I made a design change to an mmu_gva_to_gpa callback and no longer have read/write
ops like these (see x86_write_mem_ex/x86_read_mem_ex in target/i386/emulate/x86_mmu.c).
Maybe you could keep a read_mem/write_mem matching those two on top of mmu_gva_to_gpa for your unit tests. Or run those
in a guest context as kvm-unit-tests does.
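A rough sketch of the per-page translation idea (the identity mmu_gva_to_gpa, the flat
phys buffer, and all names are illustrative assumptions, not the actual x86_mmu.c code):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_PAGE_SIZE    0x1000ULL
#define TOY_MAPPED_LIMIT 0x2000ULL

/* Toy translation: identity mapping below a limit, fault above it.
 * A real mmu_gva_to_gpa would walk the stage-1 page tables. */
static int mmu_gva_to_gpa(uint64_t va, uint64_t *gpa)
{
    if (va >= TOY_MAPPED_LIMIT) {
        return -1;
    }
    *gpa = va;
    return 0;
}

/* Split the access at page boundaries so each page is translated on
 * its own; on failure, report exactly which virtual address faulted. */
static int read_mem_ex(const uint8_t *phys, uint64_t va, void *buf,
                       size_t size, uint64_t *fault_va)
{
    uint8_t *dst = buf;
    while (size > 0) {
        uint64_t gpa;
        size_t chunk = TOY_PAGE_SIZE - (va & (TOY_PAGE_SIZE - 1));
        if (chunk > size) {
            chunk = size;
        }
        if (mmu_gva_to_gpa(va, &gpa) != 0) {
            *fault_va = va;
            return -1;
        }
        memcpy(dst, phys + gpa, chunk);
        va += chunk;
        dst += chunk;
        size -= chunk;
    }
    return 0;
}
```

Because each page is translated separately, a fault reports the exact faulting
address rather than just "the access failed", which is what a monolithic
read_mem/write_mem callback loses.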
Thank you,
> +
> +/**
> + * arm_emul_insn - decode and emulate one AArch64 instruction
> + *
> + * Caller must synchronize CPU state and fetch @insn before calling.
> + */
> +ArmEmulResult arm_emul_insn(CPUState *cpu, const struct arm_emul_ops *ops,
> + uint32_t insn);
> +
> +#endif /* ARM_EMULATE_H */
> diff --git a/target/arm/emulate/meson.build b/target/arm/emulate/meson.build
> new file mode 100644
> index 0000000..29b7879
> --- /dev/null
> +++ b/target/arm/emulate/meson.build
> @@ -0,0 +1,16 @@
> +gen_a64_ldst = decodetree.process('a64-ldst.decode',
> + extra_args: ['--static-decode=decode_a64_ldst'])
> +
> +arm_common_system_ss.add(when: 'TARGET_AARCH64', if_true: [
> + gen_a64_ldst, files('arm_emulate.c')
> +])
> +
> +# Static library for unit testing (links emulation code + decodetree decoder)
> +arm_emulate_test_lib = static_library('arm-emulate-test',
> + sources: [files('arm_emulate.c'), gen_a64_ldst],
> + dependencies: [qemuutil],
> + include_directories: include_directories('.'))
> +
> +arm_emulate_test = declare_dependency(
> + link_with: arm_emulate_test_lib,
> + include_directories: include_directories('.'))
> diff --git a/target/arm/meson.build b/target/arm/meson.build
> index 6e0e504..a4b2291 100644
> --- a/target/arm/meson.build
> +++ b/target/arm/meson.build
> @@ -57,6 +57,7 @@ arm_common_system_ss.add(files(
> 'vfp_fpscr.c',
> ))
>
> +subdir('emulate')
> subdir('hvf')
> subdir('whpx')
>
> --
> 2.52.0
>
>
* Re: [PATCH v2 1/3] target/arm: add AArch64 ISV=0 instruction emulation library
2026-03-13 2:18 ` [PATCH v2 1/3] target/arm: add AArch64 ISV=0 instruction " Lucas Amaral
2026-03-13 6:33 ` Mohamed Mediouni
@ 2026-03-13 8:59 ` Peter Maydell
1 sibling, 0 replies; 25+ messages in thread
From: Peter Maydell @ 2026-03-13 8:59 UTC (permalink / raw)
To: Lucas Amaral; +Cc: qemu-devel, qemu-arm, agraf
On Fri, 13 Mar 2026 at 02:19, Lucas Amaral <lucaaamaral@gmail.com> wrote:
>
> Add a shared emulation library in target/arm/emulate/ using a
> decodetree decoder (a64-ldst.decode) and a callback-based interface
> (struct arm_emul_ops) that any hypervisor backend can implement.
>
> The hypervisor cannot emulate ISV=0 data aborts without decoding the
> faulting instruction, since the ESR syndrome does not carry the access
> size or target register.
>
> Signed-off-by: Lucas Amaral <lucaaamaral@gmail.com>
> ---
> target/arm/emulate/a64-ldst.decode | 293 ++++++++++++
> target/arm/emulate/arm_emulate.c | 738 +++++++++++++++++++++++++++++
> target/arm/emulate/arm_emulate.h | 55 +++
> target/arm/emulate/meson.build | 16 +
> target/arm/meson.build | 1 +
> 5 files changed, 1103 insertions(+)
This is a huge patch, please can you split it into more easily
reviewable chunks? Something like "basic framework", then
add the instructions in multiple patches that each cover
one coherent group of insns.
Are there any places where your decodetree file patterns
differ from the tcg ones? If so, that's fine, but please note
them in the relevant commit messages for convenience of review.
thanks
-- PMM
* [PATCH v3 0/6] target/arm: ISV=0 data abort emulation library
2026-03-13 2:18 ` [PATCH v2 0/3] target/arm: ISV=0 data abort emulation library Lucas Amaral
` (2 preceding siblings ...)
2026-03-13 2:18 ` [PATCH v2 3/3] target/arm: wire ISV=0 emulation into HVF and WHPX Lucas Amaral
@ 2026-03-15 3:41 ` Lucas Amaral
2026-03-15 3:41 ` [PATCH v3 1/6] target/arm/emulate: add ISV=0 emulation library with load/store immediate Lucas Amaral
` (6 more replies)
3 siblings, 7 replies; 25+ messages in thread
From: Lucas Amaral @ 2026-03-15 3:41 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-arm, agraf, Lucas Amaral
Add a shared emulation library for AArch64 load/store instructions that
cause ISV=0 data aborts under hardware virtualization, and wire it into
HVF (macOS) and WHPX (Windows).
When the Instruction Syndrome Valid bit is clear, the hypervisor cannot
determine the faulting instruction's target register or access size from
the syndrome alone. This previously hit an assert(isv) and killed the
VM. The library fetches and decodes the faulting instruction using a
decodetree-generated decoder, then emulates it directly against the vCPU
register file and memory.
As suggested in v1 review, the library uses its own a64-ldst.decode
rather than sharing target/arm/tcg/a64.decode. Beyond the practical
complexity noted in review, the two have incompatible purposes: TCG's
trans_* functions are a compiler — they emit IR ops into a translation
block for later execution. This library's trans_* functions are an
interpreter — they execute directly against the vCPU register file and
memory. The decodetree-generated dispatcher calls trans_* by name, so
both cannot coexist in the same translation unit. Decode patterns are
kept consistent with TCG's where possible.
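The compiler/interpreter distinction can be sketched roughly as follows; the ToyCPU
type, the argument struct, and trans_STR_imm are illustrative inventions, not the
decodetree-generated names. Where a TCG trans_* emits store ops into a translation
block, an interpreter-style trans_* performs the store immediately against live state:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Toy vCPU state: register file plus a small flat memory. */
typedef struct {
    uint64_t xregs[32];
    uint8_t mem[256];
} ToyCPU;

/* Illustrative decodetree-style arguments for "STR Xt, [Xn, #imm]". */
typedef struct { int rt, rn, imm; } arg_str_imm;

/* Interpreter-style trans_*: executes the store right now, against
 * live state, instead of emitting IR for later execution. */
static bool trans_STR_imm(ToyCPU *cpu, const arg_str_imm *a)
{
    uint64_t va = cpu->xregs[a->rn] + a->imm;
    if (va + 8 > sizeof(cpu->mem)) {
        return false;  /* outside the toy address space */
    }
    memcpy(cpu->mem + va, &cpu->xregs[a->rt], 8);
    return true;
}
```

Both styles share the decode patterns, but only one set of trans_* symbols can
exist per translation unit, hence the separate decode file.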
This series wires the library into HVF (macOS) and WHPX (Windows). KVM
on ARM already handles ISV=0 data aborts in-kernel via
kvm_arm_handle_dabt_nisv(), but could use this library as a userspace
fallback in the future.
Changes since v2:
- Split monolithic patch into 6 incremental patches: framework, then
one patch per coherent instruction group (Peter)
- Removed per-backend callback ops; library uses CPUArchState directly
with cpu_memory_rw_debug() for memory access (Mohamed)
- Removed mock unit tests (Mohamed; kvm-unit-tests is the right
vehicle for decoder validation)
- Added architectural justification for separate decode file
Lucas Amaral (6):
target/arm/emulate: add ISV=0 emulation library with load/store
immediate
target/arm/emulate: add load/store register offset
target/arm/emulate: add load/store pair
target/arm/emulate: add load/store exclusive
target/arm/emulate: add atomic, compare-and-swap, and PAC load
target/arm/hvf,whpx: wire ISV=0 emulation for data aborts
target/arm/emulate/a64-ldst.decode | 293 +++++++++++
target/arm/emulate/arm_emulate.c | 747 +++++++++++++++++++++++++++++
target/arm/emulate/arm_emulate.h | 30 ++
target/arm/emulate/meson.build | 6 +
target/arm/hvf/hvf.c | 41 +-
target/arm/meson.build | 1 +
target/arm/whpx/whpx-all.c | 39 +-
7 files changed, 1153 insertions(+), 4 deletions(-)
create mode 100644 target/arm/emulate/a64-ldst.decode
create mode 100644 target/arm/emulate/arm_emulate.c
create mode 100644 target/arm/emulate/arm_emulate.h
create mode 100644 target/arm/emulate/meson.build
--
2.52.0
* [PATCH v3 1/6] target/arm/emulate: add ISV=0 emulation library with load/store immediate
2026-03-15 3:41 ` [PATCH v3 0/6] target/arm: ISV=0 data abort emulation library Lucas Amaral
@ 2026-03-15 3:41 ` Lucas Amaral
2026-03-15 3:41 ` [PATCH v3 2/6] target/arm/emulate: add load/store register offset Lucas Amaral
` (5 subsequent siblings)
6 siblings, 0 replies; 25+ messages in thread
From: Lucas Amaral @ 2026-03-15 3:41 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-arm, agraf, Lucas Amaral
Add a shared emulation library for AArch64 load/store instructions that
cause ISV=0 data aborts under hardware virtualization (HVF, WHPX).
When the Instruction Syndrome Valid bit is clear, the hypervisor cannot
determine the faulting instruction's target register or access size from
the syndrome alone. This library fetches and decodes the instruction
using a decodetree-generated decoder, then emulates it by accessing the
vCPU's register file (CPUARMState) and memory (cpu_memory_rw_debug)
directly.
This patch establishes the framework and adds load/store single with
immediate addressing — the most common ISV=0 trigger. Subsequent
patches add register-offset, pair, exclusive, and atomic instructions.
Instruction coverage:
- STR/LDR (GPR): unscaled, post-indexed, unprivileged, pre-indexed,
unsigned offset — all sizes (8/16/32/64-bit), sign/zero extension
- STR/LDR (SIMD/FP): same addressing modes, 8-128 bit elements
- PRFM: prefetch treated as NOP
- DC cache maintenance (SYS CRn=C7): NOP on MMIO
This library uses its own a64-ldst.decode rather than sharing
target/arm/tcg/a64.decode. TCG's trans_* functions act as a compiler:
they emit IR ops into a translation block for later execution. This
library's trans_* functions act as an interpreter: they execute
directly against the vCPU register file and memory. Because the
decodetree-generated dispatcher calls trans_* by name, the two sets
cannot coexist in the same translation unit. Decode patterns are kept
consistent with TCG's where possible.
Signed-off-by: Lucas Amaral <lucaaamaral@gmail.com>
---
target/arm/emulate/a64-ldst.decode | 129 ++++++++++++++++
target/arm/emulate/arm_emulate.c | 226 +++++++++++++++++++++++++++++
target/arm/emulate/arm_emulate.h | 30 ++++
target/arm/emulate/meson.build | 6 +
target/arm/meson.build | 1 +
5 files changed, 392 insertions(+)
create mode 100644 target/arm/emulate/a64-ldst.decode
create mode 100644 target/arm/emulate/arm_emulate.c
create mode 100644 target/arm/emulate/arm_emulate.h
create mode 100644 target/arm/emulate/meson.build
diff --git a/target/arm/emulate/a64-ldst.decode b/target/arm/emulate/a64-ldst.decode
new file mode 100644
index 00000000..c887dcba
--- /dev/null
+++ b/target/arm/emulate/a64-ldst.decode
@@ -0,0 +1,129 @@
+# AArch64 load/store instruction patterns for ISV=0 emulation
+#
+# Copyright (c) 2026 Lucas Amaral <lucaaamaral@gmail.com>
+#
+# SPDX-License-Identifier: GPL-2.0-or-later
+
+### Argument sets
+
+# Load/store immediate (unscaled, pre/post-index, unprivileged, unsigned offset)
+# 'u' flag: 0 = 9-bit signed immediate (byte offset), 1 = 12-bit unsigned (handler scales by << sz)
+&ldst_imm rt rn imm sz sign w p unpriv ext u
+
+### Format templates
+
+# Load/store immediate (9-bit signed)
+@ldst_imm .. ... . .. .. . imm:s9 .. rn:5 rt:5 &ldst_imm u=0 unpriv=0 p=0 w=0
+@ldst_imm_pre .. ... . .. .. . imm:s9 .. rn:5 rt:5 &ldst_imm u=0 unpriv=0 p=0 w=1
+@ldst_imm_post .. ... . .. .. . imm:s9 .. rn:5 rt:5 &ldst_imm u=0 unpriv=0 p=1 w=1
+@ldst_imm_user .. ... . .. .. . imm:s9 .. rn:5 rt:5 &ldst_imm u=0 unpriv=1 p=0 w=0
+
+# Load/store unsigned offset (12-bit, handler scales by << sz)
+@ldst_uimm .. ... . .. .. imm:12 rn:5 rt:5 &ldst_imm u=1 unpriv=0 p=0 w=0
+
+### Load/store register — unscaled immediate (LDUR/STUR)
+
+# GPR
+STR_i sz:2 111 0 00 00 0 ......... 00 ..... ..... @ldst_imm sign=0 ext=0
+LDR_i 00 111 0 00 01 0 ......... 00 ..... ..... @ldst_imm sign=0 ext=1 sz=0
+LDR_i 01 111 0 00 01 0 ......... 00 ..... ..... @ldst_imm sign=0 ext=1 sz=1
+LDR_i 10 111 0 00 01 0 ......... 00 ..... ..... @ldst_imm sign=0 ext=1 sz=2
+LDR_i 11 111 0 00 01 0 ......... 00 ..... ..... @ldst_imm sign=0 ext=0 sz=3
+LDR_i 00 111 0 00 10 0 ......... 00 ..... ..... @ldst_imm sign=1 ext=0 sz=0
+LDR_i 01 111 0 00 10 0 ......... 00 ..... ..... @ldst_imm sign=1 ext=0 sz=1
+LDR_i 10 111 0 00 10 0 ......... 00 ..... ..... @ldst_imm sign=1 ext=0 sz=2
+LDR_i 00 111 0 00 11 0 ......... 00 ..... ..... @ldst_imm sign=1 ext=1 sz=0
+LDR_i 01 111 0 00 11 0 ......... 00 ..... ..... @ldst_imm sign=1 ext=1 sz=1
+
+# SIMD/FP
+STR_v_i sz:2 111 1 00 00 0 ......... 00 ..... ..... @ldst_imm sign=0 ext=0
+STR_v_i 00 111 1 00 10 0 ......... 00 ..... ..... @ldst_imm sign=0 ext=0 sz=4
+LDR_v_i sz:2 111 1 00 01 0 ......... 00 ..... ..... @ldst_imm sign=0 ext=0
+LDR_v_i 00 111 1 00 11 0 ......... 00 ..... ..... @ldst_imm sign=0 ext=0 sz=4
+
+### Load/store register — post-indexed
+
+# GPR
+STR_i sz:2 111 0 00 00 0 ......... 01 ..... ..... @ldst_imm_post sign=0 ext=0
+LDR_i 00 111 0 00 01 0 ......... 01 ..... ..... @ldst_imm_post sign=0 ext=1 sz=0
+LDR_i 01 111 0 00 01 0 ......... 01 ..... ..... @ldst_imm_post sign=0 ext=1 sz=1
+LDR_i 10 111 0 00 01 0 ......... 01 ..... ..... @ldst_imm_post sign=0 ext=1 sz=2
+LDR_i 11 111 0 00 01 0 ......... 01 ..... ..... @ldst_imm_post sign=0 ext=0 sz=3
+LDR_i 00 111 0 00 10 0 ......... 01 ..... ..... @ldst_imm_post sign=1 ext=0 sz=0
+LDR_i 01 111 0 00 10 0 ......... 01 ..... ..... @ldst_imm_post sign=1 ext=0 sz=1
+LDR_i 10 111 0 00 10 0 ......... 01 ..... ..... @ldst_imm_post sign=1 ext=0 sz=2
+LDR_i 00 111 0 00 11 0 ......... 01 ..... ..... @ldst_imm_post sign=1 ext=1 sz=0
+LDR_i 01 111 0 00 11 0 ......... 01 ..... ..... @ldst_imm_post sign=1 ext=1 sz=1
+
+# SIMD/FP
+STR_v_i sz:2 111 1 00 00 0 ......... 01 ..... ..... @ldst_imm_post sign=0 ext=0
+STR_v_i 00 111 1 00 10 0 ......... 01 ..... ..... @ldst_imm_post sign=0 ext=0 sz=4
+LDR_v_i sz:2 111 1 00 01 0 ......... 01 ..... ..... @ldst_imm_post sign=0 ext=0
+LDR_v_i 00 111 1 00 11 0 ......... 01 ..... ..... @ldst_imm_post sign=0 ext=0 sz=4
+
+### Load/store register — unprivileged
+
+# GPR only (no SIMD/FP unprivileged forms)
+STR_i sz:2 111 0 00 00 0 ......... 10 ..... ..... @ldst_imm_user sign=0 ext=0
+LDR_i 00 111 0 00 01 0 ......... 10 ..... ..... @ldst_imm_user sign=0 ext=1 sz=0
+LDR_i 01 111 0 00 01 0 ......... 10 ..... ..... @ldst_imm_user sign=0 ext=1 sz=1
+LDR_i 10 111 0 00 01 0 ......... 10 ..... ..... @ldst_imm_user sign=0 ext=1 sz=2
+LDR_i 11 111 0 00 01 0 ......... 10 ..... ..... @ldst_imm_user sign=0 ext=0 sz=3
+LDR_i 00 111 0 00 10 0 ......... 10 ..... ..... @ldst_imm_user sign=1 ext=0 sz=0
+LDR_i 01 111 0 00 10 0 ......... 10 ..... ..... @ldst_imm_user sign=1 ext=0 sz=1
+LDR_i 10 111 0 00 10 0 ......... 10 ..... ..... @ldst_imm_user sign=1 ext=0 sz=2
+LDR_i 00 111 0 00 11 0 ......... 10 ..... ..... @ldst_imm_user sign=1 ext=1 sz=0
+LDR_i 01 111 0 00 11 0 ......... 10 ..... ..... @ldst_imm_user sign=1 ext=1 sz=1
+
+### Load/store register — pre-indexed
+
+# GPR
+STR_i sz:2 111 0 00 00 0 ......... 11 ..... ..... @ldst_imm_pre sign=0 ext=0
+LDR_i 00 111 0 00 01 0 ......... 11 ..... ..... @ldst_imm_pre sign=0 ext=1 sz=0
+LDR_i 01 111 0 00 01 0 ......... 11 ..... ..... @ldst_imm_pre sign=0 ext=1 sz=1
+LDR_i 10 111 0 00 01 0 ......... 11 ..... ..... @ldst_imm_pre sign=0 ext=1 sz=2
+LDR_i 11 111 0 00 01 0 ......... 11 ..... ..... @ldst_imm_pre sign=0 ext=0 sz=3
+LDR_i 00 111 0 00 10 0 ......... 11 ..... ..... @ldst_imm_pre sign=1 ext=0 sz=0
+LDR_i 01 111 0 00 10 0 ......... 11 ..... ..... @ldst_imm_pre sign=1 ext=0 sz=1
+LDR_i 10 111 0 00 10 0 ......... 11 ..... ..... @ldst_imm_pre sign=1 ext=0 sz=2
+LDR_i 00 111 0 00 11 0 ......... 11 ..... ..... @ldst_imm_pre sign=1 ext=1 sz=0
+LDR_i 01 111 0 00 11 0 ......... 11 ..... ..... @ldst_imm_pre sign=1 ext=1 sz=1
+
+# SIMD/FP
+STR_v_i sz:2 111 1 00 00 0 ......... 11 ..... ..... @ldst_imm_pre sign=0 ext=0
+STR_v_i 00 111 1 00 10 0 ......... 11 ..... ..... @ldst_imm_pre sign=0 ext=0 sz=4
+LDR_v_i sz:2 111 1 00 01 0 ......... 11 ..... ..... @ldst_imm_pre sign=0 ext=0
+LDR_v_i 00 111 1 00 11 0 ......... 11 ..... ..... @ldst_imm_pre sign=0 ext=0 sz=4
+
+### PRFM — unscaled immediate: prefetch is a NOP
+
+NOP 11 111 0 00 10 0 --------- 00 ----- -----
+
+### Load/store register — unsigned offset
+
+# GPR
+STR_i sz:2 111 0 01 00 ............ ..... ..... @ldst_uimm sign=0 ext=0
+LDR_i 00 111 0 01 01 ............ ..... ..... @ldst_uimm sign=0 ext=1 sz=0
+LDR_i 01 111 0 01 01 ............ ..... ..... @ldst_uimm sign=0 ext=1 sz=1
+LDR_i 10 111 0 01 01 ............ ..... ..... @ldst_uimm sign=0 ext=1 sz=2
+LDR_i 11 111 0 01 01 ............ ..... ..... @ldst_uimm sign=0 ext=0 sz=3
+LDR_i 00 111 0 01 10 ............ ..... ..... @ldst_uimm sign=1 ext=0 sz=0
+LDR_i 01 111 0 01 10 ............ ..... ..... @ldst_uimm sign=1 ext=0 sz=1
+LDR_i 10 111 0 01 10 ............ ..... ..... @ldst_uimm sign=1 ext=0 sz=2
+LDR_i 00 111 0 01 11 ............ ..... ..... @ldst_uimm sign=1 ext=1 sz=0
+LDR_i 01 111 0 01 11 ............ ..... ..... @ldst_uimm sign=1 ext=1 sz=1
+
+# PRFM — unsigned offset
+NOP 11 111 0 01 10 ------------ ----- -----
+
+# SIMD/FP
+STR_v_i sz:2 111 1 01 00 ............ ..... ..... @ldst_uimm sign=0 ext=0
+STR_v_i 00 111 1 01 10 ............ ..... ..... @ldst_uimm sign=0 ext=0 sz=4
+LDR_v_i sz:2 111 1 01 01 ............ ..... ..... @ldst_uimm sign=0 ext=0
+LDR_v_i 00 111 1 01 11 ............ ..... ..... @ldst_uimm sign=0 ext=0 sz=4
+
+### System instructions — DC cache maintenance
+
+# SYS with CRn=C7 covers all data cache operations (DC CIVAC, CVAC, etc.).
+# On MMIO regions, cache maintenance is a harmless no-op.
+NOP 1101 0101 0000 1 --- 0111 ---- --- -----
diff --git a/target/arm/emulate/arm_emulate.c b/target/arm/emulate/arm_emulate.c
new file mode 100644
index 00000000..2b4e2a9e
--- /dev/null
+++ b/target/arm/emulate/arm_emulate.c
@@ -0,0 +1,226 @@
+/*
+ * AArch64 instruction emulation for ISV=0 data aborts
+ *
+ * Copyright (c) 2026 Lucas Amaral <lucaaamaral@gmail.com>
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#include "arm_emulate.h"
+#include "target/arm/cpu.h"
+#include "exec/cpu-common.h"
+
+/* Named "DisasContext" as required by the decodetree code generator */
+typedef struct {
+ CPUState *cpu;
+ CPUARMState *env;
+ ArmEmulResult result;
+} DisasContext;
+
+#include "decode-a64-ldst.c.inc"
+
+/* GPR data access (Rt, Rs, Rt2) -- register 31 = XZR */
+
+static uint64_t gpr_read(DisasContext *ctx, int reg)
+{
+ if (reg == 31) {
+ return 0; /* XZR */
+ }
+ return ctx->env->xregs[reg];
+}
+
+static void gpr_write(DisasContext *ctx, int reg, uint64_t val)
+{
+ if (reg == 31) {
+ return; /* XZR -- discard */
+ }
+ ctx->env->xregs[reg] = val;
+ ctx->cpu->vcpu_dirty = true;
+}
+
+/* Base register access (Rn) -- register 31 = SP */
+
+static uint64_t base_read(DisasContext *ctx, int rn)
+{
+ return ctx->env->xregs[rn];
+}
+
+static void base_write(DisasContext *ctx, int rn, uint64_t val)
+{
+ ctx->env->xregs[rn] = val;
+ ctx->cpu->vcpu_dirty = true;
+}
+
+/* SIMD/FP register access */
+
+static void fpreg_read(DisasContext *ctx, int reg, void *buf, int size)
+{
+ memcpy(buf, &ctx->env->vfp.zregs[reg], size);
+}
+
+static void fpreg_write(DisasContext *ctx, int reg, const void *buf, int size)
+{
+ memset(&ctx->env->vfp.zregs[reg], 0, sizeof(ctx->env->vfp.zregs[reg]));
+ memcpy(&ctx->env->vfp.zregs[reg], buf, size);
+ ctx->cpu->vcpu_dirty = true;
+}
+
+/* Memory access wrappers */
+
+static int mem_read(DisasContext *ctx, uint64_t va, void *buf, int size)
+{
+ int ret = cpu_memory_rw_debug(ctx->cpu, va, buf, size, false);
+ if (ret != 0) {
+ ctx->result = ARM_EMUL_ERR_MEM;
+ }
+ return ret;
+}
+
+static int mem_write(DisasContext *ctx, uint64_t va, const void *buf, int size)
+{
+ int ret = cpu_memory_rw_debug(ctx->cpu, va, (void *)buf, size, true);
+ if (ret != 0) {
+ ctx->result = ARM_EMUL_ERR_MEM;
+ }
+ return ret;
+}
+
+/* Sign/zero extension helpers */
+
+static uint64_t sign_extend(uint64_t val, int from_bits)
+{
+ int shift = 64 - from_bits;
+ return (int64_t)(val << shift) >> shift;
+}
+
+/* Apply sign/zero extension */
+static uint64_t load_extend(uint64_t val, int sz, int sign, int ext)
+{
+ int data_bits = 8 << sz;
+
+ if (sign) {
+ val = sign_extend(val, data_bits);
+ if (ext) {
+            /* Truncate the sign-extended value to 32 bits (W register) */
+ val &= 0xFFFFFFFF;
+ }
+ } else if (ext) {
+ /* Zero-extend to 32 bits (W register) */
+ val &= 0xFFFFFFFF;
+ }
+ return val;
+}
+
+/* Load/store single -- immediate (GPR) (DDI 0487 C3.3.8 -- C3.3.13) */
+
+static bool trans_STR_i(DisasContext *ctx, arg_ldst_imm *a)
+{
+ int esize = (a->sz <= 3) ? (1 << a->sz) : 16;
+ int64_t offset = a->u ? ((int64_t)(uint64_t)a->imm << a->sz)
+ : (int64_t)a->imm;
+ uint64_t base = base_read(ctx, a->rn);
+ uint64_t va = a->p ? base : base + offset;
+
+ uint64_t val = gpr_read(ctx, a->rt);
+ if (mem_write(ctx, va, &val, esize) != 0) {
+ return true;
+ }
+
+ if (a->w) {
+ base_write(ctx, a->rn, base + offset);
+ }
+ return true;
+}
+
+static bool trans_LDR_i(DisasContext *ctx, arg_ldst_imm *a)
+{
+ int esize = (a->sz <= 3) ? (1 << a->sz) : 16;
+ int64_t offset = a->u ? ((int64_t)(uint64_t)a->imm << a->sz)
+ : (int64_t)a->imm;
+ uint64_t base = base_read(ctx, a->rn);
+ uint64_t va = a->p ? base : base + offset;
+ uint64_t val = 0;
+
+ if (mem_read(ctx, va, &val, esize) != 0) {
+ return true;
+ }
+
+ val = load_extend(val, a->sz, a->sign, a->ext);
+ gpr_write(ctx, a->rt, val);
+
+ if (a->w) {
+ base_write(ctx, a->rn, base + offset);
+ }
+ return true;
+}
+
+/*
+ * Load/store single -- immediate (SIMD/FP)
+ * STR_v_i / LDR_v_i (DDI 0487 C3.3.10)
+ */
+
+static bool trans_STR_v_i(DisasContext *ctx, arg_ldst_imm *a)
+{
+ int esize = (a->sz <= 3) ? (1 << a->sz) : 16;
+ int64_t offset = a->u ? ((int64_t)(uint64_t)a->imm << a->sz)
+ : (int64_t)a->imm;
+ uint64_t base = base_read(ctx, a->rn);
+ uint64_t va = a->p ? base : base + offset;
+ uint8_t buf[16];
+
+ fpreg_read(ctx, a->rt, buf, esize);
+ if (mem_write(ctx, va, buf, esize) != 0) {
+ return true;
+ }
+
+ if (a->w) {
+ base_write(ctx, a->rn, base + offset);
+ }
+ return true;
+}
+
+static bool trans_LDR_v_i(DisasContext *ctx, arg_ldst_imm *a)
+{
+ int esize = (a->sz <= 3) ? (1 << a->sz) : 16;
+ int64_t offset = a->u ? ((int64_t)(uint64_t)a->imm << a->sz)
+ : (int64_t)a->imm;
+ uint64_t base = base_read(ctx, a->rn);
+ uint64_t va = a->p ? base : base + offset;
+ uint8_t buf[16];
+
+ if (mem_read(ctx, va, buf, esize) != 0) {
+ return true;
+ }
+
+ fpreg_write(ctx, a->rt, buf, esize);
+
+ if (a->w) {
+ base_write(ctx, a->rn, base + offset);
+ }
+ return true;
+}
+
+/* PRFM, DC cache maintenance -- treated as NOP */
+static bool trans_NOP(DisasContext *ctx, arg_NOP *a)
+{
+ (void)ctx;
+ (void)a;
+ return true;
+}
+
+/* Entry point */
+
+ArmEmulResult arm_emul_insn(CPUArchState *env, uint32_t insn)
+{
+ DisasContext ctx = {
+ .cpu = env_cpu(env),
+ .env = env,
+ .result = ARM_EMUL_OK,
+ };
+
+ if (!decode_a64_ldst(&ctx, insn)) {
+ return ARM_EMUL_UNHANDLED;
+ }
+
+ return ctx.result;
+}
diff --git a/target/arm/emulate/arm_emulate.h b/target/arm/emulate/arm_emulate.h
new file mode 100644
index 00000000..7fe29839
--- /dev/null
+++ b/target/arm/emulate/arm_emulate.h
@@ -0,0 +1,30 @@
+/*
+ * AArch64 instruction emulation library
+ *
+ * Copyright (c) 2026 Lucas Amaral <lucaaamaral@gmail.com>
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#ifndef ARM_EMULATE_H
+#define ARM_EMULATE_H
+
+#include "qemu/osdep.h"
+
+/**
+ * ArmEmulResult - return status from arm_emul_insn()
+ */
+typedef enum {
+ ARM_EMUL_OK, /* Instruction emulated successfully */
+ ARM_EMUL_UNHANDLED, /* Instruction not recognized by decoder */
+ ARM_EMUL_ERR_MEM, /* Memory access failed */
+} ArmEmulResult;
+
+/**
+ * arm_emul_insn - decode and emulate one AArch64 instruction
+ *
+ * Caller must synchronize CPU state and fetch @insn before calling.
+ */
+ArmEmulResult arm_emul_insn(CPUArchState *env, uint32_t insn);
+
+#endif /* ARM_EMULATE_H */
diff --git a/target/arm/emulate/meson.build b/target/arm/emulate/meson.build
new file mode 100644
index 00000000..c0b38dd1
--- /dev/null
+++ b/target/arm/emulate/meson.build
@@ -0,0 +1,6 @@
+gen_a64_ldst = decodetree.process('a64-ldst.decode',
+ extra_args: ['--static-decode=decode_a64_ldst'])
+
+arm_common_system_ss.add(when: 'TARGET_AARCH64', if_true: [
+ gen_a64_ldst, files('arm_emulate.c')
+])
diff --git a/target/arm/meson.build b/target/arm/meson.build
index 6e0e504a..a4b2291b 100644
--- a/target/arm/meson.build
+++ b/target/arm/meson.build
@@ -57,6 +57,7 @@ arm_common_system_ss.add(files(
'vfp_fpscr.c',
))
+subdir('emulate')
subdir('hvf')
subdir('whpx')
--
2.52.0
* [PATCH v3 2/6] target/arm/emulate: add load/store register offset
2026-03-15 3:41 ` [PATCH v3 0/6] target/arm: ISV=0 data abort emulation library Lucas Amaral
2026-03-15 3:41 ` [PATCH v3 1/6] target/arm/emulate: add ISV=0 emulation library with load/store immediate Lucas Amaral
@ 2026-03-15 3:41 ` Lucas Amaral
2026-03-15 3:41 ` [PATCH v3 3/6] target/arm/emulate: add load/store pair Lucas Amaral
` (4 subsequent siblings)
6 siblings, 0 replies; 25+ messages in thread
From: Lucas Amaral @ 2026-03-15 3:41 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-arm, agraf, Lucas Amaral
Add emulation for load/store register offset addressing mode
(DDI 0487 C3.3.9). The offset register value is extended via
UXTB/UXTH/UXTW/UXTX/SXTB/SXTH/SXTW/SXTX and optionally
shifted by the element size.
Instruction coverage:
- STR/LDR (GPR): register offset with extend, all sizes
- STR/LDR (SIMD/FP): register offset with extend, 8-128 bit
- PRFM register offset: NOP
Signed-off-by: Lucas Amaral <lucaaamaral@gmail.com>
---
target/arm/emulate/a64-ldst.decode | 29 ++++++++
target/arm/emulate/arm_emulate.c | 103 +++++++++++++++++++++++++++++
2 files changed, 132 insertions(+)
diff --git a/target/arm/emulate/a64-ldst.decode b/target/arm/emulate/a64-ldst.decode
index c887dcba..af6babe1 100644
--- a/target/arm/emulate/a64-ldst.decode
+++ b/target/arm/emulate/a64-ldst.decode
@@ -10,6 +10,9 @@
# 'u' flag: 0 = 9-bit signed immediate (byte offset), 1 = 12-bit unsigned (needs << sz)
&ldst_imm rt rn imm sz sign w p unpriv ext u
+# Load/store register offset
+&ldst rm rn rt sign ext sz opt s
+
### Format templates
# Load/store immediate (9-bit signed)
@@ -21,6 +24,9 @@
# Load/store unsigned offset (12-bit, handler scales by << sz)
@ldst_uimm .. ... . .. .. imm:12 rn:5 rt:5 &ldst_imm u=1 unpriv=0 p=0 w=0
+# Load/store register offset
+@ldst .. ... . .. .. . rm:5 opt:3 s:1 .. rn:5 rt:5 &ldst
+
### Load/store register — unscaled immediate (LDUR/STUR)
# GPR
@@ -122,6 +128,29 @@ STR_v_i 00 111 1 01 10 ............ ..... ..... @ldst_uimm sign=
LDR_v_i sz:2 111 1 01 01 ............ ..... ..... @ldst_uimm sign=0 ext=0
LDR_v_i 00 111 1 01 11 ............ ..... ..... @ldst_uimm sign=0 ext=0 sz=4
+### Load/store register — register offset
+
+# GPR
+STR sz:2 111 0 00 00 1 ..... ... . 10 ..... ..... @ldst sign=0 ext=0
+LDR 00 111 0 00 01 1 ..... ... . 10 ..... ..... @ldst sign=0 ext=1 sz=0
+LDR 01 111 0 00 01 1 ..... ... . 10 ..... ..... @ldst sign=0 ext=1 sz=1
+LDR 10 111 0 00 01 1 ..... ... . 10 ..... ..... @ldst sign=0 ext=1 sz=2
+LDR 11 111 0 00 01 1 ..... ... . 10 ..... ..... @ldst sign=0 ext=0 sz=3
+LDR 00 111 0 00 10 1 ..... ... . 10 ..... ..... @ldst sign=1 ext=0 sz=0
+LDR 01 111 0 00 10 1 ..... ... . 10 ..... ..... @ldst sign=1 ext=0 sz=1
+LDR 10 111 0 00 10 1 ..... ... . 10 ..... ..... @ldst sign=1 ext=0 sz=2
+LDR 00 111 0 00 11 1 ..... ... . 10 ..... ..... @ldst sign=1 ext=1 sz=0
+LDR 01 111 0 00 11 1 ..... ... . 10 ..... ..... @ldst sign=1 ext=1 sz=1
+
+# PRFM — register offset
+NOP 11 111 0 00 10 1 ----- -1- - 10 ----- -----
+
+# SIMD/FP
+STR_v sz:2 111 1 00 00 1 ..... ... . 10 ..... ..... @ldst sign=0 ext=0
+STR_v 00 111 1 00 10 1 ..... ... . 10 ..... ..... @ldst sign=0 ext=0 sz=4
+LDR_v sz:2 111 1 00 01 1 ..... ... . 10 ..... ..... @ldst sign=0 ext=0
+LDR_v 00 111 1 00 11 1 ..... ... . 10 ..... ..... @ldst sign=0 ext=0 sz=4
+
### System instructions — DC cache maintenance
# SYS with CRn=C7 covers all data cache operations (DC CIVAC, CVAC, etc.).
diff --git a/target/arm/emulate/arm_emulate.c b/target/arm/emulate/arm_emulate.c
index 2b4e2a9e..0e77cf33 100644
--- a/target/arm/emulate/arm_emulate.c
+++ b/target/arm/emulate/arm_emulate.c
@@ -200,6 +200,109 @@ static bool trans_LDR_v_i(DisasContext *ctx, arg_ldst_imm *a)
return true;
}
+/* Register offset extension (DDI 0487 C6.2.131) */
+
+static uint64_t extend_reg(uint64_t val, int option, int shift)
+{
+ switch (option) {
+ case 0: /* UXTB */
+ val = (uint8_t)val;
+ break;
+ case 1: /* UXTH */
+ val = (uint16_t)val;
+ break;
+ case 2: /* UXTW */
+ val = (uint32_t)val;
+ break;
+ case 3: /* UXTX / LSL */
+ break;
+ case 4: /* SXTB */
+ val = (int64_t)(int8_t)val;
+ break;
+ case 5: /* SXTH */
+ val = (int64_t)(int16_t)val;
+ break;
+ case 6: /* SXTW */
+ val = (int64_t)(int32_t)val;
+ break;
+ case 7: /* SXTX */
+ break;
+ }
+ return val << shift;
+}
+
+/*
+ * Load/store single -- register offset (GPR)
+ * STR / LDR (DDI 0487 C3.3.9)
+ */
+
+static bool trans_STR(DisasContext *ctx, arg_ldst *a)
+{
+ int esize = (a->sz <= 3) ? (1 << a->sz) : 16;
+ int shift = a->s ? a->sz : 0;
+ uint64_t rm_val = gpr_read(ctx, a->rm);
+ uint64_t offset = extend_reg(rm_val, a->opt, shift);
+ uint64_t va = base_read(ctx, a->rn) + offset;
+
+ uint64_t val = gpr_read(ctx, a->rt);
+ mem_write(ctx, va, &val, esize);
+ return true;
+}
+
+static bool trans_LDR(DisasContext *ctx, arg_ldst *a)
+{
+ int esize = (a->sz <= 3) ? (1 << a->sz) : 16;
+ int shift = a->s ? a->sz : 0;
+ uint64_t rm_val = gpr_read(ctx, a->rm);
+ uint64_t offset = extend_reg(rm_val, a->opt, shift);
+ uint64_t va = base_read(ctx, a->rn) + offset;
+ uint64_t val = 0;
+
+ if (mem_read(ctx, va, &val, esize) != 0) {
+ return true;
+ }
+
+ val = load_extend(val, a->sz, a->sign, a->ext);
+ gpr_write(ctx, a->rt, val);
+ return true;
+}
+
+/*
+ * Load/store single -- register offset (SIMD/FP)
+ * STR_v / LDR_v (DDI 0487 C3.3.10)
+ */
+
+static bool trans_STR_v(DisasContext *ctx, arg_ldst *a)
+{
+ int esize = (a->sz <= 3) ? (1 << a->sz) : 16;
+ int shift = a->s ? a->sz : 0;
+ uint64_t rm_val = gpr_read(ctx, a->rm);
+ uint64_t offset = extend_reg(rm_val, a->opt, shift);
+ uint64_t va = base_read(ctx, a->rn) + offset;
+ uint8_t buf[16];
+
+ fpreg_read(ctx, a->rt, buf, esize);
+ mem_write(ctx, va, buf, esize);
+ return true;
+}
+
+static bool trans_LDR_v(DisasContext *ctx, arg_ldst *a)
+{
+ int esize = (a->sz <= 3) ? (1 << a->sz) : 16;
+ int shift = a->s ? a->sz : 0;
+ uint64_t rm_val = gpr_read(ctx, a->rm);
+ uint64_t offset = extend_reg(rm_val, a->opt, shift);
+ uint64_t va = base_read(ctx, a->rn) + offset;
+ uint8_t buf[16];
+
+ if (mem_read(ctx, va, buf, esize) != 0) {
+ return true;
+ }
+
+ fpreg_write(ctx, a->rt, buf, esize);
+ return true;
+}
+
/* PRFM, DC cache maintenance -- treated as NOP */
static bool trans_NOP(DisasContext *ctx, arg_NOP *a)
{
--
2.52.0
* [PATCH v3 3/6] target/arm/emulate: add load/store pair
2026-03-15 3:41 ` [PATCH v3 0/6] target/arm: ISV=0 data abort emulation library Lucas Amaral
2026-03-15 3:41 ` [PATCH v3 1/6] target/arm/emulate: add ISV=0 emulation library with load/store immediate Lucas Amaral
2026-03-15 3:41 ` [PATCH v3 2/6] target/arm/emulate: add load/store register offset Lucas Amaral
@ 2026-03-15 3:41 ` Lucas Amaral
2026-03-15 3:41 ` [PATCH v3 4/6] target/arm/emulate: add load/store exclusive Lucas Amaral
` (3 subsequent siblings)
6 siblings, 0 replies; 25+ messages in thread
From: Lucas Amaral @ 2026-03-15 3:41 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-arm, agraf, Lucas Amaral
Add emulation for load/store pair instructions (DDI 0487 C3.3.14 --
C3.3.16). All addressing modes are covered: non-temporal (STNP/LDNP),
post-indexed, signed offset, and pre-indexed.
Instruction coverage:
- STP/LDP (GPR): 32/64-bit pairs, all addressing modes
- STP/LDP (SIMD/FP): 32/64/128-bit pairs, all addressing modes
- LDPSW: sign-extending 32-bit pair load
- STGP: store allocation tag pair (tag operation is NOP for MMIO)
Signed-off-by: Lucas Amaral <lucaaamaral@gmail.com>
---
target/arm/emulate/a64-ldst.decode | 68 ++++++++++++++++++
target/arm/emulate/arm_emulate.c | 111 +++++++++++++++++++++++++++++
2 files changed, 179 insertions(+)
diff --git a/target/arm/emulate/a64-ldst.decode b/target/arm/emulate/a64-ldst.decode
index af6babe1..f3de3f86 100644
--- a/target/arm/emulate/a64-ldst.decode
+++ b/target/arm/emulate/a64-ldst.decode
@@ -10,6 +10,9 @@
# 'u' flag: 0 = 9-bit signed immediate (byte offset), 1 = 12-bit unsigned (needs << sz)
&ldst_imm rt rn imm sz sign w p unpriv ext u
+# Load/store pair (GPR and SIMD/FP)
+&ldstpair rt2 rt rn imm sz sign w p
+
# Load/store register offset
&ldst rm rn rt sign ext sz opt s
@@ -24,6 +27,9 @@
# Load/store unsigned offset (12-bit, handler scales by << sz)
@ldst_uimm .. ... . .. .. imm:12 rn:5 rt:5 &ldst_imm u=1 unpriv=0 p=0 w=0
+# Load/store pair: imm7 is signed, scaled by element size in handler
+@ldstpair .. ... . ... . imm:s7 rt2:5 rn:5 rt:5 &ldstpair
+
# Load/store register offset
@ldst .. ... . .. .. . rm:5 opt:3 s:1 .. rn:5 rt:5 &ldst
@@ -128,6 +134,68 @@ STR_v_i 00 111 1 01 10 ............ ..... ..... @ldst_uimm sign=
LDR_v_i sz:2 111 1 01 01 ............ ..... ..... @ldst_uimm sign=0 ext=0
LDR_v_i 00 111 1 01 11 ............ ..... ..... @ldst_uimm sign=0 ext=0 sz=4
+### Load/store pair — non-temporal (STNP/LDNP)
+
+# STNP/LDNP: offset only, no writeback. Non-temporal hint ignored.
+STP 00 101 0 000 0 ....... ..... ..... ..... @ldstpair sz=2 sign=0 p=0 w=0
+LDP 00 101 0 000 1 ....... ..... ..... ..... @ldstpair sz=2 sign=0 p=0 w=0
+STP 10 101 0 000 0 ....... ..... ..... ..... @ldstpair sz=3 sign=0 p=0 w=0
+LDP 10 101 0 000 1 ....... ..... ..... ..... @ldstpair sz=3 sign=0 p=0 w=0
+STP_v 00 101 1 000 0 ....... ..... ..... ..... @ldstpair sz=2 sign=0 p=0 w=0
+LDP_v 00 101 1 000 1 ....... ..... ..... ..... @ldstpair sz=2 sign=0 p=0 w=0
+STP_v 01 101 1 000 0 ....... ..... ..... ..... @ldstpair sz=3 sign=0 p=0 w=0
+LDP_v 01 101 1 000 1 ....... ..... ..... ..... @ldstpair sz=3 sign=0 p=0 w=0
+STP_v 10 101 1 000 0 ....... ..... ..... ..... @ldstpair sz=4 sign=0 p=0 w=0
+LDP_v 10 101 1 000 1 ....... ..... ..... ..... @ldstpair sz=4 sign=0 p=0 w=0
+
+### Load/store pair — post-indexed
+
+STP 00 101 0 001 0 ....... ..... ..... ..... @ldstpair sz=2 sign=0 p=1 w=1
+LDP 00 101 0 001 1 ....... ..... ..... ..... @ldstpair sz=2 sign=0 p=1 w=1
+LDP 01 101 0 001 1 ....... ..... ..... ..... @ldstpair sz=2 sign=1 p=1 w=1
+STP 10 101 0 001 0 ....... ..... ..... ..... @ldstpair sz=3 sign=0 p=1 w=1
+LDP 10 101 0 001 1 ....... ..... ..... ..... @ldstpair sz=3 sign=0 p=1 w=1
+STP_v 00 101 1 001 0 ....... ..... ..... ..... @ldstpair sz=2 sign=0 p=1 w=1
+LDP_v 00 101 1 001 1 ....... ..... ..... ..... @ldstpair sz=2 sign=0 p=1 w=1
+STP_v 01 101 1 001 0 ....... ..... ..... ..... @ldstpair sz=3 sign=0 p=1 w=1
+LDP_v 01 101 1 001 1 ....... ..... ..... ..... @ldstpair sz=3 sign=0 p=1 w=1
+STP_v 10 101 1 001 0 ....... ..... ..... ..... @ldstpair sz=4 sign=0 p=1 w=1
+LDP_v 10 101 1 001 1 ....... ..... ..... ..... @ldstpair sz=4 sign=0 p=1 w=1
+
+### Load/store pair — signed offset
+
+STP 00 101 0 010 0 ....... ..... ..... ..... @ldstpair sz=2 sign=0 p=0 w=0
+LDP 00 101 0 010 1 ....... ..... ..... ..... @ldstpair sz=2 sign=0 p=0 w=0
+LDP 01 101 0 010 1 ....... ..... ..... ..... @ldstpair sz=2 sign=1 p=0 w=0
+STP 10 101 0 010 0 ....... ..... ..... ..... @ldstpair sz=3 sign=0 p=0 w=0
+LDP 10 101 0 010 1 ....... ..... ..... ..... @ldstpair sz=3 sign=0 p=0 w=0
+STP_v 00 101 1 010 0 ....... ..... ..... ..... @ldstpair sz=2 sign=0 p=0 w=0
+LDP_v 00 101 1 010 1 ....... ..... ..... ..... @ldstpair sz=2 sign=0 p=0 w=0
+STP_v 01 101 1 010 0 ....... ..... ..... ..... @ldstpair sz=3 sign=0 p=0 w=0
+LDP_v 01 101 1 010 1 ....... ..... ..... ..... @ldstpair sz=3 sign=0 p=0 w=0
+STP_v 10 101 1 010 0 ....... ..... ..... ..... @ldstpair sz=4 sign=0 p=0 w=0
+LDP_v 10 101 1 010 1 ....... ..... ..... ..... @ldstpair sz=4 sign=0 p=0 w=0
+
+### Load/store pair — pre-indexed
+
+STP 00 101 0 011 0 ....... ..... ..... ..... @ldstpair sz=2 sign=0 p=0 w=1
+LDP 00 101 0 011 1 ....... ..... ..... ..... @ldstpair sz=2 sign=0 p=0 w=1
+LDP 01 101 0 011 1 ....... ..... ..... ..... @ldstpair sz=2 sign=1 p=0 w=1
+STP 10 101 0 011 0 ....... ..... ..... ..... @ldstpair sz=3 sign=0 p=0 w=1
+LDP 10 101 0 011 1 ....... ..... ..... ..... @ldstpair sz=3 sign=0 p=0 w=1
+STP_v 00 101 1 011 0 ....... ..... ..... ..... @ldstpair sz=2 sign=0 p=0 w=1
+LDP_v 00 101 1 011 1 ....... ..... ..... ..... @ldstpair sz=2 sign=0 p=0 w=1
+STP_v 01 101 1 011 0 ....... ..... ..... ..... @ldstpair sz=3 sign=0 p=0 w=1
+LDP_v 01 101 1 011 1 ....... ..... ..... ..... @ldstpair sz=3 sign=0 p=0 w=1
+STP_v 10 101 1 011 0 ....... ..... ..... ..... @ldstpair sz=4 sign=0 p=0 w=1
+LDP_v 10 101 1 011 1 ....... ..... ..... ..... @ldstpair sz=4 sign=0 p=0 w=1
+
+### Load/store pair — STGP (store allocation tag + pair)
+
+STGP 01 101 0 001 0 ....... ..... ..... ..... @ldstpair sz=3 sign=0 p=1 w=1
+STGP 01 101 0 010 0 ....... ..... ..... ..... @ldstpair sz=3 sign=0 p=0 w=0
+STGP 01 101 0 011 0 ....... ..... ..... ..... @ldstpair sz=3 sign=0 p=0 w=1
+
### Load/store register — register offset
# GPR
diff --git a/target/arm/emulate/arm_emulate.c b/target/arm/emulate/arm_emulate.c
index 0e77cf33..a7c62b44 100644
--- a/target/arm/emulate/arm_emulate.c
+++ b/target/arm/emulate/arm_emulate.c
@@ -111,6 +111,117 @@ static uint64_t load_extend(uint64_t val, int sz, int sign, int ext)
return val;
}
+/*
+ * Load/store pair: STP, LDP, STNP, LDNP, STGP, LDPSW
+ * (DDI 0487 C3.3.14 -- C3.3.16)
+ */
+
+static bool trans_STP(DisasContext *ctx, arg_ldstpair *a)
+{
+ int esize = 1 << a->sz; /* 4 or 8 bytes */
+ int64_t offset = (int64_t)a->imm << a->sz;
+ uint64_t base = base_read(ctx, a->rn);
+ uint64_t va = a->p ? base : base + offset; /* post-index: unmodified base */
+ uint8_t buf[16]; /* max 2 x 8 bytes */
+
+ uint64_t v1 = gpr_read(ctx, a->rt);
+ uint64_t v2 = gpr_read(ctx, a->rt2);
+ memcpy(buf, &v1, esize);
+ memcpy(buf + esize, &v2, esize);
+
+ if (mem_write(ctx, va, buf, 2 * esize) != 0) {
+ return true;
+ }
+
+ if (a->w) {
+ base_write(ctx, a->rn, base + offset);
+ }
+ return true;
+}
+
+static bool trans_LDP(DisasContext *ctx, arg_ldstpair *a)
+{
+ int esize = 1 << a->sz;
+ int64_t offset = (int64_t)a->imm << a->sz;
+ uint64_t base = base_read(ctx, a->rn);
+ uint64_t va = a->p ? base : base + offset;
+ uint8_t buf[16];
+ uint64_t v1 = 0, v2 = 0;
+
+ if (mem_read(ctx, va, buf, 2 * esize) != 0) {
+ return true;
+ }
+ memcpy(&v1, buf, esize);
+ memcpy(&v2, buf + esize, esize);
+
+ /* LDPSW: sign-extend 32-bit values to 64-bit (sign=1, sz=2) */
+ if (a->sign) {
+ v1 = sign_extend(v1, 8 * esize);
+ v2 = sign_extend(v2, 8 * esize);
+ }
+
+ gpr_write(ctx, a->rt, v1);
+ gpr_write(ctx, a->rt2, v2);
+
+ if (a->w) {
+ base_write(ctx, a->rn, base + offset);
+ }
+ return true;
+}
+
+/* STGP: tag operation is a NOP for emulation; data stored via STP */
+static bool trans_STGP(DisasContext *ctx, arg_ldstpair *a)
+{
+ return trans_STP(ctx, a);
+}
+
+/*
+ * SIMD/FP load/store pair: STP_v, LDP_v
+ * (DDI 0487 C3.3.14 -- C3.3.16)
+ */
+
+static bool trans_STP_v(DisasContext *ctx, arg_ldstpair *a)
+{
+ int esize = 1 << a->sz; /* 4, 8, or 16 bytes */
+ int64_t offset = (int64_t)a->imm << a->sz;
+ uint64_t base = base_read(ctx, a->rn);
+ uint64_t va = a->p ? base : base + offset;
+ uint8_t buf[32]; /* max 2 x 16 bytes */
+
+ fpreg_read(ctx, a->rt, buf, esize);
+ fpreg_read(ctx, a->rt2, buf + esize, esize);
+
+ if (mem_write(ctx, va, buf, 2 * esize) != 0) {
+ return true;
+ }
+
+ if (a->w) {
+ base_write(ctx, a->rn, base + offset);
+ }
+ return true;
+}
+
+static bool trans_LDP_v(DisasContext *ctx, arg_ldstpair *a)
+{
+ int esize = 1 << a->sz;
+ int64_t offset = (int64_t)a->imm << a->sz;
+ uint64_t base = base_read(ctx, a->rn);
+ uint64_t va = a->p ? base : base + offset;
+ uint8_t buf[32];
+
+ if (mem_read(ctx, va, buf, 2 * esize) != 0) {
+ return true;
+ }
+
+ fpreg_write(ctx, a->rt, buf, esize);
+ fpreg_write(ctx, a->rt2, buf + esize, esize);
+
+ if (a->w) {
+ base_write(ctx, a->rn, base + offset);
+ }
+ return true;
+}
+
/* Load/store single -- immediate (GPR) (DDI 0487 C3.3.8 -- C3.3.13) */
static bool trans_STR_i(DisasContext *ctx, arg_ldst_imm *a)
--
2.52.0
* [PATCH v3 4/6] target/arm/emulate: add load/store exclusive
2026-03-15 3:41 ` [PATCH v3 0/6] target/arm: ISV=0 data abort emulation library Lucas Amaral
` (2 preceding siblings ...)
2026-03-15 3:41 ` [PATCH v3 3/6] target/arm/emulate: add load/store pair Lucas Amaral
@ 2026-03-15 3:41 ` Lucas Amaral
2026-03-15 3:41 ` [PATCH v3 5/6] target/arm/emulate: add atomic, compare-and-swap, and PAC load Lucas Amaral
` (2 subsequent siblings)
6 siblings, 0 replies; 25+ messages in thread
From: Lucas Amaral @ 2026-03-15 3:41 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-arm, agraf, Lucas Amaral
Add emulation for load/store exclusive instructions (DDI 0487 C3.3.6).
Exclusive monitors have no meaning on emulated MMIO accesses, so STXR
always reports success (Rs=0) and LDXR does not set a monitor.
Instruction coverage:
- STXR/STLXR: exclusive store, 8/16/32/64-bit
- LDXR/LDAXR: exclusive load, 8/16/32/64-bit
- STXP/STLXP: exclusive store pair, 32/64-bit
- LDXP/LDAXP: exclusive load pair, 32/64-bit
STXP/LDXP use two explicit decode patterns (sz=2, sz=3) for the
32/64-bit size variants.
Signed-off-by: Lucas Amaral <lucaaamaral@gmail.com>
---
target/arm/emulate/a64-ldst.decode | 22 +++++++++
target/arm/emulate/arm_emulate.c | 74 ++++++++++++++++++++++++++++++
2 files changed, 96 insertions(+)
diff --git a/target/arm/emulate/a64-ldst.decode b/target/arm/emulate/a64-ldst.decode
index f3de3f86..fadf6fd2 100644
--- a/target/arm/emulate/a64-ldst.decode
+++ b/target/arm/emulate/a64-ldst.decode
@@ -10,6 +10,9 @@
# 'u' flag: 0 = 9-bit signed immediate (byte offset), 1 = 12-bit unsigned (needs << sz)
&ldst_imm rt rn imm sz sign w p unpriv ext u
+# Load/store exclusive
+&stxr rn rt rt2 rs sz lasr
+
# Load/store pair (GPR and SIMD/FP)
&ldstpair rt2 rt rn imm sz sign w p
@@ -18,6 +21,9 @@
### Format templates
+# Exclusives
+@stxr sz:2 ...... ... rs:5 lasr:1 rt2:5 rn:5 rt:5 &stxr
+
# Load/store immediate (9-bit signed)
@ldst_imm .. ... . .. .. . imm:s9 .. rn:5 rt:5 &ldst_imm u=0 unpriv=0 p=0 w=0
@ldst_imm_pre .. ... . .. .. . imm:s9 .. rn:5 rt:5 &ldst_imm u=0 unpriv=0 p=0 w=1
@@ -134,6 +140,22 @@ STR_v_i 00 111 1 01 10 ............ ..... ..... @ldst_uimm sign=
LDR_v_i sz:2 111 1 01 01 ............ ..... ..... @ldst_uimm sign=0 ext=0
LDR_v_i 00 111 1 01 11 ............ ..... ..... @ldst_uimm sign=0 ext=0 sz=4
+### Load/store exclusive
+
+# STXR / STLXR (sz encodes 8/16/32/64-bit)
+STXR .. 001000 000 ..... . ..... ..... ..... @stxr
+
+# LDXR / LDAXR
+LDXR .. 001000 010 ..... . ..... ..... ..... @stxr
+
+# STXP / STLXP (bit[31]=1, bit[30]=sf → sz=2 for 32-bit, sz=3 for 64-bit)
+STXP 10 001000 001 rs:5 lasr:1 rt2:5 rn:5 rt:5 &stxr sz=2
+STXP 11 001000 001 rs:5 lasr:1 rt2:5 rn:5 rt:5 &stxr sz=3
+
+# LDXP / LDAXP
+LDXP 10 001000 011 rs:5 lasr:1 rt2:5 rn:5 rt:5 &stxr sz=2
+LDXP 11 001000 011 rs:5 lasr:1 rt2:5 rn:5 rt:5 &stxr sz=3
+
### Load/store pair — non-temporal (STNP/LDNP)
# STNP/LDNP: offset only, no writeback. Non-temporal hint ignored.
diff --git a/target/arm/emulate/arm_emulate.c b/target/arm/emulate/arm_emulate.c
index a7c62b44..fd567e65 100644
--- a/target/arm/emulate/arm_emulate.c
+++ b/target/arm/emulate/arm_emulate.c
@@ -414,6 +414,80 @@ static bool trans_LDR_v(DisasContext *ctx, arg_ldst *a)
return true;
}
+/*
+ * Load/store exclusive: STXR, LDXR, STXP, LDXP
+ * (DDI 0487 C3.3.6)
+ *
+ * Exclusive monitors have no meaning on MMIO. STXR always reports
+ * success (Rs=0) and LDXR does not set an exclusive monitor.
+ */
+
+static bool trans_STXR(DisasContext *ctx, arg_stxr *a)
+{
+ int esize = 1 << a->sz;
+ uint64_t va = base_read(ctx, a->rn);
+ uint64_t val = gpr_read(ctx, a->rt);
+
+ if (mem_write(ctx, va, &val, esize) != 0) {
+ return true;
+ }
+
+ /* Report success -- no exclusive monitor on emulated access */
+ gpr_write(ctx, a->rs, 0);
+ return true;
+}
+
+static bool trans_LDXR(DisasContext *ctx, arg_stxr *a)
+{
+ int esize = 1 << a->sz;
+ uint64_t va = base_read(ctx, a->rn);
+ uint64_t val = 0;
+
+ if (mem_read(ctx, va, &val, esize) != 0) {
+ return true;
+ }
+
+ gpr_write(ctx, a->rt, val);
+ return true;
+}
+
+static bool trans_STXP(DisasContext *ctx, arg_stxr *a)
+{
+ int esize = 1 << a->sz; /* sz=2->4, sz=3->8 */
+ uint64_t va = base_read(ctx, a->rn);
+ uint8_t buf[16];
+
+ uint64_t v1 = gpr_read(ctx, a->rt);
+ uint64_t v2 = gpr_read(ctx, a->rt2);
+ memcpy(buf, &v1, esize);
+ memcpy(buf + esize, &v2, esize);
+
+ if (mem_write(ctx, va, buf, 2 * esize) != 0) {
+ return true;
+ }
+
+ gpr_write(ctx, a->rs, 0); /* success */
+ return true;
+}
+
+static bool trans_LDXP(DisasContext *ctx, arg_stxr *a)
+{
+ int esize = 1 << a->sz;
+ uint64_t va = base_read(ctx, a->rn);
+ uint8_t buf[16];
+ uint64_t v1 = 0, v2 = 0;
+
+ if (mem_read(ctx, va, buf, 2 * esize) != 0) {
+ return true;
+ }
+
+ memcpy(&v1, buf, esize);
+ memcpy(&v2, buf + esize, esize);
+ gpr_write(ctx, a->rt, v1);
+ gpr_write(ctx, a->rt2, v2);
+ return true;
+}
+
/* PRFM, DC cache maintenance -- treated as NOP */
static bool trans_NOP(DisasContext *ctx, arg_NOP *a)
{
--
2.52.0
* [PATCH v3 5/6] target/arm/emulate: add atomic, compare-and-swap, and PAC load
2026-03-15 3:41 ` [PATCH v3 0/6] target/arm: ISV=0 data abort emulation library Lucas Amaral
` (3 preceding siblings ...)
2026-03-15 3:41 ` [PATCH v3 4/6] target/arm/emulate: add load/store exclusive Lucas Amaral
@ 2026-03-15 3:41 ` Lucas Amaral
2026-03-15 3:41 ` [PATCH v3 6/6] target/arm/hvf, whpx: wire ISV=0 emulation for data aborts Lucas Amaral
2026-03-16 2:50 ` [PATCH v4 0/6] target/arm: ISV=0 data abort emulation library Lucas Amaral
6 siblings, 0 replies; 25+ messages in thread
From: Lucas Amaral @ 2026-03-15 3:41 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-arm, agraf, Lucas Amaral
Add emulation for remaining ISV=0 load/store instruction classes.
Atomic memory operations (DDI 0487 C3.3.2):
- LDADD, LDCLR, LDEOR, LDSET: arithmetic/logic atomics
- LDSMAX, LDSMIN, LDUMAX, LDUMIN: signed/unsigned min/max
- SWP: atomic swap
Non-atomic read-modify-write, sufficient for MMIO where concurrent
access is not a concern. Acquire/release semantics are ignored.
Compare-and-swap (DDI 0487 C3.3.1):
- CAS/CASA/CASAL/CASL: single-register compare-and-swap
- CASP/CASPA/CASPAL/CASPL: register-pair compare-and-swap
CASP validates even register pairs; odd or r31 returns UNHANDLED.
Load with PAC (DDI 0487 C6.2.121):
- LDRAA/LDRAB: pointer-authenticated load, offset/pre-indexed
Pointer authentication is not emulated (equivalent to auth always
succeeding), which is correct for MMIO since PAC is a software
security mechanism, not a memory access semantic.
CASP uses two explicit decode patterns for the 32/64-bit size
variants. LDRA's offset immediate is stored raw in the decode;
the handler scales by << 3.
Signed-off-by: Lucas Amaral <lucaaamaral@gmail.com>
---
target/arm/emulate/a64-ldst.decode | 45 ++++++
target/arm/emulate/arm_emulate.c | 233 +++++++++++++++++++++++++++++
2 files changed, 278 insertions(+)
diff --git a/target/arm/emulate/a64-ldst.decode b/target/arm/emulate/a64-ldst.decode
index fadf6fd2..9292bfdf 100644
--- a/target/arm/emulate/a64-ldst.decode
+++ b/target/arm/emulate/a64-ldst.decode
@@ -16,6 +16,16 @@
# Load/store pair (GPR and SIMD/FP)
&ldstpair rt2 rt rn imm sz sign w p
+# Atomic memory operations
+&atomic rs rn rt a r sz
+
+# Compare-and-swap
+&cas rs rn rt sz a r
+
+# Load with PAC (LDRAA/LDRAB, FEAT_PAuth)
+%ldra_imm 22:s1 12:9
+&ldra rt rn imm m w
+
# Load/store register offset
&ldst rm rn rt sign ext sz opt s
@@ -36,6 +46,15 @@
# Load/store pair: imm7 is signed, scaled by element size in handler
@ldstpair .. ... . ... . imm:s7 rt2:5 rn:5 rt:5 &ldstpair
+# Atomics
+@atomic sz:2 ... . .. a:1 r:1 . rs:5 . ... .. rn:5 rt:5 &atomic
+
+# Compare-and-swap: sz extracted by pattern (CAS) or set constant (CASP)
+@cas .. ...... . a:1 . rs:5 r:1 ..... rn:5 rt:5 &cas
+
+# Load with PAC
+@ldra .. ... . .. m:1 . . ......... w:1 . rn:5 rt:5 &ldra imm=%ldra_imm
+
# Load/store register offset
@ldst .. ... . .. .. . rm:5 opt:3 s:1 .. rn:5 rt:5 &ldst
@@ -241,6 +260,32 @@ STR_v 00 111 1 00 10 1 ..... ... . 10 ..... ..... @ldst sign=0 ext=
LDR_v sz:2 111 1 00 01 1 ..... ... . 10 ..... ..... @ldst sign=0 ext=0
LDR_v 00 111 1 00 11 1 ..... ... . 10 ..... ..... @ldst sign=0 ext=0 sz=4
+### Compare-and-swap
+
+# CAS / CASA / CASAL / CASL
+CAS sz:2 001000 1 . 1 ..... . 11111 ..... ..... @cas
+
+# CASP / CASPA / CASPAL / CASPL (pair: Rt,Rt+1 and Rs,Rs+1)
+CASP 00 001000 0 . 1 ..... . 11111 ..... ..... @cas sz=2
+CASP 01 001000 0 . 1 ..... . 11111 ..... ..... @cas sz=3
+
+### Atomic memory operations
+
+LDADD .. 111 0 00 . . 1 ..... 0000 00 ..... ..... @atomic
+LDCLR .. 111 0 00 . . 1 ..... 0001 00 ..... ..... @atomic
+LDEOR .. 111 0 00 . . 1 ..... 0010 00 ..... ..... @atomic
+LDSET .. 111 0 00 . . 1 ..... 0011 00 ..... ..... @atomic
+LDSMAX .. 111 0 00 . . 1 ..... 0100 00 ..... ..... @atomic
+LDSMIN .. 111 0 00 . . 1 ..... 0101 00 ..... ..... @atomic
+LDUMAX .. 111 0 00 . . 1 ..... 0110 00 ..... ..... @atomic
+LDUMIN .. 111 0 00 . . 1 ..... 0111 00 ..... ..... @atomic
+SWP .. 111 0 00 . . 1 ..... 1000 00 ..... ..... @atomic
+
+### Load with PAC (FEAT_PAuth)
+
+# LDRAA (M=0) / LDRAB (M=1), offset (W=0) / pre-indexed (W=1)
+LDRA 11 111 0 00 . . 1 ......... . 1 ..... ..... @ldra
+
### System instructions — DC cache maintenance
# SYS with CRn=C7 covers all data cache operations (DC CIVAC, CVAC, etc.).
diff --git a/target/arm/emulate/arm_emulate.c b/target/arm/emulate/arm_emulate.c
index fd567e65..1b959745 100644
--- a/target/arm/emulate/arm_emulate.c
+++ b/target/arm/emulate/arm_emulate.c
@@ -488,6 +488,239 @@ static bool trans_LDXP(DisasContext *ctx, arg_stxr *a)
return true;
}
+/*
+ * Atomic memory operations (DDI 0487 C3.3.2)
+ *
+ * Non-atomic read-modify-write; sufficient for MMIO.
+ * Acquire/release semantics ignored (sequentially consistent by design).
+ */
+
+typedef uint64_t (*atomic_op_fn)(uint64_t old, uint64_t operand, int bits);
+
+static uint64_t atomic_add(uint64_t old, uint64_t op, int bits)
+{
+ (void)bits;
+ return old + op;
+}
+
+static uint64_t atomic_clr(uint64_t old, uint64_t op, int bits)
+{
+ (void)bits;
+ return old & ~op;
+}
+
+static uint64_t atomic_eor(uint64_t old, uint64_t op, int bits)
+{
+ (void)bits;
+ return old ^ op;
+}
+
+static uint64_t atomic_set(uint64_t old, uint64_t op, int bits)
+{
+ (void)bits;
+ return old | op;
+}
+
+static uint64_t atomic_smax(uint64_t old, uint64_t op, int bits)
+{
+ int64_t a = sign_extend(old, bits);
+ int64_t b = sign_extend(op, bits);
+ return (a >= b) ? old : op;
+}
+
+static uint64_t atomic_smin(uint64_t old, uint64_t op, int bits)
+{
+ int64_t a = sign_extend(old, bits);
+ int64_t b = sign_extend(op, bits);
+ return (a <= b) ? old : op;
+}
+
+static uint64_t atomic_umax(uint64_t old, uint64_t op, int bits)
+{
+ uint64_t mask = (bits == 64) ? UINT64_MAX : (1ULL << bits) - 1;
+ return ((old & mask) >= (op & mask)) ? old : op;
+}
+
+static uint64_t atomic_umin(uint64_t old, uint64_t op, int bits)
+{
+ uint64_t mask = (bits == 64) ? UINT64_MAX : (1ULL << bits) - 1;
+ return ((old & mask) <= (op & mask)) ? old : op;
+}
+
+static bool do_atomic(DisasContext *ctx, arg_atomic *a, atomic_op_fn fn)
+{
+ int esize = 1 << a->sz;
+ int bits = 8 * esize;
+ uint64_t va = base_read(ctx, a->rn);
+ uint64_t old = 0;
+
+ if (mem_read(ctx, va, &old, esize) != 0) {
+ return true;
+ }
+
+ uint64_t operand = gpr_read(ctx, a->rs);
+ uint64_t result = fn(old, operand, bits);
+
+ if (mem_write(ctx, va, &result, esize) != 0) {
+ return true;
+ }
+
+ /* Rt receives the old value (before modification) */
+ gpr_write(ctx, a->rt, old);
+ return true;
+}
+
+static bool trans_LDADD(DisasContext *ctx, arg_atomic *a)
+{
+ return do_atomic(ctx, a, atomic_add);
+}
+
+static bool trans_LDCLR(DisasContext *ctx, arg_atomic *a)
+{
+ return do_atomic(ctx, a, atomic_clr);
+}
+
+static bool trans_LDEOR(DisasContext *ctx, arg_atomic *a)
+{
+ return do_atomic(ctx, a, atomic_eor);
+}
+
+static bool trans_LDSET(DisasContext *ctx, arg_atomic *a)
+{
+ return do_atomic(ctx, a, atomic_set);
+}
+
+static bool trans_LDSMAX(DisasContext *ctx, arg_atomic *a)
+{
+ return do_atomic(ctx, a, atomic_smax);
+}
+
+static bool trans_LDSMIN(DisasContext *ctx, arg_atomic *a)
+{
+ return do_atomic(ctx, a, atomic_smin);
+}
+
+static bool trans_LDUMAX(DisasContext *ctx, arg_atomic *a)
+{
+ return do_atomic(ctx, a, atomic_umax);
+}
+
+static bool trans_LDUMIN(DisasContext *ctx, arg_atomic *a)
+{
+ return do_atomic(ctx, a, atomic_umin);
+}
+
+static bool trans_SWP(DisasContext *ctx, arg_atomic *a)
+{
+ int esize = 1 << a->sz;
+ uint64_t va = base_read(ctx, a->rn);
+ uint64_t old = 0;
+
+ if (mem_read(ctx, va, &old, esize) != 0) {
+ return true;
+ }
+
+ uint64_t newval = gpr_read(ctx, a->rs);
+ if (mem_write(ctx, va, &newval, esize) != 0) {
+ return true;
+ }
+
+ gpr_write(ctx, a->rt, old);
+ return true;
+}
+
+/* Compare-and-swap: CAS, CASP (DDI 0487 C3.3.1) */
+
+static bool trans_CAS(DisasContext *ctx, arg_cas *a)
+{
+ int esize = 1 << a->sz;
+ uint64_t va = base_read(ctx, a->rn);
+ uint64_t current = 0;
+
+ if (mem_read(ctx, va, &current, esize) != 0) {
+ return true;
+ }
+
+ uint64_t mask = (esize == 8) ? UINT64_MAX : (1ULL << (8 * esize)) - 1;
+ uint64_t compare = gpr_read(ctx, a->rs) & mask;
+
+ if ((current & mask) == compare) {
+ uint64_t newval = gpr_read(ctx, a->rt) & mask;
+ if (mem_write(ctx, va, &newval, esize) != 0) {
+ return true;
+ }
+ }
+
+ /* Rs receives the old memory value (whether or not swap occurred) */
+ gpr_write(ctx, a->rs, current);
+ return true;
+}
+
+/* CASP: compare-and-swap pair (Rs,Rs+1 compared; Rt,Rt+1 stored) */
+static bool trans_CASP(DisasContext *ctx, arg_cas *a)
+{
+ /* CASP requires even register pairs; odd or r31 is UNPREDICTABLE */
+ if ((a->rs & 1) || a->rs >= 31 || (a->rt & 1) || a->rt >= 31) {
+ return false;
+ }
+
+ int esize = 1 << a->sz; /* per-register size */
+ uint64_t va = base_read(ctx, a->rn);
+ uint8_t buf[16];
+ uint64_t cur1 = 0, cur2 = 0;
+
+ if (mem_read(ctx, va, buf, 2 * esize) != 0) {
+ return true;
+ }
+ memcpy(&cur1, buf, esize);
+ memcpy(&cur2, buf + esize, esize);
+
+ uint64_t mask = (esize == 8) ? UINT64_MAX : (1ULL << (8 * esize)) - 1;
+ uint64_t cmp1 = gpr_read(ctx, a->rs) & mask;
+ uint64_t cmp2 = gpr_read(ctx, a->rs + 1) & mask;
+
+ if ((cur1 & mask) == cmp1 && (cur2 & mask) == cmp2) {
+ uint64_t new1 = gpr_read(ctx, a->rt) & mask;
+ uint64_t new2 = gpr_read(ctx, a->rt + 1) & mask;
+ memcpy(buf, &new1, esize);
+ memcpy(buf + esize, &new2, esize);
+ if (mem_write(ctx, va, buf, 2 * esize) != 0) {
+ return true;
+ }
+ }
+
+ gpr_write(ctx, a->rs, cur1);
+ gpr_write(ctx, a->rs + 1, cur2);
+ return true;
+}
+
+/*
+ * Load with PAC: LDRAA / LDRAB (FEAT_PAuth)
+ * (DDI 0487 C6.2.121)
+ *
+ * Pointer authentication is not emulated -- the base register is used
+ * directly (equivalent to auth always succeeding).
+ */
+
+static bool trans_LDRA(DisasContext *ctx, arg_ldra *a)
+{
+ int64_t offset = (int64_t)a->imm << 3; /* S:imm9, scaled by 8 */
+ uint64_t base = base_read(ctx, a->rn);
+ uint64_t va = base + offset; /* auth not emulated */
+ uint64_t val = 0;
+
+ if (mem_read(ctx, va, &val, 8) != 0) {
+ return true;
+ }
+
+ gpr_write(ctx, a->rt, val);
+
+ if (a->w) {
+ base_write(ctx, a->rn, va);
+ }
+ return true;
+}
+
/* PRFM, DC cache maintenance -- treated as NOP */
static bool trans_NOP(DisasContext *ctx, arg_NOP *a)
{
--
2.52.0
* [PATCH v3 6/6] target/arm/hvf, whpx: wire ISV=0 emulation for data aborts
2026-03-15 3:41 ` [PATCH v3 0/6] target/arm: ISV=0 data abort emulation library Lucas Amaral
` (4 preceding siblings ...)
2026-03-15 3:41 ` [PATCH v3 5/6] target/arm/emulate: add atomic, compare-and-swap, and PAC load Lucas Amaral
@ 2026-03-15 3:41 ` Lucas Amaral
2026-03-16 2:50 ` [PATCH v4 0/6] target/arm: ISV=0 data abort emulation library Lucas Amaral
6 siblings, 0 replies; 25+ messages in thread
From: Lucas Amaral @ 2026-03-15 3:41 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-arm, agraf, Lucas Amaral
When a data abort with ISV=0 occurs during MMIO emulation, the
syndrome register does not carry the access size or target register.
Previously this hit an assert(isv) and killed the VM.
Replace the assert with instruction fetch + decode + emulate using the
shared library in target/arm/emulate/. The faulting instruction is read
from guest memory via cpu_memory_rw_debug(), decoded by the decodetree-
generated decoder, and emulated against the vCPU register file.
Both HVF (macOS) and WHPX (Windows Hyper-V) use the same pattern:
1. cpu_synchronize_state() to flush hypervisor registers
2. Fetch 4-byte instruction at env->pc
3. arm_emul_insn(env, insn)
4. Log errors for unhandled/memory-fault cases, advance PC
This makes ISV=0 data aborts non-fatal, enabling MMIO access from
SIMD/FP loads, load/store pairs, atomics, and other instructions
that hardware does not decode into the syndrome.
Signed-off-by: Lucas Amaral <lucaaamaral@gmail.com>
---
target/arm/hvf/hvf.c | 41 +++++++++++++++++++++++++++++++++++---
target/arm/whpx/whpx-all.c | 39 +++++++++++++++++++++++++++++++++++-
2 files changed, 76 insertions(+), 4 deletions(-)
diff --git a/target/arm/hvf/hvf.c b/target/arm/hvf/hvf.c
index 5fc8f6bb..219dbbca 100644
--- a/target/arm/hvf/hvf.c
+++ b/target/arm/hvf/hvf.c
@@ -32,6 +32,7 @@
#include "arm-powerctl.h"
#include "target/arm/cpu.h"
#include "target/arm/internals.h"
+#include "emulate/arm_emulate.h"
#include "target/arm/multiprocessing.h"
#include "target/arm/gtimer.h"
#include "target/arm/trace.h"
@@ -2175,10 +2176,44 @@ static int hvf_handle_exception(CPUState *cpu, hv_vcpu_exit_exception_t *excp)
assert(!s1ptw);
/*
- * TODO: ISV will be 0 for SIMD or SVE accesses.
- * Inject the exception into the guest.
+ * ISV=0: syndrome doesn't carry access size/register info.
+ * Fetch and emulate via target/arm/emulate/.
+ * Unhandled instructions log an error and advance PC.
*/
- assert(isv);
+ if (!isv) {
+ ARMCPU *arm_cpu = ARM_CPU(cpu);
+ CPUARMState *env = &arm_cpu->env;
+ uint32_t insn;
+ ArmEmulResult r;
+
+ cpu_synchronize_state(cpu);
+
+ if (cpu_memory_rw_debug(cpu, env->pc,
+ (uint8_t *)&insn, 4, false) != 0) {
+ error_report("HVF: cannot read insn at pc=0x%" PRIx64,
+ (uint64_t)env->pc);
+ advance_pc = true;
+ break;
+ }
+
+ r = arm_emul_insn(env, insn);
+ if (r == ARM_EMUL_UNHANDLED) {
+ /*
+ * TODO: Inject data abort into guest instead of
+ * advancing PC. Requires setting ESR_EL1/FAR_EL1/
+ * ELR_EL1/SPSR_EL1 and redirecting to VBAR_EL1.
+ */
+ error_report("HVF: ISV=0 unhandled insn 0x%08x at "
+ "pc=0x%" PRIx64, insn, (uint64_t)env->pc);
+ } else if (r == ARM_EMUL_ERR_MEM) {
+ error_report("HVF: ISV=0 memory error emulating "
+ "insn 0x%08x at pc=0x%" PRIx64,
+ insn, (uint64_t)env->pc);
+ }
+
+ advance_pc = true;
+ break;
+ }
/*
* Emulate MMIO.
diff --git a/target/arm/whpx/whpx-all.c b/target/arm/whpx/whpx-all.c
index 513551be..2f8ffc7f 100644
--- a/target/arm/whpx/whpx-all.c
+++ b/target/arm/whpx/whpx-all.c
@@ -29,6 +29,7 @@
#include "syndrome.h"
#include "target/arm/cpregs.h"
#include "internals.h"
+#include "emulate/arm_emulate.h"
#include "system/whpx-internal.h"
#include "system/whpx-accel-ops.h"
@@ -366,7 +367,43 @@ static int whpx_handle_mmio(CPUState *cpu, WHV_MEMORY_ACCESS_CONTEXT *ctx)
uint64_t val = 0;
assert(!cm);
- assert(isv);
+
+ /*
+ * ISV=0: syndrome doesn't carry access size/register info.
+ * Fetch and decode the faulting instruction via the emulation library.
+ */
+ if (!isv) {
+ ARMCPU *arm_cpu = ARM_CPU(cpu);
+ CPUARMState *env = &arm_cpu->env;
+ uint32_t insn;
+ ArmEmulResult r;
+
+ cpu_synchronize_state(cpu);
+
+ if (cpu_memory_rw_debug(cpu, env->pc,
+ (uint8_t *)&insn, 4, false) != 0) {
+ error_report("WHPX: cannot read insn at pc=0x%" PRIx64,
+ (uint64_t)env->pc);
+ return 0;
+ }
+
+ r = arm_emul_insn(env, insn);
+ if (r == ARM_EMUL_UNHANDLED) {
+ /*
+ * TODO: Inject data abort into guest instead of
+ * advancing PC. Requires setting ESR_EL1/FAR_EL1/
+ * ELR_EL1/SPSR_EL1 and redirecting to VBAR_EL1.
+ */
+ error_report("WHPX: ISV=0 unhandled insn 0x%08x at "
+ "pc=0x%" PRIx64, insn, (uint64_t)env->pc);
+ } else if (r == ARM_EMUL_ERR_MEM) {
+ error_report("WHPX: ISV=0 memory error emulating "
+ "insn 0x%08x at pc=0x%" PRIx64,
+ insn, (uint64_t)env->pc);
+ }
+
+ return 0;
+ }
if (iswrite) {
val = whpx_get_gp_reg(cpu, srt);
--
2.52.0
* [PATCH v4 0/6] target/arm: ISV=0 data abort emulation library
2026-03-15 3:41 ` [PATCH v3 0/6] target/arm: ISV=0 data abort emulation library Lucas Amaral
` (5 preceding siblings ...)
2026-03-15 3:41 ` [PATCH v3 6/6] target/arm/hvf, whpx: wire ISV=0 emulation for data aborts Lucas Amaral
@ 2026-03-16 2:50 ` Lucas Amaral
2026-03-16 2:50 ` [PATCH v4 1/6] target/arm/emulate: add ISV=0 emulation library with load/store immediate Lucas Amaral
` (6 more replies)
6 siblings, 7 replies; 25+ messages in thread
From: Lucas Amaral @ 2026-03-16 2:50 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-arm, agraf, peter.maydell, mohamed, Lucas Amaral
Add a shared emulation library for AArch64 load/store instructions that
cause ISV=0 data aborts under hardware virtualization, and wire it into
HVF (macOS) and WHPX (Windows).
When the Instruction Syndrome Valid bit is clear, the hypervisor cannot
determine the faulting instruction's target register or access size from
the syndrome alone. This previously hit an assert(isv) and killed the
VM. The library fetches and decodes the faulting instruction using a
decodetree-generated decoder, then emulates it directly against the vCPU
register file and memory.
The library uses its own a64-ldst.decode rather than sharing
target/arm/tcg/a64.decode: TCG's trans_* functions emit IR into a
translation block, while this library's trans_* functions execute
directly. Decode patterns are kept consistent with TCG's where
possible.
Changes since v3:
- Inject synchronous external abort (matching kvm_inject_arm_sea()
syndrome) on unhandled instruction or memory error, instead of
silently advancing PC or returning an error.
- Fix WHPX advance_pc bug: error paths no longer advance PC.
- Add page-crossing guard in mem_read/mem_write to prevent partial
side effects from cpu_memory_rw_debug().
Changes since v2:
- Split monolithic patch into 6 incremental patches: framework, then
one patch per coherent instruction group (Peter)
- Removed per-backend callback ops; library uses CPUArchState directly
with cpu_memory_rw_debug() for memory access (Mohamed)
- Removed mock unit tests (Mohamed; kvm-unit-tests is the right
vehicle for decoder validation)
- Added architectural justification for separate decode file
Lucas Amaral (6):
target/arm/emulate: add ISV=0 emulation library with load/store
immediate
target/arm/emulate: add load/store register offset
target/arm/emulate: add load/store pair
target/arm/emulate: add load/store exclusive
target/arm/emulate: add atomic, compare-and-swap, and PAC load
target/arm/hvf,whpx: wire ISV=0 emulation for data aborts
target/arm/emulate/a64-ldst.decode | 293 +++++++++++
target/arm/emulate/arm_emulate.c | 758 +++++++++++++++++++++++++++++
target/arm/emulate/arm_emulate.h | 30 ++
target/arm/emulate/meson.build | 6 +
target/arm/hvf/hvf.c | 46 +-
target/arm/meson.build | 1 +
target/arm/whpx/whpx-all.c | 61 ++-
7 files changed, 1191 insertions(+), 4 deletions(-)
create mode 100644 target/arm/emulate/a64-ldst.decode
create mode 100644 target/arm/emulate/arm_emulate.c
create mode 100644 target/arm/emulate/arm_emulate.h
create mode 100644 target/arm/emulate/meson.build
--
2.52.0
* [PATCH v4 1/6] target/arm/emulate: add ISV=0 emulation library with load/store immediate
2026-03-16 2:50 ` [PATCH v4 0/6] target/arm: ISV=0 data abort emulation library Lucas Amaral
@ 2026-03-16 2:50 ` Lucas Amaral
2026-03-19 22:00 ` Richard Henderson
2026-03-16 2:50 ` [PATCH v4 2/6] target/arm/emulate: add load/store register offset Lucas Amaral
` (5 subsequent siblings)
6 siblings, 1 reply; 25+ messages in thread
From: Lucas Amaral @ 2026-03-16 2:50 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-arm, agraf, peter.maydell, mohamed, Lucas Amaral
Add a shared emulation library for AArch64 load/store instructions that
cause ISV=0 data aborts under hardware virtualization (HVF, WHPX).
When the Instruction Syndrome Valid bit is clear, the hypervisor cannot
determine the faulting instruction's target register or access size from
the syndrome alone. This library fetches and decodes the instruction
using a decodetree-generated decoder, then emulates it by accessing the
vCPU's register file (CPUARMState) and memory (cpu_memory_rw_debug)
directly.
This patch establishes the framework and adds load/store single with
immediate addressing — the most common ISV=0 trigger. Subsequent
patches add register-offset, pair, exclusive, and atomic instructions.
Instruction coverage:
- STR/LDR (GPR): unscaled, post-indexed, unprivileged, pre-indexed,
unsigned offset — all sizes (8/16/32/64-bit), sign/zero extension
- STR/LDR (SIMD/FP): same addressing modes, 8-128 bit elements
- PRFM: prefetch treated as NOP
- DC cache maintenance (SYS CRn=C7): NOP on MMIO
This library uses its own a64-ldst.decode rather than sharing
target/arm/tcg/a64.decode. TCG's trans_* functions are a compiler:
they emit IR ops into a translation block for later execution. This
library's trans_* functions are an interpreter: they execute directly
against the vCPU register file and memory. The decodetree-generated
dispatcher calls trans_* by name, so both cannot coexist in the same
translation unit. Decode patterns are kept consistent with TCG's
where possible.
Signed-off-by: Lucas Amaral <lucaaamaral@gmail.com>
---
target/arm/emulate/a64-ldst.decode | 129 ++++++++++++++++
target/arm/emulate/arm_emulate.c | 237 +++++++++++++++++++++++++++++
target/arm/emulate/arm_emulate.h | 30 ++++
target/arm/emulate/meson.build | 6 +
target/arm/meson.build | 1 +
5 files changed, 403 insertions(+)
create mode 100644 target/arm/emulate/a64-ldst.decode
create mode 100644 target/arm/emulate/arm_emulate.c
create mode 100644 target/arm/emulate/arm_emulate.h
create mode 100644 target/arm/emulate/meson.build
diff --git a/target/arm/emulate/a64-ldst.decode b/target/arm/emulate/a64-ldst.decode
new file mode 100644
index 00000000..c887dcba
--- /dev/null
+++ b/target/arm/emulate/a64-ldst.decode
@@ -0,0 +1,129 @@
+# AArch64 load/store instruction patterns for ISV=0 emulation
+#
+# Copyright (c) 2026 Lucas Amaral <lucaaamaral@gmail.com>
+#
+# SPDX-License-Identifier: GPL-2.0-or-later
+
+### Argument sets
+
+# Load/store immediate (unscaled, pre/post-index, unprivileged, unsigned offset)
+# 'u' flag: 0 = 9-bit signed immediate (byte offset), 1 = 12-bit unsigned (needs << sz)
+&ldst_imm rt rn imm sz sign w p unpriv ext u
+
+### Format templates
+
+# Load/store immediate (9-bit signed)
+@ldst_imm .. ... . .. .. . imm:s9 .. rn:5 rt:5 &ldst_imm u=0 unpriv=0 p=0 w=0
+@ldst_imm_pre .. ... . .. .. . imm:s9 .. rn:5 rt:5 &ldst_imm u=0 unpriv=0 p=0 w=1
+@ldst_imm_post .. ... . .. .. . imm:s9 .. rn:5 rt:5 &ldst_imm u=0 unpriv=0 p=1 w=1
+@ldst_imm_user .. ... . .. .. . imm:s9 .. rn:5 rt:5 &ldst_imm u=0 unpriv=1 p=0 w=0
+
+# Load/store unsigned offset (12-bit, handler scales by << sz)
+@ldst_uimm .. ... . .. .. imm:12 rn:5 rt:5 &ldst_imm u=1 unpriv=0 p=0 w=0
+
+### Load/store register — unscaled immediate (LDUR/STUR)
+
+# GPR
+STR_i sz:2 111 0 00 00 0 ......... 00 ..... ..... @ldst_imm sign=0 ext=0
+LDR_i 00 111 0 00 01 0 ......... 00 ..... ..... @ldst_imm sign=0 ext=1 sz=0
+LDR_i 01 111 0 00 01 0 ......... 00 ..... ..... @ldst_imm sign=0 ext=1 sz=1
+LDR_i 10 111 0 00 01 0 ......... 00 ..... ..... @ldst_imm sign=0 ext=1 sz=2
+LDR_i 11 111 0 00 01 0 ......... 00 ..... ..... @ldst_imm sign=0 ext=0 sz=3
+LDR_i 00 111 0 00 10 0 ......... 00 ..... ..... @ldst_imm sign=1 ext=0 sz=0
+LDR_i 01 111 0 00 10 0 ......... 00 ..... ..... @ldst_imm sign=1 ext=0 sz=1
+LDR_i 10 111 0 00 10 0 ......... 00 ..... ..... @ldst_imm sign=1 ext=0 sz=2
+LDR_i 00 111 0 00 11 0 ......... 00 ..... ..... @ldst_imm sign=1 ext=1 sz=0
+LDR_i 01 111 0 00 11 0 ......... 00 ..... ..... @ldst_imm sign=1 ext=1 sz=1
+
+# SIMD/FP
+STR_v_i sz:2 111 1 00 00 0 ......... 00 ..... ..... @ldst_imm sign=0 ext=0
+STR_v_i 00 111 1 00 10 0 ......... 00 ..... ..... @ldst_imm sign=0 ext=0 sz=4
+LDR_v_i sz:2 111 1 00 01 0 ......... 00 ..... ..... @ldst_imm sign=0 ext=0
+LDR_v_i 00 111 1 00 11 0 ......... 00 ..... ..... @ldst_imm sign=0 ext=0 sz=4
+
+### Load/store register — post-indexed
+
+# GPR
+STR_i sz:2 111 0 00 00 0 ......... 01 ..... ..... @ldst_imm_post sign=0 ext=0
+LDR_i 00 111 0 00 01 0 ......... 01 ..... ..... @ldst_imm_post sign=0 ext=1 sz=0
+LDR_i 01 111 0 00 01 0 ......... 01 ..... ..... @ldst_imm_post sign=0 ext=1 sz=1
+LDR_i 10 111 0 00 01 0 ......... 01 ..... ..... @ldst_imm_post sign=0 ext=1 sz=2
+LDR_i 11 111 0 00 01 0 ......... 01 ..... ..... @ldst_imm_post sign=0 ext=0 sz=3
+LDR_i 00 111 0 00 10 0 ......... 01 ..... ..... @ldst_imm_post sign=1 ext=0 sz=0
+LDR_i 01 111 0 00 10 0 ......... 01 ..... ..... @ldst_imm_post sign=1 ext=0 sz=1
+LDR_i 10 111 0 00 10 0 ......... 01 ..... ..... @ldst_imm_post sign=1 ext=0 sz=2
+LDR_i 00 111 0 00 11 0 ......... 01 ..... ..... @ldst_imm_post sign=1 ext=1 sz=0
+LDR_i 01 111 0 00 11 0 ......... 01 ..... ..... @ldst_imm_post sign=1 ext=1 sz=1
+
+# SIMD/FP
+STR_v_i sz:2 111 1 00 00 0 ......... 01 ..... ..... @ldst_imm_post sign=0 ext=0
+STR_v_i 00 111 1 00 10 0 ......... 01 ..... ..... @ldst_imm_post sign=0 ext=0 sz=4
+LDR_v_i sz:2 111 1 00 01 0 ......... 01 ..... ..... @ldst_imm_post sign=0 ext=0
+LDR_v_i 00 111 1 00 11 0 ......... 01 ..... ..... @ldst_imm_post sign=0 ext=0 sz=4
+
+### Load/store register — unprivileged
+
+# GPR only (no SIMD/FP unprivileged forms)
+STR_i sz:2 111 0 00 00 0 ......... 10 ..... ..... @ldst_imm_user sign=0 ext=0
+LDR_i 00 111 0 00 01 0 ......... 10 ..... ..... @ldst_imm_user sign=0 ext=1 sz=0
+LDR_i 01 111 0 00 01 0 ......... 10 ..... ..... @ldst_imm_user sign=0 ext=1 sz=1
+LDR_i 10 111 0 00 01 0 ......... 10 ..... ..... @ldst_imm_user sign=0 ext=1 sz=2
+LDR_i 11 111 0 00 01 0 ......... 10 ..... ..... @ldst_imm_user sign=0 ext=0 sz=3
+LDR_i 00 111 0 00 10 0 ......... 10 ..... ..... @ldst_imm_user sign=1 ext=0 sz=0
+LDR_i 01 111 0 00 10 0 ......... 10 ..... ..... @ldst_imm_user sign=1 ext=0 sz=1
+LDR_i 10 111 0 00 10 0 ......... 10 ..... ..... @ldst_imm_user sign=1 ext=0 sz=2
+LDR_i 00 111 0 00 11 0 ......... 10 ..... ..... @ldst_imm_user sign=1 ext=1 sz=0
+LDR_i 01 111 0 00 11 0 ......... 10 ..... ..... @ldst_imm_user sign=1 ext=1 sz=1
+
+### Load/store register — pre-indexed
+
+# GPR
+STR_i sz:2 111 0 00 00 0 ......... 11 ..... ..... @ldst_imm_pre sign=0 ext=0
+LDR_i 00 111 0 00 01 0 ......... 11 ..... ..... @ldst_imm_pre sign=0 ext=1 sz=0
+LDR_i 01 111 0 00 01 0 ......... 11 ..... ..... @ldst_imm_pre sign=0 ext=1 sz=1
+LDR_i 10 111 0 00 01 0 ......... 11 ..... ..... @ldst_imm_pre sign=0 ext=1 sz=2
+LDR_i 11 111 0 00 01 0 ......... 11 ..... ..... @ldst_imm_pre sign=0 ext=0 sz=3
+LDR_i 00 111 0 00 10 0 ......... 11 ..... ..... @ldst_imm_pre sign=1 ext=0 sz=0
+LDR_i 01 111 0 00 10 0 ......... 11 ..... ..... @ldst_imm_pre sign=1 ext=0 sz=1
+LDR_i 10 111 0 00 10 0 ......... 11 ..... ..... @ldst_imm_pre sign=1 ext=0 sz=2
+LDR_i 00 111 0 00 11 0 ......... 11 ..... ..... @ldst_imm_pre sign=1 ext=1 sz=0
+LDR_i 01 111 0 00 11 0 ......... 11 ..... ..... @ldst_imm_pre sign=1 ext=1 sz=1
+
+# SIMD/FP
+STR_v_i sz:2 111 1 00 00 0 ......... 11 ..... ..... @ldst_imm_pre sign=0 ext=0
+STR_v_i 00 111 1 00 10 0 ......... 11 ..... ..... @ldst_imm_pre sign=0 ext=0 sz=4
+LDR_v_i sz:2 111 1 00 01 0 ......... 11 ..... ..... @ldst_imm_pre sign=0 ext=0
+LDR_v_i 00 111 1 00 11 0 ......... 11 ..... ..... @ldst_imm_pre sign=0 ext=0 sz=4
+
+### PRFM — unscaled immediate: prefetch is a NOP
+
+NOP 11 111 0 00 10 0 --------- 00 ----- -----
+
+### Load/store register — unsigned offset
+
+# GPR
+STR_i sz:2 111 0 01 00 ............ ..... ..... @ldst_uimm sign=0 ext=0
+LDR_i 00 111 0 01 01 ............ ..... ..... @ldst_uimm sign=0 ext=1 sz=0
+LDR_i 01 111 0 01 01 ............ ..... ..... @ldst_uimm sign=0 ext=1 sz=1
+LDR_i 10 111 0 01 01 ............ ..... ..... @ldst_uimm sign=0 ext=1 sz=2
+LDR_i 11 111 0 01 01 ............ ..... ..... @ldst_uimm sign=0 ext=0 sz=3
+LDR_i 00 111 0 01 10 ............ ..... ..... @ldst_uimm sign=1 ext=0 sz=0
+LDR_i 01 111 0 01 10 ............ ..... ..... @ldst_uimm sign=1 ext=0 sz=1
+LDR_i 10 111 0 01 10 ............ ..... ..... @ldst_uimm sign=1 ext=0 sz=2
+LDR_i 00 111 0 01 11 ............ ..... ..... @ldst_uimm sign=1 ext=1 sz=0
+LDR_i 01 111 0 01 11 ............ ..... ..... @ldst_uimm sign=1 ext=1 sz=1
+
+# PRFM — unsigned offset
+NOP 11 111 0 01 10 ------------ ----- -----
+
+# SIMD/FP
+STR_v_i sz:2 111 1 01 00 ............ ..... ..... @ldst_uimm sign=0 ext=0
+STR_v_i 00 111 1 01 10 ............ ..... ..... @ldst_uimm sign=0 ext=0 sz=4
+LDR_v_i sz:2 111 1 01 01 ............ ..... ..... @ldst_uimm sign=0 ext=0
+LDR_v_i 00 111 1 01 11 ............ ..... ..... @ldst_uimm sign=0 ext=0 sz=4
+
+### System instructions — DC cache maintenance
+
+# SYS with CRn=C7 covers all data cache operations (DC CIVAC, CVAC, etc.).
+# On MMIO regions, cache maintenance is a harmless no-op.
+NOP 1101 0101 0000 1 --- 0111 ---- --- -----
diff --git a/target/arm/emulate/arm_emulate.c b/target/arm/emulate/arm_emulate.c
new file mode 100644
index 00000000..02fefc30
--- /dev/null
+++ b/target/arm/emulate/arm_emulate.c
@@ -0,0 +1,237 @@
+/*
+ * AArch64 instruction emulation for ISV=0 data aborts
+ *
+ * Copyright (c) 2026 Lucas Amaral <lucaaamaral@gmail.com>
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#include "arm_emulate.h"
+#include "target/arm/cpu.h"
+#include "exec/cpu-common.h"
+#include "exec/target_page.h"
+
+/* TODO: assumes LE guest data layout (sufficient for HVF/WHPX, both LE-only) */
+
+/* Named "DisasContext" as required by the decodetree code generator */
+typedef struct {
+ CPUState *cpu;
+ CPUARMState *env;
+ ArmEmulResult result;
+} DisasContext;
+
+#include "decode-a64-ldst.c.inc"
+
+/* GPR data access (Rt, Rs, Rt2) -- register 31 = XZR */
+
+static uint64_t gpr_read(DisasContext *ctx, int reg)
+{
+ if (reg == 31) {
+ return 0; /* XZR */
+ }
+ return ctx->env->xregs[reg];
+}
+
+static void gpr_write(DisasContext *ctx, int reg, uint64_t val)
+{
+ if (reg == 31) {
+ return; /* XZR -- discard */
+ }
+ ctx->env->xregs[reg] = val;
+ ctx->cpu->vcpu_dirty = true;
+}
+
+/* Base register access (Rn) -- register 31 = SP */
+
+static uint64_t base_read(DisasContext *ctx, int rn)
+{
+ return ctx->env->xregs[rn];
+}
+
+static void base_write(DisasContext *ctx, int rn, uint64_t val)
+{
+ ctx->env->xregs[rn] = val;
+ ctx->cpu->vcpu_dirty = true;
+}
+
+/* SIMD/FP register access */
+
+static void fpreg_read(DisasContext *ctx, int reg, void *buf, int size)
+{
+ memcpy(buf, &ctx->env->vfp.zregs[reg], size);
+}
+
+static void fpreg_write(DisasContext *ctx, int reg, const void *buf, int size)
+{
+ memset(&ctx->env->vfp.zregs[reg], 0, sizeof(ctx->env->vfp.zregs[reg]));
+ memcpy(&ctx->env->vfp.zregs[reg], buf, size);
+ ctx->cpu->vcpu_dirty = true;
+}
+
+/* Memory access wrappers */
+
+static int mem_read(DisasContext *ctx, uint64_t va, void *buf, int size)
+{
+ if (((va & ~TARGET_PAGE_MASK) + size) > TARGET_PAGE_SIZE) {
+ ctx->result = ARM_EMUL_ERR_MEM;
+ return -1;
+ }
+ int ret = cpu_memory_rw_debug(ctx->cpu, va, buf, size, false);
+ if (ret != 0) {
+ ctx->result = ARM_EMUL_ERR_MEM;
+ }
+ return ret;
+}
+
+static int mem_write(DisasContext *ctx, uint64_t va, const void *buf, int size)
+{
+ if (((va & ~TARGET_PAGE_MASK) + size) > TARGET_PAGE_SIZE) {
+ ctx->result = ARM_EMUL_ERR_MEM;
+ return -1;
+ }
+ int ret = cpu_memory_rw_debug(ctx->cpu, va, (void *)buf, size, true);
+ if (ret != 0) {
+ ctx->result = ARM_EMUL_ERR_MEM;
+ }
+ return ret;
+}
+
+/* Sign/zero extension helpers */
+
+static uint64_t sign_extend(uint64_t val, int from_bits)
+{
+ int shift = 64 - from_bits;
+ return (int64_t)(val << shift) >> shift;
+}
+
+/* Apply sign/zero extension */
+static uint64_t load_extend(uint64_t val, int sz, int sign, int ext)
+{
+ int data_bits = 8 << sz;
+
+ if (sign) {
+ val = sign_extend(val, data_bits);
+ if (ext) {
+ /* Sign-extend to 32 bits (W register) */
+ val &= 0xFFFFFFFF;
+ }
+ } else if (ext) {
+ /* Zero-extend to 32 bits (W register) */
+ val &= 0xFFFFFFFF;
+ }
+ return val;
+}
+
+/* Load/store single -- immediate (GPR) (DDI 0487 C3.3.8 -- C3.3.13) */
+
+static bool trans_STR_i(DisasContext *ctx, arg_ldst_imm *a)
+{
+ int esize = (a->sz <= 3) ? (1 << a->sz) : 16;
+ int64_t offset = a->u ? ((int64_t)(uint64_t)a->imm << a->sz)
+ : (int64_t)a->imm;
+ uint64_t base = base_read(ctx, a->rn);
+ uint64_t va = a->p ? base : base + offset;
+
+ uint64_t val = gpr_read(ctx, a->rt);
+ if (mem_write(ctx, va, &val, esize) != 0) {
+ return true;
+ }
+
+ if (a->w) {
+ base_write(ctx, a->rn, base + offset);
+ }
+ return true;
+}
+
+static bool trans_LDR_i(DisasContext *ctx, arg_ldst_imm *a)
+{
+ int esize = (a->sz <= 3) ? (1 << a->sz) : 16;
+ int64_t offset = a->u ? ((int64_t)(uint64_t)a->imm << a->sz)
+ : (int64_t)a->imm;
+ uint64_t base = base_read(ctx, a->rn);
+ uint64_t va = a->p ? base : base + offset;
+ uint64_t val = 0;
+
+ if (mem_read(ctx, va, &val, esize) != 0) {
+ return true;
+ }
+
+ val = load_extend(val, a->sz, a->sign, a->ext);
+ gpr_write(ctx, a->rt, val);
+
+ if (a->w) {
+ base_write(ctx, a->rn, base + offset);
+ }
+ return true;
+}
+
+/*
+ * Load/store single -- immediate (SIMD/FP)
+ * STR_v_i / LDR_v_i (DDI 0487 C3.3.10)
+ */
+
+static bool trans_STR_v_i(DisasContext *ctx, arg_ldst_imm *a)
+{
+ int esize = (a->sz <= 3) ? (1 << a->sz) : 16;
+ int64_t offset = a->u ? ((int64_t)(uint64_t)a->imm << a->sz)
+ : (int64_t)a->imm;
+ uint64_t base = base_read(ctx, a->rn);
+ uint64_t va = a->p ? base : base + offset;
+ uint8_t buf[16];
+
+ fpreg_read(ctx, a->rt, buf, esize);
+ if (mem_write(ctx, va, buf, esize) != 0) {
+ return true;
+ }
+
+ if (a->w) {
+ base_write(ctx, a->rn, base + offset);
+ }
+ return true;
+}
+
+static bool trans_LDR_v_i(DisasContext *ctx, arg_ldst_imm *a)
+{
+ int esize = (a->sz <= 3) ? (1 << a->sz) : 16;
+ int64_t offset = a->u ? ((int64_t)(uint64_t)a->imm << a->sz)
+ : (int64_t)a->imm;
+ uint64_t base = base_read(ctx, a->rn);
+ uint64_t va = a->p ? base : base + offset;
+ uint8_t buf[16];
+
+ if (mem_read(ctx, va, buf, esize) != 0) {
+ return true;
+ }
+
+ fpreg_write(ctx, a->rt, buf, esize);
+
+ if (a->w) {
+ base_write(ctx, a->rn, base + offset);
+ }
+ return true;
+}
+
+/* PRFM, DC cache maintenance -- treated as NOP */
+static bool trans_NOP(DisasContext *ctx, arg_NOP *a)
+{
+ (void)ctx;
+ (void)a;
+ return true;
+}
+
+/* Entry point */
+
+ArmEmulResult arm_emul_insn(CPUArchState *env, uint32_t insn)
+{
+ DisasContext ctx = {
+ .cpu = env_cpu(env),
+ .env = env,
+ .result = ARM_EMUL_OK,
+ };
+
+ if (!decode_a64_ldst(&ctx, insn)) {
+ return ARM_EMUL_UNHANDLED;
+ }
+
+ return ctx.result;
+}
diff --git a/target/arm/emulate/arm_emulate.h b/target/arm/emulate/arm_emulate.h
new file mode 100644
index 00000000..7fe29839
--- /dev/null
+++ b/target/arm/emulate/arm_emulate.h
@@ -0,0 +1,30 @@
+/*
+ * AArch64 instruction emulation library
+ *
+ * Copyright (c) 2026 Lucas Amaral <lucaaamaral@gmail.com>
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#ifndef ARM_EMULATE_H
+#define ARM_EMULATE_H
+
+#include "qemu/osdep.h"
+
+/**
+ * ArmEmulResult - return status from arm_emul_insn()
+ */
+typedef enum {
+ ARM_EMUL_OK, /* Instruction emulated successfully */
+ ARM_EMUL_UNHANDLED, /* Instruction not recognized by decoder */
+ ARM_EMUL_ERR_MEM, /* Memory access failed */
+} ArmEmulResult;
+
+/**
+ * arm_emul_insn - decode and emulate one AArch64 instruction
+ *
+ * Caller must synchronize CPU state and fetch @insn before calling.
+ */
+ArmEmulResult arm_emul_insn(CPUArchState *env, uint32_t insn);
+
+#endif /* ARM_EMULATE_H */
diff --git a/target/arm/emulate/meson.build b/target/arm/emulate/meson.build
new file mode 100644
index 00000000..c0b38dd1
--- /dev/null
+++ b/target/arm/emulate/meson.build
@@ -0,0 +1,6 @@
+gen_a64_ldst = decodetree.process('a64-ldst.decode',
+ extra_args: ['--static-decode=decode_a64_ldst'])
+
+arm_common_system_ss.add(when: 'TARGET_AARCH64', if_true: [
+ gen_a64_ldst, files('arm_emulate.c')
+])
diff --git a/target/arm/meson.build b/target/arm/meson.build
index 6e0e504a..a4b2291b 100644
--- a/target/arm/meson.build
+++ b/target/arm/meson.build
@@ -57,6 +57,7 @@ arm_common_system_ss.add(files(
'vfp_fpscr.c',
))
+subdir('emulate')
subdir('hvf')
subdir('whpx')
--
2.52.0
^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH v4 2/6] target/arm/emulate: add load/store register offset
2026-03-16 2:50 ` [PATCH v4 0/6] target/arm: ISV=0 data abort emulation library Lucas Amaral
2026-03-16 2:50 ` [PATCH v4 1/6] target/arm/emulate: add ISV=0 emulation library with load/store immediate Lucas Amaral
@ 2026-03-16 2:50 ` Lucas Amaral
2026-03-16 2:50 ` [PATCH v4 3/6] target/arm/emulate: add load/store pair Lucas Amaral
` (4 subsequent siblings)
6 siblings, 0 replies; 25+ messages in thread
From: Lucas Amaral @ 2026-03-16 2:50 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-arm, agraf, peter.maydell, mohamed, Lucas Amaral
Add emulation for the load/store register offset addressing mode
(DDI 0487 C3.3.9). The offset register value is extended via
UXTB/UXTH/UXTW/UXTX/SXTB/SXTH/SXTW/SXTX and optionally shifted
left by log2 of the element size.
Instruction coverage:
- STR/LDR (GPR): register offset with extend, all sizes
- STR/LDR (SIMD/FP): register offset with extend, 8-128 bit
- PRFM register offset: NOP
Signed-off-by: Lucas Amaral <lucaaamaral@gmail.com>
---
target/arm/emulate/a64-ldst.decode | 29 ++++++++
target/arm/emulate/arm_emulate.c | 103 +++++++++++++++++++++++++++++
2 files changed, 132 insertions(+)
diff --git a/target/arm/emulate/a64-ldst.decode b/target/arm/emulate/a64-ldst.decode
index c887dcba..af6babe1 100644
--- a/target/arm/emulate/a64-ldst.decode
+++ b/target/arm/emulate/a64-ldst.decode
@@ -10,6 +10,9 @@
# 'u' flag: 0 = 9-bit signed immediate (byte offset), 1 = 12-bit unsigned (needs << sz)
&ldst_imm rt rn imm sz sign w p unpriv ext u
+# Load/store register offset
+&ldst rm rn rt sign ext sz opt s
+
### Format templates
# Load/store immediate (9-bit signed)
@@ -21,6 +24,9 @@
# Load/store unsigned offset (12-bit, handler scales by << sz)
@ldst_uimm .. ... . .. .. imm:12 rn:5 rt:5 &ldst_imm u=1 unpriv=0 p=0 w=0
+# Load/store register offset
+@ldst .. ... . .. .. . rm:5 opt:3 s:1 .. rn:5 rt:5 &ldst
+
### Load/store register — unscaled immediate (LDUR/STUR)
# GPR
@@ -122,6 +128,29 @@ STR_v_i 00 111 1 01 10 ............ ..... ..... @ldst_uimm sign=
LDR_v_i sz:2 111 1 01 01 ............ ..... ..... @ldst_uimm sign=0 ext=0
LDR_v_i 00 111 1 01 11 ............ ..... ..... @ldst_uimm sign=0 ext=0 sz=4
+### Load/store register — register offset
+
+# GPR
+STR sz:2 111 0 00 00 1 ..... ... . 10 ..... ..... @ldst sign=0 ext=0
+LDR 00 111 0 00 01 1 ..... ... . 10 ..... ..... @ldst sign=0 ext=1 sz=0
+LDR 01 111 0 00 01 1 ..... ... . 10 ..... ..... @ldst sign=0 ext=1 sz=1
+LDR 10 111 0 00 01 1 ..... ... . 10 ..... ..... @ldst sign=0 ext=1 sz=2
+LDR 11 111 0 00 01 1 ..... ... . 10 ..... ..... @ldst sign=0 ext=0 sz=3
+LDR 00 111 0 00 10 1 ..... ... . 10 ..... ..... @ldst sign=1 ext=0 sz=0
+LDR 01 111 0 00 10 1 ..... ... . 10 ..... ..... @ldst sign=1 ext=0 sz=1
+LDR 10 111 0 00 10 1 ..... ... . 10 ..... ..... @ldst sign=1 ext=0 sz=2
+LDR 00 111 0 00 11 1 ..... ... . 10 ..... ..... @ldst sign=1 ext=1 sz=0
+LDR 01 111 0 00 11 1 ..... ... . 10 ..... ..... @ldst sign=1 ext=1 sz=1
+
+# PRFM — register offset
+NOP 11 111 0 00 10 1 ----- -1- - 10 ----- -----
+
+# SIMD/FP
+STR_v sz:2 111 1 00 00 1 ..... ... . 10 ..... ..... @ldst sign=0 ext=0
+STR_v 00 111 1 00 10 1 ..... ... . 10 ..... ..... @ldst sign=0 ext=0 sz=4
+LDR_v sz:2 111 1 00 01 1 ..... ... . 10 ..... ..... @ldst sign=0 ext=0
+LDR_v 00 111 1 00 11 1 ..... ... . 10 ..... ..... @ldst sign=0 ext=0 sz=4
+
### System instructions — DC cache maintenance
# SYS with CRn=C7 covers all data cache operations (DC CIVAC, CVAC, etc.).
diff --git a/target/arm/emulate/arm_emulate.c b/target/arm/emulate/arm_emulate.c
index 02fefc30..bf09e2a6 100644
--- a/target/arm/emulate/arm_emulate.c
+++ b/target/arm/emulate/arm_emulate.c
@@ -211,6 +211,109 @@ static bool trans_LDR_v_i(DisasContext *ctx, arg_ldst_imm *a)
return true;
}
+/* Register offset extension (DDI 0487 C6.2.131) */
+
+static uint64_t extend_reg(uint64_t val, int option, int shift)
+{
+ switch (option) {
+ case 0: /* UXTB */
+ val = (uint8_t)val;
+ break;
+ case 1: /* UXTH */
+ val = (uint16_t)val;
+ break;
+ case 2: /* UXTW */
+ val = (uint32_t)val;
+ break;
+ case 3: /* UXTX / LSL */
+ break;
+ case 4: /* SXTB */
+ val = (int64_t)(int8_t)val;
+ break;
+ case 5: /* SXTH */
+ val = (int64_t)(int16_t)val;
+ break;
+ case 6: /* SXTW */
+ val = (int64_t)(int32_t)val;
+ break;
+ case 7: /* SXTX */
+ break;
+ }
+ return val << shift;
+}
+
+/*
+ * Load/store single -- register offset (GPR)
+ * STR / LDR (DDI 0487 C3.3.9)
+ */
+
+static bool trans_STR(DisasContext *ctx, arg_ldst *a)
+{
+ int esize = (a->sz <= 3) ? (1 << a->sz) : 16;
+ int shift = a->s ? a->sz : 0;
+ uint64_t rm_val = gpr_read(ctx, a->rm);
+ uint64_t offset = extend_reg(rm_val, a->opt, shift);
+ uint64_t va = base_read(ctx, a->rn) + offset;
+
+ uint64_t val = gpr_read(ctx, a->rt);
+ mem_write(ctx, va, &val, esize);
+ return true;
+}
+
+static bool trans_LDR(DisasContext *ctx, arg_ldst *a)
+{
+ int esize = (a->sz <= 3) ? (1 << a->sz) : 16;
+ int shift = a->s ? a->sz : 0;
+ uint64_t rm_val = gpr_read(ctx, a->rm);
+ uint64_t offset = extend_reg(rm_val, a->opt, shift);
+ uint64_t va = base_read(ctx, a->rn) + offset;
+ uint64_t val = 0;
+
+ if (mem_read(ctx, va, &val, esize) != 0) {
+ return true;
+ }
+
+ val = load_extend(val, a->sz, a->sign, a->ext);
+ gpr_write(ctx, a->rt, val);
+ return true;
+}
+
+/*
+ * Load/store single -- register offset (SIMD/FP)
+ * STR_v / LDR_v (DDI 0487 C3.3.10)
+ */
+
+static bool trans_STR_v(DisasContext *ctx, arg_ldst *a)
+{
+ int esize = (a->sz <= 3) ? (1 << a->sz) : 16;
+ int shift = a->s ? a->sz : 0;
+ uint64_t rm_val = gpr_read(ctx, a->rm);
+ uint64_t offset = extend_reg(rm_val, a->opt, shift);
+ uint64_t va = base_read(ctx, a->rn) + offset;
+ uint8_t buf[16];
+
+ fpreg_read(ctx, a->rt, buf, esize);
+ mem_write(ctx, va, buf, esize);
+ return true;
+}
+
+static bool trans_LDR_v(DisasContext *ctx, arg_ldst *a)
+{
+ int esize = (a->sz <= 3) ? (1 << a->sz) : 16;
+ int shift = a->s ? a->sz : 0;
+ uint64_t rm_val = gpr_read(ctx, a->rm);
+ uint64_t offset = extend_reg(rm_val, a->opt, shift);
+ uint64_t va = base_read(ctx, a->rn) + offset;
+ uint8_t buf[16];
+
+ if (mem_read(ctx, va, buf, esize) != 0) {
+ return true;
+ }
+
+ fpreg_write(ctx, a->rt, buf, esize);
+ return true;
+}
+
/* PRFM, DC cache maintenance -- treated as NOP */
static bool trans_NOP(DisasContext *ctx, arg_NOP *a)
{
--
2.52.0
^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH v4 3/6] target/arm/emulate: add load/store pair
2026-03-16 2:50 ` [PATCH v4 0/6] target/arm: ISV=0 data abort emulation library Lucas Amaral
2026-03-16 2:50 ` [PATCH v4 1/6] target/arm/emulate: add ISV=0 emulation library with load/store immediate Lucas Amaral
2026-03-16 2:50 ` [PATCH v4 2/6] target/arm/emulate: add load/store register offset Lucas Amaral
@ 2026-03-16 2:50 ` Lucas Amaral
2026-03-16 2:50 ` [PATCH v4 4/6] target/arm/emulate: add load/store exclusive Lucas Amaral
` (3 subsequent siblings)
6 siblings, 0 replies; 25+ messages in thread
From: Lucas Amaral @ 2026-03-16 2:50 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-arm, agraf, peter.maydell, mohamed, Lucas Amaral
Add emulation for load/store pair instructions (DDI 0487 C3.3.14 --
C3.3.16). All addressing modes are covered: non-temporal (STNP/LDNP),
post-indexed, signed offset, and pre-indexed.
Instruction coverage:
- STP/LDP (GPR): 32/64-bit pairs, all addressing modes
- STP/LDP (SIMD/FP): 32/64/128-bit pairs, all addressing modes
- LDPSW: sign-extending 32-bit pair load
- STGP: store allocation tag pair (tag operation is NOP for MMIO)
Signed-off-by: Lucas Amaral <lucaaamaral@gmail.com>
---
target/arm/emulate/a64-ldst.decode | 68 ++++++++++++++++++
target/arm/emulate/arm_emulate.c | 111 +++++++++++++++++++++++++++++
2 files changed, 179 insertions(+)
diff --git a/target/arm/emulate/a64-ldst.decode b/target/arm/emulate/a64-ldst.decode
index af6babe1..f3de3f86 100644
--- a/target/arm/emulate/a64-ldst.decode
+++ b/target/arm/emulate/a64-ldst.decode
@@ -10,6 +10,9 @@
# 'u' flag: 0 = 9-bit signed immediate (byte offset), 1 = 12-bit unsigned (needs << sz)
&ldst_imm rt rn imm sz sign w p unpriv ext u
+# Load/store pair (GPR and SIMD/FP)
+&ldstpair rt2 rt rn imm sz sign w p
+
# Load/store register offset
&ldst rm rn rt sign ext sz opt s
@@ -24,6 +27,9 @@
# Load/store unsigned offset (12-bit, handler scales by << sz)
@ldst_uimm .. ... . .. .. imm:12 rn:5 rt:5 &ldst_imm u=1 unpriv=0 p=0 w=0
+# Load/store pair: imm7 is signed, scaled by element size in handler
+@ldstpair .. ... . ... . imm:s7 rt2:5 rn:5 rt:5 &ldstpair
+
# Load/store register offset
@ldst .. ... . .. .. . rm:5 opt:3 s:1 .. rn:5 rt:5 &ldst
@@ -128,6 +134,68 @@ STR_v_i 00 111 1 01 10 ............ ..... ..... @ldst_uimm sign=
LDR_v_i sz:2 111 1 01 01 ............ ..... ..... @ldst_uimm sign=0 ext=0
LDR_v_i 00 111 1 01 11 ............ ..... ..... @ldst_uimm sign=0 ext=0 sz=4
+### Load/store pair — non-temporal (STNP/LDNP)
+
+# STNP/LDNP: offset only, no writeback. Non-temporal hint ignored.
+STP 00 101 0 000 0 ....... ..... ..... ..... @ldstpair sz=2 sign=0 p=0 w=0
+LDP 00 101 0 000 1 ....... ..... ..... ..... @ldstpair sz=2 sign=0 p=0 w=0
+STP 10 101 0 000 0 ....... ..... ..... ..... @ldstpair sz=3 sign=0 p=0 w=0
+LDP 10 101 0 000 1 ....... ..... ..... ..... @ldstpair sz=3 sign=0 p=0 w=0
+STP_v 00 101 1 000 0 ....... ..... ..... ..... @ldstpair sz=2 sign=0 p=0 w=0
+LDP_v 00 101 1 000 1 ....... ..... ..... ..... @ldstpair sz=2 sign=0 p=0 w=0
+STP_v 01 101 1 000 0 ....... ..... ..... ..... @ldstpair sz=3 sign=0 p=0 w=0
+LDP_v 01 101 1 000 1 ....... ..... ..... ..... @ldstpair sz=3 sign=0 p=0 w=0
+STP_v 10 101 1 000 0 ....... ..... ..... ..... @ldstpair sz=4 sign=0 p=0 w=0
+LDP_v 10 101 1 000 1 ....... ..... ..... ..... @ldstpair sz=4 sign=0 p=0 w=0
+
+### Load/store pair — post-indexed
+
+STP 00 101 0 001 0 ....... ..... ..... ..... @ldstpair sz=2 sign=0 p=1 w=1
+LDP 00 101 0 001 1 ....... ..... ..... ..... @ldstpair sz=2 sign=0 p=1 w=1
+LDP 01 101 0 001 1 ....... ..... ..... ..... @ldstpair sz=2 sign=1 p=1 w=1
+STP 10 101 0 001 0 ....... ..... ..... ..... @ldstpair sz=3 sign=0 p=1 w=1
+LDP 10 101 0 001 1 ....... ..... ..... ..... @ldstpair sz=3 sign=0 p=1 w=1
+STP_v 00 101 1 001 0 ....... ..... ..... ..... @ldstpair sz=2 sign=0 p=1 w=1
+LDP_v 00 101 1 001 1 ....... ..... ..... ..... @ldstpair sz=2 sign=0 p=1 w=1
+STP_v 01 101 1 001 0 ....... ..... ..... ..... @ldstpair sz=3 sign=0 p=1 w=1
+LDP_v 01 101 1 001 1 ....... ..... ..... ..... @ldstpair sz=3 sign=0 p=1 w=1
+STP_v 10 101 1 001 0 ....... ..... ..... ..... @ldstpair sz=4 sign=0 p=1 w=1
+LDP_v 10 101 1 001 1 ....... ..... ..... ..... @ldstpair sz=4 sign=0 p=1 w=1
+
+### Load/store pair — signed offset
+
+STP 00 101 0 010 0 ....... ..... ..... ..... @ldstpair sz=2 sign=0 p=0 w=0
+LDP 00 101 0 010 1 ....... ..... ..... ..... @ldstpair sz=2 sign=0 p=0 w=0
+LDP 01 101 0 010 1 ....... ..... ..... ..... @ldstpair sz=2 sign=1 p=0 w=0
+STP 10 101 0 010 0 ....... ..... ..... ..... @ldstpair sz=3 sign=0 p=0 w=0
+LDP 10 101 0 010 1 ....... ..... ..... ..... @ldstpair sz=3 sign=0 p=0 w=0
+STP_v 00 101 1 010 0 ....... ..... ..... ..... @ldstpair sz=2 sign=0 p=0 w=0
+LDP_v 00 101 1 010 1 ....... ..... ..... ..... @ldstpair sz=2 sign=0 p=0 w=0
+STP_v 01 101 1 010 0 ....... ..... ..... ..... @ldstpair sz=3 sign=0 p=0 w=0
+LDP_v 01 101 1 010 1 ....... ..... ..... ..... @ldstpair sz=3 sign=0 p=0 w=0
+STP_v 10 101 1 010 0 ....... ..... ..... ..... @ldstpair sz=4 sign=0 p=0 w=0
+LDP_v 10 101 1 010 1 ....... ..... ..... ..... @ldstpair sz=4 sign=0 p=0 w=0
+
+### Load/store pair — pre-indexed
+
+STP 00 101 0 011 0 ....... ..... ..... ..... @ldstpair sz=2 sign=0 p=0 w=1
+LDP 00 101 0 011 1 ....... ..... ..... ..... @ldstpair sz=2 sign=0 p=0 w=1
+LDP 01 101 0 011 1 ....... ..... ..... ..... @ldstpair sz=2 sign=1 p=0 w=1
+STP 10 101 0 011 0 ....... ..... ..... ..... @ldstpair sz=3 sign=0 p=0 w=1
+LDP 10 101 0 011 1 ....... ..... ..... ..... @ldstpair sz=3 sign=0 p=0 w=1
+STP_v 00 101 1 011 0 ....... ..... ..... ..... @ldstpair sz=2 sign=0 p=0 w=1
+LDP_v 00 101 1 011 1 ....... ..... ..... ..... @ldstpair sz=2 sign=0 p=0 w=1
+STP_v 01 101 1 011 0 ....... ..... ..... ..... @ldstpair sz=3 sign=0 p=0 w=1
+LDP_v 01 101 1 011 1 ....... ..... ..... ..... @ldstpair sz=3 sign=0 p=0 w=1
+STP_v 10 101 1 011 0 ....... ..... ..... ..... @ldstpair sz=4 sign=0 p=0 w=1
+LDP_v 10 101 1 011 1 ....... ..... ..... ..... @ldstpair sz=4 sign=0 p=0 w=1
+
+### Load/store pair — STGP (store allocation tag + pair)
+
+STGP 01 101 0 001 0 ....... ..... ..... ..... @ldstpair sz=3 sign=0 p=1 w=1
+STGP 01 101 0 010 0 ....... ..... ..... ..... @ldstpair sz=3 sign=0 p=0 w=0
+STGP 01 101 0 011 0 ....... ..... ..... ..... @ldstpair sz=3 sign=0 p=0 w=1
+
### Load/store register — register offset
# GPR
diff --git a/target/arm/emulate/arm_emulate.c b/target/arm/emulate/arm_emulate.c
index bf09e2a6..6c63a0d0 100644
--- a/target/arm/emulate/arm_emulate.c
+++ b/target/arm/emulate/arm_emulate.c
@@ -122,6 +122,117 @@ static uint64_t load_extend(uint64_t val, int sz, int sign, int ext)
return val;
}
+/*
+ * Load/store pair: STP, LDP, STNP, LDNP, STGP, LDPSW
+ * (DDI 0487 C3.3.14 -- C3.3.16)
+ */
+
+static bool trans_STP(DisasContext *ctx, arg_ldstpair *a)
+{
+ int esize = 1 << a->sz; /* 4 or 8 bytes */
+ int64_t offset = (int64_t)a->imm << a->sz;
+ uint64_t base = base_read(ctx, a->rn);
+ uint64_t va = a->p ? base : base + offset; /* post-index: unmodified base */
+ uint8_t buf[16]; /* max 2 x 8 bytes */
+
+ uint64_t v1 = gpr_read(ctx, a->rt);
+ uint64_t v2 = gpr_read(ctx, a->rt2);
+ memcpy(buf, &v1, esize);
+ memcpy(buf + esize, &v2, esize);
+
+ if (mem_write(ctx, va, buf, 2 * esize) != 0) {
+ return true;
+ }
+
+ if (a->w) {
+ base_write(ctx, a->rn, base + offset);
+ }
+ return true;
+}
+
+static bool trans_LDP(DisasContext *ctx, arg_ldstpair *a)
+{
+ int esize = 1 << a->sz;
+ int64_t offset = (int64_t)a->imm << a->sz;
+ uint64_t base = base_read(ctx, a->rn);
+ uint64_t va = a->p ? base : base + offset;
+ uint8_t buf[16];
+ uint64_t v1 = 0, v2 = 0;
+
+ if (mem_read(ctx, va, buf, 2 * esize) != 0) {
+ return true;
+ }
+ memcpy(&v1, buf, esize);
+ memcpy(&v2, buf + esize, esize);
+
+ /* LDPSW: sign-extend 32-bit values to 64-bit (sign=1, sz=2) */
+ if (a->sign) {
+ v1 = sign_extend(v1, 8 * esize);
+ v2 = sign_extend(v2, 8 * esize);
+ }
+
+ gpr_write(ctx, a->rt, v1);
+ gpr_write(ctx, a->rt2, v2);
+
+ if (a->w) {
+ base_write(ctx, a->rn, base + offset);
+ }
+ return true;
+}
+
+/* STGP: tag operation is a NOP for emulation; data stored via STP */
+static bool trans_STGP(DisasContext *ctx, arg_ldstpair *a)
+{
+ return trans_STP(ctx, a);
+}
+
+/*
+ * SIMD/FP load/store pair: STP_v, LDP_v
+ * (DDI 0487 C3.3.14 -- C3.3.16)
+ */
+
+static bool trans_STP_v(DisasContext *ctx, arg_ldstpair *a)
+{
+ int esize = 1 << a->sz; /* 4, 8, or 16 bytes */
+ int64_t offset = (int64_t)a->imm << a->sz;
+ uint64_t base = base_read(ctx, a->rn);
+ uint64_t va = a->p ? base : base + offset;
+ uint8_t buf[32]; /* max 2 x 16 bytes */
+
+ fpreg_read(ctx, a->rt, buf, esize);
+ fpreg_read(ctx, a->rt2, buf + esize, esize);
+
+ if (mem_write(ctx, va, buf, 2 * esize) != 0) {
+ return true;
+ }
+
+ if (a->w) {
+ base_write(ctx, a->rn, base + offset);
+ }
+ return true;
+}
+
+static bool trans_LDP_v(DisasContext *ctx, arg_ldstpair *a)
+{
+ int esize = 1 << a->sz;
+ int64_t offset = (int64_t)a->imm << a->sz;
+ uint64_t base = base_read(ctx, a->rn);
+ uint64_t va = a->p ? base : base + offset;
+ uint8_t buf[32];
+
+ if (mem_read(ctx, va, buf, 2 * esize) != 0) {
+ return true;
+ }
+
+ fpreg_write(ctx, a->rt, buf, esize);
+ fpreg_write(ctx, a->rt2, buf + esize, esize);
+
+ if (a->w) {
+ base_write(ctx, a->rn, base + offset);
+ }
+ return true;
+}
+
/* Load/store single -- immediate (GPR) (DDI 0487 C3.3.8 -- C3.3.13) */
static bool trans_STR_i(DisasContext *ctx, arg_ldst_imm *a)
--
2.52.0
^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH v4 4/6] target/arm/emulate: add load/store exclusive
2026-03-16 2:50 ` [PATCH v4 0/6] target/arm: ISV=0 data abort emulation library Lucas Amaral
` (2 preceding siblings ...)
2026-03-16 2:50 ` [PATCH v4 3/6] target/arm/emulate: add load/store pair Lucas Amaral
@ 2026-03-16 2:50 ` Lucas Amaral
2026-03-16 2:50 ` [PATCH v4 5/6] target/arm/emulate: add atomic, compare-and-swap, and PAC load Lucas Amaral
` (2 subsequent siblings)
6 siblings, 0 replies; 25+ messages in thread
From: Lucas Amaral @ 2026-03-16 2:50 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-arm, agraf, peter.maydell, mohamed, Lucas Amaral
Add emulation for load/store exclusive instructions (DDI 0487 C3.3.6).
Exclusive monitors have no meaning on emulated MMIO accesses, so STXR
always reports success (Rs=0) and LDXR does not set a monitor.
Instruction coverage:
- STXR/STLXR: exclusive store, 8/16/32/64-bit
- LDXR/LDAXR: exclusive load, 8/16/32/64-bit
- STXP/STLXP: exclusive store pair, 32/64-bit
- LDXP/LDAXP: exclusive load pair, 32/64-bit
STXP/LDXP use two explicit decode patterns (sz=2, sz=3) for the
32/64-bit size variants.
Signed-off-by: Lucas Amaral <lucaaamaral@gmail.com>
---
target/arm/emulate/a64-ldst.decode | 22 +++++++++
target/arm/emulate/arm_emulate.c | 74 ++++++++++++++++++++++++++++++
2 files changed, 96 insertions(+)
diff --git a/target/arm/emulate/a64-ldst.decode b/target/arm/emulate/a64-ldst.decode
index f3de3f86..fadf6fd2 100644
--- a/target/arm/emulate/a64-ldst.decode
+++ b/target/arm/emulate/a64-ldst.decode
@@ -10,6 +10,9 @@
# 'u' flag: 0 = 9-bit signed immediate (byte offset), 1 = 12-bit unsigned (needs << sz)
&ldst_imm rt rn imm sz sign w p unpriv ext u
+# Load/store exclusive
+&stxr rn rt rt2 rs sz lasr
+
# Load/store pair (GPR and SIMD/FP)
&ldstpair rt2 rt rn imm sz sign w p
@@ -18,6 +21,9 @@
### Format templates
+# Exclusives
+@stxr sz:2 ...... ... rs:5 lasr:1 rt2:5 rn:5 rt:5 &stxr
+
# Load/store immediate (9-bit signed)
@ldst_imm .. ... . .. .. . imm:s9 .. rn:5 rt:5 &ldst_imm u=0 unpriv=0 p=0 w=0
@ldst_imm_pre .. ... . .. .. . imm:s9 .. rn:5 rt:5 &ldst_imm u=0 unpriv=0 p=0 w=1
@@ -134,6 +140,22 @@ STR_v_i 00 111 1 01 10 ............ ..... ..... @ldst_uimm sign=
LDR_v_i sz:2 111 1 01 01 ............ ..... ..... @ldst_uimm sign=0 ext=0
LDR_v_i 00 111 1 01 11 ............ ..... ..... @ldst_uimm sign=0 ext=0 sz=4
+### Load/store exclusive
+
+# STXR / STLXR (sz encodes 8/16/32/64-bit)
+STXR .. 001000 000 ..... . ..... ..... ..... @stxr
+
+# LDXR / LDAXR
+LDXR .. 001000 010 ..... . ..... ..... ..... @stxr
+
+# STXP / STLXP (bit[31]=1, bit[30]=sf → sz=2 for 32-bit, sz=3 for 64-bit)
+STXP 10 001000 001 rs:5 lasr:1 rt2:5 rn:5 rt:5 &stxr sz=2
+STXP 11 001000 001 rs:5 lasr:1 rt2:5 rn:5 rt:5 &stxr sz=3
+
+# LDXP / LDAXP
+LDXP 10 001000 011 rs:5 lasr:1 rt2:5 rn:5 rt:5 &stxr sz=2
+LDXP 11 001000 011 rs:5 lasr:1 rt2:5 rn:5 rt:5 &stxr sz=3
+
### Load/store pair — non-temporal (STNP/LDNP)
# STNP/LDNP: offset only, no writeback. Non-temporal hint ignored.
diff --git a/target/arm/emulate/arm_emulate.c b/target/arm/emulate/arm_emulate.c
index 6c63a0d0..52e41703 100644
--- a/target/arm/emulate/arm_emulate.c
+++ b/target/arm/emulate/arm_emulate.c
@@ -425,6 +425,80 @@ static bool trans_LDR_v(DisasContext *ctx, arg_ldst *a)
return true;
}
+/*
+ * Load/store exclusive: STXR, LDXR, STXP, LDXP
+ * (DDI 0487 C3.3.6)
+ *
+ * Exclusive monitors have no meaning on MMIO. STXR always reports
+ * success (Rs=0) and LDXR does not set an exclusive monitor.
+ */
+
+static bool trans_STXR(DisasContext *ctx, arg_stxr *a)
+{
+ int esize = 1 << a->sz;
+ uint64_t va = base_read(ctx, a->rn);
+ uint64_t val = gpr_read(ctx, a->rt);
+
+ if (mem_write(ctx, va, &val, esize) != 0) {
+ return true;
+ }
+
+ /* Report success -- no exclusive monitor on emulated access */
+ gpr_write(ctx, a->rs, 0);
+ return true;
+}
+
+static bool trans_LDXR(DisasContext *ctx, arg_stxr *a)
+{
+ int esize = 1 << a->sz;
+ uint64_t va = base_read(ctx, a->rn);
+ uint64_t val = 0;
+
+ if (mem_read(ctx, va, &val, esize) != 0) {
+ return true;
+ }
+
+ gpr_write(ctx, a->rt, val);
+ return true;
+}
+
+static bool trans_STXP(DisasContext *ctx, arg_stxr *a)
+{
+ int esize = 1 << a->sz; /* sz=2->4, sz=3->8 */
+ uint64_t va = base_read(ctx, a->rn);
+ uint8_t buf[16];
+
+ uint64_t v1 = gpr_read(ctx, a->rt);
+ uint64_t v2 = gpr_read(ctx, a->rt2);
+ memcpy(buf, &v1, esize);
+ memcpy(buf + esize, &v2, esize);
+
+ if (mem_write(ctx, va, buf, 2 * esize) != 0) {
+ return true;
+ }
+
+ gpr_write(ctx, a->rs, 0); /* success */
+ return true;
+}
+
+static bool trans_LDXP(DisasContext *ctx, arg_stxr *a)
+{
+ int esize = 1 << a->sz;
+ uint64_t va = base_read(ctx, a->rn);
+ uint8_t buf[16];
+ uint64_t v1 = 0, v2 = 0;
+
+ if (mem_read(ctx, va, buf, 2 * esize) != 0) {
+ return true;
+ }
+
+ memcpy(&v1, buf, esize);
+ memcpy(&v2, buf + esize, esize);
+ gpr_write(ctx, a->rt, v1);
+ gpr_write(ctx, a->rt2, v2);
+ return true;
+}
+
/* PRFM, DC cache maintenance -- treated as NOP */
static bool trans_NOP(DisasContext *ctx, arg_NOP *a)
{
--
2.52.0
^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH v4 5/6] target/arm/emulate: add atomic, compare-and-swap, and PAC load
2026-03-16 2:50 ` [PATCH v4 0/6] target/arm: ISV=0 data abort emulation library Lucas Amaral
` (3 preceding siblings ...)
2026-03-16 2:50 ` [PATCH v4 4/6] target/arm/emulate: add load/store exclusive Lucas Amaral
@ 2026-03-16 2:50 ` Lucas Amaral
2026-03-16 2:50 ` [PATCH v4 6/6] target/arm/hvf, whpx: wire ISV=0 emulation for data aborts Lucas Amaral
2026-03-17 14:27 ` [PATCH v4 0/6] target/arm: ISV=0 data abort emulation library Alex Bennée
6 siblings, 0 replies; 25+ messages in thread
From: Lucas Amaral @ 2026-03-16 2:50 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-arm, agraf, peter.maydell, mohamed, Lucas Amaral
Add emulation for remaining ISV=0 load/store instruction classes.
Atomic memory operations (DDI 0487 C3.3.2):
- LDADD, LDCLR, LDEOR, LDSET: arithmetic/logic atomics
- LDSMAX, LDSMIN, LDUMAX, LDUMIN: signed/unsigned min/max
- SWP: atomic swap
These are emulated as non-atomic read-modify-write sequences, which
is sufficient for MMIO, where concurrent access is not a concern.
Acquire/release semantics are ignored.
Compare-and-swap (DDI 0487 C3.3.1):
- CAS/CASA/CASAL/CASL: single-register compare-and-swap
- CASP/CASPA/CASPAL/CASPL: register-pair compare-and-swap
CASP validates that the register pairs are even-numbered; an odd
register or r31 returns ARM_EMUL_UNHANDLED.
Load with PAC (DDI 0487 C6.2.121):
- LDRAA/LDRAB: pointer-authenticated load, offset/pre-indexed
Pointer authentication is not emulated (equivalent to auth always
succeeding), which is correct for MMIO since PAC is a software
security mechanism, not a memory access semantic.
CASP uses two explicit decode patterns for the 32/64-bit size
variants. LDRA's offset immediate is stored raw in the decode;
the handler scales by << 3.
Signed-off-by: Lucas Amaral <lucaaamaral@gmail.com>
---
target/arm/emulate/a64-ldst.decode | 45 ++++++
target/arm/emulate/arm_emulate.c | 233 +++++++++++++++++++++++++++++
2 files changed, 278 insertions(+)
diff --git a/target/arm/emulate/a64-ldst.decode b/target/arm/emulate/a64-ldst.decode
index fadf6fd2..9292bfdf 100644
--- a/target/arm/emulate/a64-ldst.decode
+++ b/target/arm/emulate/a64-ldst.decode
@@ -16,6 +16,16 @@
# Load/store pair (GPR and SIMD/FP)
&ldstpair rt2 rt rn imm sz sign w p
+# Atomic memory operations
+&atomic rs rn rt a r sz
+
+# Compare-and-swap
+&cas rs rn rt sz a r
+
+# Load with PAC (LDRAA/LDRAB, FEAT_PAuth)
+%ldra_imm 22:s1 12:9
+&ldra rt rn imm m w
+
# Load/store register offset
&ldst rm rn rt sign ext sz opt s
@@ -36,6 +46,15 @@
# Load/store pair: imm7 is signed, scaled by element size in handler
@ldstpair .. ... . ... . imm:s7 rt2:5 rn:5 rt:5 &ldstpair
+# Atomics
+@atomic sz:2 ... . .. a:1 r:1 . rs:5 . ... .. rn:5 rt:5 &atomic
+
+# Compare-and-swap: sz extracted by pattern (CAS) or set constant (CASP)
+@cas .. ...... . a:1 . rs:5 r:1 ..... rn:5 rt:5 &cas
+
+# Load with PAC
+@ldra .. ... . .. m:1 . . ......... w:1 . rn:5 rt:5 &ldra imm=%ldra_imm
+
# Load/store register offset
@ldst .. ... . .. .. . rm:5 opt:3 s:1 .. rn:5 rt:5 &ldst
@@ -241,6 +260,32 @@ STR_v 00 111 1 00 10 1 ..... ... . 10 ..... ..... @ldst sign=0 ext=
LDR_v sz:2 111 1 00 01 1 ..... ... . 10 ..... ..... @ldst sign=0 ext=0
LDR_v 00 111 1 00 11 1 ..... ... . 10 ..... ..... @ldst sign=0 ext=0 sz=4
+### Compare-and-swap
+
+# CAS / CASA / CASAL / CASL
+CAS sz:2 001000 1 . 1 ..... . 11111 ..... ..... @cas
+
+# CASP / CASPA / CASPAL / CASPL (pair: Rt,Rt+1 and Rs,Rs+1)
+CASP 00 001000 0 . 1 ..... . 11111 ..... ..... @cas sz=2
+CASP 01 001000 0 . 1 ..... . 11111 ..... ..... @cas sz=3
+
+### Atomic memory operations
+
+LDADD .. 111 0 00 . . 1 ..... 0000 00 ..... ..... @atomic
+LDCLR .. 111 0 00 . . 1 ..... 0001 00 ..... ..... @atomic
+LDEOR .. 111 0 00 . . 1 ..... 0010 00 ..... ..... @atomic
+LDSET .. 111 0 00 . . 1 ..... 0011 00 ..... ..... @atomic
+LDSMAX .. 111 0 00 . . 1 ..... 0100 00 ..... ..... @atomic
+LDSMIN .. 111 0 00 . . 1 ..... 0101 00 ..... ..... @atomic
+LDUMAX .. 111 0 00 . . 1 ..... 0110 00 ..... ..... @atomic
+LDUMIN .. 111 0 00 . . 1 ..... 0111 00 ..... ..... @atomic
+SWP .. 111 0 00 . . 1 ..... 1000 00 ..... ..... @atomic
+
+### Load with PAC (FEAT_PAuth)
+
+# LDRAA (M=0) / LDRAB (M=1), offset (W=0) / pre-indexed (W=1)
+LDRA 11 111 0 00 . . 1 ......... . 1 ..... ..... @ldra
+
### System instructions — DC cache maintenance
# SYS with CRn=C7 covers all data cache operations (DC CIVAC, CVAC, etc.).
diff --git a/target/arm/emulate/arm_emulate.c b/target/arm/emulate/arm_emulate.c
index 52e41703..44a559ad 100644
--- a/target/arm/emulate/arm_emulate.c
+++ b/target/arm/emulate/arm_emulate.c
@@ -499,6 +499,239 @@ static bool trans_LDXP(DisasContext *ctx, arg_stxr *a)
return true;
}
+/*
+ * Atomic memory operations (DDI 0487 C3.3.2)
+ *
+ * Non-atomic read-modify-write; sufficient for MMIO.
+ * Acquire/release semantics ignored (sequentially consistent by design).
+ */
+
+typedef uint64_t (*atomic_op_fn)(uint64_t old, uint64_t operand, int bits);
+
+static uint64_t atomic_add(uint64_t old, uint64_t op, int bits)
+{
+ (void)bits;
+ return old + op;
+}
+
+static uint64_t atomic_clr(uint64_t old, uint64_t op, int bits)
+{
+ (void)bits;
+ return old & ~op;
+}
+
+static uint64_t atomic_eor(uint64_t old, uint64_t op, int bits)
+{
+ (void)bits;
+ return old ^ op;
+}
+
+static uint64_t atomic_set(uint64_t old, uint64_t op, int bits)
+{
+ (void)bits;
+ return old | op;
+}
+
+static uint64_t atomic_smax(uint64_t old, uint64_t op, int bits)
+{
+ int64_t a = sign_extend(old, bits);
+ int64_t b = sign_extend(op, bits);
+ return (a >= b) ? old : op;
+}
+
+static uint64_t atomic_smin(uint64_t old, uint64_t op, int bits)
+{
+ int64_t a = sign_extend(old, bits);
+ int64_t b = sign_extend(op, bits);
+ return (a <= b) ? old : op;
+}
+
+static uint64_t atomic_umax(uint64_t old, uint64_t op, int bits)
+{
+ uint64_t mask = (bits == 64) ? UINT64_MAX : (1ULL << bits) - 1;
+ return ((old & mask) >= (op & mask)) ? old : op;
+}
+
+static uint64_t atomic_umin(uint64_t old, uint64_t op, int bits)
+{
+ uint64_t mask = (bits == 64) ? UINT64_MAX : (1ULL << bits) - 1;
+ return ((old & mask) <= (op & mask)) ? old : op;
+}
+
+static bool do_atomic(DisasContext *ctx, arg_atomic *a, atomic_op_fn fn)
+{
+ int esize = 1 << a->sz;
+ int bits = 8 * esize;
+ uint64_t va = base_read(ctx, a->rn);
+ uint64_t old = 0;
+
+ if (mem_read(ctx, va, &old, esize) != 0) {
+ return true;
+ }
+
+ uint64_t operand = gpr_read(ctx, a->rs);
+ uint64_t result = fn(old, operand, bits);
+
+ if (mem_write(ctx, va, &result, esize) != 0) {
+ return true;
+ }
+
+ /* Rt receives the old value (before modification) */
+ gpr_write(ctx, a->rt, old);
+ return true;
+}
+
+static bool trans_LDADD(DisasContext *ctx, arg_atomic *a)
+{
+ return do_atomic(ctx, a, atomic_add);
+}
+
+static bool trans_LDCLR(DisasContext *ctx, arg_atomic *a)
+{
+ return do_atomic(ctx, a, atomic_clr);
+}
+
+static bool trans_LDEOR(DisasContext *ctx, arg_atomic *a)
+{
+ return do_atomic(ctx, a, atomic_eor);
+}
+
+static bool trans_LDSET(DisasContext *ctx, arg_atomic *a)
+{
+ return do_atomic(ctx, a, atomic_set);
+}
+
+static bool trans_LDSMAX(DisasContext *ctx, arg_atomic *a)
+{
+ return do_atomic(ctx, a, atomic_smax);
+}
+
+static bool trans_LDSMIN(DisasContext *ctx, arg_atomic *a)
+{
+ return do_atomic(ctx, a, atomic_smin);
+}
+
+static bool trans_LDUMAX(DisasContext *ctx, arg_atomic *a)
+{
+ return do_atomic(ctx, a, atomic_umax);
+}
+
+static bool trans_LDUMIN(DisasContext *ctx, arg_atomic *a)
+{
+ return do_atomic(ctx, a, atomic_umin);
+}
+
+static bool trans_SWP(DisasContext *ctx, arg_atomic *a)
+{
+ int esize = 1 << a->sz;
+ uint64_t va = base_read(ctx, a->rn);
+ uint64_t old = 0;
+
+ if (mem_read(ctx, va, &old, esize) != 0) {
+ return true;
+ }
+
+ uint64_t newval = gpr_read(ctx, a->rs);
+ if (mem_write(ctx, va, &newval, esize) != 0) {
+ return true;
+ }
+
+ gpr_write(ctx, a->rt, old);
+ return true;
+}
+
+/* Compare-and-swap: CAS, CASP (DDI 0487 C3.3.1) */
+
+static bool trans_CAS(DisasContext *ctx, arg_cas *a)
+{
+ int esize = 1 << a->sz;
+ uint64_t va = base_read(ctx, a->rn);
+ uint64_t current = 0;
+
+ if (mem_read(ctx, va, &current, esize) != 0) {
+ return true;
+ }
+
+ uint64_t mask = (esize == 8) ? UINT64_MAX : (1ULL << (8 * esize)) - 1;
+ uint64_t compare = gpr_read(ctx, a->rs) & mask;
+
+ if ((current & mask) == compare) {
+ uint64_t newval = gpr_read(ctx, a->rt) & mask;
+ if (mem_write(ctx, va, &newval, esize) != 0) {
+ return true;
+ }
+ }
+
+ /* Rs receives the old memory value (whether or not swap occurred) */
+ gpr_write(ctx, a->rs, current);
+ return true;
+}
+
+/* CASP: compare-and-swap pair (Rs,Rs+1 compared; Rt,Rt+1 stored) */
+static bool trans_CASP(DisasContext *ctx, arg_cas *a)
+{
+ /* CASP requires even register pairs; an odd Rs/Rt (including r31) is UNDEFINED */
+ if ((a->rs & 1) || a->rs >= 31 || (a->rt & 1) || a->rt >= 31) {
+ return false;
+ }
+
+ int esize = 1 << a->sz; /* per-register size */
+ uint64_t va = base_read(ctx, a->rn);
+ uint8_t buf[16];
+ uint64_t cur1 = 0, cur2 = 0;
+
+ if (mem_read(ctx, va, buf, 2 * esize) != 0) {
+ return true;
+ }
+ memcpy(&cur1, buf, esize);
+ memcpy(&cur2, buf + esize, esize);
+
+ uint64_t mask = (esize == 8) ? UINT64_MAX : (1ULL << (8 * esize)) - 1;
+ uint64_t cmp1 = gpr_read(ctx, a->rs) & mask;
+ uint64_t cmp2 = gpr_read(ctx, a->rs + 1) & mask;
+
+ if ((cur1 & mask) == cmp1 && (cur2 & mask) == cmp2) {
+ uint64_t new1 = gpr_read(ctx, a->rt) & mask;
+ uint64_t new2 = gpr_read(ctx, a->rt + 1) & mask;
+ memcpy(buf, &new1, esize);
+ memcpy(buf + esize, &new2, esize);
+ if (mem_write(ctx, va, buf, 2 * esize) != 0) {
+ return true;
+ }
+ }
+
+ gpr_write(ctx, a->rs, cur1);
+ gpr_write(ctx, a->rs + 1, cur2);
+ return true;
+}
+
+/*
+ * Load with PAC: LDRAA / LDRAB (FEAT_PAuth)
+ * (DDI 0487 C6.2.121)
+ *
+ * Pointer authentication is not emulated -- the base register is used
+ * directly (equivalent to auth always succeeding).
+ */
+
+static bool trans_LDRA(DisasContext *ctx, arg_ldra *a)
+{
+ int64_t offset = (int64_t)a->imm << 3; /* S:imm9, scaled by 8 */
+ uint64_t base = base_read(ctx, a->rn);
+ uint64_t va = base + offset; /* auth not emulated */
+ uint64_t val = 0;
+
+ if (mem_read(ctx, va, &val, 8) != 0) {
+ return true;
+ }
+
+ gpr_write(ctx, a->rt, val);
+
+ if (a->w) {
+ base_write(ctx, a->rn, va);
+ }
+ return true;
+}
+
/* PRFM, DC cache maintenance -- treated as NOP */
static bool trans_NOP(DisasContext *ctx, arg_NOP *a)
{
--
2.52.0
^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH v4 6/6] target/arm/hvf, whpx: wire ISV=0 emulation for data aborts
2026-03-16 2:50 ` [PATCH v4 0/6] target/arm: ISV=0 data abort emulation library Lucas Amaral
` (4 preceding siblings ...)
2026-03-16 2:50 ` [PATCH v4 5/6] target/arm/emulate: add atomic, compare-and-swap, and PAC load Lucas Amaral
@ 2026-03-16 2:50 ` Lucas Amaral
2026-03-17 14:27 ` [PATCH v4 0/6] target/arm: ISV=0 data abort emulation library Alex Bennée
6 siblings, 0 replies; 25+ messages in thread
From: Lucas Amaral @ 2026-03-16 2:50 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-arm, agraf, peter.maydell, mohamed, Lucas Amaral
When a data abort with ISV=0 occurs during MMIO emulation, the
syndrome register does not carry the access size or target register.
Previously this hit an assert(isv) and killed the VM.
Replace the assert with instruction fetch + decode + emulate using the
shared library in target/arm/emulate/. The faulting instruction is read
from guest memory via cpu_memory_rw_debug(), decoded by the decodetree-
generated decoder, and emulated against the vCPU register file.
Both HVF (macOS) and WHPX (Windows Hyper-V) use the same pattern:
1. cpu_synchronize_state() to flush hypervisor registers
2. Fetch 4-byte instruction at env->pc
3. arm_emul_insn(env, insn)
4. On success, advance PC past the emulated instruction
If the instruction is unhandled or a memory error occurs, a synchronous
external abort is injected into the guest via syn_data_abort_no_iss()
with fnv=1 and fsc=0x10, matching the syndrome that KVM uses in
kvm_inject_arm_sea(). The guest kernel's fault handler then reports
the error through its normal data abort path.
WHPX adds a whpx_inject_data_abort() helper and adjusts the
whpx_handle_mmio() return convention so the caller skips PC advancement
when an exception has been injected.
Signed-off-by: Lucas Amaral <lucaaamaral@gmail.com>
---
target/arm/hvf/hvf.c | 46 ++++++++++++++++++++++++++--
target/arm/whpx/whpx-all.c | 61 +++++++++++++++++++++++++++++++++++++-
2 files changed, 103 insertions(+), 4 deletions(-)
diff --git a/target/arm/hvf/hvf.c b/target/arm/hvf/hvf.c
index 5fc8f6bb..000e54bd 100644
--- a/target/arm/hvf/hvf.c
+++ b/target/arm/hvf/hvf.c
@@ -32,6 +32,7 @@
#include "arm-powerctl.h"
#include "target/arm/cpu.h"
#include "target/arm/internals.h"
+#include "emulate/arm_emulate.h"
#include "target/arm/multiprocessing.h"
#include "target/arm/gtimer.h"
#include "target/arm/trace.h"
@@ -2175,10 +2176,49 @@ static int hvf_handle_exception(CPUState *cpu, hv_vcpu_exit_exception_t *excp)
assert(!s1ptw);
/*
- * TODO: ISV will be 0 for SIMD or SVE accesses.
- * Inject the exception into the guest.
+ * ISV=0: syndrome doesn't carry access size/register info.
+ * Fetch and emulate via target/arm/emulate/.
*/
- assert(isv);
+ if (!isv) {
+ ARMCPU *arm_cpu = ARM_CPU(cpu);
+ CPUARMState *env = &arm_cpu->env;
+ uint32_t insn;
+ ArmEmulResult r;
+
+ cpu_synchronize_state(cpu);
+
+ if (cpu_memory_rw_debug(cpu, env->pc,
+ (uint8_t *)&insn, 4, false) != 0) {
+ bool same_el = arm_current_el(env) == 1;
+ uint32_t esr = syn_data_abort_no_iss(same_el,
+ 1, 0, 0, 0, iswrite, 0x10);
+
+ error_report("HVF: cannot read insn at pc=0x%" PRIx64,
+ (uint64_t)env->pc);
+ env->exception.vaddress = excp->virtual_address;
+ hvf_raise_exception(cpu, EXCP_DATA_ABORT, esr, 1);
+ break;
+ }
+
+ r = arm_emul_insn(env, insn);
+ if (r == ARM_EMUL_UNHANDLED || r == ARM_EMUL_ERR_MEM) {
+ bool same_el = arm_current_el(env) == 1;
+ uint32_t esr = syn_data_abort_no_iss(same_el,
+ 1, 0, 0, 0, iswrite, 0x10);
+
+ error_report("HVF: ISV=0 %s insn 0x%08x at "
+ "pc=0x%" PRIx64 ", injecting data abort",
+ r == ARM_EMUL_UNHANDLED ? "unhandled"
+ : "memory error",
+ insn, (uint64_t)env->pc);
+ env->exception.vaddress = excp->virtual_address;
+ hvf_raise_exception(cpu, EXCP_DATA_ABORT, esr, 1);
+ break;
+ }
+
+ advance_pc = true;
+ break;
+ }
/*
* Emulate MMIO.
diff --git a/target/arm/whpx/whpx-all.c b/target/arm/whpx/whpx-all.c
index 513551be..0c04073e 100644
--- a/target/arm/whpx/whpx-all.c
+++ b/target/arm/whpx/whpx-all.c
@@ -29,6 +29,7 @@
#include "syndrome.h"
#include "target/arm/cpregs.h"
#include "internals.h"
+#include "emulate/arm_emulate.h"
#include "system/whpx-internal.h"
#include "system/whpx-accel-ops.h"
@@ -352,6 +353,27 @@ static void whpx_set_gp_reg(CPUState *cpu, int rt, uint64_t val)
whpx_set_reg(cpu, reg, reg_val);
}
+/*
+ * Inject a synchronous external abort (data abort) into the guest.
+ * Used when ISV=0 instruction emulation fails. Matches the syndrome
+ * that KVM uses in kvm_inject_arm_sea().
+ */
+static void whpx_inject_data_abort(CPUState *cpu, bool iswrite)
+{
+ ARMCPU *arm_cpu = ARM_CPU(cpu);
+ CPUARMState *env = &arm_cpu->env;
+ bool same_el = arm_current_el(env) == 1;
+ uint32_t esr = syn_data_abort_no_iss(same_el, 1, 0, 0, 0, iswrite, 0x10);
+
+ cpu->exception_index = EXCP_DATA_ABORT;
+ env->exception.target_el = 1;
+ env->exception.syndrome = esr;
+
+ bql_lock();
+ arm_cpu_do_interrupt(cpu);
+ bql_unlock();
+}
+
static int whpx_handle_mmio(CPUState *cpu, WHV_MEMORY_ACCESS_CONTEXT *ctx)
{
uint64_t syndrome = ctx->Syndrome;
@@ -366,7 +388,40 @@ static int whpx_handle_mmio(CPUState *cpu, WHV_MEMORY_ACCESS_CONTEXT *ctx)
uint64_t val = 0;
assert(!cm);
- assert(isv);
+
+ /*
+ * ISV=0: syndrome doesn't carry access size/register info.
+ * Fetch and decode the faulting instruction via the emulation library.
+ */
+ if (!isv) {
+ ARMCPU *arm_cpu = ARM_CPU(cpu);
+ CPUARMState *env = &arm_cpu->env;
+ uint32_t insn;
+ ArmEmulResult r;
+
+ cpu_synchronize_state(cpu);
+
+ if (cpu_memory_rw_debug(cpu, env->pc,
+ (uint8_t *)&insn, 4, false) != 0) {
+ error_report("WHPX: cannot read insn at pc=0x%" PRIx64,
+ (uint64_t)env->pc);
+ whpx_inject_data_abort(cpu, iswrite);
+ return 1;
+ }
+
+ r = arm_emul_insn(env, insn);
+ if (r == ARM_EMUL_UNHANDLED || r == ARM_EMUL_ERR_MEM) {
+ error_report("WHPX: ISV=0 %s insn 0x%08x at "
+ "pc=0x%" PRIx64 ", injecting data abort",
+ r == ARM_EMUL_UNHANDLED ? "unhandled"
+ : "memory error",
+ insn, (uint64_t)env->pc);
+ whpx_inject_data_abort(cpu, iswrite);
+ return 1;
+ }
+
+ return 0;
+ }
if (iswrite) {
val = whpx_get_gp_reg(cpu, srt);
@@ -451,6 +506,10 @@ int whpx_vcpu_run(CPUState *cpu)
}
ret = whpx_handle_mmio(cpu, &vcpu->exit_ctx.MemoryAccess);
+ if (ret > 0) {
+ advance_pc = false;
+ ret = 0;
+ }
break;
case WHvRunVpExitReasonCanceled:
cpu->exception_index = EXCP_INTERRUPT;
--
2.52.0
* Re: [PATCH v4 0/6] target/arm: ISV=0 data abort emulation library
2026-03-16 2:50 ` [PATCH v4 0/6] target/arm: ISV=0 data abort emulation library Lucas Amaral
` (5 preceding siblings ...)
2026-03-16 2:50 ` [PATCH v4 6/6] target/arm/hvf, whpx: wire ISV=0 emulation for data aborts Lucas Amaral
@ 2026-03-17 14:27 ` Alex Bennée
6 siblings, 0 replies; 25+ messages in thread
From: Alex Bennée @ 2026-03-17 14:27 UTC (permalink / raw)
To: Lucas Amaral; +Cc: qemu-devel, qemu-arm, agraf, peter.maydell, mohamed
Lucas Amaral <lucaaamaral@gmail.com> writes:
> Add a shared emulation library for AArch64 load/store instructions that
> cause ISV=0 data aborts under hardware virtualization, and wire it into
> HVF (macOS) and WHPX (Windows).
FYI posting follow-up versions as reply to existing threads is likely to
hide your series from the patchew tooling and possibly the maintainers
as it hides in the old threads.
--
Alex Bennée
Virtualisation Tech Lead @ Linaro
* Re: [PATCH v4 1/6] target/arm/emulate: add ISV=0 emulation library with load/store immediate
2026-03-16 2:50 ` [PATCH v4 1/6] target/arm/emulate: add ISV=0 emulation library with load/store immediate Lucas Amaral
@ 2026-03-19 22:00 ` Richard Henderson
0 siblings, 0 replies; 25+ messages in thread
From: Richard Henderson @ 2026-03-19 22:00 UTC (permalink / raw)
To: qemu-devel
On 3/16/26 15:50, Lucas Amaral wrote:
> +typedef struct {
> + CPUState *cpu;
> + CPUARMState *env;
> + ArmEmulResult result;
> +} DisasContext;
...
> +ArmEmulResult arm_emul_insn(CPUArchState *env, uint32_t insn)
> +{
> + DisasContext ctx = {
> + .cpu = env_cpu(env),
> + .env = env,
The env_cpu function is trivial pointer arithmetic.
Put the one that's used more into DisasContext and use env_cpu or cpu_env inline to get to
the other.
> diff --git a/target/arm/emulate/meson.build b/target/arm/emulate/meson.build
> new file mode 100644
> index 00000000..c0b38dd1
> --- /dev/null
> +++ b/target/arm/emulate/meson.build
> @@ -0,0 +1,6 @@
> +gen_a64_ldst = decodetree.process('a64-ldst.decode',
> + extra_args: ['--static-decode=decode_a64_ldst'])
> +
> +arm_common_system_ss.add(when: 'TARGET_AARCH64', if_true: [
> + gen_a64_ldst, files('arm_emulate.c')
> +])
Do we really want to include this emulation when the host virtualization won't use it?
I'm sure Kconfig can be used to select it from the relevant virt configs.
r~
end of thread, newest: 2026-03-19 22:01 UTC
Thread overview: 25+ messages
2026-03-09 21:48 [PATCH] target/arm/hvf: emulate ISV=0 data abort instructions Lucas Amaral
2026-03-10 1:28 ` Mohamed Mediouni
2026-03-10 9:23 ` Peter Maydell
2026-03-13 2:18 ` [PATCH v2 0/3] target/arm: ISV=0 data abort emulation library Lucas Amaral
2026-03-13 2:18 ` [PATCH v2 1/3] target/arm: add AArch64 ISV=0 instruction " Lucas Amaral
2026-03-13 6:33 ` Mohamed Mediouni
2026-03-13 8:59 ` Peter Maydell
2026-03-13 2:18 ` [PATCH v2 2/3] tests: add unit tests for ISV=0 " Lucas Amaral
2026-03-13 2:18 ` [PATCH v2 3/3] target/arm: wire ISV=0 emulation into HVF and WHPX Lucas Amaral
2026-03-15 3:41 ` [PATCH v3 0/6] target/arm: ISV=0 data abort emulation library Lucas Amaral
2026-03-15 3:41 ` [PATCH v3 1/6] target/arm/emulate: add ISV=0 emulation library with load/store immediate Lucas Amaral
2026-03-15 3:41 ` [PATCH v3 2/6] target/arm/emulate: add load/store register offset Lucas Amaral
2026-03-15 3:41 ` [PATCH v3 3/6] target/arm/emulate: add load/store pair Lucas Amaral
2026-03-15 3:41 ` [PATCH v3 4/6] target/arm/emulate: add load/store exclusive Lucas Amaral
2026-03-15 3:41 ` [PATCH v3 5/6] target/arm/emulate: add atomic, compare-and-swap, and PAC load Lucas Amaral
2026-03-15 3:41 ` [PATCH v3 6/6] target/arm/hvf, whpx: wire ISV=0 emulation for data aborts Lucas Amaral
2026-03-16 2:50 ` [PATCH v4 0/6] target/arm: ISV=0 data abort emulation library Lucas Amaral
2026-03-16 2:50 ` [PATCH v4 1/6] target/arm/emulate: add ISV=0 emulation library with load/store immediate Lucas Amaral
2026-03-19 22:00 ` Richard Henderson
2026-03-16 2:50 ` [PATCH v4 2/6] target/arm/emulate: add load/store register offset Lucas Amaral
2026-03-16 2:50 ` [PATCH v4 3/6] target/arm/emulate: add load/store pair Lucas Amaral
2026-03-16 2:50 ` [PATCH v4 4/6] target/arm/emulate: add load/store exclusive Lucas Amaral
2026-03-16 2:50 ` [PATCH v4 5/6] target/arm/emulate: add atomic, compare-and-swap, and PAC load Lucas Amaral
2026-03-16 2:50 ` [PATCH v4 6/6] target/arm/hvf, whpx: wire ISV=0 emulation for data aborts Lucas Amaral
2026-03-17 14:27 ` [PATCH v4 0/6] target/arm: ISV=0 data abort emulation library Alex Bennée