[PATCH] hw/intc/arm_gicv3_kvm: Avoid reading ICC_CTLR

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

* [PATCH] hw/intc/arm_gicv3_kvm: Avoid reading ICC_CTLR_EL1 from kernel in cpuif reset
@ 2025-10-14 10:24 Peter Maydell
  2025-10-14 10:41 ` Salil Mehta via
                   ` (2 more replies)
  0 siblings, 3 replies; 32+ messages in thread
From: Peter Maydell @ 2025-10-14 10:24 UTC (permalink / raw)
  To: qemu-devel; +Cc: Salil Mehta, Marc Zyngier

Currently in arm_gicv3_icc_reset() we read the kernel's value of
ICC_CTLR_EL1 as part of resetting the CPU interface.  This mostly
works, but we're actually breaking an assumption the kernel makes
that userspace only accesses the in-kernel GIC data when the VM is
totally paused, which may not be the case if a single vCPU is being
reset.  The effect is that it's possible that the read attempt
returns EBUSY.

Avoid this by reading the kernel's value of the reset ICC_CTLR_EL1
once in device realize. This brings ICC_CTLR_EL1 into line with
the other cpuif registers, where we assume we know what the kernel
is resetting them to and just update QEMU's data structures in
arm_gicv3_icc_reset().

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
I've only tested this fairly lightly, but it seems to work.
Salil, does this fix the EBUSY issues you were seeing ?

 include/hw/intc/arm_gicv3_common.h |  3 ++
 hw/intc/arm_gicv3_kvm.c            | 49 +++++++++++++++++++++---------
 2 files changed, 38 insertions(+), 14 deletions(-)

diff --git a/include/hw/intc/arm_gicv3_common.h b/include/hw/intc/arm_gicv3_common.h
index 38aa1961c50..61d51915e07 100644
--- a/include/hw/intc/arm_gicv3_common.h
+++ b/include/hw/intc/arm_gicv3_common.h
@@ -166,6 +166,9 @@ struct GICv3CPUState {
     uint64_t icc_igrpen[3];
     uint64_t icc_ctlr_el3;
 
+    /* For KVM, cached copy of the kernel reset value of ICC_CTLR_EL1 */
+    uint64_t kvm_reset_icc_ctlr_el1;
+
     /* Virtualization control interface */
     uint64_t ich_apr[3][4]; /* ich_apr[GICV3_G1][x] never used */
     uint64_t ich_hcr_el2;
diff --git a/hw/intc/arm_gicv3_kvm.c b/hw/intc/arm_gicv3_kvm.c
index 9829e2146da..b95e6ea057a 100644
--- a/hw/intc/arm_gicv3_kvm.c
+++ b/hw/intc/arm_gicv3_kvm.c
@@ -666,11 +666,24 @@ static void kvm_arm_gicv3_get(GICv3State *s)
 
 static void arm_gicv3_icc_reset(CPUARMState *env, const ARMCPRegInfo *ri)
 {
-    GICv3State *s;
-    GICv3CPUState *c;
+    GICv3CPUState *c = (GICv3CPUState *)env->gicv3state;
 
-    c = (GICv3CPUState *)env->gicv3state;
-    s = c->gic;
+    /*
+     * This function is called when each vcpu resets. The kernel
+     * API for the GIC assumes that it is only to be used when the
+     * whole VM is paused, so if we attempt to read the kernel's
+     * reset values here we might get EBUSY failures.
+     * So instead we assume we know what the kernel's reset values
+     * are (mostly zeroes) and only update the QEMU state struct
+     * fields. The exception is that we do need to know the kernel's
+     * idea of the ICC_CTLR_EL1 reset value, so we cache that at
+     * device realize time.
+     *
+     * This makes these sysregs different from the usual CPU ones,
+     * which can be validly read and written when only the single
+     * vcpu they apply to is paused, and where (in target/arm code)
+     * we read the reset values out of the kernel on every reset.
+     */
 
     c->icc_pmr_el1 = 0;
     /*
@@ -691,16 +704,8 @@ static void arm_gicv3_icc_reset(CPUARMState *env, const ARMCPRegInfo *ri)
     memset(c->icc_apr, 0, sizeof(c->icc_apr));
     memset(c->icc_igrpen, 0, sizeof(c->icc_igrpen));
 
-    if (s->migration_blocker) {
-        return;
-    }
-
-    /* Initialize to actual HW supported configuration */
-    kvm_device_access(s->dev_fd, KVM_DEV_ARM_VGIC_GRP_CPU_SYSREGS,
-                      KVM_VGIC_ATTR(ICC_CTLR_EL1, c->gicr_typer),
-                      &c->icc_ctlr_el1[GICV3_NS], false, &error_abort);
-
-    c->icc_ctlr_el1[GICV3_S] = c->icc_ctlr_el1[GICV3_NS];
+    c->icc_ctlr_el1[GICV3_NS] = c->kvm_reset_icc_ctlr_el1;
+    c->icc_ctlr_el1[GICV3_S] = c->kvm_reset_icc_ctlr_el1;
 }
 
 static void kvm_arm_gicv3_reset_hold(Object *obj, ResetType type)
@@ -939,6 +944,22 @@ static void kvm_arm_gicv3_realize(DeviceState *dev, Error **errp)
                                     kvm_arm_gicv3_notifier,
                                     MIG_MODE_CPR_TRANSFER);
     }
+
+    /*
+     * Now we can read the kernel's initial value of ICC_CTLR_EL1, which
+     * we will need if a CPU interface is reset. If the kernel is ancient
+     * and doesn't support writing the GIC state then we don't need to
+     * care what reset does to QEMU's data structures.
+     */
+    if (!s->migration_blocker) {
+        for (i = 0; i < s->num_cpu; i++) {
+            GICv3CPUState *c = &s->cpu[i];
+
+            kvm_device_access(s->dev_fd, KVM_DEV_ARM_VGIC_GRP_CPU_SYSREGS,
+                              KVM_VGIC_ATTR(ICC_CTLR_EL1, c->gicr_typer),
+                              &c->kvm_reset_icc_ctlr_el1, false, &error_abort);
+        }
+    }
 }
 
 static void kvm_arm_gicv3_class_init(ObjectClass *klass, const void *data)
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* RE: [PATCH] hw/intc/arm_gicv3_kvm: Avoid reading ICC_CTLR_EL1 from kernel in cpuif reset
  2025-10-14 10:24 [PATCH] hw/intc/arm_gicv3_kvm: Avoid reading ICC_CTLR_EL1 from kernel in cpuif reset Peter Maydell
@ 2025-10-14 10:41 ` Salil Mehta via
  2025-10-14 13:23   ` Salil Mehta via
  2025-10-15 10:58 ` Salil Mehta via
  2025-10-16 12:17 ` Salil Mehta via
  2 siblings, 1 reply; 32+ messages in thread
From: Salil Mehta via @ 2025-10-14 10:41 UTC (permalink / raw)
  To: Peter Maydell, qemu-devel@nongnu.org; +Cc: Salil Mehta, Marc Zyngier

Hi Peter,

> From: qemu-devel-bounces+salil.mehta=huawei.com@nongnu.org <qemu-
> devel-bounces+salil.mehta=huawei.com@nongnu.org> On Behalf Of Peter
> Maydell
> Sent: Tuesday, October 14, 2025 11:25 AM
> To: qemu-devel@nongnu.org
> 
> Currently in arm_gicv3_icc_reset() we read the kernel's value of
> ICC_CTLR_EL1 as part of resetting the CPU interface.  This mostly works, but
> we're actually breaking an assumption the kernel makes that userspace only
> accesses the in-kernel GIC data when the VM is totally paused, which may
> not be the case if a single vCPU is being reset.  The effect is that it's possible
> that the read attempt returns EBUSY.
> 
> Avoid this by reading the kernel's value of the reset ICC_CTLR_EL1 once in
> device realize. This brings ICC_CTLR_EL1 into line with the other cpuif
> registers, where we assume we know what the kernel is resetting them to
> and just update QEMU's data structures in arm_gicv3_icc_reset().
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
> I've only tested this fairly lightly, but it seems to work.
> Salil, does this fix the EBUSY issues you were seeing ?


Let me try this and get back to you.  Also, just to let you know that -EBUSY
can return from other places as well. Please check  my reply in the other
mail-chain.

Many thanks!

Best regards
Salil.

> 
>  include/hw/intc/arm_gicv3_common.h |  3 ++
>  hw/intc/arm_gicv3_kvm.c            | 49 +++++++++++++++++++++---------
>  2 files changed, 38 insertions(+), 14 deletions(-)
> 
> diff --git a/include/hw/intc/arm_gicv3_common.h
> b/include/hw/intc/arm_gicv3_common.h
> index 38aa1961c50..61d51915e07 100644
> --- a/include/hw/intc/arm_gicv3_common.h
> +++ b/include/hw/intc/arm_gicv3_common.h
> @@ -166,6 +166,9 @@ struct GICv3CPUState {
>      uint64_t icc_igrpen[3];
>      uint64_t icc_ctlr_el3;
> 
> +    /* For KVM, cached copy of the kernel reset value of ICC_CTLR_EL1 */
> +    uint64_t kvm_reset_icc_ctlr_el1;
> +
>      /* Virtualization control interface */
>      uint64_t ich_apr[3][4]; /* ich_apr[GICV3_G1][x] never used */
>      uint64_t ich_hcr_el2;
> diff --git a/hw/intc/arm_gicv3_kvm.c b/hw/intc/arm_gicv3_kvm.c index
> 9829e2146da..b95e6ea057a 100644
> --- a/hw/intc/arm_gicv3_kvm.c
> +++ b/hw/intc/arm_gicv3_kvm.c
> @@ -666,11 +666,24 @@ static void kvm_arm_gicv3_get(GICv3State *s)
> 
>  static void arm_gicv3_icc_reset(CPUARMState *env, const ARMCPRegInfo
> *ri)  {
> -    GICv3State *s;
> -    GICv3CPUState *c;
> +    GICv3CPUState *c = (GICv3CPUState *)env->gicv3state;
> 
> -    c = (GICv3CPUState *)env->gicv3state;
> -    s = c->gic;
> +    /*
> +     * This function is called when each vcpu resets. The kernel
> +     * API for the GIC assumes that it is only to be used when the
> +     * whole VM is paused, so if we attempt to read the kernel's
> +     * reset values here we might get EBUSY failures.
> +     * So instead we assume we know what the kernel's reset values
> +     * are (mostly zeroes) and only update the QEMU state struct
> +     * fields. The exception is that we do need to know the kernel's
> +     * idea of the ICC_CTLR_EL1 reset value, so we cache that at
> +     * device realize time.
> +     *
> +     * This makes these sysregs different from the usual CPU ones,
> +     * which can be validly read and written when only the single
> +     * vcpu they apply to is paused, and where (in target/arm code)
> +     * we read the reset values out of the kernel on every reset.
> +     */
> 
>      c->icc_pmr_el1 = 0;
>      /*
> @@ -691,16 +704,8 @@ static void arm_gicv3_icc_reset(CPUARMState *env,
> const ARMCPRegInfo *ri)
>      memset(c->icc_apr, 0, sizeof(c->icc_apr));
>      memset(c->icc_igrpen, 0, sizeof(c->icc_igrpen));
> 
> -    if (s->migration_blocker) {
> -        return;
> -    }
> -
> -    /* Initialize to actual HW supported configuration */
> -    kvm_device_access(s->dev_fd,
> KVM_DEV_ARM_VGIC_GRP_CPU_SYSREGS,
> -                      KVM_VGIC_ATTR(ICC_CTLR_EL1, c->gicr_typer),
> -                      &c->icc_ctlr_el1[GICV3_NS], false, &error_abort);
> -
> -    c->icc_ctlr_el1[GICV3_S] = c->icc_ctlr_el1[GICV3_NS];
> +    c->icc_ctlr_el1[GICV3_NS] = c->kvm_reset_icc_ctlr_el1;
> +    c->icc_ctlr_el1[GICV3_S] = c->kvm_reset_icc_ctlr_el1;
>  }
> 
>  static void kvm_arm_gicv3_reset_hold(Object *obj, ResetType type) @@ -
> 939,6 +944,22 @@ static void kvm_arm_gicv3_realize(DeviceState *dev,
> Error **errp)
>                                      kvm_arm_gicv3_notifier,
>                                      MIG_MODE_CPR_TRANSFER);
>      }
> +
> +    /*
> +     * Now we can read the kernel's initial value of ICC_CTLR_EL1, which
> +     * we will need if a CPU interface is reset. If the kernel is ancient
> +     * and doesn't support writing the GIC state then we don't need to
> +     * care what reset does to QEMU's data structures.
> +     */
> +    if (!s->migration_blocker) {
> +        for (i = 0; i < s->num_cpu; i++) {
> +            GICv3CPUState *c = &s->cpu[i];
> +
> +            kvm_device_access(s->dev_fd,
> KVM_DEV_ARM_VGIC_GRP_CPU_SYSREGS,
> +                              KVM_VGIC_ATTR(ICC_CTLR_EL1, c->gicr_typer),
> +                              &c->kvm_reset_icc_ctlr_el1, false, &error_abort);
> +        }
> +    }
>  }
> 
>  static void kvm_arm_gicv3_class_init(ObjectClass *klass, const void *data)
> --
> 2.43.0
> 



^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: [PATCH] hw/intc/arm_gicv3_kvm: Avoid reading ICC_CTLR_EL1 from kernel in cpuif reset
  2025-10-14 10:41 ` Salil Mehta via
@ 2025-10-14 13:23   ` Salil Mehta via
  2025-10-14 13:31     ` Peter Maydell
  0 siblings, 1 reply; 32+ messages in thread
From: Salil Mehta via @ 2025-10-14 13:23 UTC (permalink / raw)
  To: Salil Mehta, Peter Maydell, qemu-devel@nongnu.org
  Cc: Salil Mehta, Marc Zyngier

Hi Peter,

> From: qemu-devel-bounces+salil.mehta=huawei.com@nongnu.org <qemu-
> devel-bounces+salil.mehta=huawei.com@nongnu.org> On Behalf Of Salil
> Mehta via
> Sent: Tuesday, October 14, 2025 11:41 AM
> To: Peter Maydell <peter.maydell@linaro.org>; qemu-devel@nongnu.org
> 
> Hi Peter,
> 
> > From: qemu-devel-bounces+salil.mehta=huawei.com@nongnu.org
> <qemu-
> > devel-bounces+salil.mehta=huawei.com@nongnu.org> On Behalf Of Peter
> > Maydell
> > Sent: Tuesday, October 14, 2025 11:25 AM
> > To: qemu-devel@nongnu.org
> >
> > Currently in arm_gicv3_icc_reset() we read the kernel's value of
> > ICC_CTLR_EL1 as part of resetting the CPU interface.  This mostly
> > works, but we're actually breaking an assumption the kernel makes that
> > userspace only accesses the in-kernel GIC data when the VM is totally
> > paused, which may not be the case if a single vCPU is being reset.
> > The effect is that it's possible that the read attempt returns EBUSY.
> >
> > Avoid this by reading the kernel's value of the reset ICC_CTLR_EL1
> > once in device realize. This brings ICC_CTLR_EL1 into line with the
> > other cpuif registers, where we assume we know what the kernel is
> > resetting them to and just update QEMU's data structures in
> arm_gicv3_icc_reset().
> >
> > Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> > ---
> > I've only tested this fairly lightly, but it seems to work.
> > Salil, does this fix the EBUSY issues you were seeing ?
> 
> 
> Let me try this and get back to you.  Also, just to let you know that -EBUSY
> can return from other places as well. Please check  my reply in the other mail-
> chain.


Got this.

(gdb) handle SIGUSR1 nostop noprint pass
Signal        Stop      Print   Pass to program Description
SIGUSR1       No        No      Yes             User defined signal 1
(gdb) run
Starting program:
/opt/workspace/code/qemu/qemu/build/qemu-system-aarch64 --enable-kvm -machine virt,gic-version=3 -cpu host -smp cpus=2,disabledcpus=2 -m 300M -kernel /opt/workspace/code/linux/linux/arch/arm64/boot/Image
-initrd /opt/workspace/code/filesystem/rootfs.cpio.gz -append console=ttyAMA0\ root=/dev/ram\ earlycon\ rdinit=/init\ maxcpus=1\ acpi=force -nographic -bios /opt/workspace/code/uefi/edk2/Build/ArmVirtQemu-AARCH64/RELEASE_GCC5/FV/QEMU_EFI.fd
[Thread debugging using libthread_db enabled] Using host libthread_db library "/lib/aarch64-linux-gnu/libthread_db.so.1".
[New Thread 0xfffff5b5eb40 (LWP 31994)]
[New Thread 0xfffff4e88b40 (LWP 31996)]
[New Thread 0xffffd4dfeb40 (LWP 31997)]
Unexpected error in kvm_device_access() at ../accel/kvm/kvm-all.c:3475:
qemu-system-aarch64: KVM_GET_DEVICE_ATTR failed: Group 6 attr
0x000000000000c664: Inappropriate ioctl for device

Thread 1 "qemu-system-aar" received signal SIGABRT, Aborted.
__pthread_kill_implementation (threadid=281474841870368, signo=signo@entry=6, no_tid=no_tid@entry=0) at
./nptl/pthread_kill.c:44
44      ./nptl/pthread_kill.c: No such file or directory.
(gdb) bt
#0  __pthread_kill_implementation (threadid=281474841870368, signo=signo@entry=6, no_tid=no_tid@entry=0) at
./nptl/pthread_kill.c:44
#1  0x0000fffff6ee2054 in __pthread_kill_internal (signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:78
#2  0x0000fffff6e9a83c in __GI_raise (sig=sig@entry=6) at
../sysdeps/posix/raise.c:26
#3  0x0000fffff6e87134 in __GI_abort () at ./stdlib/abort.c:79
#4  0x0000aaaaabc84a98 in error_handle (errp=0xaaaaad20b720 <error_abort>, err=0xaaaaad7e1a80) at ../util/error.c:38
#5  0x0000aaaaabc84c74 in error_setv
    (errp=0xaaaaad20b720 <error_abort>, src=0xaaaaabee91d0 "../accel/kvm/kvm-all.c", line=3475, func=0xaaaaabeea5a8 <__func__.13> "kvm_device_access", err_class=ERROR_CLASS_GENERIC_ERROR,
fmt=0xaaaaabee9e60 "KVM_%s_DEVICE_ATTR failed: Group %d attr 0x%016lx", ap=..., suffix=0xfffff6fb3570 "Inappropriate ioctl for
device") at ../util/error.c:80
#6  0x0000aaaaabc84fdc in error_setg_errno_internal
    (errp=0xaaaaad20b720 <error_abort>, src=0xaaaaabee91d0 "../accel/kvm/kvm-all.c", line=3475, func=0xaaaaabeea5a8 <__func__.13> "kvm_device_access", os_errno=25, fmt=0xaaaaabee9e60 "KVM_%s_DEVICE_ATTR failed: Group %d attr 0x%016lx") at
../util/error.c:115
#7  0x0000aaaaaba1c2b0 in kvm_device_access (fd=0, group=6, attr=50788, val=0xaaaaad7b18f8, write=false, errp=0xaaaaad20b720
<error_abort>) at ../accel/kvm/kvm-all.c:3475
#8  0x0000aaaaab98d204 in kvm_arm_gicv3_realize (dev=0xaaaaad7ac930,
errp=0xffffffffea00) at ../hw/intc/arm_gicv3_kvm.c:938
#9  0x0000aaaaaba27584 in device_set_realized (obj=0xaaaaad7ac930, value=true, errp=0xffffffffeaf8) at ../hw/core/qdev.c:599
#10 0x0000aaaaaba32c78 in property_set_bool (obj=0xaaaaad7ac930, v=0xaaaaad7930c0, name=0xaaaaabef01a0 "realized", opaque=0xaaaaad302430, errp=0xffffffffeaf8) at ../qom/object.c:2375
#11 0x0000aaaaaba302b4 in object_property_set (obj=0xaaaaad7ac930,
name=0xaaaaabef01a0 "realized", v=0xaaaaad7930c0, errp=0xffffffffeaf8) at ../qom/object.c:1450
#12 0x0000aaaaaba36a78 in object_property_set_qobject (obj=0xaaaaad7ac930, name=0xaaaaabef01a0 "realized", value=0xaaaaad793200, errp=0xaaaaad20b728 <error_fatal>)
    at ../qom/qom-qobject.c:28
#13 0x0000aaaaaba306b8 in object_property_set_bool (obj=0xaaaaad7ac930, name=0xaaaaabef01a0 "realized", value=true,
errp=0xaaaaad20b728 <error_fatal>) at ../qom/object.c:1520
#14 0x0000aaaaaba268a4 in qdev_realize (dev=0xaaaaad7ac930, bus=0xaaaaad68cb20, errp=0xaaaaad20b728 <error_fatal>) at
../hw/core/qdev.c:297
#15 0x0000aaaaaba268e8 in qdev_realize_and_unref (dev=0xaaaaad7ac930, bus=0xaaaaad68cb20, errp=0xaaaaad20b728 <error_fatal>) at
../hw/core/qdev.c:304
#16 0x0000aaaaaaf8fbfc in sysbus_realize_and_unref (dev=0xaaaaad7ac930, errp=0xaaaaad20b728 <error_fatal>) at
../hw/core/sysbus.c:254
#17 0x0000aaaaab6375dc in create_gic (vms=0xaaaaad6849f0,
mem=0xaaaaad59cee0) at ../hw/arm/virt.c:889
#18 0x0000aaaaab63d850 in machvirt_init (machine=0xaaaaad6849f0) at
../hw/arm/virt.c:2810
#19 0x0000aaaaaaf86a50 in machine_run_board_init (machine=0xaaaaad6849f0, mem_path=0x0, errp=0xffffffffee48) at
../hw/core/machine.c:1722
#20 0x0000aaaaab3ab98c in qemu_init_board () at ../system/vl.c:2723
#21 0x0000aaaaab3abdd4 in qmp_x_exit_preconfig (errp=0xaaaaad20b728
<error_fatal>) at ../system/vl.c:2821
#22 0x0000aaaaab3ae430 in qemu_init (argc=19, argv=0xfffffffff238) at
../system/vl.c:3882
#23 0x0000aaaaabb85008 in main (argc=19, argv=0xfffffffff238) at
../system/main.c:71
(gdb)


Please check:

[1]   https://lore.kernel.org/lkml/6ef5f8d7b52b4eee8dbf9186046e920c@huawei.com/
[2]   https://lore.kernel.org/lkml/8b82541b805b4a9293f15740df73eaa8@huawei.com/


Many thanks!

Best regards
Salil.


> >
> >  include/hw/intc/arm_gicv3_common.h |  3 ++
> >  hw/intc/arm_gicv3_kvm.c            | 49 +++++++++++++++++++++---------
> >  2 files changed, 38 insertions(+), 14 deletions(-)
> >
> > diff --git a/include/hw/intc/arm_gicv3_common.h
> > b/include/hw/intc/arm_gicv3_common.h
> > index 38aa1961c50..61d51915e07 100644
> > --- a/include/hw/intc/arm_gicv3_common.h
> > +++ b/include/hw/intc/arm_gicv3_common.h
> > @@ -166,6 +166,9 @@ struct GICv3CPUState {
> >      uint64_t icc_igrpen[3];
> >      uint64_t icc_ctlr_el3;
> >
> > +    /* For KVM, cached copy of the kernel reset value of ICC_CTLR_EL1 */
> > +    uint64_t kvm_reset_icc_ctlr_el1;
> > +
> >      /* Virtualization control interface */
> >      uint64_t ich_apr[3][4]; /* ich_apr[GICV3_G1][x] never used */
> >      uint64_t ich_hcr_el2;
> > diff --git a/hw/intc/arm_gicv3_kvm.c b/hw/intc/arm_gicv3_kvm.c index
> > 9829e2146da..b95e6ea057a 100644
> > --- a/hw/intc/arm_gicv3_kvm.c
> > +++ b/hw/intc/arm_gicv3_kvm.c
> > @@ -666,11 +666,24 @@ static void kvm_arm_gicv3_get(GICv3State *s)
> >
> >  static void arm_gicv3_icc_reset(CPUARMState *env, const ARMCPRegInfo
> > *ri)  {
> > -    GICv3State *s;
> > -    GICv3CPUState *c;
> > +    GICv3CPUState *c = (GICv3CPUState *)env->gicv3state;
> >
> > -    c = (GICv3CPUState *)env->gicv3state;
> > -    s = c->gic;
> > +    /*
> > +     * This function is called when each vcpu resets. The kernel
> > +     * API for the GIC assumes that it is only to be used when the
> > +     * whole VM is paused, so if we attempt to read the kernel's
> > +     * reset values here we might get EBUSY failures.
> > +     * So instead we assume we know what the kernel's reset values
> > +     * are (mostly zeroes) and only update the QEMU state struct
> > +     * fields. The exception is that we do need to know the kernel's
> > +     * idea of the ICC_CTLR_EL1 reset value, so we cache that at
> > +     * device realize time.
> > +     *
> > +     * This makes these sysregs different from the usual CPU ones,
> > +     * which can be validly read and written when only the single
> > +     * vcpu they apply to is paused, and where (in target/arm code)
> > +     * we read the reset values out of the kernel on every reset.
> > +     */
> >
> >      c->icc_pmr_el1 = 0;
> >      /*
> > @@ -691,16 +704,8 @@ static void arm_gicv3_icc_reset(CPUARMState
> *env,
> > const ARMCPRegInfo *ri)
> >      memset(c->icc_apr, 0, sizeof(c->icc_apr));
> >      memset(c->icc_igrpen, 0, sizeof(c->icc_igrpen));
> >
> > -    if (s->migration_blocker) {
> > -        return;
> > -    }
> > -
> > -    /* Initialize to actual HW supported configuration */
> > -    kvm_device_access(s->dev_fd,
> > KVM_DEV_ARM_VGIC_GRP_CPU_SYSREGS,
> > -                      KVM_VGIC_ATTR(ICC_CTLR_EL1, c->gicr_typer),
> > -                      &c->icc_ctlr_el1[GICV3_NS], false, &error_abort);
> > -
> > -    c->icc_ctlr_el1[GICV3_S] = c->icc_ctlr_el1[GICV3_NS];
> > +    c->icc_ctlr_el1[GICV3_NS] = c->kvm_reset_icc_ctlr_el1;
> > +    c->icc_ctlr_el1[GICV3_S] = c->kvm_reset_icc_ctlr_el1;
> >  }
> >
> >  static void kvm_arm_gicv3_reset_hold(Object *obj, ResetType type) @@
> > -
> > 939,6 +944,22 @@ static void kvm_arm_gicv3_realize(DeviceState *dev,
> > Error **errp)
> >                                      kvm_arm_gicv3_notifier,
> >                                      MIG_MODE_CPR_TRANSFER);
> >      }
> > +
> > +    /*
> > +     * Now we can read the kernel's initial value of ICC_CTLR_EL1, which
> > +     * we will need if a CPU interface is reset. If the kernel is ancient
> > +     * and doesn't support writing the GIC state then we don't need to
> > +     * care what reset does to QEMU's data structures.
> > +     */
> > +    if (!s->migration_blocker) {
> > +        for (i = 0; i < s->num_cpu; i++) {
> > +            GICv3CPUState *c = &s->cpu[i];
> > +
> > +            kvm_device_access(s->dev_fd,
> > KVM_DEV_ARM_VGIC_GRP_CPU_SYSREGS,
> > +                              KVM_VGIC_ATTR(ICC_CTLR_EL1, c->gicr_typer),
> > +                              &c->kvm_reset_icc_ctlr_el1, false, &error_abort);
> > +        }
> > +    }
> >  }
> >
> >  static void kvm_arm_gicv3_class_init(ObjectClass *klass, const void
> > *data)
> > --
> > 2.43.0
> >
> 



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH] hw/intc/arm_gicv3_kvm: Avoid reading ICC_CTLR_EL1 from kernel in cpuif reset
  2025-10-14 13:23   ` Salil Mehta via
@ 2025-10-14 13:31     ` Peter Maydell
  2025-10-14 13:41       ` Salil Mehta via
  2025-10-16 12:09       ` Salil Mehta via
  0 siblings, 2 replies; 32+ messages in thread
From: Peter Maydell @ 2025-10-14 13:31 UTC (permalink / raw)
  To: Salil Mehta; +Cc: qemu-devel@nongnu.org, Salil Mehta, Marc Zyngier

On Tue, 14 Oct 2025 at 14:23, Salil Mehta <salil.mehta@huawei.com> wrote:
>
> Hi Peter,
>
> > From: qemu-devel-bounces+salil.mehta=huawei.com@nongnu.org <qemu-
> > devel-bounces+salil.mehta=huawei.com@nongnu.org> On Behalf Of Salil
> > Mehta via
> > Sent: Tuesday, October 14, 2025 11:41 AM
> > To: Peter Maydell <peter.maydell@linaro.org>; qemu-devel@nongnu.org
> >
> > Hi Peter,
> >
> > > From: qemu-devel-bounces+salil.mehta=huawei.com@nongnu.org
> > <qemu-
> > > devel-bounces+salil.mehta=huawei.com@nongnu.org> On Behalf Of Peter
> > > Maydell
> > > Sent: Tuesday, October 14, 2025 11:25 AM
> > > To: qemu-devel@nongnu.org
> > >
> > > Currently in arm_gicv3_icc_reset() we read the kernel's value of
> > > ICC_CTLR_EL1 as part of resetting the CPU interface.  This mostly
> > > works, but we're actually breaking an assumption the kernel makes that
> > > userspace only accesses the in-kernel GIC data when the VM is totally
> > > paused, which may not be the case if a single vCPU is being reset.
> > > The effect is that it's possible that the read attempt returns EBUSY.
> > >
> > > Avoid this by reading the kernel's value of the reset ICC_CTLR_EL1
> > > once in device realize. This brings ICC_CTLR_EL1 into line with the
> > > other cpuif registers, where we assume we know what the kernel is
> > > resetting them to and just update QEMU's data structures in
> > arm_gicv3_icc_reset().
> > >
> > > Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> > > ---
> > > I've only tested this fairly lightly, but it seems to work.
> > > Salil, does this fix the EBUSY issues you were seeing ?
> >
> >
> > Let me try this and get back to you.  Also, just to let you know that -EBUSY
> > can return from other places as well. Please check  my reply in the other mail-
> > chain.
>
>
> Got this.
>
> (gdb) handle SIGUSR1 nostop noprint pass
> Signal        Stop      Print   Pass to program Description
> SIGUSR1       No        No      Yes             User defined signal 1
> (gdb) run
> Starting program:
> /opt/workspace/code/qemu/qemu/build/qemu-system-aarch64 --enable-kvm -machine virt,gic-version=3 -cpu host -smp cpus=2,disabledcpus=2 -m 300M -kernel /opt/workspace/code/linux/linux/arch/arm64/boot/Image
> -initrd /opt/workspace/code/filesystem/rootfs.cpio.gz -append console=ttyAMA0\ root=/dev/ram\ earlycon\ rdinit=/init\ maxcpus=1\ acpi=force -nographic -bios /opt/workspace/code/uefi/edk2/Build/ArmVirtQemu-AARCH64/RELEASE_GCC5/FV/QEMU_EFI.fd
> [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib/aarch64-linux-gnu/libthread_db.so.1".
> [New Thread 0xfffff5b5eb40 (LWP 31994)]
> [New Thread 0xfffff4e88b40 (LWP 31996)]
> [New Thread 0xffffd4dfeb40 (LWP 31997)]
> Unexpected error in kvm_device_access() at ../accel/kvm/kvm-all.c:3475:
> qemu-system-aarch64: KVM_GET_DEVICE_ATTR failed: Group 6 attr
> 0x000000000000c664: Inappropriate ioctl for device

Does it do this consistently, or only sometimes? What
host kernel version are you running? And what QEMU
commit (plus this patch)?

I'm guessing from that "disabledcpus=2" part that you're
running some not-yet-upstream set of QEMU patches. Please
drop those, and test only with this, to rule out the
possibility of some bug/unexpected interaction with those.

thanks
-- PMM


^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: [PATCH] hw/intc/arm_gicv3_kvm: Avoid reading ICC_CTLR_EL1 from kernel in cpuif reset
  2025-10-14 13:31     ` Peter Maydell
@ 2025-10-14 13:41       ` Salil Mehta via
  2025-10-14 13:49         ` Peter Maydell
  2025-10-16 12:09       ` Salil Mehta via
  1 sibling, 1 reply; 32+ messages in thread
From: Salil Mehta via @ 2025-10-14 13:41 UTC (permalink / raw)
  To: Peter Maydell; +Cc: qemu-devel@nongnu.org, Salil Mehta, Marc Zyngier

> From: Peter Maydell <peter.maydell@linaro.org>
> Sent: Tuesday, October 14, 2025 2:31 PM
> To: Salil Mehta <salil.mehta@huawei.com>
> 
> On Tue, 14 Oct 2025 at 14:23, Salil Mehta <salil.mehta@huawei.com> wrote:
> >
> > Hi Peter,
> >
> > > From: qemu-devel-bounces+salil.mehta=huawei.com@nongnu.org
> <qemu-
> > > devel-bounces+salil.mehta=huawei.com@nongnu.org> On Behalf Of Salil
> > > Mehta via
> > > Sent: Tuesday, October 14, 2025 11:41 AM
> > > To: Peter Maydell <peter.maydell@linaro.org>; qemu-
> devel@nongnu.org
> > >
> > > Hi Peter,
> > >
> > > > From: qemu-devel-bounces+salil.mehta=huawei.com@nongnu.org
> > > <qemu-
> > > > devel-bounces+salil.mehta=huawei.com@nongnu.org> On Behalf Of
> > > > devel-bounces+Peter
> > > > Maydell
> > > > Sent: Tuesday, October 14, 2025 11:25 AM
> > > > To: qemu-devel@nongnu.org
> > > >
> > > > Currently in arm_gicv3_icc_reset() we read the kernel's value of
> > > > ICC_CTLR_EL1 as part of resetting the CPU interface.  This mostly
> > > > works, but we're actually breaking an assumption the kernel makes
> > > > that userspace only accesses the in-kernel GIC data when the VM is
> > > > totally paused, which may not be the case if a single vCPU is being
> reset.
> > > > The effect is that it's possible that the read attempt returns EBUSY.
> > > >
> > > > Avoid this by reading the kernel's value of the reset ICC_CTLR_EL1
> > > > once in device realize. This brings ICC_CTLR_EL1 into line with
> > > > the other cpuif registers, where we assume we know what the kernel
> > > > is resetting them to and just update QEMU's data structures in
> > > arm_gicv3_icc_reset().
> > > >
> > > > Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> > > > ---
> > > > I've only tested this fairly lightly, but it seems to work.
> > > > Salil, does this fix the EBUSY issues you were seeing ?
> > >
> > >
> > > Let me try this and get back to you.  Also, just to let you know
> > > that -EBUSY can return from other places as well. Please check  my
> > > reply in the other mail- chain.
> >
> >
> > Got this.
> >
> > (gdb) handle SIGUSR1 nostop noprint pass
> > Signal        Stop      Print   Pass to program Description
> > SIGUSR1       No        No      Yes             User defined signal 1
> > (gdb) run
> > Starting program:
> > /opt/workspace/code/qemu/qemu/build/qemu-system-aarch64 --enable-
> kvm
> > -machine virt,gic-version=3 -cpu host -smp cpus=2,disabledcpus=2 -m
> > 300M -kernel /opt/workspace/code/linux/linux/arch/arm64/boot/Image
> > -initrd /opt/workspace/code/filesystem/rootfs.cpio.gz -append
> > console=ttyAMA0\ root=/dev/ram\ earlycon\ rdinit=/init\ maxcpus=1\
> > acpi=force -nographic -bios
> > /opt/workspace/code/uefi/edk2/Build/ArmVirtQemu-
> AARCH64/RELEASE_GCC5/F
> > V/QEMU_EFI.fd [Thread debugging using libthread_db enabled] Using host
> > libthread_db library "/lib/aarch64-linux-gnu/libthread_db.so.1".
> > [New Thread 0xfffff5b5eb40 (LWP 31994)] [New Thread 0xfffff4e88b40
> > (LWP 31996)] [New Thread 0xffffd4dfeb40 (LWP 31997)] Unexpected error
> > in kvm_device_access() at ../accel/kvm/kvm-all.c:3475:
> > qemu-system-aarch64: KVM_GET_DEVICE_ATTR failed: Group 6 attr
> > 0x000000000000c664: Inappropriate ioctl for device
> 
> Does it do this consistently, or only sometimes? What host kernel version are
> you running? And what QEMU commit (plus this patch)?

I've tried 3 times and it happened all the 3 times but this is a very pertinent
question and I'm not sure if every attempt will lead to this.

I thought you asked me to validate the fix by replacing below:

https://lore.kernel.org/qemu-devel/20251001010127.3092631-22-salil.mehta@opnsrc.net/


Yes, I'm using the recent RFC V6 vCPU Hotplug patches branch I've pushed to the
community.

https://lore.kernel.org/qemu-devel/20251001010127.3092631-1-salil.mehta@opnsrc.net/

You can also get it here:
https://github.com/salil-mehta/qemu.git virt-cpuhp-armv8/rfc-v6


Agreed. its worth seeing without the patches applied. I'll share with you the
result shortly.


Thanks
Salil.

> 
> I'm guessing from that "disabledcpus=2" part that you're running some not-
> yet-upstream set of QEMU patches. Please drop those, and test only with
> this, to rule out the possibility of some bug/unexpected interaction with
> those.
> 
> thanks
> -- PMM

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH] hw/intc/arm_gicv3_kvm: Avoid reading ICC_CTLR_EL1 from kernel in cpuif reset
  2025-10-14 13:41       ` Salil Mehta via
@ 2025-10-14 13:49         ` Peter Maydell
  2025-10-14 14:22           ` Salil Mehta via
  0 siblings, 1 reply; 32+ messages in thread
From: Peter Maydell @ 2025-10-14 13:49 UTC (permalink / raw)
  To: Salil Mehta; +Cc: qemu-devel@nongnu.org, Salil Mehta, Marc Zyngier

On Tue, 14 Oct 2025 at 14:41, Salil Mehta <salil.mehta@huawei.com> wrote:
> I thought you asked me to validate the fix by replacing below:
>
> https://lore.kernel.org/qemu-devel/20251001010127.3092631-22-salil.mehta@opnsrc.net/
>
>
> Yes, I'm using the recent RFC V6 vCPU Hotplug patches branch I've pushed to the
> community.
>
> https://lore.kernel.org/qemu-devel/20251001010127.3092631-1-salil.mehta@opnsrc.net/

That's the one with the "lazy realize" hack, right? I imagine
what's happening is that we realize the GIC, and the code in
this patch assumes that all the CPUs are already realized at
that point. When we try to get the register value for a
not-yet-realized CPU the kernel complains.

(I strongly agree with Igor's review remarks here
https://lore.kernel.org/qemu-devel/20251006160027.20067fe4@fedora/
that lazy realizing of CPU objects is a bad idea.)

thanks
-- PMM


^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: [PATCH] hw/intc/arm_gicv3_kvm: Avoid reading ICC_CTLR_EL1 from kernel in cpuif reset
  2025-10-14 13:49         ` Peter Maydell
@ 2025-10-14 14:22           ` Salil Mehta via
  2025-10-14 14:28             ` Peter Maydell
  0 siblings, 1 reply; 32+ messages in thread
From: Salil Mehta via @ 2025-10-14 14:22 UTC (permalink / raw)
  To: Peter Maydell; +Cc: qemu-devel@nongnu.org, Salil Mehta, Marc Zyngier

Hi Peter,

> From: Peter Maydell <peter.maydell@linaro.org>
> Sent: Tuesday, October 14, 2025 2:50 PM
> To: Salil Mehta <salil.mehta@huawei.com>
> 
> On Tue, 14 Oct 2025 at 14:41, Salil Mehta <salil.mehta@huawei.com> wrote:
> > I thought you asked me to validate the fix by replacing below:
> >
> > https://lore.kernel.org/qemu-devel/20251001010127.3092631-22-salil.meh
> > ta@opnsrc.net/
> >
> >
> > Yes, I'm using the recent RFC V6 vCPU Hotplug patches branch I've
> > pushed to the community.
> >
> > https://lore.kernel.org/qemu-devel/20251001010127.3092631-1-salil.meht
> > a@opnsrc.net/
> 
> That's the one with the "lazy realize" hack, right? I imagine what's happening
> is that we realize the GIC, and the code in this patch assumes that all the
> CPUs are already realized at that point. When we try to get the register value
> for a not-yet-realized CPU the kernel complains.

Even if we realize all of the vCPUs the problem will not go away. This problem
is happening because we have recently started to Exit Hypercalls to userspace.
This means we are now accessing the system register in a non-atomic context.

In fact in contrary to above, lazy realization actually helps in reducing the vCPU
lock contention as there are no threads running within KVM_RUN IOCTL. Hence,
those threads do not take the lock and hence do not cause lock contention.

If we are handling HVC and resetting the system register in vCPU thread context
then we are already in atomic context as vCPU mutexes are taken inside the KVM .
The problem what we are seeing comes into picture only when we are trying to
access the system registers without holding vCPU mutex lock because we are
not in KVM_RUN IOCTL.

For example,
1. When we Exit the HVC.SMC Hypercall into userspace and access the ICC_CTLR_EL1
system register via KVM Device IOCTL.

OR

2. Like in the current patch, we are trying to access ICC_CTLR_EL1 when we are not
in any vCPU context running inside KVM_RUN IOCTL. Here, we will most probably
contend with CPU0 held mutex (at least)

> 
> (I strongly agree with Igor's review remarks here
> https://lore.kernel.org/qemu-devel/20251006160027.20067fe4@fedora/
> that lazy realizing of CPU objects is a bad idea.)

The observation you are seeing has got nothing to do with lazy realization.
The problem happens even after threads are realized and then we try to access
the ICC_CTLR_EL1 register during cpu_reset()

Many thanks!

Best regards
Salil.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH] hw/intc/arm_gicv3_kvm: Avoid reading ICC_CTLR_EL1 from kernel in cpuif reset
  2025-10-14 14:22           ` Salil Mehta via
@ 2025-10-14 14:28             ` Peter Maydell
  2025-10-14 14:48               ` Salil Mehta via
  0 siblings, 1 reply; 32+ messages in thread
From: Peter Maydell @ 2025-10-14 14:28 UTC (permalink / raw)
  To: Salil Mehta; +Cc: qemu-devel@nongnu.org, Salil Mehta, Marc Zyngier

On Tue, 14 Oct 2025 at 15:22, Salil Mehta <salil.mehta@huawei.com> wrote:
>
> Hi Peter,
>
> > From: Peter Maydell <peter.maydell@linaro.org>
> > Sent: Tuesday, October 14, 2025 2:50 PM
> > To: Salil Mehta <salil.mehta@huawei.com>
> >
> > On Tue, 14 Oct 2025 at 14:41, Salil Mehta <salil.mehta@huawei.com> wrote:
> > > I thought you asked me to validate the fix by replacing below:
> > >
> > > https://lore.kernel.org/qemu-devel/20251001010127.3092631-22-salil.meh
> > > ta@opnsrc.net/
> > >
> > >
> > > Yes, I'm using the recent RFC V6 vCPU Hotplug patches branch I've
> > > pushed to the community.
> > >
> > > https://lore.kernel.org/qemu-devel/20251001010127.3092631-1-salil.meht
> > > a@opnsrc.net/
> >
> > That's the one with the "lazy realize" hack, right? I imagine what's happening
> > is that we realize the GIC, and the code in this patch assumes that all the
> > CPUs are already realized at that point. When we try to get the register value
> > for a not-yet-realized CPU the kernel complains.
>
>
> Even if we realize all of the vCPUs the problem will not go away. This problem
> is happening because we have recently started to Exit Hypercalls to userspace.
> This means we are now accessing the system register in a non-atomic context.

The point of this patch is that it moves the read of ICC_CTLR_EL1
out of the reset path and into the GIC realize method, at which point
no vCPUs should have started running. But it does assume that you
don't have half-created VCPUs connected to the GIC.

> The observation you are seeing has got nothing to do with lazy realization.
> The problem happens even after threads are realized and then we try to access
> the ICC_CTLR_EL1 register during cpu_reset()

With this patch, we should not be accessing ICC_CTLR_EL1
during CPU reset. The backtrace you posted does
not have CPU reset in it, so whatever is going wrong
there must be something else.

thanks
-- PMM


^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: [PATCH] hw/intc/arm_gicv3_kvm: Avoid reading ICC_CTLR_EL1 from kernel in cpuif reset
  2025-10-14 14:28             ` Peter Maydell
@ 2025-10-14 14:48               ` Salil Mehta via
  2025-10-14 14:59                 ` Peter Maydell
  0 siblings, 1 reply; 32+ messages in thread
From: Salil Mehta via @ 2025-10-14 14:48 UTC (permalink / raw)
  To: Peter Maydell; +Cc: qemu-devel@nongnu.org, Salil Mehta, Marc Zyngier

Hi Peter,

> From: Peter Maydell <peter.maydell@linaro.org>
> Sent: Tuesday, October 14, 2025 3:29 PM
> To: Salil Mehta <salil.mehta@huawei.com>
> 
> On Tue, 14 Oct 2025 at 15:22, Salil Mehta <salil.mehta@huawei.com> wrote:
> >
> > Hi Peter,
> >
> > > From: Peter Maydell <peter.maydell@linaro.org>
> > > Sent: Tuesday, October 14, 2025 2:50 PM
> > > To: Salil Mehta <salil.mehta@huawei.com>
> > >
> > > On Tue, 14 Oct 2025 at 14:41, Salil Mehta <salil.mehta@huawei.com>
> wrote:
> > > > I thought you asked me to validate the fix by replacing below:
> > > >
> > > > https://lore.kernel.org/qemu-devel/20251001010127.3092631-22-salil
> > > > .meh
> > > > ta@opnsrc.net/
> > > >
> > > >
> > > > Yes, I'm using the recent RFC V6 vCPU Hotplug patches branch I've
> > > > pushed to the community.
> > > >
> > > > https://lore.kernel.org/qemu-devel/20251001010127.3092631-1-salil.
> > > > meht
> > > > a@opnsrc.net/
> > >
> > > That's the one with the "lazy realize" hack, right? I imagine what's
> > > happening is that we realize the GIC, and the code in this patch
> > > assumes that all the CPUs are already realized at that point. When
> > > we try to get the register value for a not-yet-realized CPU the kernel
> complains.
> >
> >
> > Even if we realize all of the vCPUs the problem will not go away. This
> > problem is happening because we have recently started to Exit Hypercalls
> to userspace.
> > This means we are now accessing the system register in a non-atomic
> context.
> 
> The point of this patch is that it moves the read of ICC_CTLR_EL1 out of the
> reset path and into the GIC realize method, at which point no vCPUs should
> have started running. But it does assume that you don't have half-created
> VCPUs connected to the GIC.

This Is not true. Actually, inner cpu_exec() (in kvm-all..c)  loop keeps on dipping
into the KVM_RUN IOCTL and exiting back with INTR continuously as the realized
vCPUs are in RUNNABLE state initially. The actual "start-powered-off" policy only
gets applied after first system-reset happens.

Hence, this is the state of *transient* lock contention within the KVM and is 
probabilistic. It also explains why it does not happens always.

If we increase the number of realized vCPU threads, the probability of this
transient lock contention becomes even higher and you will tend to see this
condition most of the times.

> 
> > The observation you are seeing has got nothing to do with lazy realization.
> > The problem happens even after threads are realized and then we try to
> > access the ICC_CTLR_EL1 register during cpu_reset()
> 
> With this patch, we should not be accessing ICC_CTLR_EL1 during CPU reset.
> The backtrace you posted does not have CPU reset in it, so whatever is going
> wrong there must be something else.

Yes, but its crashing in the realization of the GIC i.e. in context of machvirt_init()
First reset of the vCPUs happens much later than this. Hence, the reason of this
contention is different than the one you are trying to solve using this patch.
We can get -EBUSY at many different paths but the reason is always vCPU lock
contention. It could be transient or because guest are actually running inside
the KVM_RUN IOCTL (which means vCPU mutex will be held - a more permanent
condition)

Many thanks!

Best regards
Salil.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH] hw/intc/arm_gicv3_kvm: Avoid reading ICC_CTLR_EL1 from kernel in cpuif reset
  2025-10-14 14:48               ` Salil Mehta via
@ 2025-10-14 14:59                 ` Peter Maydell
  2025-10-14 15:13                   ` Salil Mehta via
  0 siblings, 1 reply; 32+ messages in thread
From: Peter Maydell @ 2025-10-14 14:59 UTC (permalink / raw)
  To: Salil Mehta; +Cc: qemu-devel@nongnu.org, Salil Mehta, Marc Zyngier

On Tue, 14 Oct 2025 at 15:48, Salil Mehta <salil.mehta@huawei.com> wrote:
>
> Hi Peter,
>
> > From: Peter Maydell <peter.maydell@linaro.org>
> > Sent: Tuesday, October 14, 2025 3:29 PM
> > To: Salil Mehta <salil.mehta@huawei.com>
> >
> > On Tue, 14 Oct 2025 at 15:22, Salil Mehta <salil.mehta@huawei.com> wrote:
> > >
> > > Hi Peter,
> > >
> > > > From: Peter Maydell <peter.maydell@linaro.org>
> > > > Sent: Tuesday, October 14, 2025 2:50 PM
> > > > To: Salil Mehta <salil.mehta@huawei.com>
> > > >
> > > > On Tue, 14 Oct 2025 at 14:41, Salil Mehta <salil.mehta@huawei.com>
> > wrote:
> > > > > I thought you asked me to validate the fix by replacing below:
> > > > >
> > > > > https://lore.kernel.org/qemu-devel/20251001010127.3092631-22-salil
> > > > > .meh
> > > > > ta@opnsrc.net/
> > > > >
> > > > >
> > > > > Yes, I'm using the recent RFC V6 vCPU Hotplug patches branch I've
> > > > > pushed to the community.
> > > > >
> > > > > https://lore.kernel.org/qemu-devel/20251001010127.3092631-1-salil.
> > > > > meht
> > > > > a@opnsrc.net/
> > > >
> > > > That's the one with the "lazy realize" hack, right? I imagine what's
> > > > happening is that we realize the GIC, and the code in this patch
> > > > assumes that all the CPUs are already realized at that point. When
> > > > we try to get the register value for a not-yet-realized CPU the kernel
> > complains.
> > >
> > >
> > > Even if we realize all of the vCPUs the problem will not go away. This
> > > problem is happening because we have recently started to Exit Hypercalls
> > to userspace.
> > > This means we are now accessing the system register in a non-atomic
> > context.
> >
> > The point of this patch is that it moves the read of ICC_CTLR_EL1 out of the
> > reset path and into the GIC realize method, at which point no vCPUs should
> > have started running. But it does assume that you don't have half-created
> > VCPUs connected to the GIC.
>
>
> This Is not true. Actually, inner cpu_exec() (in kvm-all..c)  loop keeps on dipping
> into the KVM_RUN IOCTL and exiting back with INTR continuously as the realized
> vCPUs are in RUNNABLE state initially. The actual "start-powered-off" policy only
> gets applied after first system-reset happens.

In what situation do we ever start running a VCPU before
the *GIC* has been realized? The GIC should get realized
as part of creating the virt board, which must complete
before we do anything like running a vcpu.

> > > The observation you are seeing has got nothing to do with lazy realization.
> > > The problem happens even after threads are realized and then we try to
> > > access the ICC_CTLR_EL1 register during cpu_reset()
> >
> > With this patch, we should not be accessing ICC_CTLR_EL1 during CPU reset.
> > The backtrace you posted does not have CPU reset in it, so whatever is going
> > wrong there must be something else.
>
> Yes, but its crashing in the realization of the GIC i.e. in context of machvirt_init()
> First reset of the vCPUs happens much later than this. Hence, the reason of this
> contention is different than the one you are trying to solve using this patch.

Yes, and my suggestion is that the failure you are seeing is only
because you have got half-created vcpu objects. Your backtrace
shows that the error here is not EBUSY, but ENOTTY.

-- PMM


^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: [PATCH] hw/intc/arm_gicv3_kvm: Avoid reading ICC_CTLR_EL1 from kernel in cpuif reset
  2025-10-14 14:59                 ` Peter Maydell
@ 2025-10-14 15:13                   ` Salil Mehta via
  2025-10-14 15:16                     ` Salil Mehta via
  2025-10-14 15:23                     ` Peter Maydell
  0 siblings, 2 replies; 32+ messages in thread
From: Salil Mehta via @ 2025-10-14 15:13 UTC (permalink / raw)
  To: Peter Maydell; +Cc: qemu-devel@nongnu.org, Salil Mehta, Marc Zyngier

> From: Peter Maydell <peter.maydell@linaro.org>
> Sent: Tuesday, October 14, 2025 4:00 PM
> To: Salil Mehta <salil.mehta@huawei.com>
> 
> On Tue, 14 Oct 2025 at 15:48, Salil Mehta <salil.mehta@huawei.com> wrote:
> >
> > Hi Peter,
> >
> > > From: Peter Maydell <peter.maydell@linaro.org>
> > > Sent: Tuesday, October 14, 2025 3:29 PM
> > > To: Salil Mehta <salil.mehta@huawei.com>
> > >
> > > On Tue, 14 Oct 2025 at 15:22, Salil Mehta <salil.mehta@huawei.com>
> wrote:
> > > >
> > > > Hi Peter,
> > > >
> > > > > From: Peter Maydell <peter.maydell@linaro.org>
> > > > > Sent: Tuesday, October 14, 2025 2:50 PM
> > > > > To: Salil Mehta <salil.mehta@huawei.com>
> > > > >
> > > > > On Tue, 14 Oct 2025 at 14:41, Salil Mehta
> > > > > <salil.mehta@huawei.com>
> > > wrote:
> > > > > > I thought you asked me to validate the fix by replacing below:
> > > > > >
> > > > > > https://lore.kernel.org/qemu-devel/20251001010127.3092631-22-s
> > > > > > alil
> > > > > > .meh
> > > > > > ta@opnsrc.net/
> > > > > >
> > > > > >
> > > > > > Yes, I'm using the recent RFC V6 vCPU Hotplug patches branch
> > > > > > I've pushed to the community.
> > > > > >
> > > > > > https://lore.kernel.org/qemu-devel/20251001010127.3092631-1-
> salil.
> > > > > > meht
> > > > > > a@opnsrc.net/
> > > > >
> > > > > That's the one with the "lazy realize" hack, right? I imagine
> > > > > what's happening is that we realize the GIC, and the code in
> > > > > this patch assumes that all the CPUs are already realized at
> > > > > that point. When we try to get the register value for a
> > > > > not-yet-realized CPU the kernel
> > > complains.
> > > >
> > > >
> > > > Even if we realize all of the vCPUs the problem will not go away.
> > > > This problem is happening because we have recently started to Exit
> > > > Hypercalls
> > > to userspace.
> > > > This means we are now accessing the system register in a
> > > > non-atomic
> > > context.
> > >
> > > The point of this patch is that it moves the read of ICC_CTLR_EL1
> > > out of the reset path and into the GIC realize method, at which
> > > point no vCPUs should have started running. But it does assume that
> > > you don't have half-created VCPUs connected to the GIC.
> >
> >
> > This Is not true. Actually, inner cpu_exec() (in kvm-all..c)  loop
> > keeps on dipping into the KVM_RUN IOCTL and exiting back with INTR
> > continuously as the realized vCPUs are in RUNNABLE state initially.
> > The actual "start-powered-off" policy only gets applied after first system-
> reset happens.
> 
> In what situation do we ever start running a VCPU before the *GIC* has
> been realized? The GIC should get realized as part of creating the virt board,
> which must complete before we do anything like running a vcpu.


Just after realization of vCPU in the machvirt_init() you can see the default 
power_state is PSCI CPU_ON, which means KVM_MP_STATE_RUNNABLE.
Since, the thread is up and not doing IO wait in userspace it gets into
cpu_exec() loop and actually run KVM_RUN IOCTL. Inside the KVM it
momentarily takes the vCPU mutex but later exit and releases. This keeps
going on for all of the vCPU threads realized early.

Sure, but GIC is not getting used by any of the vCPU threads. The guest
kernel and hence the VGIC driver does not exist yet. It needs to do its
initialization first before we can even think of any interrupt handling?



> 
> > > > The observation you are seeing has got nothing to do with lazy
> realization.
> > > > The problem happens even after threads are realized and then we
> > > > try to access the ICC_CTLR_EL1 register during cpu_reset()
> > >
> > > With this patch, we should not be accessing ICC_CTLR_EL1 during CPU
> reset.
> > > The backtrace you posted does not have CPU reset in it, so whatever
> > > is going wrong there must be something else.
> >
> > Yes, but its crashing in the realization of the GIC i.e. in context of
> > machvirt_init() First reset of the vCPUs happens much later than this.
> > Hence, the reason of this contention is different than the one you are
> trying to solve using this patch.
> 
> Yes, and my suggestion is that the failure you are seeing is only because you
> have got half-created vcpu objects. Your backtrace shows that the error here
> is not EBUSY, but ENOTTY.


Let me revisit that part again.


Many thanks!

Best regards
Salil.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: [PATCH] hw/intc/arm_gicv3_kvm: Avoid reading ICC_CTLR_EL1 from kernel in cpuif reset
  2025-10-14 15:13                   ` Salil Mehta via
@ 2025-10-14 15:16                     ` Salil Mehta via
  2025-10-14 15:23                     ` Peter Maydell
  1 sibling, 0 replies; 32+ messages in thread
From: Salil Mehta via @ 2025-10-14 15:16 UTC (permalink / raw)
  To: Peter Maydell; +Cc: qemu-devel@nongnu.org, Salil Mehta, Marc Zyngier



> From: Salil Mehta
> Sent: Tuesday, October 14, 2025 4:13 PM
> To: 'Peter Maydell' <peter.maydell@linaro.org>
> Cc: qemu-devel@nongnu.org; Salil Mehta <salil.mehta@opnsrc.net>; Marc
> Zyngier <maz@kernel.org>
> Subject: RE: [PATCH] hw/intc/arm_gicv3_kvm: Avoid reading ICC_CTLR_EL1
> from kernel in cpuif reset
> 
> > From: Peter Maydell <peter.maydell@linaro.org>
> > Sent: Tuesday, October 14, 2025 4:00 PM
> > To: Salil Mehta <salil.mehta@huawei.com>
> >
> > On Tue, 14 Oct 2025 at 15:48, Salil Mehta <salil.mehta@huawei.com>
> wrote:
> > >
> > > Hi Peter,
> > >
> > > > From: Peter Maydell <peter.maydell@linaro.org>
> > > > Sent: Tuesday, October 14, 2025 3:29 PM
> > > > To: Salil Mehta <salil.mehta@huawei.com>
> > > >
> > > > On Tue, 14 Oct 2025 at 15:22, Salil Mehta <salil.mehta@huawei.com>
> > wrote:
> > > > >
> > > > > Hi Peter,
> > > > >
> > > > > > From: Peter Maydell <peter.maydell@linaro.org>
> > > > > > Sent: Tuesday, October 14, 2025 2:50 PM
> > > > > > To: Salil Mehta <salil.mehta@huawei.com>
> > > > > >
> > > > > > On Tue, 14 Oct 2025 at 14:41, Salil Mehta
> > > > > > <salil.mehta@huawei.com>
> > > > wrote:
> > > > > > > I thought you asked me to validate the fix by replacing below:
> > > > > > >
> > > > > > > https://lore.kernel.org/qemu-devel/20251001010127.3092631-22
> > > > > > > -s
> > > > > > > alil
> > > > > > > .meh
> > > > > > > ta@opnsrc.net/
> > > > > > >
> > > > > > >
> > > > > > > Yes, I'm using the recent RFC V6 vCPU Hotplug patches branch
> > > > > > > I've pushed to the community.
> > > > > > >
> > > > > > > https://lore.kernel.org/qemu-devel/20251001010127.3092631-1-
> > salil.
> > > > > > > meht
> > > > > > > a@opnsrc.net/
> > > > > >
> > > > > > That's the one with the "lazy realize" hack, right? I imagine
> > > > > > what's happening is that we realize the GIC, and the code in
> > > > > > this patch assumes that all the CPUs are already realized at
> > > > > > that point. When we try to get the register value for a
> > > > > > not-yet-realized CPU the kernel
> > > > complains.
> > > > >
> > > > >
> > > > > Even if we realize all of the vCPUs the problem will not go away.
> > > > > This problem is happening because we have recently started to
> > > > > Exit Hypercalls
> > > > to userspace.
> > > > > This means we are now accessing the system register in a
> > > > > non-atomic
> > > > context.
> > > >
> > > > The point of this patch is that it moves the read of ICC_CTLR_EL1
> > > > out of the reset path and into the GIC realize method, at which
> > > > point no vCPUs should have started running. But it does assume
> > > > that you don't have half-created VCPUs connected to the GIC.
> > >
> > >
> > > This Is not true. Actually, inner cpu_exec() (in kvm-all..c)  loop
> > > keeps on dipping into the KVM_RUN IOCTL and exiting back with INTR
> > > continuously as the realized vCPUs are in RUNNABLE state initially.
> > > The actual "start-powered-off" policy only gets applied after first
> > > system-
> > reset happens.
> >
> > In what situation do we ever start running a VCPU before the *GIC* has
> > been realized? The GIC should get realized as part of creating the
> > virt board, which must complete before we do anything like running a vcpu.
> 
> 
> Just after realization of vCPU in the machvirt_init() you can see the default
> power_state is PSCI CPU_ON, which means KVM_MP_STATE_RUNNABLE.
> Since, the thread is up and not doing IO wait in userspace it gets into
> cpu_exec() loop and actually run KVM_RUN IOCTL. Inside the KVM it
> momentarily takes the vCPU mutex but later exit and releases. This keeps
> going on for all of the vCPU threads realized early.

Sorry for the confusion,

Forgive me, power state is not PSCI CPU_ON but the KVM_MP_STATE is still
RUNNABLE


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH] hw/intc/arm_gicv3_kvm: Avoid reading ICC_CTLR_EL1 from kernel in cpuif reset
  2025-10-14 15:13                   ` Salil Mehta via
  2025-10-14 15:16                     ` Salil Mehta via
@ 2025-10-14 15:23                     ` Peter Maydell
  2025-10-14 15:32                       ` Salil Mehta via
  2025-10-14 15:39                       ` Salil Mehta via
  1 sibling, 2 replies; 32+ messages in thread
From: Peter Maydell @ 2025-10-14 15:23 UTC (permalink / raw)
  To: Salil Mehta; +Cc: qemu-devel@nongnu.org, Salil Mehta, Marc Zyngier

On Tue, 14 Oct 2025 at 16:13, Salil Mehta <salil.mehta@huawei.com> wrote:
>
> > From: Peter Maydell <peter.maydell@linaro.org>
> > In what situation do we ever start running a VCPU before the *GIC* has
> > been realized? The GIC should get realized as part of creating the virt board,
> > which must complete before we do anything like running a vcpu.
>
>
> Just after realization of vCPU in the machvirt_init() you can see the default
> power_state is PSCI CPU_ON, which means KVM_MP_STATE_RUNNABLE.
> Since, the thread is up and not doing IO wait in userspace it gets into
> cpu_exec() loop and actually run KVM_RUN IOCTL. Inside the KVM it
> momentarily takes the vCPU mutex but later exit and releases. This keeps
> going on for all of the vCPU threads realized early.

Yikes. We definitely should fix that : letting the vcpu run
before we get to qemu_machine_creation_done() seems like it
would be a massive source of race conditions.

-- PMM


^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: [PATCH] hw/intc/arm_gicv3_kvm: Avoid reading ICC_CTLR_EL1 from kernel in cpuif reset
  2025-10-14 15:23                     ` Peter Maydell
@ 2025-10-14 15:32                       ` Salil Mehta via
  2025-10-14 15:43                         ` Peter Maydell
  2025-10-14 16:07                         ` Salil Mehta via
  2025-10-14 15:39                       ` Salil Mehta via
  1 sibling, 2 replies; 32+ messages in thread
From: Salil Mehta via @ 2025-10-14 15:32 UTC (permalink / raw)
  To: Peter Maydell; +Cc: qemu-devel@nongnu.org, Salil Mehta, Marc Zyngier

> From: Peter Maydell <peter.maydell@linaro.org>
> Sent: Tuesday, October 14, 2025 4:24 PM
> To: Salil Mehta <salil.mehta@huawei.com>
> 
> On Tue, 14 Oct 2025 at 16:13, Salil Mehta <salil.mehta@huawei.com> wrote:
> >
> > > From: Peter Maydell <peter.maydell@linaro.org> In what situation do
> > > we ever start running a VCPU before the *GIC* has been realized? The
> > > GIC should get realized as part of creating the virt board, which
> > > must complete before we do anything like running a vcpu.
> >
> >
> > Just after realization of vCPU in the machvirt_init() you can see the
> > default power_state is PSCI CPU_ON, which means
> KVM_MP_STATE_RUNNABLE.
> > Since, the thread is up and not doing IO wait in userspace it gets
> > into
> > cpu_exec() loop and actually run KVM_RUN IOCTL. Inside the KVM it
> > momentarily takes the vCPU mutex but later exit and releases. This
> > keeps going on for all of the vCPU threads realized early.
> 
> Yikes. We definitely should fix that : letting the vcpu run before we get to
> qemu_machine_creation_done() seems like it would be a massive source of
> race conditions.

I've already proposed fix for this by parking such threads in userspace. Please
check functions virt_(un)park_cpu_in_userspace(). But need to check if we can use
this trick can be used at the very early stages of the VM initialization.

https://lore.kernel.org/qemu-devel/20251001010127.3092631-18-salil.mehta@opnsrc.net/


I also think we need 1:1 mapping between the ARMCPU power-state and the
KVM MP_STATE.  Right now,

KVM_MP_STATE_RUNNABLE = {PSCI CPU_OFF, PSCI CPU_ON}
KVM_MP_STATE_STOPPED    = {PSCI CPU_OFF}

KVM MP State RUNNABLE seems ambiguous. Should we use PSCI CPU_PENDING at
early stages instead?


Thanks!

Best regards
Salil.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: [PATCH] hw/intc/arm_gicv3_kvm: Avoid reading ICC_CTLR_EL1 from kernel in cpuif reset
  2025-10-14 15:23                     ` Peter Maydell
  2025-10-14 15:32                       ` Salil Mehta via
@ 2025-10-14 15:39                       ` Salil Mehta via
  1 sibling, 0 replies; 32+ messages in thread
From: Salil Mehta via @ 2025-10-14 15:39 UTC (permalink / raw)
  To: Peter Maydell; +Cc: qemu-devel@nongnu.org, Salil Mehta, Marc Zyngier

> From: Peter Maydell <peter.maydell@linaro.org>
> Sent: Tuesday, October 14, 2025 4:24 PM
> To: Salil Mehta <salil.mehta@huawei.com>
> 
> On Tue, 14 Oct 2025 at 16:13, Salil Mehta <salil.mehta@huawei.com> wrote:
> >
> > > From: Peter Maydell <peter.maydell@linaro.org> In what situation do
> > > we ever start running a VCPU before the *GIC* has been realized? The
> > > GIC should get realized as part of creating the virt board, which
> > > must complete before we do anything like running a vcpu.
> >
> >
> > Just after realization of vCPU in the machvirt_init() you can see the
> > default power_state is PSCI CPU_ON, which means
> KVM_MP_STATE_RUNNABLE.
> > Since, the thread is up and not doing IO wait in userspace it gets
> > into
> > cpu_exec() loop and actually run KVM_RUN IOCTL. Inside the KVM it
> > momentarily takes the vCPU mutex but later exit and releases. This
> > keeps going on for all of the vCPU threads realized early.
> 
> Yikes. We definitely should fix that : letting the vcpu run before we get to
> qemu_machine_creation_done() seems like it would be a massive source of
> race conditions.


BTW, I did raise this issue in the last KVM Call with Alex Bennée


Best regards
Salil.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH] hw/intc/arm_gicv3_kvm: Avoid reading ICC_CTLR_EL1 from kernel in cpuif reset
  2025-10-14 15:32                       ` Salil Mehta via
@ 2025-10-14 15:43                         ` Peter Maydell
  2025-10-14 15:54                           ` Salil Mehta via
  2025-10-14 19:36                           ` Salil Mehta via
  2025-10-14 16:07                         ` Salil Mehta via
  1 sibling, 2 replies; 32+ messages in thread
From: Peter Maydell @ 2025-10-14 15:43 UTC (permalink / raw)
  To: Salil Mehta; +Cc: qemu-devel@nongnu.org, Salil Mehta, Marc Zyngier

On Tue, 14 Oct 2025 at 16:33, Salil Mehta <salil.mehta@huawei.com> wrote:
>
> > From: Peter Maydell <peter.maydell@linaro.org>
> > Sent: Tuesday, October 14, 2025 4:24 PM
> > To: Salil Mehta <salil.mehta@huawei.com>
> >
> > On Tue, 14 Oct 2025 at 16:13, Salil Mehta <salil.mehta@huawei.com> wrote:
> > >
> > > > From: Peter Maydell <peter.maydell@linaro.org> In what situation do
> > > > we ever start running a VCPU before the *GIC* has been realized? The
> > > > GIC should get realized as part of creating the virt board, which
> > > > must complete before we do anything like running a vcpu.
> > >
> > >
> > > Just after realization of vCPU in the machvirt_init() you can see the
> > > default power_state is PSCI CPU_ON, which means
> > KVM_MP_STATE_RUNNABLE.
> > > Since, the thread is up and not doing IO wait in userspace it gets
> > > into
> > > cpu_exec() loop and actually run KVM_RUN IOCTL. Inside the KVM it
> > > momentarily takes the vCPU mutex but later exit and releases. This
> > > keeps going on for all of the vCPU threads realized early.
> >
> > Yikes. We definitely should fix that : letting the vcpu run before we get to
> > qemu_machine_creation_done() seems like it would be a massive source of
> > race conditions.
>
> I've already proposed fix for this by parking such threads in userspace. Please
> check functions virt_(un)park_cpu_in_userspace(). But need to check if we can use
> this trick can be used at the very early stages of the VM initialization.

I had a look at this on x86, and we correctly don't try to
KVM_RUN the vcpus early. What happens there is:
 * the vcpu thread calls qemu_process_cpu_events()
 * this causes it to go to sleep on the cpu->halt_cond: so
   it will not end up doing KVM_RUN yet
 * later, the main thread completes initialization of the board
 * qdev_machine_creation_done() calls cpu_synchronize_all_post_init()
 * for kvm, this causes us to call kvm_cpu_synchronize_post_init()
   for each vcpu
 * that will call run_on_cpu() which ends up calling qemu_cpu_kick()
 * qemu_cpu_kick() does a broadcast on cpu->halt_cond, which
   wakes up the vcpu thread and lets it go into kvm_cpu_exec()
   for the first time

Why doesn't this mechanism work on Arm ?

thanks
-- PMM


^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: [PATCH] hw/intc/arm_gicv3_kvm: Avoid reading ICC_CTLR_EL1 from kernel in cpuif reset
  2025-10-14 15:43                         ` Peter Maydell
@ 2025-10-14 15:54                           ` Salil Mehta via
  2025-10-14 19:36                           ` Salil Mehta via
  1 sibling, 0 replies; 32+ messages in thread
From: Salil Mehta via @ 2025-10-14 15:54 UTC (permalink / raw)
  To: Peter Maydell; +Cc: qemu-devel@nongnu.org, Salil Mehta, Marc Zyngier

> From: qemu-devel-bounces+salil.mehta=huawei.com@nongnu.org <qemu-
> devel-bounces+salil.mehta=huawei.com@nongnu.org> On Behalf Of Peter
> Maydell
> Sent: Tuesday, October 14, 2025 4:44 PM
> To: Salil Mehta <salil.mehta@huawei.com>
> 
> On Tue, 14 Oct 2025 at 16:33, Salil Mehta <salil.mehta@huawei.com> wrote:
> >
> > > From: Peter Maydell <peter.maydell@linaro.org>
> > > Sent: Tuesday, October 14, 2025 4:24 PM
> > > To: Salil Mehta <salil.mehta@huawei.com>
> > >
> > > On Tue, 14 Oct 2025 at 16:13, Salil Mehta <salil.mehta@huawei.com>
> wrote:
> > > >
> > > > > From: Peter Maydell <peter.maydell@linaro.org> In what situation
> > > > > do we ever start running a VCPU before the *GIC* has been
> > > > > realized? The GIC should get realized as part of creating the
> > > > > virt board, which must complete before we do anything like running a
> vcpu.
> > > >
> > > >
> > > > Just after realization of vCPU in the machvirt_init() you can see
> > > > the default power_state is PSCI CPU_ON, which means
> > > KVM_MP_STATE_RUNNABLE.
> > > > Since, the thread is up and not doing IO wait in userspace it gets
> > > > into
> > > > cpu_exec() loop and actually run KVM_RUN IOCTL. Inside the KVM it
> > > > momentarily takes the vCPU mutex but later exit and releases. This
> > > > keeps going on for all of the vCPU threads realized early.
> > >
> > > Yikes. We definitely should fix that : letting the vcpu run before
> > > we get to
> > > qemu_machine_creation_done() seems like it would be a massive source
> > > of race conditions.
> >
> > I've already proposed fix for this by parking such threads in
> > userspace. Please check functions virt_(un)park_cpu_in_userspace().
> > But need to check if we can use this trick can be used at the very early
> stages of the VM initialization.
> 
> I had a look at this on x86, and we correctly don't try to KVM_RUN the vcpus
> early. What happens there is:
>  * the vcpu thread calls qemu_process_cpu_events()
>  * this causes it to go to sleep on the cpu->halt_cond: so
>    it will not end up doing KVM_RUN yet
>  * later, the main thread completes initialization of the board
>  * qdev_machine_creation_done() calls cpu_synchronize_all_post_init()
>  * for kvm, this causes us to call kvm_cpu_synchronize_post_init()
>    for each vcpu
>  * that will call run_on_cpu() which ends up calling qemu_cpu_kick()
>  * qemu_cpu_kick() does a broadcast on cpu->halt_cond, which
>    wakes up the vcpu thread and lets it go into kvm_cpu_exec()
>    for the first time
> 
> Why doesn't this mechanism work on Arm ?


I will need some time to revisit this.


Many thanks!

Best regards
Salil.


^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: [PATCH] hw/intc/arm_gicv3_kvm: Avoid reading ICC_CTLR_EL1 from kernel in cpuif reset
  2025-10-14 15:32                       ` Salil Mehta via
  2025-10-14 15:43                         ` Peter Maydell
@ 2025-10-14 16:07                         ` Salil Mehta via
  2025-10-14 16:12                           ` Peter Maydell
  1 sibling, 1 reply; 32+ messages in thread
From: Salil Mehta via @ 2025-10-14 16:07 UTC (permalink / raw)
  To: Peter Maydell; +Cc: qemu-devel@nongnu.org, Salil Mehta, Marc Zyngier

> From: Salil Mehta
> Sent: Tuesday, October 14, 2025 4:33 PM
> To: 'Peter Maydell' <peter.maydell@linaro.org>
> 
> > From: Peter Maydell <peter.maydell@linaro.org>
> > Sent: Tuesday, October 14, 2025 4:24 PM
> > To: Salil Mehta <salil.mehta@huawei.com>
> >
> > On Tue, 14 Oct 2025 at 16:13, Salil Mehta <salil.mehta@huawei.com>
> wrote:
> > >
> > > > From: Peter Maydell <peter.maydell@linaro.org> In what situation
> > > > do we ever start running a VCPU before the *GIC* has been
> > > > realized? The GIC should get realized as part of creating the virt
> > > > board, which must complete before we do anything like running a vcpu.
> > >
> > >
> > > Just after realization of vCPU in the machvirt_init() you can see
> > > the default power_state is PSCI CPU_ON, which means
> > KVM_MP_STATE_RUNNABLE.
> > > Since, the thread is up and not doing IO wait in userspace it gets
> > > into
> > > cpu_exec() loop and actually run KVM_RUN IOCTL. Inside the KVM it
> > > momentarily takes the vCPU mutex but later exit and releases. This
> > > keeps going on for all of the vCPU threads realized early.
> >
> > Yikes. We definitely should fix that : letting the vcpu run before we
> > get to
> > qemu_machine_creation_done() seems like it would be a massive source
> > of race conditions.
> 
> I've already proposed fix for this by parking such threads in userspace. Please
> check functions virt_(un)park_cpu_in_userspace(). But need to check if we
> can use this trick can be used at the very early stages of the VM initialization.
> 
> https://lore.kernel.org/qemu-devel/20251001010127.3092631-18-
> salil.mehta@opnsrc.net/
> 
> 
> I also think we need 1:1 mapping between the ARMCPU power-state and the
> KVM MP_STATE.  Right now,
> 
> KVM_MP_STATE_RUNNABLE = {PSCI CPU_OFF, PSCI CPU_ON}
> KVM_MP_STATE_STOPPED    = {PSCI CPU_OFF}
> 
> KVM MP State RUNNABLE seems ambiguous. Should we use PSCI
> CPU_PENDING at early stages instead?

There is one more issue.

/*
 * Update KVM's MP_STATE based on what QEMU thinks it is
 */
static int kvm_arm_sync_mpstate_to_kvm(ARMCPU *cpu)
{
    if (cap_has_mp_state) {
        struct kvm_mp_state mp_state = {
            .mp_state = (cpu->power_state == PSCI_OFF) ?
            KVM_MP_STATE_STOPPED : KVM_MP_STATE_RUNNABLE
        };
        return kvm_vcpu_ioctl(CPU(cpu), KVM_SET_MP_STATE, &mp_state);
    }
    return 0;
}


value of PSCI_OFF = 1 and we do not initialize the default state of power_state.
This means KVM_MP_STATE state will be configured as RUNNABLE at the first
instance till the cpu_reset() happens. This is not correct either.

But then what should be the vCPUs default state, KVM_MP_STATE _STOPPED?
Stopped is implemented as sleep inside the KVM - a blocking call.  An invitation
to the vCPU lock contention? or should be sleep in userspace?

static void kvm_vcpu_sleep(struct kvm_vcpu *vcpu)
{
	struct rcuwait *wait = kvm_arch_vcpu_get_wait(vcpu);

	rcuwait_wait_event(wait,
			   (!kvm_arm_vcpu_stopped(vcpu)) && (!vcpu->arch.pause),
			   TASK_INTERRUPTIBLE);

	if (kvm_arm_vcpu_stopped(vcpu) || vcpu->arch.pause) {
		/* Awaken to handle a signal, request we sleep again later. */
		kvm_make_request(KVM_REQ_SLEEP, vcpu);
	}

	/*
	 * Make sure we will observe a potential reset request if we've
	 * observed a change to the power state. Pairs with the smp_wmb() in
	 * kvm_psci_vcpu_on().
	 */
	smp_rmb();
}



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH] hw/intc/arm_gicv3_kvm: Avoid reading ICC_CTLR_EL1 from kernel in cpuif reset
  2025-10-14 16:07                         ` Salil Mehta via
@ 2025-10-14 16:12                           ` Peter Maydell
  0 siblings, 0 replies; 32+ messages in thread
From: Peter Maydell @ 2025-10-14 16:12 UTC (permalink / raw)
  To: Salil Mehta; +Cc: qemu-devel@nongnu.org, Salil Mehta, Marc Zyngier

On Tue, 14 Oct 2025 at 17:07, Salil Mehta <salil.mehta@huawei.com> wrote:
> There is one more issue.
>
> /*
>  * Update KVM's MP_STATE based on what QEMU thinks it is
>  */
> static int kvm_arm_sync_mpstate_to_kvm(ARMCPU *cpu)
> {
>     if (cap_has_mp_state) {
>         struct kvm_mp_state mp_state = {
>             .mp_state = (cpu->power_state == PSCI_OFF) ?
>             KVM_MP_STATE_STOPPED : KVM_MP_STATE_RUNNABLE
>         };
>         return kvm_vcpu_ioctl(CPU(cpu), KVM_SET_MP_STATE, &mp_state);
>     }
>     return 0;
> }
>
>
> value of PSCI_OFF = 1 and we do not initialize the default state of power_state.
> This means KVM_MP_STATE state will be configured as RUNNABLE at the first
> instance till the cpu_reset() happens. This is not correct either.

The answer is that nothing should care about the KVM_MP_STATE
until after the CPU has reset for the first time (i.e. we
should never try to KVM_RUN a CPU object we never reset).

-- PMM


^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: [PATCH] hw/intc/arm_gicv3_kvm: Avoid reading ICC_CTLR_EL1 from kernel in cpuif reset
  2025-10-14 15:43                         ` Peter Maydell
  2025-10-14 15:54                           ` Salil Mehta via
@ 2025-10-14 19:36                           ` Salil Mehta via
  2025-10-17  1:43                             ` Salil Mehta
  1 sibling, 1 reply; 32+ messages in thread
From: Salil Mehta via @ 2025-10-14 19:36 UTC (permalink / raw)
  To: Peter Maydell; +Cc: qemu-devel@nongnu.org, Salil Mehta, Marc Zyngier

Hi Peter,

> From: qemu-devel-bounces+salil.mehta=huawei.com@nongnu.org <qemu-
> devel-bounces+salil.mehta=huawei.com@nongnu.org> On Behalf Of Peter
> Maydell
> Sent: Tuesday, October 14, 2025 4:44 PM
> To: Salil Mehta <salil.mehta@huawei.com>
> 
> On Tue, 14 Oct 2025 at 16:33, Salil Mehta <salil.mehta@huawei.com> wrote:
> >
> > > From: Peter Maydell <peter.maydell@linaro.org>
> > > Sent: Tuesday, October 14, 2025 4:24 PM
> > > To: Salil Mehta <salil.mehta@huawei.com>
> > >
> > > On Tue, 14 Oct 2025 at 16:13, Salil Mehta <salil.mehta@huawei.com>
> wrote:
> > > >
> > > > > From: Peter Maydell <peter.maydell@linaro.org> In what situation
> > > > > do we ever start running a VCPU before the *GIC* has been
> > > > > realized? The GIC should get realized as part of creating the
> > > > > virt board, which must complete before we do anything like running a
> vcpu.
> > > >
> > > >
> > > > Just after realization of vCPU in the machvirt_init() you can see
> > > > the default power_state is PSCI CPU_ON, which means
> > > KVM_MP_STATE_RUNNABLE.
> > > > Since, the thread is up and not doing IO wait in userspace it gets
> > > > into
> > > > cpu_exec() loop and actually run KVM_RUN IOCTL. Inside the KVM it
> > > > momentarily takes the vCPU mutex but later exit and releases. This
> > > > keeps going on for all of the vCPU threads realized early.
> > >
> > > Yikes. We definitely should fix that : letting the vcpu run before
> > > we get to
> > > qemu_machine_creation_done() seems like it would be a massive source
> > > of race conditions.
> >
> > I've already proposed fix for this by parking such threads in
> > userspace. Please check functions virt_(un)park_cpu_in_userspace().
> > But need to check if we can use this trick can be used at the very early
> stages of the VM initialization.
> 
> I had a look at this on x86, and we correctly don't try to KVM_RUN the vcpus
> early. What happens there is:
>  * the vcpu thread calls qemu_process_cpu_events()


I cannot find this function in the Qemu mainline of 28th September ?


>  * this causes it to go to sleep on the cpu->halt_cond: so
>    it will not end up doing KVM_RUN yet
>  * later, the main thread completes initialization of the board
>  * qdev_machine_creation_done() calls cpu_synchronize_all_post_init()
>  * for kvm, this causes us to call kvm_cpu_synchronize_post_init()
>    for each vcpu
>  * that will call run_on_cpu() which ends up calling qemu_cpu_kick()
>  * qemu_cpu_kick() does a broadcast on cpu->halt_cond, which
>    wakes up the vcpu thread and lets it go into kvm_cpu_exec()
>    for the first time
> 
> Why doesn't this mechanism work on Arm ?

It is a combination of things:

void qemu_wait_io_event(CPUState *cpu)
{
[...]
    while (cpu_thread_is_idle(cpu)) {
         [...]
        qemu_cond_wait(cpu->halt_cond, &bql);
    }
[...]
}

1. To block we should wait on 'halt_cond' as you rightly pointed.
2. but condition to wait is to check of the CPU is IDLE or not.
3. Various conditions in which CPU can be termed IDLE are:
     3.1  STOPPED
     3.2  HALTED
     3.3 It does not have any queued  work to process.


Because CPU never ran we can rule out 3.1 & 3.2. If CPU is in
'halted' condition i.e.  'CPUState::halted=1' then we can also assume
thread to be in IDLE state. On ARM, default value of this variable is 0.
This get only set in context to CPU common reset when the first time
all CPUs are reset in context to the system reset.

static void cpu_common_reset_hold(Object *obj, ResetType type)
{
[...]
    cpu->halted = cpu->start_powered_off;
     [...];
}


Maybe we can fix this default value?


Best regards
Salil.



^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: [PATCH] hw/intc/arm_gicv3_kvm: Avoid reading ICC_CTLR_EL1 from kernel in cpuif reset
  2025-10-14 10:24 [PATCH] hw/intc/arm_gicv3_kvm: Avoid reading ICC_CTLR_EL1 from kernel in cpuif reset Peter Maydell
  2025-10-14 10:41 ` Salil Mehta via
@ 2025-10-15 10:58 ` Salil Mehta via
  2025-10-15 12:06   ` Peter Maydell
  2025-10-16 12:17 ` Salil Mehta via
  2 siblings, 1 reply; 32+ messages in thread
From: Salil Mehta via @ 2025-10-15 10:58 UTC (permalink / raw)
  To: Peter Maydell, qemu-devel@nongnu.org; +Cc: Salil Mehta, Marc Zyngier

Hi Peter,

> From: qemu-devel-bounces+salil.mehta=huawei.com@nongnu.org <qemu-
> devel-bounces+salil.mehta=huawei.com@nongnu.org> On Behalf Of Peter
> Maydell
> Sent: Tuesday, October 14, 2025 11:25 AM
> To: qemu-devel@nongnu.org
> Cc: Salil Mehta <salil.mehta@opnsrc.net>; Marc Zyngier <maz@kernel.org>
> Subject: [PATCH] hw/intc/arm_gicv3_kvm: Avoid reading ICC_CTLR_EL1 from
> kernel in cpuif reset
> 
> Currently in arm_gicv3_icc_reset() we read the kernel's value of
> ICC_CTLR_EL1 as part of resetting the CPU interface.  This mostly works, but
> we're actually breaking an assumption the kernel makes that userspace only
> accesses the in-kernel GIC data when the VM is totally paused, which may
> not be the case if a single vCPU is being reset.  The effect is that it's possible
> that the read attempt returns EBUSY.
> 
> Avoid this by reading the kernel's value of the reset ICC_CTLR_EL1 once in
> device realize. This brings ICC_CTLR_EL1 into line with the other cpuif
> registers, where we assume we know what the kernel is resetting them to
> and just update QEMU's data structures in arm_gicv3_icc_reset().
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
> I've only tested this fairly lightly, but it seems to work.
> Salil, does this fix the EBUSY issues you were seeing ?
> 
>  include/hw/intc/arm_gicv3_common.h |  3 ++
>  hw/intc/arm_gicv3_kvm.c            | 49 +++++++++++++++++++++---------
>  2 files changed, 38 insertions(+), 14 deletions(-)
> 
[...]
> +
> +    /*
> +     * Now we can read the kernel's initial value of ICC_CTLR_EL1, which
> +     * we will need if a CPU interface is reset. If the kernel is ancient
> +     * and doesn't support writing the GIC state then we don't need to
> +     * care what reset does to QEMU's data structures.
> +     */
> +    if (!s->migration_blocker) {
> +        for (i = 0; i < s->num_cpu; i++) {
> +            GICv3CPUState *c = &s->cpu[i];
> +
> +            kvm_device_access(s->dev_fd,
> KVM_DEV_ARM_VGIC_GRP_CPU_SYSREGS,
> +                              KVM_VGIC_ATTR(ICC_CTLR_EL1, c->gicr_typer),
> +                              &c->kvm_reset_icc_ctlr_el1, false, &error_abort);
> +        }
> +    }
>  }

Above changes assume that the driver's configured value of the ICC_CTLR_EL1
system register is the same as the default value. I've verified that this
currently the case. However, it safe to assume that this will remain true
in the future as well?

If it is acceptable to always use the default values, then the existing change
proposed in RFC V6 only requires a minor modification. Specifically, we can
remove ' first_psci_on_request_seen' and other related code that updates
the driver-configured value after the guest kernel has loaded during runtime.

We can instead cache the value once during the first cpu_reset() for each
CPU interface and reuse that cached value for all subsequent cpu_reset()
operations.  This would simplify the implementation while retaining
major goals.

https://lore.kernel.org/qemu-devel/20251001010127.3092631-22-salil.mehta@opnsrc.net/

You humble Consideration:

Calling pause_vcpus_all() at VM initialization time--during  the first
cpu_reset()--should not cause any issues, as all secondary vCPUs are idle
(i.e. their PCs are not initialized and kernel has not yet loaded). Pausing
such vCPUs at this stage should not be a disruptive action.

Moreover, if this stage poses a problem for a KVM device IOCTL access, it
similarly affect other components like GICv3 ITS, which also uses KVM
device IOCTL during GICv3 reset hold. At this stage all the vCPUs should still
be in a pre-boot or idle state

static void kvm_arm_its_reset_hold(Object *obj, ResetType type)
{
[...]

    if (kvm_device_check_attr(s->dev_fd, KVM_DEV_ARM_VGIC_GRP_CTRL,
                               KVM_DEV_ARM_ITS_CTRL_RESET)) {
        kvm_device_access(s->dev_fd, KVM_DEV_ARM_VGIC_GRP_CTRL,
                          KVM_DEV_ARM_ITS_CTRL_RESET, NULL, true, &error_abort);
        return;
    }
[...]

}

We are not even pausing vCPUs here, yet we expect above KVM device IOCTL
to succeed on the first attempt. The operation also ends up acquiring the
mutex for all vCPUs within the KVM. 

Many thanks!

Best regards
Salil.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH] hw/intc/arm_gicv3_kvm: Avoid reading ICC_CTLR_EL1 from kernel in cpuif reset
  2025-10-15 10:58 ` Salil Mehta via
@ 2025-10-15 12:06   ` Peter Maydell
  2025-10-16 11:13     ` Salil Mehta via
  0 siblings, 1 reply; 32+ messages in thread
From: Peter Maydell @ 2025-10-15 12:06 UTC (permalink / raw)
  To: Salil Mehta; +Cc: qemu-devel@nongnu.org, Salil Mehta, Marc Zyngier

On Wed, 15 Oct 2025 at 11:58, Salil Mehta <salil.mehta@huawei.com> wrote:
>
> Hi Peter,
>
> > From: qemu-devel-bounces+salil.mehta=huawei.com@nongnu.org <qemu-
> > devel-bounces+salil.mehta=huawei.com@nongnu.org> On Behalf Of Peter
> > Maydell
> > Sent: Tuesday, October 14, 2025 11:25 AM
> > To: qemu-devel@nongnu.org
> > Cc: Salil Mehta <salil.mehta@opnsrc.net>; Marc Zyngier <maz@kernel.org>
> > Subject: [PATCH] hw/intc/arm_gicv3_kvm: Avoid reading ICC_CTLR_EL1 from
> > kernel in cpuif reset
> >
> > Currently in arm_gicv3_icc_reset() we read the kernel's value of
> > ICC_CTLR_EL1 as part of resetting the CPU interface.  This mostly works, but
> > we're actually breaking an assumption the kernel makes that userspace only
> > accesses the in-kernel GIC data when the VM is totally paused, which may
> > not be the case if a single vCPU is being reset.  The effect is that it's possible
> > that the read attempt returns EBUSY.
> >
> > Avoid this by reading the kernel's value of the reset ICC_CTLR_EL1 once in
> > device realize. This brings ICC_CTLR_EL1 into line with the other cpuif
> > registers, where we assume we know what the kernel is resetting them to
> > and just update QEMU's data structures in arm_gicv3_icc_reset().
> >
> > Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> > ---
> > I've only tested this fairly lightly, but it seems to work.
> > Salil, does this fix the EBUSY issues you were seeing ?
> >
> >  include/hw/intc/arm_gicv3_common.h |  3 ++
> >  hw/intc/arm_gicv3_kvm.c            | 49 +++++++++++++++++++++---------
> >  2 files changed, 38 insertions(+), 14 deletions(-)
> >
> [...]
> > +
> > +    /*
> > +     * Now we can read the kernel's initial value of ICC_CTLR_EL1, which
> > +     * we will need if a CPU interface is reset. If the kernel is ancient
> > +     * and doesn't support writing the GIC state then we don't need to
> > +     * care what reset does to QEMU's data structures.
> > +     */
> > +    if (!s->migration_blocker) {
> > +        for (i = 0; i < s->num_cpu; i++) {
> > +            GICv3CPUState *c = &s->cpu[i];
> > +
> > +            kvm_device_access(s->dev_fd,
> > KVM_DEV_ARM_VGIC_GRP_CPU_SYSREGS,
> > +                              KVM_VGIC_ATTR(ICC_CTLR_EL1, c->gicr_typer),
> > +                              &c->kvm_reset_icc_ctlr_el1, false, &error_abort);
> > +        }
> > +    }
> >  }
>
>
> Above changes assume that the driver's configured value of the ICC_CTLR_EL1
> system register is the same as the default value. I've verified that this
> currently the case. However, it safe to assume that this will remain true
> in the future as well?

I don't understand what you mean here. We read the kernel's ICC_CTLR_EL1
at VM startup, when we know it will be the reset value, because we haven't
run any VCPUs yet.

> We can instead cache the value once during the first cpu_reset() for each
> CPU interface and reuse that cached value for all subsequent cpu_reset()
> operations.  This would simplify the implementation while retaining
> major goals.

No, we should read the value at realize time, as this patch does.
Trying to do it at cpu_reset runs into the same problem that we
have now that the kernel doesn't expect us to be reading the
register when another vcpu might be running.

> Calling pause_vcpus_all() at VM initialization time--during  the first
> cpu_reset()--should not cause any issues, as all secondary vCPUs are idle
> (i.e. their PCs are not initialized and kernel has not yet loaded). Pausing
> such vCPUs at this stage should not be a disruptive action.

We should not need to run pause_vcpus_all() at all here. CPU reset
should be fine done for one vcpu with the others running.

> Moreover, if this stage poses a problem for a KVM device IOCTL access, it
> similarly affect other components like GICv3 ITS, which also uses KVM
> device IOCTL during GICv3 reset hold. At this stage all the vCPUs should still
> be in a pre-boot or idle state

> static void kvm_arm_its_reset_hold(Object *obj, ResetType type)
> {
> [...]
>
>     if (kvm_device_check_attr(s->dev_fd, KVM_DEV_ARM_VGIC_GRP_CTRL,
>                                KVM_DEV_ARM_ITS_CTRL_RESET)) {
>         kvm_device_access(s->dev_fd, KVM_DEV_ARM_VGIC_GRP_CTRL,
>                           KVM_DEV_ARM_ITS_CTRL_RESET, NULL, true, &error_abort);
>         return;
>     }
> [...]
>
> }
>
> We are not even pausing vCPUs here, yet we expect above KVM device IOCTL
> to succeed on the first attempt. The operation also ends up acquiring the
> mutex for all vCPUs within the KVM.

ITS reset should only happen (a) at start of the run, before any
vcpus run or (b) as part of a full system reset. We should check that
the vcpus are paused when we're doing system reset, but I would expect
that to be the case because it would be likely to be buggy if not.

-- PMM


^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: [PATCH] hw/intc/arm_gicv3_kvm: Avoid reading ICC_CTLR_EL1 from kernel in cpuif reset
  2025-10-15 12:06   ` Peter Maydell
@ 2025-10-16 11:13     ` Salil Mehta via
  2025-10-16 12:46       ` Peter Maydell
  0 siblings, 1 reply; 32+ messages in thread
From: Salil Mehta via @ 2025-10-16 11:13 UTC (permalink / raw)
  To: Peter Maydell; +Cc: qemu-devel@nongnu.org, Salil Mehta, Marc Zyngier

Hi Peter,

> From: Peter Maydell <peter.maydell@linaro.org>
> Sent: Wednesday, October 15, 2025 1:06 PM
> To: Salil Mehta <salil.mehta@huawei.com>
> On Wed, 15 Oct 2025 at 11:58, Salil Mehta <salil.mehta@huawei.com> wrote:
> >
> > Hi Peter,
> >
> > > From: qemu-devel-bounces+salil.mehta=huawei.com@nongnu.org
> <qemu-
> > > devel-bounces+salil.mehta=huawei.com@nongnu.org> On Behalf Of
> Peter
> > > Maydell
> > > Sent: Tuesday, October 14, 2025 11:25 AM
> > > To: qemu-devel@nongnu.org
> > > Cc: Salil Mehta <salil.mehta@opnsrc.net>; Marc Zyngier
> > > <maz@kernel.org>
> > > Subject: [PATCH] hw/intc/arm_gicv3_kvm: Avoid reading ICC_CTLR_EL1
> > > from kernel in cpuif reset
> > >
> > > Currently in arm_gicv3_icc_reset() we read the kernel's value of
> > > ICC_CTLR_EL1 as part of resetting the CPU interface.  This mostly
> > > works, but we're actually breaking an assumption the kernel makes
> > > that userspace only accesses the in-kernel GIC data when the VM is
> > > totally paused, which may not be the case if a single vCPU is being
> > > reset.  The effect is that it's possible that the read attempt returns EBUSY.
> > >
> > > Avoid this by reading the kernel's value of the reset ICC_CTLR_EL1
> > > once in device realize. This brings ICC_CTLR_EL1 into line with the
> > > other cpuif registers, where we assume we know what the kernel is
> > > resetting them to and just update QEMU's data structures in
> arm_gicv3_icc_reset().
> > >
> > > Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> > > ---
> > > I've only tested this fairly lightly, but it seems to work.
> > > Salil, does this fix the EBUSY issues you were seeing ?
> > >
> > >  include/hw/intc/arm_gicv3_common.h |  3 ++
> > >  hw/intc/arm_gicv3_kvm.c            | 49 +++++++++++++++++++++---------
> > >  2 files changed, 38 insertions(+), 14 deletions(-)
> > >
> > [...]
> > > +
> > > +    /*
> > > +     * Now we can read the kernel's initial value of ICC_CTLR_EL1, which
> > > +     * we will need if a CPU interface is reset. If the kernel is ancient
> > > +     * and doesn't support writing the GIC state then we don't need to
> > > +     * care what reset does to QEMU's data structures.
> > > +     */
> > > +    if (!s->migration_blocker) {
> > > +        for (i = 0; i < s->num_cpu; i++) {
> > > +            GICv3CPUState *c = &s->cpu[i];
> > > +
> > > +            kvm_device_access(s->dev_fd,
> > > KVM_DEV_ARM_VGIC_GRP_CPU_SYSREGS,
> > > +                              KVM_VGIC_ATTR(ICC_CTLR_EL1, c->gicr_typer),
> > > +                              &c->kvm_reset_icc_ctlr_el1, false, &error_abort);
> > > +        }
> > > +    }
> > >  }
> >
> >
> > Above changes assume that the driver's configured value of the
> > ICC_CTLR_EL1 system register is the same as the default value. I've
> > verified that this currently the case. However, it safe to assume that
> > this will remain true in the future as well?
> 
> I don't understand what you mean here. We read the kernel's ICC_CTLR_EL1
> at VM startup, when we know it will be the reset value, because we haven't
> run any VCPUs yet.


System register fetches its value from ICH_VMCR_EL2 and ICH_VTR_EL2.
In specific, EOIMode, PMHE and CBPR fields of ICC_CTLR_EL1 are from
the VMCR register. Later gets configured when driver gets loaded and again
re-configured in context to each CPU ON request(via in-kernel  CPU Hotplug
state machine; CPUHP_AP_IRQ_GIC_STARTING). This configures the VMCR
again and again. Although, the value as of now is same. 

You might want to check gic_cpu_sys_reg_init() in irq-gic-v3.c

My question was, is it *future* safe to ignore these updates to VMCR fields
which can happen in context to guest kernel actions at point of time.

https://lore.kernel.org/all/20251001010127.3092631-22-salil.mehta@opnsrc.net/

The extra complexity which you saw in above proposed patch was to handle
that although it looked unnecessary as the value was always static.

The kernel patch proposed ensured that cache is always up to date (just in case)

https://lore.kernel.org/lkml/20251008201955.3919537-1-salil.mehta@opnsrc.net/


Internally, before posting RFC V6, I had discussed the implementation you've
proposed in this patch with Marc and Jonathan earlier but because of the issues
of hang I faced and above mentioned unaddressed issue of the driver update
I thought to push the version present in the RFC V6 to start with for your humble
consideration and let you decide 😊


I've tested this patch and I can confirm it is working along with RFC V6

https://github.com/salil-mehta/qemu/commits/virt-cpuhp-armv8/rfc-v6.1/



I've also figured out the reason(s) why it did not work earlier i.e. below problems
and fixed them as well:
1. hang problem which I saw earlier
2. crash or no ENOTTY which I recently posted

I'll share the details of above problems in the context where the questions
were asked.


Many thanks!


> > We can instead cache the value once during the first cpu_reset() for
> > each CPU interface and reuse that cached value for all subsequent
> > cpu_reset() operations.  This would simplify the implementation while
> > retaining major goals.
> 
> No, we should read the value at realize time, as this patch does.
> Trying to do it at cpu_reset runs into the same problem that we have now
> that the kernel doesn't expect us to be reading the register when another
> vcpu might be running.


Agreed. we can. Problem was a miss in RFC V6. I fixed that.

Yes, the vCPU lock contention is an issue. Its better to avoid it as far as possible. 


> 
> > Calling pause_vcpus_all() at VM initialization time--during  the first
> > cpu_reset()--should not cause any issues, as all secondary vCPUs are
> > idle (i.e. their PCs are not initialized and kernel has not yet
> > loaded). Pausing such vCPUs at this stage should not be a disruptive action.
> 
> We should not need to run pause_vcpus_all() at all here. CPU reset should
> be fine done for one vcpu with the others running.


Yes, with this implementation working now, we can avoid it, so no issues.


> 
> > Moreover, if this stage poses a problem for a KVM device IOCTL access,
> > it similarly affect other components like GICv3 ITS, which also uses
> > KVM device IOCTL during GICv3 reset hold. At this stage all the vCPUs
> > should still be in a pre-boot or idle state
> 
> > static void kvm_arm_its_reset_hold(Object *obj, ResetType type) {
> > [...]
> >
> >     if (kvm_device_check_attr(s->dev_fd,
> KVM_DEV_ARM_VGIC_GRP_CTRL,
> >                                KVM_DEV_ARM_ITS_CTRL_RESET)) {
> >         kvm_device_access(s->dev_fd, KVM_DEV_ARM_VGIC_GRP_CTRL,
> >                           KVM_DEV_ARM_ITS_CTRL_RESET, NULL, true, &error_abort);
> >         return;
> >     }
> > [...]
> >
> > }
> >
> > We are not even pausing vCPUs here, yet we expect above KVM device
> > IOCTL to succeed on the first attempt. The operation also ends up
> > acquiring the mutex for all vCPUs within the KVM.
> 
> ITS reset should only happen (a) at start of the run, before any vcpus run or
> (b) as part of a full system reset. We should check that the vcpus are paused
> when we're doing system reset, but I would expect that to be the case
> because it would be likely to be buggy if not.


Agreed. I took bit of time to review what we stumbled upon in August and
September. The issue actually predated the "parking in user space for disabled
vCPU threads". It was first reported during testing, where enabling and disabling
a vCPU - followed by a reboot - led to a crash in GICv3 ITS because of the
lock contention due to the disabled vCPU thread sleeping inside the KVM.

As you rightly pointed, we might not see the same issue in (a) & (b).

Thanks for the clarification.

Best regards
Salil.



^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: [PATCH] hw/intc/arm_gicv3_kvm: Avoid reading ICC_CTLR_EL1 from kernel in cpuif reset
  2025-10-14 13:31     ` Peter Maydell
  2025-10-14 13:41       ` Salil Mehta via
@ 2025-10-16 12:09       ` Salil Mehta via
  1 sibling, 0 replies; 32+ messages in thread
From: Salil Mehta via @ 2025-10-16 12:09 UTC (permalink / raw)
  To: Peter Maydell; +Cc: qemu-devel@nongnu.org, Salil Mehta, Marc Zyngier

> From: Peter Maydell <peter.maydell@linaro.org>
> Sent: Tuesday, October 14, 2025 2:31 PM
> To: Salil Mehta <salil.mehta@huawei.com>
> 
> On Tue, 14 Oct 2025 at 14:23, Salil Mehta <salil.mehta@huawei.com> wrote:
> >
> > Hi Peter,
> >
> > > From: qemu-devel-bounces+salil.mehta=huawei.com@nongnu.org
> <qemu-
> > > devel-bounces+salil.mehta=huawei.com@nongnu.org> On Behalf Of Salil
> > > Mehta via
> > > Sent: Tuesday, October 14, 2025 11:41 AM
> > > To: Peter Maydell <peter.maydell@linaro.org>; qemu-
> devel@nongnu.org
> > >
> > > Hi Peter,
> > >
> > > > From: qemu-devel-bounces+salil.mehta=huawei.com@nongnu.org
> > > <qemu-
> > > > devel-bounces+salil.mehta=huawei.com@nongnu.org> On Behalf Of
> > > > devel-bounces+Peter
> > > > Maydell
> > > > Sent: Tuesday, October 14, 2025 11:25 AM
> > > > To: qemu-devel@nongnu.org
> > > >
> > > > Currently in arm_gicv3_icc_reset() we read the kernel's value of
> > > > ICC_CTLR_EL1 as part of resetting the CPU interface.  This mostly
> > > > works, but we're actually breaking an assumption the kernel makes
> > > > that userspace only accesses the in-kernel GIC data when the VM is
> > > > totally paused, which may not be the case if a single vCPU is being
> reset.
> > > > The effect is that it's possible that the read attempt returns EBUSY.
> > > >
> > > > Avoid this by reading the kernel's value of the reset ICC_CTLR_EL1
> > > > once in device realize. This brings ICC_CTLR_EL1 into line with
> > > > the other cpuif registers, where we assume we know what the kernel
> > > > is resetting them to and just update QEMU's data structures in
> > > arm_gicv3_icc_reset().
> > > >
> > > > Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> > > > ---
> > > > I've only tested this fairly lightly, but it seems to work.
> > > > Salil, does this fix the EBUSY issues you were seeing ?
> > >
> > >
> > > Let me try this and get back to you.  Also, just to let you know
> > > that -EBUSY can return from other places as well. Please check  my
> > > reply in the other mail- chain.
> >
> >
> > Got this.
> >
> > (gdb) handle SIGUSR1 nostop noprint pass
> > Signal        Stop      Print   Pass to program Description
> > SIGUSR1       No        No      Yes             User defined signal 1
> > (gdb) run
> > Starting program:
> > /opt/workspace/code/qemu/qemu/build/qemu-system-aarch64 --enable-
> kvm
> > -machine virt,gic-version=3 -cpu host -smp cpus=2,disabledcpus=2 -m
> > 300M -kernel /opt/workspace/code/linux/linux/arch/arm64/boot/Image
> > -initrd /opt/workspace/code/filesystem/rootfs.cpio.gz -append
> > console=ttyAMA0\ root=/dev/ram\ earlycon\ rdinit=/init\ maxcpus=1\
> > acpi=force -nographic -bios
> > /opt/workspace/code/uefi/edk2/Build/ArmVirtQemu-
> AARCH64/RELEASE_GCC5/F
> > V/QEMU_EFI.fd [Thread debugging using libthread_db enabled] Using host
> > libthread_db library "/lib/aarch64-linux-gnu/libthread_db.so.1".
> > [New Thread 0xfffff5b5eb40 (LWP 31994)] [New Thread 0xfffff4e88b40
> > (LWP 31996)] [New Thread 0xffffd4dfeb40 (LWP 31997)] Unexpected error
> > in kvm_device_access() at ../accel/kvm/kvm-all.c:3475:
> > qemu-system-aarch64: KVM_GET_DEVICE_ATTR failed: Group 6 attr
> > 0x000000000000c664: Inappropriate ioctl for device
> 
> Does it do this consistently, or only sometimes? What host kernel version are
> you running? And what QEMU commit (plus this patch)?
> 
> I'm guessing from that "disabledcpus=2" part that you're running some not-
> yet-upstream set of QEMU patches. Please drop those, and test only with
> this, to rule out the possibility of some bug/unexpected interaction with
> those.


Thanks for this suggestion. We can ignore the crash it happened because
of the wrong conflict resolution.

I also reviewed the problem of the hang which I briefly touched upon earlier.
After revisiting the previous debugging branches and fixes I realized:
1. Hang happened while executing CPU_ON from guest kernel and not while
    accessing the system register during GIC realization phase.
2. It only happened if the {pause,resume}_all_vcpus() was not being used
3. Point 2. happened when I actually implemented the Marc's suggestion to
    cache during the GIC realize. 
4. But the hang only happened when the recently enabled vCPUs were brought
     online for the first time.


But why it happened?
1. when a CPU is realized its CPUState::stopped=true, set by qemu_init_vcpu()
2. This state is reset for all the enabled vCPUs in context to vm_start() when
     resume_all_vcpus() is called.
3. For the disabled vCPUs, we must do the same when they are enabled.
4. Calling cpu_resume() when 'disabled' vCPUs are administratively enabled
     resets the CPUState::stopped to false and kicks the vCPUs.
5. Hang was due to the CPU thread not getting kicked from the IO wait state.
6. This problem went unnoticed earlier because before calling KVM device
    IOCTL we first paused all the vCPUs and then resumed the vCPUs in
    context to the cpu_reset() by calling {pause,resume}_vcpu_all(). This resume
   action resets the CPUState::stopped to false and kicks the thread as well.


I've tested above and it works. Above hang previously observed is fixed.

https://github.com/salil-mehta/qemu/commits/virt-cpuhp-armv8/rfc-v6.1/


Many thanks!

Best regards
Salil.











^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: [PATCH] hw/intc/arm_gicv3_kvm: Avoid reading ICC_CTLR_EL1 from kernel in cpuif reset
  2025-10-14 10:24 [PATCH] hw/intc/arm_gicv3_kvm: Avoid reading ICC_CTLR_EL1 from kernel in cpuif reset Peter Maydell
  2025-10-14 10:41 ` Salil Mehta via
  2025-10-15 10:58 ` Salil Mehta via
@ 2025-10-16 12:17 ` Salil Mehta via
  2025-10-16 12:22   ` Peter Maydell
  2 siblings, 1 reply; 32+ messages in thread
From: Salil Mehta via @ 2025-10-16 12:17 UTC (permalink / raw)
  To: Peter Maydell, qemu-devel@nongnu.org; +Cc: Salil Mehta, Marc Zyngier

Hi Peter,

> From: qemu-devel-bounces+salil.mehta=huawei.com@nongnu.org <qemu-
> devel-bounces+salil.mehta=huawei.com@nongnu.org> On Behalf Of Peter
> Maydell
> Sent: Tuesday, October 14, 2025 11:25 AM
> To: qemu-devel@nongnu.org
> Cc: Salil Mehta <salil.mehta@opnsrc.net>; Marc Zyngier <maz@kernel.org>
> Subject: [PATCH] hw/intc/arm_gicv3_kvm: Avoid reading ICC_CTLR_EL1 from
> kernel in cpuif reset
> 
> Currently in arm_gicv3_icc_reset() we read the kernel's value of
> ICC_CTLR_EL1 as part of resetting the CPU interface.  This mostly works, but
> we're actually breaking an assumption the kernel makes that userspace only
> accesses the in-kernel GIC data when the VM is totally paused, which may
> not be the case if a single vCPU is being reset.  The effect is that it's possible
> that the read attempt returns EBUSY.
> 
> Avoid this by reading the kernel's value of the reset ICC_CTLR_EL1 once in
> device realize. This brings ICC_CTLR_EL1 into line with the other cpuif
> registers, where we assume we know what the kernel is resetting them to
> and just update QEMU's data structures in arm_gicv3_icc_reset().
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
> I've only tested this fairly lightly, but it seems to work.
> Salil, does this fix the EBUSY issues you were seeing ?


Would you be absorbing this in your tree now or should I make it part
of the RFC V7 ?

Reviewed-by: Salil Mehta <salil.mehta@huawei.com>
Tested-by: Salil Mehta <salil.mehta@huawei.com>


Many thanks!

Best regards
Salil.


> 
>  include/hw/intc/arm_gicv3_common.h |  3 ++
>  hw/intc/arm_gicv3_kvm.c            | 49 +++++++++++++++++++++---------
>  2 files changed, 38 insertions(+), 14 deletions(-)
> 
> diff --git a/include/hw/intc/arm_gicv3_common.h
> b/include/hw/intc/arm_gicv3_common.h
> index 38aa1961c50..61d51915e07 100644
> --- a/include/hw/intc/arm_gicv3_common.h
> +++ b/include/hw/intc/arm_gicv3_common.h
> @@ -166,6 +166,9 @@ struct GICv3CPUState {
>      uint64_t icc_igrpen[3];
>      uint64_t icc_ctlr_el3;
> 
> +    /* For KVM, cached copy of the kernel reset value of ICC_CTLR_EL1 */
> +    uint64_t kvm_reset_icc_ctlr_el1;
> +
>      /* Virtualization control interface */
>      uint64_t ich_apr[3][4]; /* ich_apr[GICV3_G1][x] never used */
>      uint64_t ich_hcr_el2;
> diff --git a/hw/intc/arm_gicv3_kvm.c b/hw/intc/arm_gicv3_kvm.c index
> 9829e2146da..b95e6ea057a 100644
> --- a/hw/intc/arm_gicv3_kvm.c
> +++ b/hw/intc/arm_gicv3_kvm.c
> @@ -666,11 +666,24 @@ static void kvm_arm_gicv3_get(GICv3State *s)
> 
>  static void arm_gicv3_icc_reset(CPUARMState *env, const ARMCPRegInfo
> *ri)  {
> -    GICv3State *s;
> -    GICv3CPUState *c;
> +    GICv3CPUState *c = (GICv3CPUState *)env->gicv3state;
> 
> -    c = (GICv3CPUState *)env->gicv3state;
> -    s = c->gic;
> +    /*
> +     * This function is called when each vcpu resets. The kernel
> +     * API for the GIC assumes that it is only to be used when the
> +     * whole VM is paused, so if we attempt to read the kernel's
> +     * reset values here we might get EBUSY failures.
> +     * So instead we assume we know what the kernel's reset values
> +     * are (mostly zeroes) and only update the QEMU state struct
> +     * fields. The exception is that we do need to know the kernel's
> +     * idea of the ICC_CTLR_EL1 reset value, so we cache that at
> +     * device realize time.
> +     *
> +     * This makes these sysregs different from the usual CPU ones,
> +     * which can be validly read and written when only the single
> +     * vcpu they apply to is paused, and where (in target/arm code)
> +     * we read the reset values out of the kernel on every reset.
> +     */
> 
>      c->icc_pmr_el1 = 0;
>      /*
> @@ -691,16 +704,8 @@ static void arm_gicv3_icc_reset(CPUARMState *env,
> const ARMCPRegInfo *ri)
>      memset(c->icc_apr, 0, sizeof(c->icc_apr));
>      memset(c->icc_igrpen, 0, sizeof(c->icc_igrpen));
> 
> -    if (s->migration_blocker) {
> -        return;
> -    }
> -
> -    /* Initialize to actual HW supported configuration */
> -    kvm_device_access(s->dev_fd,
> KVM_DEV_ARM_VGIC_GRP_CPU_SYSREGS,
> -                      KVM_VGIC_ATTR(ICC_CTLR_EL1, c->gicr_typer),
> -                      &c->icc_ctlr_el1[GICV3_NS], false, &error_abort);
> -
> -    c->icc_ctlr_el1[GICV3_S] = c->icc_ctlr_el1[GICV3_NS];
> +    c->icc_ctlr_el1[GICV3_NS] = c->kvm_reset_icc_ctlr_el1;
> +    c->icc_ctlr_el1[GICV3_S] = c->kvm_reset_icc_ctlr_el1;
>  }
> 
>  static void kvm_arm_gicv3_reset_hold(Object *obj, ResetType type) @@ -
> 939,6 +944,22 @@ static void kvm_arm_gicv3_realize(DeviceState *dev,
> Error **errp)
>                                      kvm_arm_gicv3_notifier,
>                                      MIG_MODE_CPR_TRANSFER);
>      }
> +
> +    /*
> +     * Now we can read the kernel's initial value of ICC_CTLR_EL1, which
> +     * we will need if a CPU interface is reset. If the kernel is ancient
> +     * and doesn't support writing the GIC state then we don't need to
> +     * care what reset does to QEMU's data structures.
> +     */
> +    if (!s->migration_blocker) {
> +        for (i = 0; i < s->num_cpu; i++) {
> +            GICv3CPUState *c = &s->cpu[i];
> +
> +            kvm_device_access(s->dev_fd,
> KVM_DEV_ARM_VGIC_GRP_CPU_SYSREGS,
> +                              KVM_VGIC_ATTR(ICC_CTLR_EL1, c->gicr_typer),
> +                              &c->kvm_reset_icc_ctlr_el1, false, &error_abort);
> +        }
> +    }
>  }
> 
>  static void kvm_arm_gicv3_class_init(ObjectClass *klass, const void *data)
> --
> 2.43.0
> 



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH] hw/intc/arm_gicv3_kvm: Avoid reading ICC_CTLR_EL1 from kernel in cpuif reset
  2025-10-16 12:17 ` Salil Mehta via
@ 2025-10-16 12:22   ` Peter Maydell
  2025-10-16 12:36     ` Salil Mehta
  0 siblings, 1 reply; 32+ messages in thread
From: Peter Maydell @ 2025-10-16 12:22 UTC (permalink / raw)
  To: Salil Mehta; +Cc: qemu-devel@nongnu.org, Salil Mehta, Marc Zyngier

On Thu, 16 Oct 2025 at 13:17, Salil Mehta <salil.mehta@huawei.com> wrote:
>
> Hi Peter,
>
> > From: qemu-devel-bounces+salil.mehta=huawei.com@nongnu.org <qemu-
> > devel-bounces+salil.mehta=huawei.com@nongnu.org> On Behalf Of Peter
> > Maydell
> > Sent: Tuesday, October 14, 2025 11:25 AM
> > To: qemu-devel@nongnu.org
> > Cc: Salil Mehta <salil.mehta@opnsrc.net>; Marc Zyngier <maz@kernel.org>
> > Subject: [PATCH] hw/intc/arm_gicv3_kvm: Avoid reading ICC_CTLR_EL1 from
> > kernel in cpuif reset
> >
> > Currently in arm_gicv3_icc_reset() we read the kernel's value of
> > ICC_CTLR_EL1 as part of resetting the CPU interface.  This mostly works, but
> > we're actually breaking an assumption the kernel makes that userspace only
> > accesses the in-kernel GIC data when the VM is totally paused, which may
> > not be the case if a single vCPU is being reset.  The effect is that it's possible
> > that the read attempt returns EBUSY.
> >
> > Avoid this by reading the kernel's value of the reset ICC_CTLR_EL1 once in
> > device realize. This brings ICC_CTLR_EL1 into line with the other cpuif
> > registers, where we assume we know what the kernel is resetting them to
> > and just update QEMU's data structures in arm_gicv3_icc_reset().
> >
> > Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> > ---
> > I've only tested this fairly lightly, but it seems to work.
> > Salil, does this fix the EBUSY issues you were seeing ?
>
>
> Would you be absorbing this in your tree now or should I make it part
> of the RFC V7 ?
>
> Reviewed-by: Salil Mehta <salil.mehta@huawei.com>
> Tested-by: Salil Mehta <salil.mehta@huawei.com>

Thanks for the testing. I'll pull it into target-arm.next since
it does fix a potential issue with the current codebase.

-- PMM


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH] hw/intc/arm_gicv3_kvm: Avoid reading ICC_CTLR_EL1 from kernel in cpuif reset
  2025-10-16 12:22   ` Peter Maydell
@ 2025-10-16 12:36     ` Salil Mehta
  0 siblings, 0 replies; 32+ messages in thread
From: Salil Mehta @ 2025-10-16 12:36 UTC (permalink / raw)
  To: Peter Maydell; +Cc: Salil Mehta, qemu-devel@nongnu.org, Marc Zyngier

On Thu, Oct 16, 2025 at 12:23 PM Peter Maydell <peter.maydell@linaro.org> wrote:
>
> On Thu, 16 Oct 2025 at 13:17, Salil Mehta <salil.mehta@huawei.com> wrote:
> >
> > Hi Peter,
> >
> > > From: qemu-devel-bounces+salil.mehta=huawei.com@nongnu.org <qemu-
> > > devel-bounces+salil.mehta=huawei.com@nongnu.org> On Behalf Of Peter
> > > Maydell
> > > Sent: Tuesday, October 14, 2025 11:25 AM
> > > To: qemu-devel@nongnu.org
> > > Cc: Salil Mehta <salil.mehta@opnsrc.net>; Marc Zyngier <maz@kernel.org>
> > > Subject: [PATCH] hw/intc/arm_gicv3_kvm: Avoid reading ICC_CTLR_EL1 from
> > > kernel in cpuif reset
> > >
> > > Currently in arm_gicv3_icc_reset() we read the kernel's value of
> > > ICC_CTLR_EL1 as part of resetting the CPU interface.  This mostly works, but
> > > we're actually breaking an assumption the kernel makes that userspace only
> > > accesses the in-kernel GIC data when the VM is totally paused, which may
> > > not be the case if a single vCPU is being reset.  The effect is that it's possible
> > > that the read attempt returns EBUSY.
> > >
> > > Avoid this by reading the kernel's value of the reset ICC_CTLR_EL1 once in
> > > device realize. This brings ICC_CTLR_EL1 into line with the other cpuif
> > > registers, where we assume we know what the kernel is resetting them to
> > > and just update QEMU's data structures in arm_gicv3_icc_reset().
> > >
> > > Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> > > ---
> > > I've only tested this fairly lightly, but it seems to work.
> > > Salil, does this fix the EBUSY issues you were seeing ?
> >
> >
> > Would you be absorbing this in your tree now or should I make it part
> > of the RFC V7 ?
> >
> > Reviewed-by: Salil Mehta <salil.mehta@huawei.com>
> > Tested-by: Salil Mehta <salil.mehta@huawei.com>
>
> Thanks for the testing. I'll pull it into target-arm.next since
> it does fix a potential issue with the current codebase.


Thanks. I hope you've noticed my earlier reply. There is a potential
*future* breakdown condition with this patch.

https://lore.kernel.org/qemu-devel/e2b03da8f7514b57aef7d236be1dcb90@huawei.com/

I assume you've accepted that as a no-risk?


Best regards
Salil.


>
> -- PMM


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH] hw/intc/arm_gicv3_kvm: Avoid reading ICC_CTLR_EL1 from kernel in cpuif reset
  2025-10-16 11:13     ` Salil Mehta via
@ 2025-10-16 12:46       ` Peter Maydell
  2025-10-16 15:28         ` Salil Mehta
  0 siblings, 1 reply; 32+ messages in thread
From: Peter Maydell @ 2025-10-16 12:46 UTC (permalink / raw)
  To: Salil Mehta; +Cc: qemu-devel@nongnu.org, Salil Mehta, Marc Zyngier

On Thu, 16 Oct 2025 at 12:13, Salil Mehta <salil.mehta@huawei.com> wrote:
>
> Hi Peter,
> > > Above changes assume that the driver's configured value of the
> > > ICC_CTLR_EL1 system register is the same as the default value. I've
> > > verified that this currently the case. However, it safe to assume that
> > > this will remain true in the future as well?
> >
> > I don't understand what you mean here. We read the kernel's ICC_CTLR_EL1
> > at VM startup, when we know it will be the reset value, because we haven't
> > run any VCPUs yet.
>
>
> System register fetches its value from ICH_VMCR_EL2 and ICH_VTR_EL2.
> In specific, EOIMode, PMHE and CBPR fields of ICC_CTLR_EL1 are from
> the VMCR register. Later gets configured when driver gets loaded and again
> re-configured in context to each CPU ON request(via in-kernel  CPU Hotplug
> state machine; CPUHP_AP_IRQ_GIC_STARTING). This configures the VMCR
> again and again. Although, the value as of now is same.
>
> You might want to check gic_cpu_sys_reg_init() in irq-gic-v3.c

I'm afraid I still don't understand what you mean here. This is
all the guest writing to the GIC registers as it starts up, right?
That has no influence at all on what the reset value of the emulated
hardware should be. (This is the same as on real hardware:
it doesn't matter what the OS writes to registers when it is
running; when the hardware resets it resets to the reset value.)

We want to know what value the kernel gives to this register when
the GIC is in a freshly reset state when no guest code has run.
That should be the value that it has here, when we read it on realize.

thanks
-- PMM


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH] hw/intc/arm_gicv3_kvm: Avoid reading ICC_CTLR_EL1 from kernel in cpuif reset
  2025-10-16 12:46       ` Peter Maydell
@ 2025-10-16 15:28         ` Salil Mehta
  2025-10-16 15:46           ` Peter Maydell
  0 siblings, 1 reply; 32+ messages in thread
From: Salil Mehta @ 2025-10-16 15:28 UTC (permalink / raw)
  To: Peter Maydell; +Cc: Salil Mehta, qemu-devel@nongnu.org, Marc Zyngier

Hi Peter,

On Thu, Oct 16, 2025 at 12:46 PM Peter Maydell <peter.maydell@linaro.org> wrote:
>
> On Thu, 16 Oct 2025 at 12:13, Salil Mehta <salil.mehta@huawei.com> wrote:
> >
> > Hi Peter,
> > > > Above changes assume that the driver's configured value of the
> > > > ICC_CTLR_EL1 system register is the same as the default value. I've
> > > > verified that this currently the case. However, it safe to assume that
> > > > this will remain true in the future as well?
> > >
> > > I don't understand what you mean here. We read the kernel's ICC_CTLR_EL1
> > > at VM startup, when we know it will be the reset value, because we haven't
> > > run any VCPUs yet.
> >
> >
> > System register fetches its value from ICH_VMCR_EL2 and ICH_VTR_EL2.
> > In specific, EOIMode, PMHE and CBPR fields of ICC_CTLR_EL1 are from
> > the VMCR register. Later gets configured when driver gets loaded and again
> > re-configured in context to each CPU ON request(via in-kernel  CPU Hotplug
> > state machine; CPUHP_AP_IRQ_GIC_STARTING). This configures the VMCR
> > again and again. Although, the value as of now is same.
> >
> > You might want to check gic_cpu_sys_reg_init() in irq-gic-v3.c
>
> I'm afraid I still don't understand what you mean here. This is
> all the guest writing to the GIC registers as it starts up, right?
> That has no influence at all on what the reset value of the emulated
> hardware should be. (This is the same as on real hardware:
> it doesn't matter what the OS writes to registers when it is
> running; when the hardware resets it resets to the reset value.)

For context, the gic_cpu_init() function is invoked from two paths in
the kernel: first from gic_init_bases() when the GICv3 driver is
initially loaded on the boot CPU, and later from gic_starting_cpu()
during each CPU online transition in the hotplug state machine.

The hotplug path wires up

CPUHP_AP_IRQ_GIC_STARTING -> gic_starting_cpu

in gic_smp_init(). On every CPU online event this leads to:

gic_starting_cpu() -> gic_cpu_init() -> gic_cpu_sys_reg_init()

which reprograms the CPU-interface system registers on that CPU,
including ICC_CTLR_EL1 (fields EOIMode, PMHE, CBPR).

The following dump stack from a guest hotplug event shows this
sequence clearly:

echo 1 > /sys/devices/system/cpu/cpu1/online
[   39.287402] gic_cpu_sys_reg_init+0x4c/0x294
[   39.287406] gic_cpu_init.part.0+0xc0/0x114
[   39.287409] gic_starting_cpu+0x24/0x8c
[   39.287412] cpuhp_invoke_callback+0x104/0x20c
[   39.287419] notify_cpu_starting+0x80/0xac
[   39.287421] secondary_start_kernel+0xdc/0x15c

As a result, ICC_CTLR_EL1 is at its architectural reset value at
VM realize (before any vCPU runs), but it is guest-configured
after the driver runs and again on each later CPU online event.

Although it looks safe in practice from the way EOIMode, CBPR,
and PMHE fields are currently programmed by the guest, the
eventual value always depends on what the guest writes into
VMCR each time it configures the CPU interface during a
CPU_ON sequence. Given that, would it still be correct for
QEMU to treat the realize-time ICC_CTLR_EL1 value as a
persistent runtime value rather than just a reset template?

Since cpu_reset() will also be invoked for every CPU_ON in QEMU,
is there even a remote possibility that a guest GIC driver (Linux or
otherwise) might alter these field configurations across successive
CPU_ON/OFF cycles?

>
> We want to know what value the kernel gives to this register when
> the GIC is in a freshly reset state when no guest code has run.
> That should be the value that it has here, when we read it on realize.

But when PSCI CPU_ON is being handled entirely in the kernel
i.e. no hotplug patches are present and hence no SMCC Hypercall
exit exists then we will always use driver updated VMCR fields (I think).
In the current case even along with the hotplug patches they are the
same but in *future* will they be?

I do understand what you and Marc discussed and pointed out that,
"The reset value is indeed cast in stone once the GIC has been created."
https://lore.kernel.org/lkml/864is2x6z9.wl-maz@kernel.org/

Best regards
Salil.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH] hw/intc/arm_gicv3_kvm: Avoid reading ICC_CTLR_EL1 from kernel in cpuif reset
  2025-10-16 15:28         ` Salil Mehta
@ 2025-10-16 15:46           ` Peter Maydell
  2025-10-16 15:48             ` Salil Mehta via
  0 siblings, 1 reply; 32+ messages in thread
From: Peter Maydell @ 2025-10-16 15:46 UTC (permalink / raw)
  To: Salil Mehta; +Cc: Salil Mehta, qemu-devel@nongnu.org, Marc Zyngier

On Thu, 16 Oct 2025 at 16:28, Salil Mehta <salil.mehta@opnsrc.net> wrote:
>
> Hi Peter,
>
> On Thu, Oct 16, 2025 at 12:46 PM Peter Maydell <peter.maydell@linaro.org> wrote:
> >
> > On Thu, 16 Oct 2025 at 12:13, Salil Mehta <salil.mehta@huawei.com> wrote:
> > >
> > > Hi Peter,
> > > > > Above changes assume that the driver's configured value of the
> > > > > ICC_CTLR_EL1 system register is the same as the default value. I've
> > > > > verified that this currently the case. However, it safe to assume that
> > > > > this will remain true in the future as well?
> > > >
> > > > I don't understand what you mean here. We read the kernel's ICC_CTLR_EL1
> > > > at VM startup, when we know it will be the reset value, because we haven't
> > > > run any VCPUs yet.
> > >
> > >
> > > System register fetches its value from ICH_VMCR_EL2 and ICH_VTR_EL2.
> > > In specific, EOIMode, PMHE and CBPR fields of ICC_CTLR_EL1 are from
> > > the VMCR register. Later gets configured when driver gets loaded and again
> > > re-configured in context to each CPU ON request(via in-kernel  CPU Hotplug
> > > state machine; CPUHP_AP_IRQ_GIC_STARTING). This configures the VMCR
> > > again and again. Although, the value as of now is same.
> > >
> > > You might want to check gic_cpu_sys_reg_init() in irq-gic-v3.c
> >
> > I'm afraid I still don't understand what you mean here. This is
> > all the guest writing to the GIC registers as it starts up, right?
> > That has no influence at all on what the reset value of the emulated
> > hardware should be. (This is the same as on real hardware:
> > it doesn't matter what the OS writes to registers when it is
> > running; when the hardware resets it resets to the reset value.)
>
> For context, the gic_cpu_init() function is invoked from two paths in
> the kernel: first from gic_init_bases() when the GICv3 driver is
> initially loaded on the boot CPU, and later from gic_starting_cpu()
> during each CPU online transition in the hotplug state machine.
>
> The hotplug path wires up
>
> CPUHP_AP_IRQ_GIC_STARTING -> gic_starting_cpu
>
> in gic_smp_init(). On every CPU online event this leads to:
>
> gic_starting_cpu() -> gic_cpu_init() -> gic_cpu_sys_reg_init()
>
> which reprograms the CPU-interface system registers on that CPU,
> including ICC_CTLR_EL1 (fields EOIMode, PMHE, CBPR).
>
> The following dump stack from a guest hotplug event shows this
> sequence clearly:
>
> echo 1 > /sys/devices/system/cpu/cpu1/online
> [   39.287402] gic_cpu_sys_reg_init+0x4c/0x294
> [   39.287406] gic_cpu_init.part.0+0xc0/0x114
> [   39.287409] gic_starting_cpu+0x24/0x8c
> [   39.287412] cpuhp_invoke_callback+0x104/0x20c
> [   39.287419] notify_cpu_starting+0x80/0xac
> [   39.287421] secondary_start_kernel+0xdc/0x15c
>
>
> As a result, ICC_CTLR_EL1 is at its architectural reset value at
> VM realize (before any vCPU runs), but it is guest-configured
> after the driver runs and again on each later CPU online event.

So? We do not care what the guest does with the register.
The reset value is the value it has *before* the guest touches it.

-- PMM


^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: [PATCH] hw/intc/arm_gicv3_kvm: Avoid reading ICC_CTLR_EL1 from kernel in cpuif reset
  2025-10-16 15:46           ` Peter Maydell
@ 2025-10-16 15:48             ` Salil Mehta via
  0 siblings, 0 replies; 32+ messages in thread
From: Salil Mehta via @ 2025-10-16 15:48 UTC (permalink / raw)
  To: Peter Maydell, Salil Mehta; +Cc: qemu-devel@nongnu.org, Marc Zyngier

> From: Peter Maydell <peter.maydell@linaro.org>
> Sent: Thursday, October 16, 2025 4:46 PM
> To: Salil Mehta <salil.mehta@opnsrc.net>
> 
> On Thu, 16 Oct 2025 at 16:28, Salil Mehta <salil.mehta@opnsrc.net> wrote:
> >
> > Hi Peter,
> >
> > On Thu, Oct 16, 2025 at 12:46 PM Peter Maydell
> <peter.maydell@linaro.org> wrote:
> > >
> > > On Thu, 16 Oct 2025 at 12:13, Salil Mehta <salil.mehta@huawei.com>
> wrote:
> > > >
> > > > Hi Peter,
> > > > > > Above changes assume that the driver's configured value of the
> > > > > > ICC_CTLR_EL1 system register is the same as the default value.
> > > > > > I've verified that this currently the case. However, it safe
> > > > > > to assume that this will remain true in the future as well?
> > > > >
> > > > > I don't understand what you mean here. We read the kernel's
> > > > > ICC_CTLR_EL1 at VM startup, when we know it will be the reset
> > > > > value, because we haven't run any VCPUs yet.
> > > >
> > > >
> > > > System register fetches its value from ICH_VMCR_EL2 and
> ICH_VTR_EL2.
> > > > In specific, EOIMode, PMHE and CBPR fields of ICC_CTLR_EL1 are
> > > > from the VMCR register. Later gets configured when driver gets
> > > > loaded and again re-configured in context to each CPU ON
> > > > request(via in-kernel  CPU Hotplug state machine;
> > > > CPUHP_AP_IRQ_GIC_STARTING). This configures the VMCR again and
> again. Although, the value as of now is same.
> > > >
> > > > You might want to check gic_cpu_sys_reg_init() in irq-gic-v3.c
> > >
> > > I'm afraid I still don't understand what you mean here. This is all
> > > the guest writing to the GIC registers as it starts up, right?
> > > That has no influence at all on what the reset value of the emulated
> > > hardware should be. (This is the same as on real hardware:
> > > it doesn't matter what the OS writes to registers when it is
> > > running; when the hardware resets it resets to the reset value.)
> >
> > For context, the gic_cpu_init() function is invoked from two paths in
> > the kernel: first from gic_init_bases() when the GICv3 driver is
> > initially loaded on the boot CPU, and later from gic_starting_cpu()
> > during each CPU online transition in the hotplug state machine.
> >
> > The hotplug path wires up
> >
> > CPUHP_AP_IRQ_GIC_STARTING -> gic_starting_cpu
> >
> > in gic_smp_init(). On every CPU online event this leads to:
> >
> > gic_starting_cpu() -> gic_cpu_init() -> gic_cpu_sys_reg_init()
> >
> > which reprograms the CPU-interface system registers on that CPU,
> > including ICC_CTLR_EL1 (fields EOIMode, PMHE, CBPR).
> >
> > The following dump stack from a guest hotplug event shows this
> > sequence clearly:
> >
> > echo 1 > /sys/devices/system/cpu/cpu1/online
> > [   39.287402] gic_cpu_sys_reg_init+0x4c/0x294
> > [   39.287406] gic_cpu_init.part.0+0xc0/0x114
> > [   39.287409] gic_starting_cpu+0x24/0x8c
> > [   39.287412] cpuhp_invoke_callback+0x104/0x20c
> > [   39.287419] notify_cpu_starting+0x80/0xac
> > [   39.287421] secondary_start_kernel+0xdc/0x15c
> >
> >
> > As a result, ICC_CTLR_EL1 is at its architectural reset value at VM
> > realize (before any vCPU runs), but it is guest-configured after the
> > driver runs and again on each later CPU online event.
> 
> So? We do not care what the guest does with the register.
> The reset value is the value it has *before* the guest touches it.

Sure. got it. thanks for the clarification.


Best regards
Salil.


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH] hw/intc/arm_gicv3_kvm: Avoid reading ICC_CTLR_EL1 from kernel in cpuif reset
  2025-10-14 19:36                           ` Salil Mehta via
@ 2025-10-17  1:43                             ` Salil Mehta
  0 siblings, 0 replies; 32+ messages in thread
From: Salil Mehta @ 2025-10-17  1:43 UTC (permalink / raw)
  To: Salil Mehta; +Cc: Peter Maydell, qemu-devel@nongnu.org, Marc Zyngier

Hi Peter,

On Tue, Oct 14, 2025 at 7:36 PM Salil Mehta <salil.mehta@huawei.com> wrote:
>
> Hi Peter,
>
> > From: qemu-devel-bounces+salil.mehta=huawei.com@nongnu.org <qemu-
> > devel-bounces+salil.mehta=huawei.com@nongnu.org> On Behalf Of Peter
> > Maydell
> > Sent: Tuesday, October 14, 2025 4:44 PM
> > To: Salil Mehta <salil.mehta@huawei.com>
> >
> > On Tue, 14 Oct 2025 at 16:33, Salil Mehta <salil.mehta@huawei.com> wrote:
> > >
> > > > From: Peter Maydell <peter.maydell@linaro.org>
> > > > Sent: Tuesday, October 14, 2025 4:24 PM
> > > > To: Salil Mehta <salil.mehta@huawei.com>
> > > >
> > > > On Tue, 14 Oct 2025 at 16:13, Salil Mehta <salil.mehta@huawei.com>
> > wrote:
> > > > >
> > > > > > From: Peter Maydell <peter.maydell@linaro.org> In what situation
> > > > > > do we ever start running a VCPU before the *GIC* has been
> > > > > > realized? The GIC should get realized as part of creating the
> > > > > > virt board, which must complete before we do anything like running a
> > vcpu.
> > > > >
> > > > >
> > > > > Just after realization of vCPU in the machvirt_init() you can see
> > > > > the default power_state is PSCI CPU_ON, which means
> > > > KVM_MP_STATE_RUNNABLE.
> > > > > Since, the thread is up and not doing IO wait in userspace it gets
> > > > > into
> > > > > cpu_exec() loop and actually run KVM_RUN IOCTL. Inside the KVM it
> > > > > momentarily takes the vCPU mutex but later exit and releases. This
> > > > > keeps going on for all of the vCPU threads realized early.
> > > >
> > > > Yikes. We definitely should fix that : letting the vcpu run before
> > > > we get to
> > > > qemu_machine_creation_done() seems like it would be a massive source
> > > > of race conditions.
> > >
> > > I've already proposed fix for this by parking such threads in
> > > userspace. Please check functions virt_(un)park_cpu_in_userspace().
> > > But need to check if we can use this trick can be used at the very early
> > stages of the VM initialization.
> >
> > I had a look at this on x86, and we correctly don't try to KVM_RUN the vcpus
> > early. What happens there is:
> >  * the vcpu thread calls qemu_process_cpu_events()
>
>
> I cannot find this function in the Qemu mainline of 28th September ?
>
>
> >  * this causes it to go to sleep on the cpu->halt_cond: so
> >    it will not end up doing KVM_RUN yet
> >  * later, the main thread completes initialization of the board
> >  * qdev_machine_creation_done() calls cpu_synchronize_all_post_init()
> >  * for kvm, this causes us to call kvm_cpu_synchronize_post_init()
> >    for each vcpu
> >  * that will call run_on_cpu() which ends up calling qemu_cpu_kick()
> >  * qemu_cpu_kick() does a broadcast on cpu->halt_cond, which
> >    wakes up the vcpu thread and lets it go into kvm_cpu_exec()
> >    for the first time
> >
> > Why doesn't this mechanism work on Arm ?
>
> It is a combination of things:
>
> void qemu_wait_io_event(CPUState *cpu)
> {
> [...]
>     while (cpu_thread_is_idle(cpu)) {
>          [...]
>         qemu_cond_wait(cpu->halt_cond, &bql);
>     }
> [...]
> }
>
> 1. To block we should wait on 'halt_cond' as you rightly pointed.
> 2. but condition to wait is to check of the CPU is IDLE or not.
> 3. Various conditions in which CPU can be termed IDLE are:
>      3.1  STOPPED
>      3.2  HALTED
>      3.3 It does not have any queued  work to process.
>
>

I was partly wrong in my earlier analysis—sorry about that. The real flow is:

qdev_realize() -> qemu_vcpu_init() sets CPUState::stopped = true.
This ensures the threads created during machvirt_init() block in I/O
wait. That part works as expected.

qemu_system_reset() runs via qemu_machine_creation_done(). This performs
the CPU reset, and cpu_common_reset_hold() initializes
CPUState::halted from the start-powered-off property.

resume_all_vcpus() is called from vm_start(). It sets
CPUState::stopped = false and kicks each vCPU. However,
CPUState::halted remains whatever start-powered-off configured
(typically 1 for secondaries).

Thus, for each secondary vCPU we have: halted = 1, stopped = 0.

These conditions should be sufficient for the vCPU thread to be treated as
idle and to sleep (I/O wait) on halt_cond.

However, inside cpu_thread_is_idle(), the halted = 1 condition is
effectively bypassed because kvm_halt_in_kernel_allowed defaults to
true. As a result, the thread does not take the userspace sleep path
even though the vCPU is halted.

I traced the code and the earlier patch as follows:

bool cpu_thread_is_idle(CPUState *cpu)
{
    if (cpu->stop || !cpu_work_list_empty(cpu)) {
        return false;
    }
    if (cpu_is_stopped(cpu)) {
        return true;
    }
    if (!cpu->halted || cpu_has_work(cpu)) {
        return false;
    }

     // at this point we expect CPUState::halted = 1

    if (cpus_accel->cpu_thread_is_idle) {
        /* flows goes into this for ARM and returns false */
        return cpus_accel->cpu_thread_is_idle(cpu);
    }
    return true;
}

where cpu_thread_is_idle = kvm_vcpu_thread_is_idle()

static bool kvm_vcpu_thread_is_idle(CPUState *cpu)
{
    return !kvm_halt_in_kernel();
}


File: kvm.h
/**
 * kvm_halt_in_kernel
 *
 * Returns: true if halted cpus should still get a KVM_RUN ioctl to run
 * inside of kernel space. This only works if MP state is implemented.
 */
#define kvm_halt_in_kernel() (kvm_halt_in_kernel_allowed)


File: target/arm/kvm.c
int kvm_arch_init(MachineState *ms, KVMState *s)
{
[...]

    /*
     * PSCI wakes up secondary cores, so we always need to
     * have vCPUs waiting in kernel space
     */
    kvm_halt_in_kernel_allowed = true;
[...]
}

Patch: ARM: KVM: Enable in-kernel timers with user space gic
https://lore.kernel.org/qemu-devel/1499768952-24990-4-git-send-email-peter.maydell@linaro.org/

Net effect: even when halted = 1 and there is no work, KVM/ARM forces
cpu_thread_is_idle() to return false so the thread enters KVM_RUN. This
causes vCPU lock contention during the VM Initialization.

It looks to be a 2017 patch, maybe you can help throw more light in this?


Many thanks!

Best regards
Salil.


^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2025-10-17  1:44 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-10-14 10:24 [PATCH] hw/intc/arm_gicv3_kvm: Avoid reading ICC_CTLR_EL1 from kernel in cpuif reset Peter Maydell
2025-10-14 10:41 ` Salil Mehta via
2025-10-14 13:23   ` Salil Mehta via
2025-10-14 13:31     ` Peter Maydell
2025-10-14 13:41       ` Salil Mehta via
2025-10-14 13:49         ` Peter Maydell
2025-10-14 14:22           ` Salil Mehta via
2025-10-14 14:28             ` Peter Maydell
2025-10-14 14:48               ` Salil Mehta via
2025-10-14 14:59                 ` Peter Maydell
2025-10-14 15:13                   ` Salil Mehta via
2025-10-14 15:16                     ` Salil Mehta via
2025-10-14 15:23                     ` Peter Maydell
2025-10-14 15:32                       ` Salil Mehta via
2025-10-14 15:43                         ` Peter Maydell
2025-10-14 15:54                           ` Salil Mehta via
2025-10-14 19:36                           ` Salil Mehta via
2025-10-17  1:43                             ` Salil Mehta
2025-10-14 16:07                         ` Salil Mehta via
2025-10-14 16:12                           ` Peter Maydell
2025-10-14 15:39                       ` Salil Mehta via
2025-10-16 12:09       ` Salil Mehta via
2025-10-15 10:58 ` Salil Mehta via
2025-10-15 12:06   ` Peter Maydell
2025-10-16 11:13     ` Salil Mehta via
2025-10-16 12:46       ` Peter Maydell
2025-10-16 15:28         ` Salil Mehta
2025-10-16 15:46           ` Peter Maydell
2025-10-16 15:48             ` Salil Mehta via
2025-10-16 12:17 ` Salil Mehta via
2025-10-16 12:22   ` Peter Maydell
2025-10-16 12:36     ` Salil Mehta

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).