qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Salil Mehta via <qemu-devel@nongnu.org>
To: Peter Maydell <peter.maydell@linaro.org>
Cc: "qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
	Salil Mehta <salil.mehta@opnsrc.net>,
	Marc Zyngier <maz@kernel.org>
Subject: RE: [PATCH] hw/intc/arm_gicv3_kvm: Avoid reading ICC_CTLR_EL1 from kernel in cpuif reset
Date: Thu, 16 Oct 2025 11:13:38 +0000	[thread overview]
Message-ID: <e2b03da8f7514b57aef7d236be1dcb90@huawei.com> (raw)
In-Reply-To: <CAFEAcA8U_3zJsnHt2wxME3j8J83whzTNgP9vEvUGuWHe7KGaYA@mail.gmail.com>

Hi Peter,

> From: Peter Maydell <peter.maydell@linaro.org>
> Sent: Wednesday, October 15, 2025 1:06 PM
> To: Salil Mehta <salil.mehta@huawei.com>
> On Wed, 15 Oct 2025 at 11:58, Salil Mehta <salil.mehta@huawei.com> wrote:
> >
> > Hi Peter,
> >
> > > From: qemu-devel-bounces+salil.mehta=huawei.com@nongnu.org
> <qemu-
> > > devel-bounces+salil.mehta=huawei.com@nongnu.org> On Behalf Of
> Peter
> > > Maydell
> > > Sent: Tuesday, October 14, 2025 11:25 AM
> > > To: qemu-devel@nongnu.org
> > > Cc: Salil Mehta <salil.mehta@opnsrc.net>; Marc Zyngier
> > > <maz@kernel.org>
> > > Subject: [PATCH] hw/intc/arm_gicv3_kvm: Avoid reading ICC_CTLR_EL1
> > > from kernel in cpuif reset
> > >
> > > Currently in arm_gicv3_icc_reset() we read the kernel's value of
> > > ICC_CTLR_EL1 as part of resetting the CPU interface.  This mostly
> > > works, but we're actually breaking an assumption the kernel makes
> > > that userspace only accesses the in-kernel GIC data when the VM is
> > > totally paused, which may not be the case if a single vCPU is being
> > > reset.  The effect is that it's possible that the read attempt returns EBUSY.
> > >
> > > Avoid this by reading the kernel's value of the reset ICC_CTLR_EL1
> > > once in device realize. This brings ICC_CTLR_EL1 into line with the
> > > other cpuif registers, where we assume we know what the kernel is
> > > resetting them to and just update QEMU's data structures in
> arm_gicv3_icc_reset().
> > >
> > > Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> > > ---
> > > I've only tested this fairly lightly, but it seems to work.
> > > Salil, does this fix the EBUSY issues you were seeing ?
> > >
> > >  include/hw/intc/arm_gicv3_common.h |  3 ++
> > >  hw/intc/arm_gicv3_kvm.c            | 49 +++++++++++++++++++++---------
> > >  2 files changed, 38 insertions(+), 14 deletions(-)
> > >
> > [...]
> > > +
> > > +    /*
> > > +     * Now we can read the kernel's initial value of ICC_CTLR_EL1, which
> > > +     * we will need if a CPU interface is reset. If the kernel is ancient
> > > +     * and doesn't support writing the GIC state then we don't need to
> > > +     * care what reset does to QEMU's data structures.
> > > +     */
> > > +    if (!s->migration_blocker) {
> > > +        for (i = 0; i < s->num_cpu; i++) {
> > > +            GICv3CPUState *c = &s->cpu[i];
> > > +
> > > +            kvm_device_access(s->dev_fd,
> > > KVM_DEV_ARM_VGIC_GRP_CPU_SYSREGS,
> > > +                              KVM_VGIC_ATTR(ICC_CTLR_EL1, c->gicr_typer),
> > > +                              &c->kvm_reset_icc_ctlr_el1, false, &error_abort);
> > > +        }
> > > +    }
> > >  }
> >
> >
> > Above changes assume that the driver's configured value of the
> > ICC_CTLR_EL1 system register is the same as the default value. I've
> > verified that this currently the case. However, it safe to assume that
> > this will remain true in the future as well?
> 
> I don't understand what you mean here. We read the kernel's ICC_CTLR_EL1
> at VM startup, when we know it will be the reset value, because we haven't
> run any VCPUs yet.


System register fetches its value from ICH_VMCR_EL2 and ICH_VTR_EL2.
In specific, EOIMode, PMHE and CBPR fields of ICC_CTLR_EL1 are from
the VMCR register. Later gets configured when driver gets loaded and again
re-configured in context to each CPU ON request(via in-kernel  CPU Hotplug
state machine; CPUHP_AP_IRQ_GIC_STARTING). This configures the VMCR
again and again. Although, the value as of now is same. 

You might want to check gic_cpu_sys_reg_init() in irq-gic-v3.c

My question was, is it *future* safe to ignore these updates to VMCR fields
which can happen in context to guest kernel actions at point of time.

https://lore.kernel.org/all/20251001010127.3092631-22-salil.mehta@opnsrc.net/

The extra complexity which you saw in above proposed patch was to handle
that although it looked unnecessary as the value was always static.

The kernel patch proposed ensured that cache is always up to date (just in case)

https://lore.kernel.org/lkml/20251008201955.3919537-1-salil.mehta@opnsrc.net/


Internally, before posting RFC V6, I had discussed the implementation you've
proposed in this patch with Marc and Jonathan earlier but because of the issues
of hang I faced and above mentioned unaddressed issue of the driver update
I thought to push the version present in the RFC V6 to start with for your humble
consideration and let you decide 😊


I've tested this patch and I can confirm it is working along with RFC V6

https://github.com/salil-mehta/qemu/commits/virt-cpuhp-armv8/rfc-v6.1/



I've also figured out the reason(s) why it did not work earlier i.e. below problems
and fixed them as well:
1. hang problem which I saw earlier
2. crash or no ENOTTY which I recently posted

I'll share the details of above problems in the context where the questions
were asked.


Many thanks!


> > We can instead cache the value once during the first cpu_reset() for
> > each CPU interface and reuse that cached value for all subsequent
> > cpu_reset() operations.  This would simplify the implementation while
> > retaining major goals.
> 
> No, we should read the value at realize time, as this patch does.
> Trying to do it at cpu_reset runs into the same problem that we have now
> that the kernel doesn't expect us to be reading the register when another
> vcpu might be running.


Agreed. we can. Problem was a miss in RFC V6. I fixed that.

Yes, the vCPU lock contention is an issue. Its better to avoid it as far as possible. 


> 
> > Calling pause_vcpus_all() at VM initialization time--during  the first
> > cpu_reset()--should not cause any issues, as all secondary vCPUs are
> > idle (i.e. their PCs are not initialized and kernel has not yet
> > loaded). Pausing such vCPUs at this stage should not be a disruptive action.
> 
> We should not need to run pause_vcpus_all() at all here. CPU reset should
> be fine done for one vcpu with the others running.


Yes, with this implementation working now, we can avoid it, so no issues.


> 
> > Moreover, if this stage poses a problem for a KVM device IOCTL access,
> > it similarly affect other components like GICv3 ITS, which also uses
> > KVM device IOCTL during GICv3 reset hold. At this stage all the vCPUs
> > should still be in a pre-boot or idle state
> 
> > static void kvm_arm_its_reset_hold(Object *obj, ResetType type) {
> > [...]
> >
> >     if (kvm_device_check_attr(s->dev_fd,
> KVM_DEV_ARM_VGIC_GRP_CTRL,
> >                                KVM_DEV_ARM_ITS_CTRL_RESET)) {
> >         kvm_device_access(s->dev_fd, KVM_DEV_ARM_VGIC_GRP_CTRL,
> >                           KVM_DEV_ARM_ITS_CTRL_RESET, NULL, true, &error_abort);
> >         return;
> >     }
> > [...]
> >
> > }
> >
> > We are not even pausing vCPUs here, yet we expect above KVM device
> > IOCTL to succeed on the first attempt. The operation also ends up
> > acquiring the mutex for all vCPUs within the KVM.
> 
> ITS reset should only happen (a) at start of the run, before any vcpus run or
> (b) as part of a full system reset. We should check that the vcpus are paused
> when we're doing system reset, but I would expect that to be the case
> because it would be likely to be buggy if not.


Agreed. I took bit of time to review what we stumbled upon in August and
September. The issue actually predated the "parking in user space for disabled
vCPU threads". It was first reported during testing, where enabling and disabling
a vCPU - followed by a reboot - led to a crash in GICv3 ITS because of the
lock contention due to the disabled vCPU thread sleeping inside the KVM.

As you rightly pointed, we might not see the same issue in (a) & (b).

Thanks for the clarification.

Best regards
Salil.



  reply	other threads:[~2025-10-16 11:14 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-10-14 10:24 [PATCH] hw/intc/arm_gicv3_kvm: Avoid reading ICC_CTLR_EL1 from kernel in cpuif reset Peter Maydell
2025-10-14 10:41 ` Salil Mehta via
2025-10-14 13:23   ` Salil Mehta via
2025-10-14 13:31     ` Peter Maydell
2025-10-14 13:41       ` Salil Mehta via
2025-10-14 13:49         ` Peter Maydell
2025-10-14 14:22           ` Salil Mehta via
2025-10-14 14:28             ` Peter Maydell
2025-10-14 14:48               ` Salil Mehta via
2025-10-14 14:59                 ` Peter Maydell
2025-10-14 15:13                   ` Salil Mehta via
2025-10-14 15:16                     ` Salil Mehta via
2025-10-14 15:23                     ` Peter Maydell
2025-10-14 15:32                       ` Salil Mehta via
2025-10-14 15:43                         ` Peter Maydell
2025-10-14 15:54                           ` Salil Mehta via
2025-10-14 19:36                           ` Salil Mehta via
2025-10-17  1:43                             ` Salil Mehta
2025-10-14 16:07                         ` Salil Mehta via
2025-10-14 16:12                           ` Peter Maydell
2025-10-14 15:39                       ` Salil Mehta via
2025-10-16 12:09       ` Salil Mehta via
2025-10-15 10:58 ` Salil Mehta via
2025-10-15 12:06   ` Peter Maydell
2025-10-16 11:13     ` Salil Mehta via [this message]
2025-10-16 12:46       ` Peter Maydell
2025-10-16 15:28         ` Salil Mehta
2025-10-16 15:46           ` Peter Maydell
2025-10-16 15:48             ` Salil Mehta via
2025-10-16 12:17 ` Salil Mehta via
2025-10-16 12:22   ` Peter Maydell
2025-10-16 12:36     ` Salil Mehta

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=e2b03da8f7514b57aef7d236be1dcb90@huawei.com \
    --to=qemu-devel@nongnu.org \
    --cc=maz@kernel.org \
    --cc=peter.maydell@linaro.org \
    --cc=salil.mehta@huawei.com \
    --cc=salil.mehta@opnsrc.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).