From: Salil Mehta via <qemu-devel@nongnu.org>
To: Peter Maydell <peter.maydell@linaro.org>
Cc: "qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
Salil Mehta <salil.mehta@opnsrc.net>,
Marc Zyngier <maz@kernel.org>
Subject: RE: [PATCH] hw/intc/arm_gicv3_kvm: Avoid reading ICC_CTLR_EL1 from kernel in cpuif reset
Date: Thu, 16 Oct 2025 12:09:15 +0000
Message-ID: <d2c29e810ccc4d0eb7740b69e36cb4ac@huawei.com>
In-Reply-To: <CAFEAcA_MZu4stZ4MY4zdpM0zy-gNBA3yj4dkuWL3d-FLFZC6rg@mail.gmail.com>
> From: Peter Maydell <peter.maydell@linaro.org>
> Sent: Tuesday, October 14, 2025 2:31 PM
> To: Salil Mehta <salil.mehta@huawei.com>
>
> On Tue, 14 Oct 2025 at 14:23, Salil Mehta <salil.mehta@huawei.com> wrote:
> >
> > Hi Peter,
> >
> > > From: qemu-devel-bounces+salil.mehta=huawei.com@nongnu.org
> <qemu-
> > > devel-bounces+salil.mehta=huawei.com@nongnu.org> On Behalf Of Salil
> > > Mehta via
> > > Sent: Tuesday, October 14, 2025 11:41 AM
> > > To: Peter Maydell <peter.maydell@linaro.org>; qemu-
> devel@nongnu.org
> > >
> > > Hi Peter,
> > >
> > > > From: qemu-devel-bounces+salil.mehta=huawei.com@nongnu.org
> > > <qemu-
> > > > devel-bounces+salil.mehta=huawei.com@nongnu.org> On Behalf Of
> > > > devel-bounces+Peter
> > > > Maydell
> > > > Sent: Tuesday, October 14, 2025 11:25 AM
> > > > To: qemu-devel@nongnu.org
> > > >
> > > > Currently in arm_gicv3_icc_reset() we read the kernel's value of
> > > > ICC_CTLR_EL1 as part of resetting the CPU interface. This mostly
> > > > works, but we're actually breaking an assumption the kernel makes
> > > > that userspace only accesses the in-kernel GIC data when the VM is
> > > > totally paused, which may not be the case if a single vCPU is being
> reset.
> > > > The effect is that it's possible that the read attempt returns EBUSY.
> > > >
> > > > Avoid this by reading the kernel's value of the reset ICC_CTLR_EL1
> > > > once in device realize. This brings ICC_CTLR_EL1 into line with
> > > > the other cpuif registers, where we assume we know what the kernel
> > > > is resetting them to and just update QEMU's data structures in
> > > arm_gicv3_icc_reset().
> > > >
> > > > Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> > > > ---
> > > > I've only tested this fairly lightly, but it seems to work.
> > > > Salil, does this fix the EBUSY issues you were seeing ?
> > >
> > >
> > > Let me try this and get back to you. Also, just to let you know
> > > that -EBUSY can return from other places as well. Please check my
> > > reply in the other mail- chain.
> >
> >
> > Got this.
> >
> > (gdb) handle SIGUSR1 nostop noprint pass
> > Signal Stop Print Pass to program Description
> > SIGUSR1 No No Yes User defined signal 1
> > (gdb) run
> > Starting program:
> > /opt/workspace/code/qemu/qemu/build/qemu-system-aarch64 --enable-
> kvm
> > -machine virt,gic-version=3 -cpu host -smp cpus=2,disabledcpus=2 -m
> > 300M -kernel /opt/workspace/code/linux/linux/arch/arm64/boot/Image
> > -initrd /opt/workspace/code/filesystem/rootfs.cpio.gz -append
> > console=ttyAMA0\ root=/dev/ram\ earlycon\ rdinit=/init\ maxcpus=1\
> > acpi=force -nographic -bios
> > /opt/workspace/code/uefi/edk2/Build/ArmVirtQemu-
> AARCH64/RELEASE_GCC5/F
> > V/QEMU_EFI.fd [Thread debugging using libthread_db enabled] Using host
> > libthread_db library "/lib/aarch64-linux-gnu/libthread_db.so.1".
> > [New Thread 0xfffff5b5eb40 (LWP 31994)] [New Thread 0xfffff4e88b40
> > (LWP 31996)] [New Thread 0xffffd4dfeb40 (LWP 31997)] Unexpected error
> > in kvm_device_access() at ../accel/kvm/kvm-all.c:3475:
> > qemu-system-aarch64: KVM_GET_DEVICE_ATTR failed: Group 6 attr
> > 0x000000000000c664: Inappropriate ioctl for device
>
> Does it do this consistently, or only sometimes? What host kernel version are
> you running? And what QEMU commit (plus this patch)?
>
> I'm guessing from that "disabledcpus=2" part that you're running some not-
> yet-upstream set of QEMU patches. Please drop those, and test only with
> this, to rule out the possibility of some bug/unexpected interaction with
> those.
Thanks for the suggestion. We can ignore the crash; it happened because
of an incorrect conflict resolution.
I also reviewed the hang that I briefly touched upon earlier. After
revisiting the previous debugging branches and fixes, I realized:
1. The hang happened while executing CPU_ON from the guest kernel, not
   while accessing the system register during the GIC realization phase.
2. It only happened when {pause,resume}_all_vcpus() was not being used.
3. Point 2 came about when I implemented Marc's suggestion to cache the
   register value during GIC realize.
4. Even then, the hang only happened when the recently enabled vCPUs were
   brought online for the first time.
But why did it happen?
1. When a CPU is realized, its CPUState::stopped is set to true by
   qemu_init_vcpu().
2. This state is reset for all the enabled vCPUs in the context of
   vm_start(), when resume_all_vcpus() is called.
3. For the disabled vCPUs, we must do the same when they are enabled.
4. Calling cpu_resume() when 'disabled' vCPUs are administratively enabled
   resets CPUState::stopped to false and kicks the vCPUs.
5. The hang was due to the CPU thread not getting kicked out of the I/O
   wait state.
6. This problem went unnoticed earlier because, before calling the KVM
   device ioctl, we first paused all the vCPUs and then resumed them in the
   context of cpu_reset() by calling {pause,resume}_all_vcpus(). This resume
   action resets CPUState::stopped to false and kicks the thread as well.
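
The sequence in points 1 to 5 can be modelled outside QEMU with a small
standalone pthread sketch. To be clear, this is not QEMU code and not the
actual fix: ModelCPU and every model_* function are hypothetical names, and
the condition variable only stands in for whatever the real vCPU thread
sleeps on in its I/O wait. It merely shows why a thread that starts out
"stopped" needs both the flag cleared and a kick before it can make progress.

```c
/* Standalone model, NOT QEMU code: all model_* names are hypothetical. */
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

typedef struct ModelCPU {
    pthread_mutex_t lock;
    pthread_cond_t halt_cond;  /* what the modelled vCPU thread sleeps on */
    bool stopped;              /* plays the role of CPUState::stopped */
    bool ran_guest;            /* set once the thread leaves its wait loop */
} ModelCPU;

/* Point 1: a freshly created vCPU starts out stopped. */
static void model_cpu_init(ModelCPU *cpu)
{
    pthread_mutex_init(&cpu->lock, NULL);
    pthread_cond_init(&cpu->halt_cond, NULL);
    cpu->stopped = true;
    cpu->ran_guest = false;
}

/* vCPU thread body: sleep until 'stopped' is cleared and we are kicked. */
static void *model_vcpu_thread(void *opaque)
{
    ModelCPU *cpu = opaque;

    pthread_mutex_lock(&cpu->lock);
    while (cpu->stopped) {
        pthread_cond_wait(&cpu->halt_cond, &cpu->lock);
    }
    cpu->ran_guest = true;
    pthread_mutex_unlock(&cpu->lock);
    return NULL;
}

/* Point 4: resume clears the flag and kicks the sleeping thread. */
static void model_cpu_resume(ModelCPU *cpu)
{
    pthread_mutex_lock(&cpu->lock);
    cpu->stopped = false;
    pthread_cond_signal(&cpu->halt_cond);
    pthread_mutex_unlock(&cpu->lock);
}

/* Enable a 'disabled' vCPU; returns true if its thread got to run. */
static bool model_enable_vcpu(void)
{
    ModelCPU cpu;
    pthread_t tid;
    int rc;
    bool ran;

    model_cpu_init(&cpu);
    rc = pthread_create(&tid, NULL, model_vcpu_thread, &cpu);
    assert(rc == 0);
    (void)rc;

    /* Without this call the join below would block forever (point 5). */
    model_cpu_resume(&cpu);
    pthread_join(tid, NULL);

    ran = cpu.ran_guest;
    pthread_cond_destroy(&cpu.halt_cond);
    pthread_mutex_destroy(&cpu.lock);
    return ran;
}
```

In this model, dropping the model_cpu_resume() call leaves the thread asleep
in its wait loop indefinitely, which is the shape of the hang in point 5.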
I've tested the above (calling cpu_resume() when a disabled vCPU is
enabled) and it works. The previously observed hang is fixed.
https://github.com/salil-mehta/qemu/commits/virt-cpuhp-armv8/rfc-v6.1/
Many thanks!
Best regards
Salil.