From: Alex Williamson <alex@shazbot.org>
To: Narayana Murty N <nnmlinux@linux.ibm.com>
Cc: clg@redhat.com, vaibhav@linux.ibm.com, harshpb@linux.ibm.com,
qemu-devel@nongnu.org
Subject: Re: [PATCH 1/1] vfio/pci: Ensure D0 power state before guest access
Date: Tue, 10 Feb 2026 15:33:56 -0700
Message-ID: <20260210153356.01f79f4b@shazbot.org>
In-Reply-To: <20260209162242.11317-1-nnmlinux@linux.ibm.com>
On Mon, 9 Feb 2026 21:50:48 +0530
Narayana Murty N <nnmlinux@linux.ibm.com> wrote:
> Add vfio_ensure_d0_state() to safely transition PCI devices from D3hot/D3cold
> to D0 before QEMU guest access, preventing config space inaccessibility and
> tg3 IRQ crashes during VFIO realize.
>
> Key changes:
> - D3hot: Direct PMCSR write (offset 0x44) to force PowerState=00 (D0)
> - D3cold: pm_runtime_resume() + pm_runtime_get_sync() for full power restore
> - Polling loop verifies D0 transition completion
> - No-op for already D0 devices
>
> Fixes PowerPC EEH races where devices enter low-power states during VFIO
> handover, causing config space access failures.
>
> Signed-off-by: Narayana Murty N <nnmlinux@linux.ibm.com>
> ---
> hw/vfio/pci.c | 52 +++++++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 52 insertions(+)
NAK. This is very broken.

QEMU cannot write to arbitrary sysfs attributes. QEMU should not write
power state controls to sysfs nor impose a device power state policy.

vbasedev->fd is more than likely invalid where we're performing an
ioctl test, making the entire premise of the test invalid.

When the device is opened by QEMU, vfio-pci will issue a
pm_runtime_resume_and_get(), incrementing the PM usage counter and
waking the device. This should properly bring the device to the D0
power state and keep it there regardless of any ill-timed race to low
power state. If it does not, then fix it in the kernel or block
vfio-pci from using low power states, i.e. disable_idle_d3. Thanks,

Alex
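
[Editorial illustration, not part of the original message.] The
usage-counter argument in the reply can be sketched as a toy model.
This is only a sketch under stated assumptions: the struct and the
toy_* function names are hypothetical and are not kernel or QEMU APIs.
It shows only the idea that a held reference keeps an idle attempt from
putting the device back into a low-power state.

```c
#include <stdbool.h>

/* Toy model only: these names are hypothetical, not kernel APIs. */
struct toy_dev {
    int usage_count;   /* holders that need the device awake */
    bool suspended;    /* true while the device is in a low-power state */
};

/* Loosely mirrors the resume-and-get idea: wake, then take a reference. */
static void toy_pm_resume_and_get(struct toy_dev *d)
{
    d->suspended = false;
    d->usage_count++;
}

/* Drop a reference; the device may only idle once the count is zero. */
static void toy_pm_put(struct toy_dev *d)
{
    if (d->usage_count > 0) {
        d->usage_count--;
    }
}

/* An idle attempt succeeds only when nobody holds a reference. */
static bool toy_pm_try_suspend(struct toy_dev *d)
{
    if (d->usage_count > 0) {
        return false;
    }
    d->suspended = true;
    return true;
}
```

In this model, a racing toy_pm_try_suspend() issued after
toy_pm_resume_and_get() returns false until the matching toy_pm_put(),
which is the property the reply relies on for the open device.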
>
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index c734472721..851cd789aa 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -3392,6 +3392,51 @@ bool vfio_pci_interrupt_setup(VFIOPCIDevice *vdev, Error **errp)
> return true;
> }
>
> +static int write_sysfs(const char *path, const char *value)
> +{
> + FILE *f = fopen(path, "w");
> + if (!f) {
> + return -1;
> + }
> + int ret = fprintf(f, "%s", value);
> + fclose(f);
> + return (ret > 0) ? 0 : -1;
> +}
> +
> +static void vfio_ensure_d0_state(VFIOPCIDevice *vdev)
> +{
> + VFIODevice *vbasedev = &vdev->vbasedev;
> + char sysfs_power_path[PATH_MAX];
> +
> + /*
> + * Test config region accessibility (D3cold-safe, no PCI config
> + * reads!)
> + */
> + struct vfio_region_info reg_info = {
> + .argsz = sizeof(reg_info),
> + .index = VFIO_PCI_CONFIG_REGION_INDEX,
> + .offset = 0,
> + .size = 0
> + };
> +
> + if (ioctl(vbasedev->fd, VFIO_DEVICE_GET_REGION_INFO, &reg_info) < 0) {
> + warn_report("vfio: %s config region probe failed (D3cold): %s",
> + vbasedev->name, strerror(errno));
> +
> + /* D3cold confirmed → sysfs power control (EEH-safe) */
> + snprintf(sysfs_power_path, sizeof(sysfs_power_path),
> + "/sys/bus/pci/devices/%s/power/control", vbasedev->name;
> +
> + /* Force runtime resume */
> + if (write_sysfs(sysfs_power_path, "on") == 0) {
> + g_usleep(10000); /* 10ms settle */
> + write_sysfs(sysfs_power_path, "auto");
> + info_report("vfio: %s D3cold → D0 via sysfs", vbasedev->name);
> + }
> + }
> + return;
> +}
> +
> static void vfio_pci_realize(PCIDevice *pdev, Error **errp)
> {
> ERRP_GUARD();
> @@ -3401,6 +3446,13 @@ static void vfio_pci_realize(PCIDevice *pdev, Error **errp)
> char uuid[UUID_STR_LEN];
> g_autofree char *name = NULL;
>
> + /*
> + * ensure the power state of the pci device to D0,
> + * otherwise it will set to D0, before accessing the
> + * config space.
> + */
> + vfio_ensure_d0_state(vdev);
> +
> if (vbasedev->fd < 0 && !vbasedev->sysfsdev) {
> if (!(~vdev->host.domain || ~vdev->host.bus ||
> ~vdev->host.slot || ~vdev->host.function)) {
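
[Editorial illustration, not part of the original thread.] The quoted
commit message refers to the PowerState field of the PMCSR register. As
a minimal sketch of how that field is decoded, assuming the raw 16-bit
PMCSR value has already been obtained by some other means: the macro and
function names below are hypothetical, while the numeric values match
PCI_PM_CTRL and PCI_PM_CTRL_STATE_MASK in the Linux pci_regs.h header.
PMCSR sits at offset 4 within the PM capability, so the 0x44 in the
commit message would hold only for a device whose PM capability is at
0x40. D3cold has no encoding in this field.

```c
#include <stdint.h>

/* Hypothetical names; values match Linux include/uapi/linux/pci_regs.h. */
#define PM_CTRL_OFFSET      4       /* PMCSR offset within the PM capability */
#define PM_CTRL_STATE_MASK  0x0003  /* PowerState field, bits 1:0 */

/* Map a raw PMCSR value to the D-state encoded in its PowerState field. */
static const char *pmcsr_power_state_name(uint16_t pmcsr)
{
    switch (pmcsr & PM_CTRL_STATE_MASK) {
    case 0:  return "D0";
    case 1:  return "D1";
    case 2:  return "D2";
    default: return "D3hot";
    }
}
```

Only the low two bits are examined, so other PMCSR bits such as
PME_Status do not affect the result.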