From: Alex Williamson <alex.williamson@redhat.com>
To: Eric Auger <eauger@redhat.com>
Cc: eric.auger@redhat.com, eric.auger.pro@gmail.com,
qemu-devel@nongnu.org, clg@redhat.com, zhenzhong.duan@intel.com
Subject: Re: [RFC 0/2] hw/vfio/pci: Prevent BARs from being dma mapped in d3hot state
Date: Thu, 20 Feb 2025 08:48:58 -0700
Message-ID: <20250220084858.0bd25b3f.alex.williamson@redhat.com>
In-Reply-To: <20250220080723.2ee71a7b.alex.williamson@redhat.com>

On Thu, 20 Feb 2025 08:07:23 -0700
Alex Williamson <alex.williamson@redhat.com> wrote:
> On Thu, 20 Feb 2025 11:45:35 +0100
> Eric Auger <eauger@redhat.com> wrote:
>
> > Hi Alex,
> >
> > On 2/20/25 11:31 AM, Eric Auger wrote:
> > >
> > > Hi Alex,
> > >
> > > On 2/19/25 10:19 PM, Alex Williamson wrote:
> > >> On Wed, 19 Feb 2025 11:58:44 -0700
> > >> Alex Williamson <alex.williamson@redhat.com> wrote:
> > >>
> > >>> On Wed, 19 Feb 2025 18:58:58 +0100
> > >>> Eric Auger <eric.auger@redhat.com> wrote:
> > >>>
> > >>>> Since kernel commit:
> > >>>> 2b2c651baf1c ("vfio/pci: Invalidate mmaps and block the access
> > >>>> in D3hot power state")
> > >>>> any attempt to do an mmap access to a BAR when the device is in d3hot
> > >>>> state will generate a fault.
> > >>>>
> > >>>> On system_powerdown, if the VFIO device is translated by an IOMMU,
> > >>>> the device is moved to D3hot state and then the vIOMMU gets disabled
> > >>>> by the guest. As a result of this later operation, the address space is
> > >>>> swapped from translated to untranslated. When re-enabling the aliased
> > >>>> regions, the RAM regions are dma-mapped again and this causes DMA_MAP
> > >>>> faults when attempting the operation on BARs.
> > >>>>
> > >>>> To avoid doing the remap on those BARs, we compute whether the
> > >>>> device is in D3hot state and if so, skip the DMA MAP.
> > >>> Thinking on this some more, QEMU PCI code already manages the device
> > >>> BARs appearing in the address space based on the memory enable bit in
> > >>> the command register. Should we do the same for PM state?
> > >>>
> > >>> IOW, the device going into low power state should remove the BARs from
> > >>> the AddressSpace and waking the device should re-add them. The BAR DMA
> > >>> mapping should then always be consistent, whereas here nothing would
> > >>> remap the BARs when the device is woken.
> > >>>
> > >>> I imagine we'd need an interface to register the PM capability with the
> > >>> core QEMU PCI code, where address space updates are performed relative
> > >>> to both memory enable and power status. There might be a way to
> > >>> implement this just for vfio-pci devices by toggling the enable state
> > >>> of the BAR mmaps relative to PM state, but doing it at the PCI core
> > >>> level seems like it'd provide behavior more true to physical hardware.
> > >> I took a stab at this approach here, it doesn't obviously break
> > >> anything in my configs, but I haven't yet tried to reproduce this exact
> > >> scenario.
> > >>
> > >> https://gitlab.com/alex.williamson/qemu/-/tree/pci-pm-power-state
> >
> > it does not totally fix the issue: I now get:
> >
> > qemu-system-x86_64: warning: vfio_container_dma_map(0x55cc25705680,
> > 0x380000000000, 0x1000000, 0x7f8762000000) = -14 (Bad address)
> > 0000:41:00.0: PCI peer-to-peer transactions on BARs are not supported.
>
> Hmm, I'll reproduce and debug further. The intention here is that BARs
> for the device in D3hot would not be DMA mapped, effectively as if the
> memory enable bit in the command register were cleared, therefore I'd
> hoped the listener is not called for this range.

I forgot to mark the PM state field as writable in config space, so we
were always reading back D0 state. Adding the following to
pci_pm_init() resolves it:

--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -445,6 +445,7 @@ int pci_pm_init(PCIDevice *d, uint8_t offset, Error **errp)
     d->pm_cap = cap;
     d->cap_present |= QEMU_PCI_CAP_PM;
+    pci_set_word(d->wmask + cap + PCI_PM_CTRL, PCI_PM_CTRL_STATE_MASK);
     return cap;
 }

Changing this might cause a problem with migration, ISTR we validate
the wmask with the source. Anyway, I'll post the series and we can
test further and discuss it there. Thanks,

Alex
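
Illustrative sketch (not part of the original message, and not QEMU
code): the fix above hinges on a per-byte write mask gating guest
writes to config space. The standalone toy model below shows why a PM
state field with no wmask bits always reads back as D0, and how a
"BARs mapped" decision based on both memory enable and power state
would then never see D3hot. All toy_* names and the struct layout are
hypothetical; the register offsets and masks follow the standard PCI
definitions (command register at 0x04, memory enable 0x2, PMCSR at
capability offset 4, power state in bits 1:0). The rule in
toy_bars_mapped() is an assumption drawn from the description in this
thread, not a copy of the posted series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define CFG_SIZE            256
#define CMD_OFF             0x04    /* PCI command register */
#define CMD_MEMORY          0x0002  /* memory space enable bit */
#define PM_CTRL_OFF         4       /* PMCSR offset inside the PM capability */
#define PM_CTRL_STATE_MASK  0x0003  /* PowerState field: 0 = D0, 3 = D3hot */

/* Hypothetical stand-in for a device's config space and its write mask. */
struct toy_dev {
    uint8_t config[CFG_SIZE];
    uint8_t wmask[CFG_SIZE];   /* 1 bits mark guest-writable config bits */
};

static void toy_set_word(uint8_t *p, uint16_t v)
{
    p[0] = v & 0xff;
    p[1] = v >> 8;
}

static uint16_t toy_get_word(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Guest config write: only bits set in wmask are updated. */
static void toy_config_write(struct toy_dev *d, unsigned addr,
                             uint32_t val, unsigned len)
{
    for (unsigned i = 0; i < len && addr + i < CFG_SIZE; i++, val >>= 8) {
        uint8_t m = d->wmask[addr + i];
        d->config[addr + i] = (d->config[addr + i] & ~m) | (val & m);
    }
}

/* Assumed rule: BARs are visible only with memory enabled and in D0. */
static bool toy_bars_mapped(const struct toy_dev *d, unsigned pm_cap)
{
    bool mem_en = toy_get_word(d->config + CMD_OFF) & CMD_MEMORY;
    unsigned state = toy_get_word(d->config + pm_cap + PM_CTRL_OFF) &
                     PM_CTRL_STATE_MASK;
    return mem_en && state == 0;
}
```

Without the wmask bits, a guest write of D3hot to PMCSR is silently
dropped and the model keeps reporting the BARs as mapped; once the
state field is marked writable, the same write takes effect and the
BARs drop out.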