From: Alex Williamson <alex.williamson@redhat.com>
To: Auger Eric <eric.auger@redhat.com>
Cc: Alexey Kardashevskiy <aik@ozlabs.ru>,
qemu-devel@nongnu.org, qemu-ppc@nongnu.org,
David Gibson <david@gibson.dropbear.id.au>
Subject: Re: [Qemu-devel] [PATCH qemu v3] RFC: vfio-pci: Allow mmap of MSIX BAR
Date: Tue, 23 Jan 2018 12:55:31 -0700 [thread overview]
Message-ID: <20180123125531.0de84de2@w520.home> (raw)
In-Reply-To: <424c3873-7368-5fbe-95bf-3040c2bec73a@redhat.com>
On Tue, 23 Jan 2018 16:34:34 +0100
Auger Eric <eric.auger@redhat.com> wrote:
> Hi Alexey,
> On 23/01/18 02:22, Alexey Kardashevskiy wrote:
> > This makes use of a new VFIO_REGION_INFO_CAP_MSIX_MAPPABLE capability
> > which tells that a region with MSIX data can be mapped entirely, i.e.
> > the VFIO PCI driver won't prevent MSIX vectors area from being mapped.
> >
> > With this change, all BARs are mapped in a single chunk and MSIX vectors
> > are emulated on top unless the machine requests not to by defining and
> > enabling a new "vfio-no-msix-emulation" property. At the moment only
> > sPAPR machine does so - it prohibits MSIX emulation and does not allow
> > enabling it as it does not define the "set" callback for the new property;
> > the new property also does not appear in "-machine pseries,help".
> >
> > If MSIX vectors section is not aligned to the page size, the KVM memory
> > listener does not register it with the KVM as a memory slot and MSIX is
> > emulated by QEMU as before. This may create MMIO RAM memory sections with
> > an address or/and a size not aligned which will make vfio_dma_map() fail;
> > to address this, this makes treats such failures as non-fatal.
> >
> > This requires the kernel change - "vfio-pci: Allow mapping MSIX BAR" -
> > for the new capability: https://www.spinics.net/lists/kvm/msg160282.html
> >
> > Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> > ---
> > Changes:
> > v3:
> > * vfio_listener_region_add() won't make qemu exit if failed on MMIO MR
> >
> > ---
> > include/hw/vfio/vfio-common.h | 1 +
> > linux-headers/linux/vfio.h | 5 +++++
> > hw/ppc/spapr.c | 7 +++++++
> > hw/vfio/common.c | 19 +++++++++++++++++++
> > hw/vfio/pci.c | 10 ++++++++++
> > 5 files changed, 42 insertions(+)
> >
> > diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> > index f3a2ac9..927d600 100644
> > --- a/include/hw/vfio/vfio-common.h
> > +++ b/include/hw/vfio/vfio-common.h
> > @@ -171,6 +171,7 @@ int vfio_get_region_info(VFIODevice *vbasedev, int index,
> > struct vfio_region_info **info);
> > int vfio_get_dev_region_info(VFIODevice *vbasedev, uint32_t type,
> > uint32_t subtype, struct vfio_region_info **info);
> > +bool vfio_is_cap_present(VFIODevice *vbasedev, uint16_t cap_type, int region);
> > #endif
> > extern const MemoryListener vfio_prereg_listener;
> >
> > diff --git a/linux-headers/linux/vfio.h b/linux-headers/linux/vfio.h
> > index 4312e96..b45182e 100644
> > --- a/linux-headers/linux/vfio.h
> > +++ b/linux-headers/linux/vfio.h
> > @@ -301,6 +301,11 @@ struct vfio_region_info_cap_type {
> > #define VFIO_REGION_SUBTYPE_INTEL_IGD_HOST_CFG (2)
> > #define VFIO_REGION_SUBTYPE_INTEL_IGD_LPC_CFG (3)
> >
> > +/*
> > + * The MSIX mappable capability informs that MSIX data of a BAR can be mmapped.
> > + */
> > +#define VFIO_REGION_INFO_CAP_MSIX_MAPPABLE 3
> > +
> > /**
> > * VFIO_DEVICE_GET_IRQ_INFO - _IOWR(VFIO_TYPE, VFIO_BASE + 9,
> > * struct vfio_irq_info)
> > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > index d1acfe8..5ff43ce 100644
> > --- a/hw/ppc/spapr.c
> > +++ b/hw/ppc/spapr.c
> > @@ -2789,6 +2789,11 @@ static void spapr_set_modern_hotplug_events(Object *obj, bool value,
> > spapr->use_hotplug_event_source = value;
> > }
> >
> > +static bool spapr_get_msix_emulation(Object *obj, Error **errp)
> > +{
> > + return true;
> > +}
> > +
> > static char *spapr_get_resize_hpt(Object *obj, Error **errp)
> > {
> > sPAPRMachineState *spapr = SPAPR_MACHINE(obj);
> > @@ -2870,6 +2875,8 @@ static void spapr_instance_init(Object *obj)
> > object_property_set_description(obj, "vsmt",
> > "Virtual SMT: KVM behaves as if this were"
> > " the host's SMT mode", &error_abort);
> > + object_property_add_bool(obj, "vfio-no-msix-emulation",
> > + spapr_get_msix_emulation, NULL, NULL);
> > }
> >
> > static void spapr_machine_finalizefn(Object *obj)
> > diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> > index b77be3a..842c5b2 100644
> > --- a/hw/vfio/common.c
> > +++ b/hw/vfio/common.c
> > @@ -530,6 +530,10 @@ static void vfio_listener_region_add(MemoryListener *listener,
> > return;
> >
> > fail:
> > + if (memory_region_is_ram_device(section->mr)) {
> > + error_report("failed to vfio_dma_map. pci p2p may not work");
> > + return;
> > + }
>
> I don't think this is an acceptable solution as it produces plenty of
> errors such as
>
> qemu-system-aarch64: VFIO_MAP_DMA: -22
> qemu-system-aarch64: vfio_dma_map(0x30a7ab90, 0x8000000000, 0x2000,
> 0xfffd79a00000) = -22 (Invalid argument)
This much of the error above could be avoided by looking at the type1
page size bitmap before trying to perform the mapping.
> qemu-system-aarch64: failed to vfio_dma_map. pci p2p may not work
This of course is still valid though we can continue to discuss if it's
worthwhile user information.
> testing context is: aarch64 + Mellanox CX-4 PF where we still try to map
> the BAR0 area [0 - 0x2000] before the MSI-X table which is not aligned
> with the host 64kB page.
>
> Region 0: Memory at 80400000000 (64-bit, prefetchable) [size=32M]
> Capabilities: [9c] MSI-X: Enable+ Count=64 Masked-
> Vector table: BAR=0 offset=00002000
> PBA: BAR=0 offset=00003000
>
>
> To me a kind of mmaps region fixup looks required?
I don't think so. vfio is now telling us that we can mmap the entire
BAR, this is true. The type1 info is telling us that the minimum DMA
mapping granularity, PAGE_SIZE on the host (64K in your case). The
MSI-X offset on this device introduces an 8K range that type1 cannot
map and QEMU already has all the information it needs to know this. The
solutions are either to move MSI-X elsewhere or to extend type1 to
allow DMA mappings down to the IOMMU minimum granularity.
The MSI-X fixup that is now bypassed avoided this not generating an
mmap over the vector table, which meant this 8K region wasn't covered
by a memory slot and therefore never got passed through region_add. So
p2p dma was silently disabled. Now we do mmap the entire BAR, but we
add the MSI-X region on top of it so that when it gets flattened we end
up with these sub-page slots. Skipping them at region_add has the same
overall effect.
Before:
|-| <-- MSI-X vector emulation MR
+
|---PAGE_SIZE aligned mmap MR----|
+
|------------slow mapped BAR M------------|
=
|----Resulting region_add--------| (PAGE_SIZE aligned)
After:
|-| <-- MSI-X vector emulation MR
+
|------------PAGE_SIZE aligned mmap MR----|
+
|------------slow mapped BAR M------------|
=
|-| |----Resulting region_add-------------| (Gap no longer PAGE_SIZE aligned)
So doesn't it make sense that region_add would now filter these out?
Should we continue to silently skip unmapped MMIO? Thanks,
Alex
next prev parent reply other threads:[~2018-01-23 19:55 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-01-23 1:22 [Qemu-devel] [PATCH qemu v3] RFC: vfio-pci: Allow mmap of MSIX BAR Alexey Kardashevskiy
2018-01-23 15:34 ` Auger Eric
2018-01-23 19:55 ` Alex Williamson [this message]
2018-01-24 14:13 ` Auger Eric
2018-01-24 15:02 ` Alex Williamson
2018-01-24 15:30 ` Auger Eric
2018-01-24 15:47 ` Alex Williamson
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20180123125531.0de84de2@w520.home \
--to=alex.williamson@redhat.com \
--cc=aik@ozlabs.ru \
--cc=david@gibson.dropbear.id.au \
--cc=eric.auger@redhat.com \
--cc=qemu-devel@nongnu.org \
--cc=qemu-ppc@nongnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).