From: David Gibson <david@gibson.dropbear.id.au>
To: Alexey Kardashevskiy <aik@ozlabs.ru>
Cc: kvm@vger.kernel.org, Alex Williamson <alex.williamson@redhat.com>,
Eric Auger <eric.auger@redhat.com>
Subject: Re: [PATCH kernel] vfio-pci: Allow mapping MSIX BAR
Date: Fri, 15 Dec 2017 15:13:22 +1100 [thread overview]
Message-ID: <20171215041322.GF7753@umbus.fritz.box> (raw)
In-Reply-To: <20171208041829.27520-1-aik@ozlabs.ru>
[-- Attachment #1: Type: text/plain, Size: 7173 bytes --]
On Fri, Dec 08, 2017 at 03:18:29PM +1100, Alexey Kardashevskiy wrote:
> By default VFIO disables mapping of MSIX BAR to the userspace as
> the userspace may program it in a way allowing spurious interrupts;
> instead the userspace uses the VFIO_DEVICE_SET_IRQS ioctl.
> In order to eliminate guessing from the userspace about what is
> mmapable, VFIO also advertises a sparse list of regions allowed to mmap.
>
> This works fine as long as the system page size equals to the MSIX
> alignment requirement which is 4KB. However with a bigger page size
> the existing code prohibits mapping non-MSIX parts of a page with MSIX
> structures so these parts have to be emulated via slow reads/writes on
> a VFIO device fd. If these emulated bits are accessed often, this has
> serious impact on performance.
>
> This allows mmap of the entire BAR containing MSIX vector table.
>
> This removes the sparse capability for PCI devices as it becomes useless.
>
> As the userspace needs to know for sure whether mmapping of the MSIX
> vector containing data can succeed, this adds a new capability -
> VFIO_REGION_INFO_CAP_MSIX_MAPPABLE - which explicitly tells the userspace
> that the entire BAR can be mmapped.
>
> This does not touch the MSIX mangling in the BAR read/write handlers as
> we are doing this just to enable direct access to non MSIX registers.
>
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
This seems reasonable, except for the interface issue that Alex
pointed out already.
TBH, I'm not really sold on the need for a new kernel cap, rather than
just having qemu attempt the mamp() and fall back if it fails. But
I'm not really fussed either way.
> ---
> include/uapi/linux/vfio.h | 10 +++++++
> drivers/vfio/pci/vfio_pci.c | 65 +++++++--------------------------------------
> drivers/vfio/vfio.c | 9 +++++++
> 3 files changed, 29 insertions(+), 55 deletions(-)
>
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index ae46105..0fb4407 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -300,6 +300,16 @@ struct vfio_region_info_cap_type {
> #define VFIO_REGION_SUBTYPE_INTEL_IGD_HOST_CFG (2)
> #define VFIO_REGION_SUBTYPE_INTEL_IGD_LPC_CFG (3)
>
> +/*
> + * The MSIX mappable capability informs that MSIX data of a BAR can be mmapped
> + * which allows direct access to non-MSIX registers which happened to be within
> + * the same system page.
> + *
> + * Even though the userspace gets direct access to the MSIX data, the existing
> + * VFIO_DEVICE_SET_IRQS interface must still be used for MSIX configuration.
> + */
> +#define VFIO_REGION_INFO_CAP_MSIX_MAPPABLE 3
> +
> /**
> * VFIO_DEVICE_GET_IRQ_INFO - _IOWR(VFIO_TYPE, VFIO_BASE + 9,
> * struct vfio_irq_info)
> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> index a98681d..9bcc1e1 100644
> --- a/drivers/vfio/pci/vfio_pci.c
> +++ b/drivers/vfio/pci/vfio_pci.c
> @@ -565,46 +565,16 @@ static int vfio_pci_for_each_slot_or_bus(struct pci_dev *pdev,
> return walk.ret;
> }
>
> -static int msix_sparse_mmap_cap(struct vfio_pci_device *vdev,
> - struct vfio_info_cap *caps)
> +static int msix_sparse_msix_mmappable_cap(struct vfio_pci_device *vdev,
> + struct vfio_info_cap *caps)
> {
> - struct vfio_region_info_cap_sparse_mmap *sparse;
> - size_t end, size;
> - int nr_areas = 2, i = 0, ret;
> + struct vfio_info_cap_header header = {
> + .id = VFIO_REGION_INFO_CAP_MSIX_MAPPABLE,
> + .version = 1
> + };
>
> - end = pci_resource_len(vdev->pdev, vdev->msix_bar);
> -
> - /* If MSI-X table is aligned to the start or end, only one area */
> - if (((vdev->msix_offset & PAGE_MASK) == 0) ||
> - (PAGE_ALIGN(vdev->msix_offset + vdev->msix_size) >= end))
> - nr_areas = 1;
> -
> - size = sizeof(*sparse) + (nr_areas * sizeof(*sparse->areas));
> -
> - sparse = kzalloc(size, GFP_KERNEL);
> - if (!sparse)
> - return -ENOMEM;
> -
> - sparse->nr_areas = nr_areas;
> -
> - if (vdev->msix_offset & PAGE_MASK) {
> - sparse->areas[i].offset = 0;
> - sparse->areas[i].size = vdev->msix_offset & PAGE_MASK;
> - i++;
> - }
> -
> - if (PAGE_ALIGN(vdev->msix_offset + vdev->msix_size) < end) {
> - sparse->areas[i].offset = PAGE_ALIGN(vdev->msix_offset +
> - vdev->msix_size);
> - sparse->areas[i].size = end - sparse->areas[i].offset;
> - i++;
> - }
> -
> - ret = vfio_info_add_capability(caps, VFIO_REGION_INFO_CAP_SPARSE_MMAP,
> - sparse);
> - kfree(sparse);
> -
> - return ret;
> + return vfio_info_add_capability(caps,
> + VFIO_REGION_INFO_CAP_MSIX_MAPPABLE, &header);
> }
>
> int vfio_pci_register_dev_region(struct vfio_pci_device *vdev,
> @@ -695,7 +665,8 @@ static long vfio_pci_ioctl(void *device_data,
> if (vdev->bar_mmap_supported[info.index]) {
> info.flags |= VFIO_REGION_INFO_FLAG_MMAP;
> if (info.index == vdev->msix_bar) {
> - ret = msix_sparse_mmap_cap(vdev, &caps);
> + ret = msix_sparse_msix_mmappable_cap(
> + vdev, &caps);
> if (ret)
> return ret;
> }
> @@ -1125,22 +1096,6 @@ static int vfio_pci_mmap(void *device_data, struct vm_area_struct *vma)
> if (req_start + req_len > phys_len)
> return -EINVAL;
>
> - if (index == vdev->msix_bar) {
> - /*
> - * Disallow mmaps overlapping the MSI-X table; users don't
> - * get to touch this directly. We could find somewhere
> - * else to map the overlap, but page granularity is only
> - * a recommendation, not a requirement, so the user needs
> - * to know which bits are real. Requiring them to mmap
> - * around the table makes that clear.
> - */
> -
> - /* If neither entirely above nor below, then it overlaps */
> - if (!(req_start >= vdev->msix_offset + vdev->msix_size ||
> - req_start + req_len <= vdev->msix_offset))
> - return -EINVAL;
> - }
> -
> /*
> * Even though we don't make use of the barmap for the mmap,
> * we need to request the region and the barmap tracks that.
> diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
> index f5a86f6..376b4e7 100644
> --- a/drivers/vfio/vfio.c
> +++ b/drivers/vfio/vfio.c
> @@ -1898,6 +1898,7 @@ int vfio_info_add_capability(struct vfio_info_cap *caps, int cap_type_id,
> void *cap_type)
> {
> int ret = -EINVAL;
> + struct vfio_info_cap_header *header, *cap;
>
> if (!cap_type)
> return 0;
> @@ -1910,6 +1911,14 @@ int vfio_info_add_capability(struct vfio_info_cap *caps, int cap_type_id,
> case VFIO_REGION_INFO_CAP_TYPE:
> ret = region_type_cap(caps, cap_type);
> break;
> +
> + case VFIO_REGION_INFO_CAP_MSIX_MAPPABLE:
> + cap = cap_type;
> + header = vfio_info_cap_add(caps, sizeof(*header), cap->id,
> + cap->version);
> + if (IS_ERR(header))
> + return PTR_ERR(header);
> + return 0;
> }
>
> return ret;
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
next prev parent reply other threads:[~2017-12-15 5:51 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-12-08 4:18 [PATCH kernel] vfio-pci: Allow mapping MSIX BAR Alexey Kardashevskiy
2017-12-12 0:09 ` Alex Williamson
2017-12-12 2:00 ` Alexey Kardashevskiy
2017-12-15 4:13 ` David Gibson [this message]
2017-12-15 15:56 ` Alex Williamson
2017-12-18 2:46 ` David Gibson
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20171215041322.GF7753@umbus.fritz.box \
--to=david@gibson.dropbear.id.au \
--cc=aik@ozlabs.ru \
--cc=alex.williamson@redhat.com \
--cc=eric.auger@redhat.com \
--cc=kvm@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox