From mboxrd@z Thu Jan 1 00:00:00 1970 From: David Gibson Subject: Re: [PATCH kernel] vfio-pci: Allow mapping MSIX BAR Date: Fri, 15 Dec 2017 15:13:22 +1100 Message-ID: <20171215041322.GF7753@umbus.fritz.box> References: <20171208041829.27520-1-aik@ozlabs.ru> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="qftxBdZWiueWNAVY" Cc: kvm@vger.kernel.org, Alex Williamson , Eric Auger To: Alexey Kardashevskiy Return-path: Received: from ozlabs.org ([103.22.144.67]:40385 "EHLO ozlabs.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752740AbdLOFvl (ORCPT ); Fri, 15 Dec 2017 00:51:41 -0500 Content-Disposition: inline In-Reply-To: <20171208041829.27520-1-aik@ozlabs.ru> Sender: kvm-owner@vger.kernel.org List-ID: --qftxBdZWiueWNAVY Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Fri, Dec 08, 2017 at 03:18:29PM +1100, Alexey Kardashevskiy wrote: > By default VFIO disables mapping of MSIX BAR to the userspace as > the userspace may program it in a way allowing spurious interrupts; > instead the userspace uses the VFIO_DEVICE_SET_IRQS ioctl. > In order to eliminate guessing from the userspace about what is > mmapable, VFIO also advertises a sparse list of regions allowed to mmap. >=20 > This works fine as long as the system page size equals to the MSIX > alignment requirement which is 4KB. However with a bigger page size > the existing code prohibits mapping non-MSIX parts of a page with MSIX > structures so these parts have to be emulated via slow reads/writes on > a VFIO device fd. If these emulated bits are accessed often, this has > serious impact on performance. >=20 > This allows mmap of the entire BAR containing MSIX vector table. >=20 > This removes the sparse capability for PCI devices as it becomes useless. >=20 > As the userspace needs to know for sure whether mmapping of the MSIX > vector containing data can succeed, this adds a new capability - > VFIO_REGION_INFO_CAP_MSIX_MAPPABLE - which explicitly tells the userspace > that the entire BAR can be mmapped. >=20 > This does not touch the MSIX mangling in the BAR read/write handlers as > we are doing this just to enable direct access to non MSIX registers. >=20 > Signed-off-by: Alexey Kardashevskiy This seems reasonable, except for the interface issue that Alex pointed out already. TBH, I'm not really sold on the need for a new kernel cap, rather than just having qemu attempt the mamp() and fall back if it fails. But I'm not really fussed either way. > --- > include/uapi/linux/vfio.h | 10 +++++++ > drivers/vfio/pci/vfio_pci.c | 65 +++++++--------------------------------= ------ > drivers/vfio/vfio.c | 9 +++++++ > 3 files changed, 29 insertions(+), 55 deletions(-) >=20 > diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h > index ae46105..0fb4407 100644 > --- a/include/uapi/linux/vfio.h > +++ b/include/uapi/linux/vfio.h > @@ -300,6 +300,16 @@ struct vfio_region_info_cap_type { > #define VFIO_REGION_SUBTYPE_INTEL_IGD_HOST_CFG (2) > #define VFIO_REGION_SUBTYPE_INTEL_IGD_LPC_CFG (3) > =20 > +/* > + * The MSIX mappable capability informs that MSIX data of a BAR can be m= mapped > + * which allows direct access to non-MSIX registers which happened to be= within > + * the same system page. > + * > + * Even though the userspace gets direct access to the MSIX data, the ex= isting > + * VFIO_DEVICE_SET_IRQS interface must still be used for MSIX configurat= ion. > + */ > +#define VFIO_REGION_INFO_CAP_MSIX_MAPPABLE 3 > + > /** > * VFIO_DEVICE_GET_IRQ_INFO - _IOWR(VFIO_TYPE, VFIO_BASE + 9, > * struct vfio_irq_info) > diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c > index a98681d..9bcc1e1 100644 > --- a/drivers/vfio/pci/vfio_pci.c > +++ b/drivers/vfio/pci/vfio_pci.c > @@ -565,46 +565,16 @@ static int vfio_pci_for_each_slot_or_bus(struct pci= _dev *pdev, > return walk.ret; > } > =20 > -static int msix_sparse_mmap_cap(struct vfio_pci_device *vdev, > - struct vfio_info_cap *caps) > +static int msix_sparse_msix_mmappable_cap(struct vfio_pci_device *vdev, > + struct vfio_info_cap *caps) > { > - struct vfio_region_info_cap_sparse_mmap *sparse; > - size_t end, size; > - int nr_areas =3D 2, i =3D 0, ret; > + struct vfio_info_cap_header header =3D { > + .id =3D VFIO_REGION_INFO_CAP_MSIX_MAPPABLE, > + .version =3D 1 > + }; > =20 > - end =3D pci_resource_len(vdev->pdev, vdev->msix_bar); > - > - /* If MSI-X table is aligned to the start or end, only one area */ > - if (((vdev->msix_offset & PAGE_MASK) =3D=3D 0) || > - (PAGE_ALIGN(vdev->msix_offset + vdev->msix_size) >=3D end)) > - nr_areas =3D 1; > - > - size =3D sizeof(*sparse) + (nr_areas * sizeof(*sparse->areas)); > - > - sparse =3D kzalloc(size, GFP_KERNEL); > - if (!sparse) > - return -ENOMEM; > - > - sparse->nr_areas =3D nr_areas; > - > - if (vdev->msix_offset & PAGE_MASK) { > - sparse->areas[i].offset =3D 0; > - sparse->areas[i].size =3D vdev->msix_offset & PAGE_MASK; > - i++; > - } > - > - if (PAGE_ALIGN(vdev->msix_offset + vdev->msix_size) < end) { > - sparse->areas[i].offset =3D PAGE_ALIGN(vdev->msix_offset + > - vdev->msix_size); > - sparse->areas[i].size =3D end - sparse->areas[i].offset; > - i++; > - } > - > - ret =3D vfio_info_add_capability(caps, VFIO_REGION_INFO_CAP_SPARSE_MMAP, > - sparse); > - kfree(sparse); > - > - return ret; > + return vfio_info_add_capability(caps, > + VFIO_REGION_INFO_CAP_MSIX_MAPPABLE, &header); > } > =20 > int vfio_pci_register_dev_region(struct vfio_pci_device *vdev, > @@ -695,7 +665,8 @@ static long vfio_pci_ioctl(void *device_data, > if (vdev->bar_mmap_supported[info.index]) { > info.flags |=3D VFIO_REGION_INFO_FLAG_MMAP; > if (info.index =3D=3D vdev->msix_bar) { > - ret =3D msix_sparse_mmap_cap(vdev, &caps); > + ret =3D msix_sparse_msix_mmappable_cap( > + vdev, &caps); > if (ret) > return ret; > } > @@ -1125,22 +1096,6 @@ static int vfio_pci_mmap(void *device_data, struct= vm_area_struct *vma) > if (req_start + req_len > phys_len) > return -EINVAL; > =20 > - if (index =3D=3D vdev->msix_bar) { > - /* > - * Disallow mmaps overlapping the MSI-X table; users don't > - * get to touch this directly. We could find somewhere > - * else to map the overlap, but page granularity is only > - * a recommendation, not a requirement, so the user needs > - * to know which bits are real. Requiring them to mmap > - * around the table makes that clear. > - */ > - > - /* If neither entirely above nor below, then it overlaps */ > - if (!(req_start >=3D vdev->msix_offset + vdev->msix_size || > - req_start + req_len <=3D vdev->msix_offset)) > - return -EINVAL; > - } > - > /* > * Even though we don't make use of the barmap for the mmap, > * we need to request the region and the barmap tracks that. > diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c > index f5a86f6..376b4e7 100644 > --- a/drivers/vfio/vfio.c > +++ b/drivers/vfio/vfio.c > @@ -1898,6 +1898,7 @@ int vfio_info_add_capability(struct vfio_info_cap *= caps, int cap_type_id, > void *cap_type) > { > int ret =3D -EINVAL; > + struct vfio_info_cap_header *header, *cap; > =20 > if (!cap_type) > return 0; > @@ -1910,6 +1911,14 @@ int vfio_info_add_capability(struct vfio_info_cap = *caps, int cap_type_id, > case VFIO_REGION_INFO_CAP_TYPE: > ret =3D region_type_cap(caps, cap_type); > break; > + > + case VFIO_REGION_INFO_CAP_MSIX_MAPPABLE: > + cap =3D cap_type; > + header =3D vfio_info_cap_add(caps, sizeof(*header), cap->id, > + cap->version); > + if (IS_ERR(header)) > + return PTR_ERR(header); > + return 0; > } > =20 > return ret; --=20 David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson --qftxBdZWiueWNAVY Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- iQIzBAEBCAAdFiEEdfRlhq5hpmzETofcbDjKyiDZs5IFAlozS+IACgkQbDjKyiDZ s5Kddw//djyrdPnJQjCClByg6IYwjQQw9Q2PffJoA5GBJl6rYiVTgwdPF1KBOE71 pR+EcTe+Rmpxl06KJEp4pt2RKNHZ9LiNTw8C+j1LLgDr4Q/wceDXIakj6om2hW4S hdrMmmCzzHWv/SM8nA4DGCWoCa5JrMJfc5qprWPfVvyZ4L/1vt1MPDySoenPKw7l B3UD0yAZaWQdPcmQ/U+S4m1NN6LQsxz2sO8LYeMuf59WeAGj7po2dnnH8rIUX+ad MRi42MeuA/QhOZKQTx4Y1Wcl/0K69Ma2IohdPhgYuoYKKDqFXU6wTpczDNjwxosX SxawmAKY08/jHt1pn0YB/na1075AByDEIiURkWwOmyENufijQirkPOqNOi6F1D8W i99fPbgW9NDAU6UeqB89gwrFvgpPAB8/jCvK6hBvynNMik+EW9J2SflBV9+kAYYf gS4Vy+CmtXPFWsboj+AqyUxOhtVwhjRM1YtLYhYPHCEZy8UTsPRCUCW/mB7huMlD 3UdifnoMuik6ojhLSYZQRovZPRk6S8zv3HuhK19PFeFhCSGiYdwWSJgvzs+ePMxm p4ms8WrQitHdU5DUByGuwov+hkqlu/k6N2ZbyqXcDsp8qI1ZvJoSV+QVmUKq7Bga KxoNJ/k+rDcL5NhBtUup4lzX3kjek+Cw1xv4t9QVxeql4lQWf/E= =7FKx -----END PGP SIGNATURE----- --qftxBdZWiueWNAVY--