From mboxrd@z Thu Jan 1 00:00:00 1970 From: David Gibson Subject: Re: [RFC PATCH kernel] vfio-pci: Allow mapping MSIX BAR Date: Wed, 22 Nov 2017 17:51:23 +1100 Message-ID: <20171122065123.GS2380@umbus.fritz.box> References: <20171122040932.970-1-aik@ozlabs.ru> <20171121212846.4d567f2e@t450s.home> <20171122044455.GR2380@umbus.fritz.box> <20171121221438.5d1021f6@t450s.home> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="vEk28Nl/eckWL8CC" Cc: Alexey Kardashevskiy , kvm@vger.kernel.org, Yongji Xie , Eric Auger To: Alex Williamson Return-path: Received: from ozlabs.org ([103.22.144.67]:49679 "EHLO ozlabs.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751240AbdKVGv2 (ORCPT ); Wed, 22 Nov 2017 01:51:28 -0500 Content-Disposition: inline In-Reply-To: <20171121221438.5d1021f6@t450s.home> Sender: kvm-owner@vger.kernel.org List-ID: --vEk28Nl/eckWL8CC Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Tue, Nov 21, 2017 at 10:14:45PM -0700, Alex Williamson wrote: > On Wed, 22 Nov 2017 15:44:55 +1100 > David Gibson wrote: >=20 > > On Tue, Nov 21, 2017 at 09:28:46PM -0700, Alex Williamson wrote: > > > On Wed, 22 Nov 2017 15:09:32 +1100 > > > Alexey Kardashevskiy wrote: > > > =20 > > > > By default VFIO disables mapping of MSIX BAR to the userspace as > > > > the userspace may program it in a way allowing spurious interrupts; > > > > instead the userspace uses the VFIO_DEVICE_SET_IRQS ioctl. > > > >=20 > > > > This works fine as long as the system page size equals to the MSIX > > > > alignment requirement which is 4KB. However with a bigger page size > > > > the existing code prohibits mapping non-MSIX parts of a page with M= SIX > > > > structures so these parts have to be emulated via slow reads/writes= on > > > > a VFIO device fd. If these emulated bits are accessed often, this h= as > > > > serious impact on performance. > > > >=20 > > > > This adds an ioctl to the vfio-pci device which hides the sparse > > > > capability and allows the userspace to map a BAR with MSIX structur= es. =20 > > >=20 > > > So the user is in control of telling the kernel whether they're allow= ed > > > to mmap the msi-x vector table. That makes absolutely no sense. If > > > you're trying to figure out how userspace knows whether to implicitly > > > avoid mmap'ing the msix region, I think there are far better ways in > > > the existing region info ioctl. We could use a flag, or maybe the > > > existence of a capability chain pointer, or a new capability. But > > > absolutely not this. The kernel needs to decide whether it's going to > > > let the user do this, not the user. Thanks, =20 > >=20 > > No, it doesn't. This is actually the approach we discussed in Prague. > >=20 > > Remember that intercepting access to the MSI-X table is not a host > > safety / security issue. It's just that without that we won't wire up > > the device's MSI-X vectors properly so they won't work. > >=20 > > Basically the decision here is between > >=20 > > A) Allow MSI-X configuration via standard PCI mechanisms, at the > > cost of making access slow for any registers sharing a page with > > the MSI-X table. > >=20 > > or > >=20 > > B) Make access to BAR registers sharing a page with the MSI-X table > > fast, at the cost of requiring some alternative mechanism to > > configure MSI-X vectors. > >=20 > > And that is a tradeoff that it is reasonable for userspace to make. > >=20 > > In the case of KVM guests, the decision depends entirely on the > > *guest* platform. Usually we need (A) because the guest expects to be > > able to poke the MSI-X table in the usual way. However for PAPR > > guests, there's an alternative mechanism via an RTAS call, which means > > we can use (B). > >=20 > > The host kernel can't make this decision, because it doesn't know the > > guest platform (well, KVM might, but VFIO doesn't). > >=20 > > A userspace VFIO program could also elect for (B) if it does care > > about performance of access to registers in the same BAR as the MSI-X > > table, but doesn't need MSI-X for example. >=20 > You're asking for an ioctl to allow the kernel to allow the user to > mmap the page, when instead we could just allow the user to mmap the > page and whether the user does that and how they make use of it is up > to them... Duh. Sorry. For some reason I was thinking the magic MSI-X interception was happening in the host kernel rather than in qemu. > I understand that there are different virtualization techniques at play > here, it just doesn't seem relevant. In the case of (A), the user can > choose not to mmap the page overlapping the vector table even if the > kernel allows it. The user can also choose to mmap that page, but not > use the portion overlapping the vector table. QEMU already does this > by overlaying a MemoryRegion for vector table emulation. We might even > be able to get away with mmaping that page and emulating the vector > table elsewhere, which seems like the only option for a 64k page ARM > system. For (B), clearly it's just a nuisance that we can't currently > mmap this page, but I still don't see how the user allowing the kernel > to allow the user to mmap that page makes any sense. I can't even > describe it without it sounding ridiculous. Thanks, Right. Rethinking.. it seems to me we should just completely remove the logic from the kernel banning mmap()s overlapping the MSI-X table. All it does is poorly attempt to stop the user shooting themselves in the foot. Then we just need logic in qemu to avoid doing the overlapping memory region nonsense on a per-machine basis --=20 David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson --vEk28Nl/eckWL8CC Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- iQIzBAEBCAAdFiEEdfRlhq5hpmzETofcbDjKyiDZs5IFAloVHmkACgkQbDjKyiDZ s5IYfRAA4Fyzh5HDBTnpciq2VmPQQISoBwFFq/VCskeHciBzNTTRwQJldC/1tEOp mROi0YbmrES938PXZknfW1lHnfJAnk/toJvI/qR2wDOIjbXwulb+/hLoR0pLa/uj De91nplZTy1jQX3t4DmdtJWuCGVSV1ziuSVEEexDfXE4gyuPqbL6hi/NrwJBsgzd G/WFq+X5bmfYihm3VHD6ZLLOilpYsWhllmF9gAFe7eO+nhjTDRViFkjUSaikGsxo itc/25RP5CNcnJALzP/Dz8zd0bRiLVTdaT97LHdeT5dMmslQNwUXWUCQd9fW8AVr VRPGVgqm0W1llAR0t9KawuHIrOB6W0hUPUu+7t7/RD4iKXRLJbvxbtC5FdCEq2Le q7kKLZDYcSVLw58P+0nShmsylFuyjBl8lVNkcsZvM1e7TB6hE2PlRiC3oFW/t/1c bCwXWyS8JNlXsk6V/OI06hG0l2krQDsd+slAMUPIY7lj6X7f0Yc6r48VbOm41pAV zXBjC1DrRcsJN+NWqwO6NvUMaFe+GP8/ra26JqZBqjFcxKIa3Qo8K7V0OXpsV34i JmxSFF4zhyvxyTcy8+ve2Zq8ySBeephLLLjPNNK6yM80cyZRtzvpU2YGkyopEeM0 goNrG4+eFZJnH0K1aRQrGK1O+bOvWrJDjvdZHyj5pDHWptv1BVY= =6jT2 -----END PGP SIGNATURE----- --vEk28Nl/eckWL8CC--