From mboxrd@z Thu Jan 1 00:00:00 1970 From: David Gibson Subject: Re: [RFC PATCH kernel] vfio-pci: Allow write combining Date: Thu, 30 Nov 2017 15:20:03 +1100 Message-ID: <20171130042003.GW3023@umbus.fritz.box> References: <7ad2ee73-80dc-69ef-c98a-63b020f2f48b@ozlabs.ru> <20171016060043.GC2776@umbus.fritz.box> <67742d45-a267-0363-7448-7f8945f84c95@ozlabs.ru> <20171016080135.GF2776@umbus.fritz.box> <672c7172-addd-a0ca-f810-5a8deca67ff5@ozlabs.ru> <20171114022350.GA20747@umbus.fritz.box> <1510626542.2397.8.camel@kernel.crashing.org> <20171114092835.046c5d61@t450s.home> <20171129114746.45d18a09@t450s.home> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="YHHegWJYV76VGRm9" Cc: Alexey Kardashevskiy , Benjamin Herrenschmidt , kvm@vger.kernel.org, Eric Auger To: Alex Williamson Return-path: Received: from ozlabs.org ([103.22.144.67]:48487 "EHLO ozlabs.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750752AbdK3FEH (ORCPT ); Thu, 30 Nov 2017 00:04:07 -0500 Content-Disposition: inline In-Reply-To: <20171129114746.45d18a09@t450s.home> Sender: kvm-owner@vger.kernel.org List-ID: --YHHegWJYV76VGRm9 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Wed, Nov 29, 2017 at 11:47:46AM -0700, Alex Williamson wrote: > On Fri, 24 Nov 2017 15:58:09 +1100 > Alexey Kardashevskiy wrote: >=20 > > On 15/11/17 03:28, Alex Williamson wrote: > > > On Tue, 14 Nov 2017 13:29:02 +1100 > > > Benjamin Herrenschmidt wrote: > > > =20 > > >> On Tue, 2017-11-14 at 13:23 +1100, David Gibson wrote: =20 > > >>>>>> 1. Allow msix mapping to the userspace (to address non-64k-align= ed msix bar) =20 > > >>> > > >>> We have a new plan on this - I'll discuss it over IRC. > > >>> =20 > > >>>>>> 2. Allow write combining in vfio for the userspace (kvm guest is= kinda > > >>>>>> special and may simply ignore mapping flags in some configs but = PPC radix > > >>>>>> guests still rely on this) =20 > > >>> > > >>> AIUI this isn't for radix, but for DPDK things that we need this. = Ben > > >>> talked about it a bit, but I don't know what the outcome was. =20 > > >> > > >> So this is not a powerpc specific issue. Other archs similarily want= to > > >> be able to do write combine mappings. > > >> > > >> The way sysfs does it is that for prefetchable BARs, it exposes both > > >> a resourceN and a resourceN_wc file. > > >> > > >> For VFIO it's a bit more tricky, maybe we need to game the offset us= ing > > >> some of it as flags but that's very fishy, or maybe we do some kind = of > > >> ioctl that selects the attributes used for that fd instance for > > >> subsequent mappings... > > >> > > >> I'll let Alex chose what he feels most appropriate here. =20 > > >=20 > > > My order of preference would be something like: > > >=20 > > > - mmap flags provide some way for the user to specify a wc mapping > > > within existing regions =20 > >=20 > > There are plenty of flags but none really matches, checked with Paul. >=20 > Is MAP_NONBLOCK off the table? Why? > =20 > > > - some other mechanism of using the existing regions =20 > >=20 > > I can only think of madvise but it does not have appropriate flags eith= er. >=20 > Is it worth the process to define something that is appropriate? Would > either of the above be the obvious architectural/implementation choice > if we could define a flag for it? >=20 > > > - additional regions provided for use exclusively with wc attributes > > > (generalizing PCI BAR wc regions within device specific regions) = =20 > >=20 > >=20 > > Adding VFIO_PCI_BAR0_WC_REGION_INDEX for VFIO_PCI_BAR0_REGION_INDEX (an= d so > > on for other BARs) seems a viable option. > >=20 > > However the comment for VFIO_PCI_xxx_REGION_INDEX says: > >=20 > > VFIO_PCI_NUM_REGIONS =3D 9 /* Fixed user ABI, region indexes >=3D9 us= e */ > > /* device specific cap to define content. */ > >=20 > >=20 > > which limits me in where I can add new indexes, I cannot just add new _= WC > > indexes to that enum, can I? I cannot see any existing regions above 9 = yet > > though. >=20 > The comment explains how to do this, you'd add a device specific region > with the type identifying it as a PCI MMIO WC region and the sub-type > probably defining the BAR index. >=20 > > > - additional file descriptors provided for wc access =20 > >=20 > > It could be a capability + iocti(VFIO_DEVICE_GET_WC_RESOURCE) which wou= ld > > take a BAR index, check if the BAR is prefetchable and if so - return a= n fd > > which the userspace then could mmap(). This is won't break that ABI wit= h 9 > > regions but it is the least favourable in the list... >=20 > Do the kernel mechanics require it to be a separate file descriptor? A > separate fd is my last choice as well, but the interfaces your were > attempting to use previously seemed to have fd granularity. >=20 > > > This isn't at the top of my priority list to figure out the solution, > > > so whoever implements it will need to provide justification as they > > > move down the list from more to less preferred solutions. Thanks, = =20 > >=20 > > I am trying... I was really counting on you guys having this discussed = in > > Prague :( >=20 > Should have been there to push your agenda... Thanks, We discussed it briefly, BenH seemed to think there wasn't a big difficulty, IIRC, which is why we didn't spend much time on this (compared to the other issues). So, talk to him. --=20 David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson --YHHegWJYV76VGRm9 Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- iQIzBAEBCAAdFiEEdfRlhq5hpmzETofcbDjKyiDZs5IFAlofhvAACgkQbDjKyiDZ s5I7fxAAzGGxpKdwDbmf04UhXfP33GDcETbi2JDgCj90ecEsz70pp4m3l8cpvrZo e/gklEil5+YyOvTX1qeKOU9deunvSQImiLIgK+FhoSOzYCge40bk4MoQGyuwbz3t MJQ9rZ2U6bMpj46gjQR9WDF2gvBxX6yjKdEkXMDyGZ3fnCxattVziK4pgERZLp1w y32NnMq5l2tOCBrZ8pvhQQl4CUfWkkCTilsc3g6aJwOkeyZtzMyBs1PsrAa7xdMH /zcIXosPfcpMG9vMlppckjF3Hoe+0HgMO8SY5lZ6XnVKNqRLHKJpAAEX5y5+Dlh5 Yls/ObYeQsGrzp96z86h5aYhTkzclm2KY583p8JxHlre61jj0FLAwAXsi2PUTu5m QVSTranjIFoeoUH1rgMwVigkryLqLPbkMb8BllzULeCaNw2T8xBMSrUBSsf5KGfa PupUr3q8IcGNEyDisvm+EPF/Aw1zySD5P7eMsdvyf8mSXsnvE5B82O1fRFhbnDrf HOchl3iyvguC+iM6+WOr/y8lJvn9Dm5TqC34pBn7g4KXcMdrvnF67bNupPgiKPm6 OHmbdqIL5ypqnOVfyCNgsy6ufQYnVHxE4uxjQpfozDwaRbX/g5o7sDstAhXjzYrz BK/Pdm/ZyFOYvU5v9lcft6lT8RYwEtV9vp77uxhgWffStV/aqmA= =0FUt -----END PGP SIGNATURE----- --YHHegWJYV76VGRm9--