From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:34870)
 by lists.gnu.org with esmtp (Exim 4.71) (envelope-from )
 id 1avagm-0008ED-9r
 for qemu-devel@nongnu.org; Wed, 27 Apr 2016 21:24:21 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
 (envelope-from )
 id 1avagk-0006QG-CS
 for qemu-devel@nongnu.org; Wed, 27 Apr 2016 21:24:20 -0400
Date: Thu, 28 Apr 2016 11:02:35 +1000
From: David Gibson
Message-ID: <20160428010235.GA2269@voom.redhat.com>
References: <1459762426-18440-1-git-send-email-aik@ozlabs.ru>
 <1459762426-18440-15-git-send-email-aik@ozlabs.ru>
 <20160407004056.GE16485@voom.fritz.box>
 <571748A3.4070105@ozlabs.ru>
 <20160421035954.GH1133@voom>
 <57185569.3070405@ozlabs.ru>
 <20160427063931.GF18476@voom.redhat.com>
 <11043dbf-a5de-24c4-5cde-290d2f1a1c10@ozlabs.ru>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
 protocol="application/pgp-signature"; boundary="6TrnltStXW4iwmi0"
Content-Disposition: inline
In-Reply-To: <11043dbf-a5de-24c4-5cde-290d2f1a1c10@ozlabs.ru>
Subject: Re: [Qemu-devel] [PATCH qemu v15 14/17] spapr_iommu, vfio, memory:
 Notify IOMMU about starting/stopping being used by VFIO
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
To: Alexey Kardashevskiy
Cc: qemu-devel@nongnu.org, qemu-ppc@nongnu.org, Alex Williamson,
 Alexander Graf, Paolo Bonzini

--6TrnltStXW4iwmi0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Wed, Apr 27, 2016 at 07:14:15PM +1000, Alexey Kardashevskiy wrote:
> On 04/27/2016 04:39 PM, David Gibson wrote:
> >On Thu, Apr 21, 2016 at 02:22:01PM +1000, Alexey Kardashevskiy wrote:
> >>On 04/21/2016 01:59 PM, David Gibson wrote:
> >>>On Wed, Apr 20, 2016 at 07:15:15PM +1000, Alexey Kardashevskiy wrote:
> >>>>On 04/07/2016 10:40 AM, David Gibson wrote:
> >>>>>On Mon, Apr 04, 2016 at 07:33:43PM +1000, Alexey Kardashevskiy wrote:
> >>>>>>The sPAPR TCE
tables manage 2 copies when VFIO is using an IOMMU -
> >>>>>>a guest view of the table and a hardware TCE table. If there is no VFIO
> >>>>>>presence in the address space, then just the guest view is used, if
> >>>>>>this is the case, it is allocated in the KVM. However since there is no
> >>>>>>support yet for VFIO in KVM TCE hypercalls, when we start using VFIO,
> >>>>>>we need to move the guest view from KVM to the userspace; and we need
> >>>>>>to do this for every IOMMU on a bus with VFIO devices.
> >>>>>>
> >>>>>>This adds vfio_start/vfio_stop callbacks in MemoryRegionIOMMUOps to
> >>>>>>notify IOMMU about changing environment so it can reallocate the table
> >>>>>>to/from KVM or (when available) hook the IOMMU groups with the logical
> >>>>>>bus (LIOBN) in the KVM.
> >>>>>>
> >>>>>>This removes explicit spapr_tce_set_need_vfio() call from PCI hotplug
> >>>>>>path as the new callbacks do this better - they notify IOMMU at
> >>>>>>the exact moment when the configuration is changed, and this also
> >>>>>>includes the case of PCI hot unplug.
> >>>>>>
> >>>>>>As there can be multiple containers attached to the same PHB/LIOBN,
> >>>>>>this replaces the @need_vfio flag in sPAPRTCETable with the counter
> >>>>>>of VFIO users.
> >>>>>>
> >>>>>>Signed-off-by: Alexey Kardashevskiy
> >>>>>
> >>>>>This looks correct, but there's one remaining ugly.
> >>>>>
> >>>>>>---
> >>>>>>Changes:
> >>>>>>v15:
> >>>>>>* s/need_vfio/vfio-Users/g
> >>>>>>---
> >>>>>> hw/ppc/spapr_iommu.c   | 30 ++++++++++++++++++++----------
> >>>>>> hw/ppc/spapr_pci.c     |  6 ------
> >>>>>> hw/vfio/common.c       |  9 +++++++++
> >>>>>> include/exec/memory.h  |  4 ++++
> >>>>>> include/hw/ppc/spapr.h |  2 +-
> >>>>>> 5 files changed, 34 insertions(+), 17 deletions(-)
> >>>>>>
> >>>>>>diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c
> >>>>>>index c945dba..ea09414 100644
> >>>>>>--- a/hw/ppc/spapr_iommu.c
> >>>>>>+++ b/hw/ppc/spapr_iommu.c
> >>>>>>@@ -155,6 +155,16 @@ static uint64_t spapr_tce_get_page_sizes(MemoryRegion *iommu)
> >>>>>>     return 1ULL << tcet->page_shift;
> >>>>>> }
> >>>>>>
> >>>>>>+static void spapr_tce_vfio_start(MemoryRegion *iommu)
> >>>>>>+{
> >>>>>>+    spapr_tce_set_need_vfio(container_of(iommu, sPAPRTCETable, iommu), true);
> >>>>>>+}
> >>>>>>+
> >>>>>>+static void spapr_tce_vfio_stop(MemoryRegion *iommu)
> >>>>>>+{
> >>>>>>+    spapr_tce_set_need_vfio(container_of(iommu, sPAPRTCETable, iommu), false);
> >>>>>>+}
> >>>>>>+
> >>>>>> static void spapr_tce_table_do_enable(sPAPRTCETable *tcet);
> >>>>>> static void spapr_tce_table_do_disable(sPAPRTCETable *tcet);
> >>>>>>
> >>>>>>@@ -239,6 +249,8 @@ static const VMStateDescription vmstate_spapr_tce_table = {
> >>>>>> static MemoryRegionIOMMUOps spapr_iommu_ops = {
> >>>>>>     .translate = spapr_tce_translate_iommu,
> >>>>>>     .get_page_sizes = spapr_tce_get_page_sizes,
> >>>>>>+    .vfio_start = spapr_tce_vfio_start,
> >>>>>>+    .vfio_stop = spapr_tce_vfio_stop,
> >>>>>
> >>>>>Ok, so AFAICT these callbacks are called whenever a VFIO context is
> >>>>>added / removed from the gIOMMU's address space, and it's up to the
> >>>>>gIOMMU code to ref count that to see if there are any current vfio
> >>>>>users. That makes "vfio_start" and "vfio_stop" not great names.
> >>>>>
> >>>>>But..
better than changing the names would be to move the refcounting
> >>>>>to the generic code if you can manage it, so the individual gIOMMU
> >>>>>backends don't need to - they just get told when they need to start / stop
> >>>>>providing VFIO support.
> >>>>
> >>>>Everything is manageable...
> >>>>
> >>>>This referencing is needed for the case of >=2 containers so
> >>>>2xvfio_listener_region_add will create 2xVFIOGuestIOMMU as they are per
> >>>>VFIOContainer so VFIOGuestIOMMU is not the right place for the reference
> >>>>counting, VFIOAddressSpace seems to be that place (=> add list of IOMMU MRs
> >>>>with refcounter). Or even IOMMU MR. Or move VFIOGuestIOMMU list from
> >>>>VFIOContainer to VFIOAddressSpace and then gIOMMU can handle
> >>>>refcounting?
> >>>
> >>>I'm having a lot of trouble parsing that. I think the ref counting has
> >>>to be per-giommu (because individual giommus could, in theory, be
> >>>mapped or unmapped from an address space).
> >>
> >>
> >>Example 1.
> >>POWER8, no DDW, one QEMU PHB, 2 IOMMU groups, table sharing so just 1
> >>container, one TCE table (aka gIOMMU), one TCE table in KVM, no reference
> >>counting needed at all, simple.
> >>
> >>Example 2.
> >>POWER7, no DDW, one QEMU PHB, 2 IOMMU groups, no table sharing so there are
> >>2 containers but still one IOMMU MR which is added to each container so
> >>there are 2 gIOMMU objects. And there is still one TCE table in KVM (which
> >>is a guest view). Where do I put the reference counter which will count that
> >>there are 2 gIOMMUs per KVM TCE table in this example?
> >
> >Ah.. I'd forgotten that the gIOMMU object is per guest IOMMU window
> >*and* per container, not just per guest IOMMU window.
> >
> >Ultimately it's the code implementing the guest side IOMMU which needs
> >to know if it is supporting VFIO or not, so in generic terms that
> >means per IOMMU-type MemoryRegion.
> >
> >Essentially you need to count the number of VFIOGuestIOMMU objects
> >associated with each (gIOMMU) MemoryRegion, and notify the
> >MemoryRegion if that changes from zero to non-zero or vice versa.
> >
> >I'd prefer if we can maintain that count from just the VFIO code and
> >just notify the gIOMMU code on zero / non-zero changes. But I guess
> >we'd need approval from Paolo to add that count to the MemoryRegion.
>
>
> Why MR? I could wrap MR to "VFIOIOMMUMR", add a counter and keep a list of
> these VFIOIOMMUMRs in VFIOAddressSpace.

Ah, yes I guess we could. It's just kinda ugly to have to keep another
object with the same lifetime around for one extra counter.

> I am adding Paolo, just for the case :)
>
>
> >The fallback would be similar to what you have - instead the
> >MemoryRegion gets notified whenever a VFIOGuestIOMMU is attached or
> >removed, and the MR (i.e. the guest side IOMMU code) has to maintain
> >the count itself.
>

-- 
David Gibson                   | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
                               | _way_ _around_!
http://www.ozlabs.org/~dgibson

--6TrnltStXW4iwmi0
Content-Type: application/pgp-signature; name="signature.asc"

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1

iQIcBAEBAgAGBQJXIWErAAoJEGw4ysog2bOSaD0QANMkk83zgBVn6zRX+hi5pP9+
HOHeUNdJFsc3Xm/18pCaZlQbZrY5nGaYnb6k/V5s29TuH6MA26z4pKi+6yJO4I2C
hlh8bJMzxHWzqhgD/ys67v/r+8HKCS29RF0nFZ625NCc2U4rhd9Hl1ztfSxT38fZ
7zi+YSmzXBnxQQdLa04cxRpFX8cuEDh9JaoSNHrc05oF1asAGFD/8B7HWYgje9QG
ZHeS5MiwpGoSVP+5k6qx4x6jz+1tLP4pfS7U0TreHMNfdiKQfauzTQNEXw09k5BM
YtibjppronyXAyVUpLOo2gmW9VIfsAw9UMKqqAeJ6iMR3B+qraHbN+zBpcze1uav
+NPbLgW99O4vjo4+uRnzYbTKiHKnshoHcx74luGklgUcPJdxJf95D1eULljxbm+e
G3Am3DDdPOGGKp0iVFs34tA4IHt095DsMpqaHs0ddqh8bxX/2FCSgiV+2h2sL4vF
bJmpwtJpV/6x2OXo6hOaHt0vIk+qhHxHsrxP4MrAcUBc0mCFOXt1MOjUhdT0S81S
7796n+dHSAGR/1XO41WbYBJgsEOtT0l69H+OwbkTAm8zB6a0bVh7O8CueUqUBZzr
aNifAhPOOuwoaonwP/EBKG1soYbuyuGgU1U60Td6F0fyIkjDUPnGGcZyEBVT+tk1
j4ns9g9O3IrZXQ4xj3pu
=SwIb
-----END PGP SIGNATURE-----

--6TrnltStXW4iwmi0--
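
For illustration only, the counting scheme discussed in this thread (count the
VFIOGuestIOMMU objects attached to each IOMMU MemoryRegion, and notify the guest
side IOMMU code only when the count moves between zero and non-zero) could be
sketched roughly as below. This is not QEMU code. IommuRegion, its fields, the
attach/detach helpers and the demo callbacks are hypothetical stand-ins; only the
vfio_start/vfio_stop callback names come from the patch under review.

```c
#include <assert.h>

/* Hypothetical stand-in for an IOMMU-type MemoryRegion; NOT QEMU's struct. */
typedef struct IommuRegion IommuRegion;

struct IommuRegion {
    unsigned vfio_users;                 /* attached VFIOGuestIOMMU-like objects */
    void (*vfio_start)(IommuRegion *mr); /* invoked on the 0 -> 1 transition */
    void (*vfio_stop)(IommuRegion *mr);  /* invoked on the 1 -> 0 transition */
    int start_calls;                     /* demo bookkeeping only */
    int stop_calls;                      /* demo bookkeeping only */
};

/* Generic side: called each time a container starts using this region. */
static void iommu_region_vfio_attach(IommuRegion *mr)
{
    /* Notify the backend only for the first user. */
    if (mr->vfio_users++ == 0 && mr->vfio_start) {
        mr->vfio_start(mr);
    }
}

/* Generic side: called each time a container stops using this region. */
static void iommu_region_vfio_detach(IommuRegion *mr)
{
    assert(mr->vfio_users > 0);
    /* Notify the backend only when the last user goes away. */
    if (--mr->vfio_users == 0 && mr->vfio_stop) {
        mr->vfio_stop(mr);
    }
}

/* Hypothetical backend callbacks; a real backend would move the TCE table. */
static void demo_vfio_start(IommuRegion *mr)
{
    mr->start_calls++;
}

static void demo_vfio_stop(IommuRegion *mr)
{
    mr->stop_calls++;
}
```

With two containers sharing one region (as in Example 2 above), two attach calls
would trigger demo_vfio_start once, and demo_vfio_stop would only run after the
second detach. Whether such a counter belongs in the MemoryRegion, in a wrapper
kept in VFIOAddressSpace, or in the backend is exactly the open question in this
thread; the sketch does not settle it.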