From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:55641) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1bAsES-0001Oy-NM for qemu-devel@nongnu.org; Thu, 09 Jun 2016 01:10:18 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1bAsEN-00055I-Hp for qemu-devel@nongnu.org; Thu, 09 Jun 2016 01:10:15 -0400 Date: Thu, 9 Jun 2016 14:28:56 +1000 From: David Gibson Message-ID: <20160609042856.GJ9226@voom.fritz.box> References: <1464771463-37214-1-git-send-email-aik@ozlabs.ru> <201606010901.u518x1wF023568@mx0a-001b2d01.pphosted.com> <20160606055759.GL9226@voom.fritz.box> <20160608055743.GX9226@voom.fritz.box> <20d3189c-ac47-3cdb-137b-079a93eaaff2@ozlabs.ru> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="LkBseQZEL/3d2JPd" Content-Disposition: inline In-Reply-To: <20d3189c-ac47-3cdb-137b-079a93eaaff2@ozlabs.ru> Subject: Re: [Qemu-devel] [PATCH qemu v17 11/12] spapr_pci/spapr_pci_vfio: Support Dynamic DMA Windows (DDW) List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Alexey Kardashevskiy Cc: qemu-devel@nongnu.org, qemu-ppc@nongnu.org, Alexander Graf , Alex Williamson --LkBseQZEL/3d2JPd Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Wed, Jun 08, 2016 at 04:09:57PM +1000, Alexey Kardashevskiy wrote: > On 08/06/16 15:57, David Gibson wrote: > > On Mon, Jun 06, 2016 at 06:12:58PM +1000, Alexey Kardashevskiy wrote: > >> On 06/06/16 15:57, David Gibson wrote: > >>> On Wed, Jun 01, 2016 at 06:57:42PM +1000, Alexey Kardashevskiy wrote: > >>>> This adds support for Dynamic DMA Windows (DDW) option defined by > >>>> the SPAPR specification which allows to have additional DMA window(s) > >>>> > >>>> The "ddw" property is enabled by default on a PHB but for compatibil= ity > >>>> the pseries-2.5 machine (TODO: update version) and older disable it. > >>> > >>> Looks like your todo is now todone, but you need to update the commit > >>> message. > >>> > >>>> This also creates a single DMA window for the older machines to > >>>> maintain backward migration. > >>>> > >>>> This implements DDW for PHB with emulated and VFIO devices. The host > >>>> kernel support is required. The advertised IOMMU page sizes are 4K a= nd > >>>> 64K; 16M pages are supported but not advertised by default, in order= to > >>>> enable them, the user has to specify "pgsz" property for PHB and > >>>> enable huge pages for RAM. > >>>> > >>>> The existing linux guests try creating one additional huge DMA window > >>>> with 64K or 16MB pages and map the entire guest RAM to. If succeeded, > >>>> the guest switches to dma_direct_ops and never calls TCE hypercalls > >>>> (H_PUT_TCE,...) again. This enables VFIO devices to use the entire R= AM > >>>> and not waste time on map/unmap later. This adds a "dma64_win_addr" > >>>> property which is a bus address for the 64bit window and by default > >>>> set to 0x800.0000.0000.0000 as this is what the modern POWER8 hardwa= re > >>>> uses and this allows having emulated and VFIO devices on the same bu= s. > >>>> > >>>> This adds 4 RTAS handlers: > >>>> * ibm,query-pe-dma-window > >>>> * ibm,create-pe-dma-window > >>>> * ibm,remove-pe-dma-window > >>>> * ibm,reset-pe-dma-window > >>>> These are registered from type_init() callback. > >>>> > >>>> These RTAS handlers are implemented in a separate file to avoid poll= uting > >>>> spapr_iommu.c with PCI. > >>>> > >>>> This changes sPAPRPHBState::dma_liobn to an array to allow 2 LIOBNs. > >>>> > >>>> Signed-off-by: Alexey Kardashevskiy > >>> > >>> Looks pretty close to ready. > >>> > >>> There are a handful of nits and one real error noted below. > >>> > >>>> --- > >>>> Changes: > >>>> v17: > >>>> * fixed: "query" did return non-page-shifted value when memory hotpl= ug is enabled > >>>> > >>>> v16: > >>>> * s/dma_liobn/dma_liobn[SPAPR_PCI_DMA_MAX_WINDOWS]/ > >>>> * s/SPAPR_PCI_LIOBN()/dma_liobn[]/ > >>>> > >>>> v15: > >>>> * moved page mask filtering to PHB realize(), use "-mempath" to know > >>>> if there are huge pages > >>>> * fixed error reporting in RTAS handlers > >>>> * max window size accounts now hotpluggable memory boundaries > >>>> --- > >>>> hw/ppc/Makefile.objs | 1 + > >>>> hw/ppc/spapr.c | 5 + > >>>> hw/ppc/spapr_pci.c | 77 +++++++++--- > >>>> hw/ppc/spapr_rtas_ddw.c | 293 +++++++++++++++++++++++++++++++++= +++++++++++ > >>>> include/hw/pci-host/spapr.h | 8 +- > >>>> include/hw/ppc/spapr.h | 16 ++- > >>>> trace-events | 4 + > >>>> 7 files changed, 383 insertions(+), 21 deletions(-) > >>>> create mode 100644 hw/ppc/spapr_rtas_ddw.c > >>>> > >>>> diff --git a/hw/ppc/Makefile.objs b/hw/ppc/Makefile.objs > >>>> index c1ffc77..986b36f 100644 > >>>> --- a/hw/ppc/Makefile.objs > >>>> +++ b/hw/ppc/Makefile.objs > >>>> @@ -7,6 +7,7 @@ obj-$(CONFIG_PSERIES) +=3D spapr_pci.o spapr_rtc.o s= papr_drc.o spapr_rng.o > >>>> ifeq ($(CONFIG_PCI)$(CONFIG_PSERIES)$(CONFIG_LINUX), yyy) > >>>> obj-y +=3D spapr_pci_vfio.o > >>>> endif > >>>> +obj-$(CONFIG_PSERIES) +=3D spapr_rtas_ddw.o > >>>> # PowerPC 4xx boards > >>>> obj-y +=3D ppc405_boards.o ppc4xx_devs.o ppc405_uc.o ppc440_bamboo.o > >>>> obj-y +=3D ppc4xx_pci.o > >>>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c > >>>> index 44e401a..6ddcda9 100644 > >>>> --- a/hw/ppc/spapr.c > >>>> +++ b/hw/ppc/spapr.c > >>>> @@ -2366,6 +2366,11 @@ DEFINE_SPAPR_MACHINE(2_6, "2.6", true); > >>>> .driver =3D "spapr-vlan", \ > >>>> .property =3D "use-rx-buffer-pools", \ > >>>> .value =3D "off", \ > >>>> + }, \ > >>>> + {\ > >>>> + .driver =3D TYPE_SPAPR_PCI_HOST_BRIDGE,\ > >>>> + .property =3D "ddw",\ > >>>> + .value =3D stringify(off),\ > >>>> }, > >>>> =20 > >>>> static void spapr_machine_2_5_instance_options(MachineState *machin= e) > >>>> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c > >>>> index 68de523..bcf0360 100644 > >>>> --- a/hw/ppc/spapr_pci.c > >>>> +++ b/hw/ppc/spapr_pci.c > >>>> @@ -35,6 +35,7 @@ > >>>> #include "hw/ppc/spapr.h" > >>>> #include "hw/pci-host/spapr.h" > >>>> #include "exec/address-spaces.h" > >>>> +#include "exec/ram_addr.h" > >>>> #include > >>>> #include "trace.h" > >>>> #include "qemu/error-report.h" > >>>> @@ -45,6 +46,7 @@ > >>>> #include "hw/ppc/spapr_drc.h" > >>>> #include "sysemu/device_tree.h" > >>>> #include "sysemu/kvm.h" > >>>> +#include "sysemu/hostmem.h" > >>>> =20 > >>>> #include "hw/vfio/vfio.h" > >>>> =20 > >>>> @@ -1088,7 +1090,7 @@ static void spapr_phb_add_pci_device(sPAPRDRCo= nnector *drc, > >>>> int fdt_start_offset =3D 0, fdt_size; > >>>> =20 > >>>> if (object_dynamic_cast(OBJECT(pdev), "vfio-pci")) { > >>>> - sPAPRTCETable *tcet =3D spapr_tce_find_by_liobn(phb->dma_li= obn); > >>>> + sPAPRTCETable *tcet =3D spapr_tce_find_by_liobn(phb->dma_li= obn[0]); > >>>> =20 > >>>> spapr_tce_set_need_vfio(tcet, true); > >>>> } > >>> > >>> Hang on.. I thought you'd got rid of the need for this explicit > >>> set_need_vfio() stuff. > >> > >> > >> It is in 12/12 (which I'll split in 2 halves when I respin this), I mo= ved > >> it to the end as it is not essential for DDW itself. > >=20 > > Yes, sorry, I saw that shortly after writing this. > >=20 > >>>> @@ -1310,11 +1312,14 @@ static void spapr_phb_realize(DeviceState *d= ev, Error **errp) > >>>> PCIBus *bus; > >>>> uint64_t msi_window_size =3D 4096; > >>>> sPAPRTCETable *tcet; > >>>> + const unsigned windows_supported =3D > >>>> + sphb->ddw_enabled ? SPAPR_PCI_DMA_MAX_WINDOWS : 1; > >>>> =20 > >>>> if (sphb->index !=3D (uint32_t)-1) { > >>>> hwaddr windows_base; > >>>> =20 > >>>> - if ((sphb->buid !=3D (uint64_t)-1) || (sphb->dma_liobn !=3D= (uint32_t)-1) > >>>> + if ((sphb->buid !=3D (uint64_t)-1) || (sphb->dma_liobn[0] != =3D (uint32_t)-1) > >>>> + || ((sphb->dma_liobn[1] !=3D (uint32_t)-1) && (windows_= supported > 1)) > >>>> || (sphb->mem_win_addr !=3D (hwaddr)-1) > >>>> || (sphb->io_win_addr !=3D (hwaddr)-1)) { > >>>> error_setg(errp, "Either \"index\" or other parameters = must" > >>>> @@ -1329,7 +1334,9 @@ static void spapr_phb_realize(DeviceState *dev= , Error **errp) > >>>> } > >>>> =20 > >>>> sphb->buid =3D SPAPR_PCI_BASE_BUID + sphb->index; > >>>> - sphb->dma_liobn =3D SPAPR_PCI_LIOBN(sphb->index, 0); > >>>> + for (i =3D 0; i < windows_supported; ++i) { > >>>> + sphb->dma_liobn[i] =3D SPAPR_PCI_LIOBN(sphb->index, i); > >>>> + } > >>>> =20 > >>>> windows_base =3D SPAPR_PCI_WINDOW_BASE > >>>> + sphb->index * SPAPR_PCI_WINDOW_SPACING; > >>>> @@ -1342,8 +1349,9 @@ static void spapr_phb_realize(DeviceState *dev= , Error **errp) > >>>> return; > >>>> } > >>>> =20 > >>>> - if (sphb->dma_liobn =3D=3D (uint32_t)-1) { > >>>> - error_setg(errp, "LIOBN not specified for PHB"); > >>>> + if ((sphb->dma_liobn[0] =3D=3D (uint32_t)-1) || > >>>> + ((sphb->dma_liobn[1] =3D=3D (uint32_t)-1) && (windows_suppo= rted > 1))) { > >>>> + error_setg(errp, "LIOBN(s) not specified for PHB"); > >>>> return; > >>>> } > >>> > >>> Hrm.. there's a bit of false generality here, since this would break > >>> if windows_supported > 2, and dma_liobn[2] was not specified. Not > >>> urgent for the initial commit though. > >> > >> > >> Is s/windows_supported > 1/windows_supported =3D=3D 2/ any better here? > >=20 > > Not really. Unless you also have a windows_supported > 2 case (which > > could just error / abort / whatever). >=20 >=20 > It cannot be >2 as SPAPR_PCI_DMA_MAX_WINDOWS is defined as "2" now and it > is quite unlikely to change in the future. Yes, I know, but the value of MAX_WINDOWS isn't obvious if you're just looking at this code. My point is that a casual look at this code suggests it will handle arbitrary windows_supported values, but that's not actually the case. This is generally bad practice. > What do I do right now? :) Well, as I said it's not urgent for the first commit. Just leave it as it is and we'll deal with the consequences later. --=20 David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson --LkBseQZEL/3d2JPd Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAEBAgAGBQJXWPCIAAoJEGw4ysog2bOShWEP/3gOUDFtfT7CcrhXB9DVDf2o K1/9BHRJWf7DDrJ6ihQ5UWI7xh01BwfdjTcbdEEtgjdZBv0mr+QYEkCiPzrLo/zi UKSPtnCfwVef/+VKmjGUC65wvSJQwlDPdWGr925/+gEtPXfULEE/C/nn36blfmwW 9evA0pEujZ2yE0U+hokzYpcD0K6kFVe3oUgncrJy0gG8YU6V0nWCJxnKMMmQlBsf 96uOmpQ6GW06fxwBCBlJ1o1ETMr6bxpzFIcDQ5LXaseNj1uQmEvnQHSVl/9QPU+x nXRbmCsH1uqWIIanbwm7HcjJwIRM2QMeYhA8APP8qqSumAh0eWGbE/mc2hJud905 BWFtvBaRSV8JQjcm9GGcWs39MWTILMLFDFcAI1dMKYUR9feCW+oaHchqXqPKZxCa vTsc7gEo6uBOIVsT/0cjc7GtuMhL1+yPetqk6Qaq3o2mB26vXncvOREe/op3UHT5 apDS2nBtyvyr38GmETv5Lt4wpPCk/pzNrIWISs+wgenY4X7rcThd9FNAZ46TxKqe vdbHecl6ReegKjyKS3glatotqvm/YQkSstXL40ojf40KEtUWRl/EnerWgTB450cf TtoyXDVb28zoTssexKzk9WE9xd4UHDUm0Oslgtss3M3PvwxTbvMnfP7yaxk3sx2O RQXkd6fOXtIKiCDUaOS1 =zrQa -----END PGP SIGNATURE----- --LkBseQZEL/3d2JPd--