From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:39628) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1bsRxN-0001cp-Uw for qemu-devel@nongnu.org; Fri, 07 Oct 2016 06:00:48 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1bsRxK-0005fl-8o for qemu-devel@nongnu.org; Fri, 07 Oct 2016 06:00:44 -0400 Date: Fri, 7 Oct 2016 20:17:54 +1100 From: David Gibson Message-ID: <20161007091754.GX18490@umbus.fritz.box> References: <1475722987-18644-1-git-send-email-david@gibson.dropbear.id.au> <1475722987-18644-2-git-send-email-david@gibson.dropbear.id.au> <7507c5a4-85b9-8759-b12c-8f4fcaf1bfb2@ozlabs.ru> <20161007051008.GV18490@umbus.fritz.box> <4cb9f978-26a9-871a-b9bb-65f945f73eae@ozlabs.ru> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="6pjpgtPchv3bUQQn" Content-Disposition: inline In-Reply-To: <4cb9f978-26a9-871a-b9bb-65f945f73eae@ozlabs.ru> Subject: Re: [Qemu-devel] [RFC 1/4] spapr_pci: Delegate placement of PCI host bridges to machine type List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Alexey Kardashevskiy Cc: qemu-ppc@nongnu.org, qemu-devel@nongnu.org, benh@kernel.crashing.org, thuth@redhat.com, lvivier@redhat.com, agraf@suse.de, mst@redhat.com, mdroth@linux.vnet.ibm.com, nikunj@linux.vnet.ibm.com, bharata@linux.vnet.ibm.com, abologna@redhat.com, mpolednik@redhat.com --6pjpgtPchv3bUQQn Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Fri, Oct 07, 2016 at 04:34:59PM +1100, Alexey Kardashevskiy wrote: > On 07/10/16 16:10, David Gibson wrote: > > On Fri, Oct 07, 2016 at 02:57:43PM +1100, Alexey Kardashevskiy wrote: > >> On 06/10/16 14:03, David Gibson wrote: > >>> The 'spapr-pci-host-bridge' represents the virtual PCI host bridge (P= HB) > >>> for a PAPR guest. Unlike on x86, it's routine on Power (both bare me= tal > >>> and PAPR guests) to have numerous independent PHBs, each controlling a > >>> separate PCI domain. > >>> > >>> There are two ways of configuring the spapr-pci-host-bridge device: f= irst > >>> it can be done fully manually, specifying the locations and sizes of = all > >>> the IO windows. This gives the most control, but is very awkward wit= h 6 > >>> mandatory parameters. Alternatively just an "index" can be specified > >>> which essentially selects from an array of predefined PHB locations. > >>> The PHB at index 0 is automatically created as the default PHB. > >>> > >>> The current set of default locations causes some problems for guests = with > >>> large RAM (> 1 TiB) or PCI devices with very large BARs (e.g. big nVi= dia > >>> GPGPU cards via VFIO). Obviously, for migration we can only change t= he > >>> locations on a new machine type, however. > >>> > >>> This is awkward, because the placement is currently decided within the > >>> spapr-pci-host-bridge code, so it breaks abstraction to look inside t= he > >>> machine type version. > >>> > >>> So, this patch delegates the "default mode" PHB placement from the > >>> spapr-pci-host-bridge device back to the machine type via a public me= thod > >>> in sPAPRMachineClass. It's still a bit ugly, but it's about the best= we > >>> can do. > >>> > >>> For now, this just changes where the calculation is done. It doesn't > >>> change the actual location of the host bridges, or any other behaviou= r. > >>> > >>> Signed-off-by: David Gibson > >>> --- > >>> hw/ppc/spapr.c | 34 ++++++++++++++++++++++++++++++++++ > >>> hw/ppc/spapr_pci.c | 22 ++++++++-------------- > >>> include/hw/pci-host/spapr.h | 11 +---------- > >>> include/hw/ppc/spapr.h | 4 ++++ > >>> 4 files changed, 47 insertions(+), 24 deletions(-) > >>> > >>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c > >>> index 03e3803..f6e9c2a 100644 > >>> --- a/hw/ppc/spapr.c > >>> +++ b/hw/ppc/spapr.c > >>> @@ -2370,6 +2370,39 @@ static HotpluggableCPUList *spapr_query_hotplu= ggable_cpus(MachineState *machine) > >>> return head; > >>> } > >>> =20 > >>> +static void spapr_phb_placement(sPAPRMachineState *spapr, uint32_t i= ndex, > >>> + uint64_t *buid, hwaddr *pio, hwaddr = *pio_size, > >>> + hwaddr *mmio, hwaddr *mmio_size, > >>> + unsigned n_dma, uint32_t *liobns, Er= ror **errp) > >>> +{ > >>> + const uint64_t base_buid =3D 0x800000020000000ULL; > >>> + const hwaddr phb0_base =3D 0x10000000000ULL; /* 1 TiB */ > >>> + const hwaddr phb_spacing =3D 0x1000000000ULL; /* 64 GiB */ > >>> + const hwaddr mmio_offset =3D 0xa0000000; /* 2 GiB + 512 MiB */ > >>> + const hwaddr pio_offset =3D 0x80000000; /* 2 GiB */ > >>> + const uint32_t max_index =3D 255; > >>> + > >>> + hwaddr phb_base; > >>> + int i; > >>> + > >>> + if (index > max_index) { > >>> + error_setg(errp, "\"index\" for PAPR PHB is too large (max %= u)", > >>> + max_index); > >>> + return; > >>> + } > >>> + > >>> + *buid =3D base_buid + index; > >>> + for (i =3D 0; i < n_dma; ++i) { > >>> + liobns[i] =3D SPAPR_PCI_LIOBN(index, i); > >>> + } > >>> + > >>> + phb_base =3D phb0_base + index * phb_spacing; > >>> + *pio =3D phb_base + pio_offset; > >>> + *pio_size =3D SPAPR_PCI_IO_WIN_SIZE; > >>> + *mmio =3D phb_base + mmio_offset; > >>> + *mmio_size =3D SPAPR_PCI_MMIO_WIN_SIZE; > >>> +} > >>> + > >>> static void spapr_machine_class_init(ObjectClass *oc, void *data) > >>> { > >>> MachineClass *mc =3D MACHINE_CLASS(oc); > >>> @@ -2406,6 +2439,7 @@ static void spapr_machine_class_init(ObjectClas= s *oc, void *data) > >>> mc->query_hotpluggable_cpus =3D spapr_query_hotpluggable_cpus; > >>> fwc->get_dev_path =3D spapr_get_fw_dev_path; > >>> nc->nmi_monitor_handler =3D spapr_nmi; > >>> + smc->phb_placement =3D spapr_phb_placement; > >>> } > >>> =20 > >>> static const TypeInfo spapr_machine_info =3D { > >>> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c > >>> index 4f00865..c0fc964 100644 > >>> --- a/hw/ppc/spapr_pci.c > >>> +++ b/hw/ppc/spapr_pci.c > >>> @@ -1311,7 +1311,8 @@ static void spapr_phb_realize(DeviceState *dev,= Error **errp) > >>> sphb->ddw_enabled ? SPAPR_PCI_DMA_MAX_WINDOWS : 1; > >>> =20 > >>> if (sphb->index !=3D (uint32_t)-1) { > >>> - hwaddr windows_base; > >>> + sPAPRMachineClass *smc =3D SPAPR_MACHINE_GET_CLASS(spapr); > >>> + Error *local_err =3D NULL; > >>> =20 > >>> if ((sphb->buid !=3D (uint64_t)-1) || (sphb->dma_liobn[0] != =3D (uint32_t)-1) > >>> || (sphb->dma_liobn[1] !=3D (uint32_t)-1 && windows_supp= orted =3D=3D 2) > >>> @@ -1322,21 +1323,14 @@ static void spapr_phb_realize(DeviceState *de= v, Error **errp) > >>> return; > >>> } > >>> =20 > >>> - if (sphb->index > SPAPR_PCI_MAX_INDEX) { > >>> - error_setg(errp, "\"index\" for PAPR PHB is too large (m= ax %u)", > >>> - SPAPR_PCI_MAX_INDEX); > >>> + smc->phb_placement(spapr, sphb->index, > >>> + &sphb->buid, &sphb->io_win_addr, &sphb->i= o_win_size, > >>> + &sphb->mem_win_addr, &sphb->mem_win_size, > >>> + windows_supported, sphb->dma_liobn, &loca= l_err); > >>> + if (local_err) { > >>> + error_propagate(errp, local_err); > >>> return; > >>> } > >>> - > >>> - sphb->buid =3D SPAPR_PCI_BASE_BUID + sphb->index; > >>> - for (i =3D 0; i < windows_supported; ++i) { > >>> - sphb->dma_liobn[i] =3D SPAPR_PCI_LIOBN(sphb->index, i); > >>> - } > >>> - > >>> - windows_base =3D SPAPR_PCI_WINDOW_BASE > >>> - + sphb->index * SPAPR_PCI_WINDOW_SPACING; > >>> - sphb->mem_win_addr =3D windows_base + SPAPR_PCI_MMIO_WIN_OFF; > >>> - sphb->io_win_addr =3D windows_base + SPAPR_PCI_IO_WIN_OFF; > >>> } > >>> =20 > >>> if (sphb->buid =3D=3D (uint64_t)-1) { > >>> diff --git a/include/hw/pci-host/spapr.h b/include/hw/pci-host/spapr.h > >>> index 30dbd46..8c9ebfd 100644 > >>> --- a/include/hw/pci-host/spapr.h > >>> +++ b/include/hw/pci-host/spapr.h > >>> @@ -79,18 +79,9 @@ struct sPAPRPHBState { > >>> uint32_t numa_node; > >>> }; > >>> =20 > >>> -#define SPAPR_PCI_MAX_INDEX 255 > >>> - > >>> -#define SPAPR_PCI_BASE_BUID 0x800000020000000ULL > >>> - > >>> #define SPAPR_PCI_MEM_WIN_BUS_OFFSET 0x80000000ULL > >>> =20 > >>> -#define SPAPR_PCI_WINDOW_BASE 0x10000000000ULL > >>> -#define SPAPR_PCI_WINDOW_SPACING 0x1000000000ULL > >>> -#define SPAPR_PCI_MMIO_WIN_OFF 0xA0000000 > >>> -#define SPAPR_PCI_MMIO_WIN_SIZE (SPAPR_PCI_WINDOW_SPACING - \ > >>> - SPAPR_PCI_MEM_WIN_BUS_OFFSET) > >>> -#define SPAPR_PCI_IO_WIN_OFF 0x80000000 > >>> +#define SPAPR_PCI_MMIO_WIN_SIZE 0xf80000000 > >>> #define SPAPR_PCI_IO_WIN_SIZE 0x10000 > >>> =20 > >>> #define SPAPR_PCI_MSI_WINDOW 0x40000000000ULL > >>> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h > >>> index 39dadaa..9e1c5a5 100644 > >>> --- a/include/hw/ppc/spapr.h > >>> +++ b/include/hw/ppc/spapr.h > >>> @@ -40,6 +40,10 @@ struct sPAPRMachineClass { > >>> bool dr_lmb_enabled; /* enable dynamic-reconfig/hotplug of= LMBs */ > >>> bool use_ohci_by_default; /* use USB-OHCI instead of XHCI */ > >>> const char *tcg_default_cpu; /* which (TCG) CPU to simulate by d= efault */ > >>> + void (*phb_placement)(sPAPRMachineState *spapr, uint32_t index, > >>> + uint64_t *buid, hwaddr *pio, hwaddr *pio_s= ize, > >>> + hwaddr *mmio, hwaddr *mmio_size, > >>> + unsigned n_dma, uint32_t *liobns, Error **= errp); > >> > >> > >> I still like idea of getting rid of index better. In order to reduce t= he > >> command line length, you could not enable IO and MMIO32 by default, the > >> default size for MMIO64 could be set to 1TB or something like this, BU= ID > >> could be just a raw MMIO64 address value. > >=20 > > Hm, interesting idea. I do like the idea of making the BUID equal to > > the MMIO64 address. Obviously we'd still need addresses for the > > default PHB, including 32-bit and PIO windows. > >=20 > > Trickier, we'd still need to support 'index' for compatibility with > > older command lines. I guess we could reserve index purely for the > > "legacy" locations - 2.8+ machine types would create the default > > bridge with explicit windows, rather than just using index 0. > >=20 > >> So essentially you would only have to specify MMIO64 and 2xLIOBN in the > >> command line which does not seem that much of overkill assuming that a > >> second (third,...) PHB is almost never used and even if it is, it will= be > >> used for SRIOV VF (which can do 64bit) or virtio (the same). > >=20 > > Hm, maybe. 3 values is still a bit of a PITA. I guess liobn64 could > > theoretically be optional, although you're going to want it for > > basically every case where you're creating an additional PHB. > >=20 > > Or.. what if we insisted the mmio64 window had at least 4G alignment, > > then we might be able to use mmio64 >> 32 (=3D=3D buid >> 32) as the > > liobn. Hmm.. guess it would need to be 8G alignment, so we have one > > extra bit. >=20 >=20 > As you mentioned in 4/4, guests cannot access above 64 TiB, we could use > upper bits to store the window number. *Some* guests can't access above 64 TiB, that's not an inherent limitation. So I'd rather not do that. >=20 >=20 > >=20 > > ..and I wonder if this might be easier to manage if we introduced a > > new spapr-pci-host-bridge-2 subtype. > >=20 > > Let me think about this.. > >=20 > >> With index and phb_placement(), QEMU will still auto-generate these IO= /MMIO > >> addresses from the command line parameters which did hit us with migra= tion > >> before as the QEMU command line on the source side and the destination= side > >> is not exactly the same, and this might hit again... > >=20 > > Uh.. I'm not quite following you. What's the situation which would > > break migration with the proposed scheme? >=20 > Well, I cannot think of any other device available on SPAPR which we could > map to the CPU address space but if it ever appears, we will have a probl= em > unless this new imaginary device does not pick an address from the CPU > space automatically. I followed that even less that the previous one, I can't even parse that. --=20 David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson --6pjpgtPchv3bUQQn Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQIcBAEBCAAGBQJX92hAAAoJEGw4ysog2bOSebIP/RJ12NItC9ap40E7ut13ipa/ SLkx2iimzyKcgw19W1TG9aBy8/ApL87ovKnN+XT0kD6ouLUsJgFY+1F7B0FrJ9Ec A7FGlAYb7unWd2aysV070EM98ERwSGbHXHSpDE4C4k69ntiLaQyf7in/i/RbS5ih 1pdtFUG7So9Ugf4VTQ6DU7tz6L2wpzTzAhrgVZPFz32J61FZzGTCK58PctIBn6OJ BSf67xXeyuFoudLJ4ObhNXuzsTO/wuenxMtDkINTAbhWStLyNqGUBBu9H+KdoWoZ Dbhxp6yfNlGi3jj/ERoHrIa64FfXkatXxWgBAbyRDiuQtcy2XtMlJK42r1Jzyk/L wwJbBxDgoA3GGycn8mVP9cHlWNNlGkArSkZ+9bUlnCCszgtZAIJKM9EZeVhmznSS Tfko3dRtCXcJQ7kjIDhPhRR2fF75YgKpeLqbPs1w7FKS5n2lVttPsa6vfXjgq/Wt q3fte6HeI4kgPu20D9INUKKEdlHnNq58tqLUH5plGCYOhQeCXREfW++hIuglLbCx Ta9IcIy5lUG8hFqDBenrJ7iymWyYhEnp4pCRQ4QSk2KiNbgjHVuqqonstU8PnDxg /ZJBXI2UFZTAOh6KLc2lo1ANxWM7Jvqd62Fp4XKup3OITLVLCCSvZMrI8XxcGea4 DtD+kDds+NcIKsi2O6dr =6kOs -----END PGP SIGNATURE----- --6pjpgtPchv3bUQQn--