From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:37625) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1bva9L-0007UH-L6 for qemu-devel@nongnu.org; Sat, 15 Oct 2016 21:22:05 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1bva9I-0000L5-Ec for qemu-devel@nongnu.org; Sat, 15 Oct 2016 21:22:03 -0400 Date: Sun, 16 Oct 2016 12:06:42 +1100 From: David Gibson Message-ID: <20161016010642.GW28562@umbus> References: <1476316647-9433-1-git-send-email-david@gibson.dropbear.id.au> <1476316647-9433-8-git-send-email-david@gibson.dropbear.id.au> <9a92c0ce-f8e9-a39d-a7fa-e48e7f1e5f30@redhat.com> <20161013232954.GR18039@umbus.fritz.box> <8796bab4-44a7-739b-9b02-4e8bedb9f684@redhat.com> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="QVzQgM+zdZ3YWXqn" Content-Disposition: inline In-Reply-To: <8796bab4-44a7-739b-9b02-4e8bedb9f684@redhat.com> Subject: Re: [Qemu-devel] [PATCHv3 7/7] spapr: Improved placement of PCI host bridges in guest memory map List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Laurent Vivier Cc: agraf@suse.de, mst@redhat.com, abologna@redhat.com, aik@ozlabs.ru, qemu-devel@nongnu.org, qemu-ppc@nongnu.org, mdroth@linux.vnet.ibm.com, benh@kernel.crashing.org --QVzQgM+zdZ3YWXqn Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Fri, Oct 14, 2016 at 09:25:58AM +0200, Laurent Vivier wrote: > On 14/10/2016 01:29, David Gibson wrote: > > From 98f07c09c6d72218800d6cfbf44b973a88ece2aa Mon Sep 17 00:00:00 2001 > > From: David Gibson > > Date: Fri, 14 Oct 2016 10:21:00 +1100 > > Subject: [PATCH] spapr: Improved placement of PCI host bridges in guest= memory > > map > >=20 > > Currently, the MMIO space for accessing PCI on pseries guests begins at > > 1 TiB in guest address space. Each PCI host bridge (PHB) has a 64 GiB > > chunk of address space in which it places its outbound PIO and 32-bit a= nd > > 64-bit MMIO windows. > >=20 > > This scheme as several problems: > > - It limits guest RAM to 1 TiB (though we have a limited fix for this > > now) > > - It limits the total MMIO window to 64 GiB. This is not always enou= gh > > for some of the large nVidia GPGPU cards > > - Putting all the windows into a single 64 GiB area means that natura= lly > > aligning things within there will waste more address space. > > In addition there was a miscalculation in some of the defaults, which m= eant > > that the MMIO windows for each PHB actually slightly overran the 64 GiB > > region for that PHB. We got away without nasty consequences because > > the overrun fit within an unused area at the beginning of the next PHB's > > region, but it's not pretty. > >=20 > > This patch implements a new scheme which addresses those problems, and = is > > also closer to what bare metal hardware and pHyp guests generally use. > >=20 > > Because some guest versions (including most current distro kernels) can= 't > > access PCI MMIO above 64 TiB, we put all the PCI windows between 32 TiB= and > > 64 TiB. This is broken into 1 TiB chunks. The 1 TiB contains the PIO > > (64 kiB) and 32-bit MMIO (2 GiB) windows for all of the PHBs. Each > > subsequent TiB chunk contains a naturally aligned 64-bit MMIO window for > > one PHB each. > >=20 > > This reduces the number of allowed PHBs (without full manual configurat= ion > > of all the windows) from 256 to 31, but this should still be plenty in > > practice. > >=20 > > We also change some of the default window sizes for manually configured > > PHBs to saner values. > >=20 > > Finally we adjust some tests and libqos so that it correctly uses the n= ew > > default locations. Ideally it would parse the device tree given to the > > guest, but that's a more complex problem for another time. > >=20 > > Signed-off-by: David Gibson >=20 >=20 > I really like this new version. >=20 > Reviewed-by: Laurent Vivier Great! I've merged it into ppc-for-2.8. >=20 > Laurent >=20 > > --- > > hw/ppc/spapr.c | 121 +++++++++++++++++++++++++++++++++---= -------- > > hw/ppc/spapr_pci.c | 5 +- > > include/hw/pci-host/spapr.h | 8 ++- > > tests/endianness-test.c | 3 +- > > tests/libqos/pci-spapr.c | 9 ++-- > > tests/spapr-phb-test.c | 2 +- > > 6 files changed, 108 insertions(+), 40 deletions(-) > >=20 > > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c > > index 8db3657..4bdf15b 100644 > > --- a/hw/ppc/spapr.c > > +++ b/hw/ppc/spapr.c > > @@ -2375,31 +2375,37 @@ static void spapr_phb_placement(sPAPRMachineSta= te *spapr, uint32_t index, > > hwaddr *mmio32, hwaddr *mmio64, > > unsigned n_dma, uint32_t *liobns, Erro= r **errp) > > { > > + /* > > + * New-style PHB window placement. > > + * > > + * Goals: Gives large (1TiB), naturally aligned 64-bit MMIO window > > + * for each PHB, in addition to 2GiB 32-bit MMIO and 64kiB PIO > > + * windows. > > + * > > + * Some guest kernels can't work with MMIO windows above 1<<46 > > + * (64TiB), so we place up to 31 PHBs in the area 32TiB..64TiB > > + * > > + * 32TiB..33TiB contains the PIO and 32-bit MMIO windows for all > > + * PHBs. 33..34TiB has the 64-bit MMIO window for PHB0, 34..35 > > + * has the 64-bit window for PHB1 and so forth. > > + */ > > const uint64_t base_buid =3D 0x800000020000000ULL; > > - const hwaddr phb_spacing =3D 0x1000000000ULL; /* 64 GiB */ > > - const hwaddr mmio_offset =3D 0xa0000000; /* 2 GiB + 512 MiB */ > > - const hwaddr pio_offset =3D 0x80000000; /* 2 GiB */ > > - const uint32_t max_index =3D 255; > > - const hwaddr phb0_alignment =3D 0x10000000000ULL; /* 1 TiB */ > > - > > - uint64_t ram_top =3D MACHINE(spapr)->ram_size; > > - hwaddr phb0_base, phb_base; > > + const int max_phbs =3D > > + (SPAPR_PCI_LIMIT - SPAPR_PCI_BASE) / SPAPR_PCI_MEM64_WIN_SIZE = - 1; > > int i; > > =20 > > - /* Do we have hotpluggable memory? */ > > - if (MACHINE(spapr)->maxram_size > ram_top) { > > - /* Can't just use maxram_size, because there may be an > > - * alignment gap between normal and hotpluggable memory > > - * regions */ > > - ram_top =3D spapr->hotplug_memory.base + > > - memory_region_size(&spapr->hotplug_memory.mr); > > - } > > - > > - phb0_base =3D QEMU_ALIGN_UP(ram_top, phb0_alignment); > > + /* Sanity check natural alignments */ > > + QEMU_BUILD_BUG_ON((SPAPR_PCI_BASE % SPAPR_PCI_MEM64_WIN_SIZE) !=3D= 0); > > + QEMU_BUILD_BUG_ON((SPAPR_PCI_LIMIT % SPAPR_PCI_MEM64_WIN_SIZE) != =3D 0); > > + QEMU_BUILD_BUG_ON((SPAPR_PCI_MEM64_WIN_SIZE % SPAPR_PCI_MEM32_WIN_= SIZE) !=3D 0); > > + QEMU_BUILD_BUG_ON((SPAPR_PCI_MEM32_WIN_SIZE % SPAPR_PCI_IO_WIN_SIZ= E) !=3D 0); > > + /* Sanity check bounds */ > > + QEMU_BUILD_BUG_ON((max_phbs * SPAPR_PCI_IO_WIN_SIZE) > SPAPR_PCI_M= EM32_WIN_SIZE); > > + QEMU_BUILD_BUG_ON((max_phbs * SPAPR_PCI_MEM32_WIN_SIZE) > SPAPR_PC= I_MEM64_WIN_SIZE); > > =20 > > - if (index > max_index) { > > + if (index >=3D max_phbs) { > > error_setg(errp, "\"index\" for PAPR PHB is too large (max %u)= ", > > - max_index); > > + max_phbs - 1); > > return; > > } > > =20 > > @@ -2408,14 +2414,9 @@ static void spapr_phb_placement(sPAPRMachineStat= e *spapr, uint32_t index, > > liobns[i] =3D SPAPR_PCI_LIOBN(index, i); > > } > > =20 > > - phb_base =3D phb0_base + index * phb_spacing; > > - *pio =3D phb_base + pio_offset; > > - *mmio32 =3D phb_base + mmio_offset; > > - /* > > - * We don't set the 64-bit MMIO window, relying on the PHB's > > - * fallback behaviour of automatically splitting a large "32-bit" > > - * window into contiguous 32-bit and 64-bit windows > > - */ > > + *pio =3D SPAPR_PCI_BASE + index * SPAPR_PCI_IO_WIN_SIZE; > > + *mmio32 =3D SPAPR_PCI_BASE + (index + 1) * SPAPR_PCI_MEM32_WIN_SIZ= E; > > + *mmio64 =3D SPAPR_PCI_BASE + (index + 1) * SPAPR_PCI_MEM64_WIN_SIZ= E; > > } > > =20 > > static void spapr_machine_class_init(ObjectClass *oc, void *data) > > @@ -2519,8 +2520,67 @@ DEFINE_SPAPR_MACHINE(2_8, "2.8", true); > > /* > > * pseries-2.7 > > */ > > -#define SPAPR_COMPAT_2_7 \ > > - HW_COMPAT_2_7 \ > > +#define SPAPR_COMPAT_2_7 \ > > + HW_COMPAT_2_7 \ > > + { \ > > + .driver =3D TYPE_SPAPR_PCI_HOST_BRIDGE, \ > > + .property =3D "mem_win_size", \ > > + .value =3D stringify(SPAPR_PCI_2_7_MMIO_WIN_SIZE),\ > > + }, \ > > + { \ > > + .driver =3D TYPE_SPAPR_PCI_HOST_BRIDGE, \ > > + .property =3D "mem64_win_size", \ > > + .value =3D "0", \ > > + }, > > + > > +static void phb_placement_2_7(sPAPRMachineState *spapr, uint32_t index, > > + uint64_t *buid, hwaddr *pio, > > + hwaddr *mmio32, hwaddr *mmio64, > > + unsigned n_dma, uint32_t *liobns, Error = **errp) > > +{ > > + /* Legacy PHB placement for pseries-2.7 and earlier machine types = */ > > + const uint64_t base_buid =3D 0x800000020000000ULL; > > + const hwaddr phb_spacing =3D 0x1000000000ULL; /* 64 GiB */ > > + const hwaddr mmio_offset =3D 0xa0000000; /* 2 GiB + 512 MiB */ > > + const hwaddr pio_offset =3D 0x80000000; /* 2 GiB */ > > + const uint32_t max_index =3D 255; > > + const hwaddr phb0_alignment =3D 0x10000000000ULL; /* 1 TiB */ > > + > > + uint64_t ram_top =3D MACHINE(spapr)->ram_size; > > + hwaddr phb0_base, phb_base; > > + int i; > > + > > + /* Do we have hotpluggable memory? */ > > + if (MACHINE(spapr)->maxram_size > ram_top) { > > + /* Can't just use maxram_size, because there may be an > > + * alignment gap between normal and hotpluggable memory > > + * regions */ > > + ram_top =3D spapr->hotplug_memory.base + > > + memory_region_size(&spapr->hotplug_memory.mr); > > + } > > + > > + phb0_base =3D QEMU_ALIGN_UP(ram_top, phb0_alignment); > > + > > + if (index > max_index) { > > + error_setg(errp, "\"index\" for PAPR PHB is too large (max %u)= ", > > + max_index); > > + return; > > + } > > + > > + *buid =3D base_buid + index; > > + for (i =3D 0; i < n_dma; ++i) { > > + liobns[i] =3D SPAPR_PCI_LIOBN(index, i); > > + } > > + > > + phb_base =3D phb0_base + index * phb_spacing; > > + *pio =3D phb_base + pio_offset; > > + *mmio32 =3D phb_base + mmio_offset; > > + /* > > + * We don't set the 64-bit MMIO window, relying on the PHB's > > + * fallback behaviour of automatically splitting a large "32-bit" > > + * window into contiguous 32-bit and 64-bit windows > > + */ > > +} > > =20 > > static void spapr_machine_2_7_instance_options(MachineState *machine) > > { > > @@ -2533,6 +2593,7 @@ static void spapr_machine_2_7_class_options(Machi= neClass *mc) > > spapr_machine_2_8_class_options(mc); > > smc->tcg_default_cpu =3D "POWER7"; > > SET_MACHINE_COMPAT(mc, SPAPR_COMPAT_2_7); > > + smc->phb_placement =3D phb_placement_2_7; > > } > > =20 > > DEFINE_SPAPR_MACHINE(2_7, "2.7", false); > > diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c > > index 31ca6fa..3ef6a26 100644 > > --- a/hw/ppc/spapr_pci.c > > +++ b/hw/ppc/spapr_pci.c > > @@ -1565,9 +1565,10 @@ static Property spapr_phb_properties[] =3D { > > DEFINE_PROP_UINT32("liobn64", sPAPRPHBState, dma_liobn[1], -1), > > DEFINE_PROP_UINT64("mem_win_addr", sPAPRPHBState, mem_win_addr, -1= ), > > DEFINE_PROP_UINT64("mem_win_size", sPAPRPHBState, mem_win_size, > > - SPAPR_PCI_MMIO_WIN_SIZE), > > + SPAPR_PCI_MEM32_WIN_SIZE), > > DEFINE_PROP_UINT64("mem64_win_addr", sPAPRPHBState, mem64_win_addr= , -1), > > - DEFINE_PROP_UINT64("mem64_win_size", sPAPRPHBState, mem64_win_size= , 0), > > + DEFINE_PROP_UINT64("mem64_win_size", sPAPRPHBState, mem64_win_size, > > + SPAPR_PCI_MEM64_WIN_SIZE), > > DEFINE_PROP_UINT64("mem64_win_pciaddr", sPAPRPHBState, mem64_win_p= ciaddr, > > -1), > > DEFINE_PROP_UINT64("io_win_addr", sPAPRPHBState, io_win_addr, -1), > > diff --git a/include/hw/pci-host/spapr.h b/include/hw/pci-host/spapr.h > > index 23dfb09..b92c1b5 100644 > > --- a/include/hw/pci-host/spapr.h > > +++ b/include/hw/pci-host/spapr.h > > @@ -84,8 +84,14 @@ struct sPAPRPHBState { > > #define SPAPR_PCI_MEM_WIN_BUS_OFFSET 0x80000000ULL > > #define SPAPR_PCI_MEM32_WIN_SIZE \ > > ((1ULL << 32) - SPAPR_PCI_MEM_WIN_BUS_OFFSET) > > +#define SPAPR_PCI_MEM64_WIN_SIZE 0x10000000000ULL /* 1 TiB */ > > =20 > > -#define SPAPR_PCI_MMIO_WIN_SIZE 0xf80000000 > > +/* Without manual configuration, all PCI outbound windows will be > > + * within this range */ > > +#define SPAPR_PCI_BASE (1ULL << 45) /* 32 TiB */ > > +#define SPAPR_PCI_LIMIT (1ULL << 46) /* 64 TiB */ > > + > > +#define SPAPR_PCI_2_7_MMIO_WIN_SIZE 0xf80000000 > > #define SPAPR_PCI_IO_WIN_SIZE 0x10000 > > =20 > > #define SPAPR_PCI_MSI_WINDOW 0x40000000000ULL > > diff --git a/tests/endianness-test.c b/tests/endianness-test.c > > index b7a120e..cf8d41b 100644 > > --- a/tests/endianness-test.c > > +++ b/tests/endianness-test.c > > @@ -38,7 +38,8 @@ static const TestCase test_cases[] =3D { > > { "ppc", "prep", 0x80000000, .bswap =3D true }, > > { "ppc", "bamboo", 0xe8000000, .bswap =3D true, .superio =3D "i823= 78" }, > > { "ppc64", "mac99", 0xf2000000, .bswap =3D true, .superio =3D "i82= 378" }, > > - { "ppc64", "pseries", 0x10080000000ULL, > > + { "ppc64", "pseries", (1ULL << 45), .bswap =3D true, .superio =3D = "i82378" }, > > + { "ppc64", "pseries-2.7", 0x10080000000ULL, > > .bswap =3D true, .superio =3D "i82378" }, > > { "sh4", "r2d", 0xfe240000, .superio =3D "i82378" }, > > { "sh4eb", "r2d", 0xfe240000, .bswap =3D true, .superio =3D "i8237= 8" }, > > diff --git a/tests/libqos/pci-spapr.c b/tests/libqos/pci-spapr.c > > index 558dfc3..2eaaf91 100644 > > --- a/tests/libqos/pci-spapr.c > > +++ b/tests/libqos/pci-spapr.c > > @@ -235,10 +235,9 @@ static void qpci_spapr_iounmap(QPCIBus *bus, void = *data) > > /* FIXME */ > > } > > =20 > > -#define SPAPR_PCI_WINDOW_BASE 0x10000000000ULL > > -#define SPAPR_PCI_MMIO32_WIN_OFF 0xA0000000 > > +#define SPAPR_PCI_BASE (1ULL << 45) > > + > > #define SPAPR_PCI_MMIO32_WIN_SIZE 0x80000000 /* 2 GiB */ > > -#define SPAPR_PCI_IO_WIN_OFF 0x80000000 > > #define SPAPR_PCI_IO_WIN_SIZE 0x10000 > > =20 > > QPCIBus *qpci_init_spapr(QGuestAllocator *alloc) > > @@ -273,12 +272,12 @@ QPCIBus *qpci_init_spapr(QGuestAllocator *alloc) > > * get the window locations */ > > ret->buid =3D 0x800000020000000ULL; > > =20 > > - ret->pio_cpu_base =3D SPAPR_PCI_WINDOW_BASE + SPAPR_PCI_IO_WIN_OFF; > > + ret->pio_cpu_base =3D SPAPR_PCI_BASE; > > ret->pio.pci_base =3D 0; > > ret->pio.size =3D SPAPR_PCI_IO_WIN_SIZE; > > =20 > > /* 32-bit portion of the MMIO window is at PCI address 2..4 GiB */ > > - ret->mmio32_cpu_base =3D SPAPR_PCI_WINDOW_BASE + SPAPR_PCI_MMIO32_= WIN_OFF; > > + ret->mmio32_cpu_base =3D SPAPR_PCI_BASE + SPAPR_PCI_MMIO32_WIN_SIZ= E; > > ret->mmio32.pci_base =3D 0x80000000; /* 2 GiB */ > > ret->mmio32.size =3D SPAPR_PCI_MMIO32_WIN_SIZE; > > =20 > > diff --git a/tests/spapr-phb-test.c b/tests/spapr-phb-test.c > > index 21004a7..d3522ea 100644 > > --- a/tests/spapr-phb-test.c > > +++ b/tests/spapr-phb-test.c > > @@ -25,7 +25,7 @@ int main(int argc, char **argv) > > g_test_init(&argc, &argv, NULL); > > qtest_add_func("/spapr-phb/device", test_phb_device); > > =20 > > - qtest_start("-device " TYPE_SPAPR_PCI_HOST_BRIDGE ",index=3D100"); > > + qtest_start("-device " TYPE_SPAPR_PCI_HOST_BRIDGE ",index=3D30"); > > =20 > > ret =3D g_test_run(); > > =20 > >=20 >=20 --=20 David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson --QVzQgM+zdZ3YWXqn Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQIcBAEBCAAGBQJYAtKfAAoJEGw4ysog2bOSvmIP/3b8BJsgejWfH4czwowAyvWP R4oDAZG2fUpckpvjkhdTeAjB3GwCqRrktS1ld4TJz/j93zNgvUYdM8MeDTvo6WZY iVv8BYqrTgcn2WPVSvoW4VQ8UgOYLPWTqTmu6vdXPCPcLbHEHQgWnKRZLRo2r+7i ssePmxWTlLbFf81XwuY/LooXmRIH+vcQ+Mz/T+242+oFmH1BGgQrUB5WR4bEDyjg IH76ITCHPq8fGaSf4l/VKMNJRNKt2fqwyJu5xnjqtvR1buIkqgS+6za7bWRuCC81 mxD1YEtOYztXXyU8G1m34x1wrwrh8nuYRhtXeOmMnRxRkP9v5OPIunE3q2WHvmFl s+rkERJGgGvF+6H9iaozqxiPI/hOtvIjoJ3n9N4RQ/hMUqREq5cuIRIk7/33ekLH pjeeEGsDQ2hkP5OmDZKQp7FyAo6dRsvn+MfTmd5GuD24sV2CXcOgT7LYShNYJaTd KW80OB+/roRIQ0dDQyy2mxTpkB6wRFsY+mPKRV+jnctQQ1DeGBBlAkAMWGzOPFay RVduFFY9MC7xt/aL6QbhwFPN3p5gBp4wvgp51Ale+pziLHlrLwpo+fjXrJSCjlzE gnm54dkFpsbxwdMs4l5zHiTNb4MblqfazFAOFirCN8L/AdBW6B5UgqzEbwF01B7z 5u8tN/c8kJBJkurP+KKp =BAPW -----END PGP SIGNATURE----- --QVzQgM+zdZ3YWXqn--