From: David Gibson <david@gibson.dropbear.id.au>
To: Laurent Vivier <lvivier@redhat.com>
Cc: agraf@suse.de, mst@redhat.com, abologna@redhat.com,
aik@ozlabs.ru, qemu-devel@nongnu.org, qemu-ppc@nongnu.org,
mdroth@linux.vnet.ibm.com, benh@kernel.crashing.org
Subject: Re: [Qemu-devel] [PATCHv3 7/7] spapr: Improved placement of PCI host bridges in guest memory map
Date: Sun, 16 Oct 2016 12:06:42 +1100 [thread overview]
Message-ID: <20161016010642.GW28562@umbus> (raw)
In-Reply-To: <8796bab4-44a7-739b-9b02-4e8bedb9f684@redhat.com>
[-- Attachment #1: Type: text/plain, Size: 15312 bytes --]
On Fri, Oct 14, 2016 at 09:25:58AM +0200, Laurent Vivier wrote:
> On 14/10/2016 01:29, David Gibson wrote:
> > From 98f07c09c6d72218800d6cfbf44b973a88ece2aa Mon Sep 17 00:00:00 2001
> > From: David Gibson <david@gibson.dropbear.id.au>
> > Date: Fri, 14 Oct 2016 10:21:00 +1100
> > Subject: [PATCH] spapr: Improved placement of PCI host bridges in guest memory
> > map
> >
> > Currently, the MMIO space for accessing PCI on pseries guests begins at
> > 1 TiB in guest address space. Each PCI host bridge (PHB) has a 64 GiB
> > chunk of address space in which it places its outbound PIO and 32-bit and
> > 64-bit MMIO windows.
> >
> > This scheme as several problems:
> > - It limits guest RAM to 1 TiB (though we have a limited fix for this
> > now)
> > - It limits the total MMIO window to 64 GiB. This is not always enough
> > for some of the large nVidia GPGPU cards
> > - Putting all the windows into a single 64 GiB area means that naturally
> > aligning things within there will waste more address space.
> > In addition there was a miscalculation in some of the defaults, which meant
> > that the MMIO windows for each PHB actually slightly overran the 64 GiB
> > region for that PHB. We got away without nasty consequences because
> > the overrun fit within an unused area at the beginning of the next PHB's
> > region, but it's not pretty.
> >
> > This patch implements a new scheme which addresses those problems, and is
> > also closer to what bare metal hardware and pHyp guests generally use.
> >
> > Because some guest versions (including most current distro kernels) can't
> > access PCI MMIO above 64 TiB, we put all the PCI windows between 32 TiB and
> > 64 TiB. This is broken into 1 TiB chunks. The 1 TiB contains the PIO
> > (64 kiB) and 32-bit MMIO (2 GiB) windows for all of the PHBs. Each
> > subsequent TiB chunk contains a naturally aligned 64-bit MMIO window for
> > one PHB each.
> >
> > This reduces the number of allowed PHBs (without full manual configuration
> > of all the windows) from 256 to 31, but this should still be plenty in
> > practice.
> >
> > We also change some of the default window sizes for manually configured
> > PHBs to saner values.
> >
> > Finally we adjust some tests and libqos so that it correctly uses the new
> > default locations. Ideally it would parse the device tree given to the
> > guest, but that's a more complex problem for another time.
> >
> > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
>
>
> I really like this new version.
>
> Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Great! I've merged it into ppc-for-2.8.
>
> Laurent
>
> > ---
> > hw/ppc/spapr.c | 121 +++++++++++++++++++++++++++++++++-----------
> > hw/ppc/spapr_pci.c | 5 +-
> > include/hw/pci-host/spapr.h | 8 ++-
> > tests/endianness-test.c | 3 +-
> > tests/libqos/pci-spapr.c | 9 ++--
> > tests/spapr-phb-test.c | 2 +-
> > 6 files changed, 108 insertions(+), 40 deletions(-)
> >
> > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > index 8db3657..4bdf15b 100644
> > --- a/hw/ppc/spapr.c
> > +++ b/hw/ppc/spapr.c
> > @@ -2375,31 +2375,37 @@ static void spapr_phb_placement(sPAPRMachineState *spapr, uint32_t index,
> > hwaddr *mmio32, hwaddr *mmio64,
> > unsigned n_dma, uint32_t *liobns, Error **errp)
> > {
> > + /*
> > + * New-style PHB window placement.
> > + *
> > + * Goals: Gives large (1TiB), naturally aligned 64-bit MMIO window
> > + * for each PHB, in addition to 2GiB 32-bit MMIO and 64kiB PIO
> > + * windows.
> > + *
> > + * Some guest kernels can't work with MMIO windows above 1<<46
> > + * (64TiB), so we place up to 31 PHBs in the area 32TiB..64TiB
> > + *
> > + * 32TiB..33TiB contains the PIO and 32-bit MMIO windows for all
> > + * PHBs. 33..34TiB has the 64-bit MMIO window for PHB0, 34..35
> > + * has the 64-bit window for PHB1 and so forth.
> > + */
> > const uint64_t base_buid = 0x800000020000000ULL;
> > - const hwaddr phb_spacing = 0x1000000000ULL; /* 64 GiB */
> > - const hwaddr mmio_offset = 0xa0000000; /* 2 GiB + 512 MiB */
> > - const hwaddr pio_offset = 0x80000000; /* 2 GiB */
> > - const uint32_t max_index = 255;
> > - const hwaddr phb0_alignment = 0x10000000000ULL; /* 1 TiB */
> > -
> > - uint64_t ram_top = MACHINE(spapr)->ram_size;
> > - hwaddr phb0_base, phb_base;
> > + const int max_phbs =
> > + (SPAPR_PCI_LIMIT - SPAPR_PCI_BASE) / SPAPR_PCI_MEM64_WIN_SIZE - 1;
> > int i;
> >
> > - /* Do we have hotpluggable memory? */
> > - if (MACHINE(spapr)->maxram_size > ram_top) {
> > - /* Can't just use maxram_size, because there may be an
> > - * alignment gap between normal and hotpluggable memory
> > - * regions */
> > - ram_top = spapr->hotplug_memory.base +
> > - memory_region_size(&spapr->hotplug_memory.mr);
> > - }
> > -
> > - phb0_base = QEMU_ALIGN_UP(ram_top, phb0_alignment);
> > + /* Sanity check natural alignments */
> > + QEMU_BUILD_BUG_ON((SPAPR_PCI_BASE % SPAPR_PCI_MEM64_WIN_SIZE) != 0);
> > + QEMU_BUILD_BUG_ON((SPAPR_PCI_LIMIT % SPAPR_PCI_MEM64_WIN_SIZE) != 0);
> > + QEMU_BUILD_BUG_ON((SPAPR_PCI_MEM64_WIN_SIZE % SPAPR_PCI_MEM32_WIN_SIZE) != 0);
> > + QEMU_BUILD_BUG_ON((SPAPR_PCI_MEM32_WIN_SIZE % SPAPR_PCI_IO_WIN_SIZE) != 0);
> > + /* Sanity check bounds */
> > + QEMU_BUILD_BUG_ON((max_phbs * SPAPR_PCI_IO_WIN_SIZE) > SPAPR_PCI_MEM32_WIN_SIZE);
> > + QEMU_BUILD_BUG_ON((max_phbs * SPAPR_PCI_MEM32_WIN_SIZE) > SPAPR_PCI_MEM64_WIN_SIZE);
> >
> > - if (index > max_index) {
> > + if (index >= max_phbs) {
> > error_setg(errp, "\"index\" for PAPR PHB is too large (max %u)",
> > - max_index);
> > + max_phbs - 1);
> > return;
> > }
> >
> > @@ -2408,14 +2414,9 @@ static void spapr_phb_placement(sPAPRMachineState *spapr, uint32_t index,
> > liobns[i] = SPAPR_PCI_LIOBN(index, i);
> > }
> >
> > - phb_base = phb0_base + index * phb_spacing;
> > - *pio = phb_base + pio_offset;
> > - *mmio32 = phb_base + mmio_offset;
> > - /*
> > - * We don't set the 64-bit MMIO window, relying on the PHB's
> > - * fallback behaviour of automatically splitting a large "32-bit"
> > - * window into contiguous 32-bit and 64-bit windows
> > - */
> > + *pio = SPAPR_PCI_BASE + index * SPAPR_PCI_IO_WIN_SIZE;
> > + *mmio32 = SPAPR_PCI_BASE + (index + 1) * SPAPR_PCI_MEM32_WIN_SIZE;
> > + *mmio64 = SPAPR_PCI_BASE + (index + 1) * SPAPR_PCI_MEM64_WIN_SIZE;
> > }
> >
> > static void spapr_machine_class_init(ObjectClass *oc, void *data)
> > @@ -2519,8 +2520,67 @@ DEFINE_SPAPR_MACHINE(2_8, "2.8", true);
> > /*
> > * pseries-2.7
> > */
> > -#define SPAPR_COMPAT_2_7 \
> > - HW_COMPAT_2_7 \
> > +#define SPAPR_COMPAT_2_7 \
> > + HW_COMPAT_2_7 \
> > + { \
> > + .driver = TYPE_SPAPR_PCI_HOST_BRIDGE, \
> > + .property = "mem_win_size", \
> > + .value = stringify(SPAPR_PCI_2_7_MMIO_WIN_SIZE),\
> > + }, \
> > + { \
> > + .driver = TYPE_SPAPR_PCI_HOST_BRIDGE, \
> > + .property = "mem64_win_size", \
> > + .value = "0", \
> > + },
> > +
> > +static void phb_placement_2_7(sPAPRMachineState *spapr, uint32_t index,
> > + uint64_t *buid, hwaddr *pio,
> > + hwaddr *mmio32, hwaddr *mmio64,
> > + unsigned n_dma, uint32_t *liobns, Error **errp)
> > +{
> > + /* Legacy PHB placement for pseries-2.7 and earlier machine types */
> > + const uint64_t base_buid = 0x800000020000000ULL;
> > + const hwaddr phb_spacing = 0x1000000000ULL; /* 64 GiB */
> > + const hwaddr mmio_offset = 0xa0000000; /* 2 GiB + 512 MiB */
> > + const hwaddr pio_offset = 0x80000000; /* 2 GiB */
> > + const uint32_t max_index = 255;
> > + const hwaddr phb0_alignment = 0x10000000000ULL; /* 1 TiB */
> > +
> > + uint64_t ram_top = MACHINE(spapr)->ram_size;
> > + hwaddr phb0_base, phb_base;
> > + int i;
> > +
> > + /* Do we have hotpluggable memory? */
> > + if (MACHINE(spapr)->maxram_size > ram_top) {
> > + /* Can't just use maxram_size, because there may be an
> > + * alignment gap between normal and hotpluggable memory
> > + * regions */
> > + ram_top = spapr->hotplug_memory.base +
> > + memory_region_size(&spapr->hotplug_memory.mr);
> > + }
> > +
> > + phb0_base = QEMU_ALIGN_UP(ram_top, phb0_alignment);
> > +
> > + if (index > max_index) {
> > + error_setg(errp, "\"index\" for PAPR PHB is too large (max %u)",
> > + max_index);
> > + return;
> > + }
> > +
> > + *buid = base_buid + index;
> > + for (i = 0; i < n_dma; ++i) {
> > + liobns[i] = SPAPR_PCI_LIOBN(index, i);
> > + }
> > +
> > + phb_base = phb0_base + index * phb_spacing;
> > + *pio = phb_base + pio_offset;
> > + *mmio32 = phb_base + mmio_offset;
> > + /*
> > + * We don't set the 64-bit MMIO window, relying on the PHB's
> > + * fallback behaviour of automatically splitting a large "32-bit"
> > + * window into contiguous 32-bit and 64-bit windows
> > + */
> > +}
> >
> > static void spapr_machine_2_7_instance_options(MachineState *machine)
> > {
> > @@ -2533,6 +2593,7 @@ static void spapr_machine_2_7_class_options(MachineClass *mc)
> > spapr_machine_2_8_class_options(mc);
> > smc->tcg_default_cpu = "POWER7";
> > SET_MACHINE_COMPAT(mc, SPAPR_COMPAT_2_7);
> > + smc->phb_placement = phb_placement_2_7;
> > }
> >
> > DEFINE_SPAPR_MACHINE(2_7, "2.7", false);
> > diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
> > index 31ca6fa..3ef6a26 100644
> > --- a/hw/ppc/spapr_pci.c
> > +++ b/hw/ppc/spapr_pci.c
> > @@ -1565,9 +1565,10 @@ static Property spapr_phb_properties[] = {
> > DEFINE_PROP_UINT32("liobn64", sPAPRPHBState, dma_liobn[1], -1),
> > DEFINE_PROP_UINT64("mem_win_addr", sPAPRPHBState, mem_win_addr, -1),
> > DEFINE_PROP_UINT64("mem_win_size", sPAPRPHBState, mem_win_size,
> > - SPAPR_PCI_MMIO_WIN_SIZE),
> > + SPAPR_PCI_MEM32_WIN_SIZE),
> > DEFINE_PROP_UINT64("mem64_win_addr", sPAPRPHBState, mem64_win_addr, -1),
> > - DEFINE_PROP_UINT64("mem64_win_size", sPAPRPHBState, mem64_win_size, 0),
> > + DEFINE_PROP_UINT64("mem64_win_size", sPAPRPHBState, mem64_win_size,
> > + SPAPR_PCI_MEM64_WIN_SIZE),
> > DEFINE_PROP_UINT64("mem64_win_pciaddr", sPAPRPHBState, mem64_win_pciaddr,
> > -1),
> > DEFINE_PROP_UINT64("io_win_addr", sPAPRPHBState, io_win_addr, -1),
> > diff --git a/include/hw/pci-host/spapr.h b/include/hw/pci-host/spapr.h
> > index 23dfb09..b92c1b5 100644
> > --- a/include/hw/pci-host/spapr.h
> > +++ b/include/hw/pci-host/spapr.h
> > @@ -84,8 +84,14 @@ struct sPAPRPHBState {
> > #define SPAPR_PCI_MEM_WIN_BUS_OFFSET 0x80000000ULL
> > #define SPAPR_PCI_MEM32_WIN_SIZE \
> > ((1ULL << 32) - SPAPR_PCI_MEM_WIN_BUS_OFFSET)
> > +#define SPAPR_PCI_MEM64_WIN_SIZE 0x10000000000ULL /* 1 TiB */
> >
> > -#define SPAPR_PCI_MMIO_WIN_SIZE 0xf80000000
> > +/* Without manual configuration, all PCI outbound windows will be
> > + * within this range */
> > +#define SPAPR_PCI_BASE (1ULL << 45) /* 32 TiB */
> > +#define SPAPR_PCI_LIMIT (1ULL << 46) /* 64 TiB */
> > +
> > +#define SPAPR_PCI_2_7_MMIO_WIN_SIZE 0xf80000000
> > #define SPAPR_PCI_IO_WIN_SIZE 0x10000
> >
> > #define SPAPR_PCI_MSI_WINDOW 0x40000000000ULL
> > diff --git a/tests/endianness-test.c b/tests/endianness-test.c
> > index b7a120e..cf8d41b 100644
> > --- a/tests/endianness-test.c
> > +++ b/tests/endianness-test.c
> > @@ -38,7 +38,8 @@ static const TestCase test_cases[] = {
> > { "ppc", "prep", 0x80000000, .bswap = true },
> > { "ppc", "bamboo", 0xe8000000, .bswap = true, .superio = "i82378" },
> > { "ppc64", "mac99", 0xf2000000, .bswap = true, .superio = "i82378" },
> > - { "ppc64", "pseries", 0x10080000000ULL,
> > + { "ppc64", "pseries", (1ULL << 45), .bswap = true, .superio = "i82378" },
> > + { "ppc64", "pseries-2.7", 0x10080000000ULL,
> > .bswap = true, .superio = "i82378" },
> > { "sh4", "r2d", 0xfe240000, .superio = "i82378" },
> > { "sh4eb", "r2d", 0xfe240000, .bswap = true, .superio = "i82378" },
> > diff --git a/tests/libqos/pci-spapr.c b/tests/libqos/pci-spapr.c
> > index 558dfc3..2eaaf91 100644
> > --- a/tests/libqos/pci-spapr.c
> > +++ b/tests/libqos/pci-spapr.c
> > @@ -235,10 +235,9 @@ static void qpci_spapr_iounmap(QPCIBus *bus, void *data)
> > /* FIXME */
> > }
> >
> > -#define SPAPR_PCI_WINDOW_BASE 0x10000000000ULL
> > -#define SPAPR_PCI_MMIO32_WIN_OFF 0xA0000000
> > +#define SPAPR_PCI_BASE (1ULL << 45)
> > +
> > #define SPAPR_PCI_MMIO32_WIN_SIZE 0x80000000 /* 2 GiB */
> > -#define SPAPR_PCI_IO_WIN_OFF 0x80000000
> > #define SPAPR_PCI_IO_WIN_SIZE 0x10000
> >
> > QPCIBus *qpci_init_spapr(QGuestAllocator *alloc)
> > @@ -273,12 +272,12 @@ QPCIBus *qpci_init_spapr(QGuestAllocator *alloc)
> > * get the window locations */
> > ret->buid = 0x800000020000000ULL;
> >
> > - ret->pio_cpu_base = SPAPR_PCI_WINDOW_BASE + SPAPR_PCI_IO_WIN_OFF;
> > + ret->pio_cpu_base = SPAPR_PCI_BASE;
> > ret->pio.pci_base = 0;
> > ret->pio.size = SPAPR_PCI_IO_WIN_SIZE;
> >
> > /* 32-bit portion of the MMIO window is at PCI address 2..4 GiB */
> > - ret->mmio32_cpu_base = SPAPR_PCI_WINDOW_BASE + SPAPR_PCI_MMIO32_WIN_OFF;
> > + ret->mmio32_cpu_base = SPAPR_PCI_BASE + SPAPR_PCI_MMIO32_WIN_SIZE;
> > ret->mmio32.pci_base = 0x80000000; /* 2 GiB */
> > ret->mmio32.size = SPAPR_PCI_MMIO32_WIN_SIZE;
> >
> > diff --git a/tests/spapr-phb-test.c b/tests/spapr-phb-test.c
> > index 21004a7..d3522ea 100644
> > --- a/tests/spapr-phb-test.c
> > +++ b/tests/spapr-phb-test.c
> > @@ -25,7 +25,7 @@ int main(int argc, char **argv)
> > g_test_init(&argc, &argv, NULL);
> > qtest_add_func("/spapr-phb/device", test_phb_device);
> >
> > - qtest_start("-device " TYPE_SPAPR_PCI_HOST_BRIDGE ",index=100");
> > + qtest_start("-device " TYPE_SPAPR_PCI_HOST_BRIDGE ",index=30");
> >
> > ret = g_test_run();
> >
> >
>
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]
next prev parent reply other threads:[~2016-10-16 1:22 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-10-12 23:57 [Qemu-devel] [PATCHv3 0/7] Improve PCI IO window orgnaization for pseries David Gibson
2016-10-12 23:57 ` [Qemu-devel] [PATCHv3 1/7] libqos: Isolate knowledge of spapr memory map to qpci_init_spapr() David Gibson
2016-10-12 23:57 ` [Qemu-devel] [PATCHv3 2/7] libqos: Correct error in PCI hole sizing for spapr David Gibson
2016-10-12 23:57 ` [Qemu-devel] [PATCHv3 3/7] libqos: Limit spapr-pci to 32-bit MMIO for now David Gibson
2016-10-12 23:57 ` [Qemu-devel] [PATCHv3 4/7] spapr_pci: Delegate placement of PCI host bridges to machine type David Gibson
2016-10-13 7:33 ` Laurent Vivier
2016-10-12 23:57 ` [Qemu-devel] [PATCHv3 5/7] spapr: Adjust placement of PCI host bridge to allow > 1TiB RAM David Gibson
2016-10-13 7:35 ` Laurent Vivier
2016-10-12 23:57 ` [Qemu-devel] [PATCHv3 6/7] spapr_pci: Add a 64-bit MMIO window David Gibson
2016-10-13 8:06 ` Laurent Vivier
2016-10-12 23:57 ` [Qemu-devel] [PATCHv3 7/7] spapr: Improved placement of PCI host bridges in guest memory map David Gibson
2016-10-13 8:40 ` Laurent Vivier
2016-10-13 23:29 ` David Gibson
2016-10-14 7:25 ` Laurent Vivier
2016-10-16 1:06 ` David Gibson [this message]
2016-10-14 4:10 ` [Qemu-devel] [PATCHv3 0/7] Improve PCI IO window orgnaization for pseries no-reply
2016-10-14 5:41 ` David Gibson
2016-10-14 7:07 ` Laurent Vivier
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20161016010642.GW28562@umbus \
--to=david@gibson.dropbear.id.au \
--cc=abologna@redhat.com \
--cc=agraf@suse.de \
--cc=aik@ozlabs.ru \
--cc=benh@kernel.crashing.org \
--cc=lvivier@redhat.com \
--cc=mdroth@linux.vnet.ibm.com \
--cc=mst@redhat.com \
--cc=qemu-devel@nongnu.org \
--cc=qemu-ppc@nongnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.