qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: David Gibson <david@gibson.dropbear.id.au>
To: Alexey Kardashevskiy <aik@ozlabs.ru>
Cc: peter.maydell@linaro.org, agraf@suse.de, qemu-ppc@nongnu.org,
	qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [PULL 15/16] spapr_pci: Add a 64-bit MMIO window
Date: Tue, 8 Nov 2016 12:16:45 +1100	[thread overview]
Message-ID: <20161108011645.GC28688@umbus.fritz.box> (raw)
In-Reply-To: <f0d1ad47-00f7-3887-f4cd-d4fd79c455cf@ozlabs.ru>

[-- Attachment #1: Type: text/plain, Size: 15482 bytes --]

On Fri, Nov 04, 2016 at 04:03:31PM +1100, Alexey Kardashevskiy wrote:
> On 17/10/16 13:43, David Gibson wrote:
> > On real hardware, and under pHyp, the PCI host bridges on Power machines
> > typically advertise two outbound MMIO windows from the guest's physical
> > memory space to PCI memory space:
> >   - A 32-bit window which maps onto 2GiB..4GiB in the PCI address space
> >   - A 64-bit window which maps onto a large region somewhere high in PCI
> >     address space (traditionally this used an identity mapping from guest
> >     physical address to PCI address, but that's not always the case)
> > 
> > The qemu implementation in spapr-pci-host-bridge, however, only supports a
> > single outbound MMIO window, however.  At least some Linux versions expect
> > the two windows however, so we arranged this window to map onto the PCI
> > memory space from 2 GiB..~64 GiB, then advertised it as two contiguous
> > windows, the "32-bit" window from 2G..4G and the "64-bit" window from
> > 4G..~64G.
> > 
> > This approach means, however, that the 64G window is not naturally aligned.
> > In turn this limits the size of the largest BAR we can map (which does have
> > to be naturally aligned) to roughly half of the total window.  With some
> > large nVidia GPGPU cards which have huge memory BARs, this is starting to
> > be a problem.
> > 
> > This patch adds true support for separate 32-bit and 64-bit outbound MMIO
> > windows to the spapr-pci-host-bridge implementation, each of which can
> > be independently configured.  The 32-bit window always maps to 2G.. in PCI
> > space, but the PCI address of the 64-bit window can be configured (it
> > defaults to the same as the guest physical address).
> > 
> > So as not to break possible existing configurations, as long as a 64-bit
> > window is not specified, a large single window can be specified.  This
> > will appear the same way to the guest as the old approach, although it's
> > now implemented by two contiguous memory regions rather than a single one.
> > 
> > For now, this only adds the possibility of 64-bit windows.  The default
> > configuration still uses the legacy mode.
> 
> 
> This breaks migration to QEMU v2.7, the destination reports:
> 
> 22901@1478235261.799031:vmstate_load spapr_pci, spapr_pci
> 22901@1478235261.799040:vmstate_load_field_error field "mem_win_size" load
> failed, ret = -22
> qemu-hostos1: error while loading state for instance 0x0 of device 'spapr_pci'
> 22901@1478235261.801324:migrate_set_state new state 7
> qemu-hostos1: load of migration failed: Invalid argument
> 
> 
> mem_win_size decreased from 0xf80000000 to 0x80000000.
> 
> I'd think it should be allowed to migrate like this.

AIUI, we don't generally care (upstream) about migration from newer to
older qemu, only from older to newer.  Trying to maintain backwards
migration makes it almost impossible to fix anything at all, ever.

> 
> 
> The source PHB is:
> 
> (qemu) info qtree
> bus: main-system-bus
>   type System
>   dev: spapr-pci-host-bridge, id ""
>     index = 0 (0x0)
>     buid = 576460752840294400 (0x800000020000000)
>     liobn = 2147483648 (0x80000000)
>     liobn64 = 4294967295 (0xffffffff)
>     mem_win_addr = 1102195982336 (0x100a0000000)
>     mem_win_size = 2147483648 (0x80000000)
>     mem64_win_addr = 1104343465984 (0x10120000000)
>     mem64_win_size = 64424509440 (0xf00000000)
>     mem64_win_pciaddr = 4294967296 (0x100000000)
> 
> 
> The destination PHB is:
> 
> (qemu) info qtree
> bus: main-system-bus
>   type System
>   dev: spapr-pci-host-bridge, id ""
>     index = 0 (0x0)
>     buid = 576460752840294400 (0x800000020000000)
>     liobn = 2147483648 (0x80000000)
>     liobn64 = 4294967295 (0xffffffff)
>     mem_win_addr = 1102195982336 (0x100a0000000)
>     mem_win_size = 66571993088 (0xf80000000)
> 
> 
> 
> The source QEMU cmdline:
> 
> /home/aik/p/qemu/ppc64-softmmu/qemu-system-ppc64 -nodefaults \
> -chardev stdio,id=STDIO0,signal=off,mux=on \
> -device spapr-vty,id=svty0,chardev=STDIO0,reg=0x71000100 \
> -mon id=MON0,chardev=STDIO0,mode=readline -nographic -vga none \
> -kernel /home/aik/t/vml450le \
> -initrd /home/aik/t/le.cpio -m 4G \
> -machine pseries-2.6 -enable-kvm \
> 
> 
> The destination (./qemu-hostos1 is v2.7.0 from
> https://github.com/open-power-host-os/qemu/commits/hostos-stable )
> 
> ./qemu-hostos1 -nodefaults \
> -chardev stdio,id=STDIO0,signal=off,mux=on \
> -device spapr-vty,id=svty0,chardev=STDIO0,reg=0x71000100 \
> -mon id=MON0,chardev=STDIO0,mode=readline -nographic -vga none -m 4G \
> -machine pseries-2.6 -enable-kvm \
> -mon chardev=SOCKET0,mode=readline -incoming "tcp:fstn1:22222"
> 
> 
> 
> > 
> > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > Reviewed-by: Laurent Vivier <lvivier@redhat.com>
> > ---
> >  hw/ppc/spapr.c              | 10 +++++--
> >  hw/ppc/spapr_pci.c          | 70 ++++++++++++++++++++++++++++++++++++---------
> >  include/hw/pci-host/spapr.h |  8 ++++--
> >  include/hw/ppc/spapr.h      |  3 +-
> >  4 files changed, 72 insertions(+), 19 deletions(-)
> > 
> > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > index d747e58..ea5d0e6 100644
> > --- a/hw/ppc/spapr.c
> > +++ b/hw/ppc/spapr.c
> > @@ -2371,7 +2371,8 @@ static HotpluggableCPUList *spapr_query_hotpluggable_cpus(MachineState *machine)
> >  }
> >  
> >  static void spapr_phb_placement(sPAPRMachineState *spapr, uint32_t index,
> > -                                uint64_t *buid, hwaddr *pio, hwaddr *mmio,
> > +                                uint64_t *buid, hwaddr *pio,
> > +                                hwaddr *mmio32, hwaddr *mmio64,
> >                                  unsigned n_dma, uint32_t *liobns, Error **errp)
> >  {
> >      const uint64_t base_buid = 0x800000020000000ULL;
> > @@ -2409,7 +2410,12 @@ static void spapr_phb_placement(sPAPRMachineState *spapr, uint32_t index,
> >  
> >      phb_base = phb0_base + index * phb_spacing;
> >      *pio = phb_base + pio_offset;
> > -    *mmio = phb_base + mmio_offset;
> > +    *mmio32 = phb_base + mmio_offset;
> > +    /*
> > +     * We don't set the 64-bit MMIO window, relying on the PHB's
> > +     * fallback behaviour of automatically splitting a large "32-bit"
> > +     * window into contiguous 32-bit and 64-bit windows
> > +     */
> >  }
> >  
> >  static void spapr_machine_class_init(ObjectClass *oc, void *data)
> > diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
> > index 8bd7f59..10d5de2 100644
> > --- a/hw/ppc/spapr_pci.c
> > +++ b/hw/ppc/spapr_pci.c
> > @@ -1317,14 +1317,16 @@ static void spapr_phb_realize(DeviceState *dev, Error **errp)
> >          if ((sphb->buid != (uint64_t)-1) || (sphb->dma_liobn[0] != (uint32_t)-1)
> >              || (sphb->dma_liobn[1] != (uint32_t)-1 && windows_supported == 2)
> >              || (sphb->mem_win_addr != (hwaddr)-1)
> > +            || (sphb->mem64_win_addr != (hwaddr)-1)
> >              || (sphb->io_win_addr != (hwaddr)-1)) {
> >              error_setg(errp, "Either \"index\" or other parameters must"
> >                         " be specified for PAPR PHB, not both");
> >              return;
> >          }
> >  
> > -        smc->phb_placement(spapr, sphb->index, &sphb->buid,
> > -                           &sphb->io_win_addr, &sphb->mem_win_addr,
> > +        smc->phb_placement(spapr, sphb->index,
> > +                           &sphb->buid, &sphb->io_win_addr,
> > +                           &sphb->mem_win_addr, &sphb->mem64_win_addr,
> >                             windows_supported, sphb->dma_liobn, &local_err);
> >          if (local_err) {
> >              error_propagate(errp, local_err);
> > @@ -1353,6 +1355,38 @@ static void spapr_phb_realize(DeviceState *dev, Error **errp)
> >          return;
> >      }
> >  
> > +    if (sphb->mem64_win_size != 0) {
> > +        if (sphb->mem64_win_addr == (hwaddr)-1) {
> > +            error_setg(errp,
> > +                       "64-bit memory window address not specified for PHB");
> > +            return;
> > +        }
> > +
> > +        if (sphb->mem_win_size > SPAPR_PCI_MEM32_WIN_SIZE) {
> > +            error_setg(errp, "32-bit memory window of size 0x%"HWADDR_PRIx
> > +                       " (max 2 GiB)", sphb->mem_win_size);
> > +            return;
> > +        }
> > +
> > +        if (sphb->mem64_win_pciaddr == (hwaddr)-1) {
> > +            /* 64-bit window defaults to identity mapping */
> > +            sphb->mem64_win_pciaddr = sphb->mem64_win_addr;
> > +        }
> > +    } else if (sphb->mem_win_size > SPAPR_PCI_MEM32_WIN_SIZE) {
> > +        /*
> > +         * For compatibility with old configuration, if no 64-bit MMIO
> > +         * window is specified, but the ordinary (32-bit) memory
> > +         * window is specified as > 2GiB, we treat it as a 2GiB 32-bit
> > +         * window, with a 64-bit MMIO window following on immediately
> > +         * afterwards
> > +         */
> > +        sphb->mem64_win_size = sphb->mem_win_size - SPAPR_PCI_MEM32_WIN_SIZE;
> > +        sphb->mem64_win_addr = sphb->mem_win_addr + SPAPR_PCI_MEM32_WIN_SIZE;
> > +        sphb->mem64_win_pciaddr =
> > +            SPAPR_PCI_MEM_WIN_BUS_OFFSET + SPAPR_PCI_MEM32_WIN_SIZE;
> > +        sphb->mem_win_size = SPAPR_PCI_MEM32_WIN_SIZE;
> > +    }
> > +
> >      if (spapr_pci_find_phb(spapr, sphb->buid)) {
> >          error_setg(errp, "PCI host bridges must have unique BUIDs");
> >          return;
> > @@ -1366,12 +1400,19 @@ static void spapr_phb_realize(DeviceState *dev, Error **errp)
> >      sprintf(namebuf, "%s.mmio", sphb->dtbusname);
> >      memory_region_init(&sphb->memspace, OBJECT(sphb), namebuf, UINT64_MAX);
> >  
> > -    sprintf(namebuf, "%s.mmio-alias", sphb->dtbusname);
> > -    memory_region_init_alias(&sphb->memwindow, OBJECT(sphb),
> > +    sprintf(namebuf, "%s.mmio32-alias", sphb->dtbusname);
> > +    memory_region_init_alias(&sphb->mem32window, OBJECT(sphb),
> >                               namebuf, &sphb->memspace,
> >                               SPAPR_PCI_MEM_WIN_BUS_OFFSET, sphb->mem_win_size);
> >      memory_region_add_subregion(get_system_memory(), sphb->mem_win_addr,
> > -                                &sphb->memwindow);
> > +                                &sphb->mem32window);
> > +
> > +    sprintf(namebuf, "%s.mmio64-alias", sphb->dtbusname);
> > +    memory_region_init_alias(&sphb->mem64window, OBJECT(sphb),
> > +                             namebuf, &sphb->memspace,
> > +                             sphb->mem64_win_pciaddr, sphb->mem64_win_size);
> > +    memory_region_add_subregion(get_system_memory(), sphb->mem64_win_addr,
> > +                                &sphb->mem64window);
> >  
> >      /* Initialize IO regions */
> >      sprintf(namebuf, "%s.io", sphb->dtbusname);
> > @@ -1524,6 +1565,10 @@ static Property spapr_phb_properties[] = {
> >      DEFINE_PROP_UINT64("mem_win_addr", sPAPRPHBState, mem_win_addr, -1),
> >      DEFINE_PROP_UINT64("mem_win_size", sPAPRPHBState, mem_win_size,
> >                         SPAPR_PCI_MMIO_WIN_SIZE),
> > +    DEFINE_PROP_UINT64("mem64_win_addr", sPAPRPHBState, mem64_win_addr, -1),
> > +    DEFINE_PROP_UINT64("mem64_win_size", sPAPRPHBState, mem64_win_size, 0),
> > +    DEFINE_PROP_UINT64("mem64_win_pciaddr", sPAPRPHBState, mem64_win_pciaddr,
> > +                       -1),
> >      DEFINE_PROP_UINT64("io_win_addr", sPAPRPHBState, io_win_addr, -1),
> >      DEFINE_PROP_UINT64("io_win_size", sPAPRPHBState, io_win_size,
> >                         SPAPR_PCI_IO_WIN_SIZE),
> > @@ -1759,10 +1804,6 @@ int spapr_populate_pci_dt(sPAPRPHBState *phb,
> >      int bus_off, i, j, ret;
> >      char nodename[FDT_NAME_MAX];
> >      uint32_t bus_range[] = { cpu_to_be32(0), cpu_to_be32(0xff) };
> > -    const uint64_t mmiosize = memory_region_size(&phb->memwindow);
> > -    const uint64_t w32max = (1ULL << 32) - SPAPR_PCI_MEM_WIN_BUS_OFFSET;
> > -    const uint64_t w32size = MIN(w32max, mmiosize);
> > -    const uint64_t w64size = (mmiosize > w32size) ? (mmiosize - w32size) : 0;
> >      struct {
> >          uint32_t hi;
> >          uint64_t child;
> > @@ -1777,15 +1818,16 @@ int spapr_populate_pci_dt(sPAPRPHBState *phb,
> >          {
> >              cpu_to_be32(b_ss(2)), cpu_to_be64(SPAPR_PCI_MEM_WIN_BUS_OFFSET),
> >              cpu_to_be64(phb->mem_win_addr),
> > -            cpu_to_be64(w32size),
> > +            cpu_to_be64(phb->mem_win_size),
> >          },
> >          {
> > -            cpu_to_be32(b_ss(3)), cpu_to_be64(1ULL << 32),
> > -            cpu_to_be64(phb->mem_win_addr + w32size),
> > -            cpu_to_be64(w64size)
> > +            cpu_to_be32(b_ss(3)), cpu_to_be64(phb->mem64_win_pciaddr),
> > +            cpu_to_be64(phb->mem64_win_addr),
> > +            cpu_to_be64(phb->mem64_win_size),
> >          },
> >      };
> > -    const unsigned sizeof_ranges = (w64size ? 3 : 2) * sizeof(ranges[0]);
> > +    const unsigned sizeof_ranges =
> > +        (phb->mem64_win_size ? 3 : 2) * sizeof(ranges[0]);
> >      uint64_t bus_reg[] = { cpu_to_be64(phb->buid), 0 };
> >      uint32_t interrupt_map_mask[] = {
> >          cpu_to_be32(b_ddddd(-1)|b_fff(0)), 0x0, 0x0, cpu_to_be32(-1)};
> > diff --git a/include/hw/pci-host/spapr.h b/include/hw/pci-host/spapr.h
> > index 8c9ebfd..23dfb09 100644
> > --- a/include/hw/pci-host/spapr.h
> > +++ b/include/hw/pci-host/spapr.h
> > @@ -53,8 +53,10 @@ struct sPAPRPHBState {
> >      bool dr_enabled;
> >  
> >      MemoryRegion memspace, iospace;
> > -    hwaddr mem_win_addr, mem_win_size, io_win_addr, io_win_size;
> > -    MemoryRegion memwindow, iowindow, msiwindow;
> > +    hwaddr mem_win_addr, mem_win_size, mem64_win_addr, mem64_win_size;
> > +    uint64_t mem64_win_pciaddr;
> > +    hwaddr io_win_addr, io_win_size;
> > +    MemoryRegion mem32window, mem64window, iowindow, msiwindow;
> >  
> >      uint32_t dma_liobn[SPAPR_PCI_DMA_MAX_WINDOWS];
> >      hwaddr dma_win_addr, dma_win_size;
> > @@ -80,6 +82,8 @@ struct sPAPRPHBState {
> >  };
> >  
> >  #define SPAPR_PCI_MEM_WIN_BUS_OFFSET 0x80000000ULL
> > +#define SPAPR_PCI_MEM32_WIN_SIZE     \
> > +    ((1ULL << 32) - SPAPR_PCI_MEM_WIN_BUS_OFFSET)
> >  
> >  #define SPAPR_PCI_MMIO_WIN_SIZE      0xf80000000
> >  #define SPAPR_PCI_IO_WIN_SIZE        0x10000
> > diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> > index a05783c..aeaba3e 100644
> > --- a/include/hw/ppc/spapr.h
> > +++ b/include/hw/ppc/spapr.h
> > @@ -41,7 +41,8 @@ struct sPAPRMachineClass {
> >      bool use_ohci_by_default;  /* use USB-OHCI instead of XHCI */
> >      const char *tcg_default_cpu; /* which (TCG) CPU to simulate by default */
> >      void (*phb_placement)(sPAPRMachineState *spapr, uint32_t index,
> > -                          uint64_t *buid, hwaddr *pio, hwaddr *mmio,
> > +                          uint64_t *buid, hwaddr *pio, 
> > +                          hwaddr *mmio32, hwaddr *mmio64,
> >                            unsigned n_dma, uint32_t *liobns, Error **errp);
> >  };
> >  
> > 
> 
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

  reply	other threads:[~2016-11-08  2:20 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-10-17  2:43 [Qemu-devel] [PULL 00/16] ppc-for-2.8 queue 20161017 David Gibson
2016-10-17  2:43 ` [Qemu-devel] [PULL 01/16] tests: minor cleanups in usb-hcd-uhci-test David Gibson
2016-10-17  2:43 ` [Qemu-devel] [PULL 02/16] qtest: ask endianness of the target in qtest_init() David Gibson
2016-10-17  2:43 ` [Qemu-devel] [PULL 03/16] tests/boot-sector: Use minimum length for the Forth boot script David Gibson
2016-10-17  2:43 ` [Qemu-devel] [PULL 04/16] tests/boot-sector: Use mkstemp() to create a unique file name David Gibson
2016-10-17  2:43 ` [Qemu-devel] [PULL 05/16] tests/boot-sector: Increase time-out to 90 seconds David Gibson
2016-10-17  2:43 ` [Qemu-devel] [PULL 06/16] target-ppc: implement vexts[bh]2w and vexts[bhw]2d David Gibson
2016-10-17  2:43 ` [Qemu-devel] [PULL 07/16] spapr: fix inheritance chain for default machine options David Gibson
2016-10-17  2:43 ` [Qemu-devel] [PULL 08/16] ppc/xics: Make the ICSState a list David Gibson
2016-10-17  2:43 ` [Qemu-devel] [PULL 09/16] ppc/xics: Split ICS into ics-base and ics class David Gibson
2016-10-17  2:43 ` [Qemu-devel] [PULL 10/16] libqos: Isolate knowledge of spapr memory map to qpci_init_spapr() David Gibson
2016-10-17  2:43 ` [Qemu-devel] [PULL 11/16] libqos: Correct error in PCI hole sizing for spapr David Gibson
2016-10-17  2:43 ` [Qemu-devel] [PULL 12/16] libqos: Limit spapr-pci to 32-bit MMIO for now David Gibson
2016-10-17  2:43 ` [Qemu-devel] [PULL 13/16] spapr_pci: Delegate placement of PCI host bridges to machine type David Gibson
2016-10-17  2:43 ` [Qemu-devel] [PULL 14/16] spapr: Adjust placement of PCI host bridge to allow > 1TiB RAM David Gibson
2016-10-17  2:43 ` [Qemu-devel] [PULL 15/16] spapr_pci: Add a 64-bit MMIO window David Gibson
2016-11-04  5:03   ` Alexey Kardashevskiy
2016-11-08  1:16     ` David Gibson [this message]
2016-11-08  3:59       ` Alexey Kardashevskiy
2016-11-09  3:42         ` David Gibson
2016-10-17  2:43 ` [Qemu-devel] [PULL 16/16] spapr: Improved placement of PCI host bridges in guest memory map David Gibson
2016-10-17  3:16 ` [Qemu-devel] [PULL 00/16] ppc-for-2.8 queue 20161017 no-reply
2016-10-17 12:55 ` Peter Maydell

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20161108011645.GC28688@umbus.fritz.box \
    --to=david@gibson.dropbear.id.au \
    --cc=agraf@suse.de \
    --cc=aik@ozlabs.ru \
    --cc=peter.maydell@linaro.org \
    --cc=qemu-devel@nongnu.org \
    --cc=qemu-ppc@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).