qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Alexey Kardashevskiy <aik@ozlabs.ru>
To: David Gibson <david@gibson.dropbear.id.au>
Cc: peter.maydell@linaro.org, agraf@suse.de, qemu-ppc@nongnu.org,
	qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [PULL 15/16] spapr_pci: Add a 64-bit MMIO window
Date: Tue, 8 Nov 2016 14:59:30 +1100	[thread overview]
Message-ID: <4424b1f9-669e-9d6b-022d-e9c3c86c117e@ozlabs.ru> (raw)
In-Reply-To: <20161108011645.GC28688@umbus.fritz.box>

[-- Attachment #1: Type: text/plain, Size: 15536 bytes --]

On 08/11/16 12:16, David Gibson wrote:
> On Fri, Nov 04, 2016 at 04:03:31PM +1100, Alexey Kardashevskiy wrote:
>> On 17/10/16 13:43, David Gibson wrote:
>>> On real hardware, and under pHyp, the PCI host bridges on Power machines
>>> typically advertise two outbound MMIO windows from the guest's physical
>>> memory space to PCI memory space:
>>>   - A 32-bit window which maps onto 2GiB..4GiB in the PCI address space
>>>   - A 64-bit window which maps onto a large region somewhere high in PCI
>>>     address space (traditionally this used an identity mapping from guest
>>>     physical address to PCI address, but that's not always the case)
>>>
>>> The qemu implementation in spapr-pci-host-bridge, however, only supports a
>>> single outbound MMIO window, however.  At least some Linux versions expect
>>> the two windows however, so we arranged this window to map onto the PCI
>>> memory space from 2 GiB..~64 GiB, then advertised it as two contiguous
>>> windows, the "32-bit" window from 2G..4G and the "64-bit" window from
>>> 4G..~64G.
>>>
>>> This approach means, however, that the 64G window is not naturally aligned.
>>> In turn this limits the size of the largest BAR we can map (which does have
>>> to be naturally aligned) to roughly half of the total window.  With some
>>> large nVidia GPGPU cards which have huge memory BARs, this is starting to
>>> be a problem.
>>>
>>> This patch adds true support for separate 32-bit and 64-bit outbound MMIO
>>> windows to the spapr-pci-host-bridge implementation, each of which can
>>> be independently configured.  The 32-bit window always maps to 2G.. in PCI
>>> space, but the PCI address of the 64-bit window can be configured (it
>>> defaults to the same as the guest physical address).
>>>
>>> So as not to break possible existing configurations, as long as a 64-bit
>>> window is not specified, a large single window can be specified.  This
>>> will appear the same way to the guest as the old approach, although it's
>>> now implemented by two contiguous memory regions rather than a single one.
>>>
>>> For now, this only adds the possibility of 64-bit windows.  The default
>>> configuration still uses the legacy mode.
>>
>>
>> This breaks migration to QEMU v2.7, the destination reports:
>>
>> 22901@1478235261.799031:vmstate_load spapr_pci, spapr_pci
>> 22901@1478235261.799040:vmstate_load_field_error field "mem_win_size" load
>> failed, ret = -22
>> qemu-hostos1: error while loading state for instance 0x0 of device 'spapr_pci'
>> 22901@1478235261.801324:migrate_set_state new state 7
>> qemu-hostos1: load of migration failed: Invalid argument
>>
>>
>> mem_win_size decreased from 0xf80000000 to 0x80000000.
>>
>> I'd think it should be allowed to migrate like this.
> 
> AIUI, we don't generally care (upstream) about migration from newer to
> older qemu, only from older to newer. 

Older (v2.7.0) to newer (current upstream with -machine pseries-2.7) does
not work either with the exact same symptom.



> Trying to maintain backwards
> migration makes it almost impossible to fix anything at all, ever.
> 
>>
>>
>> The source PHB is:
>>
>> (qemu) info qtree
>> bus: main-system-bus
>>   type System
>>   dev: spapr-pci-host-bridge, id ""
>>     index = 0 (0x0)
>>     buid = 576460752840294400 (0x800000020000000)
>>     liobn = 2147483648 (0x80000000)
>>     liobn64 = 4294967295 (0xffffffff)
>>     mem_win_addr = 1102195982336 (0x100a0000000)
>>     mem_win_size = 2147483648 (0x80000000)
>>     mem64_win_addr = 1104343465984 (0x10120000000)
>>     mem64_win_size = 64424509440 (0xf00000000)
>>     mem64_win_pciaddr = 4294967296 (0x100000000)
>>
>>
>> The destination PHB is:
>>
>> (qemu) info qtree
>> bus: main-system-bus
>>   type System
>>   dev: spapr-pci-host-bridge, id ""
>>     index = 0 (0x0)
>>     buid = 576460752840294400 (0x800000020000000)
>>     liobn = 2147483648 (0x80000000)
>>     liobn64 = 4294967295 (0xffffffff)
>>     mem_win_addr = 1102195982336 (0x100a0000000)
>>     mem_win_size = 66571993088 (0xf80000000)
>>
>>
>>
>> The source QEMU cmdline:
>>
>> /home/aik/p/qemu/ppc64-softmmu/qemu-system-ppc64 -nodefaults \
>> -chardev stdio,id=STDIO0,signal=off,mux=on \
>> -device spapr-vty,id=svty0,chardev=STDIO0,reg=0x71000100 \
>> -mon id=MON0,chardev=STDIO0,mode=readline -nographic -vga none \
>> -kernel /home/aik/t/vml450le \
>> -initrd /home/aik/t/le.cpio -m 4G \
>> -machine pseries-2.6 -enable-kvm \
>>
>>
>> The destination (./qemu-hostos1 is v2.7.0 from
>> https://github.com/open-power-host-os/qemu/commits/hostos-stable )
>>
>> ./qemu-hostos1 -nodefaults \
>> -chardev stdio,id=STDIO0,signal=off,mux=on \
>> -device spapr-vty,id=svty0,chardev=STDIO0,reg=0x71000100 \
>> -mon id=MON0,chardev=STDIO0,mode=readline -nographic -vga none -m 4G \
>> -machine pseries-2.6 -enable-kvm \
>> -mon chardev=SOCKET0,mode=readline -incoming "tcp:fstn1:22222"
>>
>>
>>
>>>
>>> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
>>> Reviewed-by: Laurent Vivier <lvivier@redhat.com>
>>> ---
>>>  hw/ppc/spapr.c              | 10 +++++--
>>>  hw/ppc/spapr_pci.c          | 70 ++++++++++++++++++++++++++++++++++++---------
>>>  include/hw/pci-host/spapr.h |  8 ++++--
>>>  include/hw/ppc/spapr.h      |  3 +-
>>>  4 files changed, 72 insertions(+), 19 deletions(-)
>>>
>>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>>> index d747e58..ea5d0e6 100644
>>> --- a/hw/ppc/spapr.c
>>> +++ b/hw/ppc/spapr.c
>>> @@ -2371,7 +2371,8 @@ static HotpluggableCPUList *spapr_query_hotpluggable_cpus(MachineState *machine)
>>>  }
>>>  
>>>  static void spapr_phb_placement(sPAPRMachineState *spapr, uint32_t index,
>>> -                                uint64_t *buid, hwaddr *pio, hwaddr *mmio,
>>> +                                uint64_t *buid, hwaddr *pio,
>>> +                                hwaddr *mmio32, hwaddr *mmio64,
>>>                                  unsigned n_dma, uint32_t *liobns, Error **errp)
>>>  {
>>>      const uint64_t base_buid = 0x800000020000000ULL;
>>> @@ -2409,7 +2410,12 @@ static void spapr_phb_placement(sPAPRMachineState *spapr, uint32_t index,
>>>  
>>>      phb_base = phb0_base + index * phb_spacing;
>>>      *pio = phb_base + pio_offset;
>>> -    *mmio = phb_base + mmio_offset;
>>> +    *mmio32 = phb_base + mmio_offset;
>>> +    /*
>>> +     * We don't set the 64-bit MMIO window, relying on the PHB's
>>> +     * fallback behaviour of automatically splitting a large "32-bit"
>>> +     * window into contiguous 32-bit and 64-bit windows
>>> +     */
>>>  }
>>>  
>>>  static void spapr_machine_class_init(ObjectClass *oc, void *data)
>>> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
>>> index 8bd7f59..10d5de2 100644
>>> --- a/hw/ppc/spapr_pci.c
>>> +++ b/hw/ppc/spapr_pci.c
>>> @@ -1317,14 +1317,16 @@ static void spapr_phb_realize(DeviceState *dev, Error **errp)
>>>          if ((sphb->buid != (uint64_t)-1) || (sphb->dma_liobn[0] != (uint32_t)-1)
>>>              || (sphb->dma_liobn[1] != (uint32_t)-1 && windows_supported == 2)
>>>              || (sphb->mem_win_addr != (hwaddr)-1)
>>> +            || (sphb->mem64_win_addr != (hwaddr)-1)
>>>              || (sphb->io_win_addr != (hwaddr)-1)) {
>>>              error_setg(errp, "Either \"index\" or other parameters must"
>>>                         " be specified for PAPR PHB, not both");
>>>              return;
>>>          }
>>>  
>>> -        smc->phb_placement(spapr, sphb->index, &sphb->buid,
>>> -                           &sphb->io_win_addr, &sphb->mem_win_addr,
>>> +        smc->phb_placement(spapr, sphb->index,
>>> +                           &sphb->buid, &sphb->io_win_addr,
>>> +                           &sphb->mem_win_addr, &sphb->mem64_win_addr,
>>>                             windows_supported, sphb->dma_liobn, &local_err);
>>>          if (local_err) {
>>>              error_propagate(errp, local_err);
>>> @@ -1353,6 +1355,38 @@ static void spapr_phb_realize(DeviceState *dev, Error **errp)
>>>          return;
>>>      }
>>>  
>>> +    if (sphb->mem64_win_size != 0) {
>>> +        if (sphb->mem64_win_addr == (hwaddr)-1) {
>>> +            error_setg(errp,
>>> +                       "64-bit memory window address not specified for PHB");
>>> +            return;
>>> +        }
>>> +
>>> +        if (sphb->mem_win_size > SPAPR_PCI_MEM32_WIN_SIZE) {
>>> +            error_setg(errp, "32-bit memory window of size 0x%"HWADDR_PRIx
>>> +                       " (max 2 GiB)", sphb->mem_win_size);
>>> +            return;
>>> +        }
>>> +
>>> +        if (sphb->mem64_win_pciaddr == (hwaddr)-1) {
>>> +            /* 64-bit window defaults to identity mapping */
>>> +            sphb->mem64_win_pciaddr = sphb->mem64_win_addr;
>>> +        }
>>> +    } else if (sphb->mem_win_size > SPAPR_PCI_MEM32_WIN_SIZE) {
>>> +        /*
>>> +         * For compatibility with old configuration, if no 64-bit MMIO
>>> +         * window is specified, but the ordinary (32-bit) memory
>>> +         * window is specified as > 2GiB, we treat it as a 2GiB 32-bit
>>> +         * window, with a 64-bit MMIO window following on immediately
>>> +         * afterwards
>>> +         */
>>> +        sphb->mem64_win_size = sphb->mem_win_size - SPAPR_PCI_MEM32_WIN_SIZE;
>>> +        sphb->mem64_win_addr = sphb->mem_win_addr + SPAPR_PCI_MEM32_WIN_SIZE;
>>> +        sphb->mem64_win_pciaddr =
>>> +            SPAPR_PCI_MEM_WIN_BUS_OFFSET + SPAPR_PCI_MEM32_WIN_SIZE;
>>> +        sphb->mem_win_size = SPAPR_PCI_MEM32_WIN_SIZE;
>>> +    }
>>> +
>>>      if (spapr_pci_find_phb(spapr, sphb->buid)) {
>>>          error_setg(errp, "PCI host bridges must have unique BUIDs");
>>>          return;
>>> @@ -1366,12 +1400,19 @@ static void spapr_phb_realize(DeviceState *dev, Error **errp)
>>>      sprintf(namebuf, "%s.mmio", sphb->dtbusname);
>>>      memory_region_init(&sphb->memspace, OBJECT(sphb), namebuf, UINT64_MAX);
>>>  
>>> -    sprintf(namebuf, "%s.mmio-alias", sphb->dtbusname);
>>> -    memory_region_init_alias(&sphb->memwindow, OBJECT(sphb),
>>> +    sprintf(namebuf, "%s.mmio32-alias", sphb->dtbusname);
>>> +    memory_region_init_alias(&sphb->mem32window, OBJECT(sphb),
>>>                               namebuf, &sphb->memspace,
>>>                               SPAPR_PCI_MEM_WIN_BUS_OFFSET, sphb->mem_win_size);
>>>      memory_region_add_subregion(get_system_memory(), sphb->mem_win_addr,
>>> -                                &sphb->memwindow);
>>> +                                &sphb->mem32window);
>>> +
>>> +    sprintf(namebuf, "%s.mmio64-alias", sphb->dtbusname);
>>> +    memory_region_init_alias(&sphb->mem64window, OBJECT(sphb),
>>> +                             namebuf, &sphb->memspace,
>>> +                             sphb->mem64_win_pciaddr, sphb->mem64_win_size);
>>> +    memory_region_add_subregion(get_system_memory(), sphb->mem64_win_addr,
>>> +                                &sphb->mem64window);
>>>  
>>>      /* Initialize IO regions */
>>>      sprintf(namebuf, "%s.io", sphb->dtbusname);
>>> @@ -1524,6 +1565,10 @@ static Property spapr_phb_properties[] = {
>>>      DEFINE_PROP_UINT64("mem_win_addr", sPAPRPHBState, mem_win_addr, -1),
>>>      DEFINE_PROP_UINT64("mem_win_size", sPAPRPHBState, mem_win_size,
>>>                         SPAPR_PCI_MMIO_WIN_SIZE),
>>> +    DEFINE_PROP_UINT64("mem64_win_addr", sPAPRPHBState, mem64_win_addr, -1),
>>> +    DEFINE_PROP_UINT64("mem64_win_size", sPAPRPHBState, mem64_win_size, 0),
>>> +    DEFINE_PROP_UINT64("mem64_win_pciaddr", sPAPRPHBState, mem64_win_pciaddr,
>>> +                       -1),
>>>      DEFINE_PROP_UINT64("io_win_addr", sPAPRPHBState, io_win_addr, -1),
>>>      DEFINE_PROP_UINT64("io_win_size", sPAPRPHBState, io_win_size,
>>>                         SPAPR_PCI_IO_WIN_SIZE),
>>> @@ -1759,10 +1804,6 @@ int spapr_populate_pci_dt(sPAPRPHBState *phb,
>>>      int bus_off, i, j, ret;
>>>      char nodename[FDT_NAME_MAX];
>>>      uint32_t bus_range[] = { cpu_to_be32(0), cpu_to_be32(0xff) };
>>> -    const uint64_t mmiosize = memory_region_size(&phb->memwindow);
>>> -    const uint64_t w32max = (1ULL << 32) - SPAPR_PCI_MEM_WIN_BUS_OFFSET;
>>> -    const uint64_t w32size = MIN(w32max, mmiosize);
>>> -    const uint64_t w64size = (mmiosize > w32size) ? (mmiosize - w32size) : 0;
>>>      struct {
>>>          uint32_t hi;
>>>          uint64_t child;
>>> @@ -1777,15 +1818,16 @@ int spapr_populate_pci_dt(sPAPRPHBState *phb,
>>>          {
>>>              cpu_to_be32(b_ss(2)), cpu_to_be64(SPAPR_PCI_MEM_WIN_BUS_OFFSET),
>>>              cpu_to_be64(phb->mem_win_addr),
>>> -            cpu_to_be64(w32size),
>>> +            cpu_to_be64(phb->mem_win_size),
>>>          },
>>>          {
>>> -            cpu_to_be32(b_ss(3)), cpu_to_be64(1ULL << 32),
>>> -            cpu_to_be64(phb->mem_win_addr + w32size),
>>> -            cpu_to_be64(w64size)
>>> +            cpu_to_be32(b_ss(3)), cpu_to_be64(phb->mem64_win_pciaddr),
>>> +            cpu_to_be64(phb->mem64_win_addr),
>>> +            cpu_to_be64(phb->mem64_win_size),
>>>          },
>>>      };
>>> -    const unsigned sizeof_ranges = (w64size ? 3 : 2) * sizeof(ranges[0]);
>>> +    const unsigned sizeof_ranges =
>>> +        (phb->mem64_win_size ? 3 : 2) * sizeof(ranges[0]);
>>>      uint64_t bus_reg[] = { cpu_to_be64(phb->buid), 0 };
>>>      uint32_t interrupt_map_mask[] = {
>>>          cpu_to_be32(b_ddddd(-1)|b_fff(0)), 0x0, 0x0, cpu_to_be32(-1)};
>>> diff --git a/include/hw/pci-host/spapr.h b/include/hw/pci-host/spapr.h
>>> index 8c9ebfd..23dfb09 100644
>>> --- a/include/hw/pci-host/spapr.h
>>> +++ b/include/hw/pci-host/spapr.h
>>> @@ -53,8 +53,10 @@ struct sPAPRPHBState {
>>>      bool dr_enabled;
>>>  
>>>      MemoryRegion memspace, iospace;
>>> -    hwaddr mem_win_addr, mem_win_size, io_win_addr, io_win_size;
>>> -    MemoryRegion memwindow, iowindow, msiwindow;
>>> +    hwaddr mem_win_addr, mem_win_size, mem64_win_addr, mem64_win_size;
>>> +    uint64_t mem64_win_pciaddr;
>>> +    hwaddr io_win_addr, io_win_size;
>>> +    MemoryRegion mem32window, mem64window, iowindow, msiwindow;
>>>  
>>>      uint32_t dma_liobn[SPAPR_PCI_DMA_MAX_WINDOWS];
>>>      hwaddr dma_win_addr, dma_win_size;
>>> @@ -80,6 +82,8 @@ struct sPAPRPHBState {
>>>  };
>>>  
>>>  #define SPAPR_PCI_MEM_WIN_BUS_OFFSET 0x80000000ULL
>>> +#define SPAPR_PCI_MEM32_WIN_SIZE     \
>>> +    ((1ULL << 32) - SPAPR_PCI_MEM_WIN_BUS_OFFSET)
>>>  
>>>  #define SPAPR_PCI_MMIO_WIN_SIZE      0xf80000000
>>>  #define SPAPR_PCI_IO_WIN_SIZE        0x10000
>>> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
>>> index a05783c..aeaba3e 100644
>>> --- a/include/hw/ppc/spapr.h
>>> +++ b/include/hw/ppc/spapr.h
>>> @@ -41,7 +41,8 @@ struct sPAPRMachineClass {
>>>      bool use_ohci_by_default;  /* use USB-OHCI instead of XHCI */
>>>      const char *tcg_default_cpu; /* which (TCG) CPU to simulate by default */
>>>      void (*phb_placement)(sPAPRMachineState *spapr, uint32_t index,
>>> -                          uint64_t *buid, hwaddr *pio, hwaddr *mmio,
>>> +                          uint64_t *buid, hwaddr *pio, 
>>> +                          hwaddr *mmio32, hwaddr *mmio64,
>>>                            unsigned n_dma, uint32_t *liobns, Error **errp);
>>>  };
>>>  
>>>
>>
>>
> 


-- 
Alexey


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 839 bytes --]

  reply	other threads:[~2016-11-08  3:59 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-10-17  2:43 [Qemu-devel] [PULL 00/16] ppc-for-2.8 queue 20161017 David Gibson
2016-10-17  2:43 ` [Qemu-devel] [PULL 01/16] tests: minor cleanups in usb-hcd-uhci-test David Gibson
2016-10-17  2:43 ` [Qemu-devel] [PULL 02/16] qtest: ask endianness of the target in qtest_init() David Gibson
2016-10-17  2:43 ` [Qemu-devel] [PULL 03/16] tests/boot-sector: Use minimum length for the Forth boot script David Gibson
2016-10-17  2:43 ` [Qemu-devel] [PULL 04/16] tests/boot-sector: Use mkstemp() to create a unique file name David Gibson
2016-10-17  2:43 ` [Qemu-devel] [PULL 05/16] tests/boot-sector: Increase time-out to 90 seconds David Gibson
2016-10-17  2:43 ` [Qemu-devel] [PULL 06/16] target-ppc: implement vexts[bh]2w and vexts[bhw]2d David Gibson
2016-10-17  2:43 ` [Qemu-devel] [PULL 07/16] spapr: fix inheritance chain for default machine options David Gibson
2016-10-17  2:43 ` [Qemu-devel] [PULL 08/16] ppc/xics: Make the ICSState a list David Gibson
2016-10-17  2:43 ` [Qemu-devel] [PULL 09/16] ppc/xics: Split ICS into ics-base and ics class David Gibson
2016-10-17  2:43 ` [Qemu-devel] [PULL 10/16] libqos: Isolate knowledge of spapr memory map to qpci_init_spapr() David Gibson
2016-10-17  2:43 ` [Qemu-devel] [PULL 11/16] libqos: Correct error in PCI hole sizing for spapr David Gibson
2016-10-17  2:43 ` [Qemu-devel] [PULL 12/16] libqos: Limit spapr-pci to 32-bit MMIO for now David Gibson
2016-10-17  2:43 ` [Qemu-devel] [PULL 13/16] spapr_pci: Delegate placement of PCI host bridges to machine type David Gibson
2016-10-17  2:43 ` [Qemu-devel] [PULL 14/16] spapr: Adjust placement of PCI host bridge to allow > 1TiB RAM David Gibson
2016-10-17  2:43 ` [Qemu-devel] [PULL 15/16] spapr_pci: Add a 64-bit MMIO window David Gibson
2016-11-04  5:03   ` Alexey Kardashevskiy
2016-11-08  1:16     ` David Gibson
2016-11-08  3:59       ` Alexey Kardashevskiy [this message]
2016-11-09  3:42         ` David Gibson
2016-10-17  2:43 ` [Qemu-devel] [PULL 16/16] spapr: Improved placement of PCI host bridges in guest memory map David Gibson
2016-10-17  3:16 ` [Qemu-devel] [PULL 00/16] ppc-for-2.8 queue 20161017 no-reply
2016-10-17 12:55 ` Peter Maydell

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4424b1f9-669e-9d6b-022d-e9c3c86c117e@ozlabs.ru \
    --to=aik@ozlabs.ru \
    --cc=agraf@suse.de \
    --cc=david@gibson.dropbear.id.au \
    --cc=peter.maydell@linaro.org \
    --cc=qemu-devel@nongnu.org \
    --cc=qemu-ppc@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).