From: Laurent Vivier <lvivier@redhat.com>
To: David Gibson <david@gibson.dropbear.id.au>, agraf@suse.de
Cc: mst@redhat.com, abologna@redhat.com, aik@ozlabs.ru,
	qemu-devel@nongnu.org, qemu-ppc@nongnu.org,
	mdroth@linux.vnet.ibm.com, benh@kernel.crashing.org
Subject: Re: [Qemu-devel] [PATCHv3 7/7] spapr: Improved placement of PCI host bridges in guest memory map
Date: Thu, 13 Oct 2016 10:40:32 +0200	[thread overview]
Message-ID: <9a92c0ce-f8e9-a39d-a7fa-e48e7f1e5f30@redhat.com> (raw)
In-Reply-To: <1476316647-9433-8-git-send-email-david@gibson.dropbear.id.au>



On 13/10/2016 01:57, David Gibson wrote:
> Currently, the MMIO space for accessing PCI on pseries guests begins at
> 1 TiB in guest address space.  Each PCI host bridge (PHB) has a 64 GiB
> chunk of address space in which it places its outbound PIO and 32-bit and
> 64-bit MMIO windows.
> 
> This scheme has several problems:
>   - It limits guest RAM to 1 TiB (though we have a limited fix for this
>     now)
>   - It limits the total MMIO window to 64 GiB.  This is not always enough
>     for some of the large nVidia GPGPU cards
>   - Putting all the windows into a single 64 GiB area means that naturally
>     aligning things within there will waste more address space.
> In addition there was a miscalculation in some of the defaults, which meant
> that the MMIO windows for each PHB actually slightly overran the 64 GiB
> region for that PHB.  We got away without nasty consequences because
> the overrun fit within an unused area at the beginning of the next PHB's
> region, but it's not pretty.
> 
> This patch implements a new scheme which addresses those problems, and is
> also closer to what bare metal hardware and pHyp guests generally use.
> 
> Because some guest versions (including most current distro kernels) can't
> access PCI MMIO above 64 TiB, we put all the PCI windows between 32 TiB and
> 64 TiB.  This is broken into 1 TiB chunks.  The first 1 TiB chunk contains
> the PIO (64 kiB) and 32-bit MMIO (2 GiB) windows for all of the PHBs.  Each
> subsequent 1 TiB chunk contains a naturally aligned 64-bit MMIO window for
> one PHB.
> 
> This reduces the number of allowed PHBs (without full manual configuration
> of all the windows) from 256 to 31, but this should still be plenty in
> practice.
> 
> We also change some of the default window sizes for manually configured
> PHBs to saner values.
> 
> Finally we adjust some tests and libqos so that they correctly use the new
> default locations.  Ideally it would parse the device tree given to the
> guest, but that's a more complex problem for another time.
> 
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> ---
>  hw/ppc/spapr.c              | 126 +++++++++++++++++++++++++++++++++-----------
>  hw/ppc/spapr_pci.c          |   5 +-
>  include/hw/pci-host/spapr.h |   8 ++-
>  tests/endianness-test.c     |   3 +-
>  tests/libqos/pci-spapr.c    |   9 ++--
>  tests/spapr-phb-test.c      |   2 +-
>  6 files changed, 113 insertions(+), 40 deletions(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 8db3657..2d952a8 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -2375,31 +2375,42 @@ static void spapr_phb_placement(sPAPRMachineState *spapr, uint32_t index,
>                                  hwaddr *mmio32, hwaddr *mmio64,
>                                  unsigned n_dma, uint32_t *liobns, Error **errp)
>  {
> +    /*
> +     * New-style PHB window placement.
> +     *
> +     * Goals: Gives large (1TiB), naturally aligned 64-bit MMIO window
> +     * for each PHB, in addition to 2GiB 32-bit MMIO and 64kiB PIO
> +     * windows.
> +     *
> +     * Some guest kernels can't work with MMIO windows above 1<<46
> +     * (64TiB), so we place up to 31 PHBs in the area 32TiB..64TiB
> +     *
> +     * 32TiB..33TiB contains the PIO and 32-bit MMIO windows for all
> +     * PHBs.  33..34TiB has the 64-bit MMIO window for PHB0, 34..35
> +     * has the 64-bit window for PHB1 and so forth.
> +     */
>      const uint64_t base_buid = 0x800000020000000ULL;
> -    const hwaddr phb_spacing = 0x1000000000ULL; /* 64 GiB */
> -    const hwaddr mmio_offset = 0xa0000000; /* 2 GiB + 512 MiB */
> -    const hwaddr pio_offset = 0x80000000; /* 2 GiB */
> -    const uint32_t max_index = 255;
> -    const hwaddr phb0_alignment = 0x10000000000ULL; /* 1 TiB */
>  
> -    uint64_t ram_top = MACHINE(spapr)->ram_size;
> -    hwaddr phb0_base, phb_base;
> +    int max_phbs =
> +        (SPAPR_PCI_LIMIT - SPAPR_PCI_BASE) / SPAPR_PCI_MEM64_WIN_SIZE - 1;
> +    hwaddr mmio32_base = SPAPR_PCI_BASE + SPAPR_PCI_MEM32_WIN_SIZE;

The result is the same, but I would add SPAPR_PCI_MEM_WIN_BUS_OFFSET here
instead of SPAPR_PCI_MEM32_WIN_SIZE.

As SPAPR_PCI_MEM32_WIN_SIZE is defined as "((1ULL << 32) -
SPAPR_PCI_MEM_WIN_BUS_OFFSET)", I guess 0..SPAPR_PCI_MEM_WIN_BUS_OFFSET
is for PIO and SPAPR_PCI_MEM_WIN_BUS_OFFSET..(1<<32) is for MMIO.

This is what we have below with:

    *pio = SPAPR_PCI_BASE + index * SPAPR_PCI_IO_WIN_SIZE;
    *mmio32 = mmio32_base + index * SPAPR_PCI_MEM32_WIN_SIZE;
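
Just to spell out the resulting layout, here is a small standalone sketch
of the arithmetic.  The constant values are my reconstruction from the
commit message (32 TiB base, 64 TiB limit, 1 TiB 64-bit windows, 2 GiB bus
offset, 64 kiB PIO windows), and the *mmio64 line is inferred from the
"33..34TiB has the 64-bit MMIO window for PHB0" comment, so please read it
as an illustration rather than the exact patch code:

    #include <stdio.h>
    #include <stdint.h>
    #include <inttypes.h>

    #define SPAPR_PCI_BASE               (1ULL << 45)      /* 32 TiB */
    #define SPAPR_PCI_LIMIT              (1ULL << 46)      /* 64 TiB */
    #define SPAPR_PCI_MEM64_WIN_SIZE     0x10000000000ULL  /* 1 TiB */
    #define SPAPR_PCI_MEM_WIN_BUS_OFFSET 0x80000000ULL     /* 2 GiB */
    #define SPAPR_PCI_MEM32_WIN_SIZE \
        ((1ULL << 32) - SPAPR_PCI_MEM_WIN_BUS_OFFSET)      /* 2 GiB */
    #define SPAPR_PCI_IO_WIN_SIZE        0x10000ULL        /* 64 kiB */

    int main(void)
    {
        uint64_t mmio32_base = SPAPR_PCI_BASE + SPAPR_PCI_MEM32_WIN_SIZE;
        int max_phbs =
            (SPAPR_PCI_LIMIT - SPAPR_PCI_BASE) / SPAPR_PCI_MEM64_WIN_SIZE - 1;

        for (int index = 0; index < 3; index++) {
            uint64_t pio    = SPAPR_PCI_BASE + index * SPAPR_PCI_IO_WIN_SIZE;
            uint64_t mmio32 = mmio32_base + index * SPAPR_PCI_MEM32_WIN_SIZE;
            /* inferred from the comment: PHB0's 64-bit window is 33..34 TiB */
            uint64_t mmio64 = SPAPR_PCI_BASE
                              + (index + 1) * SPAPR_PCI_MEM64_WIN_SIZE;

            printf("PHB%d: pio=0x%012" PRIx64 " mmio32=0x%012" PRIx64
                   " mmio64=0x%012" PRIx64 "\n", index, pio, mmio32, mmio64);
        }
        printf("max_phbs = %d\n", max_phbs);   /* prints 31 */
        return 0;
    }

For index 0 this gives pio at 32 TiB, mmio32 at 32 TiB + 2 GiB and mmio64
at 33 TiB, which matches the comment block in the patch.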

Perhaps we can see *mmio32 as

    SPAPR_PCI_BASE + (index + 1) * SPAPR_PCI_MEM32_WIN_SIZE

But in this case you should directly define SPAPR_PCI_MEM32_WIN_SIZE as
0x80000000, not as "SIZE - OFFSET". It's confusing...
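
To make the coincidence explicit (reusing the definitions from the sketch
above, and still assuming SPAPR_PCI_MEM_WIN_BUS_OFFSET is 0x80000000,
i.e. 2 GiB):

    /*
     * The two spellings of mmio32_base only agree because the 32-bit
     * window happens to be exactly as large as the bus offset:
     *   SPAPR_PCI_MEM32_WIN_SIZE = (1ULL << 32) - 0x80000000 = 0x80000000
     * so SPAPR_PCI_BASE + SPAPR_PCI_MEM32_WIN_SIZE
     *    == SPAPR_PCI_BASE + SPAPR_PCI_MEM_WIN_BUS_OFFSET
     *    == 32 TiB + 2 GiB.
     */
    _Static_assert(SPAPR_PCI_MEM32_WIN_SIZE == SPAPR_PCI_MEM_WIN_BUS_OFFSET,
                   "32-bit MMIO window size happens to equal the bus offset");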

If it is only a misunderstanding on my side, you can add my:

Reviewed-by: Laurent Vivier <lvivier@redhat.com>

Thanks,
Laurent
