From: Bjorn Helgaas <bhelgaas@google.com>
To: Wei Yang <weiyang@linux.vnet.ibm.com>
Cc: linux-pci@vger.kernel.org, benh@au1.ibm.com,
linuxppc-dev@lists.ozlabs.org, gwshan@linux.vnet.ibm.com
Subject: Re: [PATCH V11 16/17] powerpc/powernv: Reserve additional space for IOV BAR, with m64_per_iov supported
Date: Wed, 4 Feb 2015 16:05:18 -0600
Message-ID: <20150204220518.GB20072@google.com>
In-Reply-To: <1421288887-7765-17-git-send-email-weiyang@linux.vnet.ibm.com>

On Thu, Jan 15, 2015 at 10:28:06AM +0800, Wei Yang wrote:
> The M64 aperture size is limited on PHB3. When the IOV BAR is too big, it
> will exceed this limit and fail to be assigned.
>
> This patch introduces a different mechanism based on the IOV BAR size:
>
> IOV BAR size is smaller than 64M, expand to total_pe.
> IOV BAR size is bigger than 64M, roundup power2.

Can you elaborate on this a little more? I assume this is talking about an
M64 BAR. What is the size limit for this?

If I understand correctly, hardware always splits an M64 BAR into 256
segments. If you wanted each 128MB VF BAR in a separate PE, the M64 BAR
would have to be 256 * 128MB = 32GB. So maybe the idea is that instead of
consuming 32GB of address space, you let each VF BAR span several segments.
If each 128MB VF BAR spanned 4 segments, each segment would be 32MB and the
whole M64 BAR would be 256 * 32MB = 8GB. But you would have to use
"companion" PEs so all 4 segments would be in the same "domain," and that
would reduce the number of VFs you could support from 256 to 256/4 = 64.
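
The segment arithmetic above can be checked with a small userspace sketch.
It assumes, as described above, that PHB3 always splits an M64 BAR into 256
equal segments; the helper names and the SZ_1M/SZ_1G constants are local to
this example, not kernel interfaces.

```c
#include <assert.h>
#include <stdint.h>

#define SZ_1M (1ULL << 20)
#define SZ_1G (1ULL << 30)

/* Assumption: PHB3 always splits an M64 BAR into 256 equal segments. */
#define M64_SEGMENTS 256ULL

/* Size of one segment when each VF BAR spans segs_per_vf segments. */
uint64_t m64_segment_size(uint64_t vf_bar_size, uint64_t segs_per_vf)
{
	assert(segs_per_vf != 0);
	return vf_bar_size / segs_per_vf;
}

/* The M64 BAR has to cover all 256 segments, used or not. */
uint64_t m64_bar_size(uint64_t vf_bar_size, uint64_t segs_per_vf)
{
	return M64_SEGMENTS * m64_segment_size(vf_bar_size, segs_per_vf);
}

/* Each VF consumes segs_per_vf segments, so fewer VFs fit. */
uint64_t m64_max_vfs(uint64_t segs_per_vf)
{
	assert(segs_per_vf != 0);
	return M64_SEGMENTS / segs_per_vf;
}
```

For a 128MB VF BAR, m64_bar_size(128 * SZ_1M, 1) is 32GB and
m64_bar_size(128 * SZ_1M, 4) is 8GB, while m64_max_vfs(4) drops to 64.
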

If that were the intent, I would think TotalVFs would be involved, but it's
not. For a device with TotalVFs=8 and a 128MB VF BAR, it would make sense
to dedicate 4 segments to each of those BARs because you could only use 8*4
= 32 segments total. If the device had TotalVFs=128, your only choices
would be 1 segment per BAR (256 segments * 128MB/segment = 32GB) or 2
segments per BAR (256 segments * 64MB/segment = 16GB).

If you use 2 segments per BAR and you want NumVFs=128, you can't shift
anything, so the PE assignments are completely fixed (VF0 in PE0, VF1 in
PE1, etc.)

It seems like you'd make different choices about pdn->max_vfs for these two
devices, but currently you only look at the VF BAR size, not at TotalVFs.

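
One way TotalVFs could enter the decision, sketched under two assumptions:
there are 256 segments per M64 BAR, and both the VF BAR size and the segment
size are powers of two (so segments-per-VF must be a power of two as well).
The helper lists every segments-per-VF value for which TotalVFs * segments
still fits; m64_segs_per_vf_choices() is hypothetical and not part of the
patch.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 256 equal segments (and PEs) per M64 BAR. */
#define M64_SEGMENTS 256ULL

/*
 * Hypothetical helper: store in choices[] every power-of-two
 * segments-per-VF count for which total_vfs VFs still fit in the
 * M64 BAR (total_vfs * segs <= 256). Returns how many were stored.
 */
size_t m64_segs_per_vf_choices(unsigned int total_vfs,
			       unsigned int *choices, size_t max_choices)
{
	unsigned int segs;
	size_t n = 0;

	assert(total_vfs != 0);
	for (segs = 1; segs <= M64_SEGMENTS && n < max_choices; segs *= 2) {
		/* All VFs together must not need more than 256 segments. */
		if ((unsigned long long)total_vfs * segs > M64_SEGMENTS)
			break;
		choices[n++] = segs;
	}
	return n;
}
```

With TotalVFs=128 this yields only {1, 2}; with TotalVFs=8 it yields
{1, 2, 4, 8, 16, 32}, which includes the 4-segment choice mentioned above.
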
> Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
> ---
> arch/powerpc/include/asm/pci-bridge.h | 2 ++
> arch/powerpc/platforms/powernv/pci-ioda.c | 33 ++++++++++++++++++++++++++---
> 2 files changed, 32 insertions(+), 3 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
> index d61c384..7156486 100644
> --- a/arch/powerpc/include/asm/pci-bridge.h
> +++ b/arch/powerpc/include/asm/pci-bridge.h
> @@ -174,6 +174,8 @@ struct pci_dn {
> u16 max_vfs; /* number of VFs IOV BAR expended */
> u16 vf_pes; /* VF PE# under this PF */
> int offset; /* PE# for the first VF PE */
> +#define M64_PER_IOV 4
> + int m64_per_iov;
> #define IODA_INVALID_M64 (-1)
> int m64_wins[PCI_SRIOV_NUM_BARS];
> #endif /* CONFIG_PCI_IOV */
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> index 94fe6e1..23ea873 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -2180,6 +2180,7 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
> int i;
> resource_size_t size;
> struct pci_dn *pdn;
> + int mul, total_vfs;
>
> if (!pdev->is_physfn || pdev->is_added)
> return;
> @@ -2190,6 +2191,32 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
> pdn = pci_get_pdn(pdev);
> pdn->max_vfs = 0;
>
> + total_vfs = pci_sriov_get_totalvfs(pdev);
> + pdn->m64_per_iov = 1;
> + mul = phb->ioda.total_pe;
> +
> + for (i = PCI_IOV_RESOURCES; i <= PCI_IOV_RESOURCE_END; i++) {
> + res = &pdev->resource[i];
> + if (!res->flags || res->parent)
> + continue;
> + if (!pnv_pci_is_mem_pref_64(res->flags)) {
> + dev_warn(&pdev->dev, " non M64 IOV BAR %pR on %s\n",
> + res, pci_name(pdev));
> + continue;
> + }
> +
> + size = pci_iov_resource_size(pdev, i);
> +
> + /* bigger than 64M */
> + if (size > (1 << 26)) {
> + dev_info(&pdev->dev, "PowerNV: VF BAR[%d] size "
> + "is bigger than 64M, roundup power2\n", i);
> + pdn->m64_per_iov = M64_PER_IOV;
> + mul = __roundup_pow_of_two(total_vfs);

I think this deserves a more descriptive dmesg message. "roundup power2"
doesn't really tell me anything, especially since you mention a BAR, but
you're actually rounding up total_vfs, not the BAR size.

Does this reduce the number of possible VFs? We already can't set NumVFs
higher than TotalVFs. Does this make it so we can't set NumVFs higher than
pdn->max_vfs?

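
For reference, the choice of "mul" in the first loop of the patch boils down
to the logic below. This is a userspace sketch: roundup_pow_of_two_u32()
stands in for the kernel's __roundup_pow_of_two(), and iov_expand_factor()
is a hypothetical name modelling the decision once a VF BAR larger than 64MB
is found.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's __roundup_pow_of_two(). */
uint64_t roundup_pow_of_two_u32(uint32_t n)
{
	uint64_t p = 1;

	assert(n != 0);
	while (p < n)
		p <<= 1;
	return p;
}

/*
 * Hypothetical model of the patch's choice of "mul": a VF BAR bigger than
 * 64MB switches from total_pe to total_vfs rounded up to a power of two.
 */
uint64_t iov_expand_factor(uint64_t vf_bar_size, uint32_t total_pe,
			   uint32_t total_vfs)
{
	if (vf_bar_size > (1ULL << 26))	/* bigger than 64M */
		return roundup_pow_of_two_u32(total_vfs);
	return total_pe;
}
```

Rounding up to a power of two never yields a value below its input, so for
a 128MB VF BAR with TotalVFs=63 the factor is 64; a VF BAR of exactly 64MB
still uses total_pe.
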
> + break;
> + }
> + }
> +
> for (i = PCI_IOV_RESOURCES; i <= PCI_IOV_RESOURCE_END; i++) {
> res = &pdev->resource[i];
> if (!res->flags || res->parent)
> @@ -2202,13 +2229,13 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
>
> dev_dbg(&pdev->dev, " Fixing VF BAR[%d] %pR to\n", i, res);
> size = pci_iov_resource_size(pdev, i);
> - res->end = res->start + size * phb->ioda.total_pe - 1;
> + res->end = res->start + size * mul - 1;
> dev_dbg(&pdev->dev, " %pR\n", res);
> dev_info(&pdev->dev, "VF BAR%d: %pR (expanded to %d VFs for PE alignment)",
> i - PCI_IOV_RESOURCES,
> - res, phb->ioda.total_pe);
> + res, mul);
> }
> - pdn->max_vfs = phb->ioda.total_pe;
> + pdn->max_vfs = mul;

Maybe this is the part that makes it hard to compute the size in
sriov_init() -- you reduce pdn->max_vfs in some cases, and maybe that can
affect the space you reserve for IOV BARs? E.g., maybe you reduce
pdn->max_vfs to 128 because VF BAR 3 is larger than 64MB, but you've
already expanded the IOV space for VF BAR 1 based on 256 PEs?

But then I'm still confused because the loop here in
pnv_pci_ioda_fixup_iov_resources() always expands the resource based on
phb->ioda.total_pe; that part doesn't depend on "mul" or pdn->max_vfs at
all.

Or maybe this is just a bug in the fixup loop, and it *should* depend on
"mul"?

> }
>
> static void pnv_pci_ioda_fixup_sriov(struct pci_bus *bus)
> --
> 1.7.9.5
>
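
A rough sketch of the possible mismatch raised in the sriov_init() question
above: if space were reserved for total_pe VFs but a resource were later
expanded for a smaller "mul", the two sizes would differ. iov_res_end()
repeats the quoted res->end computation; iov_reservation_mismatch() is a
hypothetical helper, and the example values (a 1MB VF BAR, 256 vs 128 VFs)
are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Same arithmetic as the quoted fixup: res->end = res->start + size * n - 1. */
uint64_t iov_res_end(uint64_t start, uint64_t vf_bar_size, uint64_t n)
{
	assert(n != 0);
	return start + vf_bar_size * n - 1;
}

/*
 * Hypothetical check: size difference in bytes between a window reserved
 * for reserved_vfs VFs and one expanded for fixup_vfs VFs (0 = they agree).
 */
uint64_t iov_reservation_mismatch(uint64_t vf_bar_size,
				  uint64_t reserved_vfs, uint64_t fixup_vfs)
{
	uint64_t reserved = iov_res_end(0, vf_bar_size, reserved_vfs) + 1;
	uint64_t fixed_up = iov_res_end(0, vf_bar_size, fixup_vfs) + 1;

	return reserved > fixed_up ? reserved - fixed_up : fixed_up - reserved;
}
```

For a 1MB VF BAR, iov_reservation_mismatch(1ULL << 20, 256, 128) reports a
128MB difference, while equal counts report 0.
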