linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
From: Bjorn Helgaas <bhelgaas@google.com>
To: Wei Yang <weiyang@linux.vnet.ibm.com>
Cc: linux-pci@vger.kernel.org, benh@au1.ibm.com,
	linuxppc-dev@lists.ozlabs.org, gwshan@linux.vnet.ibm.com
Subject: Re: [PATCH] powerpc/powernv: make sure the IOV BAR will not exceed limit after shifting
Date: Tue, 3 Feb 2015 18:19:26 -0600	[thread overview]
Message-ID: <20150204001926.GA19540@google.com> (raw)
In-Reply-To: <1422946903-12958-1-git-send-email-weiyang@linux.vnet.ibm.com>

On Tue, Feb 03, 2015 at 03:01:43PM +0800, Wei Yang wrote:
> The actual IOV BAR range is determined by the start address and the actual
> size for vf_num VFs BAR. After shifting the IOV BAR, there would be a
> chance the actual end address exceed the limit and overlap with other
> devices.
> 
> This patch adds a check to make sure after shifting, the range will not
> overlap with other devices.

I folded this into the previous patch (the one that adds
pnv_pci_vf_resource_shift()).  And I think that needs to be folded together
with the following one ("powerpc/powernv: Allocate VF PE") because this one
references pdn->vf_pes, which is added by "Allocate VF PE".

> Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
> ---
>  arch/powerpc/platforms/powernv/pci-ioda.c |   53 ++++++++++++++++++++++++++---
>  1 file changed, 48 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> index 8456ae8..1a1e74b 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -854,16 +854,18 @@ static unsigned int pnv_ioda_dma_weight(struct pci_dev *dev)
>  }
>  
>  #ifdef CONFIG_PCI_IOV
> -static void pnv_pci_vf_resource_shift(struct pci_dev *dev, int offset)
> +static int pnv_pci_vf_resource_shift(struct pci_dev *dev, int offset)
>  {
>  	struct pci_dn *pdn = pci_get_pdn(dev);
>  	int i;
>  	struct resource *res;
>  	resource_size_t size;
> +	u16 vf_num;
>  
>  	if (!dev->is_physfn)
> -		return;
> +		return -EINVAL;
>  
> +	vf_num = pdn->vf_pes;

I can't actually build this, but I don't think pdn->vf_pes is defined yet.

>  	for (i = PCI_IOV_RESOURCES; i <= PCI_IOV_RESOURCE_END; i++) {
>  		res = &dev->resource[i];
>  		if (!res->flags || !res->parent)
> @@ -875,11 +877,49 @@ static void pnv_pci_vf_resource_shift(struct pci_dev *dev, int offset)
>  		dev_info(&dev->dev, " Shifting VF BAR %pR to\n", res);
>  		size = pci_iov_resource_size(dev, i);
>  		res->start += size*offset;
> -
>  		dev_info(&dev->dev, "                 %pR\n", res);
> +
> +		/*
> +		 * The actual IOV BAR range is determined by the start address
> +		 * and the actual size for vf_num VFs BAR. The check here is
> +		 * to make sure after shifting, the range will not overlap
> +		 * with other device.
> +		 */
> +		if ((res->start + (size * vf_num)) > res->end) {
> +			dev_err(&dev->dev, "VF BAR%d: %pR will conflict with"
> +					" other device after shift\n");

sriov_init() sets up "res" with enough space to contain TotalVF copies
of the VF BAR.  By the time we get here, that "res" is in the resource
tree, and you should be able to see it in /proc/iomem.

For example, if TotalVFs is 128 and VF BAR0 is 1MB in size, the
resource size would be 128 * 1MB = 0x800_0000.  If the VF BAR0 in the
SR-IOV Capability contains a base address of 0x8000_0000, the resource
would be:

  [mem 0x8000_0000-0x87ff_ffff]

We have to assume there's another resource starting immediately after
this one, i.e., at 0x8800_0000, and we have to make sure that when we
change this resource and turn on SR-IOV, we don't overlap with it.

The shifted resource will start at 0x8000_0000 + 1MB * "offset".  The
hardware will respond to a range whose size is 1MB * NumVFs (NumVFs
may be smaller than TotalVFs).

If we enable 16 VFs and shift by 23, we set VF BAR0 to 0x8000_0000 +
1MB * 23 = 0x8170_0000, and the size is 1MB * 16 = 0x100_0000, so the
new resource will be:

  [mem 0x8170_0000-0x826f_ffff]

That's fine; it doesn't extend past the original end of 0x87ff_ffff.
But if we enable those same 16 VFs with a shift of 120, we set VF BAR0
to 0x8000_0000 + 1MB * 120 = 0x8780_0000, and the size stays the same,
so the new resource will be:

  [mem 0x8780_0000-0x887f_ffff]

and that's a problem because we have two devices responding at
0x8800_0000.

Your test of "res->start + (size * vf_num)) > res->end" is not strict
enough to catch this problem.

I think we need something like the patch below.  I restructured it so
we don't have to back out any resource changes if we fail.

This shifting strategy seems to imply that the closer NumVFs is to
TotalVFs, the less flexibility you have to assign PEs, e.g., if NumVFs
== TotalVFs, you wouldn't be able to shift at all.  In this example,
you could shift by anything from 0 to 128 - 16 = 112, but if you
wanted NumVFs = 64, you could only shift by 0 to 64.  Is that true?

I think your M64 BAR gets split into 256 segments, regardless of what
TotalVFs is, so if you expanded the resource to 256 * 1MB for this
example, you would be able to shift by up to 256 - NumVFs.  Do you
actually do this somewhere?

I pushed an updated pci/virtualization branch with these updates.  I
think there's also a leak that needs to be fixed that Dan Carpenter
pointed out.

Bjorn

> +			goto failed;
> +		}
> +	}
> +
> +	for (i = PCI_IOV_RESOURCES; i <= PCI_IOV_RESOURCE_END; i++) {
> +		res = &dev->resource[i];
> +		if (!res->flags || !res->parent)
> +			continue;
> +
> +		if (!pnv_pci_is_mem_pref_64(res->flags))
> +			continue;
> +
>  		pci_update_resource(dev, i);
>  	}
>  	pdn->max_vfs -= offset;
> +	return 0;
> +
> +failed:
> +	for (; i >= PCI_IOV_RESOURCES; i--) {
> +		res = &dev->resource[i];
> +		if (!res->flags || !res->parent)
> +			continue;
> +
> +		if (!pnv_pci_is_mem_pref_64(res->flags))
> +			continue;
> +
> +		dev_info(&dev->dev, " Shifting VF BAR %pR to\n", res);
> +		size = pci_iov_resource_size(dev, i);
> +		res->start += size*(-offset);
> +		dev_info(&dev->dev, "                 %pR\n", res);
> +	}
> +	return -EBUSY;
>  }
>  #endif /* CONFIG_PCI_IOV */
>  
> @@ -1480,8 +1520,11 @@ int pnv_pci_sriov_enable(struct pci_dev *pdev, u16 vf_num)
>  		}
>  
>  		/* Do some magic shift */
> -		if (pdn->m64_per_iov == 1)
> -			pnv_pci_vf_resource_shift(pdev, pdn->offset);
> +		if (pdn->m64_per_iov == 1) {
> +			ret = pnv_pci_vf_resource_shift(pdev, pdn->offset);
> +			if (ret)
> +				goto m64_failed;
> +		}
>  	}
>  
>  	/* Setup VF PEs */

commit 9849fc0c807d3544dbd682354325b2454f139ca4
Author: Wei Yang <weiyang@linux.vnet.ibm.com>
Date:   Tue Feb 3 13:09:30 2015 -0600

    powerpc/powernv: Shift VF resource with an offset
    
    On PowerNV platform, resource position in M64 implies the PE# the resource
    belongs to.  In some cases, adjustment of a resource is necessary to locate
    it to a correct position in M64.
    
    Add pnv_pci_vf_resource_shift() to shift the 'real' PF IOV BAR address
    according to an offset.
    
    [bhelgaas: rework loops, rework overlap check, index resource[]
    conventionally, remove pci_regs.h include]
    Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
    Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 74de05e58e1d..6ffedcc291a8 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -749,6 +749,74 @@ static unsigned int pnv_ioda_dma_weight(struct pci_dev *dev)
 	return 10;
 }
 
+#ifdef CONFIG_PCI_IOV
+static int pnv_pci_vf_resource_shift(struct pci_dev *dev, int offset)
+{
+	struct pci_dn *pdn = pci_get_pdn(dev);
+	int i;
+	struct resource *res, res2;
+	resource_size_t size, end;
+	u16 vf_num;
+
+	if (!dev->is_physfn)
+		return -EINVAL;
+
+	/*
+	 * "offset" is in VFs.  The M64 BARs are sized so that when they
+	 * are segmented, each segment is the same size as the IOV BAR.
+	 * Each segment is in a separate PE, and the high order bits of the
+	 * address are the PE number.  Therefore, each VF's BAR is in a
+	 * separate PE, and changing the IOV BAR start address changes the
+	 * range of PEs the VFs are in.
+	 */
+	vf_num = pdn->vf_pes;		// FIXME not defined yet
+	for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
+		res = &dev->resource[i + PCI_IOV_RESOURCES];
+		if (!res->flags || !res->parent)
+			continue;
+
+		if (!pnv_pci_is_mem_pref_64(res->flags))
+			continue;
+
+		/*
+		 * The actual IOV BAR range is determined by the start address
+		 * and the actual size for vf_num VFs BAR.  This check is to
+		 * make sure that after shifting, the range will not overlap
+		 * with another device.
+		 */
+		size = pci_iov_resource_size(dev, i + PCI_IOV_RESOURCES);
+		res2.flags = res->flags;
+		res2.start = res->start + (size * offset);
+		res2.end = res2.start + (size * vf_num) - 1;
+
+		if (res2.end > res->end) {
+			dev_err(&dev->dev, "VF BAR%d: %pR would extend past %pR (trying to enable %d VFs shifted by %d)\n",
+				i, &res2, res, vf_num, offset);
+			return -EBUSY;
+		}
+	}
+
+	for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
+		res = &dev->resource[i + PCI_IOV_RESOURCES];
+		if (!res->flags || !res->parent)
+			continue;
+
+		if (!pnv_pci_is_mem_pref_64(res->flags))
+			continue;
+
+		size = pci_iov_resource_size(dev, i + PCI_IOV_RESOURCES);
+		res2 = *res;
+		res->start += size * offset;
+
+		dev_info(&dev->dev, "VF BAR%d: %pR shifted to %pR (enabling %d VFs shifted by %d)\n",
+			 i, &res2, res, vf_num, offset);
+		pci_update_resource(dev, i + PCI_IOV_RESOURCES);
+	}
+	pdn->max_vfs -= offset;
+	return 0;
+}
+#endif /* CONFIG_PCI_IOV */
+
 #if 0
 static struct pnv_ioda_pe *pnv_ioda_setup_dev_PE(struct pci_dev *dev)
 {

  reply	other threads:[~2015-02-04  0:19 UTC|newest]

Thread overview: 85+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-12-22  5:54 [PATCH V10 00/17] Enable SRIOV on Power8 Wei Yang
2014-12-22  5:54 ` [PATCH V10 01/17] PCI/IOV: Export interface for retrieve VF's BDF Wei Yang
2014-12-22  5:54 ` [PATCH V10 02/17] PCI/IOV: add VF enable/disable hook Wei Yang
2014-12-22  5:54 ` [PATCH V10 03/17] PCI: Add weak pcibios_iov_resource_alignment() interface Wei Yang
2014-12-22  5:54 ` [PATCH V10 04/17] PCI: Store VF BAR size in pci_sriov Wei Yang
2014-12-22  5:54 ` [PATCH V10 05/17] PCI: Take additional PF's IOV BAR alignment in sizing and assigning Wei Yang
2014-12-22  5:54 ` [PATCH V10 06/17] powerpc/pci: Add PCI resource alignment documentation Wei Yang
2014-12-22  5:54 ` [PATCH V10 07/17] powerpc/pci: Don't unset pci resources for VFs Wei Yang
2014-12-22  5:54 ` [PATCH V10 08/17] powrepc/pci: Refactor pci_dn Wei Yang
2014-12-22  5:54 ` [PATCH V10 09/17] powerpc/pci: remove pci_dn->pcidev field Wei Yang
2014-12-22  5:54 ` [PATCH V10 10/17] powerpc/powernv: Use pci_dn in PCI config accessor Wei Yang
2014-12-22  5:54 ` [PATCH V10 11/17] powerpc/powernv: Allocate pe->iommu_table dynamically Wei Yang
2014-12-22  5:54 ` [PATCH V10 12/17] powerpc/powernv: Reserve additional space for IOV BAR according to the number of total_pe Wei Yang
2014-12-22  5:54 ` [PATCH V10 13/17] powerpc/powernv: Implement pcibios_iov_resource_alignment() on powernv Wei Yang
2014-12-22  5:54 ` [PATCH V10 14/17] powerpc/powernv: Shift VF resource with an offset Wei Yang
2014-12-22  5:54 ` [PATCH V10 15/17] powerpc/powernv: Allocate VF PE Wei Yang
2014-12-22  5:54 ` [PATCH V10 16/17] powerpc/powernv: Reserve additional space for IOV BAR, with m64_per_iov supported Wei Yang
2014-12-22  5:54 ` [PATCH V10 17/17] powerpc/powernv: Group VF PE when IOV BAR is big on PHB3 Wei Yang
2014-12-22  6:05 ` [PATCH V10 00/17] Enable SRIOV on Power8 Wei Yang
2015-01-13 18:05   ` Bjorn Helgaas
2015-01-15  2:27     ` [PATCH V11 " Wei Yang
2015-01-15  2:27       ` [PATCH V11 01/17] PCI/IOV: Export interface for retrieve VF's BDF Wei Yang
2015-02-20 23:09         ` Bjorn Helgaas
2015-03-02  6:05           ` Wei Yang
2015-01-15  2:27       ` [PATCH V11 02/17] PCI/IOV: add VF enable/disable hook Wei Yang
2015-02-10  0:26         ` Benjamin Herrenschmidt
2015-02-10  1:35           ` Wei Yang
2015-02-10  2:13             ` Benjamin Herrenschmidt
2015-02-10  6:18               ` Wei Yang
2015-01-15  2:27       ` [PATCH V11 03/17] PCI: Add weak pcibios_iov_resource_alignment() interface Wei Yang
2015-02-10  0:32         ` Benjamin Herrenschmidt
2015-02-10  1:44           ` Wei Yang
2015-01-15  2:27       ` [PATCH V11 04/17] PCI: Store VF BAR size in pci_sriov Wei Yang
2015-01-15  2:27       ` [PATCH V11 05/17] PCI: Take additional PF's IOV BAR alignment in sizing and assigning Wei Yang
2015-01-15  2:27       ` [PATCH V11 06/17] powerpc/pci: Add PCI resource alignment documentation Wei Yang
2015-02-04 23:44         ` Bjorn Helgaas
2015-02-10  1:02           ` Benjamin Herrenschmidt
2015-02-20  0:56             ` Bjorn Helgaas
2015-02-20  2:41               ` Benjamin Herrenschmidt
2015-01-15  2:27       ` [PATCH V11 07/17] powerpc/pci: Don't unset pci resources for VFs Wei Yang
2015-02-10  0:36         ` Benjamin Herrenschmidt
2015-02-10  1:51           ` Wei Yang
2015-02-10  2:14             ` Benjamin Herrenschmidt
2015-02-10  6:25               ` Wei Yang
2015-02-10  8:14                 ` Benjamin Herrenschmidt
2015-02-20 23:47                   ` Bjorn Helgaas
2015-03-02  6:09                     ` Wei Yang
2015-01-15  2:27       ` [PATCH V11 08/17] powrepc/pci: Refactor pci_dn Wei Yang
2015-02-20 23:19         ` Bjorn Helgaas
2015-02-23  0:13           ` Gavin Shan
2015-02-24  8:13             ` Bjorn Helgaas
2015-02-24  8:25               ` Benjamin Herrenschmidt
2015-01-15  2:27       ` [PATCH V11 09/17] powerpc/pci: remove pci_dn->pcidev field Wei Yang
2015-01-15  2:28       ` [PATCH V11 10/17] powerpc/powernv: Use pci_dn in PCI config accessor Wei Yang
2015-01-15  2:28       ` [PATCH V11 11/17] powerpc/powernv: Allocate pe->iommu_table dynamically Wei Yang
2015-01-15  2:28       ` [PATCH V11 12/17] powerpc/powernv: Reserve additional space for IOV BAR according to the number of total_pe Wei Yang
2015-02-04 21:26         ` Bjorn Helgaas
2015-02-04 23:08           ` Wei Yang
2015-01-15  2:28       ` [PATCH V11 13/17] powerpc/powernv: Implement pcibios_iov_resource_alignment() on powernv Wei Yang
2015-02-04 21:26         ` Bjorn Helgaas
2015-02-04 22:45           ` Wei Yang
2015-01-15  2:28       ` [PATCH V11 14/17] powerpc/powernv: Shift VF resource with an offset Wei Yang
2015-01-30 23:08         ` Bjorn Helgaas
2015-02-03  1:30           ` Wei Yang
2015-02-03  7:01           ` [PATCH] powerpc/powernv: make sure the IOV BAR will not exceed limit after shifting Wei Yang
2015-02-04  0:19             ` Bjorn Helgaas [this message]
2015-02-04  3:34               ` Wei Yang
2015-02-04 14:19                 ` Bjorn Helgaas
2015-02-04 15:20                   ` Wei Yang
2015-02-04 16:08                   ` [PATCH] pci/iov: fix memory leak introduced in "PCI: Store individual VF BAR size in struct pci_sriov" Wei Yang
2015-02-04 16:28                     ` Bjorn Helgaas
2015-02-04 20:53                 ` [PATCH] powerpc/powernv: make sure the IOV BAR will not exceed limit after shifting Bjorn Helgaas
2015-02-05  3:01                   ` Wei Yang
2015-01-15  2:28       ` [PATCH V11 15/17] powerpc/powernv: Allocate VF PE Wei Yang
2015-01-15  2:28       ` [PATCH V11 16/17] powerpc/powernv: Reserve additional space for IOV BAR, with m64_per_iov supported Wei Yang
2015-02-04 22:05         ` Bjorn Helgaas
2015-02-05  0:07           ` Wei Yang
2015-01-15  2:28       ` [PATCH V11 17/17] powerpc/powernv: Group VF PE when IOV BAR is big on PHB3 Wei Yang
2015-02-04 23:44       ` [PATCH V11 00/17] Enable SRIOV on Power8 Bjorn Helgaas
2015-02-05  0:13         ` Wei Yang
2015-02-05  6:34         ` [PATCH 0/3] Code adjustment on pci/virtualization Wei Yang
2015-02-05  6:34           ` [PATCH 1/3] fix on Store individual VF BAR size in struct pci_sriov Wei Yang
2015-02-05  6:34           ` [PATCH 2/3] fix Reserve additional space for IOV BAR, with m64_per_iov supported Wei Yang
2015-02-05  6:34           ` [PATCH 3/3] remove the unused end in pnv_pci_vf_resource_shift() Wei Yang
2015-02-10  0:25         ` [PATCH V11 00/17] Enable SRIOV on Power8 Benjamin Herrenschmidt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150204001926.GA19540@google.com \
    --to=bhelgaas@google.com \
    --cc=benh@au1.ibm.com \
    --cc=gwshan@linux.vnet.ibm.com \
    --cc=linux-pci@vger.kernel.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=weiyang@linux.vnet.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).