From: Bjorn Helgaas <bhelgaas@google.com>
To: Wei Yang <weiyang@linux.vnet.ibm.com>
Cc: benh@au1.ibm.com, gwshan@linux.vnet.ibm.com,
linux-pci@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
Donald Dutile <ddutile@redhat.com>,
Myron Stowe <myron.stowe@redhat.com>
Subject: Re: [PATCH V9 03/18] PCI: Add weak pcibios_iov_resource_size() interface
Date: Tue, 18 Nov 2014 18:12:43 -0700 [thread overview]
Message-ID: <20141119011243.GA23467@google.com> (raw)
In-Reply-To: <1414942894-17034-4-git-send-email-weiyang@linux.vnet.ibm.com>
[+cc Don, Myron]
Hi Wei,
On Sun, Nov 02, 2014 at 11:41:19PM +0800, Wei Yang wrote:
> When retrieving VF IOV BAR in virtfn_add(), it will divide the total PF's IOV
> BAR size with the totalVF number. This is true for most cases, while may not
> be correct on some specific platform.
>
> For example on PowerNV platform, in order to fix PF's IOV BAR into a hardware
> alignment, the PF's IOV BAR size would be expended. This means the original
> method couldn't work.
>
> This patch introduces a weak pcibios_iov_resource_size() interface, which
> gives platform a chance to implement specific method to calculate the VF BAR
> resource size.
>
> Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
> ---
> drivers/pci/iov.c | 27 +++++++++++++++++++++++++--
> include/linux/pci.h | 5 +++++
> 2 files changed, 30 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
> index 4d1685d..6866830 100644
> --- a/drivers/pci/iov.c
> +++ b/drivers/pci/iov.c
> @@ -61,6 +61,30 @@ static void virtfn_remove_bus(struct pci_bus *physbus, struct pci_bus *virtbus)
> pci_remove_bus(virtbus);
> }
>
> +resource_size_t __weak pcibios_iov_resource_size(struct pci_dev *dev, int resno)
> +{
> + return 0;
> +}
> +
> +resource_size_t pci_iov_resource_size(struct pci_dev *dev, int resno)
> +{
> + resource_size_t size;
> + struct pci_sriov *iov;
> +
> + if (!dev->is_physfn)
> + return 0;
> +
> + size = pcibios_iov_resource_size(dev, resno);
> + if (size != 0)
> + return size;
> +
> + iov = dev->sriov;
> + size = resource_size(dev->resource + resno);
> + do_div(size, iov->total_VFs);
> +
> + return size;
> +}
> +
> static int virtfn_add(struct pci_dev *dev, int id, int reset)
> {
> int i;
> @@ -96,8 +120,7 @@ static int virtfn_add(struct pci_dev *dev, int id, int reset)
> continue;
> virtfn->resource[i].name = pci_name(virtfn);
> virtfn->resource[i].flags = res->flags;
> - size = resource_size(res);
> - do_div(size, iov->total_VFs);
> + size = pci_iov_resource_size(dev, i + PCI_IOV_RESOURCES);
> virtfn->resource[i].start = res->start + size * id;
Can you help me understand this?
We have previously called sriov_init() on the PF. There, we sized the VF
BARs, which are in the PF's SR-IOV Capability (SR-IOV spec sec 3.3.14).
The size we discover is the amount of space required by a single VF, so
sriov_init() adjusts PF->resource[PCI_IOV_RESOURCES + i] by multiplying
that size by PCI_SRIOV_TOTAL_VF, so this PF resource is now big enough to
hold the VF BAR[i] areas for all the possible VFs.
Now we're in virtfn_add(), setting up a new VF. The usual BARs at config
space 0x10, etc., are read-only zeroes for a VF, so we don't size them the
usual way. Instead, we carve out a slice of the
PF->resource[PCI_IOV_RESOURCES + i] area.
I thought the starting address of the VF BARn memory aperture was
prescribed by the spec in sec 2.1.1.1 and shown in figure 2-1:
BARx VFy starting address = VF BARx + (y - 1) * (VF BARx aperture size)
That's basically what the existing code does. We don't save the VF BARx
aperture size, so we recompute it by undoing the multiplication we did in
sriov_init().
But you're computing the starting address using a different VF BARx
aperture size. How does that work? I assumed this calculation was built
into the PCI device, but obviously I'm missing something.
To make it concrete, here's a made-up example:
PF SR-IOV Capability
TotalVFs = 4
NumVFs = 4
System Page Size = 4KB
VF BAR0 = [mem 0x00000000-0x00000fff] (4KB at address 0)
PF pci_dev->resource[7] = [mem 0x00000000-0x00003fff] (16KB)
VF1 pci_dev->resource[0] = [mem 0x00000000-0x00000fff]
VF2 pci_dev->resource[0] = [mem 0x00001000-0x00001fff]
VF3 pci_dev->resource[0] = [mem 0x00002000-0x00002fff]
VF4 pci_dev->resource[0] = [mem 0x00003000-0x00003fff]
You're changing this so we might use 16KB as the VF BAR0 aperture size
instead of 4KB, which would result in the following:
VF1 pci_dev->resource[0] = [mem 0x00000000-0x00003fff]
VF2 pci_dev->resource[0] = [mem 0x00004000-0x00007fff]
VF3 pci_dev->resource[0] = [mem 0x00008000-0x0000bfff]
VF4 pci_dev->resource[0] = [mem 0x0000c000-0x0000ffff]
But you didn't change sriov_init(), so the PF resource[7] is only 16KB, not
the 64KB the VFs need. And I assume the VF address decoder is wired to
claim the 4KB regions at 0x0000, 0x1000, 0x2000, 0x3000, not the ones at
0x0000, 0x4000, 0x8000, 0xc000.
Bjorn
> virtfn->resource[i].end = virtfn->resource[i].start + size - 1;
> rc = request_resource(res, &virtfn->resource[i]);
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index bbf8058..2f5b454 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -1162,6 +1162,8 @@ resource_size_t pcibios_window_alignment(struct pci_bus *bus,
> resource_size_t pcibios_iov_resource_alignment(struct pci_dev *dev,
> int resno,
> resource_size_t align);
> +resource_size_t pcibios_iov_resource_size(struct pci_dev *dev,
> + int resno);
>
> #define PCI_VGA_STATE_CHANGE_BRIDGE (1 << 0)
> #define PCI_VGA_STATE_CHANGE_DECODES (1 << 1)
> @@ -1666,6 +1668,7 @@ int pci_num_vf(struct pci_dev *dev);
> int pci_vfs_assigned(struct pci_dev *dev);
> int pci_sriov_set_totalvfs(struct pci_dev *dev, u16 numvfs);
> int pci_sriov_get_totalvfs(struct pci_dev *dev);
> +resource_size_t pci_iov_resource_size(struct pci_dev *dev, int resno);
> #else
> static inline int pci_iov_virtfn_bus(struct pci_dev *dev, int id)
> {
> @@ -1685,6 +1688,8 @@ static inline int pci_sriov_set_totalvfs(struct pci_dev *dev, u16 numvfs)
> { return 0; }
> static inline int pci_sriov_get_totalvfs(struct pci_dev *dev)
> { return 0; }
> +static inline resource_size_t pci_iov_resource_size(struct pci_dev *dev, int resno)
> +{ return 0; }
> #endif
>
> #if defined(CONFIG_HOTPLUG_PCI) || defined(CONFIG_HOTPLUG_PCI_MODULE)
> --
> 1.7.9.5
>
WARNING: multiple messages have this Message-ID (diff)
From: Bjorn Helgaas <bhelgaas@google.com>
To: Wei Yang <weiyang@linux.vnet.ibm.com>
Cc: benh@au1.ibm.com, linux-pci@vger.kernel.org,
linuxppc-dev@lists.ozlabs.org, gwshan@linux.vnet.ibm.com,
Donald Dutile <ddutile@redhat.com>,
Myron Stowe <myron.stowe@redhat.com>
Subject: Re: [PATCH V9 03/18] PCI: Add weak pcibios_iov_resource_size() interface
Date: Tue, 18 Nov 2014 18:12:43 -0700 [thread overview]
Message-ID: <20141119011243.GA23467@google.com> (raw)
In-Reply-To: <1414942894-17034-4-git-send-email-weiyang@linux.vnet.ibm.com>
[+cc Don, Myron]
Hi Wei,
On Sun, Nov 02, 2014 at 11:41:19PM +0800, Wei Yang wrote:
> When retrieving VF IOV BAR in virtfn_add(), it will divide the total PF's IOV
> BAR size with the totalVF number. This is true for most cases, while may not
> be correct on some specific platform.
>
> For example on PowerNV platform, in order to fix PF's IOV BAR into a hardware
> alignment, the PF's IOV BAR size would be expended. This means the original
> method couldn't work.
>
> This patch introduces a weak pcibios_iov_resource_size() interface, which
> gives platform a chance to implement specific method to calculate the VF BAR
> resource size.
>
> Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
> ---
> drivers/pci/iov.c | 27 +++++++++++++++++++++++++--
> include/linux/pci.h | 5 +++++
> 2 files changed, 30 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
> index 4d1685d..6866830 100644
> --- a/drivers/pci/iov.c
> +++ b/drivers/pci/iov.c
> @@ -61,6 +61,30 @@ static void virtfn_remove_bus(struct pci_bus *physbus, struct pci_bus *virtbus)
> pci_remove_bus(virtbus);
> }
>
> +resource_size_t __weak pcibios_iov_resource_size(struct pci_dev *dev, int resno)
> +{
> + return 0;
> +}
> +
> +resource_size_t pci_iov_resource_size(struct pci_dev *dev, int resno)
> +{
> + resource_size_t size;
> + struct pci_sriov *iov;
> +
> + if (!dev->is_physfn)
> + return 0;
> +
> + size = pcibios_iov_resource_size(dev, resno);
> + if (size != 0)
> + return size;
> +
> + iov = dev->sriov;
> + size = resource_size(dev->resource + resno);
> + do_div(size, iov->total_VFs);
> +
> + return size;
> +}
> +
> static int virtfn_add(struct pci_dev *dev, int id, int reset)
> {
> int i;
> @@ -96,8 +120,7 @@ static int virtfn_add(struct pci_dev *dev, int id, int reset)
> continue;
> virtfn->resource[i].name = pci_name(virtfn);
> virtfn->resource[i].flags = res->flags;
> - size = resource_size(res);
> - do_div(size, iov->total_VFs);
> + size = pci_iov_resource_size(dev, i + PCI_IOV_RESOURCES);
> virtfn->resource[i].start = res->start + size * id;
Can you help me understand this?
We have previously called sriov_init() on the PF. There, we sized the VF
BARs, which are in the PF's SR-IOV Capability (SR-IOV spec sec 3.3.14).
The size we discover is the amount of space required by a single VF, so
sriov_init() adjusts PF->resource[PCI_IOV_RESOURCES + i] by multiplying
that size by PCI_SRIOV_TOTAL_VF, so this PF resource is now big enough to
hold the VF BAR[i] areas for all the possible VFs.
Now we're in virtfn_add(), setting up a new VF. The usual BARs at config
space 0x10, etc., are read-only zeroes for a VF, so we don't size them the
usual way. Instead, we carve out a slice of the
PF->resource[PCI_IOV_RESOURCES + i] area.
I thought the starting address of the VF BARn memory aperture was
prescribed by the spec in sec 2.1.1.1 and shown in figure 2-1:
BARx VFy starting address = VF BARx + (y - 1) * (VF BARx aperture size)
That's basically what the existing code does. We don't save the VF BARx
aperture size, so we recompute it by undoing the multiplication we did in
sriov_init().
But you're computing the starting address using a different VF BARx
aperture size. How does that work? I assumed this calculation was built
into the PCI device, but obviously I'm missing something.
To make it concrete, here's a made-up example:
PF SR-IOV Capability
TotalVFs = 4
NumVFs = 4
System Page Size = 4KB
VF BAR0 = [mem 0x00000000-0x00000fff] (4KB at address 0)
PF pci_dev->resource[7] = [mem 0x00000000-0x00003fff] (16KB)
VF1 pci_dev->resource[0] = [mem 0x00000000-0x00000fff]
VF2 pci_dev->resource[0] = [mem 0x00001000-0x00001fff]
VF3 pci_dev->resource[0] = [mem 0x00002000-0x00002fff]
VF4 pci_dev->resource[0] = [mem 0x00003000-0x00003fff]
You're changing this so we might use 16KB as the VF BAR0 aperture size
instead of 4KB, which would result in the following:
VF1 pci_dev->resource[0] = [mem 0x00000000-0x00003fff]
VF2 pci_dev->resource[0] = [mem 0x00004000-0x00007fff]
VF3 pci_dev->resource[0] = [mem 0x00008000-0x0000bfff]
VF4 pci_dev->resource[0] = [mem 0x0000c000-0x0000ffff]
But you didn't change sriov_init(), so the PF resource[7] is only 16KB, not
the 64KB the VFs need. And I assume the VF address decoder is wired to
claim the 4KB regions at 0x0000, 0x1000, 0x2000, 0x3000, not the ones at
0x0000, 0x4000, 0x8000, 0xc000.
Bjorn
> virtfn->resource[i].end = virtfn->resource[i].start + size - 1;
> rc = request_resource(res, &virtfn->resource[i]);
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index bbf8058..2f5b454 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -1162,6 +1162,8 @@ resource_size_t pcibios_window_alignment(struct pci_bus *bus,
> resource_size_t pcibios_iov_resource_alignment(struct pci_dev *dev,
> int resno,
> resource_size_t align);
> +resource_size_t pcibios_iov_resource_size(struct pci_dev *dev,
> + int resno);
>
> #define PCI_VGA_STATE_CHANGE_BRIDGE (1 << 0)
> #define PCI_VGA_STATE_CHANGE_DECODES (1 << 1)
> @@ -1666,6 +1668,7 @@ int pci_num_vf(struct pci_dev *dev);
> int pci_vfs_assigned(struct pci_dev *dev);
> int pci_sriov_set_totalvfs(struct pci_dev *dev, u16 numvfs);
> int pci_sriov_get_totalvfs(struct pci_dev *dev);
> +resource_size_t pci_iov_resource_size(struct pci_dev *dev, int resno);
> #else
> static inline int pci_iov_virtfn_bus(struct pci_dev *dev, int id)
> {
> @@ -1685,6 +1688,8 @@ static inline int pci_sriov_set_totalvfs(struct pci_dev *dev, u16 numvfs)
> { return 0; }
> static inline int pci_sriov_get_totalvfs(struct pci_dev *dev)
> { return 0; }
> +static inline resource_size_t pci_iov_resource_size(struct pci_dev *dev, int resno)
> +{ return 0; }
> #endif
>
> #if defined(CONFIG_HOTPLUG_PCI) || defined(CONFIG_HOTPLUG_PCI_MODULE)
> --
> 1.7.9.5
>
next prev parent reply other threads:[~2014-11-19 1:12 UTC|newest]
Thread overview: 59+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-11-02 15:41 [PATCH V9 00/18] Enable SRIOV on PowerNV Wei Yang
2014-11-02 15:41 ` [PATCH V9 01/18] PCI/IOV: Export interface for retrieve VF's BDF Wei Yang
2014-11-19 23:35 ` Bjorn Helgaas
2014-11-19 23:35 ` Bjorn Helgaas
2014-11-02 15:41 ` [PATCH V9 02/18] PCI: Add weak pcibios_iov_resource_alignment() interface Wei Yang
2014-11-02 15:41 ` [PATCH V9 03/18] PCI: Add weak pcibios_iov_resource_size() interface Wei Yang
2014-11-19 1:12 ` Bjorn Helgaas [this message]
2014-11-19 1:12 ` Bjorn Helgaas
2014-11-19 2:15 ` Benjamin Herrenschmidt
2014-11-19 2:15 ` Benjamin Herrenschmidt
2014-11-19 3:21 ` Wei Yang
2014-11-19 3:21 ` Wei Yang
2014-11-19 4:26 ` Bjorn Helgaas
2014-11-19 4:26 ` Bjorn Helgaas
2014-11-19 9:27 ` Wei Yang
2014-11-19 9:27 ` Wei Yang
2014-11-19 17:23 ` Bjorn Helgaas
2014-11-19 17:23 ` Bjorn Helgaas
2014-11-19 20:51 ` Benjamin Herrenschmidt
2014-11-19 20:51 ` Benjamin Herrenschmidt
2014-11-20 5:40 ` Wei Yang
2014-11-20 5:40 ` Wei Yang
2014-11-20 5:39 ` Wei Yang
2014-11-20 5:39 ` Wei Yang
2014-11-02 15:41 ` [PATCH V9 04/18] PCI: Take additional PF's IOV BAR alignment in sizing and assigning Wei Yang
2014-11-02 15:41 ` [PATCH V9 05/18] powerpc/pci: Add PCI resource alignment documentation Wei Yang
2014-11-02 15:41 ` [PATCH V9 06/18] powerpc/pci: Don't unset pci resources for VFs Wei Yang
2014-11-02 15:41 ` [PATCH V9 07/18] powerpc/pci: Define pcibios_disable_device() on powerpc Wei Yang
2014-11-02 15:41 ` [PATCH V9 08/18] powrepc/pci: Refactor pci_dn Wei Yang
2014-11-19 23:30 ` Bjorn Helgaas
2014-11-19 23:30 ` Bjorn Helgaas
2014-11-20 1:02 ` Gavin Shan
2014-11-20 1:02 ` Gavin Shan
2014-11-20 7:25 ` Wei Yang
2014-11-20 7:25 ` Wei Yang
2014-11-20 7:20 ` Wei Yang
2014-11-20 7:20 ` Wei Yang
2014-11-20 19:05 ` Bjorn Helgaas
2014-11-20 19:05 ` Bjorn Helgaas
2014-11-21 0:04 ` Gavin Shan
2014-11-21 0:04 ` Gavin Shan
2014-11-25 9:28 ` Wei Yang
2014-11-25 9:28 ` Wei Yang
2014-11-21 1:46 ` Wei Yang
2014-11-21 1:46 ` Wei Yang
2014-11-02 15:41 ` [PATCH V9 09/18] powerpc/pci: remove pci_dn->pcidev field Wei Yang
2014-11-02 15:41 ` [PATCH V9 10/18] powerpc/powernv: Use pci_dn in PCI config accessor Wei Yang
2014-11-02 15:41 ` [PATCH V9 11/18] powerpc/powernv: Allocate pe->iommu_table dynamically Wei Yang
2014-11-02 15:41 ` [PATCH V9 12/18] powerpc/powernv: Expand VF resources according to the number of total_pe Wei Yang
2014-11-02 15:41 ` [PATCH V9 13/18] powerpc/powernv: Implement pcibios_iov_resource_alignment() on powernv Wei Yang
2014-11-02 15:41 ` [PATCH V9 14/18] powerpc/powernv: Implement pcibios_iov_resource_size() " Wei Yang
2014-11-02 15:41 ` [PATCH V9 15/18] powerpc/powernv: Shift VF resource with an offset Wei Yang
2014-11-02 15:41 ` [PATCH V9 16/18] powerpc/powernv: Allocate VF PE Wei Yang
2014-11-02 15:41 ` [PATCH V9 17/18] powerpc/powernv: Expanding IOV BAR, with m64_per_iov supported Wei Yang
2014-11-02 15:41 ` [PATCH V9 18/18] powerpc/powernv: Group VF PE when IOV BAR is big on PHB3 Wei Yang
2014-11-18 23:11 ` [PATCH V9 00/18] Enable SRIOV on PowerNV Gavin Shan
2014-11-18 23:11 ` Gavin Shan
2014-11-18 23:40 ` Bjorn Helgaas
2014-11-18 23:40 ` Bjorn Helgaas
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20141119011243.GA23467@google.com \
--to=bhelgaas@google.com \
--cc=benh@au1.ibm.com \
--cc=ddutile@redhat.com \
--cc=gwshan@linux.vnet.ibm.com \
--cc=linux-pci@vger.kernel.org \
--cc=linuxppc-dev@lists.ozlabs.org \
--cc=myron.stowe@redhat.com \
--cc=weiyang@linux.vnet.ibm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.