linuxppc-dev.lists.ozlabs.org archive mirror
From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
To: bhelgaas@google.com
Cc: Wei Yang <weiyang@linux.vnet.ibm.com>,
	linux-pci@vger.kernel.org, gwshan@linux.vnet.ibm.com,
	qiudayu@linux.vnet.ibm.com, yan@linux.vnet.ibm.com,
	linuxppc-dev@lists.ozlabs.org
Subject: Re: [PATCH V7 00/17] Enable SRIOV on POWER8
Date: Thu, 31 Jul 2014 16:35:10 +1000
Message-ID: <1406788510.4935.180.camel@pasglop>
In-Reply-To: <1406182947-11302-1-git-send-email-weiyang@linux.vnet.ibm.com>

On Thu, 2014-07-24 at 14:22 +0800, Wei Yang wrote:
> This patch set enables the SRIOV on POWER8.

Hi Bjorn !

There are 4 patches in there to the generic code, but so far not much
review from your side of the fence :-)

How do you want to proceed ?

Cheers,
Ben.

> The general idea is to put each VF into its own PE and allocate the required
> resources, such as DMA and MSI.
> 
> One thing special about a VF PE is that we use M64BT to cover the IOV BAR.
> M64BT is a piece of hardware on the POWER platform that maps MMIO addresses to
> PEs. By using M64BT, we can map each individual VF to its own VF PE, which
> gives users more flexibility.
> 
> To achieve this, we need to apply a few hacks to the PCI device's resources:
> 1. Expand the IOV BAR properly.
>    Done by pnv_pci_ioda_fixup_iov_resources().
> 2. Shift the IOV BAR properly.
>    Done by pnv_pci_vf_resource_shift().
> 3. IOV BAR alignment is the total size instead of an individual size on
>    powernv platform.
>    Done by pnv_pcibios_sriov_resource_alignment().
> 4. Take the IOV BAR alignment into consideration in the sizing and assigning.
>    This is achieved by commit: "PCI: Take additional IOV BAR alignment in
>    sizing and assigning"
> 
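The resource arithmetic behind steps 1 to 3 can be sketched as below. This is only an illustration under stated assumptions, not the code from the patches: the function names are hypothetical, and the model (one VF-BAR-sized slot per PE, a shift counted in VF-BAR-sized units) is inferred from the list above and from the patch title "Expand VF resources according to the number of total_pe".

```shell
#!/bin/sh
# Hypothetical helpers; the names and the arithmetic model are assumptions.

# Step 1: expanded IOV BAR size, assuming one VF-BAR-sized slot per PE.
iov_bar_expanded_size() {
    vf_bar_size=$1   # size of a single VF BAR, in bytes
    total_pe=$2      # number of PEs the platform provides
    echo $(( vf_bar_size * total_pe ))
}

# Step 3: on powernv the alignment is the total size, not the per-VF size.
iov_bar_alignment() {
    iov_bar_expanded_size "$1" "$2"
}

# Step 2: new start address after shifting by `offset` VF-sized slots
# (assumed model; the real logic lives in pnv_pci_vf_resource_shift()).
iov_bar_shifted_start() {
    start=$1
    vf_bar_size=$2
    offset=$3
    echo $(( start + vf_bar_size * offset ))
}
```

For example, `iov_bar_expanded_size 65536 256` treats a 64 KiB VF BAR on a platform with 256 PEs and prints the size in bytes; both numbers are made up for illustration.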
> Test Environment:
>        The SRIOV devices tested are Emulex Lancer and Mellanox ConnectX-3 on
>        POWER8.
> 
> Example of passing a VF through to a guest via vfio:
> 	1. install necessary modules
> 	   modprobe vfio
> 	   modprobe vfio-pci
> 	2. retrieve the iommu_group the device belongs to
> 	   readlink /sys/bus/pci/devices/0000:06:0d.0/iommu_group
> 	   ../../../../kernel/iommu_groups/26
> 	   This means it belongs to group 26
> 	3. see how many devices are in this iommu_group
> 	   ls /sys/kernel/iommu_groups/26/devices/
> 	4. unbind the original driver and bind to vfio-pci driver
> 	   echo 0000:06:0d.0 > /sys/bus/pci/devices/0000:06:0d.0/driver/unbind
> 	   echo  1102 0002 > /sys/bus/pci/drivers/vfio-pci/new_id
> 	   Note: this should be done for each device in the same iommu_group
> 	5. Start qemu and pass device through vfio
> 	   /home/ywywyang/git/qemu-impreza/ppc64-softmmu/qemu-system-ppc64 \
> 		   -M pseries -m 2048 -enable-kvm -nographic \
> 		   -drive file=/home/ywywyang/kvm/fc19.img \
> 		   -monitor telnet:localhost:5435,server,nowait -boot cd \
> 		   -device "spapr-pci-vfio-host-bridge,id=CXGB3,iommu=26,index=6"
> 
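Steps 2 and 3 above can be wrapped in two small helpers, sketched here. The function names and the SYSFS_ROOT variable are illustrative and do not come from the patch set; SYSFS_ROOT exists only so the helpers can be pointed at a directory other than /sys when trying them out.

```shell
#!/bin/sh
# Sketch only: SYSFS_ROOT and the function names are illustrative.
SYSFS_ROOT=${SYSFS_ROOT:-/sys}

# Step 2: print the IOMMU group number a PCI device belongs to.
iommu_group_of() {
    link=$(readlink "$SYSFS_ROOT/bus/pci/devices/$1/iommu_group") || return 1
    basename "$link"
}

# Step 3: list every device in the given IOMMU group.
iommu_group_devices() {
    ls "$SYSFS_ROOT/kernel/iommu_groups/$1/devices/"
}

# Usage:
#   group=$(iommu_group_of 0000:06:0d.0)
#   iommu_group_devices "$group"
```

Every device the second helper lists has to be unbound and rebound as in step 4 before the group can be handed to the guest.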
> Verify that it is really the VF that responds:
> 	1. ping from a machine in the same subnet (the same broadcast domain)
> 	2. run arp -n on this machine
> 	   9.115.251.20             ether   00:00:c9:df:ed:bf   C eth0
> 	3. ifconfig in the guest
> 	   # ifconfig eth1
> 	   eth1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
> 	        inet 9.115.251.20  netmask 255.255.255.0  broadcast 9.115.251.255
> 		inet6 fe80::200:c9ff:fedf:edbf  prefixlen 64  scopeid 0x20<link>
> 	        ether 00:00:c9:df:ed:bf  txqueuelen 1000 (Ethernet)
> 	        RX packets 175  bytes 13278 (12.9 KiB)
> 	        RX errors 0  dropped 0  overruns 0  frame 0
> 		TX packets 58  bytes 9276 (9.0 KiB)
> 	        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
> 	4. They have the same MAC address
> 
> 	Note: make sure you shut down the other network interfaces in the guest.
> 
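The comparison in step 4 can also be done mechanically. The sketch below uses illustrative function names and assumes a grep that supports -o (GNU and BSD grep do). It pulls the first MAC address out of each piece of text and compares them case-insensitively.

```shell
#!/bin/sh
# Sketch only: the function names are illustrative.

# Print the first MAC address found on standard input, lower-cased.
first_mac() {
    grep -oiE '([0-9a-f]{2}:){5}[0-9a-f]{2}' | head -n 1 | tr 'A-F' 'a-f'
}

# Succeed when both text snippets mention the same MAC address.
same_mac() {
    a=$(printf '%s\n' "$1" | first_mac)
    b=$(printf '%s\n' "$2" | first_mac)
    [ -n "$a" ] && [ "$a" = "$b" ]
}

# Usage: pass the arp line from the host and the "ether" line from the guest:
#   same_mac "$arp_line" "$ether_line" && echo "the VF answered"
```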
> ---
> v6 -> v7:
>    1. add IORESOURCE_ARCH flag for IOV BAR on powernv platform.
>    2. when IOV BAR has IORESOURCE_ARCH flag, the size is retrieved from
>       hardware directly. If not, calculate as usual.
>    3. reorder the patch set, group them by subsystem:
>       PCI, powerpc, powernv
>    4. rebase it on 3.16-rc6
> v5 -> v6:
>    1. remove the pcibios_enable_sriov()/pcibios_disable_sriov() weak functions;
>       similar functionality is moved to
>       pnv_pci_enable_device_hook()/pnv_pci_disable_device_hook(). When the PF
>       is enabled, the platform will try its best to allocate resources for VFs.
>    2. remove pcibios_sriov_resource_size weak function
>    3. VF BAR size is retrieved from hardware directly in virtfn_add()
> v4 -> v5:
>    1. merge the SRIOV-related platform functions in machdep_calls and
>       wrap them in one CONFIG_PCI_IOV macro
>    2. define IODA_INVALID_M64 to replace (-1);
>       this value indicates that an m64_wins entry is not used
>    3. rename pnv_pci_release_dev_dma() to pnv_pci_ioda2_release_dma_pe();
>       this function is the counterpart to pnv_pci_ioda2_setup_dma_pe()
>    4. change dev_info() to dev_dbg() in pnv_pci_ioda_fixup_iov_resources()
>       to reduce kernel log output
>    5. release M64 window in pnv_pci_ioda2_release_dma_pe()
> v3 -> v4:
>    1. code format fixes, e.g. do not exceed 80 chars
>    2. in commit "ppc/pnv: Add function to deconfig a PE"
>       check that the bus has a bridge before printing the name
>       remove a PE from its own PELTV
>    3. change the function name for sriov resource size/alignment
>    4. rebase on 3.16-rc3
>    5. VFs no longer rely on the device node
>       Per Grant Likely's comments, the kernel should be able to handle the
>       lack of a device_node gracefully. Gavin restructured pci_dn so that
>       a VF has a pci_dn even when the VF's device_node is not provided
>       by firmware.
>    6. clean up all patch titles so they follow one style
>    7. fix return value for pci_iov_virtfn_bus/pci_iov_virtfn_devfn
> v2 -> v3:
>    1. change the return type of virtfn_bus/virtfn_devfn to int
>       change the name of these two functions to pci_iov_virtfn_bus/pci_iov_virtfn_devfn
>    2. remove the second parameter of pcibios_sriov_disable()
>    3. use data instead of pe in "ppc/pnv: allocate pe->iommu_table dynamically"
>    4. rename __pci_sriov_resource_size to pcibios_sriov_resource_size
>    5. rename __pci_sriov_resource_alignment to pcibios_sriov_resource_alignment
> v1 -> v2:
>    1. change the return value of virtfn_bus/virtfn_devfn to 0
>    2. move some TCE-related macro definitions to
>       arch/powerpc/platforms/powernv/pci.h
>    3. fix the __pci_sriov_resource_alignment on powernv platform
>       During the sizing stage, the IOV BAR is truncated to 0, which
>       affects the order of allocation. Fix this so that BARs are
>       allocated in order of their alignment.
> v0 -> v1:
>    1. improve the change log for
>       "PCI: Add weak __pci_sriov_resource_size() interface"
>       "PCI: Add weak __pci_sriov_resource_alignment() interface"
>       "PCI: take additional IOV BAR alignment in sizing and assigning"
>    2. wrap VF PE code in CONFIG_PCI_IOV
>    3. did regression test on P7.
> 
> Gavin Shan (2):
>   powrepc/pci: Refactor pci_dn
>   powerpc/powernv: Use pci_dn in PCI config accessor
> 
> Wei Yang (15):
>   PCI/IOV: Export interface for retrieve VF's BDF
>   PCI/IOV: Get VF BAR size from hardware directly when platform needs
>   PCI: Add weak pcibios_sriov_resource_alignment() interface
>   PCI: Take additional IOV BAR alignment in sizing and assigning
>   powerpc/pci: Don't unset pci resources for VFs
>   powerpc/pci: Define pcibios_disable_device() on powerpc
>   powerpc/powernv: mark IOV BAR with IORESOURCE_ARCH
>   powerpc/powernv: Allocate pe->iommu_table dynamically
>   powerpc/powernv: Add function to deconfig a PE
>   powerpc/powernv: Expand VF resources according to the number of
>     total_pe
>   powerpc/powernv: Implement pcibios_sriov_resource_alignment on
>     powernv
>   powerpc/powernv: Shift VF resource with an offset
>   powerpc/powernv: Allocate VF PE
>   powerpc/powernv: Expanding IOV BAR, with m64_per_iov supported
>   powerpc/powernv: Group VF PE when IOV BAR is big on PHB3
> 
>  arch/powerpc/include/asm/device.h            |    3 +
>  arch/powerpc/include/asm/iommu.h             |    3 +
>  arch/powerpc/include/asm/machdep.h           |   12 +-
>  arch/powerpc/include/asm/pci-bridge.h        |   23 +-
>  arch/powerpc/kernel/pci-common.c             |   31 ++
>  arch/powerpc/kernel/pci-hotplug.c            |    3 +
>  arch/powerpc/kernel/pci_dn.c                 |  248 ++++++++-
>  arch/powerpc/platforms/powernv/eeh-powernv.c |   24 +-
>  arch/powerpc/platforms/powernv/pci-ioda.c    |  772 +++++++++++++++++++++++++-
>  arch/powerpc/platforms/powernv/pci.c         |  107 ++--
>  arch/powerpc/platforms/powernv/pci.h         |   15 +-
>  drivers/pci/iov.c                            |   65 ++-
>  drivers/pci/pci.h                            |   19 -
>  drivers/pci/setup-bus.c                      |   68 ++-
>  include/linux/ioport.h                       |    1 +
>  include/linux/pci.h                          |   47 ++
>  16 files changed, 1311 insertions(+), 130 deletions(-)
> 

Thread overview: 37+ messages
2014-07-24  6:22 [PATCH V7 00/17] Enable SRIOV on POWER8 Wei Yang
2014-07-24  6:22 ` [PATCH V7 01/17] PCI/IOV: Export interface for retrieve VF's BDF Wei Yang
2014-08-19 21:37   ` Bjorn Helgaas
2014-08-20  2:25     ` Wei Yang
2014-07-24  6:22 ` [PATCH V7 02/17] PCI/IOV: Get VF BAR size from hardware directly when platform needs Wei Yang
2014-08-19 21:44   ` Bjorn Helgaas
2014-08-20  2:31     ` Wei Yang
2014-07-24  6:22 ` [PATCH V7 03/17] PCI: Add weak pcibios_sriov_resource_alignment() interface Wei Yang
2014-07-24  6:22 ` [PATCH V7 04/17] PCI: Take additional IOV BAR alignment in sizing and assigning Wei Yang
2014-08-20  3:08   ` Bjorn Helgaas
2014-08-20  6:14     ` Wei Yang
2014-08-28  2:34       ` Wei Yang
2014-09-09 20:09       ` Bjorn Helgaas
2014-09-10  3:27         ` Wei Yang
2014-07-24  6:22 ` [PATCH V7 05/17] powerpc/pci: Don't unset pci resources for VFs Wei Yang
2014-07-24  6:22 ` [PATCH V7 06/17] powerpc/pci: Define pcibios_disable_device() on powerpc Wei Yang
2014-07-24  6:22 ` [PATCH V7 07/17] powrepc/pci: Refactor pci_dn Wei Yang
2014-07-24  6:22 ` [PATCH V7 08/17] powerpc/powernv: Use pci_dn in PCI config accessor Wei Yang
2014-07-24  6:22 ` [PATCH V7 09/17] powerpc/powernv: mark IOV BAR with IORESOURCE_ARCH Wei Yang
2014-07-24  6:22 ` [PATCH V7 10/17] powerpc/powernv: Allocate pe->iommu_table dynamically Wei Yang
2014-07-24  6:22 ` [PATCH V7 11/17] powerpc/powernv: Add function to deconfig a PE Wei Yang
2014-07-24  6:22 ` [PATCH V7 12/17] powerpc/powernv: Expand VF resources according to the number of total_pe Wei Yang
2014-07-24  6:22 ` [PATCH V7 13/17] powerpc/powernv: Implement pcibios_sriov_resource_alignment on powernv Wei Yang
2014-07-24  6:22 ` [PATCH V7 14/17] powerpc/powernv: Shift VF resource with an offset Wei Yang
2014-07-24  6:22 ` [PATCH V7 15/17] powerpc/powernv: Allocate VF PE Wei Yang
2014-07-24  6:22 ` [PATCH V7 16/17] powerpc/powernv: Expanding IOV BAR, with m64_per_iov supported Wei Yang
2014-07-24  6:22 ` [PATCH V7 17/17] powerpc/powernv: Group VF PE when IOV BAR is big on PHB3 Wei Yang
2014-07-31  6:35 ` Benjamin Herrenschmidt [this message]
2014-08-19 21:19 ` [PATCH V7 00/17] Enable SRIOV on POWER8 Bjorn Helgaas
2014-08-20  2:34   ` Wei Yang
2014-08-20  3:12     ` Bjorn Helgaas
2014-08-20  3:35       ` Wei Yang
2014-10-02 15:59         ` Bjorn Helgaas
2014-10-02 23:38           ` Gavin Shan
2014-10-15  9:00           ` Wei Yang
2014-10-15 13:52             ` Bjorn Helgaas
2014-10-16  8:41               ` Wei Yang
