linuxppc-dev.lists.ozlabs.org archive mirror
From: Alexey Kardashevskiy <aik@ozlabs.ru>
To: Oliver O'Halloran <oohall@gmail.com>, linuxppc-dev@lists.ozlabs.org
Subject: Re: [PATCH 15/15] powerpc/powernv/sriov: Make single PE mode a per-BAR setting
Date: Wed, 15 Jul 2020 15:24:48 +1000
Message-ID: <c89dfd04-afc2-4d69-00ab-2e743d5bb844@ozlabs.ru>
In-Reply-To: <20200710052340.737567-16-oohall@gmail.com>



On 10/07/2020 15:23, Oliver O'Halloran wrote:
> Using single PE BARs to map an SR-IOV BAR is really a choice about what
> strategy to use when mapping a BAR. It doesn't make much sense for this to
> be a global setting since a device might have one large BAR which needs to
> be mapped with single PE windows and another smaller BAR that can be mapped
> with a regular segmented window. Make the segmented vs single decision a
> per-BAR setting and clean up the logic that decides which mode to use.
> 
> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
> ---
>  arch/powerpc/platforms/powernv/pci-sriov.c | 131 +++++++++++----------
>  arch/powerpc/platforms/powernv/pci.h       |  10 +-
>  2 files changed, 75 insertions(+), 66 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/powernv/pci-sriov.c b/arch/powerpc/platforms/powernv/pci-sriov.c
> index 8de03636888a..87377d95d648 100644
> --- a/arch/powerpc/platforms/powernv/pci-sriov.c
> +++ b/arch/powerpc/platforms/powernv/pci-sriov.c
> @@ -146,10 +146,9 @@
>  static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
>  {
>  	struct pnv_phb *phb = pci_bus_to_pnvhb(pdev->bus);
> -	const resource_size_t gate = phb->ioda.m64_segsize >> 2;
>  	struct resource *res;
>  	int i;
> -	resource_size_t size, total_vf_bar_sz;
> +	resource_size_t vf_bar_sz;
>  	struct pnv_iov_data *iov;
>  	int mul, total_vfs;
>  
> @@ -158,9 +157,9 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
>  		goto disable_iov;
>  	pdev->dev.archdata.iov_data = iov;
>  
> +	/* FIXME: totalvfs > phb->ioda.total_pe_num is going to be a problem */


WARN_ON_ONCE() then?


>  	total_vfs = pci_sriov_get_totalvfs(pdev);
>  	mul = phb->ioda.total_pe_num;
> -	total_vf_bar_sz = 0;
>  
>  	for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
>  		res = &pdev->resource[i + PCI_IOV_RESOURCES];
> @@ -173,50 +172,51 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
>  			goto disable_iov;
>  		}
>  
> -		total_vf_bar_sz += pci_iov_resource_size(pdev,
> -				i + PCI_IOV_RESOURCES);
> +		vf_bar_sz = pci_iov_resource_size(pdev, i + PCI_IOV_RESOURCES);
>  
>  		/*
> -		 * If bigger than quarter of M64 segment size, just round up
> -		 * power of two.
> +		 * Generally, one segmented M64 BAR maps one IOV BAR. However,
> +		 * if a VF BAR is too large we end up wasting a lot of space.
> +		 * If we've got a BAR that's bigger than greater than 1/4 of the


bigger, greater, huger? :)

Also, a nit: s/got a BAR/got a VF BAR/


> +		 * default window's segment size then switch to using single PE
> +		 * windows. This limits the total number of VFs we can support.

Just to get an idea of the absolute numbers here.

On my P9:

./pciex@600c3c0300000/ibm,opal-m64-window
                 00060200 00000000 00060200 00000000 00000040 00000000

so the default window's segment size is 0x40.0000.0000/512 = 512MB?
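
For reference, a minimal userspace sketch of that arithmetic. It assumes the last two cells of ibm,opal-m64-window hold the window size and that this PHB has 512 PEs (phb->ioda.total_pe_num). The helper names are made up for illustration and are not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: join two 32-bit device-tree cells into one 64-bit value. */
static uint64_t cells_to_u64(uint32_t hi, uint32_t lo)
{
	return ((uint64_t)hi << 32) | lo;
}

/*
 * Hypothetical helper: size of one segment of the default M64 window,
 * assuming the window is split evenly into total_pe segments.
 */
static uint64_t m64_segment_size(uint64_t window_size, unsigned int total_pe)
{
	assert(total_pe != 0);
	return window_size / total_pe;
}

/* The patch switches to single PE windows above 1/4 of the segment size. */
static uint64_t single_pe_threshold(uint64_t segsize)
{
	return segsize >> 2;
}
```

With the cells above, cells_to_u64(0x00000040, 0x00000000) gives 0x4000000000 (256GB), m64_segment_size() with 512 PEs gives 0x20000000 (512MB), and single_pe_threshold() of that is 0x8000000 (128MB).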


>  		 *
> -		 * Generally, one M64 BAR maps one IOV BAR. To avoid conflict
> -		 * with other devices, IOV BAR size is expanded to be
> -		 * (total_pe * VF_BAR_size).  When VF_BAR_size is half of M64
> -		 * segment size , the expanded size would equal to half of the
> -		 * whole M64 space size, which will exhaust the M64 Space and
> -		 * limit the system flexibility.  This is a design decision to
> -		 * set the boundary to quarter of the M64 segment size.
> +		 * The 1/4 limit is arbitrary and can be tweaked.
>  		 */
> -		if (total_vf_bar_sz > gate) {
> -			mul = roundup_pow_of_two(total_vfs);
> -			dev_info(&pdev->dev,
> -				"VF BAR Total IOV size %llx > %llx, roundup to %d VFs\n",
> -				total_vf_bar_sz, gate, mul);
> -			iov->m64_single_mode = true;
> -			break;
> -		}
> -	}
> +		if (vf_bar_sz > (phb->ioda.m64_segsize >> 2)) {
> +			/*
> +			 * On PHB3, the minimum size alignment of M64 BAR in
> +			 * single mode is 32MB. If this VF BAR is smaller than
> +			 * 32MB, but still too large for a segmented window
> +			 * then we can't map it and need to disable SR-IOV for
> +			 * this device.


Why not use single PE mode for such a BAR? Better than nothing.
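
To make the question concrete, here is a small userspace model of the per-BAR decision as the hunk above implements it. The enum and function names are invented for illustration; only the 1/4 threshold and the 32MB minimum come from the patch.

```c
#include <assert.h>
#include <stdint.h>

#define SZ_32M 0x02000000ULL

/* Hypothetical names for the three outcomes of the fixup loop. */
enum vf_bar_mode {
	VF_BAR_SEGMENTED,	/* map with one segmented M64 window */
	VF_BAR_SINGLE_PE,	/* map with single PE windows */
	VF_BAR_UNMAPPABLE,	/* the patch disables SR-IOV for the device */
};

/* Model of the size check in pnv_pci_ioda_fixup_iov_resources() after this patch. */
static enum vf_bar_mode pick_vf_bar_mode(uint64_t vf_bar_sz, uint64_t m64_segsize)
{
	assert(m64_segsize != 0);

	if (vf_bar_sz > (m64_segsize >> 2)) {
		/* Too large for a segmented window; PHB3 single mode needs >= 32MB. */
		if (vf_bar_sz < SZ_32M)
			return VF_BAR_UNMAPPABLE;
		return VF_BAR_SINGLE_PE;
	}
	return VF_BAR_SEGMENTED;
}
```

In this model the VF_BAR_UNMAPPABLE case only triggers when a quarter of the segment size is below 32MB, for example a 16MB VF BAR with a 32MB segment. With a 512MB segment the threshold is 128MB, so any BAR that crosses it is already large enough for single PE mode.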


> +			 */
> +			if (vf_bar_sz < SZ_32M) {
> +				pci_err(pdev, "VF BAR%d: %pR can't be mapped in single PE mode\n",
> +					i, res);
> +				goto disable_iov;
> +			}
>  
> -	for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
> -		res = &pdev->resource[i + PCI_IOV_RESOURCES];
> -		if (!res->flags || res->parent)
> +			iov->m64_single_mode[i] = true;
>  			continue;
> +		}
> +
>  
> -		size = pci_iov_resource_size(pdev, i + PCI_IOV_RESOURCES);
>  		/*
> -		 * On PHB3, the minimum size alignment of M64 BAR in single
> -		 * mode is 32MB.
> +		 * This BAR can be mapped with one segmented window, so adjust
> +		 * te resource size to accommodate.
>  		 */
> -		if (iov->m64_single_mode && (size < SZ_32M))
> -			goto disable_iov;
> +		pci_dbg(pdev, " Fixing VF BAR%d: %pR to\n", i, res);
> +		res->end = res->start + vf_bar_sz * mul - 1;
> +		pci_dbg(pdev, "                       %pR\n", res);
>  
> -		dev_dbg(&pdev->dev, " Fixing VF BAR%d: %pR to\n", i, res);
> -		res->end = res->start + size * mul - 1;
> -		dev_dbg(&pdev->dev, "                       %pR\n", res);
> -		dev_info(&pdev->dev, "VF BAR%d: %pR (expanded to %d VFs for PE alignment)",
> +		pci_info(pdev, "VF BAR%d: %pR (expanded to %d VFs for PE alignment)",
>  			 i, res, mul);
> +
> +		iov->need_shift = true;
>  	}
> +
> +	// what should this be?
>  	iov->vfs_expanded = mul;
>  
>  	return;
> @@ -260,42 +260,42 @@ void pnv_pci_ioda_fixup_iov(struct pci_dev *pdev)
>  resource_size_t pnv_pci_iov_resource_alignment(struct pci_dev *pdev,
>  						      int resno)
>  {
> -	struct pnv_phb *phb = pci_bus_to_pnvhb(pdev->bus);
>  	struct pnv_iov_data *iov = pnv_iov_get(pdev);
>  	resource_size_t align;
>  
> -	/*
> -	 * On PowerNV platform, IOV BAR is mapped by M64 BAR to enable the
> -	 * SR-IOV. While from hardware perspective, the range mapped by M64
> -	 * BAR should be size aligned.
> -	 *
> -	 * When IOV BAR is mapped with M64 BAR in Single PE mode, the extra
> -	 * powernv-specific hardware restriction is gone. But if just use the
> -	 * VF BAR size as the alignment, PF BAR / VF BAR may be allocated with
> -	 * in one segment of M64 #15, which introduces the PE conflict between
> -	 * PF and VF. Based on this, the minimum alignment of an IOV BAR is
> -	 * m64_segsize.
> -	 *
> -	 * This function returns the total IOV BAR size if M64 BAR is in
> -	 * Shared PE mode or just VF BAR size if not.
> -	 * If the M64 BAR is in Single PE mode, return the VF BAR size or
> -	 * M64 segment size if IOV BAR size is less.
> -	 */
> -	align = pci_iov_resource_size(pdev, resno);
> +	int bar_no = resno - PCI_IOV_RESOURCES;
>  
>  	/*
>  	 * iov can be null if we have an SR-IOV device with IOV BAR that can't
>  	 * be placed in the m64 space (i.e. The BAR is 32bit or non-prefetch).
> -	 * In that case we don't allow VFs to be enabled so just return the
> -	 * default alignment.
> +	 * In that case we don't allow VFs to be enabled since one of their
> +	 * BARs would not be placed in the correct PE.
>  	 */
>  	if (!iov)
>  		return align;
>  	if (!iov->vfs_expanded)
>  		return align;
> -	if (iov->m64_single_mode)
> -		return max(align, (resource_size_t)phb->ioda.m64_segsize);
>  
> +	align = pci_iov_resource_size(pdev, resno);
> +
> +	/*
> +	 * If we're using single mode then we can just use the native VF BAR
> +	 * alignment. We validated that it's possible to use a single PE
> +	 * window above when we did the fixup.
> +	 */
> +	if (iov->m64_single_mode[bar_no])
> +		return align;
> +
> +	/*
> +	 * On PowerNV platform, IOV BAR is mapped by M64 BAR to enable the
> +	 * SR-IOV. While from hardware perspective, the range mapped by M64
> +	 * BAR should be size aligned.
> +	 *
> +	 * This function returns the total IOV BAR size if M64 BAR is in
> +	 * Shared PE mode or just VF BAR size if not.
> +	 * If the M64 BAR is in Single PE mode, return the VF BAR size or
> +	 * M64 segment size if IOV BAR size is less.
> +	 */
>  	return iov->vfs_expanded * align;
>  }
>  
> @@ -453,7 +453,7 @@ static int pnv_pci_vf_assign_m64(struct pci_dev *pdev, u16 num_vfs)
>  			continue;
>  
>  		/* don't need single mode? map everything in one go! */
> -		if (!iov->m64_single_mode) {
> +		if (!iov->m64_single_mode[i]) {
>  			win = pnv_pci_alloc_m64_bar(phb, iov);
>  			if (win < 0)
>  				goto m64_failed;
> @@ -546,6 +546,8 @@ static int pnv_pci_vf_resource_shift(struct pci_dev *dev, int offset)
>  		res = &dev->resource[i + PCI_IOV_RESOURCES];
>  		if (!res->flags || !res->parent)
>  			continue;
> +		if (iov->m64_single_mode[i])
> +			continue;
>  
>  		/*
>  		 * The actual IOV BAR range is determined by the start address
> @@ -577,6 +579,8 @@ static int pnv_pci_vf_resource_shift(struct pci_dev *dev, int offset)
>  		res = &dev->resource[i + PCI_IOV_RESOURCES];
>  		if (!res->flags || !res->parent)
>  			continue;
> +		if (iov->m64_single_mode[i])
> +			continue;
>  
>  		size = pci_iov_resource_size(dev, i + PCI_IOV_RESOURCES);
>  		res2 = *res;
> @@ -622,8 +626,8 @@ static void pnv_pci_sriov_disable(struct pci_dev *pdev)
>  	/* Release VF PEs */
>  	pnv_ioda_release_vf_PE(pdev);
>  
> -	/* Un-shift the IOV BAR resources */
> -	if (!iov->m64_single_mode)
> +	/* Un-shift the IOV BARs if we need to */
> +	if (iov->need_shift)
>  		pnv_pci_vf_resource_shift(pdev, -base_pe);
>  
>  	/* Release M64 windows */
> @@ -741,9 +745,8 @@ static int pnv_pci_sriov_enable(struct pci_dev *pdev, u16 num_vfs)
>  	 * the IOV BAR according to the PE# allocated to the VFs.
>  	 * Otherwise, the PE# for the VF will conflict with others.
>  	 */
> -	if (!iov->m64_single_mode) {
> -		ret = pnv_pci_vf_resource_shift(pdev,
> -						base_pe->pe_number);
> +	if (iov->need_shift) {
> +		ret = pnv_pci_vf_resource_shift(pdev, base_pe->pe_number);
>  		if (ret)
>  			goto shift_failed;
>  	}
> diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
> index 13555bc549f4..a78d1feb8fb8 100644
> --- a/arch/powerpc/platforms/powernv/pci.h
> +++ b/arch/powerpc/platforms/powernv/pci.h
> @@ -236,14 +236,20 @@ struct pnv_iov_data {
>  	/* number of VFs IOV BAR expanded. FIXME: rename this to something less bad */
>  	u16     vfs_expanded;
>  
> +	/*
> +	 * indicates if we need to move our IOV BAR to account for our
> +	 * allocated PE number when enabling VFs.
> +	 */
> +	bool    need_shift;
> +
>  	/* number of VFs enabled */
>  	u16     num_vfs;
>  
>  	/* pointer to the array of VF PEs. num_vfs long*/
>  	struct pnv_ioda_pe *vf_pe_arr;
>  
> -	/* Did we map the VF BARs with single-PE IODA BARs? */
> -	bool    m64_single_mode;
> +	/* Did we map the VF BAR with single-PE IODA BARs? */
> +	bool    m64_single_mode[PCI_SRIOV_NUM_BARS];
>  
>  	/*
>  	 * Bit mask used to track which m64 windows that we used to map the
> 

-- 
Alexey
