From: Don Dutile
Subject: Re: [PATCH] PCI: allow drivers to limit the number of VFs to 0
Date: Tue, 29 May 2018 10:29:49 -0400
Message-ID:
References: <20180402224652.4058-1-jakub.kicinski@netronome.com>
 <20180524235748.GD15320@bhelgaas-glaptop.roam.corp.google.com>
 <20180524182015.1af7b4c9@cakuba>
 <20180525140223.GA45098@bhelgaas-glaptop.roam.corp.google.com>
 <88390255-55f7-57a6-5324-d443373d1984@redhat.com>
 <20180525204651.GA92995@bhelgaas-glaptop.roam.corp.google.com>
In-Reply-To: <20180525204651.GA92995@bhelgaas-glaptop.roam.corp.google.com>
To: Bjorn Helgaas
Cc: Jakub Kicinski, Bjorn Helgaas, linux-pci@vger.kernel.org,
 netdev@vger.kernel.org, Sathya Perla, Felix Manlunas,
 alexander.duyck@gmail.com, john.fastabend@gmail.com, Jacob Keller,
 oss-drivers@netronome.com, Christoph Hellwig

On 05/25/2018 04:46 PM, Bjorn Helgaas wrote:
> On Fri, May 25, 2018 at 03:27:52PM -0400, Don Dutile wrote:
>> On 05/25/2018 10:02 AM, Bjorn Helgaas wrote:
>>> On Thu, May 24, 2018 at 06:20:15PM -0700, Jakub Kicinski wrote:
>>>> Hi Bjorn!
>>>>
>>>> On Thu, 24 May 2018 18:57:48 -0500, Bjorn Helgaas wrote:
>>>>> On Mon, Apr 02, 2018 at 03:46:52PM -0700, Jakub Kicinski wrote:
>>>>>> Some user space depends on enabling sriov_totalvfs number of VFs
>>>>>> to not fail, e.g.:
>>>>>>
>>>>>> $ cat .../sriov_totalvfs > .../sriov_numvfs
>>>>>>
>>>>>> For devices whose VF support depends on the loaded FW we have the
>>>>>> pci_sriov_{g,s}et_totalvfs() API. However, this API uses 0 as a
>>>>>> special "unset" value, meaning drivers can't limit sriov_totalvfs
>>>>>> to 0. Remove the special value completely and simply initialize
>>>>>> driver_max_VFs to total_VFs. Then always use driver_max_VFs.
>>>>>> Add a helper for drivers to reset the VF limit back to total.
>>>>>
>>>>> I still can't really make sense out of the changelog.
>>>>>
>>>>> I think part of the reason it's confusing is that there are two
>>>>> things going on:
>>>>>
>>>>>   1) You want this:
>>>>>
>>>>>        pci_sriov_set_totalvfs(dev, 0);
>>>>>        x = pci_sriov_get_totalvfs(dev);
>>>>>
>>>>>      to return 0 instead of total_VFs. That seems to connect with
>>>>>      your subject line. It means "sriov_totalvfs" in sysfs could be
>>>>>      0, but I don't know how that is useful (I'm sure it is; just
>>>>>      educate me :))
>>>>
>>>> Let me just quote the bug report that got filed on our internal bug
>>>> tracker :)
>>>>
>>>>   When testing Juju OpenStack with Ubuntu 18.04, enabling SR-IOV
>>>>   causes errors because Juju gets the sriov_totalvfs for the
>>>>   SR-IOV-capable device and then tries to set that as the
>>>>   sriov_numvfs parameter.
>>>>
>>>>   For SR-IOV-incapable FW, the sriov_totalvfs parameter should be 0,
>>>>   but it's set to max. When FW is switched to flower*, the correct
>>>>   sriov_totalvfs value is presented.
>>>>
>>>> * flower is a project name
>>>
>>> From the point of view of the PCI core (which knows nothing about
>>> device firmware and relies on the architected config space described
>>> by the PCIe spec), this sounds like an erratum: with some firmware
>>> installed, the device is not capable of SR-IOV, but still advertises
>>> an SR-IOV capability with "TotalVFs > 0".
>>>
>>> Regardless of whether that's an erratum, we do allow PF drivers to
>>> use pci_sriov_set_totalvfs() to limit the number of VFs that may be
>>> enabled by writing to the PF's "sriov_numvfs" sysfs file.
>>>
>> +1.
>>
>>> But the current implementation does not allow a PF driver to limit
>>> VFs to 0, and that does seem nonsensical.
>>>
>> Well, not really -- claiming to support VFs, and then wanting the
>> limit to be 0... I could certainly argue that's nonsensical.
>> From a sw perspective, sure, see if we can set VFs to 0 (and reset to
>> another value later).
>>
>> /me wishes that implementers would follow the architecture vs
>> torquing it into strange shapes.
>>
>>>> My understanding is OpenStack uses sriov_totalvfs to determine how
>>>> many VFs can be enabled; this looks like the relevant code:
>>>>
>>>> http://git.openstack.org/cgit/openstack/charm-neutron-openvswitch/tree/hooks/neutron_ovs_utils.py#n464
>>>>
>>>>>   2) You're adding the pci_sriov_reset_totalvfs() interface. I'm
>>>>>      not sure what you intend for this. Is *every* driver supposed
>>>>>      to call it in .remove()? Could/should this be done in the
>>>>>      core somehow instead of depending on every driver?
>>>>
>>>> Good question. I was just thinking yesterday we may want to call it
>>>> from the core, but I don't think it's strictly necessary nor always
>>>> sufficient (we may reload FW without re-probing).
>>>>
>>>> We have a device which supports a different number of VFs depending
>>>> on the FW loaded. Some legacy FW does not inform the driver how many
>>>> VFs it can support, because it supports the max. So the flow in our
>>>> driver is this:
>>>>
>>>> 	load_fw(dev);
>>>> 	...
>>>> 	max_vfs = ask_fw_for_max_vfs(dev);
>>>> 	if (max_vfs >= 0)
>>>> 		return pci_sriov_set_totalvfs(dev, max_vfs);
>>>> 	else /* FW didn't tell us, assume max */
>>>> 		return pci_sriov_reset_totalvfs(dev);
>>>>
>>>> We also reset the max on device remove, but that's not strictly
>>>> necessary.
>>>>
>>>> Other users of pci_sriov_set_totalvfs() always know the value to set
>>>> the total to (they either get it from FW or it's a constant).
>>>>
>>>> If you prefer, we can work out the correct max for those legacy
>>>> cases in the driver as well, although it seemed cleaner to just ask
>>>> the core, since it already has the total_VFs value handy :)
>>>>
>>>>> I'm also having a hard time connecting your user-space command
>>>>> example with the rest of this. Maybe it will make more sense to me
>>>>> tomorrow after some coffee.
>>>>
>>>> OpenStack assumes it will always be able to set sriov_numvfs to
>>>> sriov_totalvfs, see this 'if':
>>>>
>>>> http://git.openstack.org/cgit/openstack/charm-neutron-openvswitch/tree/hooks/neutron_ovs_utils.py#n512
>>>
>>> Thanks for educating me. I think there are two issues here that we
>>> can separate. I extracted the patch below for the first.
>>>
>>> The second is the question of resetting driver_max_VFs. I think we
>>> currently have a general issue in the core:
>>>
>>>   - load PF driver 1
>>>   - driver calls pci_sriov_set_totalvfs() to reduce driver_max_VFs
>>>   - unload PF driver 1
>>>   - load PF driver 2
>>>
>>> Now driver_max_VFs is still stuck at the lower value set by driver 1.
>>> I don't think that's the way this should work.
>>>
>>> I guess this is partly a consequence of setting driver_max_VFs in
>>> sriov_init(), which is called before driver attach and should only
>> um, if it's set at sriov_init(), how is the max changed by a PF
>> driver? or am I missing something subtle (a new sysfs param) as to
>> what is being changed?
>
> sriov_init() basically just sets the default driver_max_VFs to
> Total_VFs.
>
> If the PF driver later calls pci_sriov_set_totalvfs(), it can reduce
> driver_max_VFs.
>
> My concern is that there's nothing that resets driver_max_VFs back to
> Total_VFs if we unload and reload the PF driver.
>
ok, gotcha.
Yet another complication of this non-arch quirk. :-/

>>> depend on hardware characteristics, so it is related to the patch
>>> below. But I think we should fix it in general, not just for
>>> netronome.
>>>
>>>
>>> commit 4a338bc6f94b9ad824ac944f5dfc249d6838719c
>>> Author: Jakub Kicinski
>>> Date:   Fri May 25 08:18:34 2018 -0500
>>>
>>>     PCI/IOV: Allow PF drivers to limit total_VFs to 0
>>>
>>>     Some SR-IOV PF drivers implement .sriov_configure(), which allows
>>>     user-space to enable VFs by writing the desired number of VFs to
>>>     the sysfs "sriov_numvfs" file (see sriov_numvfs_store()).
>>>
>>>     The PCI core limits the number of VFs to the TotalVFs advertised
>>>     by the device in its SR-IOV capability. The PF driver can limit
>>>     the number of VFs to even fewer (it may have pre-allocated data
>>>     structures or knowledge of device limitations) by calling
>>>     pci_sriov_set_totalvfs(), but previously it could not limit the
>>>     VFs to 0.
>>>
>>>     Change pci_sriov_get_totalvfs() so it always respects the VF
>>>     limit imposed by the PF driver, even if the limit is 0.
>>>
>>>     This sequence:
>>>
>>>         pci_sriov_set_totalvfs(dev, 0);
>>>         x = pci_sriov_get_totalvfs(dev);
>>>
>>>     previously set "x" to TotalVFs from the SR-IOV capability. Now it
>>>     will set "x" to 0.
>>>
>>>     Signed-off-by: Jakub Kicinski
>>>     Signed-off-by: Bjorn Helgaas
>>>
>>> diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
>>> index 192b82898a38..d0d73dbbd5ca 100644
>>> --- a/drivers/pci/iov.c
>>> +++ b/drivers/pci/iov.c
>>> @@ -469,6 +469,7 @@ static int sriov_init(struct pci_dev *dev, int pos)
>>>  	iov->nres = nres;
>>>  	iov->ctrl = ctrl;
>>>  	iov->total_VFs = total;
>>> +	iov->driver_max_VFs = total;
>>>  	pci_read_config_word(dev, pos + PCI_SRIOV_VF_DID, &iov->vf_device);
>>>  	iov->pgsz = pgsz;
>>>  	iov->self = dev;
>>> @@ -827,10 +828,7 @@ int pci_sriov_get_totalvfs(struct pci_dev *dev)
>>>  	if (!dev->is_physfn)
>>>  		return 0;
>>>
>>> -	if (dev->sriov->driver_max_VFs)
>>> -		return dev->sriov->driver_max_VFs;
>>> -
>>> -	return dev->sriov->total_VFs;
>>> +	return dev->sriov->driver_max_VFs;
>>>  }
>>>  EXPORT_SYMBOL_GPL(pci_sriov_get_totalvfs);
>>
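
With the patch above in place, the driver-side flow Jakub described gets
simpler. Here's a minimal sketch of what I'd expect a PF driver to do
after a FW load -- my_fw_query_max_vfs() and my_drv_fw_loaded() are
made-up names for illustration; only the pci_sriov_*() calls are the
real API, and the legacy "FW won't say" case would still want the reset
helper discussed above:

	#include <linux/pci.h>

	/*
	 * Made-up FW query (stubbed out here): returns the number of
	 * VFs the loaded FW supports, 0 if this FW has no VF support
	 * at all, or a negative errno on failure. A real driver would
	 * ask its FW.
	 */
	static int my_fw_query_max_vfs(struct pci_dev *pdev)
	{
		return 0;	/* pretend: FW with no VF support */
	}

	static int my_drv_fw_loaded(struct pci_dev *pdev)
	{
		int max_vfs = my_fw_query_max_vfs(pdev);

		if (max_vfs < 0)
			return max_vfs;	/* FW query failed */

		/*
		 * A limit of 0 now sticks, so "sriov_totalvfs" in sysfs
		 * reports 0 and user space (e.g. the OpenStack charm)
		 * won't try to enable VFs this FW can't provide.
		 */
		return pci_sriov_set_totalvfs(pdev, max_vfs);
	}

That still leaves the reload concern: the core, not each PF driver,
ought to put driver_max_VFs back to TotalVFs when the PF driver
detaches.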