linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Don Dutile <ddutile@redhat.com>
To: Greg Rose <gregory.v.rose@intel.com>
Cc: Alexander Duyck <alexander.duyck@gmail.com>,
	linux-pci@vger.kernel.org, bhelgaas@google.com,
	yuvalmin@broadcom.com, bhutchings@solarflare.com,
	davem@davemloft.net
Subject: Re: [RFC v2] PCI: sysfs per device SRIOV control and status
Date: Tue, 09 Oct 2012 14:39:51 -0400	[thread overview]
Message-ID: <50746F77.60201@redhat.com> (raw)
In-Reply-To: <20121008084439.00006e56@unknown>

On 10/08/2012 11:44 AM, Greg Rose wrote:
> On Sat, 6 Oct 2012 11:21:00 -0700
> Alexander Duyck<alexander.duyck@gmail.com>  wrote:
>
>> On 10/4/2012 3:14 PM, Donald Dutile wrote:
>>> Provide files under sysfs to determine the max number of vfs
>>> an SRIOV-capable PCIe device supports, and methods to enable and
>>> disable the vfs on a per device basis.
>>>
>>> Currently, VF enablement by SRIOV-capable PCIe devices is done
>>> in driver-specific module parameters.  If not setup in modprobe
>>> files, it requires admin to unload&  reload PF drivers with number
>>> of desired VFs to enable.  Additionally, the enablement is system
>>> wide: all devices controlled by the same driver have the same
>>> number of VFs enabled.  Although the latter is probably desired,
>>> there are PCI configurations setup by system BIOS that may not
>>> enable that to occur.
>>>
>>> Three files are created if a PCIe device has SRIOV support:
>>> sriov_max_vfs -- cat-ing this file returns the maximum number
>>>                    of VFs a PCIe device supports.
>>> sriov_enable_vfs -- echo'ing a number to this file enables this
>>>                       number of VFs for this given PCIe device.
>>>                    -- cat-ing this file will return the number of VFs
>>>                       currently enabled on this PCIe device.
>>> sriov_disable_vfs -- echo-ing any number other than 0 disables all
>>>                        VFs associated with this PCIe device.
>>>
>>> VF enable and disablement is invoked much like other PCIe
>>> configuration functions -- via registered callbacks in the driver,
>>> i.e., probe, release, etc.
>>>
>>> v1->v2:
>>> This patch is based on previous 2 patches by Yinghai Lu
>>> that cleaned up the vga attributes for PCI devices under sysfs,
>>> and uses visibility-checking group attributes as recommended by
>>> Greg K-H.
>>>
>>> Signed-off-by: Donald Dutile<ddutile@redhat.com>
>>
>> I'm not certain if having separate values/functions for enable and
>> disabling VFs will work out very well.  It seems like it would
>> probably make more sense just to have one value that manipulates
>> NumVFs.  As is, in the sriov_enable_vfs case we would end up having
>> to disable SR-IOV should we be changing the number of VFs anyway so
>> it would probably make sense to just merge both values into a single
>> one and use 0 as the value to disable SR-IOV.
>>
>> Also in the naming scheme for the values it may preferable to use the
>> names of the values from the SR-IOV PCIe capability to avoid any
>> confusion on what the values represent.  I believe the names for the
>> two values you are representing are TotalVFs and NumVFs.
>
> Alex makes a good point here.  If you could change sriov_max_vfs to
> total_vfs that would help.
>
consider it done; pls read my reply to Alex's suggestion(s).

> And a single callback is all that is necessary, especially if you take
> into account my further comments below.  After looking at the
> implementation and thinking about it a bit over the weekend I see some
> problems that we should probably address.
>
+1.

> In my opinion we shouldn't be having the endpoint device driver call
> the pci_enable_sriov() and pci_disable_sriov() functions as this
> implementation requires.  The pci bus driver should do that itself and
> then just notify the endpoint device driver of the action and, in the
> case of enabling VFs, tell it how many were enabled, with the num_vfs
> parameter.
>
I want to double-check this logic, but on first glance I agree for this API...
But, until the PF driver's, param-based, 'max_vfs' interface is removed,
the driver will still have to have a code path that does the enable/disable_sriov().
Again, I agree it is flawed, as a bad driver not calling disable_sriov()
will leak resources and quickly fail other PCI (hotplug) ops (be they VFs or PFs).
Q: what did you mean by 'and then just notify the endpoint device driver' ?
    -- another/new api for pf drivers w/sriov support/device ??

> There is a good reason for this too. I asked you to change the return
> value of sriov_disable_vfs() to an int so that the driver could check
> if VFs are assigned and return an error.  But the more I think of I
> feel that is backwards.  This means that the system is at the mercy of
> the endpoint device driver to do the check for assigned VFs.  Badly
> written drivers might not do that check and then crash the system. Plus
> that means every driver has to write the same code to check for
> assigned VFs which could easily be put into the pci bus driver.
>
Isn't this code still needed when user-level unbind done, i.e.,
echo 1 > /sys/bus/pci/drivers/ixgbe/unbind

remove_id is another one, but bind/unbind the common path for virtualization

> If the pci bus driver were to check for assigned VFs then it could take
> steps to protect the system and not depend on dumb device driver
> writers such as yours truly to do it.
>


> I've pasted in the code below from the ixgbe driver that checks for
> assigned VFs.  The pci bus driver has access to all the necessary
> information for the check but it would need to get the VF device ID
> from the PF SR-IOV capability structure.
>
easy to do; already done & in the vid/did of the vf's pci-dev structure.

> - Greg
>
> --------------
>
> static bool ixgbe_vfs_are_assigned(struct ixgbe_adapter *adapter)
> {
>          struct pci_dev *pdev = adapter->pdev;
>          struct pci_dev *vfdev;
>          int dev_id;
>
>          switch (adapter->hw.mac.type) {
>          case ixgbe_mac_82599EB:
>                  dev_id = IXGBE_DEV_ID_82599_VF;
>                  break;
>          case ixgbe_mac_X540:
>                  dev_id = IXGBE_DEV_ID_X540_VF;
>                  break;
>          default:
>                  return false;
>          }
>
>          /* loop through all the VFs to see if we own any that are assigned */
>          vfdev = pci_get_device(PCI_VENDOR_ID_INTEL, dev_id, NULL);
>          while (vfdev) {
>                  /* if we don't own it we don't care */
>                  if (vfdev->is_virtfn&&  vfdev->physfn == pdev) {
>                          /* if it is assigned we cannot release it */
>                          if (vfdev->dev_flags&  PCI_DEV_FLAGS_ASSIGNED)
>                                  return true;
>                  }
>
>                  vfdev = pci_get_device(PCI_VENDOR_ID_INTEL, dev_id, vfdev);
>          }
>
>          return false;
> }
>
>


  reply	other threads:[~2012-10-09 18:39 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-10-04 22:14 [RFC v2] PCI: sysfs per device SRIOV control and status Donald Dutile
2012-10-04 22:30 ` Yinghai Lu
2012-10-04 22:50   ` Don Dutile
2012-10-04 23:05     ` Yinghai Lu
2012-10-05 15:10       ` Don Dutile
2012-10-06 18:21 ` Alexander Duyck
2012-10-08 15:44   ` Greg Rose
2012-10-09 18:39     ` Don Dutile [this message]
2012-10-09 20:31       ` Rose, Gregory V
2012-10-09 22:49         ` Don Dutile
2012-10-24  5:47           ` Yuval Mintz
2012-10-25 13:19             ` Don Dutile
2012-10-09 16:12   ` Don Dutile

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=50746F77.60201@redhat.com \
    --to=ddutile@redhat.com \
    --cc=alexander.duyck@gmail.com \
    --cc=bhelgaas@google.com \
    --cc=bhutchings@solarflare.com \
    --cc=davem@davemloft.net \
    --cc=gregory.v.rose@intel.com \
    --cc=linux-pci@vger.kernel.org \
    --cc=yuvalmin@broadcom.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).