linux-pci.vger.kernel.org archive mirror
From: Don Dutile <ddutile@redhat.com>
To: Yinghai Lu <yinghai@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	linux-pci@vger.kernel.org, bhelgaas@google.com,
	yuvalmin@broadcom.com, bhutchings@solarflare.com,
	gregory.v.rose@intel.com, davem@davemloft.net
Subject: Re: [RFC v2] PCI: sysfs per device SRIOV control and status
Date: Thu, 04 Oct 2012 18:50:09 -0400	[thread overview]
Message-ID: <506E12A1.6000501@redhat.com> (raw)
In-Reply-To: <CAE9FiQWBq3seKZgqT7B=JKCovu4vOYuK0XYC+MvBgK5VWu1rbg@mail.gmail.com>

Thanks for adding Greg k-h to the thread...
(sorry about that omission on git send-email, Greg!)

On 10/04/2012 06:30 PM, Yinghai Lu wrote:
> On Thu, Oct 4, 2012 at 3:14 PM, Donald Dutile <ddutile@redhat.com> wrote:
>> Provide files under sysfs to determine the max number of vfs
>> an SRIOV-capable PCIe device supports, and methods to enable and
>> disable the vfs on a per device basis.
>>
>> Currently, VF enablement by SRIOV-capable PCIe devices is done
>> in driver-specific module parameters.  If not setup in modprobe files,
>> it requires admin to unload & reload PF drivers with number of desired
>> VFs to enable.  Additionally, the enablement is system wide: all
>> devices controlled by the same driver have the same number of VFs
>> enabled.  Although the latter is probably desired, there are PCI
>> configurations setup by system BIOS that may not enable that to occur.
>>
>> Three files are created if a PCIe device has SRIOV support:
>> sriov_max_vfs -- cat-ing this file returns the maximum number
>>                   of VFs a PCIe device supports.
>> sriov_enable_vfs -- echo'ing a number to this file enables this
>>                      number of VFs for this given PCIe device.
>>                   -- cat-ing this file will return the number of VFs
>>                      currently enabled on this PCIe device.
>> sriov_disable_vfs -- echo-ing any number other than 0 disables all
>>                       VFs associated with this PCIe device.
>>
>> VF enable and disablement is invoked much like other PCIe
>> configuration functions -- via registered callbacks in the driver,
>> i.e., probe, release, etc.
>>
>> v1->v2:
>> This patch is based on previous 2 patches by Yinghai Lu
>> that cleaned up the vga attributes for PCI devices under sysfs,
>> and uses visibility-checking group attributes as recommended by
>> Greg K-H.
>>
>> Signed-off-by: Donald Dutile <ddutile@redhat.com>
>> ---
>>   drivers/pci/pci-sysfs.c | 153 +++++++++++++++++++++++++++++++++++++++++++++++-
>>   include/linux/pci.h     |   2 +
>>   2 files changed, 154 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
>> index fbbb97f..e6522c2 100644
>> --- a/drivers/pci/pci-sysfs.c
>> +++ b/drivers/pci/pci-sysfs.c
>> @@ -404,6 +404,128 @@ static ssize_t d3cold_allowed_show(struct device *dev,
>>   }
>>   #endif
>>
>> +bool pci_dev_has_sriov(struct pci_dev *pdev)
>> +{
>> +        int ret = false;
>> +        int pos;
>> +
>> +        if (!pci_is_pcie(pdev))
>> +                goto out;
>> +
>> +        pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_SRIOV);
>> +        if (pos)
>> +                ret = true;
>> +out:
>> +        return ret;
>> +}
>
> should use dev_is_pf?
>
No.  Only want the files to show up on pf's with sriov cap.
dev_is_pf will have the files show up on non-sriov-capable devices.
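(For anyone skimming: that check boils down to walking the PCIe extended-capability list. Here's a simplified userspace sketch of what pci_find_ext_capability() does, run over a mock config space -- find_ext_cap(), cfg_read32() and the mock layout are illustrative only, not kernel code.)

```c
#include <stdint.h>
#include <string.h>

#define PCI_EXT_CAP_START    0x100    /* extended caps begin at offset 0x100 */
#define PCI_EXT_CAP_ID_SRIOV 0x0010

/* Read a 32-bit capability header from a mock config space. */
static uint32_t cfg_read32(const uint8_t *cfg, int pos)
{
	uint32_t v;

	memcpy(&v, cfg + pos, sizeof(v));
	return v;
}

/* Simplified model of pci_find_ext_capability(): follow the linked
 * list of extended-capability headers (cap ID in bits 0-15, next
 * pointer in bits 20-31) and return the offset of the first header
 * matching cap_id, or 0 if the list ends without a match. */
static int find_ext_cap(const uint8_t *cfg, uint16_t cap_id)
{
	int pos = PCI_EXT_CAP_START;

	while (pos) {
		uint32_t hdr = cfg_read32(cfg, pos);

		if ((hdr & 0xffff) == cap_id)
			return pos;
		pos = (hdr >> 20) & 0xffc;	/* dword-aligned next ptr */
	}
	return 0;
}
```

So a device whose capability list lacks the SR-IOV header simply gets 0 back, which is exactly what gates the sysfs files here.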

> cap does not mean you have sriov_init return successfully.
>
We'll find out sriov_init failed when the driver's sriov-enable callback fails!
Could add a flag & check for sriov_init success...

>> +
>> +#ifdef CONFIG_PCI_IOV
>> +static ssize_t sriov_max_vfs_show(struct device *dev,
>> +                                  struct device_attribute *attr,
>> +                                  char *buf)
>> +{
>> +        struct pci_dev *pdev;
>> +
>> +        pdev = to_pci_dev(dev);
>> +        return sprintf (buf, "%u\n", pdev->sriov->total);
>> +}
>> +
>> +
>> +static ssize_t sriov_enable_vfs_show(struct device *dev,
>> +                                     struct device_attribute *attr,
>> +                                     char *buf)
>> +{
>> +        struct pci_dev *pdev;
>> +
>> +        pdev = to_pci_dev(dev);
>> +
>> +        return sprintf (buf, "%u\n", pdev->sriov->nr_virtfn);
>> +}
>> +
>> +static ssize_t sriov_enable_vfs_store(struct device *dev,
>> +                                      struct device_attribute *attr,
>> +                                      const char *buf, size_t count)
>> +{
>> +        struct pci_dev *pdev;
>> +        int num_vf_enabled = 0;
>> +        unsigned long num_vfs;
>> +        pdev = to_pci_dev(dev);
>> +
>> +        /* Requested VFs to enable < max_vfs
>> +         * and none enabled already
>> +         */
>> +        if (strict_strtoul(buf, 0, &num_vfs) < 0)
>> +               return -EINVAL;
>
> checkpatch.pl should ask you to change to kstrtoul instead.
>
ok, thanks.  i saw strict_strtoul() used elsewhere, but
glad to update.
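(The strictness is the point for a sysfs store: bare strtoul() happily ignores trailing junk. A userspace sketch of what the kernel helper enforces -- parse_ulong() is an illustrative name, not the kernel function:)

```c
#include <errno.h>
#include <stdlib.h>

/* Userspace sketch of what kstrtoul() adds over bare strtoul():
 * reject empty input and trailing junk, report overflow, but
 * tolerate the trailing newline a sysfs write usually carries.
 * Returns 0 on success or a negative errno, like the kernel helper. */
static int parse_ulong(const char *s, unsigned int base, unsigned long *res)
{
	char *end;
	unsigned long v;

	errno = 0;
	v = strtoul(s, &end, base);
	if (end == s)
		return -EINVAL;		/* no digits at all */
	if (*end == '\n')
		end++;			/* tolerate trailing newline */
	if (*end != '\0')
		return -EINVAL;		/* trailing junk */
	if (errno == ERANGE)
		return -ERANGE;		/* out of range */
	*res = v;
	return 0;
}
```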

>> +
>> +        if ((num_vfs > pdev->sriov->total) ||
>> +            (pdev->sriov->nr_virtfn != 0))
>> +                return -EINVAL;
>> +
>> +        /* Ready for lift-off! */
>> +        if (pdev->driver && pdev->driver->sriov_enable) {
>> +                num_vf_enabled = pdev->driver->sriov_enable(pdev, num_vfs);
>> +        }
>> +
>> +        if (num_vf_enabled != num_vfs)
>> +                printk(KERN_WARNING
>> +                        "%s: %04x:%02x:%02x.%d: Only %d VFs enabled\n",
>> +                        pci_name(pdev), pci_domain_nr(pdev->bus),
>> +                        pdev->bus->number, PCI_SLOT(pdev->devfn),
>> +                        PCI_FUNC(pdev->devfn), num_vf_enabled);
>> +
>> +        return count;
>
> do you need pci_remove_rescan_mutex ?
>
I saw that in your patch set; wondering if I need it now that the
files are added via the sysfs device-attribute framework, which would have
the same race for other attributes, I would think.  Thus, I didn't add it.
Greg?

> also that enable could take a while ..., may need to use
> device_schedule_callback to start one callback.
>
agreed, but device_schedule_callback doesn't allow a parameter
to be passed. only way I can think to get around it is to
put the requested-num-vf's in the pci-(pf)-dev structure,
and have the pf driver pluck it from there.
other way to pass num-vf-to-enable?
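(To make the idea concrete, a userspace sketch of that workaround: the store method parks the count, the no-argument deferred callback plucks it back out. struct fake_pf_dev and both function names are hypothetical, not kernel API.)

```c
/* Sketch of the workaround: since the deferred callback gets no
 * payload, the sysfs store path stashes the requested VF count in
 * the PF's device structure, and the callback reads it from there.
 * Structure and function names are illustrative only. */
struct fake_pf_dev {
	int req_vfs;		/* request stashed by the store method */
	int nr_virtfn;		/* VFs actually enabled */
};

/* sysfs store path: record the request, then schedule the
 * no-argument callback (the device_schedule_callback() call is
 * elided in this userspace sketch). */
static void stash_vf_request(struct fake_pf_dev *pdev, int num_vfs)
{
	pdev->req_vfs = num_vfs;
}

/* Deferred callback: recover the count from the device structure. */
static void enable_vfs_callback(struct fake_pf_dev *pdev)
{
	pdev->nr_virtfn = pdev->req_vfs;
	pdev->req_vfs = 0;	/* request consumed */
}
```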

>
>> +}
>> +
>> +static ssize_t sriov_disable_vfs_store(struct device *dev,
>> +                                       struct device_attribute *attr,
>> +                                       const char *buf, size_t count)
>> +{
>> +        struct pci_dev *pdev;
>> +       int ret = 0;
>> +
>> +        pdev = to_pci_dev(dev);
>> +
>> +        /* make sure sriov device & at least 1 vf enabled */
>> +        if (pdev->sriov->nr_virtfn == 0) {
>> +               ret = -ENODEV;
>> +                goto out;
>> +       }
>> +
>> +        /* Ready for landing! */
>> +       /* Note: Disabling VF can fail if VF assigned to virtualized guest */
>> +        if (pdev->driver && pdev->driver->sriov_disable) {
>> +                ret = pdev->driver->sriov_disable(pdev);
>> +        }
>> +out:
>> +        return ret ? ret : count;
>> +}
>> +#else
>> +static ssize_t sriov_max_vfs_show(struct device *dev,
>> +                                  struct device_attribute *attr,
>> +                                  char *buf)
>> +{ return 0; }
>> +static ssize_t sriov_enable_vfs_show(struct device *dev,
>> +                                     struct device_attribute *attr,
>> +                                     char *buf)
>> +{ return 0; }
>> +static ssize_t sriov_enable_vfs_store(struct device *dev,
>> +                                      struct device_attribute *attr,
>> +                                      const char *buf, size_t count)
>> +{ return count; }
>> +static ssize_t sriov_disable_vfs_store(struct device *dev,
>> +                                       struct device_attribute *attr,
>> +                                       const char *buf, size_t count)
>> +{ return count; }
>> +#endif /* CONFIG_PCI_IOV */
>> +
>> +static struct device_attribute sriov_max_vfs_attr = __ATTR_RO(sriov_max_vfs);
>> +static struct device_attribute sriov_enable_vfs_attr =
>> +               __ATTR(sriov_enable_vfs, (S_IRUGO|S_IWUSR|S_IWGRP),
>> +                       sriov_enable_vfs_show, sriov_enable_vfs_store);
>> +static struct device_attribute sriov_disable_vfs_attr =
>> +               __ATTR(sriov_disable_vfs, (S_IWUSR|S_IWGRP),
>> +                       NULL, sriov_disable_vfs_store);
>> +
>>   struct device_attribute pci_dev_attrs[] = {
>>          __ATTR_RO(resource),
>>          __ATTR_RO(vendor),
>> @@ -1408,8 +1530,15 @@ static struct attribute *pci_dev_dev_attrs[] = {
>>          NULL,
>>   };
>>
>> +static struct attribute *sriov_dev_attrs[] = {
>> +       &sriov_max_vfs_attr.attr,
>> +       &sriov_enable_vfs_attr.attr,
>> +       &sriov_disable_vfs_attr.attr,
>> +       NULL,
>> +};
>> +
>>   static umode_t pci_dev_attrs_are_visible(struct kobject *kobj,
>> -                                               struct attribute *a, int n)
>> +                                        struct attribute *a, int n)
>>   {
>>          struct device *dev = container_of(kobj, struct device, kobj);
>>          struct pci_dev *pdev = to_pci_dev(dev);
>> @@ -1421,13 +1550,35 @@ static umode_t pci_dev_attrs_are_visible(struct kobject *kobj,
>>          return a->mode;
>>   }
>>
>> +static umode_t sriov_attrs_are_visible(struct kobject *kobj,
>> +                                        struct attribute *a, int n)
>> +{
>> +       struct device *dev = container_of(kobj, struct device, kobj);
>> +       struct pci_dev *pdev = to_pci_dev(dev);
>> +
>> +       if ((a == &sriov_max_vfs_attr.attr) ||
>> +           (a == &sriov_enable_vfs_attr.attr) ||
>> +           (a == &sriov_disable_vfs_attr.attr)) {
>> +               if (!pci_dev_has_sriov(pdev))
>> +                       return 0;
>> +       }
>> +
>> +       return a->mode;
>> +}
>> +
>>   static struct attribute_group pci_dev_attr_group = {
>>          .attrs = pci_dev_dev_attrs,
>>          .is_visible = pci_dev_attrs_are_visible,
>>   };
>>
>> +static struct attribute_group sriov_dev_attr_group = {
>> +       .attrs = sriov_dev_attrs,
>> +       .is_visible = sriov_attrs_are_visible,
>
> you could fold contents in sriov_dev_attrs and sriov_attrs_are_visible into
> pci_dev_dev_attrs and pci_dev_attrs_are_visible.
>
Yes, I did that originally, but since you had
the attr_groups[] below, I figured you wanted it separate,
and it makes clear division for other cleanup (pm attributes?).
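(Either way, the is_visible contract is the same: the group lists every attribute unconditionally, and the callback returns 0 to hide one or the attribute's normal mode to expose it. A userspace sketch of the gating -- the structs and names below are illustrative, not the sysfs API:)

```c
#include <string.h>

/* Mock attribute and device, standing in for struct attribute and
 * struct pci_dev in this illustration. */
struct fake_attr {
	const char *name;
	unsigned int mode;
};

struct fake_dev {
	int has_sriov;
};

/* Mirror of sriov_attrs_are_visible(): hide the sriov_* files
 * unless the device actually has the SR-IOV capability; every other
 * attribute keeps its normal mode. */
static unsigned int attr_visible(const struct fake_dev *dev,
				 const struct fake_attr *a)
{
	if (!strncmp(a->name, "sriov_", 6) && !dev->has_sriov)
		return 0;	/* hide: file not created */
	return a->mode;		/* expose with normal permissions */
}
```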

>> +};
>> +
>>   static const struct attribute_group *pci_dev_attr_groups[] = {
>>          &pci_dev_attr_group,
>> +       &sriov_dev_attr_group,
>>          NULL,
>>   };
>>
>> diff --git a/include/linux/pci.h b/include/linux/pci.h
>> index be1de01..b3e25a8 100644
>> --- a/include/linux/pci.h
>> +++ b/include/linux/pci.h
>> @@ -595,6 +595,8 @@ struct pci_driver {
>>          int  (*resume_early) (struct pci_dev *dev);
>>          int  (*resume) (struct pci_dev *dev);                   /* Device woken up */
>>          void (*shutdown) (struct pci_dev *dev);
>> +       int (*sriov_enable) (struct pci_dev *dev, int num_vfs);    /* PF pci dev */
>> +       int (*sriov_disable) (struct pci_dev *dev);
>>          const struct pci_error_handlers *err_handler;
>>          struct device_driver    driver;
>>          struct pci_dynids dynids;
>> --
>> 1.7.10.2.552.gaa3bb87
>>

Thanks for prompt review & feedback.
I'll wait for other feedback before spinning a final PATCH.


Thread overview: 13+ messages
2012-10-04 22:14 [RFC v2] PCI: sysfs per device SRIOV control and status Donald Dutile
2012-10-04 22:30 ` Yinghai Lu
2012-10-04 22:50   ` Don Dutile [this message]
2012-10-04 23:05     ` Yinghai Lu
2012-10-05 15:10       ` Don Dutile
2012-10-06 18:21 ` Alexander Duyck
2012-10-08 15:44   ` Greg Rose
2012-10-09 18:39     ` Don Dutile
2012-10-09 20:31       ` Rose, Gregory V
2012-10-09 22:49         ` Don Dutile
2012-10-24  5:47           ` Yuval Mintz
2012-10-25 13:19             ` Don Dutile
2012-10-09 16:12   ` Don Dutile
