From: <Jordan_Hargrave@Dell.com>
To: <hare@suse.de>, <jharg93@gmail.com>
Cc: <bhelgaas@google.com>, <alexander.duyck@gmail.com>,
<linux-pci@vger.kernel.org>, <babu.moger@oracle.com>,
<linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] Create sysfs entries for PCI VPDI and VPDR tags
Date: Fri, 19 Feb 2016 19:44:21 +0000 [thread overview]
Message-ID: <1455910852096.10012@Dell.com> (raw)
In-Reply-To: <56C72420.4070601@suse.de>
>On 02/19/2016 03:07 PM, Jordan Hargrave wrote:
>> On Fri, Feb 19, 2016 at 4:00 AM, Hannes Reinecke <hare@suse.de> wrote:
>>>
>>> On 02/18/2016 09:04 PM, Jordan Hargrave wrote:
>>>> The VPD-R is a readonly area of the PCI Vital Product Data region.
>>>> There are some standard keywords for serial number, manufacturer,
>>>> and vendor-specific values. Dell Servers use a vendor-specific
>>>> tag to store number of ports and port mapping of partitioned NICs.
>>>>
>>>> info = VPD-Info string
>>>> PN = Part Number
>>>> SN = Serial Number
>>>> MN = Manufacturer ID
>>>> Vx = Vendor-specific (x=0..9 A..Z)
>>>>
>>>> This creates a sysfs subdirectory in the pci device: vpdattr with
>>>> 'info', 'EC', 'SN', 'V0', etc. files containing the tag values.
>>>>
>>>> Signed-off-by: Jordan Hargrave <Jordan_Hargrave@dell.com>
>>> Hmm. Can we first get an agreement on the PCI VPD parsing patches
>>> I've posted earlier?
>>> VPD parsing is really tricky, and we should aim on making the
>>> read_vpd function robust enough before we begin putting things into
>>> sysfs.
>>>
>>> Also, I'm not utterly keen on this patchset.
>>> The sysfs space is blown up with tiny pieces of information, which
>>> can easily gotten via lspci, too.
>>>
>>> Also, to my knowledge it's perfectly valid to _write_ to the VPD, in
>>> which case the entire sysfs attribute setup would be invalided.
>>> How do you propose to handle that?
>>>
>>
>> This patch only reads the attributes from VPD-I and VPD-R areas, not
>> the VPD-W (read write) area.
>> The VPD-W data is located after the VPD-I and VPD-R area So nothing
>> in these attributes should change.
>>
>Ah. Ok.
>
>> The main reason I want this is for replacing biosdevname (ethernet
>> naming) functionality and getting the same functionality into the
>> kernel and systemd. Systemd doesn't want to do vpd parsing, and
>> reading the vpd can take a very long time on some devices, causing
>> systemd to timeout. Another disadvantage of it being in userspace
>> is for devices using SR-IOV. In those devices the vpd only
>> exists for the physfn devices but not the virtual devices. A
>> userspace program device will have to read the entire VPD for
>> each physical and virtual PCI device.
>>
>> Logic is something like this:
>> if (open("/sys/bus/pci/devices/X/physfn/vpd", O_RDONLY) < 0)
>> if (open("/sys/bus/pci/devices/X/vpd", O_RDONLY) < 0)
>> return;
>> }
>> parsevpd(fd);
>>
>> Specifically it is parsing one of the Vx attributes for a 'DCM' or
>> 'DC2' string that contain a mapping from
>> NIC ports and partitions to PCI device
>>
>Well, unfortunately you just gave a very good reason to _not_
>include this into the kernel:
The delay isn't a huge amount on any of the devices I've seen. The
Mellanox cards I have are the slowest. Here's some timing tests I've done,
using this patch vs a readvpd utility. I also compared it with lspci.
@@@ Read individual Broadcom
(time ./readvpd 0000:01:00.0 > /dev/null) &>>log
real 0m0.003s
user 0m0.002s
sys 0m0.002s
@@@ Read individual Mellanox
(time ./readvpd 0000:04:00.0 > /dev/null) &>>log
real 0m0.071s
user 0m0.001s
sys 0m0.070s
@@@ Read individual Broadcom using lspci
(time lspci -vvv -s 0000:01:00.0 > /dev/null) &>>log
real 0m0.036s
user 0m0.017s
sys 0m0.019s
@@@ Read individual Mellanox using lspci
(time lspci -vvv -s 0000:04:00.0 > /dev/null) &>>log
real 0m1.213s <--- SLOW!!!!
user 0m0.012s
sys 0m1.201s
@@@ Read each network device with 'real' VPD. This should be equivalent to the boot time delay, at least for network devices with VPD
(time for X in /sys/class/net/*/device ; do PF=$(readlink -f $X); SBDF=$(basename $PF) ; if [ -e $PF/vpdattr ] ; then echo ==== $X ; ./readvpd $SBDF > /dev/null; fi ; done) &>> log
==== /sys/class/net/eno1/device
==== /sys/class/net/eno2/device
==== /sys/class/net/eno3/device
==== /sys/class/net/eno4/device
==== /sys/class/net/eno5/device
==== /sys/class/net/eno6/device
==== /sys/class/net/enp4s0d1/device
==== /sys/class/net/enp4s0/device
real 0m0.319s
user 0m0.033s
sys 0m0.295s
@@@ Read each network device, including SR-IOV
(time for X in /sys/class/net/*/device ; do PF=$(readlink -f $X); if [ -e $PF/physfn ] ; then PF=$(readlink -f $PF/physfn) ; fi ; SBDF=$(basename $PF) ; if [ -e $PF/vpdattr ] ; then echo ==== $X ; ./readvpd $SBDF > /dev/null; fi ; done) &>> log
==== /sys/class/net/eno1/device
==== /sys/class/net/eno2/device
==== /sys/class/net/eno3/device
==== /sys/class/net/eno4/device
==== /sys/class/net/eno5/device
==== /sys/class/net/eno6/device
==== /sys/class/net/enp4s0d1/device
==== /sys/class/net/enp4s0/device
==== /sys/class/net/enp4s0f1d1/device (SR-IOV)
==== /sys/class/net/enp4s0f1/device (SR-IOV)
==== /sys/class/net/enp4s0f2d1/device (SR-IOV)
==== /sys/class/net/enp4s0f2/device (SR-IOV)
==== /sys/class/net/enp4s0f3d1/device (SR-IOV)
==== /sys/class/net/enp4s0f3/device (SR-IOV)
==== /sys/class/net/enp4s0f4d1/device (SR-IOV)
==== /sys/class/net/enp4s0f4/device (SR-IOV)
==== /sys/class/net/enp4s0f5d1/device (SR-IOV)
==== /sys/class/net/enp4s0f5/device (SR-IOV)
==== /sys/class/net/enp4s0f6d1/device (SR-IOV)
==== /sys/class/net/enp4s0f6/device (SR-IOV)
==== /sys/class/net/enp4s0f7d1/device (SR-IOV)
==== /sys/class/net/enp4s0f7/device (SR-IOV)
==== /sys/class/net/enp4s1d1/device (SR-IOV)
==== /sys/class/net/enp4s1/device (SR-IOV)
real 0m1.449s
user 0m0.047s
sys 0m1.412s
This is much slower as it has to re-read/parse the VPD data for each SR-IOV device
By contrast, here is using cached kernel entries (including virtual devices)
(time for X in /sys/class/net/*/device ; do PF=$(readlink -f $X); if [ -e $PF/physfn ] ; then PF=$(readlink -f $PF/physfn) ; fi ; if [ -e $PF/vpdattr ] ; then echo ==== $X ; cat $PF/vpdattr/* > /dev/null; fi ; done) &> log
==== /sys/class/net/eno1/device
==== /sys/class/net/eno2/device
==== /sys/class/net/eno3/device
==== /sys/class/net/eno4/device
==== /sys/class/net/eno5/device
==== /sys/class/net/eno6/device
==== /sys/class/net/enp4s0d1/device
==== /sys/class/net/enp4s0/device
==== /sys/class/net/enp4s0f1d1/device
==== /sys/class/net/enp4s0f1/device
==== /sys/class/net/enp4s0f2d1/device
==== /sys/class/net/enp4s0f2/device
==== /sys/class/net/enp4s0f3d1/device
==== /sys/class/net/enp4s0f3/device
==== /sys/class/net/enp4s0f4d1/device
==== /sys/class/net/enp4s0f4/device
==== /sys/class/net/enp4s0f5d1/device
==== /sys/class/net/enp4s0f5/device
==== /sys/class/net/enp4s0f6d1/device
==== /sys/class/net/enp4s0f6/device
==== /sys/class/net/enp4s0f7d1/device
==== /sys/class/net/enp4s0f7/device
==== /sys/class/net/enp4s1d1/device
==== /sys/class/net/enp4s1/device
real 0m0.212s
user 0m0.050s
sys 0m0.175s
>> reading the vpd can take a very long time on some devices, causing
>
>If we were to put your patch in, we would need to read the VPD
>_during each boot_, thereby slowing down the booting process noticeably.
>Plus the additional risk of locking up during boot for misbehaving
>PCI devices. Probably not something we should be doing.
>
>I would rather have it delegated to some helper function/program
>invoked from udev; with my latest patchset we always will have
>well-behaved VPD information so it's easy to just read the vpd
>attribute from sysfs.
>There still might be a lag, but surely not so long as if to timeout
>udev. And if we still encounter these devices I would mark them as
>broken via the blacklist and skip VPD reading for those.
>
>Cheers,
>
>Hannes
>--
>Dr. Hannes Reinecke Teamlead Storage & Networking
>hare@suse.de +49 911 74053 688
>SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
>GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
>HRB 21284 (AG Nürnberg)
>
next prev parent reply other threads:[~2016-02-19 19:44 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-02-18 20:04 [PATCH] Create sysfs entries for PCI VPDI and VPDR tags Jordan Hargrave
2016-02-19 10:00 ` Hannes Reinecke
2016-02-19 14:07 ` Jordan Hargrave
2016-02-19 14:18 ` Hannes Reinecke
2016-02-19 19:44 ` Jordan_Hargrave [this message]
2016-04-10 21:26 ` Bjorn Helgaas
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1455910852096.10012@Dell.com \
--to=jordan_hargrave@dell.com \
--cc=alexander.duyck@gmail.com \
--cc=babu.moger@oracle.com \
--cc=bhelgaas@google.com \
--cc=hare@suse.de \
--cc=jharg93@gmail.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-pci@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.