Re: `pcilib: sysfs_read_vpd: read failed: Input/output error` on ASRock E350M1

linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Paul Menzel <paulepanter@users.sourceforge.net>
To: Myron Stowe <myron.stowe@gmail.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>, linux-pci@vger.kernel.org
Subject: Re: `pcilib: sysfs_read_vpd: read failed: Input/output error` on ASRock E350M1
Date: Sat, 11 Jun 2016 14:12:02 +0200	[thread overview]
Message-ID: <1465647122.2279.18.camel@users.sourceforge.net> (raw)
In-Reply-To: <1463861932.2810.8.camel@users.sourceforge.net>

[-- Attachment #1: Type: text/plain, Size: 9424 bytes --]

Dear Myron, dear Linux folks,


Am Samstag, den 21.05.2016, 22:18 +0200 schrieb Paul Menzel:

> Am Dienstag, den 17.05.2016, 19:52 -0600 schrieb Myron Stowe:
> > On Tue, May 17, 2016 at 3:21 PM, Paul Menzel wrote:
> 
> > > Am Montag, den 16.05.2016, 19:42 -0600 schrieb Myron Stowe:
> > > > On Mon, May 16, 2016 at 2:37 PM, Paul Menzel wrote:
> > > 
> > > > > Am Montag, den 16.05.2016, 12:02 -0600 schrieb Myron Stowe:
> > > > > > 
> > > > > > On Sun, May 15, 2016 at 10:56 AM, Paul Menzel wrote:
> > > > > […]
> > > > > > > 
> > > > > > > Is that related to [1]?
> > > > > > There have been a number of recent kernel commits to work-around know
> > > > > > buggy devices -
> > > > > >   v4.6-rc5
> > > > > >     67e6587  cxgb4: Set VPD size so we can read both VPD structures
> > > > > >     cb92148  PCI: Add pci_set_vpd_size() to set VPD size
> > > > > >   v4.6 -
> > > > > >     7c20078  PCI: Prevent VPD access for buggy devices
> > > > > >     c521b01  PCI: Sleep rather than busy-wait for VPD access completion
> > > > > >     408641e  PCI: Fold struct pci_vpd_pci22 into struct pci_vpd
> > > > > >     f1cd93f  PCI: Rename VPD symbols to remove unnecessary "pci22"
> > > > > >     da00684  PCI: Remove struct pci_vpd_ops.release function pointer
> > > > > >     6437907  PCI: Move pci_vpd_release() from header file to pci/access.c
> > > > > >     fc0a407  PCI: Move pci_read_vpd() and pci_write_vpd() close to other code
> > > > > >     104daa7  PCI: Determine actual VPD size on first access
> > > > > >     c556388  PCI: Use bitfield instead of bool for struct pci_vpd_pci22.busy
> > > > > >     f52e562  PCI: Allow access to VPD attributes with size 0
> > > > > >     9eb45d5  PCI: Update VPD definitions
> > > > > >   v4.3 -
> > > > > >     da2d03e  PCI: Use function 0 VPD only for identical functions
> > > > > >     9d92407  PCI: Fix devfn for VPD access through function 0
> > > > > >     7aa6ca4  PCI: Add VPD function 0 quirk for Intel Ethernet devices
> > > > > >     932c435  PCI: Add dev_flags bit to access VPD through function 0
> > > > > > 
> > > > > > What were you getting/seeing before?
> > > > > The error wasn’t shown. Sorry if that is not a helpful answer. If I
> > > > > provide more information, please tell me how I can get them.
> > > > I expect that prior to the kernel updates you were seeing:
> > > > 
> > > > lspci -s 3:0 -vv
> > > > ...
> > > >         Capabilities: [d0] Vital Product Data
> > > >                 Unknown small resource type 00, will not decode more.
> > > > ...
> > > 
> > > From logs stored in coreboot’s board status repository [2].
> > > 
> > > ```
> > > $ more asrock/e350m1/4.3-817-g477a0d6/2016-04-22T18_41_34Z/lspci-vvxxx.txt
> > > […]
> > >        Capabilities: [d0] Vital Product Data
> > >                No end tag found
> > > […]
> > > ```
> > 
> > The above - "No end tag found" - is from 'lspci' reading the VPD area
> > and parsing it looking for an small resource data type "End Tag"
> > (0x0f) and never finding one.  Thus it seems like the VPD area is
> > malformed at best.
> > 
> > > > and with 'xxd', looking at the VPD area you likely would have seen:
> > > > 
> > > > # xxd /sys/bus/pci/devices/0000\:03\:00.0/vpd
> > > > 0000000: 0000 0000 0000 0000 0000 0000 0000 0000  ................
> > > > 0000010: 0000 0000 0000 0000 0000 0000 0000 0000  ................
> > > > 0000020: 0000 0000 0000 0000 0000 0000 0000 0000  ................
> > > > ...
> > > > 
> > > > and nothing in the 'dmesg' logs.
> > > 
> > > ```
> > > $ more asrock/e350m1/4.3-817-g477a0d6/2016-04-22T18_41_34Z/lspci-vvxxx.txt
> > > […]
> > > c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > > d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > > e0: 10 00 01 01 01 12 63 80 00 00 00 00 00 00 00 00
> > > f0: 00 00 00 00 00 80 80 00 00 00 00 00 00 00 00 00
> > > […]
> > > ```
> > 
> > Careful here, the above is _not_ the VPD area.  The above is the
> > device's standard PCI (type 0) configuration header registers [I'm
> > thinking you mistook the 0xd0 from "Capabilities: [d0] Vital Product
> > Data" and thought that was the offset for the VPD area - it is not.
> > To get to the VPD area one has to manipulate a couple of VPD registers
> > which is basically obtaining the VPD bytes from within the device
> > itself (i.e. from within the device's firmware)].
> > 
> > To get the VPD area you will need to run a kernel prior to the commits
> > I pointed out and then use 'xxd' as shown above to gain insight into
> > what, if anything, is really there.
> > 
> > > There is nothing in the Linux 3.16 messages.
> > > 
> > > > With the recent kernel updates you see the new output from the kernel
> > > > and 'lspci -vv' that started this thread.  I expect that you also get
> > > > something close to the following from 'xxd':
> > > > 
> > > > # xxd /sys/bus/pci/devices/0000\:03\:00.0/vpd
> > > > xxd: Input/output error
> > > 
> > > Correct.
> > > 
> > > > and also see the following in the 'dmesg' logs.
> > > > 
> > > > # dmesg | grep VPD
> > > > r8169 0000:03:00.0: invalid short VPD tag 00 at offset 1
> > > 
> > > Indeed similar:
> > > 
> > > ```
> > > $ dmesg | grep VPD
> > > [15841.258561] r8169 0000:03:00.0: invalid large VPD tag 7f at offset 0
> > > ```
> > 
> > Humm, the above is interesting to me in that it implies that the VPD
> > area is not necessarily all 0's as I was expecting.  A "Large VPD tag"
> > is denoted by the most significant bit of the first byte being set.
> > However, the value denoted - 0x7f - does not have the MSB set so now
> > I'm a bit confused (perhaps the MSB is stripped off - I'll need to
> > look at the related code).  Either way, its looking like perhaps the
> > VPD area is not all 0's so I'd really be interested in knowing what is
> > there.
> > 
> > Ok, looking at the code - ./drivers/pci/access.c::pci_vpd_size() - the
> > tag value output - 7f - had its MSB stripped off via
> > 'pci_vpd_lrdt_tag()'.  So what must have been read for the first byte
> > was 0xff.
> > 
> > > > The new kernel behavior is correct in its complaint as the device is
> > > > advertising that it has VPD via a 'capability' yet its VPD data area
> > > > is likely all 0's.  This is a device error - specifically an error
> > > > with the device's firmware: either it should _not_ advertise via the
> > > > 'capability' that it has VPD or, it s VPD area needs to conform to the
> > > > VPD data format as expressed in the "PCI Local Bus Specification, Rev.
> > > > 3.0" appendix I - "Vital Product Data".
> > > > 
> > > > With the new kernel behavior, effectively circumventing reading of an
> > > > invalid VPD area, you can't get to the data via "# xxd" (as shown
> > > > above).  You could run a kernel prior to the changes and run the "xxd"
> > > > command and likely see that the VPD area is all 0's.  I would be
> > > > interested in confirming such if you could do so please.
> > > 
> > > Please see the paste above. Please tell me if I missed something.
> > 
> > I have to admit I'm a bit surprised at this point (that the VPD area
> > is not all 0's as I was expecting).  The only way we'll know is to
> > obtain the VPD area - if you can get that via 'xxd' it would be
> > interesting to see what is there exactly.
> 
> Checking with xxd under Linux 3.19, the VPD area is all 0xff. Please
> find the file obtained by the command below attached.
> 
> ```
> $ sudo cp /sys/devices/pci0000:00/0000:00:15.1/0000:03:00.0/vpd /tmp/20160521--realtek-vpd.bin
> $ ls -l /tmp/20160521--realtek-vpd.bin
> -rw------- 1 root root 32768 Mai 21 20:49 /tmp/20160521--realtek-vpd.bin
> ```
> 
> > > So what to do about this? Contact Realtek? Add a workaround?
> > 
> > The VPD area is almost certainly malformed (or doesn't exist at all).
> > As a colleague of mine noted today, LSI devices seem to be especially
> > negligent in this manner (see commit 7c20078 ("PCI: Prevent VPD access
> > for buggy devices")).  Assuming the VPD area of this device is at best
> > invalid then the kernel is correct in its current behavior - not much
> > more it can do.  While it would be good for the device's firmware to
> > be upgraded to either not advertise it has VPD or to put in valid VPD,
> > I very much doubt you'd get any traction there.  You could try adding
> > this device's signature (device and vendor id) to the 7c20078 quirk
> > and see if there is any subtle difference.  If there was, it'd be very
> > subtle at best.
> > 
> > All that aside, you are just questioning the error output - correct
> > (i.e. you are not expecting, nor needing, valid VPD content for some
> > reason and questioning that I assume)?
> 
> Correct. I was just surprised by the error by lspci, as I hadn’t seen
> it before.

Were you able to look at the VPD data I sent? If it’s not following the
standard, could you please confirm that, and tell me, what I could
report to the manufacturer?


Thanks,

Paul


> > > > > > > [1] https://lkml.org/lkml/2016/4/15/649
> > > > > [2] http://review.coreboot.org/cgit/board-status.git/tree/asrock/e350m1

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

     prev parent reply	other threads:[~2016-06-11 12:12 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-05-15 16:56 `pcilib: sysfs_read_vpd: read failed: Input/output error` on ASRock E350M1 Paul Menzel
2016-05-16 18:02 ` Myron Stowe
2016-05-16 20:37   ` Paul Menzel
2016-05-17  1:42     ` Myron Stowe
2016-05-17 21:21       ` Paul Menzel
2016-05-18  1:52         ` Myron Stowe
2016-05-21 20:18           ` Paul Menzel
2016-06-11 12:12             ` Paul Menzel [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1465647122.2279.18.camel@users.sourceforge.net \
    --to=paulepanter@users.sourceforge.net \
    --cc=bhelgaas@google.com \
    --cc=linux-pci@vger.kernel.org \
    --cc=myron.stowe@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).