netdev.vger.kernel.org archive mirror
* RE: bug: 'ethtool -m' reports spurious alarm & warning threshold values for QSFP28 transceivers
@ 2018-09-26 19:29 Chris Preimesberger
  2018-09-26 19:44 ` Andrew Lunn
  2018-09-26 21:34 ` Neil Horman
  0 siblings, 2 replies; 16+ messages in thread
From: Chris Preimesberger @ 2018-09-26 19:29 UTC
  To: linville@tuxdriver.com, netdev@vger.kernel.org

[-- Attachment #1: Type: text/plain, Size: 24434 bytes --]

Hello,

I'm re-sending this in plain text, per the auto-reply from a spam filter.  This time I have attached some text files that explain the situation, in case the font and formatting of the email below are now too garbled to read easily.

Thank you and best regards.


Chris Preimesberger | Test & Validation Engineer
Transition Networks, Inc.

chrisp@transition.com
direct: +1.952.996.1509 | fax: +1.952.941.2322 | www.transition.com
________________________________________





From: Chris Preimesberger 
Sent: Wednesday, September 26, 2018 2:14 PM
To: 'linville@tuxdriver.com'; 'netdev@vger.kernel.org'
Subject: bug: 'ethtool -m' reports spurious alarm & warning threshold values for QSFP28 transceivers

Hello John, All,


I think I may have found a bug or two in ethtool, with respect to its reporting of a QSFP28 transceiver's diagnostic information.  Ethtool seems to correctly report all diagnostic information about QSFP28 transceivers, except for the transceiver's warning and alarm thresholds.  I'm not sure whether the spurious warning and alarm values that get reported are the fault of ethtool or my NIC/driver, and I have no other models of 100GbE NICs to test with.  I've contacted Mellanox support about this, and they point the finger at ethtool.  Can these issues be investigated by ethtool developers?  Here is some background information about the equipment and software used when I observe these issues:

Equipment used:
NIC: Mellanox ConnectX-4 100GbE, part number MCX415A-CCAT
Transceiver: Any 40Gb or 100Gb QSFP28 transceiver installed in the NIC (Intel, Mellanox, Transition Networks, etc.)

Software used:
Ubuntu 18.04 with the distro's packaged NIC driver and ethtool v4.15
Also tested: ethtool v4.18 compiled from source, and the current Mellanox OFED driver.

All test scenarios produced the same bugs.


Bug #1.  Ethtool's reporting of the installed transceiver's alarm and warning thresholds differs depending on whether or not ethtool's output is piped to another command.  Example commands are below; the differing values are the "threshold" lines at the end of each output:


tech1@D8:~$ sudo ethtool -m enp1s0
        Identifier                                : 0x11 (QSFP28)
        Extended identifier                       : 0xfc
        Extended identifier description           : 3.5W max. Power consumption
        Extended identifier description           : CDR present in TX, CDR present in RX
        Extended identifier description           : High Power Class (> 3.5 W) not enabled
        Connector                                 : 0x07 (LC)
        Transceiver codes                         : 0x80 0x00 0x00 0x00 0x00 0x00 0x00 0x00
        Transceiver type                          : 100G Ethernet: 100G CWDM4 MSA with FEC
        Encoding                                  : 0x03 (NRZ)
        BR, Nominal                               : 25500Mbps
        Rate identifier                           : 0x00
        Length (SMF,km)                           : 2km
        Length (OM3 50um)                         : 0m
        Length (OM2 50um)                         : 0m
        Length (OM1 62.5um)                       : 0m
        Length (Copper or Active cable)           : 0m
        Transmitter technology                    : 0x40 (1310 nm DFB)
        Laser wavelength                          : 1310.000nm
        Laser wavelength tolerance                : 47.500nm
        Vendor name                               : TRANSITION
        Vendor OUI                                : 00:c0:f2
        Vendor PN                                 : TNQSFP100GCWDM4
        Vendor rev                                : 1A
        Vendor SN                                 : TN02000302
        Date code                                 : 180919
        Revision Compliance                       : SFF-8636 Rev 2.5/2.6/2.7
        Module temperature                        : 39.53 degrees C / 103.15 degrees F
        Module voltage                            : 3.3241 V
        Alarm/warning flags implemented           : Yes
        Laser tx bias current (Channel 1)         : 34.432 mA
        Laser tx bias current (Channel 2)         : 34.432 mA
        Laser tx bias current (Channel 3)         : 33.408 mA
        Laser tx bias current (Channel 4)         : 33.920 mA
        Transmit avg optical power (Channel 1)    : 0.9048 mW / -0.43 dBm
        Transmit avg optical power (Channel 2)    : 0.7832 mW / -1.06 dBm
        Transmit avg optical power (Channel 3)    : 0.8057 mW / -0.94 dBm
        Transmit avg optical power (Channel 4)    : 0.7014 mW / -1.54 dBm
        Rcvr signal avg optical power(Channel 1)  : 0.7378 mW / -1.32 dBm
        Rcvr signal avg optical power(Channel 2)  : 0.7553 mW / -1.22 dBm
        Rcvr signal avg optical power(Channel 3)  : 0.6529 mW / -1.85 dBm
        Rcvr signal avg optical power(Channel 4)  : 0.6847 mW / -1.64 dBm
        Laser bias current high alarm   (Chan 1)  : Off
        Laser bias current low alarm    (Chan 1)  : Off
        Laser bias current high warning (Chan 1)  : Off
        Laser bias current low warning  (Chan 1)  : Off
        Laser bias current high alarm   (Chan 2)  : Off
        Laser bias current low alarm    (Chan 2)  : Off
        Laser bias current high warning (Chan 2)  : Off
        Laser bias current low warning  (Chan 2)  : Off
        Laser bias current high alarm   (Chan 3)  : Off
        Laser bias current low alarm    (Chan 3)  : Off
        Laser bias current high warning (Chan 3)  : Off
        Laser bias current low warning  (Chan 3)  : Off
        Laser bias current high alarm   (Chan 4)  : Off
        Laser bias current low alarm    (Chan 4)  : Off
        Laser bias current high warning (Chan 4)  : Off
        Laser bias current low warning  (Chan 4)  : Off
        Module temperature high alarm             : Off
        Module temperature low alarm              : Off
        Module temperature high warning           : Off
        Module temperature low warning            : Off
        Module voltage high alarm                 : Off
        Module voltage low alarm                  : Off
        Module voltage high warning               : Off
        Module voltage low warning                : Off
        Laser tx power high alarm   (Channel 1)   : Off
        Laser tx power low alarm    (Channel 1)   : Off
        Laser tx power high warning (Channel 1)   : Off
        Laser tx power low warning  (Channel 1)   : Off
        Laser tx power high alarm   (Channel 2)   : Off
        Laser tx power low alarm    (Channel 2)   : Off
        Laser tx power high warning (Channel 2)   : Off
        Laser tx power low warning  (Channel 2)   : Off
        Laser tx power high alarm   (Channel 3)   : Off
        Laser tx power low alarm    (Channel 3)   : Off
        Laser tx power high warning (Channel 3)   : Off
        Laser tx power low warning  (Channel 3)   : Off
        Laser tx power high alarm   (Channel 4)   : Off
        Laser tx power low alarm    (Channel 4)   : Off
        Laser tx power high warning (Channel 4)   : Off
        Laser tx power low warning  (Channel 4)   : Off
        Laser rx power high alarm   (Channel 1)   : Off
        Laser rx power low alarm    (Channel 1)   : Off
        Laser rx power high warning (Channel 1)   : Off
        Laser rx power low warning  (Channel 1)   : Off
        Laser rx power high alarm   (Channel 2)   : Off
        Laser rx power low alarm    (Channel 2)   : Off
        Laser rx power high warning (Channel 2)   : Off
        Laser rx power low warning  (Channel 2)   : Off
        Laser rx power high alarm   (Channel 3)   : Off
        Laser rx power low alarm    (Channel 3)   : Off
        Laser rx power high warning (Channel 3)   : Off
        Laser rx power low warning  (Channel 3)   : Off
        Laser rx power high alarm   (Channel 4)   : Off
        Laser rx power low alarm    (Channel 4)   : Off
        Laser rx power high warning (Channel 4)   : Off
        Laser rx power low warning  (Channel 4)   : Off
        Laser bias current high alarm threshold   : 0.000 mA
        Laser bias current low alarm threshold    : 0.000 mA
        Laser bias current high warning threshold : 0.000 mA
        Laser bias current low warning threshold  : 0.000 mA
        Laser output power high alarm threshold   : 0.0000 mW / -inf dBm
        Laser output power low alarm threshold    : 0.0000 mW / -inf dBm
        Laser output power high warning threshold : 0.0000 mW / -inf dBm
        Laser output power low warning threshold  : 0.0000 mW / -inf dBm
        Module temperature high alarm threshold   : 0.00 degrees C / 32.00 degrees F
        Module temperature low alarm threshold    : 0.00 degrees C / 32.00 degrees F
        Module temperature high warning threshold : 0.00 degrees C / 32.00 degrees F
        Module temperature low warning threshold  : 0.00 degrees C / 32.00 degrees F
        Module voltage high alarm threshold       : 0.0000 V
        Module voltage low alarm threshold        : 0.0000 V
        Module voltage high warning threshold     : 0.0000 V
        Module voltage low warning threshold      : 0.0000 V
        Laser rx power high alarm threshold       : 0.0000 mW / -inf dBm
        Laser rx power low alarm threshold        : 0.0000 mW / -inf dBm
        Laser rx power high warning threshold     : 0.0000 mW / -inf dBm
        Laser rx power low warning threshold      : 0.0000 mW / -inf dBm


tech1@D8:~$ sudo ethtool -m enp1s0 | cat
        Identifier                                : 0x11 (QSFP28)
        Extended identifier                       : 0xfc
        Extended identifier description           : 3.5W max. Power consumption
        Extended identifier description           : CDR present in TX, CDR present in RX
        Extended identifier description           : High Power Class (> 3.5 W) not enabled
        Connector                                 : 0x07 (LC)
        Transceiver codes                         : 0x80 0x00 0x00 0x00 0x00 0x00 0x00 0x00
        Transceiver type                          : 100G Ethernet: 100G CWDM4 MSA with FEC
        Encoding                                  : 0x03 (NRZ)
        BR, Nominal                               : 25500Mbps
        Rate identifier                           : 0x00
        Length (SMF,km)                           : 2km
        Length (OM3 50um)                         : 0m
        Length (OM2 50um)                         : 0m
        Length (OM1 62.5um)                       : 0m
        Length (Copper or Active cable)           : 0m
        Transmitter technology                    : 0x40 (1310 nm DFB)
        Laser wavelength                          : 1310.000nm
        Laser wavelength tolerance                : 47.500nm
        Vendor name                               : TRANSITION
        Vendor OUI                                : 00:c0:f2
        Vendor PN                                 : TNQSFP100GCWDM4
        Vendor rev                                : 1A
        Vendor SN                                 : TN02000302
        Date code                                 : 180919
        Revision Compliance                       : SFF-8636 Rev 2.5/2.6/2.7
        Module temperature                        : 39.53 degrees C / 103.15 degrees F
        Module voltage                            : 3.3249 V
        Alarm/warning flags implemented           : Yes
        Laser tx bias current (Channel 1)         : 34.432 mA
        Laser tx bias current (Channel 2)         : 34.432 mA
        Laser tx bias current (Channel 3)         : 33.408 mA
        Laser tx bias current (Channel 4)         : 33.920 mA
        Transmit avg optical power (Channel 1)    : 0.9043 mW / -0.44 dBm
        Transmit avg optical power (Channel 2)    : 0.7832 mW / -1.06 dBm
        Transmit avg optical power (Channel 3)    : 0.8057 mW / -0.94 dBm
        Transmit avg optical power (Channel 4)    : 0.7009 mW / -1.54 dBm
        Rcvr signal avg optical power(Channel 1)  : 0.7378 mW / -1.32 dBm
        Rcvr signal avg optical power(Channel 2)  : 0.7553 mW / -1.22 dBm
        Rcvr signal avg optical power(Channel 3)  : 0.6529 mW / -1.85 dBm
        Rcvr signal avg optical power(Channel 4)  : 0.6847 mW / -1.64 dBm
        Laser bias current high alarm   (Chan 1)  : Off
        Laser bias current low alarm    (Chan 1)  : Off
        Laser bias current high warning (Chan 1)  : Off
        Laser bias current low warning  (Chan 1)  : Off
        Laser bias current high alarm   (Chan 2)  : Off
        Laser bias current low alarm    (Chan 2)  : Off
        Laser bias current high warning (Chan 2)  : Off
        Laser bias current low warning  (Chan 2)  : Off
        Laser bias current high alarm   (Chan 3)  : Off
        Laser bias current low alarm    (Chan 3)  : Off
        Laser bias current high warning (Chan 3)  : Off
        Laser bias current low warning  (Chan 3)  : Off
        Laser bias current high alarm   (Chan 4)  : Off
        Laser bias current low alarm    (Chan 4)  : Off
        Laser bias current high warning (Chan 4)  : Off
        Laser bias current low warning  (Chan 4)  : Off
        Module temperature high alarm             : Off
        Module temperature low alarm              : Off
        Module temperature high warning           : Off
        Module temperature low warning            : Off
        Module voltage high alarm                 : Off
        Module voltage low alarm                  : Off
        Module voltage high warning               : Off
        Module voltage low warning                : Off
        Laser tx power high alarm   (Channel 1)   : Off
        Laser tx power low alarm    (Channel 1)   : Off
        Laser tx power high warning (Channel 1)   : Off
        Laser tx power low warning  (Channel 1)   : Off
        Laser tx power high alarm   (Channel 2)   : Off
        Laser tx power low alarm    (Channel 2)   : Off
        Laser tx power high warning (Channel 2)   : Off
        Laser tx power low warning  (Channel 2)   : Off
        Laser tx power high alarm   (Channel 3)   : Off
        Laser tx power low alarm    (Channel 3)   : Off
        Laser tx power high warning (Channel 3)   : Off
        Laser tx power low warning  (Channel 3)   : Off
        Laser tx power high alarm   (Channel 4)   : Off
        Laser tx power low alarm    (Channel 4)   : Off
        Laser tx power high warning (Channel 4)   : Off
        Laser tx power low warning  (Channel 4)   : Off
        Laser rx power high alarm   (Channel 1)   : Off
        Laser rx power low alarm    (Channel 1)   : Off
        Laser rx power high warning (Channel 1)   : Off
        Laser rx power low warning  (Channel 1)   : Off
        Laser rx power high alarm   (Channel 2)   : Off
        Laser rx power low alarm    (Channel 2)   : Off
        Laser rx power high warning (Channel 2)   : Off
        Laser rx power low warning  (Channel 2)   : Off
        Laser rx power high alarm   (Channel 3)   : Off
        Laser rx power low alarm    (Channel 3)   : Off
        Laser rx power high warning (Channel 3)   : Off
        Laser rx power low warning  (Channel 3)   : Off
        Laser rx power high alarm   (Channel 4)   : Off
        Laser rx power low alarm    (Channel 4)   : Off
        Laser rx power high warning (Channel 4)   : Off
        Laser rx power low warning  (Channel 4)   : Off
        Laser bias current high alarm threshold   : 16.448 mA
        Laser bias current low alarm threshold    : 16.448 mA
        Laser bias current high warning threshold : 16.448 mA
        Laser bias current low warning threshold  : 16.448 mA
        Laser output power high alarm threshold   : 0.8224 mW / -0.85 dBm
        Laser output power low alarm threshold    : 0.8250 mW / -0.84 dBm
        Laser output power high warning threshold : 0.8264 mW / -0.83 dBm
        Laser output power low warning threshold  : 2.6983 mW / 4.31 dBm
        Module temperature high alarm threshold   : 110.12 degrees C / 230.22 degrees F
        Module temperature low alarm threshold    : 84.34 degrees C / 183.82 degrees F
        Module temperature high warning threshold : 44.12 degrees C / 111.42 degrees F
        Module temperature low warning threshold  : 67.27 degrees C / 153.08 degrees F
        Module voltage high alarm threshold       : 2.9728 V
        Module voltage low alarm threshold        : 2.6990 V
        Module voltage high warning threshold     : 0.8274 V
        Module voltage low warning threshold      : 2.2538 V
        Laser rx power high alarm threshold       : 2.5458 mW / 4.06 dBm
        Laser rx power low alarm threshold        : 2.6992 mW / 4.31 dBm
        Laser rx power high warning threshold     : 2.9801 mW / 4.74 dBm
        Laser rx power low warning threshold      : 2.8526 mW / 4.55 dBm
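
To make the difference between the two runs easier to see, the two captures can be compared line by line. The following is only a minimal Python sketch, assuming both outputs have been saved as text; the function names parse_ethtool_m and differing_thresholds are hypothetical helpers, not part of ethtool. It keeps only lines whose label contains "threshold", so live readings such as module voltage, which naturally vary between runs, are ignored.

```python
def parse_ethtool_m(text):
    """Split 'ethtool -m' output into ordered (label, value) pairs."""
    pairs = []
    for line in text.splitlines():
        # Labels and values are separated by " : " in the output above.
        label, sep, value = line.partition(" : ")
        if sep:
            pairs.append((label.strip(), value.strip()))
    return pairs


def differing_thresholds(direct_text, piped_text):
    """Return (label, direct_value, piped_value) for threshold lines that differ."""
    direct = parse_ethtool_m(direct_text)
    piped = parse_ethtool_m(piped_text)
    return [
        (label_a, value_a, value_b)
        for (label_a, value_a), (label_b, value_b) in zip(direct, piped)
        if label_a == label_b and "threshold" in label_a and value_a != value_b
    ]


# Three lines taken from each of the two captures above.
direct = """\
        Module voltage                            : 3.3241 V
        Module voltage high alarm threshold       : 0.0000 V
        Module voltage low alarm threshold        : 0.0000 V
"""
piped = """\
        Module voltage                            : 3.3249 V
        Module voltage high alarm threshold       : 2.9728 V
        Module voltage low alarm threshold        : 2.6990 V
"""

for label, a, b in differing_thresholds(direct, piped):
    print(f"{label}: direct={a} piped={b}")
```

The pairing is positional (zip), which assumes both runs print the same fields in the same order, as they do in the two outputs above.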


Bug #2.  All of the alarm and warning threshold values reported by the above commands are spurious.
At first glance, one would assume that the threshold values reported by the piped ethtool command are correct, but they're not.  I know the programmed values for the above transceiver, which makes it easy for me to spot the spurious values, but even without knowing the programmed values of a given transceiver, one can use logic to detect when the values displayed by ethtool don't make sense.
For example, let's scrutinize the voltage warning and alarm values reported by ethtool for this transceiver.  We will compare each voltage threshold against the other voltage thresholds and look for contradictions, to determine whether the reported values seem legitimate.
                                Known Actual    ethtool Reported
                                Values          Values
High Voltage Alarm              3.70 V          2.9728 V
High Voltage Warning            3.59 V          0.8274 V
(Operating spec = 3.30 V)
Low Voltage Warning             3.00 V          2.2538 V
Low Voltage Alarm               2.90 V          2.6990 V

Contradictions in the voltage warning and alarm thresholds reported by ethtool:
1. The high voltage alarm threshold should be above the 3.30 V operating voltage, but ethtool reported 2.9728 V.
2. The high voltage warning threshold should be above the low voltage warning and alarm thresholds, but ethtool reported 0.8274 V, which is below both of them.
3. The low voltage warning threshold should be above the low voltage alarm threshold, but ethtool reported 2.2538 V and 2.6990 V respectively.
4. The low voltage alarm threshold should be the lowest of the four voltage thresholds, but the reported 2.6990 V is above both reported warning thresholds.
5. The current voltage was reported as 3.3249 V, which exceeds the reported high voltage warning and alarm thresholds, yet no warnings or alarms are indicated.
 
Each of the 4 voltage thresholds reported by ethtool has contradictions, so we know something is not right.  The same kind of logic can be applied to the thresholds for temperature, laser TX power, etc. to find that those values are also spurious.
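
The consistency checks described above can be written down as a small script. This is only a sketch in Python under my own assumptions: the function name threshold_contradictions and its exact set of checks are hypothetical, and it simply assumes that a sane set of thresholds is ordered high alarm >= high warning > low warning >= low alarm, and that a reading outside the warning window should raise a flag.

```python
def threshold_contradictions(high_alarm, high_warning, low_warning, low_alarm,
                             measured, flags_raised=False):
    """Return a list of human-readable contradictions (empty if none)."""
    problems = []
    if not high_alarm >= high_warning:
        problems.append("high alarm is below high warning")
    if not high_warning > low_warning:
        problems.append("high warning is not above low warning")
    if not low_warning >= low_alarm:
        problems.append("low warning is below low alarm")
    if not flags_raised and not (low_warning <= measured <= high_warning):
        problems.append("reading is outside the warning window, but no flag is raised")
    return problems


# Voltage thresholds as reported by the piped ethtool command, in volts.
reported = threshold_contradictions(2.9728, 0.8274, 2.2538, 2.6990, measured=3.3249)

# Voltage thresholds actually programmed into the transceiver, in volts.
known = threshold_contradictions(3.70, 3.59, 3.00, 2.90, measured=3.3249)

for problem in reported:
    print("reported:", problem)
print("known thresholds consistent:", not known)
```

The same function can be fed the temperature, bias current, or optical power thresholds, as long as all five values passed in one call use the same unit.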


Installing the above transceiver in a Cisco switch shows that the Cisco correctly retrieves the true warning and alarm threshold values from the transceiver's EEPROM, so we trust that the transceiver has been correctly programmed.  The Cisco CLI output for that transceiver is shown here:

switch# show interface ethernet 1/3 transceiver details 
Ethernet1/3
    transceiver is present
    type is QSFP-100G-CWDM4-MSA-FEC
    name is TRANSITION
    part number is TNQSFP100GCWDM4
    revision is 1A
    serial number is TN02000302
    nominal bitrate is 25500 MBit/sec per channel
    Link length supported for 9/125um fiber is 2 km
    cisco id is 17
    cisco extended id number is 252

Lane Number:1 Network Lane
           SFP Detail Diagnostics Information (internal calibration)
  ----------------------------------------------------------------------------
                Current              Alarms                  Warnings
                Measurement     High        Low         High          Low
  ----------------------------------------------------------------------------
  Temperature   38.08 C        80.00 C    -10.00 C     75.00 C       -5.00 C
  Voltage        3.34 V         3.70 V      2.90 V      3.59 V        3.00 V
  Current       34.24 mA       75.00 mA    10.00 mA    70.00 mA      15.00 mA
  Tx Power      -0.44 dBm       4.49 dBm   -8.50 dBm    3.49 dBm     -7.52 dBm
  Rx Power          N/A         4.49 dBm  -14.55 dBm    3.49 dBm    -12.51 dBm
  Transmit Fault Count = 0
  ----------------------------------------------------------------------------
  Note: ++  high-alarm; +  high-warning; --  low-alarm; -  low-warning

Lane Number:2 Network Lane
           SFP Detail Diagnostics Information (internal calibration)
  ----------------------------------------------------------------------------
                Current              Alarms                  Warnings
                Measurement     High        Low         High          Low
  ----------------------------------------------------------------------------
  Temperature   38.08 C        80.00 C    -10.00 C     75.00 C       -5.00 C
  Voltage        3.34 V         3.70 V      2.90 V      3.59 V        3.00 V
  Current       34.24 mA       75.00 mA    10.00 mA    70.00 mA      15.00 mA
  Tx Power      -1.20 dBm       4.49 dBm   -8.50 dBm    3.49 dBm     -7.52 dBm
  Rx Power          N/A         4.49 dBm  -14.55 dBm    3.49 dBm    -12.51 dBm
  Transmit Fault Count = 0
  ----------------------------------------------------------------------------
  Note: ++  high-alarm; +  high-warning; --  low-alarm; -  low-warning

Lane Number:3 Network Lane
           SFP Detail Diagnostics Information (internal calibration)
  ----------------------------------------------------------------------------
                Current              Alarms                  Warnings
                Measurement     High        Low         High          Low
  ----------------------------------------------------------------------------
  Temperature   38.08 C        80.00 C    -10.00 C     75.00 C       -5.00 C
  Voltage        3.34 V         3.70 V      2.90 V      3.59 V        3.00 V
  Current       33.21 mA       75.00 mA    10.00 mA    70.00 mA      15.00 mA
  Tx Power      -0.96 dBm       4.49 dBm   -8.50 dBm    3.49 dBm     -7.52 dBm
  Rx Power          N/A         4.49 dBm  -14.55 dBm    3.49 dBm    -12.51 dBm
  Transmit Fault Count = 0
  ----------------------------------------------------------------------------
  Note: ++  high-alarm; +  high-warning; --  low-alarm; -  low-warning

Lane Number:4 Network Lane
           SFP Detail Diagnostics Information (internal calibration)
  ----------------------------------------------------------------------------
                Current              Alarms                  Warnings
                Measurement     High        Low         High          Low
  ----------------------------------------------------------------------------
  Temperature   38.08 C        80.00 C    -10.00 C     75.00 C       -5.00 C
  Voltage        3.34 V         3.70 V      2.90 V      3.59 V        3.00 V
  Current       33.72 mA       75.00 mA    10.00 mA    70.00 mA      15.00 mA
  Tx Power      -1.59 dBm       4.49 dBm   -8.50 dBm    3.49 dBm     -7.52 dBm
  Rx Power          N/A         4.49 dBm  -14.55 dBm    3.49 dBm    -12.51 dBm
  Transmit Fault Count = 0
  ----------------------------------------------------------------------------
  Note: ++  high-alarm; +  high-warning; --  low-alarm; -  low-warning

switch#


Any help with these issues is greatly appreciated.  If you have any questions or advice, please let me know.  I'll be glad to continue troubleshooting this until it's resolved.  Thank you.    


[-- Attachment #2: ethtoolQSFP28thresholdsCiscoComparison.txt --]
[-- Type: text/plain, Size: 4602 bytes --]

For comparison with ethtool's output, which shows incorrect threshold values: when the same transceiver is installed in a Cisco Nexus switch and the Cisco command "show interface ethernet 1/3 transceiver details" is issued, the switch correctly reads and displays the transceiver's alarm and warning thresholds, as shown below:


switch# show interface ethernet 1/3 transceiver details 
Ethernet1/3
    transceiver is present
    type is QSFP-100G-CWDM4-MSA-FEC
    name is TRANSITION
    part number is TNQSFP100GCWDM4
    revision is 1A
    serial number is TN02000302
    nominal bitrate is 25500 MBit/sec per channel
    Link length supported for 9/125um fiber is 2 km
    cisco id is 17
    cisco extended id number is 252

Lane Number:1 Network Lane
           SFP Detail Diagnostics Information (internal calibration)
  ----------------------------------------------------------------------------
                Current              Alarms                  Warnings
                Measurement     High        Low         High          Low
  ----------------------------------------------------------------------------
  Temperature   38.08 C        80.00 C    -10.00 C     75.00 C       -5.00 C
  Voltage        3.34 V         3.70 V      2.90 V      3.59 V        3.00 V
  Current       34.24 mA       75.00 mA    10.00 mA    70.00 mA      15.00 mA
  Tx Power      -0.44 dBm       4.49 dBm   -8.50 dBm    3.49 dBm     -7.52 dBm
  Rx Power          N/A         4.49 dBm  -14.55 dBm    3.49 dBm    -12.51 dBm
  Transmit Fault Count = 0
  ----------------------------------------------------------------------------
  Note: ++  high-alarm; +  high-warning; --  low-alarm; -  low-warning

Lane Number:2 Network Lane
           SFP Detail Diagnostics Information (internal calibration)
  ----------------------------------------------------------------------------
                Current              Alarms                  Warnings
                Measurement     High        Low         High          Low
  ----------------------------------------------------------------------------
  Temperature   38.08 C        80.00 C    -10.00 C     75.00 C       -5.00 C
  Voltage        3.34 V         3.70 V      2.90 V      3.59 V        3.00 V
  Current       34.24 mA       75.00 mA    10.00 mA    70.00 mA      15.00 mA
  Tx Power      -1.20 dBm       4.49 dBm   -8.50 dBm    3.49 dBm     -7.52 dBm
  Rx Power          N/A         4.49 dBm  -14.55 dBm    3.49 dBm    -12.51 dBm
  Transmit Fault Count = 0
  ----------------------------------------------------------------------------
  Note: ++  high-alarm; +  high-warning; --  low-alarm; -  low-warning

Lane Number:3 Network Lane
           SFP Detail Diagnostics Information (internal calibration)
  ----------------------------------------------------------------------------
                Current              Alarms                  Warnings
                Measurement     High        Low         High          Low
  ----------------------------------------------------------------------------
  Temperature   38.08 C        80.00 C    -10.00 C     75.00 C       -5.00 C
  Voltage        3.34 V         3.70 V      2.90 V      3.59 V        3.00 V
  Current       33.21 mA       75.00 mA    10.00 mA    70.00 mA      15.00 mA
  Tx Power      -0.96 dBm       4.49 dBm   -8.50 dBm    3.49 dBm     -7.52 dBm
  Rx Power          N/A         4.49 dBm  -14.55 dBm    3.49 dBm    -12.51 dBm
  Transmit Fault Count = 0
  ----------------------------------------------------------------------------
  Note: ++  high-alarm; +  high-warning; --  low-alarm; -  low-warning

Lane Number:4 Network Lane
           SFP Detail Diagnostics Information (internal calibration)
  ----------------------------------------------------------------------------
                Current              Alarms                  Warnings
                Measurement     High        Low         High          Low
  ----------------------------------------------------------------------------
  Temperature   38.08 C        80.00 C    -10.00 C     75.00 C       -5.00 C
  Voltage        3.34 V         3.70 V      2.90 V      3.59 V        3.00 V
  Current       33.72 mA       75.00 mA    10.00 mA    70.00 mA      15.00 mA
  Tx Power      -1.59 dBm       4.49 dBm   -8.50 dBm    3.49 dBm     -7.52 dBm
  Rx Power          N/A         4.49 dBm  -14.55 dBm    3.49 dBm    -12.51 dBm
  Transmit Fault Count = 0
  ----------------------------------------------------------------------------
  Note: ++  high-alarm; +  high-warning; --  low-alarm; -  low-warning

switch# 


[-- Attachment #3: ethtoolQSFP28thresholdsExpectedOutput.txt --]
[-- Type: text/plain, Size: 7258 bytes --]


Look at each line in the ethtool output below that includes the word "threshold".  This file has been hand-edited to show the threshold values that have been programmed into the transceiver, which should be displayed by ethtool.  The threshold values shown below are copied and pasted from the output of the Cisco NX-OS command "show interface ethernet 1/3 transceiver details", while the transceiver was installed in a Cisco Nexus switch.

Note - I only copied the threshold values in the units that were displayed by the Cisco switch.  The "?" symbols are just a placeholder for the converted values; I was too lazy to do conversions between dBm and mW, or between degrees C and degrees F.  Ethtool would be expected to report the true / converted values.
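
For anyone who wants the converted values, the two conversions are simple. The following is a minimal Python sketch, not ethtool code; the function names are my own, and the output format only imitates the "mW / dBm" and "degrees C / degrees F" style that ethtool prints above. It uses the standard relations mW = 10^(dBm/10) and F = C * 9/5 + 32.

```python
def dbm_to_mw(dbm):
    """Convert optical power from dBm to milliwatts."""
    return 10 ** (dbm / 10)


def celsius_to_fahrenheit(celsius):
    """Convert a temperature from degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32


# Threshold values copied from the Cisco output (Tx power and temperature).
tx_power_dbm = {"high alarm": 4.49, "low alarm": -8.50,
                "high warning": 3.49, "low warning": -7.52}
temperature_c = {"high alarm": 80.0, "low alarm": -10.0,
                 "high warning": 75.0, "low warning": -5.0}

for name, dbm in tx_power_dbm.items():
    print(f"Laser output power {name} threshold : "
          f"{dbm_to_mw(dbm):.4f} mW / {dbm:.2f} dBm")

for name, c in temperature_c.items():
    print(f"Module temperature {name} threshold : "
          f"{c:.2f} degrees C / {celsius_to_fahrenheit(c):.2f} degrees F")
```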




tech1@D8:~$ sudo ethtool -m enp1s0
	Identifier                                : 0x11 (QSFP28)
	Extended identifier                       : 0xfc
	Extended identifier description           : 3.5W max. Power consumption
	Extended identifier description           : CDR present in TX, CDR present in RX
	Extended identifier description           : High Power Class (> 3.5 W) not enabled
	Connector                                 : 0x07 (LC)
	Transceiver codes                         : 0x80 0x00 0x00 0x00 0x00 0x00 0x00 0x00
	Transceiver type                          : 100G Ethernet: 100G CWDM4 MSA with FEC
	Encoding                                  : 0x03 (NRZ)
	BR, Nominal                               : 25500Mbps
	Rate identifier                           : 0x00
	Length (SMF,km)                           : 2km
	Length (OM3 50um)                         : 0m
	Length (OM2 50um)                         : 0m
	Length (OM1 62.5um)                       : 0m
	Length (Copper or Active cable)           : 0m
	Transmitter technology                    : 0x40 (1310 nm DFB)
	Laser wavelength                          : 1310.000nm
	Laser wavelength tolerance                : 47.500nm
	Vendor name                               : TRANSITION
	Vendor OUI                                : 00:c0:f2
	Vendor PN                                 : TNQSFP100GCWDM4
	Vendor rev                                : 1A
	Vendor SN                                 : TN02000302
	Date code                                 : 180919
	Revision Compliance                       : SFF-8636 Rev 2.5/2.6/2.7
	Module temperature                        : 39.53 degrees C / 103.15 degrees F
	Module voltage                            : 3.3233 V
	Alarm/warning flags implemented           : Yes
	Laser tx bias current (Channel 1)         : 34.432 mA
	Laser tx bias current (Channel 2)         : 34.432 mA
	Laser tx bias current (Channel 3)         : 33.408 mA
	Laser tx bias current (Channel 4)         : 33.920 mA
	Transmit avg optical power (Channel 1)    : 0.9052 mW / -0.43 dBm
	Transmit avg optical power (Channel 2)    : 0.7832 mW / -1.06 dBm
	Transmit avg optical power (Channel 3)    : 0.8057 mW / -0.94 dBm
	Transmit avg optical power (Channel 4)    : 0.7009 mW / -1.54 dBm
	Rcvr signal avg optical power(Channel 1)  : 0.7378 mW / -1.32 dBm
	Rcvr signal avg optical power(Channel 2)  : 0.7553 mW / -1.22 dBm
	Rcvr signal avg optical power(Channel 3)  : 0.6529 mW / -1.85 dBm
	Rcvr signal avg optical power(Channel 4)  : 0.6948 mW / -1.58 dBm
	Laser bias current high alarm   (Chan 1)  : Off
	Laser bias current low alarm    (Chan 1)  : Off
	Laser bias current high warning (Chan 1)  : Off
	Laser bias current low warning  (Chan 1)  : Off
	Laser bias current high alarm   (Chan 2)  : Off
	Laser bias current low alarm    (Chan 2)  : Off
	Laser bias current high warning (Chan 2)  : Off
	Laser bias current low warning  (Chan 2)  : Off
	Laser bias current high alarm   (Chan 3)  : Off
	Laser bias current low alarm    (Chan 3)  : Off
	Laser bias current high warning (Chan 3)  : Off
	Laser bias current low warning  (Chan 3)  : Off
	Laser bias current high alarm   (Chan 4)  : Off
	Laser bias current low alarm    (Chan 4)  : Off
	Laser bias current high warning (Chan 4)  : Off
	Laser bias current low warning  (Chan 4)  : Off
	Module temperature high alarm             : Off
	Module temperature low alarm              : Off
	Module temperature high warning           : Off
	Module temperature low warning            : Off
	Module voltage high alarm                 : Off
	Module voltage low alarm                  : Off
	Module voltage high warning               : Off
	Module voltage low warning                : Off
	Laser tx power high alarm   (Channel 1)   : Off
	Laser tx power low alarm    (Channel 1)   : Off
	Laser tx power high warning (Channel 1)   : Off
	Laser tx power low warning  (Channel 1)   : Off
	Laser tx power high alarm   (Channel 2)   : Off
	Laser tx power low alarm    (Channel 2)   : Off
	Laser tx power high warning (Channel 2)   : Off
	Laser tx power low warning  (Channel 2)   : Off
	Laser tx power high alarm   (Channel 3)   : Off
	Laser tx power low alarm    (Channel 3)   : Off
	Laser tx power high warning (Channel 3)   : Off
	Laser tx power low warning  (Channel 3)   : Off
	Laser tx power high alarm   (Channel 4)   : Off
	Laser tx power low alarm    (Channel 4)   : Off
	Laser tx power high warning (Channel 4)   : Off
	Laser tx power low warning  (Channel 4)   : Off
	Laser rx power high alarm   (Channel 1)   : Off
	Laser rx power low alarm    (Channel 1)   : Off
	Laser rx power high warning (Channel 1)   : Off
	Laser rx power low warning  (Channel 1)   : Off
	Laser rx power high alarm   (Channel 2)   : Off
	Laser rx power low alarm    (Channel 2)   : Off
	Laser rx power high warning (Channel 2)   : Off
	Laser rx power low warning  (Channel 2)   : Off
	Laser rx power high alarm   (Channel 3)   : Off
	Laser rx power low alarm    (Channel 3)   : Off
	Laser rx power high warning (Channel 3)   : Off
	Laser rx power low warning  (Channel 3)   : Off
	Laser rx power high alarm   (Channel 4)   : Off
	Laser rx power low alarm    (Channel 4)   : Off
	Laser rx power high warning (Channel 4)   : Off
	Laser rx power low warning  (Channel 4)   : Off
	Laser bias current high alarm threshold   : 75.000 mA
	Laser bias current low alarm threshold    : 10.000 mA
	Laser bias current high warning threshold : 70.000 mA
	Laser bias current low warning threshold  : 15.000 mA
	Laser output power high alarm threshold   : ? mW / 4.49 dBm
	Laser output power low alarm threshold    : ? mW / -8.50 dBm
	Laser output power high warning threshold : ? mW / 3.49 dBm
	Laser output power low warning threshold  : ? mW / -7.52 dBm
	Module temperature high alarm threshold   : 80.00 degrees C / ? degrees F
	Module temperature low alarm threshold    : -10.00 degrees C / ? degrees F
	Module temperature high warning threshold : 75.00 degrees C / ? degrees F
	Module temperature low warning threshold  : -5.00 degrees C / ? degrees F
	Module voltage high alarm threshold       : 3.7000 V
	Module voltage low alarm threshold        : 2.9000 V
	Module voltage high warning threshold     : 3.5900 V
	Module voltage low warning threshold      : 3.0000 V
	Laser rx power high alarm threshold       : ? mW / 4.49 dBm
	Laser rx power low alarm threshold        : ? mW / -14.55 dBm
	Laser rx power high warning threshold     : ? mW / 3.49 dBm
	Laser rx power low warning threshold      : ? mW / -12.51 dBm


[-- Attachment #4: ethtoolQSFP28thresholdsSpuriousOutput1of2.txt --]
[-- Type: text/plain, Size: 6843 bytes --]


Look at each line in the ethtool output below that includes the word "threshold".  This file shows the actual output from ethtool v4.18, when the output is not piped to another command.  Notice that all of the displayed threshold values are 0 (which is incorrect), while other values report as expected.

tech1@D8:~$ sudo ethtool -m enp1s0
	Identifier                                : 0x11 (QSFP28)
	Extended identifier                       : 0xfc
	Extended identifier description           : 3.5W max. Power consumption
	Extended identifier description           : CDR present in TX, CDR present in RX
	Extended identifier description           : High Power Class (> 3.5 W) not enabled
	Connector                                 : 0x07 (LC)
	Transceiver codes                         : 0x80 0x00 0x00 0x00 0x00 0x00 0x00 0x00
	Transceiver type                          : 100G Ethernet: 100G CWDM4 MSA with FEC
	Encoding                                  : 0x03 (NRZ)
	BR, Nominal                               : 25500Mbps
	Rate identifier                           : 0x00
	Length (SMF,km)                           : 2km
	Length (OM3 50um)                         : 0m
	Length (OM2 50um)                         : 0m
	Length (OM1 62.5um)                       : 0m
	Length (Copper or Active cable)           : 0m
	Transmitter technology                    : 0x40 (1310 nm DFB)
	Laser wavelength                          : 1310.000nm
	Laser wavelength tolerance                : 47.500nm
	Vendor name                               : TRANSITION
	Vendor OUI                                : 00:c0:f2
	Vendor PN                                 : TNQSFP100GCWDM4
	Vendor rev                                : 1A
	Vendor SN                                 : TN02000302
	Date code                                 : 180919
	Revision Compliance                       : SFF-8636 Rev 2.5/2.6/2.7
	Module temperature                        : 39.53 degrees C / 103.15 degrees F
	Module voltage                            : 3.3241 V
	Alarm/warning flags implemented           : Yes
	Laser tx bias current (Channel 1)         : 34.432 mA
	Laser tx bias current (Channel 2)         : 34.432 mA
	Laser tx bias current (Channel 3)         : 33.408 mA
	Laser tx bias current (Channel 4)         : 33.920 mA
	Transmit avg optical power (Channel 1)    : 0.9048 mW / -0.43 dBm
	Transmit avg optical power (Channel 2)    : 0.7832 mW / -1.06 dBm
	Transmit avg optical power (Channel 3)    : 0.8057 mW / -0.94 dBm
	Transmit avg optical power (Channel 4)    : 0.7014 mW / -1.54 dBm
	Rcvr signal avg optical power(Channel 1)  : 0.7378 mW / -1.32 dBm
	Rcvr signal avg optical power(Channel 2)  : 0.7553 mW / -1.22 dBm
	Rcvr signal avg optical power(Channel 3)  : 0.6529 mW / -1.85 dBm
	Rcvr signal avg optical power(Channel 4)  : 0.6847 mW / -1.64 dBm
	Laser bias current high alarm   (Chan 1)  : Off
	Laser bias current low alarm    (Chan 1)  : Off
	Laser bias current high warning (Chan 1)  : Off
	Laser bias current low warning  (Chan 1)  : Off
	Laser bias current high alarm   (Chan 2)  : Off
	Laser bias current low alarm    (Chan 2)  : Off
	Laser bias current high warning (Chan 2)  : Off
	Laser bias current low warning  (Chan 2)  : Off
	Laser bias current high alarm   (Chan 3)  : Off
	Laser bias current low alarm    (Chan 3)  : Off
	Laser bias current high warning (Chan 3)  : Off
	Laser bias current low warning  (Chan 3)  : Off
	Laser bias current high alarm   (Chan 4)  : Off
	Laser bias current low alarm    (Chan 4)  : Off
	Laser bias current high warning (Chan 4)  : Off
	Laser bias current low warning  (Chan 4)  : Off
	Module temperature high alarm             : Off
	Module temperature low alarm              : Off
	Module temperature high warning           : Off
	Module temperature low warning            : Off
	Module voltage high alarm                 : Off
	Module voltage low alarm                  : Off
	Module voltage high warning               : Off
	Module voltage low warning                : Off
	Laser tx power high alarm   (Channel 1)   : Off
	Laser tx power low alarm    (Channel 1)   : Off
	Laser tx power high warning (Channel 1)   : Off
	Laser tx power low warning  (Channel 1)   : Off
	Laser tx power high alarm   (Channel 2)   : Off
	Laser tx power low alarm    (Channel 2)   : Off
	Laser tx power high warning (Channel 2)   : Off
	Laser tx power low warning  (Channel 2)   : Off
	Laser tx power high alarm   (Channel 3)   : Off
	Laser tx power low alarm    (Channel 3)   : Off
	Laser tx power high warning (Channel 3)   : Off
	Laser tx power low warning  (Channel 3)   : Off
	Laser tx power high alarm   (Channel 4)   : Off
	Laser tx power low alarm    (Channel 4)   : Off
	Laser tx power high warning (Channel 4)   : Off
	Laser tx power low warning  (Channel 4)   : Off
	Laser rx power high alarm   (Channel 1)   : Off
	Laser rx power low alarm    (Channel 1)   : Off
	Laser rx power high warning (Channel 1)   : Off
	Laser rx power low warning  (Channel 1)   : Off
	Laser rx power high alarm   (Channel 2)   : Off
	Laser rx power low alarm    (Channel 2)   : Off
	Laser rx power high warning (Channel 2)   : Off
	Laser rx power low warning  (Channel 2)   : Off
	Laser rx power high alarm   (Channel 3)   : Off
	Laser rx power low alarm    (Channel 3)   : Off
	Laser rx power high warning (Channel 3)   : Off
	Laser rx power low warning  (Channel 3)   : Off
	Laser rx power high alarm   (Channel 4)   : Off
	Laser rx power low alarm    (Channel 4)   : Off
	Laser rx power high warning (Channel 4)   : Off
	Laser rx power low warning  (Channel 4)   : Off
	Laser bias current high alarm threshold   : 0.000 mA
	Laser bias current low alarm threshold    : 0.000 mA
	Laser bias current high warning threshold : 0.000 mA
	Laser bias current low warning threshold  : 0.000 mA
	Laser output power high alarm threshold   : 0.0000 mW / -inf dBm
	Laser output power low alarm threshold    : 0.0000 mW / -inf dBm
	Laser output power high warning threshold : 0.0000 mW / -inf dBm
	Laser output power low warning threshold  : 0.0000 mW / -inf dBm
	Module temperature high alarm threshold   : 0.00 degrees C / 32.00 degrees F
	Module temperature low alarm threshold    : 0.00 degrees C / 32.00 degrees F
	Module temperature high warning threshold : 0.00 degrees C / 32.00 degrees F
	Module temperature low warning threshold  : 0.00 degrees C / 32.00 degrees F
	Module voltage high alarm threshold       : 0.0000 V
	Module voltage low alarm threshold        : 0.0000 V
	Module voltage high warning threshold     : 0.0000 V
	Module voltage low warning threshold      : 0.0000 V
	Laser rx power high alarm threshold       : 0.0000 mW / -inf dBm
	Laser rx power low alarm threshold        : 0.0000 mW / -inf dBm
	Laser rx power high warning threshold     : 0.0000 mW / -inf dBm
	Laser rx power low warning threshold      : 0.0000 mW / -inf dBm



[-- Attachment #5: ethtoolQSFP28thresholdsSpuriousOutput2of2.txt --]
[-- Type: text/plain, Size: 6866 bytes --]


Look at each line in the ethtool output below that includes the word "threshold".  This file shows the actual output from ethtool v4.18, when the ethtool output is piped to another command.  Notice that all of the displayed threshold values are spurious while other values report as expected.

tech1@D8:~$ sudo ethtool -m enp1s0 | cat
	Identifier                                : 0x11 (QSFP28)
	Extended identifier                       : 0xfc
	Extended identifier description           : 3.5W max. Power consumption
	Extended identifier description           : CDR present in TX, CDR present in RX
	Extended identifier description           : High Power Class (> 3.5 W) not enabled
	Connector                                 : 0x07 (LC)
	Transceiver codes                         : 0x80 0x00 0x00 0x00 0x00 0x00 0x00 0x00
	Transceiver type                          : 100G Ethernet: 100G CWDM4 MSA with FEC
	Encoding                                  : 0x03 (NRZ)
	BR, Nominal                               : 25500Mbps
	Rate identifier                           : 0x00
	Length (SMF,km)                           : 2km
	Length (OM3 50um)                         : 0m
	Length (OM2 50um)                         : 0m
	Length (OM1 62.5um)                       : 0m
	Length (Copper or Active cable)           : 0m
	Transmitter technology                    : 0x40 (1310 nm DFB)
	Laser wavelength                          : 1310.000nm
	Laser wavelength tolerance                : 47.500nm
	Vendor name                               : TRANSITION
	Vendor OUI                                : 00:c0:f2
	Vendor PN                                 : TNQSFP100GCWDM4
	Vendor rev                                : 1A
	Vendor SN                                 : TN02000302
	Date code                                 : 180919
	Revision Compliance                       : SFF-8636 Rev 2.5/2.6/2.7
	Module temperature                        : 39.53 degrees C / 103.15 degrees F
	Module voltage                            : 3.3249 V
	Alarm/warning flags implemented           : Yes
	Laser tx bias current (Channel 1)         : 34.432 mA
	Laser tx bias current (Channel 2)         : 34.432 mA
	Laser tx bias current (Channel 3)         : 33.408 mA
	Laser tx bias current (Channel 4)         : 33.920 mA
	Transmit avg optical power (Channel 1)    : 0.9043 mW / -0.44 dBm
	Transmit avg optical power (Channel 2)    : 0.7832 mW / -1.06 dBm
	Transmit avg optical power (Channel 3)    : 0.8057 mW / -0.94 dBm
	Transmit avg optical power (Channel 4)    : 0.7009 mW / -1.54 dBm
	Rcvr signal avg optical power(Channel 1)  : 0.7378 mW / -1.32 dBm
	Rcvr signal avg optical power(Channel 2)  : 0.7553 mW / -1.22 dBm
	Rcvr signal avg optical power(Channel 3)  : 0.6529 mW / -1.85 dBm
	Rcvr signal avg optical power(Channel 4)  : 0.6847 mW / -1.64 dBm
	Laser bias current high alarm   (Chan 1)  : Off
	Laser bias current low alarm    (Chan 1)  : Off
	Laser bias current high warning (Chan 1)  : Off
	Laser bias current low warning  (Chan 1)  : Off
	Laser bias current high alarm   (Chan 2)  : Off
	Laser bias current low alarm    (Chan 2)  : Off
	Laser bias current high warning (Chan 2)  : Off
	Laser bias current low warning  (Chan 2)  : Off
	Laser bias current high alarm   (Chan 3)  : Off
	Laser bias current low alarm    (Chan 3)  : Off
	Laser bias current high warning (Chan 3)  : Off
	Laser bias current low warning  (Chan 3)  : Off
	Laser bias current high alarm   (Chan 4)  : Off
	Laser bias current low alarm    (Chan 4)  : Off
	Laser bias current high warning (Chan 4)  : Off
	Laser bias current low warning  (Chan 4)  : Off
	Module temperature high alarm             : Off
	Module temperature low alarm              : Off
	Module temperature high warning           : Off
	Module temperature low warning            : Off
	Module voltage high alarm                 : Off
	Module voltage low alarm                  : Off
	Module voltage high warning               : Off
	Module voltage low warning                : Off
	Laser tx power high alarm   (Channel 1)   : Off
	Laser tx power low alarm    (Channel 1)   : Off
	Laser tx power high warning (Channel 1)   : Off
	Laser tx power low warning  (Channel 1)   : Off
	Laser tx power high alarm   (Channel 2)   : Off
	Laser tx power low alarm    (Channel 2)   : Off
	Laser tx power high warning (Channel 2)   : Off
	Laser tx power low warning  (Channel 2)   : Off
	Laser tx power high alarm   (Channel 3)   : Off
	Laser tx power low alarm    (Channel 3)   : Off
	Laser tx power high warning (Channel 3)   : Off
	Laser tx power low warning  (Channel 3)   : Off
	Laser tx power high alarm   (Channel 4)   : Off
	Laser tx power low alarm    (Channel 4)   : Off
	Laser tx power high warning (Channel 4)   : Off
	Laser tx power low warning  (Channel 4)   : Off
	Laser rx power high alarm   (Channel 1)   : Off
	Laser rx power low alarm    (Channel 1)   : Off
	Laser rx power high warning (Channel 1)   : Off
	Laser rx power low warning  (Channel 1)   : Off
	Laser rx power high alarm   (Channel 2)   : Off
	Laser rx power low alarm    (Channel 2)   : Off
	Laser rx power high warning (Channel 2)   : Off
	Laser rx power low warning  (Channel 2)   : Off
	Laser rx power high alarm   (Channel 3)   : Off
	Laser rx power low alarm    (Channel 3)   : Off
	Laser rx power high warning (Channel 3)   : Off
	Laser rx power low warning  (Channel 3)   : Off
	Laser rx power high alarm   (Channel 4)   : Off
	Laser rx power low alarm    (Channel 4)   : Off
	Laser rx power high warning (Channel 4)   : Off
	Laser rx power low warning  (Channel 4)   : Off
	Laser bias current high alarm threshold   : 16.448 mA
	Laser bias current low alarm threshold    : 16.448 mA
	Laser bias current high warning threshold : 16.448 mA
	Laser bias current low warning threshold  : 16.448 mA
	Laser output power high alarm threshold   : 0.8224 mW / -0.85 dBm
	Laser output power low alarm threshold    : 0.8250 mW / -0.84 dBm
	Laser output power high warning threshold : 0.8264 mW / -0.83 dBm
	Laser output power low warning threshold  : 2.6983 mW / 4.31 dBm
	Module temperature high alarm threshold   : 110.12 degrees C / 230.22 degrees F
	Module temperature low alarm threshold    : 84.34 degrees C / 183.82 degrees F
	Module temperature high warning threshold : 44.12 degrees C / 111.42 degrees F
	Module temperature low warning threshold  : 67.27 degrees C / 153.08 degrees F
	Module voltage high alarm threshold       : 2.9728 V
	Module voltage low alarm threshold        : 2.6990 V
	Module voltage high warning threshold     : 0.8274 V
	Module voltage low warning threshold      : 2.2538 V
	Laser rx power high alarm threshold       : 2.5458 mW / 4.06 dBm
	Laser rx power low alarm threshold        : 2.6992 mW / 4.31 dBm
	Laser rx power high warning threshold     : 2.9801 mW / 4.74 dBm
	Laser rx power low warning threshold      : 2.8526 mW / 4.55 dBm
tech1@D8:~$ 
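
To compare two captures without eyeballing every line, something like the
following sketch could be used. It assumes the "Field : value" layout shown
above; the function names and the shortened sample strings are illustrative
only. Repeated field names (e.g. "Extended identifier description") simply
overwrite each other, which does not matter for the threshold lines.

```python
def parse_ethtool(text):
    """Map each 'Field : value' line of `ethtool -m` output to its value."""
    fields = {}
    for line in text.splitlines():
        name, sep, value = line.partition(" : ")
        if sep:
            fields[name.strip()] = value.strip()
    return fields


def threshold_diff(first, second):
    """Return {field: (value_in_first, value_in_second)} for differing thresholds."""
    a, b = parse_ethtool(first), parse_ethtool(second)
    return {
        key: (a[key], b.get(key, "<missing>"))
        for key in a
        if "threshold" in key and a[key] != b.get(key)
    }


# Shortened excerpts from the two captures (not piped vs. piped to cat).
NOT_PIPED = """\
    Module voltage                            : 3.3241 V
    Laser bias current high alarm threshold   : 0.000 mA
    Module voltage high alarm threshold       : 0.0000 V
"""
PIPED = """\
    Module voltage                            : 3.3249 V
    Laser bias current high alarm threshold   : 16.448 mA
    Module voltage high alarm threshold       : 2.9728 V
"""

if __name__ == "__main__":
    for field, (a, b) in threshold_diff(NOT_PIPED, PIPED).items():
        print(f"{field}: {a} -> {b}")
```

Live readings such as "Module voltage" are deliberately ignored, since they
drift slightly between runs anyway.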


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: bug: 'ethtool -m' reports spurious alarm & warning threshold values for QSFP28 transceivers
  2018-09-26 19:29 bug: 'ethtool -m' reports spurious alarm & warning threshold values for QSFP28 transceivers Chris Preimesberger
@ 2018-09-26 19:44 ` Andrew Lunn
  2018-09-26 20:47   ` Chris Preimesberger
  2018-09-26 21:34 ` Neil Horman
  1 sibling, 1 reply; 16+ messages in thread
From: Andrew Lunn @ 2018-09-26 19:44 UTC (permalink / raw)
  To: Chris Preimesberger; +Cc: linville@tuxdriver.com, netdev@vger.kernel.org

On Wed, Sep 26, 2018 at 07:29:23PM +0000, Chris Preimesberger wrote:
> Hello,
> 
> I'm re-sending in plain text per the auto-reply from a spam filter.

Yep, no HTML obfuscation accepted here. ASCII only, please :-)

Please can you also wrap your lines at about 75 characters.

>  I have attached some text files this time, which explain the situation below, in case the below email's font & formatting is now too messed up for easy comprehension.

> Bug #1.  Ethtool's reporting of the installed transceiver's alarm and warning thresholds will differ, depending on whether or not ethtool is piped to another command.  Example commands are below, with their respective differing output values highlighted:

Could you dump the raw values? That will make it easier for us to
reproduce this issue, assuming it is ethtool, and not the kernel
driver.

Thanks
	Andrew

^ permalink raw reply	[flat|nested] 16+ messages in thread

* RE: bug: 'ethtool -m' reports spurious alarm & warning threshold values for QSFP28 transceivers
  2018-09-26 19:44 ` Andrew Lunn
@ 2018-09-26 20:47   ` Chris Preimesberger
  2018-09-26 21:46     ` Andrew Lunn
  0 siblings, 1 reply; 16+ messages in thread
From: Chris Preimesberger @ 2018-09-26 20:47 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: linville@tuxdriver.com, netdev@vger.kernel.org

[-- Attachment #1: Type: text/plain, Size: 3140 bytes --]

Hello Andrew,

Thank you for the quick response!!
Apologies in advance for my use of outlook and top-posting, etc...

I've run the raw option and the hex option, and pasted the results below.
Since the raw option printed strange characters on the CLI, I re-ran it,
sending the output to a file (raw.txt), and attached that file as well.

Pasted from Ubuntu CLI:

tech1@D7:~$ 
tech1@D7:~$ 
tech1@D7:~$ 
tech1@D7:~$ 
tech1@D7:~$ sudo ethtool -m enp1s0 raw on
\x11UU$��pA`?�@�G\x10#
                 �\x12v\x01\x11��\x03�\x02@TRANSITION      ��TNQSFP100GCWDM4 1AfX%\x1cF?\x06?�TN02000301      180919  
    h�\x02I��_��'\x16��Ri=\x02`��Zntech1@D7:~$ 
tech1@D7:~$ 
tech1@D7:~$ 
tech1@D7:~$ 
tech1@D7:~$ sudo ethtool -m enp1s0 hex on
Offset		Values
------		------
0x0000:		11 00 00 0f 00 00 00 00 00 55 55 00 00 00 00 00 
0x0010:		00 00 00 00 00 00 24 e2 00 00 81 68 00 00 00 00 
0x0020:		00 00 00 00 00 00 00 00 00 00 41 60 3f e0 40 e0 
0x0030:		47 00 1f 10 0e 1e 0b f7 12 76 00 00 00 00 00 00 
0x0040:		00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
0x0050:		00 00 00 00 00 00 00 00 00 00 00 00 00 01 00 00 
0x0060:		00 00 00 00 00 00 00 00 00 00 1f 00 00 00 00 00 
0x0070:		00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
0x0080:		11 fc 07 80 00 00 00 00 00 00 00 03 ff 00 02 00 
0x0090:		00 00 00 40 54 52 41 4e 53 49 54 49 4f 4e 20 20 
0x00a0:		20 20 20 20 00 00 c0 f2 54 4e 51 53 46 50 31 30 
0x00b0:		30 47 43 57 44 4d 34 20 31 41 66 58 25 1c 46 3f 
0x00c0:		06 00 3f d6 54 4e 30 32 30 30 30 33 30 31 20 20 
0x00d0:		20 20 20 20 31 38 30 39 31 39 20 20 0c 00 68 f3 
0x00e0:		00 00 02 49 80 a0 5f 1f de c9 27 16 f8 ae 52 69 
0x00f0:		3d 02 60 00 00 00 00 00 00 00 00 00 83 f4 5a 6e 
tech1@D7:~$ 
tech1@D7:~$ 
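
For anyone wanting to sanity-check the hex dump by hand, here is a small
sketch that decodes a few of the live monitor fields from it. The byte
offsets and scale factors (temperature at bytes 22-23 in 1/256 degrees C,
supply voltage at 26-27 in 100 uV, Tx bias from byte 42 in 2 uA, Tx power
from byte 50 in 0.1 uW) are my reading of the SFF-8636 lower page and should
be checked against the spec; the helper names are made up. Also, as far as
I recall, SFF-8636 keeps the alarm/warning thresholds in upper page 03h,
which a 256-byte dump like this one would not include.

```python
# Three rows copied from the "hex on" output above (offsets 0x10-0x3f).
HEXDUMP = """\
0x0010:		00 00 00 00 00 00 24 e2 00 00 81 68 00 00 00 00
0x0020:		00 00 00 00 00 00 00 00 00 00 41 60 3f e0 40 e0
0x0030:		47 00 1f 10 0e 1e 0b f7 12 76 00 00 00 00 00 00
"""


def parse_hexdump(text):
    """Return {offset: byte_value} for every 'OFFSET: xx xx ...' row."""
    mem = {}
    for line in text.splitlines():
        addr, sep, rest = line.partition(":")
        if not sep or not addr.strip().startswith("0x"):
            continue  # skip the header and separator rows
        base = int(addr.strip(), 16)
        for i, token in enumerate(rest.split()):
            mem[base + i] = int(token, 16)
    return mem


def be16(mem, offset, signed=False):
    """Read a big-endian 16-bit value starting at `offset`."""
    raw = bytes([mem[offset], mem[offset + 1]])
    return int.from_bytes(raw, "big", signed=signed)


mem = parse_hexdump(HEXDUMP)
temp_c = be16(mem, 22, signed=True) / 256                        # degrees C
vcc_v = be16(mem, 26) * 100e-6                                   # volts
tx_bias_ma = [be16(mem, 42 + 2 * ch) * 2e-3 for ch in range(4)]  # mA
tx_power_mw = [be16(mem, 50 + 2 * ch) * 1e-4 for ch in range(4)]  # mW

if __name__ == "__main__":
    print(f"temperature: {temp_c:.2f} C, voltage: {vcc_v:.4f} V")
    print("tx bias (mA):", [f"{v:.3f}" for v in tx_bias_ma])
    print("tx power (mW):", [f"{v:.4f}" for v in tx_power_mw])
```

The decoded values land in the same range as the formatted output (bias
around 33 mA, voltage around 3.3 V), which suggests the live fields in the
dump are sane.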






Chris Preimesberger | Test & Validation Engineer
Transition Networks, Inc.

chrisp@transition.com
direct: +1.952.996.1509 | fax: +1.952.941.2322 | www.transition.com




[-- Attachment #2: raw.txt --]
[-- Type: text/plain, Size: 256 bytes --]

\x11\0\0\x0f\0\0\0\0\0UU\0\0\0\0\0\0\0\0\0\0\0$ò\0\0h\0\0\0\0\0\0\0\0\0\0\0\0\0\0A`?à@àG\0\x1f\x10\x0e\x1e\v÷\x12{\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x01\0\0\0\0\0\0\0\0\0\0\0\0\x1f\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x11ü\a€\0\0\0\0\0\0\0\x03ÿ\0\x02\0\0\0\0@TRANSITION      \0\0ÀòTNQSFP100GCWDM4 1AfX%\x1cF?\x06\0?ÖTN02000301      180919  \f\0hó\0\0\x02I€ _\x1fÞÉ'\x16ø®Ri=\x02`\0\0\0\0\0\0\0\0\0ƒôZn

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: bug: 'ethtool -m' reports spurious alarm & warning threshold values for QSFP28 transceivers
  2018-09-26 19:29 bug: 'ethtool -m' reports spurious alarm & warning threshold values for QSFP28 transceivers Chris Preimesberger
  2018-09-26 19:44 ` Andrew Lunn
@ 2018-09-26 21:34 ` Neil Horman
  2018-09-26 21:58   ` Andrew Lunn
  2018-09-27 13:25   ` Eran Ben Elisha
  1 sibling, 2 replies; 16+ messages in thread
From: Neil Horman @ 2018-09-26 21:34 UTC (permalink / raw)
  To: Chris Preimesberger; +Cc: linville@tuxdriver.com, netdev@vger.kernel.org

On Wed, Sep 26, 2018 at 07:29:23PM +0000, Chris Preimesberger wrote:
> Hello,
> 
> I'm re-sending in plain text per the auto-reply from a spam filter.  I have attached some text files this time, which explain the situation below, in case the below email's font & formatting is now too messed up for easy comprehension.
> 
> Thank you and best regards.
> 
> 
> Chris Preimesberger | Test & Validation Engineer
> Transition Networks, Inc.
> 
> chrisp@transition.com
> direct: +1.952.996.1509 | fax: +1.952.941.2322 | www.transition.com
> ________________________________________
> 
> 
> 
This is just a drive by guess, but I think this is a driver issue.  


Issue 1 seems like a red herring: cat doesn't modify output, nor does ethtool
know if its output is going to a console or a pipe; it's all the same.  And
issue 2 (that the threshold values are spuriously changing and wrong) suggests
that they are spuriously changing and wrong regardless of what cat does.

That said, I think issue two is a problem with the mlx4 driver.  Specifically,
the driver is copying garbage data.

The three ethtool functions at work here are:
mlx4_en_get_module_info
mlx4_en_get_module_eeprom
mlx4_get_module_info

When you run ethtool -m on this driver, the kernel calls mlx4_en_get_module_info
to determine the length of the eeprom, and that value will be either 256 or 512
bytes.  Let's assume that the value is 256 for the sake of argument.

Next it calls mlx4_en_get_module_eeprom, passing in that size (256) to actually
read the eeprom data, which in turn calls mlx4_get_module_info to fetch the data
from hardware, again passing in 256 as the size for the first call (there's a
loop, but it will only get executed once in this scenario).

mlx4_get_module_info then issues the appropriate mailbox commands to dump the
eeprom.  Here it starts to go sideways.  The mailbox buffer allocated for the
return data is of type mlx4_mad_ifc, which has some front matter information and
a data buffer that is 192 bytes long!

A little further down in the function, size gets restricted if the buffer
crosses a page boundary, but given that the size is 256 on the first call here,
and offset is zero on the first call, we're not crossing anything, so size
remains unchanged.

The output mailbox buffer outmad->data (a 192 byte array) then gets cast to a
struct mlx4_cable_info, which has its own internal data buffer that is only
48 bytes long.

The memcpy in this function then copies cable_info->data to the buffer that gets
returned to ethtool, but it copies size bytes (256), even though the source data
buffer is only 48 bytes long.  That 48 byte array is embedded in the larger 192
byte structure, so there won't be a panic on the overrun, but there's no telling
what garbage is in the buffer beyond those first 48 bytes.  Even if the
remaining 144 bytes have valid eeprom data, it's less than the required 256
bytes.  The additional copy may cause a panic, but if the buffer commonly bumps
up against other allocated memory, that will go unnoticed.

After the memcpy, mlx4_get_module_info just returns the size of the passed-in
buffer (256), and so the calling function thinks its work is done, and lets the
kernel send back the buffer with garbage data to ethtool.
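
To illustrate the pattern I'd expect instead, here is a rough Python sketch
(not the mlx4 code; every name and the 48-byte limit as a constant are
stand-ins for illustration).  The low-level read clamps each request to what
the source buffer can actually hold and reports how much it returned, and
the caller loops until it has the full length.

```python
MAX_CHUNK = 48                # hypothetical per-command payload limit
EEPROM = bytes(range(256))    # stand-in for the module EEPROM contents


def read_chunk(offset, size):
    """Hypothetical low-level read: never returns more than MAX_CHUNK bytes."""
    size = min(size, MAX_CHUNK, len(EEPROM) - offset)
    return EEPROM[offset:offset + size]


def read_eeprom(offset, length):
    """Caller loop: keep issuing reads until `length` bytes are collected."""
    out = bytearray()
    while len(out) < length:
        chunk = read_chunk(offset + len(out), length - len(out))
        if not chunk:
            raise IOError("short read from module EEPROM")
        out += chunk
    return bytes(out)


if __name__ == "__main__":
    data = read_eeprom(0, 256)
    print(len(data), data == EEPROM)
```

The key point is that the low-level function returns the number of bytes it
really copied, rather than echoing back the size it was asked for.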

I think the mlx4 guys have some work to do here.

My $0.02
Neil

> 
> 
> From: Chris Preimesberger 
> Sent: Wednesday, September 26, 2018 2:14 PM
> To: 'linville@tuxdriver.com'; 'netdev@vger.kernel.org'
> Subject: bug: 'ethtool -m' reports spurious alarm & warning threshold values for QSFP28 transceivers
> 
> Hello John, All,
> 
> 
> I think I may have found a bug or two in ethtool, with respect to its reporting of a QSFP28 transceiver's diagnostic information.  Ethtool seems to correctly report all diagnostic information about QSFP28 transceivers, except for the transceiver's warning and alarm thresholds.  I'm not sure whether the spurious warning and alarm values that get reported are the fault of ethtool or my NIC/driver, and I have no other models of 100GbE NICs to test with.  I've contacted Mellanox support about this, and they point the finger at ethtool.  Can these issues be investigated by ethtool developers?  Here is some background information about the equipment and software used when I observe these issues:
> 
> Equipment used:
> NIC: Mellanox ConnectX-4 100GbE, part number MCX415A-CCAT
> Transceiver: Any 40Gb or 100Gb QSFP28 transceiver installed in the NIC (Intel, Mellanox, Transition Networks, etc..)
> 
> Software used:
> Ubuntu 18.04 with the distro's packaged NIC driver and ethtool v4.15
> also tested were ethtool v4.18 compiled from source and the current Mellanox OFED driver.
> 
> All test scenarios produced the same bugs.
> 
> 
> Bug #1.  Ethtool's reporting of the installed transceiver's alarm and warning thresholds will differ, depending on whether or not ethtool is piped to another command.  Example commands are below, with their respective differing output values highlighted:
> 
> 
> tech1@D8:~$ sudo ethtool -m enp1s0
>         Identifier                                : 0x11 (QSFP28)
>         Extended identifier                       : 0xfc
>         Extended identifier description           : 3.5W max. Power consumption
>         Extended identifier description           : CDR present in TX, CDR present in RX
>         Extended identifier description           : High Power Class (> 3.5 W) not enabled
>         Connector                                 : 0x07 (LC)
>         Transceiver codes                         : 0x80 0x00 0x00 0x00 0x00 0x00 0x00 0x00
>         Transceiver type                          : 100G Ethernet: 100G CWDM4 MSA with FEC
>         Encoding                                  : 0x03 (NRZ)
>         BR, Nominal                               : 25500Mbps
>         Rate identifier                           : 0x00
>         Length (SMF,km)                           : 2km
>         Length (OM3 50um)                         : 0m
>         Length (OM2 50um)                         : 0m
>         Length (OM1 62.5um)                       : 0m
>         Length (Copper or Active cable)           : 0m
>         Transmitter technology                    : 0x40 (1310 nm DFB)
>         Laser wavelength                          : 1310.000nm
>         Laser wavelength tolerance                : 47.500nm
>         Vendor name                               : TRANSITION
>         Vendor OUI                                : 00:c0:f2
>         Vendor PN                                 : TNQSFP100GCWDM4
>         Vendor rev                                : 1A
>         Vendor SN                                 : TN02000302
>         Date code                                 : 180919
>         Revision Compliance                       : SFF-8636 Rev 2.5/2.6/2.7
>         Module temperature                        : 39.53 degrees C / 103.15 degrees F
>         Module voltage                            : 3.3241 V
>         Alarm/warning flags implemented           : Yes
>         Laser tx bias current (Channel 1)         : 34.432 mA
>         Laser tx bias current (Channel 2)         : 34.432 mA
>         Laser tx bias current (Channel 3)         : 33.408 mA
>         Laser tx bias current (Channel 4)         : 33.920 mA
>         Transmit avg optical power (Channel 1)    : 0.9048 mW / -0.43 dBm
>         Transmit avg optical power (Channel 2)    : 0.7832 mW / -1.06 dBm
>         Transmit avg optical power (Channel 3)    : 0.8057 mW / -0.94 dBm
>         Transmit avg optical power (Channel 4)    : 0.7014 mW / -1.54 dBm
>         Rcvr signal avg optical power(Channel 1)  : 0.7378 mW / -1.32 dBm
>         Rcvr signal avg optical power(Channel 2)  : 0.7553 mW / -1.22 dBm
>         Rcvr signal avg optical power(Channel 3)  : 0.6529 mW / -1.85 dBm
>         Rcvr signal avg optical power(Channel 4)  : 0.6847 mW / -1.64 dBm
>         Laser bias current high alarm   (Chan 1)  : Off
>         Laser bias current low alarm    (Chan 1)  : Off
>         Laser bias current high warning (Chan 1)  : Off
>         Laser bias current low warning  (Chan 1)  : Off
>         Laser bias current high alarm   (Chan 2)  : Off
>         Laser bias current low alarm    (Chan 2)  : Off
>         Laser bias current high warning (Chan 2)  : Off
>         Laser bias current low warning  (Chan 2)  : Off
>         Laser bias current high alarm   (Chan 3)  : Off
>         Laser bias current low alarm    (Chan 3)  : Off
>         Laser bias current high warning (Chan 3)  : Off
>         Laser bias current low warning  (Chan 3)  : Off
>         Laser bias current high alarm   (Chan 4)  : Off
>         Laser bias current low alarm    (Chan 4)  : Off
>         Laser bias current high warning (Chan 4)  : Off
>         Laser bias current low warning  (Chan 4)  : Off
>         Module temperature high alarm             : Off
>         Module temperature low alarm              : Off
>         Module temperature high warning           : Off
>         Module temperature low warning            : Off
>         Module voltage high alarm                 : Off
>         Module voltage low alarm                  : Off
>         Module voltage high warning               : Off
>         Module voltage low warning                : Off
>         Laser tx power high alarm   (Channel 1)   : Off
>         Laser tx power low alarm    (Channel 1)   : Off
>         Laser tx power high warning (Channel 1)   : Off
>         Laser tx power low warning  (Channel 1)   : Off
>         Laser tx power high alarm   (Channel 2)   : Off
>         Laser tx power low alarm    (Channel 2)   : Off
>         Laser tx power high warning (Channel 2)   : Off
>         Laser tx power low warning  (Channel 2)   : Off
>         Laser tx power high alarm   (Channel 3)   : Off
>         Laser tx power low alarm    (Channel 3)   : Off
>         Laser tx power high warning (Channel 3)   : Off
>         Laser tx power low warning  (Channel 3)   : Off
>         Laser tx power high alarm   (Channel 4)   : Off
>         Laser tx power low alarm    (Channel 4)   : Off
>         Laser tx power high warning (Channel 4)   : Off
>         Laser tx power low warning  (Channel 4)   : Off
>         Laser rx power high alarm   (Channel 1)   : Off
>         Laser rx power low alarm    (Channel 1)   : Off
>         Laser rx power high warning (Channel 1)   : Off
>         Laser rx power low warning  (Channel 1)   : Off
>         Laser rx power high alarm   (Channel 2)   : Off
>         Laser rx power low alarm    (Channel 2)   : Off
>         Laser rx power high warning (Channel 2)   : Off
>         Laser rx power low warning  (Channel 2)   : Off
>         Laser rx power high alarm   (Channel 3)   : Off
>         Laser rx power low alarm    (Channel 3)   : Off
>         Laser rx power high warning (Channel 3)   : Off
>         Laser rx power low warning  (Channel 3)   : Off
>         Laser rx power high alarm   (Channel 4)   : Off
>         Laser rx power low alarm    (Channel 4)   : Off
>         Laser rx power high warning (Channel 4)   : Off
>         Laser rx power low warning  (Channel 4)   : Off
>         Laser bias current high alarm threshold   : 0.000 mA
>         Laser bias current low alarm threshold    : 0.000 mA
>         Laser bias current high warning threshold : 0.000 mA
>         Laser bias current low warning threshold  : 0.000 mA
>         Laser output power high alarm threshold   : 0.0000 mW / -inf dBm
>         Laser output power low alarm threshold    : 0.0000 mW / -inf dBm
>         Laser output power high warning threshold : 0.0000 mW / -inf dBm
>         Laser output power low warning threshold  : 0.0000 mW / -inf dBm
>         Module temperature high alarm threshold   : 0.00 degrees C / 32.00 degrees F
>         Module temperature low alarm threshold    : 0.00 degrees C / 32.00 degrees F
>         Module temperature high warning threshold : 0.00 degrees C / 32.00 degrees F
>         Module temperature low warning threshold  : 0.00 degrees C / 32.00 degrees F
>         Module voltage high alarm threshold       : 0.0000 V
>         Module voltage low alarm threshold        : 0.0000 V
>         Module voltage high warning threshold     : 0.0000 V
>         Module voltage low warning threshold      : 0.0000 V
>         Laser rx power high alarm threshold       : 0.0000 mW / -inf dBm
>         Laser rx power low alarm threshold        : 0.0000 mW / -inf dBm
>         Laser rx power high warning threshold     : 0.0000 mW / -inf dBm
>         Laser rx power low warning threshold      : 0.0000 mW / -inf dBm
> 
> 
> tech1@D8:~$ sudo ethtool -m enp1s0 | cat
>         Identifier                                : 0x11 (QSFP28)
>         Extended identifier                       : 0xfc
>         Extended identifier description           : 3.5W max. Power consumption
>         Extended identifier description           : CDR present in TX, CDR present in RX
>         Extended identifier description           : High Power Class (> 3.5 W) not enabled
>         Connector                                 : 0x07 (LC)
>         Transceiver codes                         : 0x80 0x00 0x00 0x00 0x00 0x00 0x00 0x00
>         Transceiver type                          : 100G Ethernet: 100G CWDM4 MSA with FEC
>         Encoding                                  : 0x03 (NRZ)
>         BR, Nominal                               : 25500Mbps
>         Rate identifier                           : 0x00
>         Length (SMF,km)                           : 2km
>         Length (OM3 50um)                         : 0m
>         Length (OM2 50um)                         : 0m
>         Length (OM1 62.5um)                       : 0m
>         Length (Copper or Active cable)           : 0m
>         Transmitter technology                    : 0x40 (1310 nm DFB)
>         Laser wavelength                          : 1310.000nm
>         Laser wavelength tolerance                : 47.500nm
>         Vendor name                               : TRANSITION
>         Vendor OUI                                : 00:c0:f2
>         Vendor PN                                 : TNQSFP100GCWDM4
>         Vendor rev                                : 1A
>         Vendor SN                                 : TN02000302
>         Date code                                 : 180919
>         Revision Compliance                       : SFF-8636 Rev 2.5/2.6/2.7
>         Module temperature                        : 39.53 degrees C / 103.15 degrees F
>         Module voltage                            : 3.3249 V
>         Alarm/warning flags implemented           : Yes
>         Laser tx bias current (Channel 1)         : 34.432 mA
>         Laser tx bias current (Channel 2)         : 34.432 mA
>         Laser tx bias current (Channel 3)         : 33.408 mA
>         Laser tx bias current (Channel 4)         : 33.920 mA
>         Transmit avg optical power (Channel 1)    : 0.9043 mW / -0.44 dBm
>         Transmit avg optical power (Channel 2)    : 0.7832 mW / -1.06 dBm
>         Transmit avg optical power (Channel 3)    : 0.8057 mW / -0.94 dBm
>         Transmit avg optical power (Channel 4)    : 0.7009 mW / -1.54 dBm
>         Rcvr signal avg optical power(Channel 1)  : 0.7378 mW / -1.32 dBm
>         Rcvr signal avg optical power(Channel 2)  : 0.7553 mW / -1.22 dBm
>         Rcvr signal avg optical power(Channel 3)  : 0.6529 mW / -1.85 dBm
>         Rcvr signal avg optical power(Channel 4)  : 0.6847 mW / -1.64 dBm
>         Laser bias current high alarm   (Chan 1)  : Off
>         Laser bias current low alarm    (Chan 1)  : Off
>         Laser bias current high warning (Chan 1)  : Off
>         Laser bias current low warning  (Chan 1)  : Off
>         Laser bias current high alarm   (Chan 2)  : Off
>         Laser bias current low alarm    (Chan 2)  : Off
>         Laser bias current high warning (Chan 2)  : Off
>         Laser bias current low warning  (Chan 2)  : Off
>         Laser bias current high alarm   (Chan 3)  : Off
>         Laser bias current low alarm    (Chan 3)  : Off
>         Laser bias current high warning (Chan 3)  : Off
>         Laser bias current low warning  (Chan 3)  : Off
>         Laser bias current high alarm   (Chan 4)  : Off
>         Laser bias current low alarm    (Chan 4)  : Off
>         Laser bias current high warning (Chan 4)  : Off
>         Laser bias current low warning  (Chan 4)  : Off
>         Module temperature high alarm             : Off
>         Module temperature low alarm              : Off
>         Module temperature high warning           : Off
>         Module temperature low warning            : Off
>         Module voltage high alarm                 : Off
>         Module voltage low alarm                  : Off
>         Module voltage high warning               : Off
>         Module voltage low warning                : Off
>         Laser tx power high alarm   (Channel 1)   : Off
>         Laser tx power low alarm    (Channel 1)   : Off
>         Laser tx power high warning (Channel 1)   : Off
>         Laser tx power low warning  (Channel 1)   : Off
>         Laser tx power high alarm   (Channel 2)   : Off
>         Laser tx power low alarm    (Channel 2)   : Off
>         Laser tx power high warning (Channel 2)   : Off
>         Laser tx power low warning  (Channel 2)   : Off
>         Laser tx power high alarm   (Channel 3)   : Off
>         Laser tx power low alarm    (Channel 3)   : Off
>         Laser tx power high warning (Channel 3)   : Off
>         Laser tx power low warning  (Channel 3)   : Off
>         Laser tx power high alarm   (Channel 4)   : Off
>         Laser tx power low alarm    (Channel 4)   : Off
>         Laser tx power high warning (Channel 4)   : Off
>         Laser tx power low warning  (Channel 4)   : Off
>         Laser rx power high alarm   (Channel 1)   : Off
>         Laser rx power low alarm    (Channel 1)   : Off
>         Laser rx power high warning (Channel 1)   : Off
>         Laser rx power low warning  (Channel 1)   : Off
>         Laser rx power high alarm   (Channel 2)   : Off
>         Laser rx power low alarm    (Channel 2)   : Off
>         Laser rx power high warning (Channel 2)   : Off
>         Laser rx power low warning  (Channel 2)   : Off
>         Laser rx power high alarm   (Channel 3)   : Off
>         Laser rx power low alarm    (Channel 3)   : Off
>         Laser rx power high warning (Channel 3)   : Off
>         Laser rx power low warning  (Channel 3)   : Off
>         Laser rx power high alarm   (Channel 4)   : Off
>         Laser rx power low alarm    (Channel 4)   : Off
>         Laser rx power high warning (Channel 4)   : Off
>         Laser rx power low warning  (Channel 4)   : Off
>         Laser bias current high alarm threshold   : 16.448 mA
>         Laser bias current low alarm threshold    : 16.448 mA
>         Laser bias current high warning threshold : 16.448 mA
>         Laser bias current low warning threshold  : 16.448 mA
>         Laser output power high alarm threshold   : 0.8224 mW / -0.85 dBm
>         Laser output power low alarm threshold    : 0.8250 mW / -0.84 dBm
>         Laser output power high warning threshold : 0.8264 mW / -0.83 dBm
>         Laser output power low warning threshold  : 2.6983 mW / 4.31 dBm
>         Module temperature high alarm threshold   : 110.12 degrees C / 230.22 degrees F
>         Module temperature low alarm threshold    : 84.34 degrees C / 183.82 degrees F
>         Module temperature high warning threshold : 44.12 degrees C / 111.42 degrees F
>         Module temperature low warning threshold  : 67.27 degrees C / 153.08 degrees F
>         Module voltage high alarm threshold       : 2.9728 V
>         Module voltage low alarm threshold        : 2.6990 V
>         Module voltage high warning threshold     : 0.8274 V
>         Module voltage low warning threshold      : 2.2538 V
>         Laser rx power high alarm threshold       : 2.5458 mW / 4.06 dBm
>         Laser rx power low alarm threshold        : 2.6992 mW / 4.31 dBm
>         Laser rx power high warning threshold     : 2.9801 mW / 4.74 dBm
>         Laser rx power low warning threshold      : 2.8526 mW / 4.55 dBm
> 
> 
> Bug # 2. All of the alarm and warning threshold values reported in the above commands are spurious.
> At first glance, one would assume that the threshold values reported by the piped ethtool command are correct, but they're not.  I know the programmed values for the above transceiver, so that makes it easy for me to spot the spurious values, but even without knowing the programmed values of a given transceiver, one can use logic to detect when the values displayed by ethtool don't make sense.
> For example, let's scrutinize the voltage warning and alarm thresholds reported by ethtool on this transceiver.  We will compare each voltage threshold against the other voltage thresholds and look for contradictions, to determine whether the reported values are plausible.
>                                 Known Actual    ethtool Reported
>                                 Values          Values
> High Voltage Alarm              3.70 V          2.9728 V
> High Voltage Warning            3.59 V          0.8274 V
> (Operating spec = 3.30 V)
> Low Voltage Warning             3.00 V          2.2538 V
> Low Voltage Alarm               2.90 V          2.6990 V
> 
> Contradictions for the ethtool reported voltage warning and alarm thresholds:
> 1. The high voltage alarm threshold should be above the operating voltage, but ethtool reported 2.9728 V, which is below the 3.30 V operating spec.
> 2. The high voltage warning threshold should be above the low voltage warning and alarm thresholds, but ethtool reported 0.8274 V, which is below both (2.2538 V and 2.6990 V).
> 3. The low voltage warning threshold should be above the low voltage alarm threshold, but ethtool reported 2.2538 V for the warning and 2.6990 V for the alarm.
> 4. The low voltage alarm threshold should be the lowest of the four voltage thresholds, but the reported 2.6990 V is higher than both the high warning (0.8274 V) and the low warning (2.2538 V).
> 5. The current voltage was reported as 3.3249 V, which exceeds the reported high voltage warning and alarm thresholds and should therefore trigger both, but no warnings or alarms are indicated.
>  
> Each of the 4 voltage thresholds reported by ethtool has contradictions, so we know something is not right.  The same kind of logic can be applied to the thresholds for temperature, laser TX power, etc., to find that those values are also spurious.
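
The consistency checks listed above can be sketched in a few lines of Python. This is only an illustration of the reasoning, not anything ethtool does; the function name `check_thresholds` and its parameters are made up for this example, and the numbers are the voltage values quoted in the table above.

```python
def check_thresholds(name, low_alarm, low_warn, high_warn, high_alarm,
                     current=None, flags_raised=False):
    """Return a list of contradictions; an empty list means the set is consistent.

    Expected ordering: low_alarm < low_warn < high_warn < high_alarm.
    If a current reading is given and lies outside the warning window,
    some warning/alarm flag should have been raised.
    """
    problems = []
    if not low_alarm < low_warn:
        problems.append(f"{name}: low alarm {low_alarm} is not below low warning {low_warn}")
    if not low_warn < high_warn:
        problems.append(f"{name}: low warning {low_warn} is not below high warning {high_warn}")
    if not high_warn < high_alarm:
        problems.append(f"{name}: high warning {high_warn} is not below high alarm {high_alarm}")
    if current is not None and not flags_raised:
        if current > high_warn or current < low_warn:
            problems.append(f"{name}: reading {current} is outside the warning window "
                            f"but no flag is raised")
    return problems


# Voltage thresholds as reported by the piped ethtool command (all flags "Off").
reported = check_thresholds("voltage", low_alarm=2.6990, low_warn=2.2538,
                            high_warn=0.8274, high_alarm=2.9728, current=3.3249)

# Voltage thresholds known to be programmed into the transceiver.
known = check_thresholds("voltage", low_alarm=2.90, low_warn=3.00,
                         high_warn=3.59, high_alarm=3.70, current=3.3249)

for line in reported:
    print(line)
print("known values consistent:", not known)
```

The same helper could be fed the temperature, bias current, and optical power thresholds, as long as all four values and the reading use the same unit.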
> 
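
Another way to sanity-check the reported numbers is to map them back to the raw 16-bit words they would have been decoded from. The sketch below assumes the SFF-8636 scaling of 0.1 uW per LSB for optical power and big-endian (MSB first) word order; both assumptions come from the specification rather than from this email, and the helper name `power_mw_to_raw` is made up for the example. The input values are the four rx power thresholds from the piped ethtool output above.

```python
def power_mw_to_raw(mw):
    """Convert an optical power in mW back to a raw word (assumed 0.1 uW per LSB)."""
    return round(mw * 10000)


# Rx power thresholds as printed by the piped ethtool command, in mW:
# high alarm, low alarm, high warning, low warning.
reported_rx_mw = [2.5458, 2.6992, 2.9801, 2.8526]

raw_words = [power_mw_to_raw(v) for v in reported_rx_mw]

# Assumed byte order: most significant byte first.
raw_bytes = b"".join(w.to_bytes(2, "big") for w in raw_words)

print([hex(w) for w in raw_words])
print(raw_bytes)
```

Under these assumptions, all eight bytes fall in the printable ASCII range, which would be unusual for genuine optical power thresholds. That is only a hint that the bytes being decoded may not be threshold data at all, not a diagnosis.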
> 
> Installing the above transceiver in a Cisco switch reveals that the Cisco correctly retrieves the true warning and alarm threshold values from the transceiver's EEPROM, so we trust that the transceiver has been correctly programmed.  Cisco CLI output for that transceiver shown here:
> 
> switch# show interface ethernet 1/3 transceiver details 
> Ethernet1/3
>     transceiver is present
>     type is QSFP-100G-CWDM4-MSA-FEC
>     name is TRANSITION
>     part number is TNQSFP100GCWDM4
>     revision is 1A
>     serial number is TN02000302
>     nominal bitrate is 25500 MBit/sec per channel
>     Link length supported for 9/125um fiber is 2 km
>     cisco id is 17
>     cisco extended id number is 252
> 
> Lane Number:1 Network Lane
>            SFP Detail Diagnostics Information (internal calibration)
>   ----------------------------------------------------------------------------
>                 Current              Alarms                  Warnings
>                 Measurement     High        Low         High          Low
>   ----------------------------------------------------------------------------
>   Temperature   38.08 C        80.00 C    -10.00 C     75.00 C       -5.00 C
>   Voltage        3.34 V         3.70 V      2.90 V      3.59 V        3.00 V
>   Current       34.24 mA       75.00 mA    10.00 mA    70.00 mA      15.00 mA
>   Tx Power      -0.44 dBm       4.49 dBm   -8.50 dBm    3.49 dBm     -7.52 dBm
>   Rx Power          N/A         4.49 dBm  -14.55 dBm    3.49 dBm    -12.51 dBm
>   Transmit Fault Count = 0
>   ----------------------------------------------------------------------------
>   Note: ++  high-alarm; +  high-warning; --  low-alarm; -  low-warning
> 
> Lane Number:2 Network Lane
>            SFP Detail Diagnostics Information (internal calibration)
>   ----------------------------------------------------------------------------
>                 Current              Alarms                  Warnings
>                 Measurement     High        Low         High          Low
>   ----------------------------------------------------------------------------
>   Temperature   38.08 C        80.00 C    -10.00 C     75.00 C       -5.00 C
>   Voltage        3.34 V         3.70 V      2.90 V      3.59 V        3.00 V
>   Current       34.24 mA       75.00 mA    10.00 mA    70.00 mA      15.00 mA
>   Tx Power      -1.20 dBm       4.49 dBm   -8.50 dBm    3.49 dBm     -7.52 dBm
>   Rx Power          N/A         4.49 dBm  -14.55 dBm    3.49 dBm    -12.51 dBm
>   Transmit Fault Count = 0
>   ----------------------------------------------------------------------------
>   Note: ++  high-alarm; +  high-warning; --  low-alarm; -  low-warning
> 
> Lane Number:3 Network Lane
>            SFP Detail Diagnostics Information (internal calibration)
>   ----------------------------------------------------------------------------
>                 Current              Alarms                  Warnings
>                 Measurement     High        Low         High          Low
>   ----------------------------------------------------------------------------
>   Temperature   38.08 C        80.00 C    -10.00 C     75.00 C       -5.00 C
>   Voltage        3.34 V         3.70 V      2.90 V      3.59 V        3.00 V
>   Current       33.21 mA       75.00 mA    10.00 mA    70.00 mA      15.00 mA
>   Tx Power      -0.96 dBm       4.49 dBm   -8.50 dBm    3.49 dBm     -7.52 dBm
>   Rx Power          N/A         4.49 dBm  -14.55 dBm    3.49 dBm    -12.51 dBm
>   Transmit Fault Count = 0
>   ----------------------------------------------------------------------------
>   Note: ++  high-alarm; +  high-warning; --  low-alarm; -  low-warning
> 
> Lane Number:4 Network Lane
>            SFP Detail Diagnostics Information (internal calibration)
>   ----------------------------------------------------------------------------
>                 Current              Alarms                  Warnings
>                 Measurement     High        Low         High          Low
>   ----------------------------------------------------------------------------
>   Temperature   38.08 C        80.00 C    -10.00 C     75.00 C       -5.00 C
>   Voltage        3.34 V         3.70 V      2.90 V      3.59 V        3.00 V
>   Current       33.72 mA       75.00 mA    10.00 mA    70.00 mA      15.00 mA
>   Tx Power      -1.59 dBm       4.49 dBm   -8.50 dBm    3.49 dBm     -7.52 dBm
>   Rx Power          N/A         4.49 dBm  -14.55 dBm    3.49 dBm    -12.51 dBm
>   Transmit Fault Count = 0
>   ----------------------------------------------------------------------------
>   Note: ++  high-alarm; +  high-warning; --  low-alarm; -  low-warning
> 
> switch#
> 
> 
> Any help with these issues is greatly appreciated.  If you have any questions or advice, please let me know.  I'll be glad to continue troubleshooting this until it's resolved.  Thank you.    
> 
> 
> Chris Preimesberger | Test & Validation Engineer
> Transition Networks, Inc.
> 
> chrisp@transition.com
> direct: +1.952.996.1509 | fax: +1.952.941.2322 | www.transition.com
> ________________________________________
> 

> For comparison with ethtool's output, which shows incorrect threshold values: when the same transceiver is installed in a Cisco Nexus switch and the Cisco command "show interface ethernet 1/3 transceiver details" is issued, the switch correctly reads and displays the transceiver's Alarm and Warning thresholds, as shown below:
> 
> 
> switch# show interface ethernet 1/3 transceiver details 
> Ethernet1/3
>     transceiver is present
>     type is QSFP-100G-CWDM4-MSA-FEC
>     name is TRANSITION
>     part number is TNQSFP100GCWDM4
>     revision is 1A
>     serial number is TN02000302
>     nominal bitrate is 25500 MBit/sec per channel
>     Link length supported for 9/125um fiber is 2 km
>     cisco id is 17
>     cisco extended id number is 252
> 
> Lane Number:1 Network Lane
>            SFP Detail Diagnostics Information (internal calibration)
>   ----------------------------------------------------------------------------
>                 Current              Alarms                  Warnings
>                 Measurement     High        Low         High          Low
>   ----------------------------------------------------------------------------
>   Temperature   38.08 C        80.00 C    -10.00 C     75.00 C       -5.00 C
>   Voltage        3.34 V         3.70 V      2.90 V      3.59 V        3.00 V
>   Current       34.24 mA       75.00 mA    10.00 mA    70.00 mA      15.00 mA
>   Tx Power      -0.44 dBm       4.49 dBm   -8.50 dBm    3.49 dBm     -7.52 dBm
>   Rx Power          N/A         4.49 dBm  -14.55 dBm    3.49 dBm    -12.51 dBm
>   Transmit Fault Count = 0
>   ----------------------------------------------------------------------------
>   Note: ++  high-alarm; +  high-warning; --  low-alarm; -  low-warning
> 
> Lane Number:2 Network Lane
>            SFP Detail Diagnostics Information (internal calibration)
>   ----------------------------------------------------------------------------
>                 Current              Alarms                  Warnings
>                 Measurement     High        Low         High          Low
>   ----------------------------------------------------------------------------
>   Temperature   38.08 C        80.00 C    -10.00 C     75.00 C       -5.00 C
>   Voltage        3.34 V         3.70 V      2.90 V      3.59 V        3.00 V
>   Current       34.24 mA       75.00 mA    10.00 mA    70.00 mA      15.00 mA
>   Tx Power      -1.20 dBm       4.49 dBm   -8.50 dBm    3.49 dBm     -7.52 dBm
>   Rx Power          N/A         4.49 dBm  -14.55 dBm    3.49 dBm    -12.51 dBm
>   Transmit Fault Count = 0
>   ----------------------------------------------------------------------------
>   Note: ++  high-alarm; +  high-warning; --  low-alarm; -  low-warning
> 
> Lane Number:3 Network Lane
>            SFP Detail Diagnostics Information (internal calibration)
>   ----------------------------------------------------------------------------
>                 Current              Alarms                  Warnings
>                 Measurement     High        Low         High          Low
>   ----------------------------------------------------------------------------
>   Temperature   38.08 C        80.00 C    -10.00 C     75.00 C       -5.00 C
>   Voltage        3.34 V         3.70 V      2.90 V      3.59 V        3.00 V
>   Current       33.21 mA       75.00 mA    10.00 mA    70.00 mA      15.00 mA
>   Tx Power      -0.96 dBm       4.49 dBm   -8.50 dBm    3.49 dBm     -7.52 dBm
>   Rx Power          N/A         4.49 dBm  -14.55 dBm    3.49 dBm    -12.51 dBm
>   Transmit Fault Count = 0
>   ----------------------------------------------------------------------------
>   Note: ++  high-alarm; +  high-warning; --  low-alarm; -  low-warning
> 
> Lane Number:4 Network Lane
>            SFP Detail Diagnostics Information (internal calibration)
>   ----------------------------------------------------------------------------
>                 Current              Alarms                  Warnings
>                 Measurement     High        Low         High          Low
>   ----------------------------------------------------------------------------
>   Temperature   38.08 C        80.00 C    -10.00 C     75.00 C       -5.00 C
>   Voltage        3.34 V         3.70 V      2.90 V      3.59 V        3.00 V
>   Current       33.72 mA       75.00 mA    10.00 mA    70.00 mA      15.00 mA
>   Tx Power      -1.59 dBm       4.49 dBm   -8.50 dBm    3.49 dBm     -7.52 dBm
>   Rx Power          N/A         4.49 dBm  -14.55 dBm    3.49 dBm    -12.51 dBm
>   Transmit Fault Count = 0
>   ----------------------------------------------------------------------------
>   Note: ++  high-alarm; +  high-warning; --  low-alarm; -  low-warning
> 
> switch# 
> 

> 
> Look at each line in the ethtool output below that includes the word "threshold".  This file has been hand-edited to show the threshold values that have been programmed into the transceiver, which should be displayed by ethtool.  The threshold values shown below are copied and pasted from the output of the Cisco NX-OS command "show interface ethernet 1/3 transceiver details", while the transceiver was installed in a Cisco Nexus switch.
> 
> Note - I only copied the threshold values in the units that were displayed by the Cisco switch.  The "?" symbols are just a placeholder for the converted values; I was too lazy to do conversions between dBm and mW, or between degrees C and degrees F.  Ethtool would be expected to report the true / converted values.
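
For reference, the conversions left as "?" in this file are straightforward. The following is a minimal sketch using the standard dBm/mW and Celsius/Fahrenheit formulas; the function names are made up for this example, and the input values are the Cisco-reported thresholds quoted in this thread.

```python
import math


def dbm_to_mw(dbm):
    """Optical power: dBm to milliwatts."""
    return 10 ** (dbm / 10)


def mw_to_dbm(mw):
    """Optical power: milliwatts to dBm (0 mW maps to -inf, as in the ethtool output)."""
    return 10 * math.log10(mw) if mw > 0 else float("-inf")


def c_to_f(celsius):
    """Temperature: degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32


# Thresholds as displayed by the Cisco switch.
tx_power_dbm = {"high alarm": 4.49, "low alarm": -8.50,
                "high warning": 3.49, "low warning": -7.52}
temperature_c = {"high alarm": 80.0, "low alarm": -10.0,
                 "high warning": 75.0, "low warning": -5.0}

for label, dbm in tx_power_dbm.items():
    print(f"Laser output power {label} threshold : {dbm_to_mw(dbm):.4f} mW / {dbm:.2f} dBm")
for label, c in temperature_c.items():
    print(f"Module temperature {label} threshold : {c:.2f} degrees C / {c_to_f(c):.2f} degrees F")
```

The print format only imitates the ethtool layout for readability; it is not taken from the ethtool source.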
> 
> 
> 
> 
> tech1@D8:~$ sudo ethtool -m enp1s0
> 	Identifier                                : 0x11 (QSFP28)
> 	Extended identifier                       : 0xfc
> 	Extended identifier description           : 3.5W max. Power consumption
> 	Extended identifier description           : CDR present in TX, CDR present in RX
> 	Extended identifier description           : High Power Class (> 3.5 W) not enabled
> 	Connector                                 : 0x07 (LC)
> 	Transceiver codes                         : 0x80 0x00 0x00 0x00 0x00 0x00 0x00 0x00
> 	Transceiver type                          : 100G Ethernet: 100G CWDM4 MSA with FEC
> 	Encoding                                  : 0x03 (NRZ)
> 	BR, Nominal                               : 25500Mbps
> 	Rate identifier                           : 0x00
> 	Length (SMF,km)                           : 2km
> 	Length (OM3 50um)                         : 0m
> 	Length (OM2 50um)                         : 0m
> 	Length (OM1 62.5um)                       : 0m
> 	Length (Copper or Active cable)           : 0m
> 	Transmitter technology                    : 0x40 (1310 nm DFB)
> 	Laser wavelength                          : 1310.000nm
> 	Laser wavelength tolerance                : 47.500nm
> 	Vendor name                               : TRANSITION
> 	Vendor OUI                                : 00:c0:f2
> 	Vendor PN                                 : TNQSFP100GCWDM4
> 	Vendor rev                                : 1A
> 	Vendor SN                                 : TN02000302
> 	Date code                                 : 180919
> 	Revision Compliance                       : SFF-8636 Rev 2.5/2.6/2.7
> 	Module temperature                        : 39.53 degrees C / 103.15 degrees F
> 	Module voltage                            : 3.3233 V
> 	Alarm/warning flags implemented           : Yes
> 	Laser tx bias current (Channel 1)         : 34.432 mA
> 	Laser tx bias current (Channel 2)         : 34.432 mA
> 	Laser tx bias current (Channel 3)         : 33.408 mA
> 	Laser tx bias current (Channel 4)         : 33.920 mA
> 	Transmit avg optical power (Channel 1)    : 0.9052 mW / -0.43 dBm
> 	Transmit avg optical power (Channel 2)    : 0.7832 mW / -1.06 dBm
> 	Transmit avg optical power (Channel 3)    : 0.8057 mW / -0.94 dBm
> 	Transmit avg optical power (Channel 4)    : 0.7009 mW / -1.54 dBm
> 	Rcvr signal avg optical power(Channel 1)  : 0.7378 mW / -1.32 dBm
> 	Rcvr signal avg optical power(Channel 2)  : 0.7553 mW / -1.22 dBm
> 	Rcvr signal avg optical power(Channel 3)  : 0.6529 mW / -1.85 dBm
> 	Rcvr signal avg optical power(Channel 4)  : 0.6948 mW / -1.58 dBm
> 	Laser bias current high alarm   (Chan 1)  : Off
> 	Laser bias current low alarm    (Chan 1)  : Off
> 	Laser bias current high warning (Chan 1)  : Off
> 	Laser bias current low warning  (Chan 1)  : Off
> 	Laser bias current high alarm   (Chan 2)  : Off
> 	Laser bias current low alarm    (Chan 2)  : Off
> 	Laser bias current high warning (Chan 2)  : Off
> 	Laser bias current low warning  (Chan 2)  : Off
> 	Laser bias current high alarm   (Chan 3)  : Off
> 	Laser bias current low alarm    (Chan 3)  : Off
> 	Laser bias current high warning (Chan 3)  : Off
> 	Laser bias current low warning  (Chan 3)  : Off
> 	Laser bias current high alarm   (Chan 4)  : Off
> 	Laser bias current low alarm    (Chan 4)  : Off
> 	Laser bias current high warning (Chan 4)  : Off
> 	Laser bias current low warning  (Chan 4)  : Off
> 	Module temperature high alarm             : Off
> 	Module temperature low alarm              : Off
> 	Module temperature high warning           : Off
> 	Module temperature low warning            : Off
> 	Module voltage high alarm                 : Off
> 	Module voltage low alarm                  : Off
> 	Module voltage high warning               : Off
> 	Module voltage low warning                : Off
> 	Laser tx power high alarm   (Channel 1)   : Off
> 	Laser tx power low alarm    (Channel 1)   : Off
> 	Laser tx power high warning (Channel 1)   : Off
> 	Laser tx power low warning  (Channel 1)   : Off
> 	Laser tx power high alarm   (Channel 2)   : Off
> 	Laser tx power low alarm    (Channel 2)   : Off
> 	Laser tx power high warning (Channel 2)   : Off
> 	Laser tx power low warning  (Channel 2)   : Off
> 	Laser tx power high alarm   (Channel 3)   : Off
> 	Laser tx power low alarm    (Channel 3)   : Off
> 	Laser tx power high warning (Channel 3)   : Off
> 	Laser tx power low warning  (Channel 3)   : Off
> 	Laser tx power high alarm   (Channel 4)   : Off
> 	Laser tx power low alarm    (Channel 4)   : Off
> 	Laser tx power high warning (Channel 4)   : Off
> 	Laser tx power low warning  (Channel 4)   : Off
> 	Laser rx power high alarm   (Channel 1)   : Off
> 	Laser rx power low alarm    (Channel 1)   : Off
> 	Laser rx power high warning (Channel 1)   : Off
> 	Laser rx power low warning  (Channel 1)   : Off
> 	Laser rx power high alarm   (Channel 2)   : Off
> 	Laser rx power low alarm    (Channel 2)   : Off
> 	Laser rx power high warning (Channel 2)   : Off
> 	Laser rx power low warning  (Channel 2)   : Off
> 	Laser rx power high alarm   (Channel 3)   : Off
> 	Laser rx power low alarm    (Channel 3)   : Off
> 	Laser rx power high warning (Channel 3)   : Off
> 	Laser rx power low warning  (Channel 3)   : Off
> 	Laser rx power high alarm   (Channel 4)   : Off
> 	Laser rx power low alarm    (Channel 4)   : Off
> 	Laser rx power high warning (Channel 4)   : Off
> 	Laser rx power low warning  (Channel 4)   : Off
> 	Laser bias current high alarm threshold   : 75.000 mA
> 	Laser bias current low alarm threshold    : 10.000 mA
> 	Laser bias current high warning threshold : 70.000 mA
> 	Laser bias current low warning threshold  : 15.000 mA
> 	Laser output power high alarm threshold   : ? mW / 4.49 dBm
> 	Laser output power low alarm threshold    : ? mW / -8.50 dBm
> 	Laser output power high warning threshold : ? mW / 3.49 dBm
> 	Laser output power low warning threshold  : ? mW / -7.52 dBm
> 	Module temperature high alarm threshold   : 80.00 degrees C / ? degrees F
> 	Module temperature low alarm threshold    : -10.00 degrees C / ? degrees F
> 	Module temperature high warning threshold : 75.00 degrees C / ? degrees F
> 	Module temperature low warning threshold  : -5.00 degrees C / ? degrees F
> 	Module voltage high alarm threshold       : 3.7000 V
> 	Module voltage low alarm threshold        : 2.9000 V
> 	Module voltage high warning threshold     : 3.5900 V
> 	Module voltage low warning threshold      : 3.0000 V
> 	Laser rx power high alarm threshold       : ? mW / 4.49 dBm
> 	Laser rx power low alarm threshold        : ? mW / -14.55 dBm
> 	Laser rx power high warning threshold     : ? mW / 3.49 dBm
> 	Laser rx power low warning threshold      : ? mW / -12.51 dBm
> 

> 
> Look at each line in the ethtool output below that includes the word "threshold".  This file shows the actual output from ethtool v4.18, when the output is not piped to another command.  Notice that all of the displayed threshold values are 0 (which is incorrect), while other values report as expected.
> 
> tech1@D8:~$ sudo ethtool -m enp1s0
> 	Identifier                                : 0x11 (QSFP28)
> 	Extended identifier                       : 0xfc
> 	Extended identifier description           : 3.5W max. Power consumption
> 	Extended identifier description           : CDR present in TX, CDR present in RX
> 	Extended identifier description           : High Power Class (> 3.5 W) not enabled
> 	Connector                                 : 0x07 (LC)
> 	Transceiver codes                         : 0x80 0x00 0x00 0x00 0x00 0x00 0x00 0x00
> 	Transceiver type                          : 100G Ethernet: 100G CWDM4 MSA with FEC
> 	Encoding                                  : 0x03 (NRZ)
> 	BR, Nominal                               : 25500Mbps
> 	Rate identifier                           : 0x00
> 	Length (SMF,km)                           : 2km
> 	Length (OM3 50um)                         : 0m
> 	Length (OM2 50um)                         : 0m
> 	Length (OM1 62.5um)                       : 0m
> 	Length (Copper or Active cable)           : 0m
> 	Transmitter technology                    : 0x40 (1310 nm DFB)
> 	Laser wavelength                          : 1310.000nm
> 	Laser wavelength tolerance                : 47.500nm
> 	Vendor name                               : TRANSITION
> 	Vendor OUI                                : 00:c0:f2
> 	Vendor PN                                 : TNQSFP100GCWDM4
> 	Vendor rev                                : 1A
> 	Vendor SN                                 : TN02000302
> 	Date code                                 : 180919
> 	Revision Compliance                       : SFF-8636 Rev 2.5/2.6/2.7
> 	Module temperature                        : 39.53 degrees C / 103.15 degrees F
> 	Module voltage                            : 3.3241 V
> 	Alarm/warning flags implemented           : Yes
> 	Laser tx bias current (Channel 1)         : 34.432 mA
> 	Laser tx bias current (Channel 2)         : 34.432 mA
> 	Laser tx bias current (Channel 3)         : 33.408 mA
> 	Laser tx bias current (Channel 4)         : 33.920 mA
> 	Transmit avg optical power (Channel 1)    : 0.9048 mW / -0.43 dBm
> 	Transmit avg optical power (Channel 2)    : 0.7832 mW / -1.06 dBm
> 	Transmit avg optical power (Channel 3)    : 0.8057 mW / -0.94 dBm
> 	Transmit avg optical power (Channel 4)    : 0.7014 mW / -1.54 dBm
> 	Rcvr signal avg optical power(Channel 1)  : 0.7378 mW / -1.32 dBm
> 	Rcvr signal avg optical power(Channel 2)  : 0.7553 mW / -1.22 dBm
> 	Rcvr signal avg optical power(Channel 3)  : 0.6529 mW / -1.85 dBm
> 	Rcvr signal avg optical power(Channel 4)  : 0.6847 mW / -1.64 dBm
> 	Laser bias current high alarm   (Chan 1)  : Off
> 	Laser bias current low alarm    (Chan 1)  : Off
> 	Laser bias current high warning (Chan 1)  : Off
> 	Laser bias current low warning  (Chan 1)  : Off
> 	Laser bias current high alarm   (Chan 2)  : Off
> 	Laser bias current low alarm    (Chan 2)  : Off
> 	Laser bias current high warning (Chan 2)  : Off
> 	Laser bias current low warning  (Chan 2)  : Off
> 	Laser bias current high alarm   (Chan 3)  : Off
> 	Laser bias current low alarm    (Chan 3)  : Off
> 	Laser bias current high warning (Chan 3)  : Off
> 	Laser bias current low warning  (Chan 3)  : Off
> 	Laser bias current high alarm   (Chan 4)  : Off
> 	Laser bias current low alarm    (Chan 4)  : Off
> 	Laser bias current high warning (Chan 4)  : Off
> 	Laser bias current low warning  (Chan 4)  : Off
> 	Module temperature high alarm             : Off
> 	Module temperature low alarm              : Off
> 	Module temperature high warning           : Off
> 	Module temperature low warning            : Off
> 	Module voltage high alarm                 : Off
> 	Module voltage low alarm                  : Off
> 	Module voltage high warning               : Off
> 	Module voltage low warning                : Off
> 	Laser tx power high alarm   (Channel 1)   : Off
> 	Laser tx power low alarm    (Channel 1)   : Off
> 	Laser tx power high warning (Channel 1)   : Off
> 	Laser tx power low warning  (Channel 1)   : Off
> 	Laser tx power high alarm   (Channel 2)   : Off
> 	Laser tx power low alarm    (Channel 2)   : Off
> 	Laser tx power high warning (Channel 2)   : Off
> 	Laser tx power low warning  (Channel 2)   : Off
> 	Laser tx power high alarm   (Channel 3)   : Off
> 	Laser tx power low alarm    (Channel 3)   : Off
> 	Laser tx power high warning (Channel 3)   : Off
> 	Laser tx power low warning  (Channel 3)   : Off
> 	Laser tx power high alarm   (Channel 4)   : Off
> 	Laser tx power low alarm    (Channel 4)   : Off
> 	Laser tx power high warning (Channel 4)   : Off
> 	Laser tx power low warning  (Channel 4)   : Off
> 	Laser rx power high alarm   (Channel 1)   : Off
> 	Laser rx power low alarm    (Channel 1)   : Off
> 	Laser rx power high warning (Channel 1)   : Off
> 	Laser rx power low warning  (Channel 1)   : Off
> 	Laser rx power high alarm   (Channel 2)   : Off
> 	Laser rx power low alarm    (Channel 2)   : Off
> 	Laser rx power high warning (Channel 2)   : Off
> 	Laser rx power low warning  (Channel 2)   : Off
> 	Laser rx power high alarm   (Channel 3)   : Off
> 	Laser rx power low alarm    (Channel 3)   : Off
> 	Laser rx power high warning (Channel 3)   : Off
> 	Laser rx power low warning  (Channel 3)   : Off
> 	Laser rx power high alarm   (Channel 4)   : Off
> 	Laser rx power low alarm    (Channel 4)   : Off
> 	Laser rx power high warning (Channel 4)   : Off
> 	Laser rx power low warning  (Channel 4)   : Off
> 	Laser bias current high alarm threshold   : 0.000 mA
> 	Laser bias current low alarm threshold    : 0.000 mA
> 	Laser bias current high warning threshold : 0.000 mA
> 	Laser bias current low warning threshold  : 0.000 mA
> 	Laser output power high alarm threshold   : 0.0000 mW / -inf dBm
> 	Laser output power low alarm threshold    : 0.0000 mW / -inf dBm
> 	Laser output power high warning threshold : 0.0000 mW / -inf dBm
> 	Laser output power low warning threshold  : 0.0000 mW / -inf dBm
> 	Module temperature high alarm threshold   : 0.00 degrees C / 32.00 degrees F
> 	Module temperature low alarm threshold    : 0.00 degrees C / 32.00 degrees F
> 	Module temperature high warning threshold : 0.00 degrees C / 32.00 degrees F
> 	Module temperature low warning threshold  : 0.00 degrees C / 32.00 degrees F
> 	Module voltage high alarm threshold       : 0.0000 V
> 	Module voltage low alarm threshold        : 0.0000 V
> 	Module voltage high warning threshold     : 0.0000 V
> 	Module voltage low warning threshold      : 0.0000 V
> 	Laser rx power high alarm threshold       : 0.0000 mW / -inf dBm
> 	Laser rx power low alarm threshold        : 0.0000 mW / -inf dBm
> 	Laser rx power high warning threshold     : 0.0000 mW / -inf dBm
> 	Laser rx power low warning threshold      : 0.0000 mW / -inf dBm
> 
> 

> 
> Look at each line in the ethtool output below that includes the word "threshold".  This file shows the actual output from ethtool v4.18, when the ethtool output is piped to another command.  Notice that all of the displayed threshold values are spurious while other values report as expected.
> 
> tech1@D8:~$ sudo ethtool -m enp1s0 | cat
> 	Identifier                                : 0x11 (QSFP28)
> 	Extended identifier                       : 0xfc
> 	Extended identifier description           : 3.5W max. Power consumption
> 	Extended identifier description           : CDR present in TX, CDR present in RX
> 	Extended identifier description           : High Power Class (> 3.5 W) not enabled
> 	Connector                                 : 0x07 (LC)
> 	Transceiver codes                         : 0x80 0x00 0x00 0x00 0x00 0x00 0x00 0x00
> 	Transceiver type                          : 100G Ethernet: 100G CWDM4 MSA with FEC
> 	Encoding                                  : 0x03 (NRZ)
> 	BR, Nominal                               : 25500Mbps
> 	Rate identifier                           : 0x00
> 	Length (SMF,km)                           : 2km
> 	Length (OM3 50um)                         : 0m
> 	Length (OM2 50um)                         : 0m
> 	Length (OM1 62.5um)                       : 0m
> 	Length (Copper or Active cable)           : 0m
> 	Transmitter technology                    : 0x40 (1310 nm DFB)
> 	Laser wavelength                          : 1310.000nm
> 	Laser wavelength tolerance                : 47.500nm
> 	Vendor name                               : TRANSITION
> 	Vendor OUI                                : 00:c0:f2
> 	Vendor PN                                 : TNQSFP100GCWDM4
> 	Vendor rev                                : 1A
> 	Vendor SN                                 : TN02000302
> 	Date code                                 : 180919
> 	Revision Compliance                       : SFF-8636 Rev 2.5/2.6/2.7
> 	Module temperature                        : 39.53 degrees C / 103.15 degrees F
> 	Module voltage                            : 3.3249 V
> 	Alarm/warning flags implemented           : Yes
> 	Laser tx bias current (Channel 1)         : 34.432 mA
> 	Laser tx bias current (Channel 2)         : 34.432 mA
> 	Laser tx bias current (Channel 3)         : 33.408 mA
> 	Laser tx bias current (Channel 4)         : 33.920 mA
> 	Transmit avg optical power (Channel 1)    : 0.9043 mW / -0.44 dBm
> 	Transmit avg optical power (Channel 2)    : 0.7832 mW / -1.06 dBm
> 	Transmit avg optical power (Channel 3)    : 0.8057 mW / -0.94 dBm
> 	Transmit avg optical power (Channel 4)    : 0.7009 mW / -1.54 dBm
> 	Rcvr signal avg optical power(Channel 1)  : 0.7378 mW / -1.32 dBm
> 	Rcvr signal avg optical power(Channel 2)  : 0.7553 mW / -1.22 dBm
> 	Rcvr signal avg optical power(Channel 3)  : 0.6529 mW / -1.85 dBm
> 	Rcvr signal avg optical power(Channel 4)  : 0.6847 mW / -1.64 dBm
> 	Laser bias current high alarm   (Chan 1)  : Off
> 	Laser bias current low alarm    (Chan 1)  : Off
> 	Laser bias current high warning (Chan 1)  : Off
> 	Laser bias current low warning  (Chan 1)  : Off
> 	Laser bias current high alarm   (Chan 2)  : Off
> 	Laser bias current low alarm    (Chan 2)  : Off
> 	Laser bias current high warning (Chan 2)  : Off
> 	Laser bias current low warning  (Chan 2)  : Off
> 	Laser bias current high alarm   (Chan 3)  : Off
> 	Laser bias current low alarm    (Chan 3)  : Off
> 	Laser bias current high warning (Chan 3)  : Off
> 	Laser bias current low warning  (Chan 3)  : Off
> 	Laser bias current high alarm   (Chan 4)  : Off
> 	Laser bias current low alarm    (Chan 4)  : Off
> 	Laser bias current high warning (Chan 4)  : Off
> 	Laser bias current low warning  (Chan 4)  : Off
> 	Module temperature high alarm             : Off
> 	Module temperature low alarm              : Off
> 	Module temperature high warning           : Off
> 	Module temperature low warning            : Off
> 	Module voltage high alarm                 : Off
> 	Module voltage low alarm                  : Off
> 	Module voltage high warning               : Off
> 	Module voltage low warning                : Off
> 	Laser tx power high alarm   (Channel 1)   : Off
> 	Laser tx power low alarm    (Channel 1)   : Off
> 	Laser tx power high warning (Channel 1)   : Off
> 	Laser tx power low warning  (Channel 1)   : Off
> 	Laser tx power high alarm   (Channel 2)   : Off
> 	Laser tx power low alarm    (Channel 2)   : Off
> 	Laser tx power high warning (Channel 2)   : Off
> 	Laser tx power low warning  (Channel 2)   : Off
> 	Laser tx power high alarm   (Channel 3)   : Off
> 	Laser tx power low alarm    (Channel 3)   : Off
> 	Laser tx power high warning (Channel 3)   : Off
> 	Laser tx power low warning  (Channel 3)   : Off
> 	Laser tx power high alarm   (Channel 4)   : Off
> 	Laser tx power low alarm    (Channel 4)   : Off
> 	Laser tx power high warning (Channel 4)   : Off
> 	Laser tx power low warning  (Channel 4)   : Off
> 	Laser rx power high alarm   (Channel 1)   : Off
> 	Laser rx power low alarm    (Channel 1)   : Off
> 	Laser rx power high warning (Channel 1)   : Off
> 	Laser rx power low warning  (Channel 1)   : Off
> 	Laser rx power high alarm   (Channel 2)   : Off
> 	Laser rx power low alarm    (Channel 2)   : Off
> 	Laser rx power high warning (Channel 2)   : Off
> 	Laser rx power low warning  (Channel 2)   : Off
> 	Laser rx power high alarm   (Channel 3)   : Off
> 	Laser rx power low alarm    (Channel 3)   : Off
> 	Laser rx power high warning (Channel 3)   : Off
> 	Laser rx power low warning  (Channel 3)   : Off
> 	Laser rx power high alarm   (Channel 4)   : Off
> 	Laser rx power low alarm    (Channel 4)   : Off
> 	Laser rx power high warning (Channel 4)   : Off
> 	Laser rx power low warning  (Channel 4)   : Off
> 	Laser bias current high alarm threshold   : 16.448 mA
> 	Laser bias current low alarm threshold    : 16.448 mA
> 	Laser bias current high warning threshold : 16.448 mA
> 	Laser bias current low warning threshold  : 16.448 mA
> 	Laser output power high alarm threshold   : 0.8224 mW / -0.85 dBm
> 	Laser output power low alarm threshold    : 0.8250 mW / -0.84 dBm
> 	Laser output power high warning threshold : 0.8264 mW / -0.83 dBm
> 	Laser output power low warning threshold  : 2.6983 mW / 4.31 dBm
> 	Module temperature high alarm threshold   : 110.12 degrees C / 230.22 degrees F
> 	Module temperature low alarm threshold    : 84.34 degrees C / 183.82 degrees F
> 	Module temperature high warning threshold : 44.12 degrees C / 111.42 degrees F
> 	Module temperature low warning threshold  : 67.27 degrees C / 153.08 degrees F
> 	Module voltage high alarm threshold       : 2.9728 V
> 	Module voltage low alarm threshold        : 2.6990 V
> 	Module voltage high warning threshold     : 0.8274 V
> 	Module voltage low warning threshold      : 2.2538 V
> 	Laser rx power high alarm threshold       : 2.5458 mW / 4.06 dBm
> 	Laser rx power low alarm threshold        : 2.6992 mW / 4.31 dBm
> 	Laser rx power high warning threshold     : 2.9801 mW / 4.74 dBm
> 	Laser rx power low warning threshold      : 2.8526 mW / 4.55 dBm
> tech1@D8:~$ 
> 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: bug: 'ethtool -m' reports spurious alarm & warning threshold values for QSFP28 transceivers
  2018-09-26 20:47   ` Chris Preimesberger
@ 2018-09-26 21:46     ` Andrew Lunn
  0 siblings, 0 replies; 16+ messages in thread
From: Andrew Lunn @ 2018-09-26 21:46 UTC (permalink / raw)
  To: Chris Preimesberger; +Cc: linville@tuxdriver.com, netdev@vger.kernel.org

On Wed, Sep 26, 2018 at 08:47:34PM +0000, Chris Preimesberger wrote:
> Hello Andrew,
> 
> Thank you for the quick response!!
> Apologies in advance for my use of outlook and top-posting, etc...
> 
> I've run the raw option and the hex option, and pasted the results below.
> Since the raw option printed strange characters on the CLI, I re-ran it,
> Sending the output to a file (raw.txt) and attached that file as well.
> 
> Pasted from Ubuntu CLI:
> 
> tech1@D7:~$ 
> tech1@D7:~$ 
> tech1@D7:~$ 
> tech1@D7:~$ 
> tech1@D7:~$ sudo ethtool -m enp1s0 raw on
> \x11UU$��pA`?�@�G\x10#
>                  �\x12v\x01\x11��\x03�\x02@TRANSITION      ��TNQSFP100GCWDM4 1AfX%\x1cF?\x06?�TN02000301      180919  
>     h�\x02I��_��'\x16��Ri=\x02`��Zntech1@D7:~$ 
> tech1@D7:~$ 
> tech1@D7:~$ 
> tech1@D7:~$ 
> tech1@D7:~$ sudo ethtool -m enp1s0 hex on
> Offset		Values
> ------		------
> 0x0000:		11 00 00 0f 00 00 00 00 00 55 55 00 00 00 00 00 
> 0x0010:		00 00 00 00 00 00 24 e2 00 00 81 68 00 00 00 00 
> 0x0020:		00 00 00 00 00 00 00 00 00 00 41 60 3f e0 40 e0 
> 0x0030:		47 00 1f 10 0e 1e 0b f7 12 76 00 00 00 00 00 00 
> 0x0040:		00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
> 0x0050:		00 00 00 00 00 00 00 00 00 00 00 00 00 01 00 00 
> 0x0060:		00 00 00 00 00 00 00 00 00 00 1f 00 00 00 00 00 
> 0x0070:		00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
> 0x0080:		11 fc 07 80 00 00 00 00 00 00 00 03 ff 00 02 00 
> 0x0090:		00 00 00 40 54 52 41 4e 53 49 54 49 4f 4e 20 20 
> 0x00a0:		20 20 20 20 00 00 c0 f2 54 4e 51 53 46 50 31 30 
> 0x00b0:		30 47 43 57 44 4d 34 20 31 41 66 58 25 1c 46 3f 
> 0x00c0:		06 00 3f d6 54 4e 30 32 30 30 30 33 30 31 20 20 
> 0x00d0:		20 20 20 20 31 38 30 39 31 39 20 20 0c 00 68 f3 
> 0x00e0:		00 00 02 49 80 a0 5f 1f de c9 27 16 f8 ae 52 69 
> 0x00f0:		3d 02 60 00 00 00 00 00 00 00 00 00 83 f4 5a 6e 

Hi Chris

I've only recently got involved with SFP modules. ethtool says this is
a SFF-8636 SFP. So a QSFP. It has multiple pages, each 128 bytes in
length, which should be returned in a concatenated form. Here we see
256 bytes, meaning there are two pages. There can be up to 5 pages.

ethtool is looking for the temperature alarms at offset 0x200. So that
does not exist in this hex dump. But the raw dump you provided has
more bytes, 0x400 of them.
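
To make the offset arithmetic concrete, here is a rough sketch of how a (page, byte address) pair maps into the concatenated dump. It assumes the flat layout is lower page 0, then upper pages 0, 1, 2, 3 in order, which is my reading of how ethtool treats the buffer; flat_offset() and field_in_dump() are names made up for this illustration and are not ethtool functions.

```c
#include <stdio.h>

#define QSFP_HALF_PAGE 128	/* each page is 128 bytes */

/*
 * Map a module address (0..255) on a given upper page to an index in the
 * concatenated dump. Addresses below 128 are lower memory, which is not
 * paged, so the page number is ignored for them.
 * Assumed layout: [lower 0][upper 0][upper 1][upper 2][upper 3].
 */
int flat_offset(int page, int addr)
{
	if (page < 0 || addr < 0 || addr > 255)
		return -1;
	if (addr < QSFP_HALF_PAGE)
		return addr;
	return page * QSFP_HALF_PAGE + addr;
}

/* Return 1 if a field of field_len bytes at index flat fits in the dump. */
int field_in_dump(int flat, int field_len, int dump_len)
{
	return flat >= 0 && flat + field_len <= dump_len;
}
```

Under that assumption, flat_offset(3, 128) gives 0x200, so a 2-byte threshold there fits in a 640-byte dump but not in the 256 bytes shown in the hex output.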

So I would say the first bug is that ethtool dumps different amounts
of data in hex than in raw.

The fact you get different alarm thresholds on different runs suggests
to me we might only be getting two pages from the kernel?

Can you build ethtool from source and run it inside a debugger?
ethtool makes two IOCTL calls. The first is ETHTOOL_GMODULEINFO.
Could you print out the modinfo which is returned. It then does a
ETHTOOL_GMODULEEEPROM. Can you print out eeprom after the second
IOCTL.

Thanks
	Andrew

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: bug: 'ethtool -m' reports spurious alarm & warning threshold values for QSFP28 transceivers
  2018-09-26 21:34 ` Neil Horman
@ 2018-09-26 21:58   ` Andrew Lunn
  2018-09-27 13:23     ` Neil Horman
  2018-09-27 13:25   ` Eran Ben Elisha
  1 sibling, 1 reply; 16+ messages in thread
From: Andrew Lunn @ 2018-09-26 21:58 UTC (permalink / raw)
  To: Neil Horman
  Cc: Chris Preimesberger, linville@tuxdriver.com,
	netdev@vger.kernel.org

> When you run ethtool -m on this driver, the kernel calls mlx4_en_get_module_info
> to determine the length of the eeprom, and that value will be either 256 or 512
> bytes.

So it sounds like QSFP modules using 8636 are not supported. You would
expect a size to be one of 256, 384, 512 or 640.

> Next it calls mlx4_en_get_module_eeprom, passing in that size 256 to actually
> read the eeprom data, which in turn calls mlx4_get_module_info to fetch the data
> from hardware, again, passing in 256 as the size for the first call (theres a
> loop, but it will only get executed once in this scenario)
> 
> mlx4_get_module_info then issues the appropriate mailbox commands to dump the
> eeprom.  Here it starts to go sideways.  The mailbox buffer allocated for the
> return data is of type mlx4_mad_ifc, which has some front matter information and
> a data buffer that is 192 bytes long!

Which suggests all SFP dumps are broken as well, not just QSFP.

Oh dear.

   Andrew

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: bug: 'ethtool -m' reports spurious alarm & warning threshold values for QSFP28 transceivers
  2018-09-26 21:58   ` Andrew Lunn
@ 2018-09-27 13:23     ` Neil Horman
  0 siblings, 0 replies; 16+ messages in thread
From: Neil Horman @ 2018-09-27 13:23 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Chris Preimesberger, linville@tuxdriver.com,
	netdev@vger.kernel.org

On Wed, Sep 26, 2018 at 11:58:12PM +0200, Andrew Lunn wrote:
> > When you run ethtool -m on this driver, the kernel calls mlx4_en_get_module_info
> > to determine the length of the eeprom, and that value will be either 256 or 512
> > bytes.
> 
> So it sounds like QSFP modules using 8636 are not supported. You would
> expect a size to be one of 256, 384, 512 or 640.
> 
> > Next it calls mlx4_en_get_module_eeprom, passing in that size 256 to actually
> > read the eeprom data, which in turn calls mlx4_get_module_info to fetch the data
> > from hardware, again, passing in 256 as the size for the first call (theres a
> > loop, but it will only get executed once in this scenario)
> > 
> > mlx4_get_module_info then issues the appropriate mailbox commands to dump the
> > eeprom.  Here it starts to go sideways.  The mailbox buffer allocated for the
> > return data is of type mlx4_mad_ifc, which has some front matter information and
> > a data buffer that is 192 bytes long!
> 
> Which suggests all SFP dumps are broken as well, not just QSFP.
> 
No, not at all.  Each driver that implements a get_eeprom ethtool method, is
capable of doing multiple reads at various offsets, and filling up the user
buffer with real data.  The bug here is that the mellanox data structures are
not sized properly vis a vis the amount of eeprom data that user space might
expect, or more specifically that the driver isn't smart enough to do several
small reads to fill up the full sized request buffer
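
The "several small reads" idea can be sketched as a simple loop. This is an illustration only, not driver code: read_chunk_fn and fake_read_chunk() are hypothetical stand-ins for one hardware command, and the 48-byte per-command limit is the figure mentioned elsewhere in this thread.

```c
#include <stddef.h>

#define MAX_CHUNK 48	/* assumed per-command limit, taken from this thread */

/* Hypothetical single read: returns bytes read (<= MAX_CHUNK) or < 0 on error. */
typedef int (*read_chunk_fn)(void *ctx, unsigned int offset,
			     unsigned int size, unsigned char *dst);

/* Keep issuing small reads at increasing offsets until len bytes are filled. */
int read_full(read_chunk_fn rd, void *ctx, unsigned int offset,
	      unsigned int len, unsigned char *buf)
{
	unsigned int done = 0;

	while (done < len) {
		unsigned int want = len - done;
		int got;

		if (want > MAX_CHUNK)
			want = MAX_CHUNK;
		got = rd(ctx, offset + done, want, buf + done);
		if (got < 0)
			return got;	/* propagate the error */
		if (got == 0)
			break;		/* nothing more to read */
		done += (unsigned int)got;
	}
	return (int)done;
}

/* Fake device for demonstration: EEPROM byte i holds the value (i & 0xff). */
int fake_read_chunk(void *ctx, unsigned int offset,
		    unsigned int size, unsigned char *dst)
{
	unsigned int i;

	(void)ctx;
	if (size > MAX_CHUNK)
		size = MAX_CHUNK;
	for (i = 0; i < size; i++)
		dst[i] = (unsigned char)((offset + i) & 0xff);
	return (int)size;
}
```

With the fake device, read_full(fake_read_chunk, NULL, 0, 256, buf) fills all 256 bytes using six reads, the last of them shorter than the rest.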

Neil

> Oh dear.
> 
>    Andrew
> 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: bug: 'ethtool -m' reports spurious alarm & warning threshold values for QSFP28 transceivers
  2018-09-26 21:34 ` Neil Horman
  2018-09-26 21:58   ` Andrew Lunn
@ 2018-09-27 13:25   ` Eran Ben Elisha
  2018-09-27 14:52     ` Andrew Lunn
  1 sibling, 1 reply; 16+ messages in thread
From: Eran Ben Elisha @ 2018-09-27 13:25 UTC (permalink / raw)
  To: Neil Horman, Chris Preimesberger
  Cc: linville@tuxdriver.com, netdev@vger.kernel.org

> This is just a drive by guess, but I think this is a driver issue.
> 
> 
> Issue 1 seems like a red herring: cat doesn't modify output, nor does ethtool
> know if its output is going to a console or a pipe; it's all the same.  And
> issue 2 (that the output of the thresholds, etc. are spuriously changing and
> wrong) suggests that they are spuriously changing and wrong regardless of what
> cat does.
> 
> That said, I think issue two is a problem with the mlx4 driver.  Specifically
> that the driver is copying garbage data.
> 
> The three ethtool functions at work here are:
> mlx4_en_get_module_info
> mlx4_en_get_module_eeprom
> mlx4_get_module_info
> 
> When you run ethtool -m on this driver, the kernel calls mlx4_en_get_module_info
> to determine the length of the eeprom, and that value will be either 256 or 512
> bytes.  Lets assume that the value is 256 for the sake of argument
> 
> Next it calls mlx4_en_get_module_eeprom, passing in that size 256 to actually
> read the eeprom data, which in turn calls mlx4_get_module_info to fetch the data
> from hardware, again, passing in 256 as the size for the first call (theres a
> loop, but it will only get executed once in this scenario)
> 
> mlx4_get_module_info then issues the appropriate mailbox commands to dump the
> eeprom.  Here it starts to go sideways.  The mailbox buffer allocated for the
> return data is of type mlx4_mad_ifc, which has some front matter information and
> a data buffer that is 192 bytes long!
> 
> A little further down in the function, size gets restricted if the buffer
> crosses a page boundary, but given that the size is 256 on the first call here,
> and offset is zero on the first call, we're not crossing anything, so size
> remains unchanged.
> 
> The output mailbox buffer outmad->data (a 192 byte array) then gets cast to a
> struct mlx4_cable_info, which has its own internal data buffer that is
> only 48 bytes long.

Hi guys,
Thanks for digging into it.
Here are some observations I found:

1. Chris's system has a CX4 (which is served by the mlx5 driver); all of
Neil's analysis was done on the mlx4 driver (which serves the older
generation of NICs, e.g. CX3Pro).
2. In general, MAD commands are limited to 192 bytes of data.
3. CableInfo MAD command info is limited to 48 bytes.
4. The first check in mlx4_get_module_info is:
         if (size > MODULE_INFO_MAX_READ)
                 size = MODULE_INFO_MAX_READ;
    This is the piece that was missing from the analysis. The function
    copies and returns at most 48 bytes, so there is no trash copy or
    overrun. The caller (also inside mlx4) is expected to call again with
    a new offset in order to fetch more data.

5. I reviewed the mlx5 driver, and it has a similar reading mechanism
(small difference: via the MCIA register and not via MAD).

Both drivers read up to 256 bytes: bytes 0-127 and 128-255 (both from
page 0). The drivers are currently not capable of reading more than 256 bytes.

Looking at the qsfp.c parser in ethtool (user space), I see an
uninitialized-variable bug that caused issues #1 and #2.
Applying the fix below locally solved the issue (alarm data is no longer
shown, which is expected as the driver does not fill it).

diff --git a/qsfp.c b/qsfp.c
index 32e195d12dc0..d196aa1753de 100644
--- a/qsfp.c
+++ b/qsfp.c
@@ -671,7 +671,7 @@ static void sff8636_dom_parse(const __u8 *id, struct sff_diags *sd)
 
 static void sff8636_show_dom(const __u8 *id, __u32 eeprom_len)
 {
-	struct sff_diags sd;
+	struct sff_diags sd = {0};
 	char *rx_power_string = NULL;
 	char power_string[MAX_DESC_SIZE];
 	int i;

I will soon post a fix for it.
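
For readers who want to see why the one-line change above matters, here is a simplified sketch of the pattern. It is not the actual ethtool code: struct diags, parse() and thresholds_known() are invented for the illustration, and the 0x200 offset is the threshold location mentioned earlier in the thread.

```c
#include <stdio.h>

/* Cut-down stand-in for a diagnostics struct; field names are made up. */
struct diags {
	int supports_alarms;
	unsigned short temp_high_alarm;
};

/*
 * Fill threshold fields only when the dump is long enough to contain them.
 * For a short dump this function leaves *sd untouched, so whatever the
 * caller put in the struct beforehand is what gets displayed later.
 */
void parse(const unsigned char *id, unsigned int len, struct diags *sd)
{
	if (len >= 0x200 + 2) {
		sd->supports_alarms = 1;
		sd->temp_high_alarm = (unsigned short)((id[0x200] << 8) | id[0x201]);
	}
}

/* Return 1 if thresholds were found, 0 otherwise. */
int thresholds_known(const unsigned char *id, unsigned int len)
{
	/*
	 * Without "= {0}" the struct would hold leftover stack contents, and
	 * a short dump could appear to support alarms with arbitrary values.
	 */
	struct diags sd = {0};

	parse(id, len, &sd);
	return sd.supports_alarms;
}
```

With a zeroed struct, a 256-byte dump reliably reports that no thresholds are known, which would be consistent with stack garbage explaining why the values changed between the piped and unpiped runs.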

Thanks,
Eran



> 
> The memcpy in this function then copies cable_info->data to the buffer that gets
> returned to ethtool, but it copies size bytes (256), even though the source data
> buffer is only 48 bytes long.  That 48 byte array is embedded in the larger 192
> byte structure, so there won't be a panic on the overrun, but there's no telling
> what garbage is in the buffer beyond those first 48 bytes.  Even if the
> remaining 144 bytes have valid eeprom data, it's less than the required 256
> bytes.  The additional copy may cause a panic, but if the buffer commonly bumps
> up against other allocated memory, that will go unnoticed.
> 
> after the memcpy, mlx4_get_module_info just returns the size of the passed in
> buffer (256), and so the calling function thinks its work is done, and lets  the
> kernel send back the buffer with garbage data to ethtool.
> 
> I think the mlx4 guys have some work to do here.
> 
> My $0.02
> Neil
> 

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: bug: 'ethtool -m' reports spurious alarm & warning threshold values for QSFP28 transceivers
  2018-09-27 13:25   ` Eran Ben Elisha
@ 2018-09-27 14:52     ` Andrew Lunn
  2018-09-27 15:20       ` Eran Ben Elisha
  0 siblings, 1 reply; 16+ messages in thread
From: Andrew Lunn @ 2018-09-27 14:52 UTC (permalink / raw)
  To: Eran Ben Elisha
  Cc: Neil Horman, Chris Preimesberger, linville@tuxdriver.com,
	netdev@vger.kernel.org

> Both drivers read up to 256 bytes. 0-127 (from page 0). and 128-256 (from
> page 0). Driver is not capable of reading over 256 bytes currently.

Hi Eran

There should not be any need to read more than 256 bytes. For older
SFP devices, two addresses on the i2c bus are used, each with 256
bytes. For QSFP, one address is used, and you swap page by writing to
offset 127.
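
As a rough model of that page-swap mechanism, the sketch below simulates a QSFP memory map in plain C. Everything here is illustrative: struct fake_qsfp and the qsfp_* helpers are invented names, the module is a RAM array rather than a real i2c device, and the number of pages is an assumption.

```c
#include <string.h>

#define QSFP_PAGE_SELECT 127	/* page-select byte in lower memory */
#define QSFP_NUM_PAGES   4	/* assumed number of upper pages */

/* Simulated module: 128 bytes of lower memory plus paged upper memory. */
struct fake_qsfp {
	unsigned char lower[128];
	unsigned char upper[QSFP_NUM_PAGES][128];
};

/* Read one byte at addr (0..255); addresses 128..255 hit the selected page. */
unsigned char qsfp_read(const struct fake_qsfp *m, unsigned int addr)
{
	unsigned int page;

	addr &= 0xff;
	if (addr < 128)
		return m->lower[addr];
	page = m->lower[QSFP_PAGE_SELECT] % QSFP_NUM_PAGES;
	return m->upper[page][addr - 128];
}

/* Write one byte; a write to byte 127 is what switches the visible page. */
void qsfp_write(struct fake_qsfp *m, unsigned int addr, unsigned char val)
{
	unsigned int page;

	addr &= 0xff;
	if (addr < 128) {
		m->lower[addr] = val;
		return;
	}
	page = m->lower[QSFP_PAGE_SELECT] % QSFP_NUM_PAGES;
	m->upper[page][addr - 128] = val;
}

/* Select a page, then read from the upper half of the address space. */
unsigned char qsfp_read_paged(struct fake_qsfp *m, unsigned int page,
			      unsigned int addr)
{
	qsfp_write(m, QSFP_PAGE_SELECT, (unsigned char)page);
	return qsfp_read(m, addr);
}
```

The point of the model is that the address space never grows past 256 bytes; only the contents behind addresses 128-255 change when byte 127 is rewritten.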

> looking on qsfp.c parser in ethtool.c (user space), I see an uninitialized
> bug issue that have caused bug #1 + #2.
> Applied it locally solved the issue (Not showing alarm data, which should be
> expected as driver do not fill it).

There appears to be a second bug somewhere. dumping the module info
using HEX returned 256 bytes. But the binary dump had more bytes.
Since you have the hardware, could you look into this?

Thanks
	Andrew

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: bug: 'ethtool -m' reports spurious alarm & warning threshold values for QSFP28 transceivers
  2018-09-27 14:52     ` Andrew Lunn
@ 2018-09-27 15:20       ` Eran Ben Elisha
  2018-09-27 15:32         ` Andrew Lunn
  0 siblings, 1 reply; 16+ messages in thread
From: Eran Ben Elisha @ 2018-09-27 15:20 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Neil Horman, Chris Preimesberger, linville@tuxdriver.com,
	netdev@vger.kernel.org



On 9/27/2018 5:52 PM, Andrew Lunn wrote:
>> Both drivers read up to 256 bytes. 0-127 (from page 0). and 128-256 (from
>> page 0). Driver is not capable of reading over 256 bytes currently.
> 
> Hi Eran
> 
> There should not be any need to read more than 256 bytes. For older
> SFP devices, two addresses on the i2c bus are used, each with 256
> bytes. For QSFP, one address is used, and you swap page by writing to
> offset 127.
> 
>> looking on qsfp.c parser in ethtool.c (user space), I see an uninitialized
>> bug issue that have caused bug #1 + #2.
>> Applied it locally solved the issue (Not showing alarm data, which should be
>> expected as driver do not fill it).
> 
> There appears to be a second bug somewhere. dumping the module info
> using HEX returned 256 bytes. But the binary dump had more bytes.
> Since you have the hardware, could you look into this?

See the fix I posted a few minutes ago.
title: [PATCH ethtool] ethtool: Fix uninitialized variable use at qsfp dump

This is the hex dump, which is the same with and without the fix:
Offset          Values
------          ------
0x0000:         11 07 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0010:         00 00 00 00 00 00 2a 2a 00 00 7f 0b 00 00 00 00
0x0020:         00 00 38 b6 3e 50 2b e9 40 0d 47 0d 47 ac 48 58
0x0030:         49 0f 3a 09 36 77 39 c9 3a 6a 00 00 00 00 00 00
0x0040:         00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0050:         00 00 00 00 00 00 00 aa aa 00 00 00 00 01 00 00
0x0060:         00 00 ff 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0070:         00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0080:         11 cc 07 80 00 00 00 00 00 00 00 05 ff 00 0a 00
0x0090:         00 00 00 44 4d 65 6c 6c 61 6e 6f 78 20 20 20 20
0x00a0:         20 20 20 20 00 00 02 c9 4d 4d 41 31 4c 31 30 2d
0x00b0:         43 52 20 20 20 20 20 20 41 31 65 bf 00 ce 00 60
0x00c0:         03 07 ff de 4d 54 31 36 33 39 44 4d 30 30 30 32
0x00d0:         36 20 20 20 31 36 30 39 32 36 20 20 0c 10 68 40
0x00e0:         00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x00f0:         00 00 00 00 00 00 00 00 00 00 00 00 14 31 00 00
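
As a sanity check on the dump above, the live temperature and voltage can be decoded by hand. The sketch assumes the usual SFF-8636 lower-page layout: temperature at bytes 22-23 as a signed value in 1/256 degree C, and supply voltage at bytes 26-27 in units of 100 uV. The helper names are made up for this example.

```c
#include <stdint.h>

#define TEMP_OFFSET 22	/* 0x16: module temperature, MSB first */
#define VCC_OFFSET  26	/* 0x1a: supply voltage, MSB first */

/* Combine two big-endian bytes into a 16-bit value. */
uint16_t be16(const unsigned char *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

/* Temperature in degrees C: signed 16-bit value, 1/256 degree per LSB. */
double module_temp_c(const unsigned char *id)
{
	return (int16_t)be16(id + TEMP_OFFSET) / 256.0;
}

/* Supply voltage in volts: unsigned 16-bit value, 100 uV per LSB. */
double module_vcc_v(const unsigned char *id)
{
	return be16(id + VCC_OFFSET) * 0.0001;
}
```

For this dump, bytes 0x16-0x17 are 2a 2a and bytes 0x1a-0x1b are 7f 0b, which decode to about 42.16 degrees C and 3.2523 V, in agreement with the parsed output.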

This is the parsed output before the fix:
         Identifier                                : 0x11 (QSFP28)
         Extended identifier                       : 0xcc
         Extended identifier description           : 3.5W max. Power consumption
         Extended identifier description           : CDR present in TX, CDR present in RX
         Extended identifier description           : High Power Class (> 3.5 W) not enabled
         Connector                                 : 0x07 (LC)
         Transceiver codes                         : 0x80 0x00 0x00 0x00 0x00 0x00 0x00 0x00
         Transceiver type                          : 100G Ethernet: 100G Base-LR4
         Encoding                                  : 0x05 (64B/66B)
         BR, Nominal                               : 25500Mbps
         Rate identifier                           : 0x00
         Length (SMF,km)                           : 10km
         Length (OM3 50um)                         : 0m
         Length (OM2 50um)                         : 0m
         Length (OM1 62.5um)                       : 0m
         Length (Copper or Active cable)           : 0m
         Transmitter technology                    : 0x40 (1310 nm DFB)
         Laser wavelength                          : 1302.350nm
         Laser wavelength tolerance                : 1.030nm
         Vendor name                               : Mellanox
         Vendor OUI                                : 00:02:c9
         Vendor PN                                 : MMA1L10-CR
         Vendor rev                                : A1
         Vendor SN                                 : MT1639DM00026
         Date code                                 : 160926
         Revision Compliance                       : SFF-8636 Rev 2.5/2.6/2.7
         Module temperature                        : 42.16 degrees C / 107.90 degrees F
         Module voltage                            : 3.2523 V
         Alarm/warning flags implemented           : Yes
         Laser tx bias current (Channel 1)         : 36.454 mA
         Laser tx bias current (Channel 2)         : 36.696 mA
         Laser tx bias current (Channel 3)         : 37.006 mA
         Laser tx bias current (Channel 4)         : 37.404 mA
         Transmit avg optical power (Channel 1)    : 1.4812 mW / 1.71 dBm
         Transmit avg optical power (Channel 2)    : 1.3942 mW / 1.44 dBm
         Transmit avg optical power (Channel 3)    : 1.4793 mW / 1.70 dBm
         Transmit avg optical power (Channel 4)    : 1.4949 mW / 1.75 dBm
         Rcvr signal avg optical power(Channel 1)  : 1.4489 mW / 1.61 dBm
         Rcvr signal avg optical power(Channel 2)  : 1.5911 mW / 2.02 dBm
         Rcvr signal avg optical power(Channel 3)  : 1.1196 mW / 0.49 dBm
         Rcvr signal avg optical power(Channel 4)  : 1.6397 mW / 2.15 dBm
         Laser bias current high alarm   (Chan 1)  : Off
         Laser bias current low alarm    (Chan 1)  : Off
         Laser bias current high warning (Chan 1)  : Off
         Laser bias current low warning  (Chan 1)  : Off
         Laser bias current high alarm   (Chan 2)  : Off
         Laser bias current low alarm    (Chan 2)  : Off
         Laser bias current high warning (Chan 2)  : Off
         Laser bias current low warning  (Chan 2)  : Off
         Laser bias current high alarm   (Chan 3)  : Off
         Laser bias current low alarm    (Chan 3)  : Off
         Laser bias current high warning (Chan 3)  : Off
         Laser bias current low warning  (Chan 3)  : Off
         Laser bias current high alarm   (Chan 4)  : Off
         Laser bias current low alarm    (Chan 4)  : Off
         Laser bias current high warning (Chan 4)  : Off
         Laser bias current low warning  (Chan 4)  : Off
         Module temperature high alarm             : Off
         Module temperature low alarm              : Off
         Module temperature high warning           : Off
         Module temperature low warning            : Off
         Module voltage high alarm                 : Off
         Module voltage low alarm                  : Off
         Module voltage high warning               : Off
         Module voltage low warning                : Off
         Laser tx power high alarm   (Channel 1)   : Off
         Laser tx power low alarm    (Channel 1)   : Off
         Laser tx power high warning (Channel 1)   : Off
         Laser tx power low warning  (Channel 1)   : Off
         Laser tx power high alarm   (Channel 2)   : Off
         Laser tx power low alarm    (Channel 2)   : Off
         Laser tx power high warning (Channel 2)   : Off
         Laser tx power low warning  (Channel 2)   : Off
         Laser tx power high alarm   (Channel 3)   : Off
         Laser tx power low alarm    (Channel 3)   : Off
         Laser tx power high warning (Channel 3)   : Off
         Laser tx power low warning  (Channel 3)   : Off
         Laser tx power high alarm   (Channel 4)   : Off
         Laser tx power low alarm    (Channel 4)   : Off
         Laser tx power high warning (Channel 4)   : Off
         Laser tx power low warning  (Channel 4)   : Off
         Laser rx power high alarm   (Channel 1)   : Off
         Laser rx power low alarm    (Channel 1)   : Off
         Laser rx power high warning (Channel 1)   : Off
         Laser rx power low warning  (Channel 1)   : Off
         Laser rx power high alarm   (Channel 2)   : Off
         Laser rx power low alarm    (Channel 2)   : Off
         Laser rx power high warning (Channel 2)   : Off
         Laser rx power low warning  (Channel 2)   : Off
         Laser rx power high alarm   (Channel 3)   : Off
         Laser rx power low alarm    (Channel 3)   : Off
         Laser rx power high warning (Channel 3)   : Off
         Laser rx power low warning  (Channel 3)   : Off
         Laser rx power high alarm   (Channel 4)   : Off
         Laser rx power low alarm    (Channel 4)   : Off
         Laser rx power high warning (Channel 4)   : Off
         Laser rx power low warning  (Channel 4)   : Off
         Laser bias current high alarm threshold   : 0.000 mA
         Laser bias current low alarm threshold    : 0.000 mA
         Laser bias current high warning threshold : 0.000 mA
         Laser bias current low warning threshold  : 0.000 mA
         Laser output power high alarm threshold   : 0.0000 mW / -inf dBm
         Laser output power low alarm threshold    : 0.0000 mW / -inf dBm
         Laser output power high warning threshold : 0.0000 mW / -inf dBm
         Laser output power low warning threshold  : 0.0000 mW / -inf dBm
         Module temperature high alarm threshold   : 0.00 degrees C / 32.00 degrees F
         Module temperature low alarm threshold    : 0.00 degrees C / 32.00 degrees F
         Module temperature high warning threshold : 0.00 degrees C / 32.00 degrees F
         Module temperature low warning threshold  : 0.00 degrees C / 32.00 degrees F
         Module voltage high alarm threshold       : 0.0000 V
         Module voltage low alarm threshold        : 0.0000 V
         Module voltage high warning threshold     : 0.0000 V
         Module voltage low warning threshold      : 0.0000 V
         Laser rx power high alarm threshold       : 0.0000 mW / -inf dBm
         Laser rx power low alarm threshold        : 0.0000 mW / -inf dBm
         Laser rx power high warning threshold     : 0.0000 mW / -inf dBm
         Laser rx power low warning threshold      : 0.0000 mW / -inf dBm
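
For reference, the all-zero thresholds above are what you get when the two-byte threshold fields are decoded from 0x0000. Below is a minimal Python sketch, assuming the usual SFF-8636 units (temperature as a signed value in 1/256 degree C, voltage in 100 uV, bias current in 2 uA, optical power in 0.1 uW). The helper names are hypothetical and this is not the ethtool source.

```python
import math
import struct


def decode_temperature_c(raw: bytes) -> float:
    """Signed 16-bit big-endian value in units of 1/256 degree C (assumed)."""
    (value,) = struct.unpack(">h", raw)
    return value / 256.0


def decode_voltage_v(raw: bytes) -> float:
    """Unsigned 16-bit big-endian value in units of 100 uV (assumed)."""
    (value,) = struct.unpack(">H", raw)
    return value / 10000.0


def decode_bias_ma(raw: bytes) -> float:
    """Unsigned 16-bit big-endian value in units of 2 uA (assumed)."""
    (value,) = struct.unpack(">H", raw)
    return value * 2 / 1000.0


def decode_power_mw(raw: bytes) -> float:
    """Unsigned 16-bit big-endian value in units of 0.1 uW (assumed)."""
    (value,) = struct.unpack(">H", raw)
    return value / 10000.0


def mw_to_dbm(milliwatts: float) -> float:
    """Convert mW to dBm; 0 mW has no finite dBm value."""
    if milliwatts <= 0:
        return float("-inf")
    return 10.0 * math.log10(milliwatts)


# A threshold field that reads as two zero bytes decodes to zero in every unit.
zero = b"\x00\x00"
print(decode_temperature_c(zero), decode_voltage_v(zero),
      decode_bias_ma(zero), mw_to_dbm(decode_power_mw(zero)))
```

Under these assumptions, zeroed bytes give 0 degrees C, 0 V, 0 mA and 0 mW, and 0 mW has no finite dBm value, which matches the "-inf dBm" entries in the listing.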




This is parsed output after the fix:
         Identifier                                : 0x11 (QSFP28)
         Extended identifier                       : 0xcc
         Extended identifier description           : 3.5W max. Power consumption
         Extended identifier description           : CDR present in TX, CDR present in RX
         Extended identifier description           : High Power Class (> 3.5 W) not enabled
         Connector                                 : 0x07 (LC)
         Transceiver codes                         : 0x80 0x00 0x00 0x00 0x00 0x00 0x00 0x00
         Transceiver type                          : 100G Ethernet: 100G Base-LR4
         Encoding                                  : 0x05 (64B/66B)
         BR, Nominal                               : 25500Mbps
         Rate identifier                           : 0x00
         Length (SMF,km)                           : 10km
         Length (OM3 50um)                         : 0m
         Length (OM2 50um)                         : 0m
         Length (OM1 62.5um)                       : 0m
         Length (Copper or Active cable)           : 0m
         Transmitter technology                    : 0x40 (1310 nm DFB)
         Laser wavelength                          : 1302.350nm
         Laser wavelength tolerance                : 1.030nm
         Vendor name                               : Mellanox
         Vendor OUI                                : 00:02:c9
         Vendor PN                                 : MMA1L10-CR
         Vendor rev                                : A1
         Vendor SN                                 : MT1639DM00026
         Date code                                 : 160926
         Revision Compliance                       : SFF-8636 Rev 2.5/2.6/2.7
         Module temperature                        : 42.16 degrees C / 107.90 degrees F
         Module voltage                            : 3.2523 V
         Alarm/warning flags implemented           : No
         Laser tx bias current (Channel 1)         : 36.462 mA
         Laser tx bias current (Channel 2)         : 36.668 mA
         Laser tx bias current (Channel 3)         : 37.000 mA
         Laser tx bias current (Channel 4)         : 37.416 mA
         Transmit avg optical power (Channel 1)    : 1.4812 mW / 1.71 dBm
         Transmit avg optical power (Channel 2)    : 1.3940 mW / 1.44 dBm
         Transmit avg optical power (Channel 3)    : 1.4829 mW / 1.71 dBm
         Transmit avg optical power (Channel 4)    : 1.4866 mW / 1.72 dBm
         Rcvr signal avg optical power(Channel 1)  : 1.4518 mW / 1.62 dBm
         Rcvr signal avg optical power(Channel 2)  : 1.5938 mW / 2.02 dBm
         Rcvr signal avg optical power(Channel 3)  : 1.1211 mW / 0.50 dBm
         Rcvr signal avg optical power(Channel 4)  : 1.6378 mW / 2.14 dBm




Major diff:
* Alarm/warning flags implemented           : No
* No alarm data is presented.

The driver returns 256 bytes (it reads them correctly; I verified that
there are no overruns). However, the extra bytes are presented due to
this bug (ethtool expects to parse 640 bytes).
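
To make the 256 versus 640 byte distinction concrete, here is a small Python sketch. The two lengths come from this message. The linear page layout (lower page 0, then upper pages 0 to 3, 128 bytes each) is an assumption based on SFF-8636 and the kernel's ETH_MODULE_SFF_8636_LEN / ETH_MODULE_SFF_8636_MAX_LEN values, and the function names are hypothetical.

```python
from typing import List

PAGE_SIZE = 128

SFF8636_LEN = 256       # lower page 0 + upper page 0
SFF8636_MAX_LEN = 640   # ... plus upper pages 1, 2 and 3
PAGE3_OFFSET = 512      # assumed start of upper page 3 (thresholds) in the dump

PAGE_NAMES = [
    "lower page 0",
    "upper page 0",
    "upper page 1",
    "upper page 2",
    "upper page 3",
]


def pages_in_dump(dump_len: int) -> List[str]:
    """Return the names of the complete 128-byte pages in a dump of this length."""
    return PAGE_NAMES[: dump_len // PAGE_SIZE]


def thresholds_readable(dump_len: int) -> bool:
    """Thresholds can only be parsed if all of upper page 3 is in the dump."""
    return dump_len >= PAGE3_OFFSET + PAGE_SIZE


for length in (SFF8636_LEN, SFF8636_MAX_LEN):
    print(length, pages_in_dump(length), thresholds_readable(length))
```

With a 256-byte dump, any parser that reads at offset 512 and beyond is looking past the data the driver actually returned.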

Do you see another bug here? Am I missing something?

Eran


> 
> Thanks
> 	Andrew
> 


* Re: bug: 'ethtool -m' reports spurious alarm & warning threshold values for QSFP28 transceivers
  2018-09-27 15:20       ` Eran Ben Elisha
@ 2018-09-27 15:32         ` Andrew Lunn
  2018-09-27 16:08           ` Chris Preimesberger
  2018-10-02  7:10           ` Eran Ben Elisha
  0 siblings, 2 replies; 16+ messages in thread
From: Andrew Lunn @ 2018-09-27 15:32 UTC (permalink / raw)
  To: Eran Ben Elisha
  Cc: Neil Horman, Chris Preimesberger, linville@tuxdriver.com,
	netdev@vger.kernel.org

> Driver return 256 bytes (reading it correctly, I verified it, no overruns),
> however the extra bytes are presented due to this bug (expecting to parse
> 640 bytes).
> 
> Do you see another bug here? Am I missing something?

Hi Erin

Please could you try ethtool -m raw on so you get a binary dump.  The
file which Chris provided had more bytes in it than 256.

Thanks
	Andrew


* RE: bug: 'ethtool -m' reports spurious alarm & warning threshold values for QSFP28 transceivers
  2018-09-27 15:32         ` Andrew Lunn
@ 2018-09-27 16:08           ` Chris Preimesberger
  2018-09-27 16:38             ` Andrew Lunn
  2018-10-02  7:10           ` Eran Ben Elisha
  1 sibling, 1 reply; 16+ messages in thread
From: Chris Preimesberger @ 2018-09-27 16:08 UTC (permalink / raw)
  To: Andrew Lunn, Eran Ben Elisha
  Cc: Neil Horman, linville@tuxdriver.com, netdev@vger.kernel.org

Please correct me if I'm wrong, but...
It looks like Eran's proposed fix would remove all warning and
alarm indications from ethtool's output. It's worth mentioning
that for me, the following fields always reported correctly
as Off while no alarm condition was present
and On while alarm condition(s) were present
*per the QSFP's true/programmed threshold values*
*not per the incorrectly reported threshold values*

         Laser bias current high alarm   (Chan 1)  : Off
         Laser bias current low alarm    (Chan 1)  : Off
         Laser bias current high warning (Chan 1)  : Off
         Laser bias current low warning  (Chan 1)  : Off
         Laser bias current high alarm   (Chan 2)  : Off
         Laser bias current low alarm    (Chan 2)  : Off
         Laser bias current high warning (Chan 2)  : Off
         Laser bias current low warning  (Chan 2)  : Off
         Laser bias current high alarm   (Chan 3)  : Off
         Laser bias current low alarm    (Chan 3)  : Off
         Laser bias current high warning (Chan 3)  : Off
         Laser bias current low warning  (Chan 3)  : Off
         Laser bias current high alarm   (Chan 4)  : Off
         Laser bias current low alarm    (Chan 4)  : Off
         Laser bias current high warning (Chan 4)  : Off
         Laser bias current low warning  (Chan 4)  : Off
         Module temperature high alarm             : Off
         Module temperature low alarm              : Off
         Module temperature high warning           : Off
         Module temperature low warning            : Off
         Module voltage high alarm                 : Off
         Module voltage low alarm                  : Off
         Module voltage high warning               : Off
         Module voltage low warning                : Off
         Laser tx power high alarm   (Channel 1)   : Off
         Laser tx power low alarm    (Channel 1)   : Off
         Laser tx power high warning (Channel 1)   : Off
         Laser tx power low warning  (Channel 1)   : Off
         Laser tx power high alarm   (Channel 2)   : Off
         Laser tx power low alarm    (Channel 2)   : Off
         Laser tx power high warning (Channel 2)   : Off
         Laser tx power low warning  (Channel 2)   : Off
         Laser tx power high alarm   (Channel 3)   : Off
         Laser tx power low alarm    (Channel 3)   : Off
         Laser tx power high warning (Channel 3)   : Off
         Laser tx power low warning  (Channel 3)   : Off
         Laser tx power high alarm   (Channel 4)   : Off
         Laser tx power low alarm    (Channel 4)   : Off
         Laser tx power high warning (Channel 4)   : Off
         Laser tx power low warning  (Channel 4)   : Off
         Laser rx power high alarm   (Channel 1)   : Off
         Laser rx power low alarm    (Channel 1)   : Off
         Laser rx power high warning (Channel 1)   : Off
         Laser rx power low warning  (Channel 1)   : Off
         Laser rx power high alarm   (Channel 2)   : Off
         Laser rx power low alarm    (Channel 2)   : Off
         Laser rx power high warning (Channel 2)   : Off
         Laser rx power low warning  (Channel 2)   : Off
         Laser rx power high alarm   (Channel 3)   : Off
         Laser rx power low alarm    (Channel 3)   : Off
         Laser rx power high warning (Channel 3)   : Off
         Laser rx power low warning  (Channel 3)   : Off
         Laser rx power high alarm   (Channel 4)   : Off
         Laser rx power low alarm    (Channel 4)   : Off
         Laser rx power high warning (Channel 4)   : Off
         Laser rx power low warning  (Channel 4)   : Off


I would like to request that any fix keeps the above information
included in the ethtool -m output because it is working and valuable.

The only values that report incorrectly can be seen
by issuing the command:
ethtool -m interfaceXXX | grep  threshold

Ideally, any fix would display the thresholds correctly
instead of omitting them.


Thank you and best regards,
Chris


* Re: bug: 'ethtool -m' reports spurious alarm & warning threshold values for QSFP28 transceivers
  2018-09-27 16:08           ` Chris Preimesberger
@ 2018-09-27 16:38             ` Andrew Lunn
  2018-09-27 18:56               ` Chris Preimesberger
  0 siblings, 1 reply; 16+ messages in thread
From: Andrew Lunn @ 2018-09-27 16:38 UTC (permalink / raw)
  To: Chris Preimesberger
  Cc: Eran Ben Elisha, Neil Horman, linville@tuxdriver.com,
	netdev@vger.kernel.org

On Thu, Sep 27, 2018 at 04:08:24PM +0000, Chris Preimesberger wrote:
> Please correct me if I'm wrong, but...
> It looks like Eran's proposed fix would remove all warning and
> alarm indications from ethtool's output. It's worth mentioning
> that for me, the following fields always reported correctly
> as Off while no alarm condition was present
> and On while alarm condition(s) were present
> *per the QSFP's true/programmed threshold values*
> *not per the incorrectly reported threshold values*

These alarm values are in the first page. So the information the
driver returns does contain this information. What is missing is the
thresholds, which are not provided by the driver.

But there is a comment in the code:

        /*
         * There is no clear identifier to signify the existence of
         * optical diagnostics similar to SFF-8472. So checking existence
         * of page 3, will provide the gurantee for existence of alarms
         * and thresholds
         * If pagging support exists, then supports_alarms is marked as 1
         */

These alarm values are optional. The spec says so. So in order to
decide if they are implemented, ethtool looks to see if the thresholds
are available. If there are thresholds, it makes sense the alarms are
implemented.

Unfortunately, the driver never returns the thresholds. So ethtool has
no real choice and won't display the alarms since it cannot determine
if they are valid.

In order to get alarms, the driver needs to be extended to return all
the pages.
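
The decision described above can be sketched as follows. This is a Python paraphrase for illustration, not the ethtool source. The status byte offset and the flat-memory bit are taken from SFF-8636 and should be treated as assumptions, as should the function name.

```python
STATUS_2_OFFSET = 2      # assumed: status byte in lower page 0
FLAT_MEM_BIT = 1 << 2    # assumed: set when the module has no paged memory
FULL_DUMP_LEN = 640      # dump length at which upper page 3 is included


def supports_alarms(dump: bytes) -> bool:
    """Report alarms only when the threshold page is really in the dump."""
    if len(dump) < FULL_DUMP_LEN:
        # The driver did not return page 3, so thresholds are unavailable.
        return False
    # Paging is implemented when the flat-memory bit is clear.
    return not (dump[STATUS_2_OFFSET] & FLAT_MEM_BIT)


print(supports_alarms(bytes(256)))   # short dump, as returned by the driver here
print(supports_alarms(bytes(640)))   # full dump with paging implemented
```

The length check is what keeps the parser from reading thresholds out of bytes the driver never filled in.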

    Andrew


* RE: bug: 'ethtool -m' reports spurious alarm & warning threshold values for QSFP28 transceivers
  2018-09-27 16:38             ` Andrew Lunn
@ 2018-09-27 18:56               ` Chris Preimesberger
  2018-09-27 20:17                 ` Chris Preimesberger
  0 siblings, 1 reply; 16+ messages in thread
From: Chris Preimesberger @ 2018-09-27 18:56 UTC (permalink / raw)
  To: 'Andrew Lunn', Eran Ben Elisha
  Cc: Neil Horman, linville@tuxdriver.com, netdev@vger.kernel.org

I greatly appreciate everyone's work on this.  Thank you to all.

I've had Mellanox support case # 00508027 open for this issue,
and just now requested an updated driver from them to resolve,
explaining that really smart ethtool developers figured out this
was due to the Mellanox driver not reporting thresholds to ethtool.

I intend to post back here for posterity if/when I get an updated
driver that fixes the issue.

Thanks again!!


Chris Preimesberger | Test & Validation Engineer
Transition Networks, Inc.

chrisp@transition.com
direct: +1.952.996.1509 | fax: +1.952.941.2322 | www.transition.com


-----Original Message-----
From: Andrew Lunn [mailto:andrew@lunn.ch] 
Sent: Thursday, September 27, 2018 11:38 AM
To: Chris Preimesberger
Cc: Eran Ben Elisha; Neil Horman; linville@tuxdriver.com; netdev@vger.kernel.org
Subject: Re: bug: 'ethtool -m' reports spurious alarm & warning threshold values for QSFP28 transceivers

On Thu, Sep 27, 2018 at 04:08:24PM +0000, Chris Preimesberger wrote:
> Please correct me if I'm wrong, but...
> It looks like Eran's proposed fix would remove all warning and alarm 
> indications from ethtool's output. It's worth mentioning that for me, 
> the following fields always reported correctly as Off while no alarm 
> condition was present and On while alarm condition(s) were present 
> *per the QSFP's true/programmed threshold values* *not per the 
> incorrectly reported threshold values*

These alarm values are in the first page. So the information the driver returns does contain this information. What is missing is the thresholds, which are not provided by the driver.

But there is a comment in the code:

        /*
         * There is no clear identifier to signify the existence of
         * optical diagnostics similar to SFF-8472. So checking existence
         * of page 3, will provide the gurantee for existence of alarms
         * and thresholds
         * If pagging support exists, then supports_alarms is marked as 1
         */

These alarm values are optional. The spec says so. So in order to decide if they are implemented, ethtool looks to see if the thresholds are available. If there are thresholds, it makes sense the alarms are implemented.

Unfortunately, the driver never returns the thresholds. So ethtool has no real choice and won't display the alarms since it cannot determine if they are valid.

In order to get alarms, the driver needs to be extended to return all the pages.

    Andrew


* RE: bug: 'ethtool -m' reports spurious alarm & warning threshold values for QSFP28 transceivers
  2018-09-27 18:56               ` Chris Preimesberger
@ 2018-09-27 20:17                 ` Chris Preimesberger
  0 siblings, 0 replies; 16+ messages in thread
From: Chris Preimesberger @ 2018-09-27 20:17 UTC (permalink / raw)
  To: 'Andrew Lunn', 'Eran Ben Elisha'
  Cc: 'Neil Horman', 'linville@tuxdriver.com',
	'netdev@vger.kernel.org'

Update for posterity-

Mellanox support provided a work-around of using mlxcables instead of
ethtool to read alarm/warning info for an installed transceiver.

I was told that a couple of their engineers are currently looking into the
discrepancy between threshold reporting by mlxcables and ethtool, and
that they are deciding what to do about it...

Work-around steps:
1. add a cable with "sudo mst cable add".
2. find the cable name with "sudo mlxcables".  The name of my cable is
   01:00.0_cable_0 so I copy that name for insertion into the next command.
3. probe the cable for DDM with "sudo mlxcables -d 01:00.0_cable_0 --DDM".


Example copied/pasted from my CLI here.
All reported thresholds appear to be correct.

tech1@D7:~$ 
tech1@D7:~$ 
tech1@D7:~$ sudo mst cable add
-I- Added 1 cable devices ..
tech1@D7:~$ sudo mlxcables
Querying Cables ....

Cable #1:
---------
Cable name    : 01:00.0_cable_0
>> No FW data to show
-------- Cable EEPROM --------
Identifier    : QSFP28 (11h)
Technology    : 850 nm VCSEL (00h)
Compliance    : Extended Specification Compliance is valid, 100GBASE-SR4 or 25GBASE-SR
Wavelength    : 850 nm
OUI           : 0x00c0f2
Vendor        : TRANSITION      
Serial number : TN02000263      
Part number   : TN-QSFP-100G-SR4
Revision      : 02
Temperature   : 34 C
Length        : 50 m

tech1@D7:~$ sudo mlxcables -d 01:00.0_cable_0 --DDM
Cable DDM:
----------
Temperature    : 34C
Voltage        : 3.2918V
Channel 1:
	RX Power : 0.1695dBm
	TX Power : 0.8622dBm
	TX Bias  : 7.0720mA
Channel 2:
	RX Power : 0.1355dBm
	TX Power : 1.1042dBm
	TX Bias  : 6.9240mA
Channel 3:
	RX Power : -0.1592dBm
	TX Power : 0.6547dBm
	TX Bias  : 6.9420mA
Channel 4:
	RX Power : -0.1300dBm
	TX Power : 0.4653dBm
	TX Bias  : 6.9120mA
----- Thresholds -----
Temperature:
	High Warning  : 70C
	Low  Warning  : 0C
	High Alarm    : 75C
	Low  Alarm    : -5C
	Warning mask  : 0
	Alarm mask    : 0
Voltage:
	High Warning : 3.4600V
	Low  Warning : 3.1300V
	High Alarm   : 3.6300V
	Low  Alarm   : 2.9700V
	Warning mask : 0
	Alarm mask   : 0
Channel 1:
	RX Power high warn   : 2.4000dBm
	RX Power low  warn   : -9.5001dBm
	RX Power high alarm  : 5.4103dBm
	RX Power low  alarm  : -12.5104dBm
	RX Power Warning mask: 0
	RX Power Alarm mask  : 0
	TX Power high warn   : 2.4000dBm
	TX Power low  warn   : -7.6020dBm
	TX Power high alarm  : 3.1917dBm
	TX Power low  alarm  : -8.5699dBm
	TX Power Warning mask: 0
	TX Power Alarm mask  : 0
	TX Bias high warn    : 12.0000mA
	TX Bias low  warn    : 2.0000mA
	TX Bias high alarm   : 15.0000mA
	TX Bias low  alarm   : 1.0000mA
	TX Bias Warning mask : 0
	TX Bias Alarm mask   : 0
Channel 2:
	RX Power high warn   : 2.4000dBm
	RX Power low  warn   : -9.5001dBm
	RX Power high alarm  : 5.4103dBm
	RX Power low  alarm  : -12.5104dBm
	RX Power Warning mask: 0
	RX Power Alarm mask  : 0
	TX Power high warn   : 2.4000dBm
	TX Power low  warn   : -7.6020dBm
	TX Power high alarm  : 3.1917dBm
	TX Power low  alarm  : -8.5699dBm
	TX Power Warning mask: 0
	TX Power Alarm mask  : 0
	TX Bias high warn    : 12.0000mA
	TX Bias low  warn    : 2.0000mA
	TX Bias high alarm   : 15.0000mA
	TX Bias low  alarm   : 1.0000mA
	TX Bias Warning mask : 0
	TX Bias Alarm mask   : 0
Channel 3:
	RX Power high warn   : 2.4000dBm
	RX Power low  warn   : -9.5001dBm
	RX Power high alarm  : 5.4103dBm
	RX Power low  alarm  : -12.5104dBm
	RX Power Warning mask: 0
	RX Power Alarm mask  : 0
	TX Power high warn   : 2.4000dBm
	TX Power low  warn   : -7.6020dBm
	TX Power high alarm  : 3.1917dBm
	TX Power low  alarm  : -8.5699dBm
	TX Power Warning mask: 0
	TX Power Alarm mask  : 0
	TX Bias high warn    : 12.0000mA
	TX Bias low  warn    : 2.0000mA
	TX Bias high alarm   : 15.0000mA
	TX Bias low  alarm   : 1.0000mA
	TX Bias Warning mask : 0
	TX Bias Alarm mask   : 0
Channel 4:
	RX Power high warn   : 2.4000dBm
	RX Power low  warn   : -9.5001dBm
	RX Power high alarm  : 5.4103dBm
	RX Power low  alarm  : -12.5104dBm
	RX Power Warning mask: 0
	RX Power Alarm mask  : 0
	TX Power high warn   : 2.4000dBm
	TX Power low  warn   : -7.6020dBm
	TX Power high alarm  : 3.1917dBm
	TX Power low  alarm  : -8.5699dBm
	TX Power Warning mask: 0
	TX Power Alarm mask  : 0
	TX Bias high warn    : 12.0000mA
	TX Bias low  warn    : 2.0000mA
	TX Bias high alarm   : 15.0000mA
	TX Bias low  alarm   : 1.0000mA
	TX Bias Warning mask : 0
	TX Bias Alarm mask   : 0
tech1@D7:~$ 
tech1@D7:~$ 
tech1@D7:~$ 
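
As a sanity check on the values above, a reading can be compared against its four thresholds. A minimal Python sketch, assuming a plain greater-than / less-than comparison; the class and function names are hypothetical, and the module itself is what actually raises the flags.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Thresholds:
    high_alarm: float
    low_alarm: float
    high_warn: float
    low_warn: float


def flags(reading: float, t: Thresholds) -> Dict[str, bool]:
    """Return which alarm/warning conditions a reading would trip."""
    return {
        "high alarm": reading > t.high_alarm,
        "low alarm": reading < t.low_alarm,
        "high warning": reading > t.high_warn,
        "low warning": reading < t.low_warn,
    }


# Channel 1 RX power thresholds copied from the mlxcables output (dBm).
rx_power = Thresholds(high_alarm=5.4103, low_alarm=-12.5104,
                      high_warn=2.4000, low_warn=-9.5001)

# Channel 1 RX power reading from the same output.
for name, raised in flags(0.1695, rx_power).items():
    print(f"Laser rx power {name} (Channel 1): {'On' if raised else 'Off'}")
```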



Chris Preimesberger


* Re: bug: 'ethtool -m' reports spurious alarm & warning threshold values for QSFP28 transceivers
  2018-09-27 15:32         ` Andrew Lunn
  2018-09-27 16:08           ` Chris Preimesberger
@ 2018-10-02  7:10           ` Eran Ben Elisha
  1 sibling, 0 replies; 16+ messages in thread
From: Eran Ben Elisha @ 2018-10-02  7:10 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Eran Ben Elisha, nhorman, chrisp, John W. Linville,
	Linux Netdev List

On Thu, Sep 27, 2018 at 6:34 PM Andrew Lunn <andrew@lunn.ch> wrote:
>
> > Driver return 256 bytes (reading it correctly, I verified it, no overruns),
> > however the extra bytes are presented due to this bug (expecting to parse
> > 640 bytes).
> >
> > Do you see another bug here? Am I missing something?
>
> Hi Erin
Eran...
>
> Please could you try ethtool -m raw on so you get a binary dump.  The
> file which Chris provided had more bytes in it than 256.

I ran '-m raw on' on QSFP28.
File size is 256 bytes.
(with and without my suggested patch...)
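
For anyone repeating this check, the size and identifier of a raw dump can be inspected with a few lines of Python. This is only a sketch: the function name is hypothetical, the identifier table lists just three SFF-8024 codes, and the demo uses synthetic bytes rather than a real dump.

```python
# Subset of SFF-8024 identifier codes (byte 0 of the dump).
IDENTIFIERS = {0x0C: "QSFP", 0x0D: "QSFP+", 0x11: "QSFP28"}


def describe_dump(data: bytes) -> str:
    """Summarise a raw module EEPROM dump: length, identifier, page count."""
    if not data:
        return "empty dump"
    name = IDENTIFIERS.get(data[0], "unknown")
    pages = len(data) // 128
    return (f"{len(data)} bytes, identifier 0x{data[0]:02x} ({name}), "
            f"{pages} page(s) of 128 bytes")


# Synthetic 256-byte dump whose first byte is the QSFP28 identifier.
print(describe_dump(bytes([0x11]) + bytes(255)))
```

To run it on real data, redirect the 'raw on' output to a file, read that file in binary mode, and pass the bytes to describe_dump().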

Eran

>
> Thanks
>         Andrew


end of thread, other threads:[~2018-10-02 13:52 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-09-26 19:29 bug: 'ethtool -m' reports spurious alarm & warning threshold values for QSFP28 transceivers Chris Preimesberger
2018-09-26 19:44 ` Andrew Lunn
2018-09-26 20:47   ` Chris Preimesberger
2018-09-26 21:46     ` Andrew Lunn
2018-09-26 21:34 ` Neil Horman
2018-09-26 21:58   ` Andrew Lunn
2018-09-27 13:23     ` Neil Horman
2018-09-27 13:25   ` Eran Ben Elisha
2018-09-27 14:52     ` Andrew Lunn
2018-09-27 15:20       ` Eran Ben Elisha
2018-09-27 15:32         ` Andrew Lunn
2018-09-27 16:08           ` Chris Preimesberger
2018-09-27 16:38             ` Andrew Lunn
2018-09-27 18:56               ` Chris Preimesberger
2018-09-27 20:17                 ` Chris Preimesberger
2018-10-02  7:10           ` Eran Ben Elisha
