* RE: bug: 'ethtool -m' reports spurious alarm & warning threshold values for QSFP28 transceivers
From: Chris Preimesberger @ 2018-09-26 19:29 UTC
To: linville@tuxdriver.com, netdev@vger.kernel.org
[-- Attachment #1: Type: text/plain, Size: 24434 bytes --]
Hello,
I'm re-sending this in plain text in response to an auto-reply from a spam filter. This time I have also attached some text files that explain the situation described below, in case this email's font and formatting are now too garbled to follow easily.
Thank you and best regards.
Chris Preimesberger | Test & Validation Engineer
Transition Networks, Inc.
chrisp@transition.com
direct: +1.952.996.1509 | fax: +1.952.941.2322 | www.transition.com
________________________________________
From: Chris Preimesberger
Sent: Wednesday, September 26, 2018 2:14 PM
To: 'linville@tuxdriver.com'; 'netdev@vger.kernel.org'
Subject: bug: 'ethtool -m' reports spurious alarm & warning threshold values for QSFP28 transceivers
Hello John, All,
I think I may have found a bug or two in ethtool, with respect to its reporting of a QSFP28 transceiver's diagnostic information. Ethtool seems to correctly report all diagnostic information about QSFP28 transceivers, except for the transceiver's warning and alarm thresholds. I'm not sure whether the spurious warning and alarm values that get reported are the fault of ethtool or my NIC/driver, and I have no other models of 100GbE NICs to test with. I've contacted Mellanox support about this, and they point the finger at ethtool. Can these issues be investigated by ethtool developers? Here is some background information about the equipment and software used when I observe these issues:
Equipment used:
NIC: Mellanox ConnectX-4 100GbE, part number MCX415A-CCAT
Transceiver: Any 40Gb or 100Gb QSFP28 transceiver installed in the NIC (Intel, Mellanox, Transition Networks, etc.)
Software used:
Ubuntu 18.04 with the distro's packaged NIC driver and ethtool v4.15
Also tested: ethtool v4.18 compiled from source, and the current Mellanox OFED driver.
All test scenarios produced the same bugs.
Bug #1. The alarm and warning thresholds that ethtool reports for the installed transceiver differ depending on whether or not ethtool's output is piped to another command. Example commands are below; the differing values are in the "threshold" lines at the end of each output:
tech1@D8:~$ sudo ethtool -m enp1s0
Identifier : 0x11 (QSFP28)
Extended identifier : 0xfc
Extended identifier description : 3.5W max. Power consumption
Extended identifier description : CDR present in TX, CDR present in RX
Extended identifier description : High Power Class (> 3.5 W) not enabled
Connector : 0x07 (LC)
Transceiver codes : 0x80 0x00 0x00 0x00 0x00 0x00 0x00 0x00
Transceiver type : 100G Ethernet: 100G CWDM4 MSA with FEC
Encoding : 0x03 (NRZ)
BR, Nominal : 25500Mbps
Rate identifier : 0x00
Length (SMF,km) : 2km
Length (OM3 50um) : 0m
Length (OM2 50um) : 0m
Length (OM1 62.5um) : 0m
Length (Copper or Active cable) : 0m
Transmitter technology : 0x40 (1310 nm DFB)
Laser wavelength : 1310.000nm
Laser wavelength tolerance : 47.500nm
Vendor name : TRANSITION
Vendor OUI : 00:c0:f2
Vendor PN : TNQSFP100GCWDM4
Vendor rev : 1A
Vendor SN : TN02000302
Date code : 180919
Revision Compliance : SFF-8636 Rev 2.5/2.6/2.7
Module temperature : 39.53 degrees C / 103.15 degrees F
Module voltage : 3.3241 V
Alarm/warning flags implemented : Yes
Laser tx bias current (Channel 1) : 34.432 mA
Laser tx bias current (Channel 2) : 34.432 mA
Laser tx bias current (Channel 3) : 33.408 mA
Laser tx bias current (Channel 4) : 33.920 mA
Transmit avg optical power (Channel 1) : 0.9048 mW / -0.43 dBm
Transmit avg optical power (Channel 2) : 0.7832 mW / -1.06 dBm
Transmit avg optical power (Channel 3) : 0.8057 mW / -0.94 dBm
Transmit avg optical power (Channel 4) : 0.7014 mW / -1.54 dBm
Rcvr signal avg optical power(Channel 1) : 0.7378 mW / -1.32 dBm
Rcvr signal avg optical power(Channel 2) : 0.7553 mW / -1.22 dBm
Rcvr signal avg optical power(Channel 3) : 0.6529 mW / -1.85 dBm
Rcvr signal avg optical power(Channel 4) : 0.6847 mW / -1.64 dBm
Laser bias current high alarm (Chan 1) : Off
Laser bias current low alarm (Chan 1) : Off
Laser bias current high warning (Chan 1) : Off
Laser bias current low warning (Chan 1) : Off
Laser bias current high alarm (Chan 2) : Off
Laser bias current low alarm (Chan 2) : Off
Laser bias current high warning (Chan 2) : Off
Laser bias current low warning (Chan 2) : Off
Laser bias current high alarm (Chan 3) : Off
Laser bias current low alarm (Chan 3) : Off
Laser bias current high warning (Chan 3) : Off
Laser bias current low warning (Chan 3) : Off
Laser bias current high alarm (Chan 4) : Off
Laser bias current low alarm (Chan 4) : Off
Laser bias current high warning (Chan 4) : Off
Laser bias current low warning (Chan 4) : Off
Module temperature high alarm : Off
Module temperature low alarm : Off
Module temperature high warning : Off
Module temperature low warning : Off
Module voltage high alarm : Off
Module voltage low alarm : Off
Module voltage high warning : Off
Module voltage low warning : Off
Laser tx power high alarm (Channel 1) : Off
Laser tx power low alarm (Channel 1) : Off
Laser tx power high warning (Channel 1) : Off
Laser tx power low warning (Channel 1) : Off
Laser tx power high alarm (Channel 2) : Off
Laser tx power low alarm (Channel 2) : Off
Laser tx power high warning (Channel 2) : Off
Laser tx power low warning (Channel 2) : Off
Laser tx power high alarm (Channel 3) : Off
Laser tx power low alarm (Channel 3) : Off
Laser tx power high warning (Channel 3) : Off
Laser tx power low warning (Channel 3) : Off
Laser tx power high alarm (Channel 4) : Off
Laser tx power low alarm (Channel 4) : Off
Laser tx power high warning (Channel 4) : Off
Laser tx power low warning (Channel 4) : Off
Laser rx power high alarm (Channel 1) : Off
Laser rx power low alarm (Channel 1) : Off
Laser rx power high warning (Channel 1) : Off
Laser rx power low warning (Channel 1) : Off
Laser rx power high alarm (Channel 2) : Off
Laser rx power low alarm (Channel 2) : Off
Laser rx power high warning (Channel 2) : Off
Laser rx power low warning (Channel 2) : Off
Laser rx power high alarm (Channel 3) : Off
Laser rx power low alarm (Channel 3) : Off
Laser rx power high warning (Channel 3) : Off
Laser rx power low warning (Channel 3) : Off
Laser rx power high alarm (Channel 4) : Off
Laser rx power low alarm (Channel 4) : Off
Laser rx power high warning (Channel 4) : Off
Laser rx power low warning (Channel 4) : Off
Laser bias current high alarm threshold : 0.000 mA
Laser bias current low alarm threshold : 0.000 mA
Laser bias current high warning threshold : 0.000 mA
Laser bias current low warning threshold : 0.000 mA
Laser output power high alarm threshold : 0.0000 mW / -inf dBm
Laser output power low alarm threshold : 0.0000 mW / -inf dBm
Laser output power high warning threshold : 0.0000 mW / -inf dBm
Laser output power low warning threshold : 0.0000 mW / -inf dBm
Module temperature high alarm threshold : 0.00 degrees C / 32.00 degrees F
Module temperature low alarm threshold : 0.00 degrees C / 32.00 degrees F
Module temperature high warning threshold : 0.00 degrees C / 32.00 degrees F
Module temperature low warning threshold : 0.00 degrees C / 32.00 degrees F
Module voltage high alarm threshold : 0.0000 V
Module voltage low alarm threshold : 0.0000 V
Module voltage high warning threshold : 0.0000 V
Module voltage low warning threshold : 0.0000 V
Laser rx power high alarm threshold : 0.0000 mW / -inf dBm
Laser rx power low alarm threshold : 0.0000 mW / -inf dBm
Laser rx power high warning threshold : 0.0000 mW / -inf dBm
Laser rx power low warning threshold : 0.0000 mW / -inf dBm
tech1@D8:~$ sudo ethtool -m enp1s0 | cat
Identifier : 0x11 (QSFP28)
Extended identifier : 0xfc
Extended identifier description : 3.5W max. Power consumption
Extended identifier description : CDR present in TX, CDR present in RX
Extended identifier description : High Power Class (> 3.5 W) not enabled
Connector : 0x07 (LC)
Transceiver codes : 0x80 0x00 0x00 0x00 0x00 0x00 0x00 0x00
Transceiver type : 100G Ethernet: 100G CWDM4 MSA with FEC
Encoding : 0x03 (NRZ)
BR, Nominal : 25500Mbps
Rate identifier : 0x00
Length (SMF,km) : 2km
Length (OM3 50um) : 0m
Length (OM2 50um) : 0m
Length (OM1 62.5um) : 0m
Length (Copper or Active cable) : 0m
Transmitter technology : 0x40 (1310 nm DFB)
Laser wavelength : 1310.000nm
Laser wavelength tolerance : 47.500nm
Vendor name : TRANSITION
Vendor OUI : 00:c0:f2
Vendor PN : TNQSFP100GCWDM4
Vendor rev : 1A
Vendor SN : TN02000302
Date code : 180919
Revision Compliance : SFF-8636 Rev 2.5/2.6/2.7
Module temperature : 39.53 degrees C / 103.15 degrees F
Module voltage : 3.3249 V
Alarm/warning flags implemented : Yes
Laser tx bias current (Channel 1) : 34.432 mA
Laser tx bias current (Channel 2) : 34.432 mA
Laser tx bias current (Channel 3) : 33.408 mA
Laser tx bias current (Channel 4) : 33.920 mA
Transmit avg optical power (Channel 1) : 0.9043 mW / -0.44 dBm
Transmit avg optical power (Channel 2) : 0.7832 mW / -1.06 dBm
Transmit avg optical power (Channel 3) : 0.8057 mW / -0.94 dBm
Transmit avg optical power (Channel 4) : 0.7009 mW / -1.54 dBm
Rcvr signal avg optical power(Channel 1) : 0.7378 mW / -1.32 dBm
Rcvr signal avg optical power(Channel 2) : 0.7553 mW / -1.22 dBm
Rcvr signal avg optical power(Channel 3) : 0.6529 mW / -1.85 dBm
Rcvr signal avg optical power(Channel 4) : 0.6847 mW / -1.64 dBm
Laser bias current high alarm (Chan 1) : Off
Laser bias current low alarm (Chan 1) : Off
Laser bias current high warning (Chan 1) : Off
Laser bias current low warning (Chan 1) : Off
Laser bias current high alarm (Chan 2) : Off
Laser bias current low alarm (Chan 2) : Off
Laser bias current high warning (Chan 2) : Off
Laser bias current low warning (Chan 2) : Off
Laser bias current high alarm (Chan 3) : Off
Laser bias current low alarm (Chan 3) : Off
Laser bias current high warning (Chan 3) : Off
Laser bias current low warning (Chan 3) : Off
Laser bias current high alarm (Chan 4) : Off
Laser bias current low alarm (Chan 4) : Off
Laser bias current high warning (Chan 4) : Off
Laser bias current low warning (Chan 4) : Off
Module temperature high alarm : Off
Module temperature low alarm : Off
Module temperature high warning : Off
Module temperature low warning : Off
Module voltage high alarm : Off
Module voltage low alarm : Off
Module voltage high warning : Off
Module voltage low warning : Off
Laser tx power high alarm (Channel 1) : Off
Laser tx power low alarm (Channel 1) : Off
Laser tx power high warning (Channel 1) : Off
Laser tx power low warning (Channel 1) : Off
Laser tx power high alarm (Channel 2) : Off
Laser tx power low alarm (Channel 2) : Off
Laser tx power high warning (Channel 2) : Off
Laser tx power low warning (Channel 2) : Off
Laser tx power high alarm (Channel 3) : Off
Laser tx power low alarm (Channel 3) : Off
Laser tx power high warning (Channel 3) : Off
Laser tx power low warning (Channel 3) : Off
Laser tx power high alarm (Channel 4) : Off
Laser tx power low alarm (Channel 4) : Off
Laser tx power high warning (Channel 4) : Off
Laser tx power low warning (Channel 4) : Off
Laser rx power high alarm (Channel 1) : Off
Laser rx power low alarm (Channel 1) : Off
Laser rx power high warning (Channel 1) : Off
Laser rx power low warning (Channel 1) : Off
Laser rx power high alarm (Channel 2) : Off
Laser rx power low alarm (Channel 2) : Off
Laser rx power high warning (Channel 2) : Off
Laser rx power low warning (Channel 2) : Off
Laser rx power high alarm (Channel 3) : Off
Laser rx power low alarm (Channel 3) : Off
Laser rx power high warning (Channel 3) : Off
Laser rx power low warning (Channel 3) : Off
Laser rx power high alarm (Channel 4) : Off
Laser rx power low alarm (Channel 4) : Off
Laser rx power high warning (Channel 4) : Off
Laser rx power low warning (Channel 4) : Off
Laser bias current high alarm threshold : 16.448 mA
Laser bias current low alarm threshold : 16.448 mA
Laser bias current high warning threshold : 16.448 mA
Laser bias current low warning threshold : 16.448 mA
Laser output power high alarm threshold : 0.8224 mW / -0.85 dBm
Laser output power low alarm threshold : 0.8250 mW / -0.84 dBm
Laser output power high warning threshold : 0.8264 mW / -0.83 dBm
Laser output power low warning threshold : 2.6983 mW / 4.31 dBm
Module temperature high alarm threshold : 110.12 degrees C / 230.22 degrees F
Module temperature low alarm threshold : 84.34 degrees C / 183.82 degrees F
Module temperature high warning threshold : 44.12 degrees C / 111.42 degrees F
Module temperature low warning threshold : 67.27 degrees C / 153.08 degrees F
Module voltage high alarm threshold : 2.9728 V
Module voltage low alarm threshold : 2.6990 V
Module voltage high warning threshold : 0.8274 V
Module voltage low warning threshold : 2.2538 V
Laser rx power high alarm threshold : 2.5458 mW / 4.06 dBm
Laser rx power low alarm threshold : 2.6992 mW / 4.31 dBm
Laser rx power high warning threshold : 2.9801 mW / 4.74 dBm
Laser rx power low warning threshold : 2.8526 mW / 4.55 dBm
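Since the two runs above differ only in their threshold lines, a small comparison helper can make the difference explicit. The sketch below is an illustrative aid, not part of ethtool; the function names `threshold_fields` and `diff_thresholds` are hypothetical. It assumes both outputs have been saved as text, with the terminal run copied from the screen, because redirecting that run to a file would mean ethtool no longer writes to a terminal, which is exactly the condition being compared.

```python
def threshold_fields(text: str) -> dict:
    """Map each '... threshold' label in `ethtool -m` text output to its value."""
    fields = {}
    for line in text.splitlines():
        # The labels in this output contain no ':', but values can (e.g. the
        # vendor OUI 00:c0:f2), so split only on the first ':'.
        label, sep, value = line.partition(":")
        if sep and "threshold" in label:
            fields[label.strip()] = value.strip()
    return fields


def diff_thresholds(run_a: str, run_b: str) -> dict:
    """Return {label: (value_in_a, value_in_b)} for threshold lines that differ."""
    a, b = threshold_fields(run_a), threshold_fields(run_b)
    return {
        label: (a.get(label), b.get(label))
        for label in sorted(a.keys() | b.keys())
        if a.get(label) != b.get(label)
    }


if __name__ == "__main__":
    # Two-line excerpts of the outputs above; paste the full runs in practice.
    tty_run = "Module voltage high alarm threshold : 0.0000 V\nVendor OUI : 00:c0:f2"
    piped_run = "Module voltage high alarm threshold : 2.9728 V\nVendor OUI : 00:c0:f2"
    for label, (tty_val, piped_val) in diff_thresholds(tty_run, piped_run).items():
        print(f"{label}: terminal={tty_val!r} piped={piped_val!r}")
```

Non-threshold lines are ignored on purpose, since live readings such as module voltage legitimately change between runs.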
Bug #2. All of the alarm and warning threshold values reported by the above commands are spurious.
At first glance, one might assume that the threshold values reported by the piped ethtool command are correct, but they are not. Because I know the values programmed into this transceiver, the spurious values are easy for me to spot; but even without knowing a transceiver's programmed values, one can use simple logic to detect when the values ethtool displays don't make sense.
For example, let's examine the voltage warning and alarm thresholds that ethtool reports for this transceiver. We'll compare each voltage threshold against the other voltage thresholds and look for contradictions that show whether the reported values are plausible.
                          Known actual   ethtool reported
                          value          value
High Voltage Alarm        3.70 V         2.9728 V
High Voltage Warning      3.59 V         0.8274 V
(Operating spec = 3.30 V)
Low Voltage Warning       3.00 V         2.2538 V
Low Voltage Alarm         2.90 V         2.6990 V
Contradictions in the ethtool-reported voltage warning and alarm thresholds:
1. The high voltage alarm should be above the 3.30 V operating voltage, but ethtool reports 2.9728 V.
2. The high voltage warning should be above both the low voltage warning and the low voltage alarm, but ethtool reports 0.8274 V, which is below both.
3. The low voltage warning should be above the low voltage alarm, but ethtool reports a warning of 2.2538 V and an alarm of 2.6990 V.
4. The low voltage alarm should be lower than every other voltage threshold, but ethtool reports it (2.6990 V) above both the low voltage warning and the high voltage warning.
5. The current voltage was reported as 3.3249 V, which, according to the reported thresholds, exceeds both the high voltage warning and the high voltage alarm; yet no warning or alarm flags are set.
Each of the 4 voltage thresholds reported by ethtool has contradictions, so we know something is not right. The same kind of logic can be applied to the thresholds for temperature, laser TX power, etc., to show that those values are also spurious.
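The consistency checks listed above can be automated. The following is a minimal sketch, assuming the four thresholds of one quantity are held in a dict; the function name `check_thresholds` and the key names are hypothetical, and the checks follow the spirit of the list rather than mapping one-to-one onto its five items (for instance, the nominal 3.30 V is tested against the warning band).

```python
from itertools import combinations

# Threshold names in the order their values must strictly increase.
ORDER = ("low_alarm", "low_warning", "high_warning", "high_alarm")


def check_thresholds(th: dict, nominal: float, reading: float, flag_set: bool) -> list:
    """Return human-readable contradictions; an empty list means none were found."""
    problems = []
    # Every pair must satisfy: low alarm < low warning < high warning < high alarm.
    for lower, upper in combinations(ORDER, 2):
        if not th[lower] < th[upper]:
            problems.append(f"{lower} ({th[lower]}) is not below {upper} ({th[upper]})")
    # The nominal operating value should sit inside the warning band.
    if not th["low_warning"] < nominal < th["high_warning"]:
        problems.append(f"nominal value {nominal} is outside the warning band")
    # A live reading outside the warning band should have raised a flag.
    outside = reading < th["low_warning"] or reading > th["high_warning"]
    if outside and not flag_set:
        problems.append(f"reading {reading} is outside the warning band, but no flag is set")
    return problems


if __name__ == "__main__":
    ethtool_piped = {"high_alarm": 2.9728, "high_warning": 0.8274,
                     "low_warning": 2.2538, "low_alarm": 2.6990}
    known_actual = {"high_alarm": 3.70, "high_warning": 3.59,
                    "low_warning": 3.00, "low_alarm": 2.90}
    for name, th in (("ethtool (piped)", ethtool_piped), ("known actual", known_actual)):
        # 3.3249 V is the reading from the piped run; all voltage flags were Off.
        for problem in check_thresholds(th, nominal=3.30, reading=3.3249, flag_set=False):
            print(f"{name}: {problem}")
```

With the piped ethtool values this reports five problems, while the known actual values pass every check.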
Installing the above transceiver in a Cisco switch shows that the switch correctly retrieves the true warning and alarm threshold values from the transceiver's EEPROM, so we trust that the transceiver has been programmed correctly. The Cisco CLI output for that transceiver is shown here:
switch# show interface ethernet 1/3 transceiver details
Ethernet1/3
transceiver is present
type is QSFP-100G-CWDM4-MSA-FEC
name is TRANSITION
part number is TNQSFP100GCWDM4
revision is 1A
serial number is TN02000302
nominal bitrate is 25500 MBit/sec per channel
Link length supported for 9/125um fiber is 2 km
cisco id is 17
cisco extended id number is 252
Lane Number:1 Network Lane
SFP Detail Diagnostics Information (internal calibration)
----------------------------------------------------------------------------
                Current         Alarms                  Warnings
                Measurement     High        Low         High        Low
----------------------------------------------------------------------------
Temperature     38.08 C         80.00 C     -10.00 C    75.00 C     -5.00 C
Voltage         3.34 V          3.70 V      2.90 V      3.59 V      3.00 V
Current         34.24 mA        75.00 mA    10.00 mA    70.00 mA    15.00 mA
Tx Power        -0.44 dBm       4.49 dBm    -8.50 dBm   3.49 dBm    -7.52 dBm
Rx Power        N/A             4.49 dBm    -14.55 dBm  3.49 dBm    -12.51 dBm
Transmit Fault Count = 0
----------------------------------------------------------------------------
Note: ++ high-alarm; + high-warning; -- low-alarm; - low-warning
Lane Number:2 Network Lane
SFP Detail Diagnostics Information (internal calibration)
----------------------------------------------------------------------------
                Current         Alarms                  Warnings
                Measurement     High        Low         High        Low
----------------------------------------------------------------------------
Temperature     38.08 C         80.00 C     -10.00 C    75.00 C     -5.00 C
Voltage         3.34 V          3.70 V      2.90 V      3.59 V      3.00 V
Current         34.24 mA        75.00 mA    10.00 mA    70.00 mA    15.00 mA
Tx Power        -1.20 dBm       4.49 dBm    -8.50 dBm   3.49 dBm    -7.52 dBm
Rx Power        N/A             4.49 dBm    -14.55 dBm  3.49 dBm    -12.51 dBm
Transmit Fault Count = 0
----------------------------------------------------------------------------
Note: ++ high-alarm; + high-warning; -- low-alarm; - low-warning
Lane Number:3 Network Lane
SFP Detail Diagnostics Information (internal calibration)
----------------------------------------------------------------------------
                Current         Alarms                  Warnings
                Measurement     High        Low         High        Low
----------------------------------------------------------------------------
Temperature     38.08 C         80.00 C     -10.00 C    75.00 C     -5.00 C
Voltage         3.34 V          3.70 V      2.90 V      3.59 V      3.00 V
Current         33.21 mA        75.00 mA    10.00 mA    70.00 mA    15.00 mA
Tx Power        -0.96 dBm       4.49 dBm    -8.50 dBm   3.49 dBm    -7.52 dBm
Rx Power        N/A             4.49 dBm    -14.55 dBm  3.49 dBm    -12.51 dBm
Transmit Fault Count = 0
----------------------------------------------------------------------------
Note: ++ high-alarm; + high-warning; -- low-alarm; - low-warning
Lane Number:4 Network Lane
SFP Detail Diagnostics Information (internal calibration)
----------------------------------------------------------------------------
                Current         Alarms                  Warnings
                Measurement     High        Low         High        Low
----------------------------------------------------------------------------
Temperature     38.08 C         80.00 C     -10.00 C    75.00 C     -5.00 C
Voltage         3.34 V          3.70 V      2.90 V      3.59 V      3.00 V
Current         33.72 mA        75.00 mA    10.00 mA    70.00 mA    15.00 mA
Tx Power        -1.59 dBm       4.49 dBm    -8.50 dBm   3.49 dBm    -7.52 dBm
Rx Power        N/A             4.49 dBm    -14.55 dBm  3.49 dBm    -12.51 dBm
Transmit Fault Count = 0
----------------------------------------------------------------------------
Note: ++ high-alarm; + high-warning; -- low-alarm; - low-warning
switch#
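To compare the Cisco readings above with a raw dump of the module EEPROM (for example, from `ethtool -m enp1s0 hex on`), it helps to know how SFF-8636 encodes these thresholds. It stores them in upper page 03h as 2-byte big-endian words: temperature is signed in units of 1/256 °C, supply voltage is unsigned in 100 µV units, laser bias current is unsigned in 2 µA units, and optical power is unsigned in 0.1 µW units. The Python sketch below decodes such words; the helper names are hypothetical, and the sample bytes are the encodings one would expect for four of the Cisco-reported thresholds, not bytes actually read from this module.

```python
import struct


def _word(raw: bytes, signed: bool = False) -> int:
    """Unpack one 2-byte big-endian word."""
    return struct.unpack(">h" if signed else ">H", raw[:2])[0]


def temperature_c(raw: bytes) -> float:
    """Signed two's complement, 1/256 degree C per LSB."""
    return _word(raw, signed=True) / 256


def voltage_v(raw: bytes) -> float:
    """Unsigned, 100 uV per LSB."""
    return _word(raw) / 10_000


def bias_ma(raw: bytes) -> float:
    """Unsigned, 2 uA per LSB."""
    return _word(raw) * 2 / 1_000


def power_mw(raw: bytes) -> float:
    """Unsigned, 0.1 uW per LSB (same scaling for Tx and Rx power)."""
    return _word(raw) / 10_000


if __name__ == "__main__":
    # Hypothetical raw words matching some Cisco-reported thresholds.
    samples = {
        "Module voltage high alarm threshold": (voltage_v, bytes([0x90, 0x88])),
        "Module temperature low alarm threshold": (temperature_c, bytes([0xF6, 0x00])),
        "Laser bias current high alarm threshold": (bias_ma, bytes([0x92, 0x7C])),
        "Laser output power high alarm threshold": (power_mw, bytes([0x6D, 0xD7])),
    }
    for label, (decode, raw) in samples.items():
        print(f"{label}: raw 0x{raw.hex()} -> {decode(raw):.4f}")
```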
Any help with these issues is greatly appreciated. If you have any questions or advice, please let me know. I'll be glad to continue troubleshooting this until it's resolved. Thank you.
________________________________________
[-- Attachment #2: ethtoolQSFP28thresholdsCiscoComparison.txt --]
[-- Type: text/plain, Size: 4602 bytes --]
For comparison with ethtool's output, which shows incorrect threshold values: when the same transceiver is installed in a Cisco Nexus switch and the Cisco command "show interface ethernet 1/3 transceiver details" is issued, the switch correctly reads and displays the transceiver's alarm and warning thresholds, as shown below:
switch# show interface ethernet 1/3 transceiver details
Ethernet1/3
transceiver is present
type is QSFP-100G-CWDM4-MSA-FEC
name is TRANSITION
part number is TNQSFP100GCWDM4
revision is 1A
serial number is TN02000302
nominal bitrate is 25500 MBit/sec per channel
Link length supported for 9/125um fiber is 2 km
cisco id is 17
cisco extended id number is 252
Lane Number:1 Network Lane
SFP Detail Diagnostics Information (internal calibration)
----------------------------------------------------------------------------
                Current         Alarms                  Warnings
                Measurement     High        Low         High        Low
----------------------------------------------------------------------------
Temperature     38.08 C         80.00 C     -10.00 C    75.00 C     -5.00 C
Voltage         3.34 V          3.70 V      2.90 V      3.59 V      3.00 V
Current         34.24 mA        75.00 mA    10.00 mA    70.00 mA    15.00 mA
Tx Power        -0.44 dBm       4.49 dBm    -8.50 dBm   3.49 dBm    -7.52 dBm
Rx Power        N/A             4.49 dBm    -14.55 dBm  3.49 dBm    -12.51 dBm
Transmit Fault Count = 0
----------------------------------------------------------------------------
Note: ++ high-alarm; + high-warning; -- low-alarm; - low-warning
Lane Number:2 Network Lane
SFP Detail Diagnostics Information (internal calibration)
----------------------------------------------------------------------------
                Current         Alarms                  Warnings
                Measurement     High        Low         High        Low
----------------------------------------------------------------------------
Temperature     38.08 C         80.00 C     -10.00 C    75.00 C     -5.00 C
Voltage         3.34 V          3.70 V      2.90 V      3.59 V      3.00 V
Current         34.24 mA        75.00 mA    10.00 mA    70.00 mA    15.00 mA
Tx Power        -1.20 dBm       4.49 dBm    -8.50 dBm   3.49 dBm    -7.52 dBm
Rx Power        N/A             4.49 dBm    -14.55 dBm  3.49 dBm    -12.51 dBm
Transmit Fault Count = 0
----------------------------------------------------------------------------
Note: ++ high-alarm; + high-warning; -- low-alarm; - low-warning
Lane Number:3 Network Lane
SFP Detail Diagnostics Information (internal calibration)
----------------------------------------------------------------------------
                Current         Alarms                  Warnings
                Measurement     High        Low         High        Low
----------------------------------------------------------------------------
Temperature     38.08 C         80.00 C     -10.00 C    75.00 C     -5.00 C
Voltage         3.34 V          3.70 V      2.90 V      3.59 V      3.00 V
Current         33.21 mA        75.00 mA    10.00 mA    70.00 mA    15.00 mA
Tx Power        -0.96 dBm       4.49 dBm    -8.50 dBm   3.49 dBm    -7.52 dBm
Rx Power        N/A             4.49 dBm    -14.55 dBm  3.49 dBm    -12.51 dBm
Transmit Fault Count = 0
----------------------------------------------------------------------------
Note: ++ high-alarm; + high-warning; -- low-alarm; - low-warning
Lane Number:4 Network Lane
SFP Detail Diagnostics Information (internal calibration)
----------------------------------------------------------------------------
                Current         Alarms                  Warnings
                Measurement     High        Low         High        Low
----------------------------------------------------------------------------
Temperature     38.08 C         80.00 C     -10.00 C    75.00 C     -5.00 C
Voltage         3.34 V          3.70 V      2.90 V      3.59 V      3.00 V
Current         33.72 mA        75.00 mA    10.00 mA    70.00 mA    15.00 mA
Tx Power        -1.59 dBm       4.49 dBm    -8.50 dBm   3.49 dBm    -7.52 dBm
Rx Power        N/A             4.49 dBm    -14.55 dBm  3.49 dBm    -12.51 dBm
Transmit Fault Count = 0
----------------------------------------------------------------------------
Note: ++ high-alarm; + high-warning; -- low-alarm; - low-warning
switch#
[-- Attachment #3: ethtoolQSFP28thresholdsExpectedOutput.txt --]
[-- Type: text/plain, Size: 7258 bytes --]
Look at each line in the ethtool output below that includes the word "threshold". This file has been hand-edited to show the threshold values that have been programmed into the transceiver, which ethtool should display. The threshold values shown below were copied and pasted from the output of the Cisco NX-OS command "show interface ethernet 1/3 transceiver details", while the transceiver was installed in a Cisco Nexus switch.
Note: I copied the threshold values only in the units displayed by the Cisco switch. The "?" symbols are just placeholders for the converted values; I was too lazy to convert between dBm and mW, or between degrees C and degrees F. Ethtool would be expected to report the true values in both units.
tech1@D8:~$ sudo ethtool -m enp1s0
Identifier : 0x11 (QSFP28)
Extended identifier : 0xfc
Extended identifier description : 3.5W max. Power consumption
Extended identifier description : CDR present in TX, CDR present in RX
Extended identifier description : High Power Class (> 3.5 W) not enabled
Connector : 0x07 (LC)
Transceiver codes : 0x80 0x00 0x00 0x00 0x00 0x00 0x00 0x00
Transceiver type : 100G Ethernet: 100G CWDM4 MSA with FEC
Encoding : 0x03 (NRZ)
BR, Nominal : 25500Mbps
Rate identifier : 0x00
Length (SMF,km) : 2km
Length (OM3 50um) : 0m
Length (OM2 50um) : 0m
Length (OM1 62.5um) : 0m
Length (Copper or Active cable) : 0m
Transmitter technology : 0x40 (1310 nm DFB)
Laser wavelength : 1310.000nm
Laser wavelength tolerance : 47.500nm
Vendor name : TRANSITION
Vendor OUI : 00:c0:f2
Vendor PN : TNQSFP100GCWDM4
Vendor rev : 1A
Vendor SN : TN02000302
Date code : 180919
Revision Compliance : SFF-8636 Rev 2.5/2.6/2.7
Module temperature : 39.53 degrees C / 103.15 degrees F
Module voltage : 3.3233 V
Alarm/warning flags implemented : Yes
Laser tx bias current (Channel 1) : 34.432 mA
Laser tx bias current (Channel 2) : 34.432 mA
Laser tx bias current (Channel 3) : 33.408 mA
Laser tx bias current (Channel 4) : 33.920 mA
Transmit avg optical power (Channel 1) : 0.9052 mW / -0.43 dBm
Transmit avg optical power (Channel 2) : 0.7832 mW / -1.06 dBm
Transmit avg optical power (Channel 3) : 0.8057 mW / -0.94 dBm
Transmit avg optical power (Channel 4) : 0.7009 mW / -1.54 dBm
Rcvr signal avg optical power(Channel 1) : 0.7378 mW / -1.32 dBm
Rcvr signal avg optical power(Channel 2) : 0.7553 mW / -1.22 dBm
Rcvr signal avg optical power(Channel 3) : 0.6529 mW / -1.85 dBm
Rcvr signal avg optical power(Channel 4) : 0.6948 mW / -1.58 dBm
Laser bias current high alarm (Chan 1) : Off
Laser bias current low alarm (Chan 1) : Off
Laser bias current high warning (Chan 1) : Off
Laser bias current low warning (Chan 1) : Off
Laser bias current high alarm (Chan 2) : Off
Laser bias current low alarm (Chan 2) : Off
Laser bias current high warning (Chan 2) : Off
Laser bias current low warning (Chan 2) : Off
Laser bias current high alarm (Chan 3) : Off
Laser bias current low alarm (Chan 3) : Off
Laser bias current high warning (Chan 3) : Off
Laser bias current low warning (Chan 3) : Off
Laser bias current high alarm (Chan 4) : Off
Laser bias current low alarm (Chan 4) : Off
Laser bias current high warning (Chan 4) : Off
Laser bias current low warning (Chan 4) : Off
Module temperature high alarm : Off
Module temperature low alarm : Off
Module temperature high warning : Off
Module temperature low warning : Off
Module voltage high alarm : Off
Module voltage low alarm : Off
Module voltage high warning : Off
Module voltage low warning : Off
Laser tx power high alarm (Channel 1) : Off
Laser tx power low alarm (Channel 1) : Off
Laser tx power high warning (Channel 1) : Off
Laser tx power low warning (Channel 1) : Off
Laser tx power high alarm (Channel 2) : Off
Laser tx power low alarm (Channel 2) : Off
Laser tx power high warning (Channel 2) : Off
Laser tx power low warning (Channel 2) : Off
Laser tx power high alarm (Channel 3) : Off
Laser tx power low alarm (Channel 3) : Off
Laser tx power high warning (Channel 3) : Off
Laser tx power low warning (Channel 3) : Off
Laser tx power high alarm (Channel 4) : Off
Laser tx power low alarm (Channel 4) : Off
Laser tx power high warning (Channel 4) : Off
Laser tx power low warning (Channel 4) : Off
Laser rx power high alarm (Channel 1) : Off
Laser rx power low alarm (Channel 1) : Off
Laser rx power high warning (Channel 1) : Off
Laser rx power low warning (Channel 1) : Off
Laser rx power high alarm (Channel 2) : Off
Laser rx power low alarm (Channel 2) : Off
Laser rx power high warning (Channel 2) : Off
Laser rx power low warning (Channel 2) : Off
Laser rx power high alarm (Channel 3) : Off
Laser rx power low alarm (Channel 3) : Off
Laser rx power high warning (Channel 3) : Off
Laser rx power low warning (Channel 3) : Off
Laser rx power high alarm (Channel 4) : Off
Laser rx power low alarm (Channel 4) : Off
Laser rx power high warning (Channel 4) : Off
Laser rx power low warning (Channel 4) : Off
Laser bias current high alarm threshold : 75.000 mA
Laser bias current low alarm threshold : 10.000 mA
Laser bias current high warning threshold : 70.000 mA
Laser bias current low warning threshold : 15.000 mA
Laser output power high alarm threshold : ? mW / 4.49 dBm
Laser output power low alarm threshold : ? mW / -8.50 dBm
Laser output power high warning threshold : ? mW / 3.49 dBm
Laser output power low warning threshold : ? mW / -7.52 dBm
Module temperature high alarm threshold : 80.00 degrees C / ? degrees F
Module temperature low alarm threshold : -10.00 degrees C / ? degrees F
Module temperature high warning threshold : 75.00 degrees C / ? degrees F
Module temperature low warning threshold : -5.00 degrees C / ? degrees F
Module voltage high alarm threshold : 3.7000 V
Module voltage low alarm threshold : 2.9000 V
Module voltage high warning threshold : 3.5900 V
Module voltage low warning threshold : 3.0000 V
Laser rx power high alarm threshold : ? mW / 4.49 dBm
Laser rx power low alarm threshold : ? mW / -14.55 dBm
Laser rx power high warning threshold : ? mW / 3.49 dBm
Laser rx power low warning threshold : ? mW / -12.51 dBm
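The "?" placeholders above follow from two standard conversions: P[mW] = 10^(P[dBm]/10), and F = C × 9/5 + 32. The Python sketch below fills them in from the Cisco values; the function names are illustrative. The inverse conversion also explains why ethtool prints "-inf dBm" for a 0 mW threshold: the logarithm is undefined at zero (Python's `math.log10` raises `ValueError` there instead of returning -inf).

```python
import math


def dbm_to_mw(dbm: float) -> float:
    """Optical power: P[mW] = 10 ** (P[dBm] / 10)."""
    return 10 ** (dbm / 10)


def mw_to_dbm(mw: float) -> float:
    """Inverse of dbm_to_mw; math.log10 raises ValueError at 0 mW."""
    return 10 * math.log10(mw)


def c_to_f(celsius: float) -> float:
    """Temperature: F = C * 9/5 + 32."""
    return celsius * 9 / 5 + 32


if __name__ == "__main__":
    # Cisco-reported thresholds from the hand-edited output above.
    power_dbm = {
        "Laser output power high alarm threshold": 4.49,
        "Laser output power low alarm threshold": -8.50,
        "Laser output power high warning threshold": 3.49,
        "Laser output power low warning threshold": -7.52,
        "Laser rx power high alarm threshold": 4.49,
        "Laser rx power low alarm threshold": -14.55,
        "Laser rx power high warning threshold": 3.49,
        "Laser rx power low warning threshold": -12.51,
    }
    temps_c = {
        "Module temperature high alarm threshold": 80.00,
        "Module temperature low alarm threshold": -10.00,
        "Module temperature high warning threshold": 75.00,
        "Module temperature low warning threshold": -5.00,
    }
    # Print in the same "label : value" layout that ethtool uses.
    for label, dbm in power_dbm.items():
        print(f"{label} : {dbm_to_mw(dbm):.4f} mW / {dbm:.2f} dBm")
    for label, c in temps_c.items():
        print(f"{label} : {c:.2f} degrees C / {c_to_f(c):.2f} degrees F")
```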
[-- Attachment #4: ethtoolQSFP28thresholdsSpuriousOutput1of2.txt --]
[-- Type: text/plain, Size: 6843 bytes --]
Look at each line in the ethtool output below that includes the word "threshold". This file shows the actual output from ethtool v4.18 when the output is not piped to another command. Notice that all of the displayed threshold values are 0 (which is incorrect), while the other values are reported as expected.
tech1@D8:~$ sudo ethtool -m enp1s0
Identifier : 0x11 (QSFP28)
Extended identifier : 0xfc
Extended identifier description : 3.5W max. Power consumption
Extended identifier description : CDR present in TX, CDR present in RX
Extended identifier description : High Power Class (> 3.5 W) not enabled
Connector : 0x07 (LC)
Transceiver codes : 0x80 0x00 0x00 0x00 0x00 0x00 0x00 0x00
Transceiver type : 100G Ethernet: 100G CWDM4 MSA with FEC
Encoding : 0x03 (NRZ)
BR, Nominal : 25500Mbps
Rate identifier : 0x00
Length (SMF,km) : 2km
Length (OM3 50um) : 0m
Length (OM2 50um) : 0m
Length (OM1 62.5um) : 0m
Length (Copper or Active cable) : 0m
Transmitter technology : 0x40 (1310 nm DFB)
Laser wavelength : 1310.000nm
Laser wavelength tolerance : 47.500nm
Vendor name : TRANSITION
Vendor OUI : 00:c0:f2
Vendor PN : TNQSFP100GCWDM4
Vendor rev : 1A
Vendor SN : TN02000302
Date code : 180919
Revision Compliance : SFF-8636 Rev 2.5/2.6/2.7
Module temperature : 39.53 degrees C / 103.15 degrees F
Module voltage : 3.3241 V
Alarm/warning flags implemented : Yes
Laser tx bias current (Channel 1) : 34.432 mA
Laser tx bias current (Channel 2) : 34.432 mA
Laser tx bias current (Channel 3) : 33.408 mA
Laser tx bias current (Channel 4) : 33.920 mA
Transmit avg optical power (Channel 1) : 0.9048 mW / -0.43 dBm
Transmit avg optical power (Channel 2) : 0.7832 mW / -1.06 dBm
Transmit avg optical power (Channel 3) : 0.8057 mW / -0.94 dBm
Transmit avg optical power (Channel 4) : 0.7014 mW / -1.54 dBm
Rcvr signal avg optical power(Channel 1) : 0.7378 mW / -1.32 dBm
Rcvr signal avg optical power(Channel 2) : 0.7553 mW / -1.22 dBm
Rcvr signal avg optical power(Channel 3) : 0.6529 mW / -1.85 dBm
Rcvr signal avg optical power(Channel 4) : 0.6847 mW / -1.64 dBm
Laser bias current high alarm (Chan 1) : Off
Laser bias current low alarm (Chan 1) : Off
Laser bias current high warning (Chan 1) : Off
Laser bias current low warning (Chan 1) : Off
Laser bias current high alarm (Chan 2) : Off
Laser bias current low alarm (Chan 2) : Off
Laser bias current high warning (Chan 2) : Off
Laser bias current low warning (Chan 2) : Off
Laser bias current high alarm (Chan 3) : Off
Laser bias current low alarm (Chan 3) : Off
Laser bias current high warning (Chan 3) : Off
Laser bias current low warning (Chan 3) : Off
Laser bias current high alarm (Chan 4) : Off
Laser bias current low alarm (Chan 4) : Off
Laser bias current high warning (Chan 4) : Off
Laser bias current low warning (Chan 4) : Off
Module temperature high alarm : Off
Module temperature low alarm : Off
Module temperature high warning : Off
Module temperature low warning : Off
Module voltage high alarm : Off
Module voltage low alarm : Off
Module voltage high warning : Off
Module voltage low warning : Off
Laser tx power high alarm (Channel 1) : Off
Laser tx power low alarm (Channel 1) : Off
Laser tx power high warning (Channel 1) : Off
Laser tx power low warning (Channel 1) : Off
Laser tx power high alarm (Channel 2) : Off
Laser tx power low alarm (Channel 2) : Off
Laser tx power high warning (Channel 2) : Off
Laser tx power low warning (Channel 2) : Off
Laser tx power high alarm (Channel 3) : Off
Laser tx power low alarm (Channel 3) : Off
Laser tx power high warning (Channel 3) : Off
Laser tx power low warning (Channel 3) : Off
Laser tx power high alarm (Channel 4) : Off
Laser tx power low alarm (Channel 4) : Off
Laser tx power high warning (Channel 4) : Off
Laser tx power low warning (Channel 4) : Off
Laser rx power high alarm (Channel 1) : Off
Laser rx power low alarm (Channel 1) : Off
Laser rx power high warning (Channel 1) : Off
Laser rx power low warning (Channel 1) : Off
Laser rx power high alarm (Channel 2) : Off
Laser rx power low alarm (Channel 2) : Off
Laser rx power high warning (Channel 2) : Off
Laser rx power low warning (Channel 2) : Off
Laser rx power high alarm (Channel 3) : Off
Laser rx power low alarm (Channel 3) : Off
Laser rx power high warning (Channel 3) : Off
Laser rx power low warning (Channel 3) : Off
Laser rx power high alarm (Channel 4) : Off
Laser rx power low alarm (Channel 4) : Off
Laser rx power high warning (Channel 4) : Off
Laser rx power low warning (Channel 4) : Off
Laser bias current high alarm threshold : 0.000 mA
Laser bias current low alarm threshold : 0.000 mA
Laser bias current high warning threshold : 0.000 mA
Laser bias current low warning threshold : 0.000 mA
Laser output power high alarm threshold : 0.0000 mW / -inf dBm
Laser output power low alarm threshold : 0.0000 mW / -inf dBm
Laser output power high warning threshold : 0.0000 mW / -inf dBm
Laser output power low warning threshold : 0.0000 mW / -inf dBm
Module temperature high alarm threshold : 0.00 degrees C / 32.00 degrees F
Module temperature low alarm threshold : 0.00 degrees C / 32.00 degrees F
Module temperature high warning threshold : 0.00 degrees C / 32.00 degrees F
Module temperature low warning threshold : 0.00 degrees C / 32.00 degrees F
Module voltage high alarm threshold : 0.0000 V
Module voltage low alarm threshold : 0.0000 V
Module voltage high warning threshold : 0.0000 V
Module voltage low warning threshold : 0.0000 V
Laser rx power high alarm threshold : 0.0000 mW / -inf dBm
Laser rx power low alarm threshold : 0.0000 mW / -inf dBm
Laser rx power high warning threshold : 0.0000 mW / -inf dBm
Laser rx power low warning threshold : 0.0000 mW / -inf dBm
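Every power threshold in the unpiped run above reads "0.0000 mW / -inf dBm". That follows directly from the unit ethtool prints alongside mW, dBm = 10 * log10(P / 1 mW): a field that reads back as zero has no finite dBm value. Below is a minimal Python sketch of that conversion. The function name is illustrative and not taken from ethtool.

```python
import math


def mw_to_dbm(mw):
    """Convert optical power in milliwatts to dBm (decibels relative to 1 mW).

    A reading of exactly 0 mW maps to -inf, which is what ethtool shows for
    the all-zero threshold fields; negative power is rejected as invalid.
    """
    if mw < 0:
        raise ValueError("optical power cannot be negative")
    if mw == 0:
        return float("-inf")
    return 10.0 * math.log10(mw)
```

For instance, mw_to_dbm(0.9048) evaluates to about -0.4345. That rounds to the -0.43 dBm shown above for Channel 1 transmit power.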
[-- Attachment #5: ethtoolQSFP28thresholdsSpuriousOutput2of2.txt --]
[-- Type: text/plain, Size: 6866 bytes --]
Look at each line in the ethtool output below that includes the word "threshold". This file shows the actual output from ethtool v4.18 when its output is piped to another command. Notice that all of the displayed threshold values are spurious, while the other values are reported as expected.
tech1@D8:~$ sudo ethtool -m enp1s0 | cat
Identifier : 0x11 (QSFP28)
Extended identifier : 0xfc
Extended identifier description : 3.5W max. Power consumption
Extended identifier description : CDR present in TX, CDR present in RX
Extended identifier description : High Power Class (> 3.5 W) not enabled
Connector : 0x07 (LC)
Transceiver codes : 0x80 0x00 0x00 0x00 0x00 0x00 0x00 0x00
Transceiver type : 100G Ethernet: 100G CWDM4 MSA with FEC
Encoding : 0x03 (NRZ)
BR, Nominal : 25500Mbps
Rate identifier : 0x00
Length (SMF,km) : 2km
Length (OM3 50um) : 0m
Length (OM2 50um) : 0m
Length (OM1 62.5um) : 0m
Length (Copper or Active cable) : 0m
Transmitter technology : 0x40 (1310 nm DFB)
Laser wavelength : 1310.000nm
Laser wavelength tolerance : 47.500nm
Vendor name : TRANSITION
Vendor OUI : 00:c0:f2
Vendor PN : TNQSFP100GCWDM4
Vendor rev : 1A
Vendor SN : TN02000302
Date code : 180919
Revision Compliance : SFF-8636 Rev 2.5/2.6/2.7
Module temperature : 39.53 degrees C / 103.15 degrees F
Module voltage : 3.3249 V
Alarm/warning flags implemented : Yes
Laser tx bias current (Channel 1) : 34.432 mA
Laser tx bias current (Channel 2) : 34.432 mA
Laser tx bias current (Channel 3) : 33.408 mA
Laser tx bias current (Channel 4) : 33.920 mA
Transmit avg optical power (Channel 1) : 0.9043 mW / -0.44 dBm
Transmit avg optical power (Channel 2) : 0.7832 mW / -1.06 dBm
Transmit avg optical power (Channel 3) : 0.8057 mW / -0.94 dBm
Transmit avg optical power (Channel 4) : 0.7009 mW / -1.54 dBm
Rcvr signal avg optical power(Channel 1) : 0.7378 mW / -1.32 dBm
Rcvr signal avg optical power(Channel 2) : 0.7553 mW / -1.22 dBm
Rcvr signal avg optical power(Channel 3) : 0.6529 mW / -1.85 dBm
Rcvr signal avg optical power(Channel 4) : 0.6847 mW / -1.64 dBm
Laser bias current high alarm (Chan 1) : Off
Laser bias current low alarm (Chan 1) : Off
Laser bias current high warning (Chan 1) : Off
Laser bias current low warning (Chan 1) : Off
Laser bias current high alarm (Chan 2) : Off
Laser bias current low alarm (Chan 2) : Off
Laser bias current high warning (Chan 2) : Off
Laser bias current low warning (Chan 2) : Off
Laser bias current high alarm (Chan 3) : Off
Laser bias current low alarm (Chan 3) : Off
Laser bias current high warning (Chan 3) : Off
Laser bias current low warning (Chan 3) : Off
Laser bias current high alarm (Chan 4) : Off
Laser bias current low alarm (Chan 4) : Off
Laser bias current high warning (Chan 4) : Off
Laser bias current low warning (Chan 4) : Off
Module temperature high alarm : Off
Module temperature low alarm : Off
Module temperature high warning : Off
Module temperature low warning : Off
Module voltage high alarm : Off
Module voltage low alarm : Off
Module voltage high warning : Off
Module voltage low warning : Off
Laser tx power high alarm (Channel 1) : Off
Laser tx power low alarm (Channel 1) : Off
Laser tx power high warning (Channel 1) : Off
Laser tx power low warning (Channel 1) : Off
Laser tx power high alarm (Channel 2) : Off
Laser tx power low alarm (Channel 2) : Off
Laser tx power high warning (Channel 2) : Off
Laser tx power low warning (Channel 2) : Off
Laser tx power high alarm (Channel 3) : Off
Laser tx power low alarm (Channel 3) : Off
Laser tx power high warning (Channel 3) : Off
Laser tx power low warning (Channel 3) : Off
Laser tx power high alarm (Channel 4) : Off
Laser tx power low alarm (Channel 4) : Off
Laser tx power high warning (Channel 4) : Off
Laser tx power low warning (Channel 4) : Off
Laser rx power high alarm (Channel 1) : Off
Laser rx power low alarm (Channel 1) : Off
Laser rx power high warning (Channel 1) : Off
Laser rx power low warning (Channel 1) : Off
Laser rx power high alarm (Channel 2) : Off
Laser rx power low alarm (Channel 2) : Off
Laser rx power high warning (Channel 2) : Off
Laser rx power low warning (Channel 2) : Off
Laser rx power high alarm (Channel 3) : Off
Laser rx power low alarm (Channel 3) : Off
Laser rx power high warning (Channel 3) : Off
Laser rx power low warning (Channel 3) : Off
Laser rx power high alarm (Channel 4) : Off
Laser rx power low alarm (Channel 4) : Off
Laser rx power high warning (Channel 4) : Off
Laser rx power low warning (Channel 4) : Off
Laser bias current high alarm threshold : 16.448 mA
Laser bias current low alarm threshold : 16.448 mA
Laser bias current high warning threshold : 16.448 mA
Laser bias current low warning threshold : 16.448 mA
Laser output power high alarm threshold : 0.8224 mW / -0.85 dBm
Laser output power low alarm threshold : 0.8250 mW / -0.84 dBm
Laser output power high warning threshold : 0.8264 mW / -0.83 dBm
Laser output power low warning threshold : 2.6983 mW / 4.31 dBm
Module temperature high alarm threshold : 110.12 degrees C / 230.22 degrees F
Module temperature low alarm threshold : 84.34 degrees C / 183.82 degrees F
Module temperature high warning threshold : 44.12 degrees C / 111.42 degrees F
Module temperature low warning threshold : 67.27 degrees C / 153.08 degrees F
Module voltage high alarm threshold : 2.9728 V
Module voltage low alarm threshold : 2.6990 V
Module voltage high warning threshold : 0.8274 V
Module voltage low warning threshold : 2.2538 V
Laser rx power high alarm threshold : 2.5458 mW / 4.06 dBm
Laser rx power low alarm threshold : 2.6992 mW / 4.31 dBm
Laser rx power high warning threshold : 2.9801 mW / 4.74 dBm
Laser rx power low warning threshold : 2.8526 mW / 4.55 dBm
tech1@D8:~$
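The consistency argument from Bug #2 can be written as a mechanical check. Alarm limits should sit outside the warning limits (low alarm <= low warning <= high warning <= high alarm), and the alarm window should not be empty. The sketch below is a hypothetical helper, not part of ethtool. It applies the check to three sets of module-voltage thresholds: the piped run above, the all-zero unpiped run, and the programmed values listed under Bug #2.

```python
def thresholds_consistent(high_alarm, high_warn, low_warn, low_alarm):
    """Return True if alarm/warning thresholds are plausibly ordered.

    Alarm limits are expected to lie outside the warning limits, and the
    alarm window must not be empty (an all-zero set fails that test).
    """
    ordered = low_alarm <= low_warn <= high_warn <= high_alarm
    return ordered and low_alarm < high_alarm


# Module voltage thresholds in volts, as quoted in this thread.
piped_run = dict(high_alarm=2.9728, high_warn=0.8274, low_warn=2.2538, low_alarm=2.6990)
unpiped_run = dict(high_alarm=0.0, high_warn=0.0, low_warn=0.0, low_alarm=0.0)
programmed = dict(high_alarm=3.70, high_warn=3.59, low_warn=3.00, low_alarm=2.90)

for name, limits in [("piped", piped_run), ("unpiped", unpiped_run),
                     ("programmed", programmed)]:
    print(name, thresholds_consistent(**limits))
```

The check rejects both ethtool runs and accepts only the programmed values. The same helper applies unchanged to the temperature, bias and power threshold groups.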
* Re: bug: 'ethtool -m' reports spurious alarm & warning threshold values for QSFP28 transceivers
2018-09-26 19:29 bug: 'ethtool -m' reports spurious alarm & warning threshold values for QSFP28 transceivers Chris Preimesberger
@ 2018-09-26 19:44 ` Andrew Lunn
2018-09-26 20:47 ` Chris Preimesberger
2018-09-26 21:34 ` Neil Horman
1 sibling, 1 reply; 16+ messages in thread
From: Andrew Lunn @ 2018-09-26 19:44 UTC (permalink / raw)
To: Chris Preimesberger; +Cc: linville@tuxdriver.com, netdev@vger.kernel.org

On Wed, Sep 26, 2018 at 07:29:23PM +0000, Chris Preimesberger wrote:
> Hello,
>
> I'm re-sending in plain text per the auto-reply from a spam filter.

Yep, no HTML obfuscation accepted here. ASCII only, please :-)

Please can you also wrap your lines at about 75 characters.

> I have attached some text files this time, which explain the situation below, in case the below email's font & formatting is now too messed up for easy comprehension.
>
> Bug #1. Ethtool's reporting of the installed transceiver's alarm and warning thresholds will differ, depending on whether or not ethtool is piped to another command. Example commands are below, with their respective differing output values highlighted:

Could you dump the raw values? That will make it easier for us to
reproduce this issue, assuming it is ethtool, and not the kernel
driver.

Thanks
	Andrew
* RE: bug: 'ethtool -m' reports spurious alarm & warning threshold values for QSFP28 transceivers
2018-09-26 19:44 ` Andrew Lunn
@ 2018-09-26 20:47 ` Chris Preimesberger
2018-09-26 21:46 ` Andrew Lunn
0 siblings, 1 reply; 16+ messages in thread
From: Chris Preimesberger @ 2018-09-26 20:47 UTC (permalink / raw)
To: Andrew Lunn; +Cc: linville@tuxdriver.com, netdev@vger.kernel.org
[-- Attachment #1: Type: text/plain, Size: 3140 bytes --]
Hello Andrew,

Thank you for the quick response!!
Apologies in advance for my use of Outlook and top-posting, etc...

I've run the raw option and the hex option, and pasted the results below.
Since the raw option printed strange characters on the CLI, I re-ran it,
sending the output to a file (raw.txt), and attached that file as well.

Pasted from Ubuntu CLI:

tech1@D7:~$ sudo ethtool -m enp1s0 raw on
\x11UU$��pA`?�@�G\x10#
�\x12v\x01\x11��\x03�\x02@TRANSITION ��TNQSFP100GCWDM4 1AfX%\x1cF?\x06?�TN02000301 180919
h�\x02I��_��'\x16��Ri=\x02`��Zntech1@D7:~$
tech1@D7:~$ sudo ethtool -m enp1s0 hex on
Offset Values
------ ------
0x0000: 11 00 00 0f 00 00 00 00 00 55 55 00 00 00 00 00
0x0010: 00 00 00 00 00 00 24 e2 00 00 81 68 00 00 00 00
0x0020: 00 00 00 00 00 00 00 00 00 00 41 60 3f e0 40 e0
0x0030: 47 00 1f 10 0e 1e 0b f7 12 76 00 00 00 00 00 00
0x0040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0050: 00 00 00 00 00 00 00 00 00 00 00 00 00 01 00 00
0x0060: 00 00 00 00 00 00 00 00 00 00 1f 00 00 00 00 00
0x0070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0080: 11 fc 07 80 00 00 00 00 00 00 00 03 ff 00 02 00
0x0090: 00 00 00 40 54 52 41 4e 53 49 54 49 4f 4e 20 20
0x00a0: 20 20 20 20 00 00 c0 f2 54 4e 51 53 46 50 31 30
0x00b0: 30 47 43 57 44 4d 34 20 31 41 66 58 25 1c 46 3f
0x00c0: 06 00 3f d6 54 4e 30 32 30 30 30 33 30 31 20 20
0x00d0: 20 20 20 20 31 38 30 39 31 39 20 20 0c 00 68 f3
0x00e0: 00 00 02 49 80 a0 5f 1f de c9 27 16 f8 ae 52 69
0x00f0: 3d 02 60 00 00 00 00 00 00 00 00 00 83 f4 5a 6e
tech1@D7:~$

[-- Attachment #2: raw.txt --]
[-- Type: text/plain, Size: 256 bytes --]
\x11\0\0\x0f\0\0\0\0\0UU\0\0\0\0\0\0\0\0\0\0\0$ò\0\0h\0\0\0\0\0\0\0\0\0\0\0\0\0\0A`?à@àG\0\x1f\x10\x0e\x1e\v÷\x12{\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x01\0\0\0\0\0\0\0\0\0\0\0\0\x1f\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x11ü\a\0\0\0\0\0\0\0\x03ÿ\0\x02\0\0\0\0@TRANSITION \0\0ÀòTNQSFP100GCWDM4 1AfX%\x1cF?\x06\0?ÖTN02000301 180919 \f\0hó\0\0\x02I _\x1fÞÉ'\x16ø®Ri=\x02`\0\0\0\0\0\0\0\0\0ôZn
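The live monitor fields in the lower page of the hex dump can be decoded by hand as a cross-check. This minimal Python sketch assumes the following SFF-8636 lower-page layout, all fields big-endian:

- temperature at bytes 22-23, signed, 1/256 °C per bit
- supply voltage at bytes 26-27, 100 µV per bit
- Rx power at bytes 34-41, 0.1 µW per bit
- Tx bias at bytes 42-49, 2 µA per bit
- Tx power at bytes 50-57, 0.1 µW per bit

Please verify these offsets against the spec. The helper names are illustrative, and only the first four rows of the dump are needed.

```python
# First four rows of the `ethtool -m enp1s0 hex on` output in Chris's reply.
HEX_ROWS = """\
0x0000: 11 00 00 0f 00 00 00 00 00 55 55 00 00 00 00 00
0x0010: 00 00 00 00 00 00 24 e2 00 00 81 68 00 00 00 00
0x0020: 00 00 00 00 00 00 00 00 00 00 41 60 3f e0 40 e0
0x0030: 47 00 1f 10 0e 1e 0b f7 12 76 00 00 00 00 00 00
"""


def parse_hex_dump(text):
    """Turn 'offset: byte byte ...' rows into one bytes object."""
    data = bytearray()
    for line in text.splitlines():
        if ":" not in line:
            continue  # skip header lines such as 'Offset Values'
        offset, values = line.split(":", 1)
        if int(offset, 16) != len(data):
            raise ValueError(f"rows are not contiguous at {offset}")
        data.extend(int(b, 16) for b in values.split())
    return bytes(data)


def u16(buf, off):
    """Big-endian unsigned 16-bit field."""
    return int.from_bytes(buf[off:off + 2], "big")


def s16(buf, off):
    """Big-endian signed (two's complement) 16-bit field."""
    return int.from_bytes(buf[off:off + 2], "big", signed=True)


def decode_monitors(lower):
    """Live monitors from the SFF-8636 lower page (assumed offsets/units)."""
    return {
        "temperature_c": s16(lower, 22) / 256.0,
        "vcc_v": u16(lower, 26) * 1e-4,
        "rx_power_mw": [u16(lower, 34 + 2 * ch) * 1e-4 for ch in range(4)],
        "tx_bias_ma": [u16(lower, 42 + 2 * ch) * 2e-3 for ch in range(4)],
        "tx_power_mw": [u16(lower, 50 + 2 * ch) * 1e-4 for ch in range(4)],
    }
```

Decoded this way, the rows give:

- about 36.88 °C and 3.3128 V
- Tx bias of 33.472, 32.704, 33.216 and 36.352 mA
- zero Rx power on all four channels

These values are plausible. However, this dump was taken on host D7 from module TN02000301, not in the D8/TN02000302 runs shown earlier, so the numbers are not expected to match those.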
* Re: bug: 'ethtool -m' reports spurious alarm & warning threshold values for QSFP28 transceivers
2018-09-26 20:47 ` Chris Preimesberger
@ 2018-09-26 21:46 ` Andrew Lunn
0 siblings, 0 replies; 16+ messages in thread
From: Andrew Lunn @ 2018-09-26 21:46 UTC (permalink / raw)
To: Chris Preimesberger; +Cc: linville@tuxdriver.com, netdev@vger.kernel.org

On Wed, Sep 26, 2018 at 08:47:34PM +0000, Chris Preimesberger wrote:
> Hello Andrew,
>
> Thank you for the quick response!!
> Apologies in advance for my use of Outlook and top-posting, etc...
>
> I've run the raw option and the hex option, and pasted the results below.
> Since the raw option printed strange characters on the CLI, I re-ran it,
> sending the output to a file (raw.txt) and attached that file as well.
>
> Pasted from Ubuntu CLI:
>
> tech1@D7:~$ sudo ethtool -m enp1s0 raw on
> \x11UU$��pA`?�@�G\x10#
> �\x12v\x01\x11��\x03�\x02@TRANSITION ��TNQSFP100GCWDM4 1AfX%\x1cF?\x06?�TN02000301 180919
> h�\x02I��_��'\x16��Ri=\x02`��Zntech1@D7:~$
> tech1@D7:~$ sudo ethtool -m enp1s0 hex on
> Offset Values
> ------ ------
> 0x0000: 11 00 00 0f 00 00 00 00 00 55 55 00 00 00 00 00
> 0x0010: 00 00 00 00 00 00 24 e2 00 00 81 68 00 00 00 00
> 0x0020: 00 00 00 00 00 00 00 00 00 00 41 60 3f e0 40 e0
> 0x0030: 47 00 1f 10 0e 1e 0b f7 12 76 00 00 00 00 00 00
> 0x0040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x0050: 00 00 00 00 00 00 00 00 00 00 00 00 00 01 00 00
> 0x0060: 00 00 00 00 00 00 00 00 00 00 1f 00 00 00 00 00
> 0x0070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x0080: 11 fc 07 80 00 00 00 00 00 00 00 03 ff 00 02 00
> 0x0090: 00 00 00 40 54 52 41 4e 53 49 54 49 4f 4e 20 20
> 0x00a0: 20 20 20 20 00 00 c0 f2 54 4e 51 53 46 50 31 30
> 0x00b0: 30 47 43 57 44 4d 34 20 31 41 66 58 25 1c 46 3f
> 0x00c0: 06 00 3f d6 54 4e 30 32 30 30 30 33 30 31 20 20
> 0x00d0: 20 20 20 20 31 38 30 39 31 39 20 20 0c 00 68 f3
> 0x00e0: 00 00 02 49 80 a0 5f 1f de c9 27 16 f8 ae 52 69
> 0x00f0: 3d 02 60 00 00 00 00 00 00 00 00 00 83 f4 5a 6e

Hi Chris

I've only recently got involved with SFP modules.

ethtool says this is an SFF-8636 SFP, so a QSFP. It has multiple
pages, each 128 bytes in length, which should be returned in a
concatenated form. Here we see 256 bytes, meaning there are two
pages. There can be up to 5 pages.

ethtool is looking for the temperature alarms at offset 0x200, so
that does not exist in this hex dump. But the raw dump you provided
has more bytes, 0x400 of them. So I would say the first bug is that
ethtool dumps different amounts of data in hex than in raw.

The fact that you get different alarm thresholds on different runs
suggests to me we might only be getting two pages from the kernel?

Can you build ethtool from source and run it inside a debugger?
ethtool makes two IOCTL calls. The first is ETHTOOL_GMODULEINFO.
Could you print out the modinfo which is returned? It then does an
ETHTOOL_GMODULEEEPROM. Can you print out eeprom after the second
IOCTL?

Thanks
	Andrew
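The page arithmetic described above, and the two ioctls mentioned, can be sketched in Python. The sketch makes these assumptions:

- The dump uses the concatenated layout described: lower page at 0x000, then upper pages 0, 1, 2 and 3 at 0x080, 0x100, 0x180 and 0x200. Each page is 128 bytes, for up to 5 pages (640 bytes) in all.
- The structures and request codes are those of the kernel's <linux/ethtool.h> and <linux/sockios.h> UAPI headers.
- The ioctls probably need root, as the sudo in this thread suggests.

The helper names are hypothetical, and this is not how ethtool itself is written.

```python
import ctypes
import fcntl
import socket
import struct

PAGE_LEN = 128                    # every SFF-8636 page is 128 bytes
MAX_PAGES = 5                     # lower page plus upper pages 0-3
SFF8636_TEMP_HALRM = 0x200        # where ethtool looks for the first threshold

SIOCETHTOOL = 0x8946              # <linux/sockios.h>
ETHTOOL_GMODULEINFO = 0x00000042  # <linux/ethtool.h>
ETHTOOL_GMODULEEEPROM = 0x00000043


def flat_offset(page, addr):
    """Offset of module address `addr` (128-255) on upper page `page` in a
    dump that concatenates the lower page and upper pages 0, 1, 2, ..."""
    if not 128 <= addr <= 255:
        raise ValueError("upper-page addresses run from 128 to 255")
    return page * PAGE_LEN + addr


def covers(eeprom_len, offset, width=2):
    """True if a `width`-byte field at `offset` lies inside the dump."""
    return offset + width <= eeprom_len


class EthtoolModinfo(ctypes.Structure):
    # struct ethtool_modinfo { __u32 cmd, type, eeprom_len, reserved[8]; }
    _fields_ = [("cmd", ctypes.c_uint32), ("type", ctypes.c_uint32),
                ("eeprom_len", ctypes.c_uint32),
                ("reserved", ctypes.c_uint32 * 8)]


def eeprom_request(length):
    """Build struct ethtool_eeprom { __u32 cmd, magic, offset, len; __u8 data[]; }."""
    class EthtoolEeprom(ctypes.Structure):
        _fields_ = [("cmd", ctypes.c_uint32), ("magic", ctypes.c_uint32),
                    ("offset", ctypes.c_uint32), ("len", ctypes.c_uint32),
                    ("data", ctypes.c_uint8 * length)]
    return EthtoolEeprom(cmd=ETHTOOL_GMODULEEEPROM, offset=0, len=length)


def ethtool_ioctl(sock, ifname, req):
    # struct ifreq: 16-byte interface name, then the ifr_data pointer.
    ifr = struct.pack("16sP", ifname.encode()[:15], ctypes.addressof(req))
    fcntl.ioctl(sock.fileno(), SIOCETHTOOL, ifr)


def read_module_eeprom(ifname):
    """ETHTOOL_GMODULEINFO, then ETHTOOL_GMODULEEEPROM for eeprom_len bytes."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        info = EthtoolModinfo(cmd=ETHTOOL_GMODULEINFO)
        ethtool_ioctl(sock, ifname, info)
        req = eeprom_request(info.eeprom_len)
        ethtool_ioctl(sock, ifname, req)
        return info.type, info.eeprom_len, bytes(req.data)
```

On the system in this thread, read_module_eeprom("enp1s0") would show directly what eeprom_len the driver reports. If it is 256, then covers(256, SFF8636_TEMP_HALRM) is False. Any threshold decoded from that buffer would then come from beyond its end, which is the situation suspected here.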
* Re: bug: 'ethtool -m' reports spurious alarm & warning threshold values for QSFP28 transceivers
2018-09-26 19:29 bug: 'ethtool -m' reports spurious alarm & warning threshold values for QSFP28 transceivers Chris Preimesberger
2018-09-26 19:44 ` Andrew Lunn
@ 2018-09-26 21:34 ` Neil Horman
2018-09-26 21:58 ` Andrew Lunn
2018-09-27 13:25 ` Eran Ben Elisha
1 sibling, 2 replies; 16+ messages in thread
From: Neil Horman @ 2018-09-26 21:34 UTC (permalink / raw)
To: Chris Preimesberger; +Cc: linville@tuxdriver.com, netdev@vger.kernel.org

On Wed, Sep 26, 2018 at 07:29:23PM +0000, Chris Preimesberger wrote:
> Hello,
>
> I'm re-sending in plain text per the auto-reply from a spam filter. I have attached some text files this time, which explain the situation below, in case the below email's font & formatting is now too messed up for easy comprehension.
>
> Thank you and best regards.
>
>
> Chris Preimesberger | Test & Validation Engineer
> Transition Networks, Inc.
>
> chrisp@transition.com
> direct: +1.952.996.1509 | fax: +1.952.941.2322 | www.transition.com
> ________________________________________
>
This is just a drive-by guess, but I think this is a driver issue.

Issue 1 seems like a red herring: cat doesn't modify output, nor does
ethtool know whether its output is going to a console or a pipe; it's
all the same. And issue 2 (that the reported thresholds, etc. are
spuriously changing and wrong) suggests that they are changing and
wrong regardless of what cat does.

That said, I think issue 2 is a problem with the mlx4 driver;
specifically, the driver is copying garbage data. The three driver
functions at work here are:

mlx4_en_get_module_info
mlx4_en_get_module_eeprom
mlx4_get_module_info

When you run ethtool -m on this driver, the kernel calls
mlx4_en_get_module_info to determine the length of the eeprom, and
that value will be either 256 or 512 bytes. Let's assume that the
value is 256 for the sake of argument.

Next it calls mlx4_en_get_module_eeprom, passing in that size (256) to
actually read the eeprom data, which in turn calls mlx4_get_module_info
to fetch the data from hardware, again passing in 256 as the size for
the first call (there's a loop, but it will only be executed once in
this scenario).

mlx4_get_module_info then issues the appropriate mailbox commands to
dump the eeprom. Here it starts to go sideways. The mailbox buffer
allocated for the return data is of type mlx4_mad_ifc, which has some
front matter information and a data buffer that is 192 bytes long! A
little further down in the function, size gets restricted if the
buffer crosses a page boundary, but given that the size is 256 and the
offset is zero on the first call, we're not crossing anything, so size
remains unchanged.

The output mailbox buffer outmad->data (a 192 byte array) then gets
cast to a struct mlx4_cable_info, which has its own internal data
buffer that is only 48 bytes long. The memcpy in this function then
copies cable_info->data to the buffer that gets returned to ethtool,
but it copies size bytes (256), even though the source data buffer is
only 48 bytes long. That 48 byte array is embedded in the larger 192
byte structure, so there won't be a panic on that part of the overrun,
but there's no telling what garbage is in the buffer beyond those
first 48 bytes. Even if the remaining 144 bytes hold valid eeprom
data, that is still less than the required 256 bytes. Copying past the
end of the 192 byte buffer may cause a panic, but if the buffer
commonly bumps up against other allocated memory, that will go
unnoticed.

After the memcpy, mlx4_get_module_info just returns the size of the
passed-in buffer (256), so the calling function thinks its work is
done, and lets the kernel send the buffer with garbage data back to
ethtool.

I think the mlx4 guys have some work to do here.
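The diagnosis above boils down to copying more bytes than a single mailbox reply holds. One way to avoid that is to split the request so that each copy is bounded by the reply payload and never crosses a boundary. The Python model below only illustrates that idea and is not mlx4 code. The 48-byte payload comes from the struct mlx4_cable_info reading above. The 128-byte boundary and all names are assumptions, and the driver's own page-boundary rule may differ.

```python
PAYLOAD_MAX = 48   # data bytes in one reply (per the mlx4_cable_info reading)
BOUNDARY = 128     # assumed: a single chunk never crosses a 128-byte page


def read_eeprom_chunked(module, offset, size,
                        payload_max=PAYLOAD_MAX, boundary=BOUNDARY):
    """Model of a bounded EEPROM read.

    `module` stands in for the transceiver memory, and each slice of it
    stands in for one mailbox command. Every chunk is clamped to the bytes
    still wanted, to the reply payload, and to the next boundary, so nothing
    is copied from beyond what a reply actually holds.
    Returns (data, commands_issued).
    """
    out = bytearray()
    commands = 0
    while len(out) < size:
        pos = offset + len(out)
        chunk = min(size - len(out), payload_max, boundary - pos % boundary)
        reply = module[pos:pos + chunk]   # one modeled mailbox command
        commands += 1
        if len(reply) != chunk:
            raise ValueError("request runs past the end of module memory")
        out += reply
    return bytes(out), commands
```

For the 256-byte request in the example above, the model issues six commands: 48 + 48 + 32 up to the first boundary, then the same again. No single copy exceeds the 48 bytes a reply carries.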
My $0.02 Neil > > > From: Chris Preimesberger > Sent: Wednesday, September 26, 2018 2:14 PM > To: 'linville@tuxdriver.com'; 'netdev@vger.kernel.org' > Subject: bug: 'ethtool -m' reports spurious alarm & warning threshold values for QSFP28 transceivers > > Hello John, All, > > > I think I may have found a bug or two in ethtool, with respect to its reporting of a QSFP28 transceiver's diagnostic information. Ethtool seems to correctly report all diagnostic information about QSFP28 transceivers, except for the transceiver's warning and alarm thresholds. I'm not sure whether the spurious warning and alarm values that get reported are the fault of ethtool or my NIC/driver, and I have no other models of 100GbE NICs to test with. I've contacted Mellanox support about this, and they point the finger at ethtool. Can these issues be investigated by ethtool developers? Here is some background information about the equipment and software used when I observe these issues: > > Equipment used: > NIC: Mellanox ConnectX-4 100GbE, part number MCX415A-CCAT > Transceiver: Any 40Gb or 100Gb QSFP28 transceiver installed in the NIC (Intel, Mellanox, Transition Networks, etc..) > > Software used: > Ubuntu 18.04 with the distro's packaged NIC driver and ethtool v4.15 > also tested were ethtool v4.18 compiled from source and the current Mellanox OFED driver. > > All test scenarios produced the same bugs. > > > Bug #1. Ethtool's reporting of the installed transceiver's alarm and warning thresholds will differ, depending on whether or not ethtool is piped to another command. Example commands are below, with their respective differing output values highlighted: > > > tech1@D8:~$ sudo ethtool -m enp1s0 > Identifier : 0x11 (QSFP28) > Extended identifier : 0xfc > Extended identifier description : 3.5W max. 
Power consumption > Extended identifier description : CDR present in TX, CDR present in RX > Extended identifier description : High Power Class (> 3.5 W) not enabled > Connector : 0x07 (LC) > Transceiver codes : 0x80 0x00 0x00 0x00 0x00 0x00 0x00 0x00 > Transceiver type : 100G Ethernet: 100G CWDM4 MSA with FEC > Encoding : 0x03 (NRZ) > BR, Nominal : 25500Mbps > Rate identifier : 0x00 > Length (SMF,km) : 2km > Length (OM3 50um) : 0m > Length (OM2 50um) : 0m > Length (OM1 62.5um) : 0m > Length (Copper or Active cable) : 0m > Transmitter technology : 0x40 (1310 nm DFB) > Laser wavelength : 1310.000nm > Laser wavelength tolerance : 47.500nm > Vendor name : TRANSITION > Vendor OUI : 00:c0:f2 > Vendor PN : TNQSFP100GCWDM4 > Vendor rev : 1A > Vendor SN : TN02000302 > Date code : 180919 > Revision Compliance : SFF-8636 Rev 2.5/2.6/2.7 > Module temperature : 39.53 degrees C / 103.15 degrees F > Module voltage : 3.3241 V > Alarm/warning flags implemented : Yes > Laser tx bias current (Channel 1) : 34.432 mA > Laser tx bias current (Channel 2) : 34.432 mA > Laser tx bias current (Channel 3) : 33.408 mA > Laser tx bias current (Channel 4) : 33.920 mA > Transmit avg optical power (Channel 1) : 0.9048 mW / -0.43 dBm > Transmit avg optical power (Channel 2) : 0.7832 mW / -1.06 dBm > Transmit avg optical power (Channel 3) : 0.8057 mW / -0.94 dBm > Transmit avg optical power (Channel 4) : 0.7014 mW / -1.54 dBm > Rcvr signal avg optical power(Channel 1) : 0.7378 mW / -1.32 dBm > Rcvr signal avg optical power(Channel 2) : 0.7553 mW / -1.22 dBm > Rcvr signal avg optical power(Channel 3) : 0.6529 mW / -1.85 dBm > Rcvr signal avg optical power(Channel 4) : 0.6847 mW / -1.64 dBm > Laser bias current high alarm (Chan 1) : Off > Laser bias current low alarm (Chan 1) : Off > Laser bias current high warning (Chan 1) : Off > Laser bias current low warning (Chan 1) : Off > Laser bias current high alarm (Chan 2) : Off > Laser bias current low alarm (Chan 2) : Off > Laser bias current high 
warning (Chan 2) : Off > Laser bias current low warning (Chan 2) : Off > Laser bias current high alarm (Chan 3) : Off > Laser bias current low alarm (Chan 3) : Off > Laser bias current high warning (Chan 3) : Off > Laser bias current low warning (Chan 3) : Off > Laser bias current high alarm (Chan 4) : Off > Laser bias current low alarm (Chan 4) : Off > Laser bias current high warning (Chan 4) : Off > Laser bias current low warning (Chan 4) : Off > Module temperature high alarm : Off > Module temperature low alarm : Off > Module temperature high warning : Off > Module temperature low warning : Off > Module voltage high alarm : Off > Module voltage low alarm : Off > Module voltage high warning : Off > Module voltage low warning : Off > Laser tx power high alarm (Channel 1) : Off > Laser tx power low alarm (Channel 1) : Off > Laser tx power high warning (Channel 1) : Off > Laser tx power low warning (Channel 1) : Off > Laser tx power high alarm (Channel 2) : Off > Laser tx power low alarm (Channel 2) : Off > Laser tx power high warning (Channel 2) : Off > Laser tx power low warning (Channel 2) : Off > Laser tx power high alarm (Channel 3) : Off > Laser tx power low alarm (Channel 3) : Off > Laser tx power high warning (Channel 3) : Off > Laser tx power low warning (Channel 3) : Off > Laser tx power high alarm (Channel 4) : Off > Laser tx power low alarm (Channel 4) : Off > Laser tx power high warning (Channel 4) : Off > Laser tx power low warning (Channel 4) : Off > Laser rx power high alarm (Channel 1) : Off > Laser rx power low alarm (Channel 1) : Off > Laser rx power high warning (Channel 1) : Off > Laser rx power low warning (Channel 1) : Off > Laser rx power high alarm (Channel 2) : Off > Laser rx power low alarm (Channel 2) : Off > Laser rx power high warning (Channel 2) : Off > Laser rx power low warning (Channel 2) : Off > Laser rx power high alarm (Channel 3) : Off > Laser rx power low alarm (Channel 3) : Off > Laser rx power high warning (Channel 3) : Off > 
Laser rx power low warning (Channel 3) : Off > Laser rx power high alarm (Channel 4) : Off > Laser rx power low alarm (Channel 4) : Off > Laser rx power high warning (Channel 4) : Off > Laser rx power low warning (Channel 4) : Off > Laser bias current high alarm threshold : 0.000 mA > Laser bias current low alarm threshold : 0.000 mA > Laser bias current high warning threshold : 0.000 mA > Laser bias current low warning threshold : 0.000 mA > Laser output power high alarm threshold : 0.0000 mW / -inf dBm > Laser output power low alarm threshold : 0.0000 mW / -inf dBm > Laser output power high warning threshold : 0.0000 mW / -inf dBm > Laser output power low warning threshold : 0.0000 mW / -inf dBm > Module temperature high alarm threshold : 0.00 degrees C / 32.00 degrees F > Module temperature low alarm threshold : 0.00 degrees C / 32.00 degrees F > Module temperature high warning threshold : 0.00 degrees C / 32.00 degrees F > Module temperature low warning threshold : 0.00 degrees C / 32.00 degrees F > Module voltage high alarm threshold : 0.0000 V > Module voltage low alarm threshold : 0.0000 V > Module voltage high warning threshold : 0.0000 V > Module voltage low warning threshold : 0.0000 V > Laser rx power high alarm threshold : 0.0000 mW / -inf dBm > Laser rx power low alarm threshold : 0.0000 mW / -inf dBm > Laser rx power high warning threshold : 0.0000 mW / -inf dBm > Laser rx power low warning threshold : 0.0000 mW / -inf dBm > > > tech1@D8:~$ sudo ethtool -m enp1s0 | cat > Identifier : 0x11 (QSFP28) > Extended identifier : 0xfc > Extended identifier description : 3.5W max. 
Power consumption > Extended identifier description : CDR present in TX, CDR present in RX > Extended identifier description : High Power Class (> 3.5 W) not enabled > Connector : 0x07 (LC) > Transceiver codes : 0x80 0x00 0x00 0x00 0x00 0x00 0x00 0x00 > Transceiver type : 100G Ethernet: 100G CWDM4 MSA with FEC > Encoding : 0x03 (NRZ) > BR, Nominal : 25500Mbps > Rate identifier : 0x00 > Length (SMF,km) : 2km > Length (OM3 50um) : 0m > Length (OM2 50um) : 0m > Length (OM1 62.5um) : 0m > Length (Copper or Active cable) : 0m > Transmitter technology : 0x40 (1310 nm DFB) > Laser wavelength : 1310.000nm > Laser wavelength tolerance : 47.500nm > Vendor name : TRANSITION > Vendor OUI : 00:c0:f2 > Vendor PN : TNQSFP100GCWDM4 > Vendor rev : 1A > Vendor SN : TN02000302 > Date code : 180919 > Revision Compliance : SFF-8636 Rev 2.5/2.6/2.7 > Module temperature : 39.53 degrees C / 103.15 degrees F > Module voltage : 3.3249 V > Alarm/warning flags implemented : Yes > Laser tx bias current (Channel 1) : 34.432 mA > Laser tx bias current (Channel 2) : 34.432 mA > Laser tx bias current (Channel 3) : 33.408 mA > Laser tx bias current (Channel 4) : 33.920 mA > Transmit avg optical power (Channel 1) : 0.9043 mW / -0.44 dBm > Transmit avg optical power (Channel 2) : 0.7832 mW / -1.06 dBm > Transmit avg optical power (Channel 3) : 0.8057 mW / -0.94 dBm > Transmit avg optical power (Channel 4) : 0.7009 mW / -1.54 dBm > Rcvr signal avg optical power(Channel 1) : 0.7378 mW / -1.32 dBm > Rcvr signal avg optical power(Channel 2) : 0.7553 mW / -1.22 dBm > Rcvr signal avg optical power(Channel 3) : 0.6529 mW / -1.85 dBm > Rcvr signal avg optical power(Channel 4) : 0.6847 mW / -1.64 dBm > Laser bias current high alarm (Chan 1) : Off > Laser bias current low alarm (Chan 1) : Off > Laser bias current high warning (Chan 1) : Off > Laser bias current low warning (Chan 1) : Off > Laser bias current high alarm (Chan 2) : Off > Laser bias current low alarm (Chan 2) : Off > Laser bias current high 
warning (Chan 2) : Off > Laser bias current low warning (Chan 2) : Off > Laser bias current high alarm (Chan 3) : Off > Laser bias current low alarm (Chan 3) : Off > Laser bias current high warning (Chan 3) : Off > Laser bias current low warning (Chan 3) : Off > Laser bias current high alarm (Chan 4) : Off > Laser bias current low alarm (Chan 4) : Off > Laser bias current high warning (Chan 4) : Off > Laser bias current low warning (Chan 4) : Off > Module temperature high alarm : Off > Module temperature low alarm : Off > Module temperature high warning : Off > Module temperature low warning : Off > Module voltage high alarm : Off > Module voltage low alarm : Off > Module voltage high warning : Off > Module voltage low warning : Off > Laser tx power high alarm (Channel 1) : Off > Laser tx power low alarm (Channel 1) : Off > Laser tx power high warning (Channel 1) : Off > Laser tx power low warning (Channel 1) : Off > Laser tx power high alarm (Channel 2) : Off > Laser tx power low alarm (Channel 2) : Off > Laser tx power high warning (Channel 2) : Off > Laser tx power low warning (Channel 2) : Off > Laser tx power high alarm (Channel 3) : Off > Laser tx power low alarm (Channel 3) : Off > Laser tx power high warning (Channel 3) : Off > Laser tx power low warning (Channel 3) : Off > Laser tx power high alarm (Channel 4) : Off > Laser tx power low alarm (Channel 4) : Off > Laser tx power high warning (Channel 4) : Off > Laser tx power low warning (Channel 4) : Off > Laser rx power high alarm (Channel 1) : Off > Laser rx power low alarm (Channel 1) : Off > Laser rx power high warning (Channel 1) : Off > Laser rx power low warning (Channel 1) : Off > Laser rx power high alarm (Channel 2) : Off > Laser rx power low alarm (Channel 2) : Off > Laser rx power high warning (Channel 2) : Off > Laser rx power low warning (Channel 2) : Off > Laser rx power high alarm (Channel 3) : Off > Laser rx power low alarm (Channel 3) : Off > Laser rx power high warning (Channel 3) : Off > 
> Laser rx power low warning (Channel 3) : Off
> Laser rx power high alarm (Channel 4) : Off
> Laser rx power low alarm (Channel 4) : Off
> Laser rx power high warning (Channel 4) : Off
> Laser rx power low warning (Channel 4) : Off
> Laser bias current high alarm threshold : 16.448 mA
> Laser bias current low alarm threshold : 16.448 mA
> Laser bias current high warning threshold : 16.448 mA
> Laser bias current low warning threshold : 16.448 mA
> Laser output power high alarm threshold : 0.8224 mW / -0.85 dBm
> Laser output power low alarm threshold : 0.8250 mW / -0.84 dBm
> Laser output power high warning threshold : 0.8264 mW / -0.83 dBm
> Laser output power low warning threshold : 2.6983 mW / 4.31 dBm
> Module temperature high alarm threshold : 110.12 degrees C / 230.22 degrees F
> Module temperature low alarm threshold : 84.34 degrees C / 183.82 degrees F
> Module temperature high warning threshold : 44.12 degrees C / 111.42 degrees F
> Module temperature low warning threshold : 67.27 degrees C / 153.08 degrees F
> Module voltage high alarm threshold : 2.9728 V
> Module voltage low alarm threshold : 2.6990 V
> Module voltage high warning threshold : 0.8274 V
> Module voltage low warning threshold : 2.2538 V
> Laser rx power high alarm threshold : 2.5458 mW / 4.06 dBm
> Laser rx power low alarm threshold : 2.6992 mW / 4.31 dBm
> Laser rx power high warning threshold : 2.9801 mW / 4.74 dBm
> Laser rx power low warning threshold : 2.8526 mW / 4.55 dBm
>
>
> Bug # 2. All of the alarm and warning threshold values reported in the above commands are spurious.
> At first glance, one would assume that the threshold values reported by the piped ethtool command are correct, but they're not. I know the programmed values for the above transceiver, so that makes it easy for me to spot the spurious values, but even without knowing the programmed values of a given transceiver, one can use logic to detect when the ethtool displayed values don't make sense.
> For example, let's scrutinize the values for voltage warnings and alarms reported by ethtool on this transceiver. We will look at each voltage threshold, scrutinize that value relative to the other voltage thresholds, and look for contradictions to determine whether the reported values seem legit.
>
>                         Known       ethtool
>                         Actual      Reported
>                         Values      Values
> High Voltage Alarm      3.70 V      2.9728 V
> High Voltage Warning    3.59 V      0.8274 V
>   (Operating spec = 3.30 V)
> Low Voltage Warning     3.00 V      2.2538 V
> Low Voltage Alarm       2.90 V      2.6990 V
>
> Contradictions in the ethtool-reported voltage warning and alarm thresholds:
> 1. The high voltage alarm should occur at a higher voltage than the operating voltage, but ethtool didn't report that.
> 2. The high voltage warning should occur at a higher voltage than the low voltage warning and alarm, but ethtool didn't report that.
> 3. The low voltage warning should occur at a higher voltage than the low voltage alarm, but ethtool didn't report that.
> 4. The low voltage alarm should occur at a lower voltage than any of the other voltage warnings and alarms, but ethtool didn't report that.
> 5. The current voltage was reported as 3.3249 V, which according to the reported thresholds should trigger both the high voltage warning and the high voltage alarm, yet no warnings or alarms are indicated.
>
> Each of the 4 voltage thresholds reported by ethtool has contradictions, so we know something is not right. The same kind of logic can be applied to the thresholds for temperature, laser TX power, etc. to find that those values are also spurious.
>
>
> Installing the above transceiver in a Cisco switch shows that the switch correctly retrieves the true warning and alarm threshold values from the transceiver's EEPROM, so we trust that the transceiver has been correctly programmed.
> Cisco CLI output for that transceiver shown here:
>
> switch# show interface ethernet 1/3 transceiver details
> Ethernet1/3
> transceiver is present
> type is QSFP-100G-CWDM4-MSA-FEC
> name is TRANSITION
> part number is TNQSFP100GCWDM4
> revision is 1A
> serial number is TN02000302
> nominal bitrate is 25500 MBit/sec per channel
> Link length supported for 9/125um fiber is 2 km
> cisco id is 17
> cisco extended id number is 252
>
> Lane Number:1 Network Lane
> SFP Detail Diagnostics Information (internal calibration)
> ----------------------------------------------------------------------------
>                Current         Alarms                  Warnings
>                Measurement     High        Low         High        Low
> ----------------------------------------------------------------------------
> Temperature    38.08 C         80.00 C     -10.00 C    75.00 C     -5.00 C
> Voltage        3.34 V          3.70 V      2.90 V      3.59 V      3.00 V
> Current        34.24 mA        75.00 mA    10.00 mA    70.00 mA    15.00 mA
> Tx Power       -0.44 dBm       4.49 dBm    -8.50 dBm   3.49 dBm    -7.52 dBm
> Rx Power       N/A             4.49 dBm    -14.55 dBm  3.49 dBm    -12.51 dBm
> Transmit Fault Count = 0
> ----------------------------------------------------------------------------
> Note: ++ high-alarm; + high-warning; -- low-alarm; - low-warning
>
> Lane Number:2 Network Lane
> SFP Detail Diagnostics Information (internal calibration)
> ----------------------------------------------------------------------------
>                Current         Alarms                  Warnings
>                Measurement     High        Low         High        Low
> ----------------------------------------------------------------------------
> Temperature    38.08 C         80.00 C     -10.00 C    75.00 C     -5.00 C
> Voltage        3.34 V          3.70 V      2.90 V      3.59 V      3.00 V
> Current        34.24 mA        75.00 mA    10.00 mA    70.00 mA    15.00 mA
> Tx Power       -1.20 dBm       4.49 dBm    -8.50 dBm   3.49 dBm    -7.52 dBm
> Rx Power       N/A             4.49 dBm    -14.55 dBm  3.49 dBm    -12.51 dBm
> Transmit Fault Count = 0
> ----------------------------------------------------------------------------
> Note: ++ high-alarm; + high-warning; -- low-alarm; - low-warning
>
> Lane Number:3 Network Lane
> SFP Detail Diagnostics Information (internal calibration)
> ----------------------------------------------------------------------------
>                Current         Alarms                  Warnings
>                Measurement     High        Low         High        Low
> ----------------------------------------------------------------------------
> Temperature    38.08 C         80.00 C     -10.00 C    75.00 C     -5.00 C
> Voltage        3.34 V          3.70 V      2.90 V      3.59 V      3.00 V
> Current        33.21 mA        75.00 mA    10.00 mA    70.00 mA    15.00 mA
> Tx Power       -0.96 dBm       4.49 dBm    -8.50 dBm   3.49 dBm    -7.52 dBm
> Rx Power       N/A             4.49 dBm    -14.55 dBm  3.49 dBm    -12.51 dBm
> Transmit Fault Count = 0
> ----------------------------------------------------------------------------
> Note: ++ high-alarm; + high-warning; -- low-alarm; - low-warning
>
> Lane Number:4 Network Lane
> SFP Detail Diagnostics Information (internal calibration)
> ----------------------------------------------------------------------------
>                Current         Alarms                  Warnings
>                Measurement     High        Low         High        Low
> ----------------------------------------------------------------------------
> Temperature    38.08 C         80.00 C     -10.00 C    75.00 C     -5.00 C
> Voltage        3.34 V          3.70 V      2.90 V      3.59 V      3.00 V
> Current        33.72 mA        75.00 mA    10.00 mA    70.00 mA    15.00 mA
> Tx Power       -1.59 dBm       4.49 dBm    -8.50 dBm   3.49 dBm    -7.52 dBm
> Rx Power       N/A             4.49 dBm    -14.55 dBm  3.49 dBm    -12.51 dBm
> Transmit Fault Count = 0
> ----------------------------------------------------------------------------
> Note: ++ high-alarm; + high-warning; -- low-alarm; - low-warning
>
> switch#
>
>
> Any help with these issues is greatly appreciated. If you have any questions or advice, please let me know. I'll be glad to continue troubleshooting this until it's resolved. Thank you.
>
>
> Chris Preimesberger | Test & Validation Engineer
> Transition Networks, Inc.
> chrisp@transition.com
> direct: +1.952.996.1509 | fax: +1.952.941.2322 | www.transition.com
> ________________________________________
>
>
> For comparison with ethtool's output, which shows incorrect threshold values: when the same transceiver is installed in a Cisco Nexus switch and the Cisco command "show interface ethernet 1/3 transceiver details" is issued, the switch correctly reads and displays the transceiver's Alarm and Warning thresholds, as shown below:
>
>
> switch# show interface ethernet 1/3 transceiver details
> Ethernet1/3
> transceiver is present
> type is QSFP-100G-CWDM4-MSA-FEC
> name is TRANSITION
> part number is TNQSFP100GCWDM4
> revision is 1A
> serial number is TN02000302
> nominal bitrate is 25500 MBit/sec per channel
> Link length supported for 9/125um fiber is 2 km
> cisco id is 17
> cisco extended id number is 252
>
> Lane Number:1 Network Lane
> SFP Detail Diagnostics Information (internal calibration)
> ----------------------------------------------------------------------------
>                Current         Alarms                  Warnings
>                Measurement     High        Low         High        Low
> ----------------------------------------------------------------------------
> Temperature    38.08 C         80.00 C     -10.00 C    75.00 C     -5.00 C
> Voltage        3.34 V          3.70 V      2.90 V      3.59 V      3.00 V
> Current        34.24 mA        75.00 mA    10.00 mA    70.00 mA    15.00 mA
> Tx Power       -0.44 dBm       4.49 dBm    -8.50 dBm   3.49 dBm    -7.52 dBm
> Rx Power       N/A             4.49 dBm    -14.55 dBm  3.49 dBm    -12.51 dBm
> Transmit Fault Count = 0
> ----------------------------------------------------------------------------
> Note: ++ high-alarm; + high-warning; -- low-alarm; - low-warning
>
> Lane Number:2 Network Lane
> SFP Detail Diagnostics Information (internal calibration)
> ----------------------------------------------------------------------------
>                Current         Alarms                  Warnings
>                Measurement     High        Low         High        Low
> ----------------------------------------------------------------------------
> Temperature    38.08 C         80.00 C     -10.00 C    75.00 C     -5.00 C
> Voltage        3.34 V          3.70 V      2.90 V      3.59 V      3.00 V
> Current        34.24 mA        75.00 mA    10.00 mA    70.00 mA    15.00 mA
> Tx Power       -1.20 dBm       4.49 dBm    -8.50 dBm   3.49 dBm    -7.52 dBm
> Rx Power       N/A             4.49 dBm    -14.55 dBm  3.49 dBm    -12.51 dBm
> Transmit Fault Count = 0
> ----------------------------------------------------------------------------
> Note: ++ high-alarm; + high-warning; -- low-alarm; - low-warning
>
> Lane Number:3 Network Lane
> SFP Detail Diagnostics Information (internal calibration)
> ----------------------------------------------------------------------------
>                Current         Alarms                  Warnings
>                Measurement     High        Low         High        Low
> ----------------------------------------------------------------------------
> Temperature    38.08 C         80.00 C     -10.00 C    75.00 C     -5.00 C
> Voltage        3.34 V          3.70 V      2.90 V      3.59 V      3.00 V
> Current        33.21 mA        75.00 mA    10.00 mA    70.00 mA    15.00 mA
> Tx Power       -0.96 dBm       4.49 dBm    -8.50 dBm   3.49 dBm    -7.52 dBm
> Rx Power       N/A             4.49 dBm    -14.55 dBm  3.49 dBm    -12.51 dBm
> Transmit Fault Count = 0
> ----------------------------------------------------------------------------
> Note: ++ high-alarm; + high-warning; -- low-alarm; - low-warning
>
> Lane Number:4 Network Lane
> SFP Detail Diagnostics Information (internal calibration)
> ----------------------------------------------------------------------------
>                Current         Alarms                  Warnings
>                Measurement     High        Low         High        Low
> ----------------------------------------------------------------------------
> Temperature    38.08 C         80.00 C     -10.00 C    75.00 C     -5.00 C
> Voltage        3.34 V          3.70 V      2.90 V      3.59 V      3.00 V
> Current        33.72 mA        75.00 mA    10.00 mA    70.00 mA    15.00 mA
> Tx Power       -1.59 dBm       4.49 dBm    -8.50 dBm   3.49 dBm    -7.52 dBm
> Rx Power       N/A             4.49 dBm    -14.55 dBm  3.49 dBm    -12.51 dBm
> Transmit Fault Count = 0
> ----------------------------------------------------------------------------
> Note: ++ high-alarm; + high-warning; -- low-alarm; - low-warning
>
> switch#
>
>
> Look at each line in the ethtool output below that includes the word "threshold".
> This file has been hand-edited to show the threshold values that have been programmed into the transceiver, which should be displayed by ethtool. The threshold values shown below are copied and pasted from the output of the Cisco NX-OS command "show interface ethernet 1/3 transceiver details", while the transceiver was installed in a Cisco Nexus switch.
>
> Note - I only copied the threshold values in the units that were displayed by the Cisco switch. The "?" symbols are just placeholders for the converted values; I was too lazy to do conversions between dBm and mW, or between degrees C and degrees F. Ethtool would be expected to report the true / converted values.
>
>
> tech1@D8:~$ sudo ethtool -m enp1s0
> Identifier : 0x11 (QSFP28)
> Extended identifier : 0xfc
> Extended identifier description : 3.5W max. Power consumption
> Extended identifier description : CDR present in TX, CDR present in RX
> Extended identifier description : High Power Class (> 3.5 W) not enabled
> Connector : 0x07 (LC)
> Transceiver codes : 0x80 0x00 0x00 0x00 0x00 0x00 0x00 0x00
> Transceiver type : 100G Ethernet: 100G CWDM4 MSA with FEC
> Encoding : 0x03 (NRZ)
> BR, Nominal : 25500Mbps
> Rate identifier : 0x00
> Length (SMF,km) : 2km
> Length (OM3 50um) : 0m
> Length (OM2 50um) : 0m
> Length (OM1 62.5um) : 0m
> Length (Copper or Active cable) : 0m
> Transmitter technology : 0x40 (1310 nm DFB)
> Laser wavelength : 1310.000nm
> Laser wavelength tolerance : 47.500nm
> Vendor name : TRANSITION
> Vendor OUI : 00:c0:f2
> Vendor PN : TNQSFP100GCWDM4
> Vendor rev : 1A
> Vendor SN : TN02000302
> Date code : 180919
> Revision Compliance : SFF-8636 Rev 2.5/2.6/2.7
> Module temperature : 39.53 degrees C / 103.15 degrees F
> Module voltage : 3.3233 V
> Alarm/warning flags implemented : Yes
> Laser tx bias current (Channel 1) : 34.432 mA
> Laser tx bias current (Channel 2) : 34.432 mA
> Laser tx bias current (Channel 3) : 33.408 mA
> Laser tx bias current (Channel 4) : 33.920 mA
> Transmit avg optical power (Channel 1) : 0.9052 mW / -0.43 dBm
> Transmit avg optical power (Channel 2) : 0.7832 mW / -1.06 dBm
> Transmit avg optical power (Channel 3) : 0.8057 mW / -0.94 dBm
> Transmit avg optical power (Channel 4) : 0.7009 mW / -1.54 dBm
> Rcvr signal avg optical power(Channel 1) : 0.7378 mW / -1.32 dBm
> Rcvr signal avg optical power(Channel 2) : 0.7553 mW / -1.22 dBm
> Rcvr signal avg optical power(Channel 3) : 0.6529 mW / -1.85 dBm
> Rcvr signal avg optical power(Channel 4) : 0.6948 mW / -1.58 dBm
> Laser bias current high alarm (Chan 1) : Off
> Laser bias current low alarm (Chan 1) : Off
> Laser bias current high warning (Chan 1) : Off
> Laser bias current low warning (Chan 1) : Off
> Laser bias current high alarm (Chan 2) : Off
> Laser bias current low alarm (Chan 2) : Off
> Laser bias current high warning (Chan 2) : Off
> Laser bias current low warning (Chan 2) : Off
> Laser bias current high alarm (Chan 3) : Off
> Laser bias current low alarm (Chan 3) : Off
> Laser bias current high warning (Chan 3) : Off
> Laser bias current low warning (Chan 3) : Off
> Laser bias current high alarm (Chan 4) : Off
> Laser bias current low alarm (Chan 4) : Off
> Laser bias current high warning (Chan 4) : Off
> Laser bias current low warning (Chan 4) : Off
> Module temperature high alarm : Off
> Module temperature low alarm : Off
> Module temperature high warning : Off
> Module temperature low warning : Off
> Module voltage high alarm : Off
> Module voltage low alarm : Off
> Module voltage high warning : Off
> Module voltage low warning : Off
> Laser tx power high alarm (Channel 1) : Off
> Laser tx power low alarm (Channel 1) : Off
> Laser tx power high warning (Channel 1) : Off
> Laser tx power low warning (Channel 1) : Off
> Laser tx power high alarm (Channel 2) : Off
> Laser tx power low alarm (Channel 2) : Off
> Laser tx power high warning (Channel 2) : Off
> Laser tx power low warning (Channel 2) : Off
> Laser tx power high alarm (Channel 3) : Off
> Laser tx power low alarm (Channel 3) : Off
> Laser tx power high warning (Channel 3) : Off
> Laser tx power low warning (Channel 3) : Off
> Laser tx power high alarm (Channel 4) : Off
> Laser tx power low alarm (Channel 4) : Off
> Laser tx power high warning (Channel 4) : Off
> Laser tx power low warning (Channel 4) : Off
> Laser rx power high alarm (Channel 1) : Off
> Laser rx power low alarm (Channel 1) : Off
> Laser rx power high warning (Channel 1) : Off
> Laser rx power low warning (Channel 1) : Off
> Laser rx power high alarm (Channel 2) : Off
> Laser rx power low alarm (Channel 2) : Off
> Laser rx power high warning (Channel 2) : Off
> Laser rx power low warning (Channel 2) : Off
> Laser rx power high alarm (Channel 3) : Off
> Laser rx power low alarm (Channel 3) : Off
> Laser rx power high warning (Channel 3) : Off
> Laser rx power low warning (Channel 3) : Off
> Laser rx power high alarm (Channel 4) : Off
> Laser rx power low alarm (Channel 4) : Off
> Laser rx power high warning (Channel 4) : Off
> Laser rx power low warning (Channel 4) : Off
> Laser bias current high alarm threshold : 75.000 mA
> Laser bias current low alarm threshold : 10.000 mA
> Laser bias current high warning threshold : 70.000 mA
> Laser bias current low warning threshold : 15.000 mA
> Laser output power high alarm threshold : ? mW / 4.49 dBm
> Laser output power low alarm threshold : ? mW / -8.50 dBm
> Laser output power high warning threshold : ? mW / 3.49 dBm
> Laser output power low warning threshold : ? mW / -7.52 dBm
> Module temperature high alarm threshold : 80.00 degrees C / ? degrees F
> Module temperature low alarm threshold : -10.00 degrees C / ? degrees F
> Module temperature high warning threshold : 75.00 degrees C / ? degrees F
> Module temperature low warning threshold : -5.00 degrees C / ? degrees F
> Module voltage high alarm threshold : 3.7000 V
> Module voltage low alarm threshold : 2.9000 V
> Module voltage high warning threshold : 3.5900 V
> Module voltage low warning threshold : 3.0000 V
> Laser rx power high alarm threshold : ? mW / 4.49 dBm
> Laser rx power low alarm threshold : ? mW / -14.55 dBm
> Laser rx power high warning threshold : ? mW / 3.49 dBm
> Laser rx power low warning threshold : ? mW / -12.51 dBm
>
>
> Look at each line in the ethtool output below that includes the word "threshold". This file shows the actual output from ethtool v4.18 when the output is not piped to another command. Notice that all of the displayed threshold values are 0 (which is incorrect), while the other values are reported as expected.
>
> tech1@D8:~$ sudo ethtool -m enp1s0
> Identifier : 0x11 (QSFP28)
> Extended identifier : 0xfc
> Extended identifier description : 3.5W max. Power consumption
> Extended identifier description : CDR present in TX, CDR present in RX
> Extended identifier description : High Power Class (> 3.5 W) not enabled
> Connector : 0x07 (LC)
> Transceiver codes : 0x80 0x00 0x00 0x00 0x00 0x00 0x00 0x00
> Transceiver type : 100G Ethernet: 100G CWDM4 MSA with FEC
> Encoding : 0x03 (NRZ)
> BR, Nominal : 25500Mbps
> Rate identifier : 0x00
> Length (SMF,km) : 2km
> Length (OM3 50um) : 0m
> Length (OM2 50um) : 0m
> Length (OM1 62.5um) : 0m
> Length (Copper or Active cable) : 0m
> Transmitter technology : 0x40 (1310 nm DFB)
> Laser wavelength : 1310.000nm
> Laser wavelength tolerance : 47.500nm
> Vendor name : TRANSITION
> Vendor OUI : 00:c0:f2
> Vendor PN : TNQSFP100GCWDM4
> Vendor rev : 1A
> Vendor SN : TN02000302
> Date code : 180919
> Revision Compliance : SFF-8636 Rev 2.5/2.6/2.7
> Module temperature : 39.53 degrees C / 103.15 degrees F
> Module voltage : 3.3241 V
> Alarm/warning flags implemented : Yes
> Laser tx bias current (Channel 1) : 34.432 mA
> Laser tx bias current (Channel 2) : 34.432 mA
> Laser tx bias current (Channel 3) : 33.408 mA
> Laser tx bias current (Channel 4) : 33.920 mA
> Transmit avg optical power (Channel 1) : 0.9048 mW / -0.43 dBm
> Transmit avg optical power (Channel 2) : 0.7832 mW / -1.06 dBm
> Transmit avg optical power (Channel 3) : 0.8057 mW / -0.94 dBm
> Transmit avg optical power (Channel 4) : 0.7014 mW / -1.54 dBm
> Rcvr signal avg optical power(Channel 1) : 0.7378 mW / -1.32 dBm
> Rcvr signal avg optical power(Channel 2) : 0.7553 mW / -1.22 dBm
> Rcvr signal avg optical power(Channel 3) : 0.6529 mW / -1.85 dBm
> Rcvr signal avg optical power(Channel 4) : 0.6847 mW / -1.64 dBm
> Laser bias current high alarm (Chan 1) : Off
> Laser bias current low alarm (Chan 1) : Off
> Laser bias current high warning (Chan 1) : Off
> Laser bias current low warning (Chan 1) : Off
> Laser bias current high alarm (Chan 2) : Off
> Laser bias current low alarm (Chan 2) : Off
> Laser bias current high warning (Chan 2) : Off
> Laser bias current low warning (Chan 2) : Off
> Laser bias current high alarm (Chan 3) : Off
> Laser bias current low alarm (Chan 3) : Off
> Laser bias current high warning (Chan 3) : Off
> Laser bias current low warning (Chan 3) : Off
> Laser bias current high alarm (Chan 4) : Off
> Laser bias current low alarm (Chan 4) : Off
> Laser bias current high warning (Chan 4) : Off
> Laser bias current low warning (Chan 4) : Off
> Module temperature high alarm : Off
> Module temperature low alarm : Off
> Module temperature high warning : Off
> Module temperature low warning : Off
> Module voltage high alarm : Off
> Module voltage low alarm : Off
> Module voltage high warning : Off
> Module voltage low warning : Off
> Laser tx power high alarm (Channel 1) : Off
> Laser tx power low alarm (Channel 1) : Off
> Laser tx power high warning (Channel 1) : Off
> Laser tx power low warning (Channel 1) : Off
> Laser tx power high alarm (Channel 2) : Off
> Laser tx power low alarm (Channel 2) : Off
> Laser tx power high warning (Channel 2) : Off
> Laser tx power low warning (Channel 2) : Off
> Laser tx power high alarm (Channel 3) : Off
> Laser tx power low alarm (Channel 3) : Off
> Laser tx power high warning (Channel 3) : Off
> Laser tx power low warning (Channel 3) : Off
> Laser tx power high alarm (Channel 4) : Off
> Laser tx power low alarm (Channel 4) : Off
> Laser tx power high warning (Channel 4) : Off
> Laser tx power low warning (Channel 4) : Off
> Laser rx power high alarm (Channel 1) : Off
> Laser rx power low alarm (Channel 1) : Off
> Laser rx power high warning (Channel 1) : Off
> Laser rx power low warning (Channel 1) : Off
> Laser rx power high alarm (Channel 2) : Off
> Laser rx power low alarm (Channel 2) : Off
> Laser rx power high warning (Channel 2) : Off
> Laser rx power low warning (Channel 2) : Off
> Laser rx power high alarm (Channel 3) : Off
> Laser rx power low alarm (Channel 3) : Off
> Laser rx power high warning (Channel 3) : Off
> Laser rx power low warning (Channel 3) : Off
> Laser rx power high alarm (Channel 4) : Off
> Laser rx power low alarm (Channel 4) : Off
> Laser rx power high warning (Channel 4) : Off
> Laser rx power low warning (Channel 4) : Off
> Laser bias current high alarm threshold : 0.000 mA
> Laser bias current low alarm threshold : 0.000 mA
> Laser bias current high warning threshold : 0.000 mA
> Laser bias current low warning threshold : 0.000 mA
> Laser output power high alarm threshold : 0.0000 mW / -inf dBm
> Laser output power low alarm threshold : 0.0000 mW / -inf dBm
> Laser output power high warning threshold : 0.0000 mW / -inf dBm
> Laser output power low warning threshold : 0.0000 mW / -inf dBm
> Module temperature high alarm threshold : 0.00 degrees C / 32.00 degrees F
> Module temperature low alarm threshold : 0.00 degrees C / 32.00 degrees F
> Module temperature high warning threshold : 0.00 degrees C / 32.00 degrees F
> Module temperature low warning threshold : 0.00 degrees C / 32.00 degrees F
> Module voltage high alarm threshold : 0.0000 V
> Module voltage low alarm threshold : 0.0000 V
> Module voltage high warning threshold : 0.0000 V
> Module voltage low warning threshold : 0.0000 V
> Laser rx power high alarm threshold : 0.0000 mW / -inf dBm
> Laser rx power low alarm threshold : 0.0000 mW / -inf dBm
> Laser rx power high warning threshold : 0.0000 mW / -inf dBm
> Laser rx power low warning threshold : 0.0000 mW / -inf dBm
>
>
> Look at each line in the ethtool output below that includes the word "threshold". This file shows the actual output from ethtool v4.18 when the ethtool output is piped to another command. Notice that all of the displayed threshold values are spurious, while the other values are reported as expected.
>
> tech1@D8:~$ sudo ethtool -m enp1s0 | cat
> Identifier : 0x11 (QSFP28)
> Extended identifier : 0xfc
> Extended identifier description : 3.5W max. Power consumption
> Extended identifier description : CDR present in TX, CDR present in RX
> Extended identifier description : High Power Class (> 3.5 W) not enabled
> Connector : 0x07 (LC)
> Transceiver codes : 0x80 0x00 0x00 0x00 0x00 0x00 0x00 0x00
> Transceiver type : 100G Ethernet: 100G CWDM4 MSA with FEC
> Encoding : 0x03 (NRZ)
> BR, Nominal : 25500Mbps
> Rate identifier : 0x00
> Length (SMF,km) : 2km
> Length (OM3 50um) : 0m
> Length (OM2 50um) : 0m
> Length (OM1 62.5um) : 0m
> Length (Copper or Active cable) : 0m
> Transmitter technology : 0x40 (1310 nm DFB)
> Laser wavelength : 1310.000nm
> Laser wavelength tolerance : 47.500nm
> Vendor name : TRANSITION
> Vendor OUI : 00:c0:f2
> Vendor PN : TNQSFP100GCWDM4
> Vendor rev : 1A
> Vendor SN : TN02000302
> Date code : 180919
> Revision Compliance : SFF-8636 Rev 2.5/2.6/2.7
> Module temperature : 39.53 degrees C / 103.15 degrees F
> Module voltage : 3.3249 V
> Alarm/warning flags implemented : Yes
> Laser tx bias current (Channel 1) : 34.432 mA
> Laser tx bias current (Channel 2) : 34.432 mA
> Laser tx bias current (Channel 3) : 33.408 mA
> Laser tx bias current (Channel 4) : 33.920 mA
> Transmit avg optical power (Channel 1) : 0.9043 mW / -0.44 dBm
> Transmit avg optical power (Channel 2) : 0.7832 mW / -1.06 dBm
> Transmit avg optical power (Channel 3) : 0.8057 mW / -0.94 dBm
> Transmit avg optical power (Channel 4) : 0.7009 mW / -1.54 dBm
> Rcvr signal avg optical power(Channel 1) : 0.7378 mW / -1.32 dBm
> Rcvr signal avg optical power(Channel 2) : 0.7553 mW / -1.22 dBm
> Rcvr signal avg optical power(Channel 3) : 0.6529 mW / -1.85 dBm
> Rcvr signal avg optical power(Channel 4) : 0.6847 mW / -1.64 dBm
> Laser bias current high alarm (Chan 1) : Off
> Laser bias current low alarm (Chan 1) : Off
> Laser bias current high warning (Chan 1) : Off
> Laser bias current low warning (Chan 1) : Off
> Laser bias current high alarm (Chan 2) : Off
> Laser bias current low alarm (Chan 2) : Off
> Laser bias current high warning (Chan 2) : Off
> Laser bias current low warning (Chan 2) : Off
> Laser bias current high alarm (Chan 3) : Off
> Laser bias current low alarm (Chan 3) : Off
> Laser bias current high warning (Chan 3) : Off
> Laser bias current low warning (Chan 3) : Off
> Laser bias current high alarm (Chan 4) : Off
> Laser bias current low alarm (Chan 4) : Off
> Laser bias current high warning (Chan 4) : Off
> Laser bias current low warning (Chan 4) : Off
> Module temperature high alarm : Off
> Module temperature low alarm : Off
> Module temperature high warning : Off
> Module temperature low warning : Off
> Module voltage high alarm : Off
> Module voltage low alarm : Off
> Module voltage high warning : Off
> Module voltage low warning : Off
> Laser tx power high alarm (Channel 1) : Off
> Laser tx power low alarm (Channel 1) : Off
> Laser tx power high warning (Channel 1) : Off
> Laser tx power low warning (Channel 1) : Off
> Laser tx power high alarm (Channel 2) : Off
> Laser tx power low alarm (Channel 2) : Off
> Laser tx power high warning (Channel 2) : Off
> Laser tx power low warning (Channel 2) : Off
> Laser tx power high alarm (Channel 3) : Off
> Laser tx power low alarm (Channel 3) : Off
> Laser tx power high warning (Channel 3) : Off
> Laser tx power low warning (Channel 3) : Off
> Laser tx power high alarm (Channel 4) : Off
> Laser tx power low alarm (Channel 4) : Off
> Laser tx power high warning (Channel 4) : Off
> Laser tx power low warning (Channel 4) : Off
> Laser rx power high alarm (Channel 1) : Off
> Laser rx power low alarm (Channel 1) : Off
> Laser rx power high warning (Channel 1) : Off
> Laser rx power low warning (Channel 1) : Off
> Laser rx power high alarm (Channel 2) : Off
> Laser rx power low alarm (Channel 2) : Off
> Laser rx power high warning (Channel 2) : Off
> Laser rx power low warning (Channel 2) : Off
> Laser rx power high alarm (Channel 3) : Off
> Laser rx power low alarm (Channel 3) : Off
> Laser rx power high warning (Channel 3) : Off
> Laser rx power low warning (Channel 3) : Off
> Laser rx power high alarm (Channel 4) : Off
> Laser rx power low alarm (Channel 4) : Off
> Laser rx power high warning (Channel 4) : Off
> Laser rx power low warning (Channel 4) : Off
> Laser bias current high alarm threshold : 16.448 mA
> Laser bias current low alarm threshold : 16.448 mA
> Laser bias current high warning threshold : 16.448 mA
> Laser bias current low warning threshold : 16.448 mA
> Laser output power high alarm threshold : 0.8224 mW / -0.85 dBm
> Laser output power low alarm threshold : 0.8250 mW / -0.84 dBm
> Laser output power high warning threshold : 0.8264 mW / -0.83 dBm
> Laser output power low warning threshold : 2.6983 mW / 4.31 dBm
> Module temperature high alarm threshold : 110.12 degrees C / 230.22 degrees F
> Module temperature low alarm threshold : 84.34 degrees C / 183.82 degrees F
> Module temperature high warning threshold : 44.12 degrees C / 111.42 degrees F
> Module temperature low warning threshold : 67.27 degrees C / 153.08 degrees F
> Module voltage high alarm threshold : 2.9728 V
> Module voltage low alarm threshold : 2.6990 V
> Module voltage high warning threshold : 0.8274 V
> Module voltage low warning threshold : 2.2538 V
> Laser rx power high alarm threshold : 2.5458 mW / 4.06 dBm
> Laser rx power low alarm threshold : 2.6992 mW / 4.31 dBm
> Laser rx power high warning threshold : 2.9801 mW / 4.74 dBm
> Laser rx power low warning threshold : 2.8526 mW / 4.55 dBm
> tech1@D8:~$
* Re: bug: 'ethtool -m' reports spurious alarm & warning threshold values for QSFP28 transceivers
2018-09-26 21:34 ` Neil Horman
@ 2018-09-26 21:58 ` Andrew Lunn
2018-09-27 13:23 ` Neil Horman
2018-09-27 13:25 ` Eran Ben Elisha
1 sibling, 1 reply; 16+ messages in thread
From: Andrew Lunn @ 2018-09-26 21:58 UTC
To: Neil Horman
Cc: Chris Preimesberger, linville@tuxdriver.com, netdev@vger.kernel.org

> When you run ethtool -m on this driver, the kernel calls mlx4_en_get_module_info
> to determine the length of the eeprom, and that value will be either 256 or 512
> bytes.

So it sounds like QSFP modules using 8636 are not supported. You would
expect a size to be one of 256, 384, 512 or 640.

> Next it calls mlx4_en_get_module_eeprom, passing in that size 256 to actually
> read the eeprom data, which in turn calls mlx4_get_module_info to fetch the data
> from hardware, again, passing in 256 as the size for the first call (there's a
> loop, but it will only get executed once in this scenario)
>
> mlx4_get_module_info then issues the appropriate mailbox commands to dump the
> eeprom. Here it starts to go sideways. The mailbox buffer allocated for the
> return data is of type mlx4_mad_ifc, which has some front matter information and
> a data buffer that is 192 bytes long!

Which suggests all SFP dumps are broken as well, not just QSFP.

Oh dear.

Andrew
* Re: bug: 'ethtool -m' reports spurious alarm & warning threshold values for QSFP28 transceivers
2018-09-26 21:58 ` Andrew Lunn
@ 2018-09-27 13:23 ` Neil Horman
0 siblings, 0 replies; 16+ messages in thread
From: Neil Horman @ 2018-09-27 13:23 UTC
To: Andrew Lunn
Cc: Chris Preimesberger, linville@tuxdriver.com, netdev@vger.kernel.org

On Wed, Sep 26, 2018 at 11:58:12PM +0200, Andrew Lunn wrote:
> > When you run ethtool -m on this driver, the kernel calls mlx4_en_get_module_info
> > to determine the length of the eeprom, and that value will be either 256 or 512
> > bytes.
>
> So it sounds like QSFP modules using 8636 are not supported. You would
> expect a size to be one of 256, 384, 512 or 640.
>
> > Next it calls mlx4_en_get_module_eeprom, passing in that size 256 to actually
> > read the eeprom data, which in turn calls mlx4_get_module_info to fetch the data
> > from hardware, again, passing in 256 as the size for the first call (there's a
> > loop, but it will only get executed once in this scenario)
> >
> > mlx4_get_module_info then issues the appropriate mailbox commands to dump the
> > eeprom. Here it starts to go sideways. The mailbox buffer allocated for the
> > return data is of type mlx4_mad_ifc, which has some front matter information and
> > a data buffer that is 192 bytes long!
>
> Which suggests all SFP dumps are broken as well, not just QSFP.
>
No, not at all. Each driver that implements a get_eeprom ethtool method is
capable of doing multiple reads at various offsets and filling up the user
buffer with real data. The bug here is that the Mellanox data structures are
not sized properly vis-a-vis the amount of EEPROM data that user space might
expect, or, more specifically, that the driver isn't smart enough to do
several small reads to fill up the full-sized request buffer.

Neil

> Oh dear.
>
> Andrew
>
* Re: bug: 'ethtool -m' reports spurious alarm & warning threshold values for QSFP28 transceivers
2018-09-26 21:34 ` Neil Horman
2018-09-26 21:58 ` Andrew Lunn
@ 2018-09-27 13:25 ` Eran Ben Elisha
2018-09-27 14:52 ` Andrew Lunn
1 sibling, 1 reply; 16+ messages in thread
From: Eran Ben Elisha @ 2018-09-27 13:25 UTC
To: Neil Horman, Chris Preimesberger
Cc: linville@tuxdriver.com, netdev@vger.kernel.org

> This is just a drive by guess, but I think this is a driver issue.
>
>
> Issue 1 seems like a red herring, cat doesn't modify output, nor does ethtool
> know if its output is going to a console or a pipe, it's all the same. And given
> issue 2 (that the output of the thresholds, etc are spuriously changing and
> wrong), suggests that they are spuriously changing and wrong regardless of what
> cat does.
>
> That said, I think issue two is a problem with the mlx4 driver. Specifically
> that the driver is copying garbage data.
>
> The three ethtool functions at work here are:
> mlx4_en_get_module_info
> mlx4_en_get_module_eeprom
> mlx4_get_module_info
>
> When you run ethtool -m on this driver, the kernel calls mlx4_en_get_module_info
> to determine the length of the eeprom, and that value will be either 256 or 512
> bytes. Let's assume that the value is 256 for the sake of argument.
>
> Next it calls mlx4_en_get_module_eeprom, passing in that size 256 to actually
> read the eeprom data, which in turn calls mlx4_get_module_info to fetch the data
> from hardware, again, passing in 256 as the size for the first call (there's a
> loop, but it will only get executed once in this scenario)
>
> mlx4_get_module_info then issues the appropriate mailbox commands to dump the
> eeprom. Here it starts to go sideways. The mailbox buffer allocated for the
> return data is of type mlx4_mad_ifc, which has some front matter information and
> a data buffer that is 192 bytes long!
>
> A little further down in the function, size gets restricted if the buffer
> crosses a page boundary, but given that the size is 256 on the first call here,
> and offset is zero on the first call, we're not crossing anything, so size
> remains unchanged.
>
> The output mailbox buffer outmad->data (a 192 byte array), then gets cast to a
> struct mlx4_cable_info structure, which has its own internal data buffer that is
> only 48 bytes long.

Hi guys,

Thanks for digging into it. Here are some observations:

1. Chris's system has a CX4 (which is served by the mlx5 driver), while all of
   Neil's analysis was done on the mlx4 driver (which serves the older
   generation of NICs, e.g. CX3Pro).
2. In general, MAD commands are limited to 192 bytes of data.
3. CableInfo MAD command info is limited to 48 bytes.
4. First, note that mlx4_get_module_info has:

       if (size > MODULE_INFO_MAX_READ)
               size = MODULE_INFO_MAX_READ;

   This is the information that was missing from the analysis. The function
   also returns that clamped size (<= 48), so there is no garbage copy and no
   overrun. The caller (also inside mlx4) is expected to call it again with a
   new offset in order to fetch more data.
5. I reviewed the mlx5 driver, and it has a similar reading mechanism (small
   difference: it reads via the MCIA register and not via MAD).

Both drivers read up to 256 bytes: 0-127 (from page 0) and 128-256 (from
page 0). The driver is not currently capable of reading more than 256 bytes.

Looking at the qsfp.c parser in ethtool (user space), I see an uninitialized
variable bug that caused issues #1 and #2. Applying the fix locally solved the
issue (alarm data is no longer shown, which is expected since the driver does
not fill it in).

diff --git a/qsfp.c b/qsfp.c
index 32e195d12dc0..d196aa1753de 100644
--- a/qsfp.c
+++ b/qsfp.c
@@ -671,7 +671,7 @@ static void sff8636_dom_parse(const __u8 *id, struct sff_diags *sd)
 
 static void sff8636_show_dom(const __u8 *id, __u32 eeprom_len)
 {
-	struct sff_diags sd;
+	struct sff_diags sd = {0};
 	char *rx_power_string = NULL;
 	char power_string[MAX_DESC_SIZE];
 	int i;

I will soon post a fix for it.

Thanks,
Eran

>
> The memcpy in this function then copies cable_info->data to the buffer that gets
> returned to ethtool, but it copies size bytes (256), even though the source data
> buffer is only 48 bytes long. That 48 byte array is embedded in the larger 192
> byte structure, so there won't be a panic on the overrun, but there's no telling
> what garbage is in the buffer beyond those first 48 bytes. Even if the
> remaining 144 bytes have valid eeprom data, it's less than the required 256
> bytes. The additional copy may cause a panic, but if the buffer commonly bumps
> up against other allocated memory, that will go unnoticed.
>
> After the memcpy, mlx4_get_module_info just returns the size of the passed in
> buffer (256), and so the calling function thinks its work is done, and lets the
> kernel send back the buffer with garbage data to ethtool.
>
> I think the mlx4 guys have some work to do here.
>
> My $0.02
> Neil
>
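A stripped-down model shows why the one-line initializer matters. It is a sketch, not ethtool's code: `struct diags_sketch`, `parse_sketch` and `show_sketch` are hypothetical, and the assumptions that thresholds and the alarm-support flag are only filled in when the dump reaches 640 bytes, with the first threshold word at flat offset 512, are made for illustration. A parse step that conditionally skips fields leaves them holding whatever the struct started with. A 256-byte dump from the driver therefore needs a zeroed struct to report "no alarms" reliably.

```c
#include <stdint.h>

/* Assumed flat offset of the first threshold word (page 03h, byte 128)
 * and assumed full dump length; both are illustrative only. */
#define THRESH_OFFSET 512
#define FULL_DUMP_LEN 640

/* Hypothetical, much-simplified stand-in for ethtool's struct sff_diags. */
struct diags_sketch {
    int supports_alarms;      /* 1 only if the threshold page was dumped */
    uint16_t temp_high_alarm; /* raw big-endian threshold word */
};

/* Only writes the fields when the data is actually present, so anything it
 * skips keeps the value the caller initialized it with. */
static void parse_sketch(struct diags_sketch *sd, const uint8_t *id,
                         uint32_t eeprom_len)
{
    if (eeprom_len >= FULL_DUMP_LEN) {
        sd->supports_alarms = 1;
        sd->temp_high_alarm =
            (uint16_t)((id[THRESH_OFFSET] << 8) | id[THRESH_OFFSET + 1]);
    }
}

static struct diags_sketch show_sketch(const uint8_t *id, uint32_t eeprom_len)
{
    /* The fix: without "= {0}" these members would hold stack garbage
     * whenever parse_sketch() skips them, e.g. for a 256-byte dump. */
    struct diags_sketch sd = {0};

    parse_sketch(&sd, id, eeprom_len);
    return sd;
}
```

With a 256-byte dump the result is `supports_alarms == 0` and a zero threshold. With a 640-byte dump whose bytes 512-513 are `4b 00`, the raw word is 0x4b00, which at 1/256 degree C per bit is the 75 C high-alarm threshold that mlxcables reports later in the thread.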
* Re: bug: 'ethtool -m' reports spurious alarm & warning threshold values for QSFP28 transceivers
2018-09-27 13:25 ` Eran Ben Elisha
@ 2018-09-27 14:52 ` Andrew Lunn
2018-09-27 15:20 ` Eran Ben Elisha
0 siblings, 1 reply; 16+ messages in thread
From: Andrew Lunn @ 2018-09-27 14:52 UTC
To: Eran Ben Elisha
Cc: Neil Horman, Chris Preimesberger, linville@tuxdriver.com, netdev@vger.kernel.org

> Both drivers read up to 256 bytes. 0-127 (from page 0). and 128-256 (from
> page 0). Driver is not capable of reading over 256 bytes currently.

Hi Erin

There should not be any need to read more than 256 bytes. For older
SFP devices, two addresses on the i2c bus are used, each with 256
bytes. For QSFP, one address is used, and you swap page by writing to
offset 127.

> looking on qsfp.c parser in ethtool.c (user space), I see an uninitialized
> bug issue that have caused bug #1 + #2.
> Applied it locally solved the issue (Not showing alarm data, which should be
> expected as driver do not fill it).

There appears to be a second bug somewhere. Dumping the module info
using HEX returned 256 bytes, but the binary dump had more bytes.
Since you have the hardware, could you look into this?

Thanks
	Andrew
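Andrew's description of QSFP paging can be modelled in a few lines of C: a single I2C address whose upper 128 bytes are switched by writing the page number to byte 127. This is a toy model, not a driver. `struct qsfp_model` and its helpers are hypothetical. The flat dump layout (lower page first, then upper pages 00h, 01h, ... appended 128 bytes at a time) is an assumption, chosen because it produces exactly the 256, 384, 512 and 640-byte sizes Andrew lists earlier in the thread.

```c
#include <stdint.h>

#define LOWER_LEN        128
#define UPPER_LEN        128
#define NUM_UPPER_PAGES  4   /* pages 00h..03h: enough for a 640-byte dump */
#define PAGE_SELECT_BYTE 127

/* Hypothetical module behind one I2C address: bytes 0..127 are always the
 * lower page; bytes 128..255 show whichever upper page is selected. */
struct qsfp_model {
    uint8_t lower[LOWER_LEN];
    uint8_t upper[NUM_UPPER_PAGES][UPPER_LEN];
};

/* Writing the page number to byte 127 selects the upper page. */
static int qsfp_select_page(struct qsfp_model *m, uint8_t page)
{
    if (page >= NUM_UPPER_PAGES)
        return -1; /* page not implemented by this model */
    m->lower[PAGE_SELECT_BYTE] = page;
    return 0;
}

static uint8_t qsfp_read(const struct qsfp_model *m, uint8_t addr)
{
    if (addr < LOWER_LEN)
        return m->lower[addr];
    return m->upper[m->lower[PAGE_SELECT_BYTE]][addr - LOWER_LEN];
}

/* Flat dump: lower page plus upper pages 0..npages-1, i.e. 128 + 128 * npages
 * bytes. The page select byte is restored to 0 afterwards. */
static uint32_t qsfp_flat_dump(struct qsfp_model *m, unsigned npages,
                               uint8_t *out)
{
    uint32_t pos = 0;
    unsigned a, p;

    for (a = 0; a < LOWER_LEN; a++)
        out[pos++] = qsfp_read(m, (uint8_t)a);
    for (p = 0; p < npages && p < NUM_UPPER_PAGES; p++) {
        qsfp_select_page(m, (uint8_t)p);
        for (a = LOWER_LEN; a < LOWER_LEN + UPPER_LEN; a++)
            out[pos++] = qsfp_read(m, (uint8_t)a);
    }
    qsfp_select_page(m, 0);
    return pos;
}
```

Under this layout, byte 128 of page 03h lands at flat offset 3 * 128 + 128 = 512. A driver that only ever reads page 00h, as Eran describes, cannot supply anything from that page.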
* Re: bug: 'ethtool -m' reports spurious alarm & warning threshold values for QSFP28 transceivers
2018-09-27 14:52 ` Andrew Lunn
@ 2018-09-27 15:20 ` Eran Ben Elisha
2018-09-27 15:32 ` Andrew Lunn
0 siblings, 1 reply; 16+ messages in thread
From: Eran Ben Elisha @ 2018-09-27 15:20 UTC
To: Andrew Lunn
Cc: Neil Horman, Chris Preimesberger, linville@tuxdriver.com, netdev@vger.kernel.org

On 9/27/2018 5:52 PM, Andrew Lunn wrote:
>> Both drivers read up to 256 bytes. 0-127 (from page 0). and 128-256 (from
>> page 0). Driver is not capable of reading over 256 bytes currently.
>
> Hi Erin
>
> There should not be any need to read more than 256 bytes. For older
> SFP devices, two addresses on the i2c bus are used, each with 256
> bytes. For QSFP, one address is used, and you swap page by writing to
> offset 127.
>
>> looking on qsfp.c parser in ethtool.c (user space), I see an uninitialized
>> bug issue that have caused bug #1 + #2.
>> Applied it locally solved the issue (Not showing alarm data, which should be
>> expected as driver do not fill it).
>
> There appears to be a second bug somewhere. dumping the module info
> using HEX returned 256 bytes. But the binary dump had more bytes.
> Since you have the hardware, could you look into this?

See the fix I posted a few minutes ago:
title: [PATCH ethtool] ethtool: Fix uninitialized variable use at qsfp dump

This is the hex dump, similar with and without the fix:

Offset   Values
------   ------
0x0000:  11 07 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0010:  00 00 00 00 00 00 2a 2a 00 00 7f 0b 00 00 00 00
0x0020:  00 00 38 b6 3e 50 2b e9 40 0d 47 0d 47 ac 48 58
0x0030:  49 0f 3a 09 36 77 39 c9 3a 6a 00 00 00 00 00 00
0x0040:  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0050:  00 00 00 00 00 00 00 aa aa 00 00 00 00 01 00 00
0x0060:  00 00 ff 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0070:  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0080:  11 cc 07 80 00 00 00 00 00 00 00 05 ff 00 0a 00
0x0090:  00 00 00 44 4d 65 6c 6c 61 6e 6f 78 20 20 20 20
0x00a0:  20 20 20 20 00 00 02 c9 4d 4d 41 31 4c 31 30 2d
0x00b0:  43 52 20 20 20 20 20 20 41 31 65 bf 00 ce 00 60
0x00c0:  03 07 ff de 4d 54 31 36 33 39 44 4d 30 30 30 32
0x00d0:  36 20 20 20 31 36 30 39 32 36 20 20 0c 10 68 40
0x00e0:  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x00f0:  00 00 00 00 00 00 00 00 00 00 00 00 14 31 00 00

This is the parsed output before the fix:

Identifier : 0x11 (QSFP28)
Extended identifier : 0xcc
Extended identifier description : 3.5W max. Power consumption
Extended identifier description : CDR present in TX, CDR present in RX
Extended identifier description : High Power Class (> 3.5 W) not enabled
Connector : 0x07 (LC)
Transceiver codes : 0x80 0x00 0x00 0x00 0x00 0x00 0x00 0x00
Transceiver type : 100G Ethernet: 100G Base-LR4
Encoding : 0x05 (64B/66B)
BR, Nominal : 25500Mbps
Rate identifier : 0x00
Length (SMF,km) : 10km
Length (OM3 50um) : 0m
Length (OM2 50um) : 0m
Length (OM1 62.5um) : 0m
Length (Copper or Active cable) : 0m
Transmitter technology : 0x40 (1310 nm DFB)
Laser wavelength : 1302.350nm
Laser wavelength tolerance : 1.030nm
Vendor name : Mellanox
Vendor OUI : 00:02:c9
Vendor PN : MMA1L10-CR
Vendor rev : A1
Vendor SN : MT1639DM00026
Date code : 160926
Revision Compliance : SFF-8636 Rev 2.5/2.6/2.7
Module temperature : 42.16 degrees C / 107.90 degrees F
Module voltage : 3.2523 V
Alarm/warning flags implemented : Yes
Laser tx bias current (Channel 1) : 36.454 mA
Laser tx bias current (Channel 2) : 36.696 mA
Laser tx bias current (Channel 3) : 37.006 mA
Laser tx bias current (Channel 4) : 37.404 mA
Transmit avg optical power (Channel 1) : 1.4812 mW / 1.71 dBm
Transmit avg optical power (Channel 2) : 1.3942 mW / 1.44 dBm
Transmit avg optical power (Channel 3) : 1.4793 mW / 1.70 dBm
Transmit avg optical power (Channel 4) : 1.4949 mW / 1.75 dBm
Rcvr signal avg optical power(Channel 1) : 1.4489 mW / 1.61 dBm
Rcvr signal avg optical power(Channel 2) : 1.5911 mW / 2.02 dBm
Rcvr signal avg optical power(Channel 3) : 1.1196 mW / 0.49 dBm
Rcvr signal avg optical power(Channel 4) : 1.6397 mW / 2.15 dBm
Laser bias current high alarm (Chan 1) : Off
Laser bias current low alarm (Chan 1) : Off
Laser bias current high warning (Chan 1) : Off
Laser bias current low warning (Chan 1) : Off
Laser bias current high alarm (Chan 2) : Off
Laser bias current low alarm (Chan 2) : Off
Laser bias current high warning (Chan 2) : Off
Laser bias current low warning (Chan 2) : Off
Laser bias current high alarm (Chan 3) : Off
Laser bias current low alarm (Chan 3) : Off
Laser bias current high warning (Chan 3) : Off
Laser bias current low warning (Chan 3) : Off
Laser bias current high alarm (Chan 4) : Off
Laser bias current low alarm (Chan 4) : Off
Laser bias current high warning (Chan 4) : Off
Laser bias current low warning (Chan 4) : Off
Module temperature high alarm : Off
Module temperature low alarm : Off
Module temperature high warning : Off
Module temperature low warning : Off
Module voltage high alarm : Off
Module voltage low alarm : Off
Module voltage high warning : Off
Module voltage low warning : Off
Laser tx power high alarm (Channel 1) : Off
Laser tx power low alarm (Channel 1) : Off
Laser tx power high warning (Channel 1) : Off
Laser tx power low warning (Channel 1) : Off
Laser tx power high alarm (Channel 2) : Off
Laser tx power low alarm (Channel 2) : Off
Laser tx power high warning (Channel 2) : Off
Laser tx power low warning (Channel 2) : Off
Laser tx power high alarm (Channel 3) : Off
Laser tx power low alarm (Channel 3) : Off
Laser tx power high warning (Channel 3) : Off
Laser tx power low warning (Channel 3) : Off
Laser tx power high alarm (Channel 4) : Off
Laser tx power low alarm (Channel 4) : Off
Laser tx power high warning (Channel 4) : Off
Laser tx power low warning (Channel 4) : Off
Laser rx power high alarm (Channel 1) : Off
Laser rx power low alarm (Channel 1) : Off
Laser rx power high warning (Channel 1) : Off
Laser rx power low warning (Channel 1) : Off
Laser rx power high alarm (Channel 2) : Off
Laser rx power low alarm (Channel 2) : Off
Laser rx power high warning (Channel 2) : Off
Laser rx power low warning (Channel 2) : Off
Laser rx power high alarm (Channel 3) : Off
Laser rx power low alarm (Channel 3) : Off
Laser rx power high warning (Channel 3) : Off
Laser rx power low warning (Channel 3) : Off
Laser rx power high alarm (Channel 4) : Off
Laser rx power low alarm (Channel 4) : Off
Laser rx power high warning (Channel 4) : Off
Laser rx power low warning (Channel 4) : Off
Laser bias current high alarm threshold : 0.000 mA
Laser bias current low alarm threshold : 0.000 mA
Laser bias current high warning threshold : 0.000 mA
Laser bias current low warning threshold : 0.000 mA
Laser output power high alarm threshold : 0.0000 mW / -inf dBm
Laser output power low alarm threshold : 0.0000 mW / -inf dBm
Laser output power high warning threshold : 0.0000 mW / -inf dBm
Laser output power low warning threshold : 0.0000 mW / -inf dBm
Module temperature high alarm threshold : 0.00 degrees C / 32.00 degrees F
Module temperature low alarm threshold : 0.00 degrees C / 32.00 degrees F
Module temperature high warning threshold : 0.00 degrees C / 32.00 degrees F
Module temperature low warning threshold : 0.00 degrees C / 32.00 degrees F
Module voltage high alarm threshold : 0.0000 V
Module voltage low alarm threshold : 0.0000 V
Module voltage high warning threshold : 0.0000 V
Module voltage low warning threshold : 0.0000 V
Laser rx power high alarm threshold : 0.0000 mW / -inf dBm
Laser rx power low alarm threshold : 0.0000 mW / -inf dBm
Laser rx power high warning threshold : 0.0000 mW / -inf dBm
Laser rx power low warning threshold : 0.0000 mW / -inf dBm

This is the parsed output after the fix:

Identifier : 0x11 (QSFP28)
Extended identifier : 0xcc
Extended identifier description : 3.5W max. Power consumption
Extended identifier description : CDR present in TX, CDR present in RX
Extended identifier description : High Power Class (> 3.5 W) not enabled
Connector : 0x07 (LC)
Transceiver codes : 0x80 0x00 0x00 0x00 0x00 0x00 0x00 0x00
Transceiver type : 100G Ethernet: 100G Base-LR4
Encoding : 0x05 (64B/66B)
BR, Nominal : 25500Mbps
Rate identifier : 0x00
Length (SMF,km) : 10km
Length (OM3 50um) : 0m
Length (OM2 50um) : 0m
Length (OM1 62.5um) : 0m
Length (Copper or Active cable) : 0m
Transmitter technology : 0x40 (1310 nm DFB)
Laser wavelength : 1302.350nm
Laser wavelength tolerance : 1.030nm
Vendor name : Mellanox
Vendor OUI : 00:02:c9
Vendor PN : MMA1L10-CR
Vendor rev : A1
Vendor SN : MT1639DM00026
Date code : 160926
Revision Compliance : SFF-8636 Rev 2.5/2.6/2.7
Module temperature : 42.16 degrees C / 107.90 degrees F
Module voltage : 3.2523 V
Alarm/warning flags implemented : No
Laser tx bias current (Channel 1) : 36.462 mA
Laser tx bias current (Channel 2) : 36.668 mA
Laser tx bias current (Channel 3) : 37.000 mA
Laser tx bias current (Channel 4) : 37.416 mA
Transmit avg optical power (Channel 1) : 1.4812 mW / 1.71 dBm
Transmit avg optical power (Channel 2) : 1.3940 mW / 1.44 dBm
Transmit avg optical power (Channel 3) : 1.4829 mW / 1.71 dBm
Transmit avg optical power (Channel 4) : 1.4866 mW / 1.72 dBm
Rcvr signal avg optical power(Channel 1) : 1.4518 mW / 1.62 dBm
Rcvr signal avg optical power(Channel 2) : 1.5938 mW / 2.02 dBm
Rcvr signal avg optical power(Channel 3) : 1.1211 mW / 0.50 dBm
Rcvr signal avg optical power(Channel 4) : 1.6378 mW / 2.14 dBm

Major differences:
* Alarm/warning flags implemented : No
* No alarm data is presented.

The driver returns 256 bytes (it reads them correctly; I verified it, no
overruns). However, the extra fields were presented because of this bug
(they are only meaningful when there are 640 bytes to parse).

Do you see another bug here? Am I missing something?

Eran

>
> Thanks
> 	Andrew
>
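The live readings in the parsed output can be recovered by hand from the hex dump in Eran's message. The sketch below assumes the SFF-8636 lower-page layout, with all values big-endian:

* module temperature at bytes 22-23, signed, 1/256 degree C per bit
* supply voltage at bytes 26-27, 100 uV per bit
* channel 1 RX power at bytes 34-35, 0.1 uW per bit

The helper names are hypothetical. Fed the bytes from rows 0x0010 and 0x0020 (`2a 2a`, `7f 0b`, `38 b6`), it gives 42.16 C, 3.2523 V and 1.4518 mW, which match the values ethtool printed after the fix.

```c
#include <stdint.h>

/* Assumed SFF-8636 lower-page offsets for live monitor values. */
#define TEMP_OFFSET 22 /* signed, 1/256 degree C per LSB */
#define VCC_OFFSET  26 /* unsigned, 100 uV per LSB */
#define RX1_OFFSET  34 /* unsigned, 0.1 uW per LSB, channel 1 */

/* Read a big-endian 16-bit word. */
static uint16_t be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

static double temp_celsius(const uint8_t *id)
{
    return (int16_t)be16(id + TEMP_OFFSET) / 256.0;
}

static double vcc_volts(const uint8_t *id)
{
    return be16(id + VCC_OFFSET) * 0.0001; /* 100 uV = 0.0001 V */
}

static double rx1_milliwatts(const uint8_t *id)
{
    return be16(id + RX1_OFFSET) * 0.0001; /* 0.1 uW = 0.0001 mW */
}
```

For example, 0x2a2a is 10794, and 10794 / 256 = 42.164 C. In Fahrenheit that is 42.164 * 1.8 + 32 = 107.90 F, the second figure on the temperature line.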
* Re: bug: 'ethtool -m' reports spurious alarm & warning threshold values for QSFP28 transceivers
2018-09-27 15:20 ` Eran Ben Elisha
@ 2018-09-27 15:32 ` Andrew Lunn
2018-09-27 16:08 ` Chris Preimesberger
2018-10-02 7:10 ` Eran Ben Elisha
0 siblings, 2 replies; 16+ messages in thread
From: Andrew Lunn @ 2018-09-27 15:32 UTC
To: Eran Ben Elisha
Cc: Neil Horman, Chris Preimesberger, linville@tuxdriver.com, netdev@vger.kernel.org

> Driver return 256 bytes (reading it correctly, I verified it, no overruns),
> however the extra bytes are presented due to this bug (expecting to parse
> 640 bytes).
>
> Do you see another bug here? Am I missing something?

Hi Erin

Please could you try ethtool -m raw on so you get a binary dump. The
file which Chris provided had more bytes in it than 256.

Thanks
	Andrew
* RE: bug: 'ethtool -m' reports spurious alarm & warning threshold values for QSFP28 transceivers
2018-09-27 15:32 ` Andrew Lunn
@ 2018-09-27 16:08 ` Chris Preimesberger
2018-09-27 16:38 ` Andrew Lunn
2018-10-02 7:10 ` Eran Ben Elisha
1 sibling, 1 reply; 16+ messages in thread
From: Chris Preimesberger @ 2018-09-27 16:08 UTC
To: Andrew Lunn, Eran Ben Elisha
Cc: Neil Horman, linville@tuxdriver.com, netdev@vger.kernel.org

Please correct me if I'm wrong, but...
It looks like Eran's proposed fix would remove all warning and alarm
indications from ethtool's output. It's worth mentioning that for me, the
following fields always reported correctly: Off while no alarm condition was
present, and On while alarm condition(s) were present, *per the QSFP's
true/programmed threshold values*, *not per the incorrectly reported
threshold values*:

Laser bias current high alarm (Chan 1) : Off
Laser bias current low alarm (Chan 1) : Off
Laser bias current high warning (Chan 1) : Off
Laser bias current low warning (Chan 1) : Off
Laser bias current high alarm (Chan 2) : Off
Laser bias current low alarm (Chan 2) : Off
Laser bias current high warning (Chan 2) : Off
Laser bias current low warning (Chan 2) : Off
Laser bias current high alarm (Chan 3) : Off
Laser bias current low alarm (Chan 3) : Off
Laser bias current high warning (Chan 3) : Off
Laser bias current low warning (Chan 3) : Off
Laser bias current high alarm (Chan 4) : Off
Laser bias current low alarm (Chan 4) : Off
Laser bias current high warning (Chan 4) : Off
Laser bias current low warning (Chan 4) : Off
Module temperature high alarm : Off
Module temperature low alarm : Off
Module temperature high warning : Off
Module temperature low warning : Off
Module voltage high alarm : Off
Module voltage low alarm : Off
Module voltage high warning : Off
Module voltage low warning : Off
Laser tx power high alarm (Channel 1) : Off
Laser tx power low alarm (Channel 1) : Off
Laser tx power high warning (Channel 1) : Off
Laser tx power low warning (Channel 1) : Off
Laser tx power high alarm (Channel 2) : Off
Laser tx power low alarm (Channel 2) : Off
Laser tx power high warning (Channel 2) : Off
Laser tx power low warning (Channel 2) : Off
Laser tx power high alarm (Channel 3) : Off
Laser tx power low alarm (Channel 3) : Off
Laser tx power high warning (Channel 3) : Off
Laser tx power low warning (Channel 3) : Off
Laser tx power high alarm (Channel 4) : Off
Laser tx power low alarm (Channel 4) : Off
Laser tx power high warning (Channel 4) : Off
Laser tx power low warning (Channel 4) : Off
Laser rx power high alarm (Channel 1) : Off
Laser rx power low alarm (Channel 1) : Off
Laser rx power high warning (Channel 1) : Off
Laser rx power low warning (Channel 1) : Off
Laser rx power high alarm (Channel 2) : Off
Laser rx power low alarm (Channel 2) : Off
Laser rx power high warning (Channel 2) : Off
Laser rx power low warning (Channel 2) : Off
Laser rx power high alarm (Channel 3) : Off
Laser rx power low alarm (Channel 3) : Off
Laser rx power high warning (Channel 3) : Off
Laser rx power low warning (Channel 3) : Off
Laser rx power high alarm (Channel 4) : Off
Laser rx power low alarm (Channel 4) : Off
Laser rx power high warning (Channel 4) : Off
Laser rx power low warning (Channel 4) : Off

I would like to request that any fix keep the above information in the
ethtool -m output, because it is working and valuable. The only values that
report incorrectly can be seen by issuing the command:

ethtool -m interfaceXXX | grep threshold

Ideally, any fix would display the thresholds correctly instead of omitting
them.

Thank you and best regards,
Chris
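Chris's observation is consistent with the explanation Andrew gives in his reply: the alarm and warning flags are bits in the lower page, which the driver does return, while the thresholds sit on a later page that it does not. A minimal decoder for one flag group follows. It assumes the SFF-8636 convention that a flag nibble carries high alarm, low alarm, high warning and low warning in bits 3 to 0, for example the upper nibble of byte 6 for module temperature. `struct aw_flags`, `decode_aw_nibble` and `on_off` are hypothetical names. Bytes 6 and 7 in Eran's hex dump are 00, which decodes to all Off, matching the printed output.

```c
#include <stdint.h>

/* One alarm/warning group, as ethtool prints it (On/Off). */
struct aw_flags {
    int high_alarm;
    int low_alarm;
    int high_warning;
    int low_warning;
};

/* Assumed SFF-8636 convention: within a nibble, bit 3 = high alarm,
 * bit 2 = low alarm, bit 1 = high warning, bit 0 = low warning.
 * shift = 4 selects the upper nibble (e.g. temperature in byte 6, or the
 * first channel of a two-channel byte); shift = 0 selects the lower one. */
static struct aw_flags decode_aw_nibble(uint8_t byte, unsigned shift)
{
    uint8_t n = (uint8_t)((byte >> shift) & 0x0f);
    struct aw_flags f = {
        .high_alarm   = (n >> 3) & 1,
        .low_alarm    = (n >> 2) & 1,
        .high_warning = (n >> 1) & 1,
        .low_warning  = n & 1,
    };
    return f;
}

static const char *on_off(int bit)
{
    return bit ? "On" : "Off";
}
```

Because these bits come from the lower page, they decode correctly from a 256-byte dump. What is missing for ethtool is the separate evidence, from the threshold page, that the module implements them at all.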
* Re: bug: 'ethtool -m' reports spurious alarm & warning threshold values for QSFP28 transceivers
2018-09-27 16:08 ` Chris Preimesberger
@ 2018-09-27 16:38 ` Andrew Lunn
2018-09-27 18:56 ` Chris Preimesberger
0 siblings, 1 reply; 16+ messages in thread
From: Andrew Lunn @ 2018-09-27 16:38 UTC
To: Chris Preimesberger
Cc: Eran Ben Elisha, Neil Horman, linville@tuxdriver.com, netdev@vger.kernel.org

On Thu, Sep 27, 2018 at 04:08:24PM +0000, Chris Preimesberger wrote:
> Please correct me if I'm wrong, but...
> It looks like Eran's proposed fix would remove all warning and
> alarm indications from ethtool's output. It's worth mentioning
> that for me, the following fields always reported correctly
> as Off while no alarm condition was present
> and On while alarm condition(s) were present
> *per the QSFP's true/programmed threshold values*
> *not per the incorrectly reported threshold values*

These alarm values are in the first page. So the information the
driver returns does contain this information. What is missing is the
thresholds, which are not provided by the driver. But there is a
comment in the code:

	/*
	 * There is no clear identifier to signify the existence of
	 * optical diagnostics similar to SFF-8472. So checking existence
	 * of page 3, will provide the gurantee for existence of alarms
	 * and thresholds
	 * If pagging support exists, then supports_alarms is marked as 1
	 */

These alarm values are optional. The spec says so. So in order to
decide if they are implemented, ethtool looks to see if the thresholds
are available. If there are thresholds, it makes sense the alarms are
implemented. Unfortunately, the driver never returns the
thresholds. So ethtool has no real choice and won't display the alarms
since it cannot determine if they are valid.

In order to get alarms, the driver needs to be extended to return all
the pages.

	Andrew
* RE: bug: 'ethtool -m' reports spurious alarm & warning threshold values for QSFP28 transceivers
2018-09-27 16:38 ` Andrew Lunn
@ 2018-09-27 18:56 ` Chris Preimesberger
2018-09-27 20:17 ` Chris Preimesberger
0 siblings, 1 reply; 16+ messages in thread
From: Chris Preimesberger @ 2018-09-27 18:56 UTC
To: 'Andrew Lunn', Eran Ben Elisha
Cc: Neil Horman, linville@tuxdriver.com, netdev@vger.kernel.org

I greatly appreciate everyone's work on this. Thank you to all.

I've had Mellanox support case # 00508027 open for this issue, and just now
requested an updated driver from them to resolve it, explaining that really
smart ethtool developers figured out this was due to the Mellanox driver not
reporting thresholds to ethtool. I intend to post back here for posterity
if/when I get an updated driver that fixes the issue.

Thanks again!!

Chris Preimesberger | Test & Validation Engineer
Transition Networks, Inc.
chrisp@transition.com
direct: +1.952.996.1509 | fax: +1.952.941.2322 | www.transition.com

-----Original Message-----
From: Andrew Lunn [mailto:andrew@lunn.ch]
Sent: Thursday, September 27, 2018 11:38 AM
To: Chris Preimesberger
Cc: Eran Ben Elisha; Neil Horman; linville@tuxdriver.com; netdev@vger.kernel.org
Subject: Re: bug: 'ethtool -m' reports spurious alarm & warning threshold values for QSFP28 transceivers

On Thu, Sep 27, 2018 at 04:08:24PM +0000, Chris Preimesberger wrote:
> Please correct me if I'm wrong, but...
> It looks like Eran's proposed fix would remove all warning and alarm
> indications from ethtool's output. It's worth mentioning that for me,
> the following fields always reported correctly as Off while no alarm
> condition was present and On while alarm condition(s) were present
> *per the QSFP's true/programmed threshold values* *not per the
> incorrectly reported threshold values*

These alarm values are in the first page. So the information the
driver returns does contain this information.
What is missing is the thresholds, which are not provided by the
driver. But there is a comment in the code:

	/*
	 * There is no clear identifier to signify the existence of
	 * optical diagnostics similar to SFF-8472. So checking existence
	 * of page 3, will provide the gurantee for existence of alarms
	 * and thresholds
	 * If pagging support exists, then supports_alarms is marked as 1
	 */

These alarm values are optional. The spec says so. So in order to
decide if they are implemented, ethtool looks to see if the thresholds
are available. If there are thresholds, it makes sense the alarms are
implemented. Unfortunately, the driver never returns the thresholds.
So ethtool has no real choice and won't display the alarms since it
cannot determine if they are valid.

In order to get alarms, the driver needs to be extended to return all
the pages.

	Andrew
* RE: bug: 'ethtool -m' reports spurious alarm & warning threshold values for QSFP28 transceivers
2018-09-27 18:56 ` Chris Preimesberger
@ 2018-09-27 20:17 ` Chris Preimesberger
0 siblings, 0 replies; 16+ messages in thread
From: Chris Preimesberger @ 2018-09-27 20:17 UTC
To: 'Andrew Lunn', 'Eran Ben Elisha'
Cc: 'Neil Horman', 'linville@tuxdriver.com', 'netdev@vger.kernel.org'

Update for posterity:

Mellanox support provided a work-around: use mlxcables instead of ethtool to
read alarm/warning info for an installed transceiver. I was told that a couple
of their engineers are currently looking into the discrepancy between the
thresholds reported by mlxcables and by ethtool, and that they are deciding
what to do about it...

Work-around steps:
1. Add the cable with "sudo mst cable add".
2. Find the cable name with "sudo mlxcables". The name of my cable is
   01:00.0_cable_0, so I copy that name into the next command.
3. Probe the cable for DDM with "sudo mlxcables -d 01:00.0_cable_0 --DDM".

Example copied/pasted from my CLI below. All reported thresholds appear to be
correct.

tech1@D7:~$ sudo mst cable add
-I- Added 1 cable devices ..
tech1@D7:~$ sudo mlxcables
Querying Cables ....

Cable #1:
---------
Cable name : 01:00.0_cable_0
>> No FW data to show
-------- Cable EEPROM --------
Identifier : QSFP28 (11h)
Technology : 850 nm VCSEL (00h)
Compliance : Extended Specification Compliance is valid, 100GBASE-SR4 or 25GBASE-SR
Wavelength : 850 nm
OUI : 0x00c0f2
Vendor : TRANSITION
Serial number : TN02000263
Part number : TN-QSFP-100G-SR4
Revision : 02
Temperature : 34 C
Length : 50 m
tech1@D7:~$ sudo mlxcables -d 01:00.0_cable_0 --DDM
Cable DDM:
----------
Temperature : 34C
Voltage : 3.2918V
Channel 1:
RX Power : 0.1695dBm
TX Power : 0.8622dBm
TX Bias : 7.0720mA
Channel 2:
RX Power : 0.1355dBm
TX Power : 1.1042dBm
TX Bias : 6.9240mA
Channel 3:
RX Power : -0.1592dBm
TX Power : 0.6547dBm
TX Bias : 6.9420mA
Channel 4:
RX Power : -0.1300dBm
TX Power : 0.4653dBm
TX Bias : 6.9120mA
----- Thresholds -----
Temperature:
High Warning : 70C
Low Warning : 0C
High Alarm : 75C
Low Alarm : -5C
Warning mask : 0
Alarm mask : 0
Voltage:
High Warning : 3.4600V
Low Warning : 3.1300V
High Alarm : 3.6300V
Low Alarm : 2.9700V
Warning mask : 0
Alarm mask : 0
Channel 1:
RX Power high warn : 2.4000dBm
RX Power low warn : -9.5001dBm
RX Power high alarm : 5.4103dBm
RX Power low alarm : -12.5104dBm
RX Power Warning mask: 0
RX Power Alarm mask : 0
TX Power high warn : 2.4000dBm
TX Power low warn : -7.6020dBm
TX Power high alarm : 3.1917dBm
TX Power low alarm : -8.5699dBm
TX Power Warning mask: 0
TX Power Alarm mask : 0
TX Bias high warn : 12.0000mA
TX Bias low warn : 2.0000mA
TX Bias high alarm : 15.0000mA
TX Bias low alarm : 1.0000mA
TX Bias Warning mask : 0
TX Bias Alarm mask : 0
Channel 2:
RX Power high warn : 2.4000dBm
RX Power low warn : -9.5001dBm
RX Power high alarm : 5.4103dBm
RX Power low alarm : -12.5104dBm
RX Power Warning mask: 0
RX Power Alarm mask : 0
TX Power high warn : 2.4000dBm
TX Power low warn : -7.6020dBm
TX Power high alarm : 3.1917dBm
TX Power low alarm : -8.5699dBm
TX Power Warning mask: 0
TX Power Alarm mask : 0
TX Bias high warn : 12.0000mA
TX Bias low warn : 2.0000mA
TX Bias high alarm : 15.0000mA
TX Bias low alarm : 1.0000mA
TX Bias Warning mask : 0
TX Bias Alarm mask : 0
Channel 3:
RX Power high warn : 2.4000dBm
RX Power low warn : -9.5001dBm
RX Power high alarm : 5.4103dBm
RX Power low alarm : -12.5104dBm
RX Power Warning mask: 0
RX Power Alarm mask : 0
TX Power high warn : 2.4000dBm
TX Power low warn : -7.6020dBm
TX Power high alarm : 3.1917dBm
TX Power low alarm : -8.5699dBm
TX Power Warning mask: 0
TX Power Alarm mask : 0
TX Bias high warn : 12.0000mA
TX Bias low warn : 2.0000mA
TX Bias high alarm : 15.0000mA
TX Bias low alarm : 1.0000mA
TX Bias Warning mask : 0
TX Bias Alarm mask : 0
Channel 4:
RX Power high warn : 2.4000dBm
RX Power low warn : -9.5001dBm
RX Power high alarm : 5.4103dBm
RX Power low alarm : -12.5104dBm
RX Power Warning mask: 0
RX Power Alarm mask : 0
TX Power high warn : 2.4000dBm
TX Power low warn : -7.6020dBm
TX Power high alarm : 3.1917dBm
TX Power low alarm : -8.5699dBm
TX Power Warning mask: 0
TX Power Alarm mask : 0
TX Bias high warn : 12.0000mA
TX Bias low warn : 2.0000mA
TX Bias high alarm : 15.0000mA
TX Bias low alarm : 1.0000mA
TX Bias Warning mask : 0
TX Bias Alarm mask : 0

Chris Preimesberger
* Re: bug: 'ethtool -m' reports spurious alarm & warning threshold values for QSFP28 transceivers
2018-09-27 15:32 ` Andrew Lunn
2018-09-27 16:08 ` Chris Preimesberger
@ 2018-10-02 7:10 ` Eran Ben Elisha
1 sibling, 0 replies; 16+ messages in thread
From: Eran Ben Elisha @ 2018-10-02 7:10 UTC
To: Andrew Lunn
Cc: Eran Ben Elisha, nhorman, chrisp, John W. Linville, Linux Netdev List

On Thu, Sep 27, 2018 at 6:34 PM Andrew Lunn <andrew@lunn.ch> wrote:
>
> > Driver return 256 bytes (reading it correctly, I verified it, no overruns),
> > however the extra bytes are presented due to this bug (expecting to parse
> > 640 bytes).
> >
> > Do you see another bug here? Am I missing something?
>
> Hi Erin

Eran...

>
> Please could you try ethtool -m raw on so you get a binary dump. The
> file which Chris provided had more bytes in it than 256.

I ran '-m raw on' on a QSFP28. The file size is 256 bytes (with and without
my suggested patch...).

Eran

>
> Thanks
>     Andrew
Thread overview: 16+ messages
2018-09-26 19:29 bug: 'ethtool -m' reports spurious alarm & warning threshold values for QSFP28 transceivers Chris Preimesberger
2018-09-26 19:44 ` Andrew Lunn
2018-09-26 20:47 ` Chris Preimesberger
2018-09-26 21:46 ` Andrew Lunn
2018-09-26 21:34 ` Neil Horman
2018-09-26 21:58 ` Andrew Lunn
2018-09-27 13:23 ` Neil Horman
2018-09-27 13:25 ` Eran Ben Elisha
2018-09-27 14:52 ` Andrew Lunn
2018-09-27 15:20 ` Eran Ben Elisha
2018-09-27 15:32 ` Andrew Lunn
2018-09-27 16:08 ` Chris Preimesberger
2018-09-27 16:38 ` Andrew Lunn
2018-09-27 18:56 ` Chris Preimesberger
2018-09-27 20:17 ` Chris Preimesberger
2018-10-02 7:10 ` Eran Ben Elisha