* "netlink error: Invalid argument" with ethtool-5.13+ on recent kernels due to "ethtool: Add netlink handler for getmodule (-m)" - 25b64c66f58d3df0ad7272dda91c3ab06fe7a303, also no SFP-DOM support via netlink?
@ 2024-05-21 6:26 Krzysztof Olędzki
2024-05-21 6:55 ` Michal Kubecek
0 siblings, 1 reply; 21+ messages in thread
From: Krzysztof Olędzki @ 2024-05-21 6:26 UTC (permalink / raw)
To: Michal Kubecek, Vladyslav Tarasiuk, Moshe Shemesh; +Cc: netdev@vger.kernel.org
Hi,
After upgrading from 5.4-stable to 6.6-stable I noticed that modern ethtool -m stopped working with ports where I have QSFP modules installed in my CX3 / CX3-Pro cards.
Git bisect identified the following patch as being responsible for the issue:
https://git.kernel.org/pub/scm/network/ethtool/ethtool.git/commit/?id=25b64c66f58d3df0ad7272dda91c3ab06fe7a303
Without the patch, I get the following output:
# ./ethtool --debug 3 -m eth3
Identifier : 0x0d (QSFP+)
Extended identifier : 0x00
Extended identifier description : 1.5W max. Power consumption
Extended identifier description : No CDR in TX, No CDR in RX
Extended identifier description : High Power Class (> 3.5 W) not enabled
Connector : 0x0c (MPO Parallel Optic)
Transceiver codes : 0x00 0x00 0x30 0x00 0x00 0x00 0x00 0x00
Transceiver type : SAS 6.0G
Transceiver type : SAS 3.0G
Encoding : 0x01 (8B/10B)
BR, Nominal : 0Mbps
Rate identifier : 0x00
Length (SMF,km) : 0km
Length (OM3 50um) : 0m
Length (OM2 50um) : 0m
Length (OM1 62.5um) : 0m
Length (Copper or Active cable) : 0m
Transmitter technology : 0x00 (850 nm VCSEL)
Laser wavelength : 850.000nm
Laser wavelength tolerance : 10.000nm
Vendor name : AVAGO
Vendor OUI : 00:17:6a
Vendor PN : <REDACTED>
Vendor rev : A0
Vendor SN : <REDACTED>
Date code : <REDACTED>
Revision Compliance : Revision not specified
Module temperature : 40.07 degrees C / 104.13 degrees F
Module voltage : 3.2914 V
Alarm/warning flags implemented : No
Laser tx bias current (Channel 1) : 6.556 mA
Laser tx bias current (Channel 2) : 0.000 mA
Laser tx bias current (Channel 3) : 0.000 mA
Laser tx bias current (Channel 4) : 0.000 mA
Transmit avg optical power (Channel 1) : 0.8101 mW / -0.91 dBm
Transmit avg optical power (Channel 2) : 0.0000 mW / -inf dBm
Transmit avg optical power (Channel 3) : 0.0000 mW / -inf dBm
Transmit avg optical power (Channel 4) : 0.0000 mW / -inf dBm
Rcvr signal avg optical power(Channel 1) : 0.5716 mW / -2.43 dBm
Rcvr signal avg optical power(Channel 2) : 0.0000 mW / -inf dBm
Rcvr signal avg optical power(Channel 3) : 0.0000 mW / -inf dBm
Rcvr signal avg optical power(Channel 4) : 0.0000 mW / -inf dBm
With the patch (included in ethtool-5.13):
# ./ethtool --debug 3 -m eth3
sending genetlink packet (32 bytes):
msg length 32 genl-ctrl
received genetlink packet (956 bytes):
msg length 956 genl-ctrl
received genetlink packet (36 bytes):
msg length 36 error errno=0
sending genetlink packet (76 bytes):
msg length 76 ethool ETHTOOL_MSG_MODULE_EEPROM_GET
received genetlink packet (176 bytes):
msg length 176 ethool ETHTOOL_MSG_MODULE_EEPROM_GET_REPLY
received genetlink packet (36 bytes):
msg length 36 error errno=0
sending genetlink packet (76 bytes):
msg length 76 ethool ETHTOOL_MSG_MODULE_EEPROM_GET
received genetlink packet (176 bytes):
msg length 176 ethool ETHTOOL_MSG_MODULE_EEPROM_GET_REPLY
received genetlink packet (36 bytes):
msg length 36 error errno=0
sending genetlink packet (76 bytes):
msg length 76 ethool ETHTOOL_MSG_MODULE_EEPROM_GET
received genetlink packet (96 bytes):
msg length 96 error errno=-22
netlink error: Invalid argument
I suspect this works on the older kernels because of the lack of netlink support (which got introduced somewhere between 5.4 and 6.6), so ethtool automatically fallbacks to the ioctl interface:
sending genetlink packet (32 bytes):
msg length 32 genl-ctrl
received genetlink packet (52 bytes):
msg length 52 error errno=-2
(...)
Also note that non-DOM SFP does work:
# ./ethtool --debug 3 -m eth2
sending genetlink packet (32 bytes):
msg length 32 genl-ctrl
received genetlink packet (956 bytes):
msg length 956 genl-ctrl
received genetlink packet (36 bytes):
msg length 36 error errno=0
sending genetlink packet (76 bytes):
msg length 76 ethool ETHTOOL_MSG_MODULE_EEPROM_GET
received genetlink packet (176 bytes):
msg length 176 ethool ETHTOOL_MSG_MODULE_EEPROM_GET_REPLY
received genetlink packet (36 bytes):
msg length 36 error errno=0
sending genetlink packet (76 bytes):
msg length 76 ethool ETHTOOL_MSG_MODULE_EEPROM_GET
received genetlink packet (176 bytes):
msg length 176 ethool ETHTOOL_MSG_MODULE_EEPROM_GET_REPLY
received genetlink packet (36 bytes):
msg length 36 error errno=0
Identifier : 0x03 (SFP)
Extended identifier : 0x04 (GBIC/SFP defined by 2-wire interface ID)
(...)
Still need to check if SFP w/DOM works, but it seems we are missing reporting "Optical diagnostics support" as sff8472_show_all() is only called only for the non-netlink interface?
Compiling ethtool with --disable-netlink also fixes both the problems, but obviously this is just a workaround.
Thanks,
Krzysztof Olędzki
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: "netlink error: Invalid argument" with ethtool-5.13+ on recent kernels due to "ethtool: Add netlink handler for getmodule (-m)" - 25b64c66f58d3df0ad7272dda91c3ab06fe7a303, also no SFP-DOM support via netlink?
2024-05-21 6:26 "netlink error: Invalid argument" with ethtool-5.13+ on recent kernels due to "ethtool: Add netlink handler for getmodule (-m)" - 25b64c66f58d3df0ad7272dda91c3ab06fe7a303, also no SFP-DOM support via netlink? Krzysztof Olędzki
@ 2024-05-21 6:55 ` Michal Kubecek
2024-05-21 7:02 ` Krzysztof Olędzki
0 siblings, 1 reply; 21+ messages in thread
From: Michal Kubecek @ 2024-05-21 6:55 UTC (permalink / raw)
To: Krzysztof Olędzki
Cc: Vladyslav Tarasiuk, Moshe Shemesh, netdev@vger.kernel.org
[-- Attachment #1: Type: text/plain, Size: 619 bytes --]
On Mon, May 20, 2024 at 11:26:56PM -0700, Krzysztof Olędzki wrote:
> Hi,
>
> After upgrading from 5.4-stable to 6.6-stable I noticed that modern ethtool -m stopped working with ports where I have QSFP modules installed in my CX3 / CX3-Pro cards.
>
> Git bisect identified the following patch as being responsible for the issue:
> https://git.kernel.org/pub/scm/network/ethtool/ethtool.git/commit/?id=25b64c66f58d3df0ad7272dda91c3ab06fe7a303
Sounds like the issue that was fixed by commit 1a1dcfca4d67 ("ethtool:
Fix SFF-8472 transceiver module identification"). Can you try ethtool
version 6.7?
Michal
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: "netlink error: Invalid argument" with ethtool-5.13+ on recent kernels due to "ethtool: Add netlink handler for getmodule (-m)" - 25b64c66f58d3df0ad7272dda91c3ab06fe7a303, also no SFP-DOM support via netlink?
2024-05-21 6:55 ` Michal Kubecek
@ 2024-05-21 7:02 ` Krzysztof Olędzki
2024-05-21 7:16 ` Krzysztof Olędzki
2024-05-21 7:34 ` Michal Kubecek
0 siblings, 2 replies; 21+ messages in thread
From: Krzysztof Olędzki @ 2024-05-21 7:02 UTC (permalink / raw)
To: Michal Kubecek; +Cc: Moshe Shemesh, netdev@vger.kernel.org
-vladyslavt@nvidia.com (User unknown).
Hi Michal,
On 20.05.2024 at 23:55, Michal Kubecek wrote:
> On Mon, May 20, 2024 at 11:26:56PM -0700, Krzysztof Olędzki wrote:
>> Hi,
>>
>> After upgrading from 5.4-stable to 6.6-stable I noticed that modern ethtool -m stopped working with ports where I have QSFP modules installed in my CX3 / CX3-Pro cards.
>>
>> Git bisect identified the following patch as being responsible for the issue:
>> https://git.kernel.org/pub/scm/network/ethtool/ethtool.git/commit/?id=25b64c66f58d3df0ad7272dda91c3ab06fe7a303
>
> Sounds like the issue that was fixed by commit 1a1dcfca4d67 ("ethtool:
> Fix SFF-8472 transceiver module identification"). Can you try ethtool
> version 6.7?
Yes, forgot to mention - this problem also exists in ethtool-6.7:
# ./ethtool --version
ethtool version 6.7
# ./ethtool --debug 3 -m eth3
sending genetlink packet (32 bytes):
msg length 32 genl-ctrl
received genetlink packet (956 bytes):
msg length 956 genl-ctrl
received genetlink packet (36 bytes):
msg length 36 error errno=0
sending genetlink packet (76 bytes):
msg length 76 ethool ETHTOOL_MSG_MODULE_EEPROM_GET
received genetlink packet (52 bytes):
msg length 52 ethool ETHTOOL_MSG_MODULE_EEPROM_GET_REPLY
received genetlink packet (36 bytes):
msg length 36 error errno=0
sending genetlink packet (76 bytes):
msg length 76 ethool ETHTOOL_MSG_MODULE_EEPROM_GET
received genetlink packet (176 bytes):
msg length 176 ethool ETHTOOL_MSG_MODULE_EEPROM_GET_REPLY
received genetlink packet (36 bytes):
msg length 36 error errno=0
sending genetlink packet (76 bytes):
msg length 76 ethool ETHTOOL_MSG_MODULE_EEPROM_GET
received genetlink packet (176 bytes):
msg length 176 ethool ETHTOOL_MSG_MODULE_EEPROM_GET_REPLY
received genetlink packet (36 bytes):
msg length 36 error errno=0
sending genetlink packet (76 bytes):
msg length 76 ethool ETHTOOL_MSG_MODULE_EEPROM_GET
received genetlink packet (96 bytes):
msg length 96 error errno=-22
netlink error: Invalid argument
Thanks,
Krzysztof
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: "netlink error: Invalid argument" with ethtool-5.13+ on recent kernels due to "ethtool: Add netlink handler for getmodule (-m)" - 25b64c66f58d3df0ad7272dda91c3ab06fe7a303, also no SFP-DOM support via netlink?
2024-05-21 7:02 ` Krzysztof Olędzki
@ 2024-05-21 7:16 ` Krzysztof Olędzki
2024-05-21 7:34 ` Michal Kubecek
1 sibling, 0 replies; 21+ messages in thread
From: Krzysztof Olędzki @ 2024-05-21 7:16 UTC (permalink / raw)
To: Michal Kubecek; +Cc: Moshe Shemesh, netdev@vger.kernel.org
On 21.05.2024 at 00:02, Krzysztof Olędzki wrote:
> -vladyslavt@nvidia.com (User unknown).
>
> Hi Michal,
>
> On 20.05.2024 at 23:55, Michal Kubecek wrote:
>> On Mon, May 20, 2024 at 11:26:56PM -0700, Krzysztof Olędzki wrote:
>>> Hi,
>>>
>>> After upgrading from 5.4-stable to 6.6-stable I noticed that modern ethtool -m stopped working with ports where I have QSFP modules installed in my CX3 / CX3-Pro cards.
>>>
>>> Git bisect identified the following patch as being responsible for the issue:
>>> https://git.kernel.org/pub/scm/network/ethtool/ethtool.git/commit/?id=25b64c66f58d3df0ad7272dda91c3ab06fe7a303
>>
>> Sounds like the issue that was fixed by commit 1a1dcfca4d67 ("ethtool:
>> Fix SFF-8472 transceiver module identification"). Can you try ethtool
>> version 6.7?
>
> Yes, forgot to mention - this problem also exists in ethtool-6.7:
>
> # ./ethtool --version
> ethtool version 6.7
>
> # ./ethtool --debug 3 -m eth3
> sending genetlink packet (32 bytes):
> msg length 32 genl-ctrl
> received genetlink packet (956 bytes):
> msg length 956 genl-ctrl
> received genetlink packet (36 bytes):
> msg length 36 error errno=0
> sending genetlink packet (76 bytes):
> msg length 76 ethool ETHTOOL_MSG_MODULE_EEPROM_GET
> received genetlink packet (52 bytes):
> msg length 52 ethool ETHTOOL_MSG_MODULE_EEPROM_GET_REPLY
> received genetlink packet (36 bytes):
> msg length 36 error errno=0
> sending genetlink packet (76 bytes):
> msg length 76 ethool ETHTOOL_MSG_MODULE_EEPROM_GET
> received genetlink packet (176 bytes):
> msg length 176 ethool ETHTOOL_MSG_MODULE_EEPROM_GET_REPLY
> received genetlink packet (36 bytes):
> msg length 36 error errno=0
> sending genetlink packet (76 bytes):
> msg length 76 ethool ETHTOOL_MSG_MODULE_EEPROM_GET
> received genetlink packet (176 bytes):
> msg length 176 ethool ETHTOOL_MSG_MODULE_EEPROM_GET_REPLY
> received genetlink packet (36 bytes):
> msg length 36 error errno=0
> sending genetlink packet (76 bytes):
> msg length 76 ethool ETHTOOL_MSG_MODULE_EEPROM_GET
> received genetlink packet (96 bytes):
> msg length 96 error errno=-22
> netlink error: Invalid argument
That said, DOM seems to work since 5.19 because it includes:
https://git.kernel.org/pub/scm/network/ethtool/ethtool.git/commit/sfpid.c?id=fb92de62eeb1cfbb21f57d60491798df762556d3
Sorry, got confused with my git bisect.
So, we are down to the main issue - Invalid argument for QSFP.
Thanks,
Krzysztof
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: "netlink error: Invalid argument" with ethtool-5.13+ on recent kernels due to "ethtool: Add netlink handler for getmodule (-m)" - 25b64c66f58d3df0ad7272dda91c3ab06fe7a303, also no SFP-DOM support via netlink?
2024-05-21 7:02 ` Krzysztof Olędzki
2024-05-21 7:16 ` Krzysztof Olędzki
@ 2024-05-21 7:34 ` Michal Kubecek
2024-05-21 7:38 ` Krzysztof Olędzki
1 sibling, 1 reply; 21+ messages in thread
From: Michal Kubecek @ 2024-05-21 7:34 UTC (permalink / raw)
To: Krzysztof Olędzki; +Cc: Moshe Shemesh, netdev@vger.kernel.org
[-- Attachment #1: Type: text/plain, Size: 544 bytes --]
On Tue, May 21, 2024 at 12:02:47AM -0700, Krzysztof Olędzki wrote:
> # ./ethtool --version
> ethtool version 6.7
>
> # ./ethtool --debug 3 -m eth3
[...]
> sending genetlink packet (76 bytes):
> msg length 76 ethool ETHTOOL_MSG_MODULE_EEPROM_GET
> received genetlink packet (96 bytes):
> msg length 96 error errno=-22
> netlink error: Invalid argument
Can you do it with "--debug 0x12" or "--debug 0x1e"? It would tell us
which request failed and that might give some hint where does this
EINVAL come from.
Michal
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: "netlink error: Invalid argument" with ethtool-5.13+ on recent kernels due to "ethtool: Add netlink handler for getmodule (-m)" - 25b64c66f58d3df0ad7272dda91c3ab06fe7a303, also no SFP-DOM support via netlink?
2024-05-21 7:34 ` Michal Kubecek
@ 2024-05-21 7:38 ` Krzysztof Olędzki
2024-05-21 20:21 ` Andrew Lunn
0 siblings, 1 reply; 21+ messages in thread
From: Krzysztof Olędzki @ 2024-05-21 7:38 UTC (permalink / raw)
To: Michal Kubecek; +Cc: Moshe Shemesh, netdev@vger.kernel.org
On 21.05.2024 at 00:34, Michal Kubecek wrote:
> On Tue, May 21, 2024 at 12:02:47AM -0700, Krzysztof Olędzki wrote:
>> # ./ethtool --version
>> ethtool version 6.7
>>
>> # ./ethtool --debug 3 -m eth3
> [...]
>> sending genetlink packet (76 bytes):
>> msg length 76 ethool ETHTOOL_MSG_MODULE_EEPROM_GET
>> received genetlink packet (96 bytes):
>> msg length 96 error errno=-22
>> netlink error: Invalid argument
>
> Can you do it with "--debug 0x12" or "--debug 0x1e"? It would tell us
> which request failed and that might give some hint where does this
> EINVAL come from.
Sure,
# ./ethtool --debug 0x12 -m eth3
sending genetlink packet (32 bytes):
msg length 32 genl-ctrl
CTRL_CMD_GETFAMILY
CTRL_ATTR_FAMILY_NAME = "ethtool"
received genetlink packet (956 bytes):
msg length 956 genl-ctrl
CTRL_CMD_NEWFAMILY
CTRL_ATTR_FAMILY_NAME = "ethtool"
CTRL_ATTR_FAMILY_ID = 21
CTRL_ATTR_VERSION = 1
CTRL_ATTR_HDRSIZE = 0
CTRL_ATTR_MAXATTR = 0
CTRL_ATTR_OPS
[1]
CTRL_ATTR_OP_ID = 1
CTRL_ATTR_OP_FLAGS = 0x0000000e
[2]
CTRL_ATTR_OP_ID = 2
CTRL_ATTR_OP_FLAGS = 0x0000000e
[3]
CTRL_ATTR_OP_ID = 3
CTRL_ATTR_OP_FLAGS = 0x0000001a
[4]
CTRL_ATTR_OP_ID = 4
CTRL_ATTR_OP_FLAGS = 0x0000000e
[5]
CTRL_ATTR_OP_ID = 5
CTRL_ATTR_OP_FLAGS = 0x0000001a
[6]
CTRL_ATTR_OP_ID = 6
CTRL_ATTR_OP_FLAGS = 0x0000000e
[7]
CTRL_ATTR_OP_ID = 7
CTRL_ATTR_OP_FLAGS = 0x0000000e
[8]
CTRL_ATTR_OP_ID = 8
CTRL_ATTR_OP_FLAGS = 0x0000001a
[9]
CTRL_ATTR_OP_ID = 9
CTRL_ATTR_OP_FLAGS = 0x0000001e
[10]
CTRL_ATTR_OP_ID = 10
CTRL_ATTR_OP_FLAGS = 0x0000001a
[11]
CTRL_ATTR_OP_ID = 11
CTRL_ATTR_OP_FLAGS = 0x0000000e
[12]
CTRL_ATTR_OP_ID = 12
CTRL_ATTR_OP_FLAGS = 0x0000001a
[13]
CTRL_ATTR_OP_ID = 13
CTRL_ATTR_OP_FLAGS = 0x0000000e
[14]
CTRL_ATTR_OP_ID = 14
CTRL_ATTR_OP_FLAGS = 0x0000001a
[15]
CTRL_ATTR_OP_ID = 15
CTRL_ATTR_OP_FLAGS = 0x0000000e
[16]
CTRL_ATTR_OP_ID = 16
CTRL_ATTR_OP_FLAGS = 0x0000001a
[17]
CTRL_ATTR_OP_ID = 17
CTRL_ATTR_OP_FLAGS = 0x0000000e
[18]
CTRL_ATTR_OP_ID = 18
CTRL_ATTR_OP_FLAGS = 0x0000001a
[19]
CTRL_ATTR_OP_ID = 19
CTRL_ATTR_OP_FLAGS = 0x0000000e
[20]
CTRL_ATTR_OP_ID = 20
CTRL_ATTR_OP_FLAGS = 0x0000001a
[21]
CTRL_ATTR_OP_ID = 21
CTRL_ATTR_OP_FLAGS = 0x0000000e
[22]
CTRL_ATTR_OP_ID = 22
CTRL_ATTR_OP_FLAGS = 0x0000001a
[23]
CTRL_ATTR_OP_ID = 23
CTRL_ATTR_OP_FLAGS = 0x0000000e
[24]
CTRL_ATTR_OP_ID = 24
CTRL_ATTR_OP_FLAGS = 0x0000001a
[25]
CTRL_ATTR_OP_ID = 25
CTRL_ATTR_OP_FLAGS = 0x0000000e
[26]
CTRL_ATTR_OP_ID = 26
CTRL_ATTR_OP_FLAGS = 0x0000001a
[27]
CTRL_ATTR_OP_ID = 27
CTRL_ATTR_OP_FLAGS = 0x0000001a
[28]
CTRL_ATTR_OP_ID = 28
CTRL_ATTR_OP_FLAGS = 0x0000000e
[29]
CTRL_ATTR_OP_ID = 29
CTRL_ATTR_OP_FLAGS = 0x0000000e
[30]
CTRL_ATTR_OP_ID = 30
CTRL_ATTR_OP_FLAGS = 0x0000001a
[31]
CTRL_ATTR_OP_ID = 31
CTRL_ATTR_OP_FLAGS = 0x0000001e
[32]
CTRL_ATTR_OP_ID = 32
CTRL_ATTR_OP_FLAGS = 0x0000000e
[33]
CTRL_ATTR_OP_ID = 33
CTRL_ATTR_OP_FLAGS = 0x0000000e
[34]
CTRL_ATTR_OP_ID = 34
CTRL_ATTR_OP_FLAGS = 0x0000000e
[35]
CTRL_ATTR_OP_ID = 35
CTRL_ATTR_OP_FLAGS = 0x0000001a
[36]
CTRL_ATTR_OP_ID = 36
CTRL_ATTR_OP_FLAGS = 0x0000000e
[37]
CTRL_ATTR_OP_ID = 37
CTRL_ATTR_OP_FLAGS = 0x0000001a
[38]
CTRL_ATTR_OP_ID = 38
CTRL_ATTR_OP_FLAGS = 0x0000000a
[39]
CTRL_ATTR_OP_ID = 39
CTRL_ATTR_OP_FLAGS = 0x0000000e
[40]
CTRL_ATTR_OP_ID = 40
CTRL_ATTR_OP_FLAGS = 0x0000001a
[41]
CTRL_ATTR_OP_ID = 41
CTRL_ATTR_OP_FLAGS = 0x0000000e
[42]
CTRL_ATTR_OP_ID = 42
CTRL_ATTR_OP_FLAGS = 0x0000000e
[43]
CTRL_ATTR_OP_ID = 43
CTRL_ATTR_OP_FLAGS = 0x0000001a
CTRL_ATTR_MCAST_GROUPS
[1]
CTRL_ATTR_MCAST_GRP_ID = 5
CTRL_ATTR_MCAST_GRP_NAME = "monitor"
received genetlink packet (36 bytes):
msg length 36 error errno=0
sending genetlink packet (76 bytes):
msg length 76 ethool ETHTOOL_MSG_MODULE_EEPROM_GET
ETHTOOL_MSG_MODULE_EEPROM_GET
ETHTOOL_A_MODULE_EEPROM_HEADER
ETHTOOL_A_HEADER_DEV_NAME = "eth3"
ETHTOOL_A_MODULE_EEPROM_LENGTH = 128
ETHTOOL_A_MODULE_EEPROM_OFFSET = 0
ETHTOOL_A_MODULE_EEPROM_PAGE = 0
ETHTOOL_A_MODULE_EEPROM_BANK = 0
ETHTOOL_A_MODULE_EEPROM_I2C_ADDRESS = 80
received genetlink packet (176 bytes):
msg length 176 ethool ETHTOOL_MSG_MODULE_EEPROM_GET_REPLY
ETHTOOL_MSG_MODULE_EEPROM_GET_REPLY
ETHTOOL_A_MODULE_EEPROM_HEADER
ETHTOOL_A_HEADER_DEV_INDEX = 7
ETHTOOL_A_HEADER_DEV_NAME = "eth3"
ETHTOOL_A_MODULE_EEPROM_DATA =
0d 00 02 00 00 00 00 00 00 05 00 00 00 00 00 00
00 00 00 00 00 00 27 71 00 00 80 0e 00 00 00 00
00 00 16 48 00 00 00 00 00 00 0c f1 00 00 00 00
00 00 1f 93 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 03 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 03 01 00 00 00 00 00 00 00 00 00
received genetlink packet (36 bytes):
msg length 36 error errno=0
sending genetlink packet (76 bytes):
msg length 76 ethool ETHTOOL_MSG_MODULE_EEPROM_GET
ETHTOOL_MSG_MODULE_EEPROM_GET
ETHTOOL_A_MODULE_EEPROM_HEADER
ETHTOOL_A_HEADER_DEV_NAME = "eth3"
ETHTOOL_A_MODULE_EEPROM_LENGTH = 128
ETHTOOL_A_MODULE_EEPROM_OFFSET = 128
ETHTOOL_A_MODULE_EEPROM_PAGE = 0
ETHTOOL_A_MODULE_EEPROM_BANK = 0
ETHTOOL_A_MODULE_EEPROM_I2C_ADDRESS = 80
received genetlink packet (176 bytes):
msg length 176 ethool ETHTOOL_MSG_MODULE_EEPROM_GET_REPLY
ETHTOOL_MSG_MODULE_EEPROM_GET_REPLY
ETHTOOL_A_MODULE_EEPROM_HEADER
ETHTOOL_A_HEADER_DEV_INDEX = 7
ETHTOOL_A_HEADER_DEV_NAME = "eth3"
ETHTOOL_A_MODULE_EEPROM_DATA =
0d 00 0c 00 00 30 00 00 00 00 00 01 00 00 00 00
00 00 00 00 41 56 41 47 4f 20 20 20 20 20 20 20
20 20 20 20 00 00 17 6a 33 33 32 2d 30 30 33 33
35 20 20 20 20 20 20 20 41 30 42 68 07 d0 46 71
00 00 0f de 51 44 32 35 31 33 37 30 20 20 20 20
20 20 20 20 31 33 30 36 32 31 20 20 08 00 00 29
00 4e 54 41 50 03 3b 06 00 3b 06 00 3b 06 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 f9
received genetlink packet (36 bytes):
msg length 36 error errno=0
sending genetlink packet (76 bytes):
msg length 76 ethool ETHTOOL_MSG_MODULE_EEPROM_GET
ETHTOOL_MSG_MODULE_EEPROM_GET
ETHTOOL_A_MODULE_EEPROM_HEADER
ETHTOOL_A_HEADER_DEV_NAME = "eth3"
ETHTOOL_A_MODULE_EEPROM_LENGTH = 128
ETHTOOL_A_MODULE_EEPROM_OFFSET = 128
ETHTOOL_A_MODULE_EEPROM_PAGE = 3
ETHTOOL_A_MODULE_EEPROM_BANK = 0
ETHTOOL_A_MODULE_EEPROM_I2C_ADDRESS = 80
received genetlink packet (96 bytes):
msg length 96 error errno=-22
netlink error: Invalid argument
offending message:
ETHTOOL_MSG_MODULE_EEPROM_GET
ETHTOOL_A_MODULE_EEPROM_HEADER
ETHTOOL_A_HEADER_DEV_NAME = "eth3"
ETHTOOL_A_MODULE_EEPROM_LENGTH = 128
ETHTOOL_A_MODULE_EEPROM_OFFSET = 128
ETHTOOL_A_MODULE_EEPROM_PAGE = 3
ETHTOOL_A_MODULE_EEPROM_BANK = 0
ETHTOOL_A_MODULE_EEPROM_I2C_ADDRESS = 80
Krzysztof
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: "netlink error: Invalid argument" with ethtool-5.13+ on recent kernels due to "ethtool: Add netlink handler for getmodule (-m)" - 25b64c66f58d3df0ad7272dda91c3ab06fe7a303, also no SFP-DOM support via netlink?
2024-05-21 7:38 ` Krzysztof Olędzki
@ 2024-05-21 20:21 ` Andrew Lunn
2024-05-22 4:54 ` Krzysztof Olędzki
0 siblings, 1 reply; 21+ messages in thread
From: Andrew Lunn @ 2024-05-21 20:21 UTC (permalink / raw)
To: Krzysztof Olędzki
Cc: Michal Kubecek, Moshe Shemesh, netdev@vger.kernel.org
> sending genetlink packet (76 bytes):
> msg length 76 ethool ETHTOOL_MSG_MODULE_EEPROM_GET
> ETHTOOL_MSG_MODULE_EEPROM_GET
> ETHTOOL_A_MODULE_EEPROM_HEADER
> ETHTOOL_A_HEADER_DEV_NAME = "eth3"
> ETHTOOL_A_MODULE_EEPROM_LENGTH = 128
> ETHTOOL_A_MODULE_EEPROM_OFFSET = 128
> ETHTOOL_A_MODULE_EEPROM_PAGE = 3
> ETHTOOL_A_MODULE_EEPROM_BANK = 0
> ETHTOOL_A_MODULE_EEPROM_I2C_ADDRESS = 80
> received genetlink packet (96 bytes):
> msg length 96 error errno=-22
This is a mellanox card right?
mlx4_en_get_module_info() and mlx4_en_get_module_eeprom() implement
the old API for reading data from an SFP module. So the ethtool core
will be mapping the new API to the old API. The interesting function
is probably fallback_set_params():
https://elixir.bootlin.com/linux/latest/source/net/ethtool/eeprom.c#L29
and my guess is, you are hitting:
if (offset >= modinfo->eeprom_len)
return -EINVAL;
offset is 3 * 128 + 128 = 512.
mlx4_en_get_module_info() is probably returning eeprom_len of 256?
Could you verify this?
Andrew
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: "netlink error: Invalid argument" with ethtool-5.13+ on recent kernels due to "ethtool: Add netlink handler for getmodule (-m)" - 25b64c66f58d3df0ad7272dda91c3ab06fe7a303, also no SFP-DOM support via netlink?
2024-05-21 20:21 ` Andrew Lunn
@ 2024-05-22 4:54 ` Krzysztof Olędzki
2024-05-22 8:40 ` Ido Schimmel
0 siblings, 1 reply; 21+ messages in thread
From: Krzysztof Olędzki @ 2024-05-22 4:54 UTC (permalink / raw)
To: Andrew Lunn; +Cc: Michal Kubecek, Moshe Shemesh, netdev@vger.kernel.org, idosch
Hi,
On 21.05.2024 at 13:21, Andrew Lunn wrote:
>> sending genetlink packet (76 bytes):
>> msg length 76 ethool ETHTOOL_MSG_MODULE_EEPROM_GET
>> ETHTOOL_MSG_MODULE_EEPROM_GET
>> ETHTOOL_A_MODULE_EEPROM_HEADER
>> ETHTOOL_A_HEADER_DEV_NAME = "eth3"
>> ETHTOOL_A_MODULE_EEPROM_LENGTH = 128
>> ETHTOOL_A_MODULE_EEPROM_OFFSET = 128
>> ETHTOOL_A_MODULE_EEPROM_PAGE = 3
>> ETHTOOL_A_MODULE_EEPROM_BANK = 0
>> ETHTOOL_A_MODULE_EEPROM_I2C_ADDRESS = 80
>> received genetlink packet (96 bytes):
>> msg length 96 error errno=-22
>
> This is a mellanox card right?
Yes, sorry. This is indeed Mellanox (now Nvidia) CX3 / CX3Pro, using the drivers/net/ethernet/mellanox/mlx4 driver.
> mlx4_en_get_module_info() and mlx4_en_get_module_eeprom() implement
> the old API for reading data from an SFP module. So the ethtool core
> will be mapping the new API to the old API. The interesting function
> is probably fallback_set_params():
>
> https://elixir.bootlin.com/linux/latest/source/net/ethtool/eeprom.c#L29
>
> and my guess is, you are hitting:
>
> if (offset >= modinfo->eeprom_len)
> return -EINVAL;
>
> offset is 3 * 128 + 128 = 512.
>
> mlx4_en_get_module_info() is probably returning eeprom_len of 256?
>
> Could you verify this?
Ah, excellent catch Andrew!
# egrep -R 'ETH_MODULE_SFF_[0-9]+_LEN' include/uapi/linux/ethtool.h
#define ETH_MODULE_SFF_8079_LEN 256
#define ETH_MODULE_SFF_8472_LEN 512
#define ETH_MODULE_SFF_8636_LEN 256
#define ETH_MODULE_SFF_8436_LEN 256
The code in mlx4_en_get_module_info (with length annotation):
switch (data[0] /* identifier */) {
case MLX4_MODULE_ID_QSFP:
modinfo->type = ETH_MODULE_SFF_8436;
modinfo->eeprom_len = ETH_MODULE_SFF_8436_LEN; // 256
break;
case MLX4_MODULE_ID_QSFP_PLUS:
if (data[1] >= 0x3) { /* revision id */
modinfo->type = ETH_MODULE_SFF_8636;
modinfo->eeprom_len = ETH_MODULE_SFF_8636_LEN; // 256
} else {
modinfo->type = ETH_MODULE_SFF_8436;
modinfo->eeprom_len = ETH_MODULE_SFF_8436_LEN; // 256
}
break;
case MLX4_MODULE_ID_QSFP28:
modinfo->type = ETH_MODULE_SFF_8636;
modinfo->eeprom_len = ETH_MODULE_SFF_8636_LEN; // 256
break;
case MLX4_MODULE_ID_SFP:
modinfo->type = ETH_MODULE_SFF_8472;
modinfo->eeprom_len = ETH_MODULE_SFF_8472_LEN; // 512
break;
default:
return -EINVAL;
}
So right, the function returns 512 for SFP and 256 for everything else, which explains why SFP does work but QSFP - not.
Following your advice, I added some debug printks to net/ethtool/eeprom.c:
@@ -33,16 +33,24 @@
u32 offset = request->offset;
u32 length = request->length;
+ printk("A: offset=%u, modinfo->eeprom_len=%u\n", offset, modinfo->eeprom_len);
+
if (request->page)
offset = request->page * ETH_MODULE_EEPROM_PAGE_LEN + offset;
+ printk("B: offset=%u, modinfo->eeprom_len=%u\n", offset, modinfo->eeprom_len);
+
if (modinfo->type == ETH_MODULE_SFF_8472 &&
request->i2c_address == 0x51)
offset += ETH_MODULE_EEPROM_PAGE_LEN * 2;
+ printk("C: offset=%u, modinfo->eeprom_len=%u\n", offset, modinfo->eeprom_len);
+
if (offset >= modinfo->eeprom_len)
return -EINVAL;
+ printk("D: offset=%u, modinfo->eeprom_len=%u\n", offset, modinfo->eeprom_len);
+
eeprom->cmd = ETHTOOL_GMODULEEEPROM;
eeprom->len = length;
eeprom->offset = offset;
Here is the result:
SFP:
A: offset=0, modinfo->eeprom_len=512
B: offset=0, modinfo->eeprom_len=512
C: offset=0, modinfo->eeprom_len=512
D: offset=0, modinfo->eeprom_len=512
A: offset=0, modinfo->eeprom_len=512
B: offset=0, modinfo->eeprom_len=512
C: offset=0, modinfo->eeprom_len=512
D: offset=0, modinfo->eeprom_len=512
QSFP:
A: offset=0, modinfo->eeprom_len=256
B: offset=0, modinfo->eeprom_len=256
C: offset=0, modinfo->eeprom_len=256
D: offset=0, modinfo->eeprom_len=256
A: offset=0, modinfo->eeprom_len=256
B: offset=0, modinfo->eeprom_len=256
C: offset=0, modinfo->eeprom_len=256
D: offset=0, modinfo->eeprom_len=256
A: offset=128, modinfo->eeprom_len=256
B: offset=128, modinfo->eeprom_len=256
C: offset=128, modinfo->eeprom_len=256
D: offset=128, modinfo->eeprom_len=256
A: offset=128, modinfo->eeprom_len=256
B: offset=512, modinfo->eeprom_len=256
C: offset=512, modinfo->eeprom_len=256
Note - no "D" as -EINVAL is returned exactly as you predicted.
BTW: there is another suspicious looking thing in this code:
- "u32 length = request->length;" is set early in the function
- length is never updated
- at the end, we have "eeprom->len = length"
In this case, the existence of length seems at least seems redundant, unless I missed something?
For the reference, the function was added in https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=96d971e307cc0e434f96329b42bbd98cfbca07d2
Later https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a3bb7b63813f674fb62bac321cdd897cc62de094 changed ETH_MODULE_SFF_8079 to ETH_MODULE_SFF_8472.
Thanks,
Krzysztof
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: "netlink error: Invalid argument" with ethtool-5.13+ on recent kernels due to "ethtool: Add netlink handler for getmodule (-m)" - 25b64c66f58d3df0ad7272dda91c3ab06fe7a303, also no SFP-DOM support via netlink?
2024-05-22 4:54 ` Krzysztof Olędzki
@ 2024-05-22 8:40 ` Ido Schimmel
2024-05-22 12:44 ` Andrew Lunn
0 siblings, 1 reply; 21+ messages in thread
From: Ido Schimmel @ 2024-05-22 8:40 UTC (permalink / raw)
To: Krzysztof Olędzki
Cc: Andrew Lunn, Michal Kubecek, Moshe Shemesh,
netdev@vger.kernel.org, tariqt
On Tue, May 21, 2024 at 09:54:48PM -0700, Krzysztof Olędzki wrote:
> Hi,
>
> On 21.05.2024 at 13:21, Andrew Lunn wrote:
> >> sending genetlink packet (76 bytes):
> >> msg length 76 ethool ETHTOOL_MSG_MODULE_EEPROM_GET
> >> ETHTOOL_MSG_MODULE_EEPROM_GET
> >> ETHTOOL_A_MODULE_EEPROM_HEADER
> >> ETHTOOL_A_HEADER_DEV_NAME = "eth3"
> >> ETHTOOL_A_MODULE_EEPROM_LENGTH = 128
> >> ETHTOOL_A_MODULE_EEPROM_OFFSET = 128
> >> ETHTOOL_A_MODULE_EEPROM_PAGE = 3
> >> ETHTOOL_A_MODULE_EEPROM_BANK = 0
> >> ETHTOOL_A_MODULE_EEPROM_I2C_ADDRESS = 80
> >> received genetlink packet (96 bytes):
> >> msg length 96 error errno=-22
> >
> > This is a mellanox card right?
>
> Yes, sorry. This is indeed Mellanox (now Nvidia) CX3 / CX3Pro, using the drivers/net/ethernet/mellanox/mlx4 driver.
>
> > mlx4_en_get_module_info() and mlx4_en_get_module_eeprom() implement
> > the old API for reading data from an SFP module. So the ethtool core
> > will be mapping the new API to the old API. The interesting function
> > is probably fallback_set_params():
> >
> > https://elixir.bootlin.com/linux/latest/source/net/ethtool/eeprom.c#L29
> >
> > and my guess is, you are hitting:
> >
> > if (offset >= modinfo->eeprom_len)
> > return -EINVAL;
> >
> > offset is 3 * 128 + 128 = 512.
> >
> > mlx4_en_get_module_info() is probably returning eeprom_len of 256?
> >
> > Could you verify this?
>
> Ah, excellent catch Andrew!
Yes, I believe Andrew's analysis is correct.
>
> # egrep -R 'ETH_MODULE_SFF_[0-9]+_LEN' include/uapi/linux/ethtool.h
> #define ETH_MODULE_SFF_8079_LEN 256
> #define ETH_MODULE_SFF_8472_LEN 512
> #define ETH_MODULE_SFF_8636_LEN 256
> #define ETH_MODULE_SFF_8436_LEN 256
>
> The code in mlx4_en_get_module_info (with length annotation):
>
> switch (data[0] /* identifier */) {
> case MLX4_MODULE_ID_QSFP:
> modinfo->type = ETH_MODULE_SFF_8436;
> modinfo->eeprom_len = ETH_MODULE_SFF_8436_LEN; // 256
> break;
> case MLX4_MODULE_ID_QSFP_PLUS:
> if (data[1] >= 0x3) { /* revision id */
> modinfo->type = ETH_MODULE_SFF_8636;
> modinfo->eeprom_len = ETH_MODULE_SFF_8636_LEN; // 256
> } else {
> modinfo->type = ETH_MODULE_SFF_8436;
> modinfo->eeprom_len = ETH_MODULE_SFF_8436_LEN; // 256
> }
> break;
> case MLX4_MODULE_ID_QSFP28:
> modinfo->type = ETH_MODULE_SFF_8636;
> modinfo->eeprom_len = ETH_MODULE_SFF_8636_LEN; // 256
> break;
> case MLX4_MODULE_ID_SFP:
> modinfo->type = ETH_MODULE_SFF_8472;
> modinfo->eeprom_len = ETH_MODULE_SFF_8472_LEN; // 512
> break;
> default:
> return -EINVAL;
> }
>
> So right, the function returns 512 for SFP and 256 for everything else, which explains why SFP does work but QSFP - not.
Since you already did all the work and you are able to test patches, do
you want to fix it yourself and submit or report to the mlx4 maintainer
(copied)? Fix should be similar to mlx5 commit a708fb7b1f8d ("net/mlx5e:
ethtool, Add support for EEPROM high pages query").
>
> Following your advice, I added some debug printks to net/ethtool/eeprom.c:
>
> @@ -33,16 +33,24 @@
> u32 offset = request->offset;
> u32 length = request->length;
>
> + printk("A: offset=%u, modinfo->eeprom_len=%u\n", offset, modinfo->eeprom_len);
> +
> if (request->page)
> offset = request->page * ETH_MODULE_EEPROM_PAGE_LEN + offset;
>
> + printk("B: offset=%u, modinfo->eeprom_len=%u\n", offset, modinfo->eeprom_len);
> +
> if (modinfo->type == ETH_MODULE_SFF_8472 &&
> request->i2c_address == 0x51)
> offset += ETH_MODULE_EEPROM_PAGE_LEN * 2;
>
> + printk("C: offset=%u, modinfo->eeprom_len=%u\n", offset, modinfo->eeprom_len);
> +
> if (offset >= modinfo->eeprom_len)
> return -EINVAL;
>
> + printk("D: offset=%u, modinfo->eeprom_len=%u\n", offset, modinfo->eeprom_len);
> +
> eeprom->cmd = ETHTOOL_GMODULEEEPROM;
> eeprom->len = length;
> eeprom->offset = offset;
>
> Here is the result:
>
> SFP:
> A: offset=0, modinfo->eeprom_len=512
> B: offset=0, modinfo->eeprom_len=512
> C: offset=0, modinfo->eeprom_len=512
> D: offset=0, modinfo->eeprom_len=512
> A: offset=0, modinfo->eeprom_len=512
> B: offset=0, modinfo->eeprom_len=512
> C: offset=0, modinfo->eeprom_len=512
> D: offset=0, modinfo->eeprom_len=512
>
> QSFP:
> A: offset=0, modinfo->eeprom_len=256
> B: offset=0, modinfo->eeprom_len=256
> C: offset=0, modinfo->eeprom_len=256
> D: offset=0, modinfo->eeprom_len=256
>
> A: offset=0, modinfo->eeprom_len=256
> B: offset=0, modinfo->eeprom_len=256
> C: offset=0, modinfo->eeprom_len=256
> D: offset=0, modinfo->eeprom_len=256
>
> A: offset=128, modinfo->eeprom_len=256
> B: offset=128, modinfo->eeprom_len=256
> C: offset=128, modinfo->eeprom_len=256
> D: offset=128, modinfo->eeprom_len=256
>
> A: offset=128, modinfo->eeprom_len=256
> B: offset=512, modinfo->eeprom_len=256
> C: offset=512, modinfo->eeprom_len=256
> Note - no "D" as -EINVAL is returned exactly as you predicted.
>
> BTW: there is another suspicious looking thing in this code:
> - "u32 length = request->length;" is set early in the function
> - length is never updated
> - at the end, we have "eeprom->len = length"
>
> In this case, the existence of length seems at least seems redundant, unless I missed something?
Looks like it
>
> For the reference, the function was added in https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=96d971e307cc0e434f96329b42bbd98cfbca07d2
> Later https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a3bb7b63813f674fb62bac321cdd897cc62de094 changed ETH_MODULE_SFF_8079 to ETH_MODULE_SFF_8472.
>
> Thanks,
> Krzysztof
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: "netlink error: Invalid argument" with ethtool-5.13+ on recent kernels due to "ethtool: Add netlink handler for getmodule (-m)" - 25b64c66f58d3df0ad7272dda91c3ab06fe7a303, also no SFP-DOM support via netlink?
2024-05-22 8:40 ` Ido Schimmel
@ 2024-05-22 12:44 ` Andrew Lunn
2024-05-23 5:29 ` Krzysztof Olędzki
0 siblings, 1 reply; 21+ messages in thread
From: Andrew Lunn @ 2024-05-22 12:44 UTC (permalink / raw)
To: Ido Schimmel
Cc: Krzysztof Olędzki, Michal Kubecek, Moshe Shemesh,
netdev@vger.kernel.org, tariqt
> > So right, the function returns 512 for SFP and 256 for everything else, which explains why SFP does work but QSFP - not.
>
> Since you already did all the work and you are able to test patches, do
> you want to fix it yourself and submit or report to the mlx4 maintainer
> (copied)? Fix should be similar to mlx5 commit a708fb7b1f8d ("net/mlx5e:
> ethtool, Add support for EEPROM high pages query").
Before you do that, please could you work on ethtool. I would say it
has a bug. It has been provided with 256 bytes of SPF data. It should
be able to decode that and print it in human readable format. So the
EINVAL should not be considered fatal to decoding.
Andrew
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: "netlink error: Invalid argument" with ethtool-5.13+ on recent kernels due to "ethtool: Add netlink handler for getmodule (-m)" - 25b64c66f58d3df0ad7272dda91c3ab06fe7a303, also no SFP-DOM support via netlink?
2024-05-22 12:44 ` Andrew Lunn
@ 2024-05-23 5:29 ` Krzysztof Olędzki
2024-05-23 10:37 ` Michal Kubecek
` (2 more replies)
0 siblings, 3 replies; 21+ messages in thread
From: Krzysztof Olędzki @ 2024-05-23 5:29 UTC (permalink / raw)
To: Andrew Lunn, Ido Schimmel
Cc: Michal Kubecek, Moshe Shemesh, netdev@vger.kernel.org, tariqt
On 22.05.2024 at 05:44, Andrew Lunn wrote:
>>> So right, the function returns 512 for SFP and 256 for everything else, which explains why SFP does work but QSFP - not.
>>
>> Since you already did all the work and you are able to test patches, do
>> you want to fix it yourself and submit or report to the mlx4 maintainer
>> (copied)? Fix should be similar to mlx5 commit a708fb7b1f8d ("net/mlx5e:
>> ethtool, Add support for EEPROM high pages query").
Oh, thank you so much for the pointer! Turns out, it was way easier than I thought:
# ethtool -m eth2
Identifier : 0x0d (QSFP+)
Extended identifier : 0x00
Extended identifier description : 1.5W max. Power consumption
Extended identifier description : No CDR in TX, No CDR in RX
Extended identifier description : High Power Class (> 3.5 W) not enabled
Power set : On
Power override : On
Connector : 0x0c (MPO Parallel Optic)
Transceiver codes : 0x04 0x00 0x00 0x00 0x00 0x00 0x00 0x00
Transceiver type : 40G Ethernet: 40G Base-SR4
Encoding : 0x05 (64B/66B)
BR, Nominal : 10300Mbps
Rate identifier : 0x00
Length (SMF,km) : 0km
Length (OM3 50um) : 100m
Length (OM2 50um) : 0m
Length (OM1 62.5um) : 0m
Length (Copper or Active cable) : 0m
Transmitter technology : 0x00 (850 nm VCSEL)
Laser wavelength : 850.000nm
Laser wavelength tolerance : 10.000nm
Vendor name : AVAGO
Vendor OUI : 00:17:6a
Vendor PN : <REDACTED>
Vendor rev : 01
Vendor SN : <REDACTED>
Date code : <REDACTED>
Revision Compliance : Revision not specified
Module temperature : 42.17 degrees C / 107.90 degrees F
Module voltage : 3.2917 V
Alarm/warning flags implemented : Yes
Laser tx bias current (Channel 1) : 6.662 mA
Laser tx bias current (Channel 2) : 0.000 mA
Laser tx bias current (Channel 3) : 0.000 mA
Laser tx bias current (Channel 4) : 0.000 mA
Transmit avg optical power (Channel 1) : 0.7119 mW / -1.48 dBm
Transmit avg optical power (Channel 2) : 0.0000 mW / -inf dBm
Transmit avg optical power (Channel 3) : 0.0000 mW / -inf dBm
Transmit avg optical power (Channel 4) : 0.0000 mW / -inf dBm
Rcvr signal avg optical power(Channel 1) : 0.7682 mW / -1.15 dBm
Rcvr signal avg optical power(Channel 2) : 0.0000 mW / -inf dBm
Rcvr signal avg optical power(Channel 3) : 0.0000 mW / -inf dBm
Rcvr signal avg optical power(Channel 4) : 0.0000 mW / -inf dBm
Laser bias current high alarm (Chan 1) : Off
Laser bias current low alarm (Chan 1) : Off
Laser bias current high warning (Chan 1) : Off
Laser bias current low warning (Chan 1) : Off
Laser bias current high alarm (Chan 2) : Off
Laser bias current low alarm (Chan 2) : Off
Laser bias current high warning (Chan 2) : Off
Laser bias current low warning (Chan 2) : Off
Laser bias current high alarm (Chan 3) : Off
Laser bias current low alarm (Chan 3) : Off
Laser bias current high warning (Chan 3) : Off
Laser bias current low warning (Chan 3) : Off
Laser bias current high alarm (Chan 4) : Off
Laser bias current low alarm (Chan 4) : Off
Laser bias current high warning (Chan 4) : Off
Laser bias current low warning (Chan 4) : Off
Module temperature high alarm : Off
Module temperature low alarm : Off
Module temperature high warning : Off
Module temperature low warning : Off
Module voltage high alarm : Off
Module voltage low alarm : Off
Module voltage high warning : Off
Module voltage low warning : Off
Laser tx power high alarm (Channel 1) : Off
Laser tx power low alarm (Channel 1) : Off
Laser tx power high warning (Channel 1) : Off
Laser tx power low warning (Channel 1) : Off
Laser tx power high alarm (Channel 2) : Off
Laser tx power low alarm (Channel 2) : Off
Laser tx power high warning (Channel 2) : Off
Laser tx power low warning (Channel 2) : Off
Laser tx power high alarm (Channel 3) : Off
Laser tx power low alarm (Channel 3) : Off
Laser tx power high warning (Channel 3) : Off
Laser tx power low warning (Channel 3) : Off
Laser tx power high alarm (Channel 4) : Off
Laser tx power low alarm (Channel 4) : Off
Laser tx power high warning (Channel 4) : Off
Laser tx power low warning (Channel 4) : Off
Laser rx power high alarm (Channel 1) : Off
Laser rx power low alarm (Channel 1) : Off
Laser rx power high warning (Channel 1) : Off
Laser rx power low warning (Channel 1) : Off
Laser rx power high alarm (Channel 2) : Off
Laser rx power low alarm (Channel 2) : Off
Laser rx power high warning (Channel 2) : Off
Laser rx power low warning (Channel 2) : Off
Laser rx power high alarm (Channel 3) : Off
Laser rx power low alarm (Channel 3) : Off
Laser rx power high warning (Channel 3) : Off
Laser rx power low warning (Channel 3) : Off
Laser rx power high alarm (Channel 4) : Off
Laser rx power low alarm (Channel 4) : Off
Laser rx power high warning (Channel 4) : Off
Laser rx power low warning (Channel 4) : Off
Laser bias current high alarm threshold : 10.000 mA
Laser bias current low alarm threshold : 0.500 mA
Laser bias current high warning threshold : 9.500 mA
Laser bias current low warning threshold : 1.000 mA
Laser output power high alarm threshold : 0.0000 mW / -inf dBm
Laser output power low alarm threshold : 0.0000 mW / -inf dBm
Laser output power high warning threshold : 0.0000 mW / -inf dBm
Laser output power low warning threshold : 0.0000 mW / -inf dBm
Module temperature high alarm threshold : 75.00 degrees C / 167.00 degrees F
Module temperature low alarm threshold : -5.00 degrees C / 23.00 degrees F
Module temperature high warning threshold : 70.00 degrees C / 158.00 degrees F
Module temperature low warning threshold : 0.00 degrees C / 32.00 degrees F
Module voltage high alarm threshold : 3.6300 V
Module voltage low alarm threshold : 2.9700 V
Module voltage high warning threshold : 3.4650 V
Module voltage low warning threshold : 3.1349 V
Laser rx power high alarm threshold : 2.1878 mW / 3.40 dBm
Laser rx power low alarm threshold : 0.0446 mW / -13.51 dBm
Laser rx power high warning threshold : 1.7378 mW / 2.40 dBm
Laser rx power low warning threshold : 0.1122 mW / -9.50 dBm
Looks like all we need to do is:
--- a/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c 2024-04-17 02:19:38.000000000 -0700
+++ b/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c 2024-05-22 12:46:57.290947887 -0700
@@ -2055,15 +2055,15 @@
switch (data[0] /* identifier */) {
case MLX4_MODULE_ID_QSFP:
modinfo->type = ETH_MODULE_SFF_8436;
- modinfo->eeprom_len = ETH_MODULE_SFF_8436_LEN;
+ modinfo->eeprom_len = ETH_MODULE_SFF_8436_MAX_LEN;
break;
case MLX4_MODULE_ID_QSFP_PLUS:
if (data[1] >= 0x3) { /* revision id */
modinfo->type = ETH_MODULE_SFF_8636;
- modinfo->eeprom_len = ETH_MODULE_SFF_8636_LEN;
+ modinfo->eeprom_len = ETH_MODULE_SFF_8636_MAX_LEN;
} else {
modinfo->type = ETH_MODULE_SFF_8436;
- modinfo->eeprom_len = ETH_MODULE_SFF_8436_LEN;
+ modinfo->eeprom_len = ETH_MODULE_SFF_8436_MAX_LEN;
}
break;
case MLX4_MODULE_ID_QSFP28:
If I'm not mistaken, the rest of the logic is already there, such as:
static void mlx4_qsfp_eeprom_params_set(u8 *i2c_addr, u8 *page_num, u16 *offset)
{
/* Offsets 0-255 belong to page 0.
* Offsets 256-639 belong to pages 01, 02, 03.
* For example, offset 400 is page 02: 1 + (400 - 256) / 128 = 2
*/
if (*offset < I2C_PAGE_SIZE)
*page_num = 0;
else
*page_num = 1 + (*offset - I2C_PAGE_SIZE) / I2C_HIGH_PAGE_SIZE;
*i2c_addr = I2C_ADDR_LOW;
*offset -= *page_num * I2C_HIGH_PAGE_SIZE;
}
So, we don't need to make as many changes as in https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a708fb7b1f8dcc7a8ed949839958cd5d812dd939
> Before you do that, please could you work on ethtool. I would say it
> has a bug. It has been provided with 256 bytes of SPF data. It should
> be able to decode that and print it in human readable format. So the
> EINVAL should not be considered fatal to decoding.
Yes, I was also thinking this way. Luckily, similar to the situation with the mlx4 driver, all the logic is there - sff8636_dom_parse() checks if map->page_03h is set and if not, just returns gracefully.
So, all we need to do is modify sff8636_memory_map_init_pages():
@@ -1038,7 +1039,7 @@
sff8636_request_init(&request, 0x3, SFF8636_PAGE_SIZE);
ret = nl_get_eeprom_page(ctx, &request);
if (ret < 0)
- return ret;
+ return 0;
map->page_03h = request.data - SFF8636_PAGE_SIZE;
return 0;
As you described, we get all the data except the DOM:
Identifier : 0x0d (QSFP+)
Extended identifier : 0x00
Extended identifier description : 1.5W max. Power consumption
Extended identifier description : No CDR in TX, No CDR in RX
Extended identifier description : High Power Class (> 3.5 W) not enabled
Power set : On
Power override : On
Connector : 0x0c (MPO Parallel Optic)
Transceiver codes : 0x00 0x00 0x30 0x00 0x00 0x00 0x00 0x00
Transceiver type : SAS 6.0G
Transceiver type : SAS 3.0G
Encoding : 0x01 (8B/10B)
BR, Nominal : 0Mbps
Rate identifier : 0x00
Length (SMF,km) : 0km
Length (OM3 50um) : 0m
Length (OM2 50um) : 0m
Length (OM1 62.5um) : 0m
Length (Copper or Active cable) : 0m
Transmitter technology : 0x00 (850 nm VCSEL)
Laser wavelength : 850.000nm
Laser wavelength tolerance : 10.000nm
Vendor name : AVAGO
Vendor OUI : 00:17:6a
Vendor PN : <REDACTED>
Vendor rev : A0
Vendor SN : <REDACTED>
Date code : <REDACTED>
Revision Compliance : Revision not specified
Rx loss of signal : None
Tx loss of signal : None
Tx fault : None
Module temperature : 40.91 degrees C / 105.63 degrees F
Module voltage : 3.2848 V
Alarm/warning flags implemented : No
Laser tx bias current (Channel 1) : 6.634 mA
Laser tx bias current (Channel 2) : 0.000 mA
Laser tx bias current (Channel 3) : 0.000 mA
Laser tx bias current (Channel 4) : 0.000 mA
Transmit avg optical power (Channel 1) : 0.8101 mW / -0.91 dBm
Transmit avg optical power (Channel 2) : 0.0000 mW / -inf dBm
Transmit avg optical power (Channel 3) : 0.0000 mW / -inf dBm
Transmit avg optical power (Channel 4) : 0.0000 mW / -inf dBm
Rcvr signal avg optical power(Channel 1) : 0.5692 mW / -2.45 dBm
Rcvr signal avg optical power(Channel 2) : 0.0000 mW / -inf dBm
Rcvr signal avg optical power(Channel 3) : 0.0000 mW / -inf dBm
Rcvr signal avg optical power(Channel 4) : 0.0000 mW / -inf dBm
Do you think it would make sense to print a warning in such situation, or just handle this silently?
Finally, as I was looking at the code in fallback_set_params() I started thinking if the length check is actually correct?
I think instead of:
if (offset >= modinfo->eeprom_len)
we may want:
if (offset + length > modinfo->eeprom_len)
I don't know if it is safe to assume we always read a single page and cross page reads are not allowed and even if so, that we should rely on this instead of checking the len explicitly? What do you think?
Once I hear from y'all I will prepare the patches.
Thanks,
Krzysztof
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: "netlink error: Invalid argument" with ethtool-5.13+ on recent kernels due to "ethtool: Add netlink handler for getmodule (-m)" - 25b64c66f58d3df0ad7272dda91c3ab06fe7a303, also no SFP-DOM support via netlink?
2024-05-23 5:29 ` Krzysztof Olędzki
@ 2024-05-23 10:37 ` Michal Kubecek
2024-05-23 10:48 ` Ido Schimmel
2024-05-23 15:35 ` Andrew Lunn
2 siblings, 0 replies; 21+ messages in thread
From: Michal Kubecek @ 2024-05-23 10:37 UTC (permalink / raw)
To: Krzysztof Olędzki
Cc: Andrew Lunn, Ido Schimmel, Moshe Shemesh, netdev@vger.kernel.org,
tariqt
[-- Attachment #1: Type: text/plain, Size: 543 bytes --]
On Wed, May 22, 2024 at 10:29:43PM -0700, Krzysztof Olędzki wrote:
> Do you think it would make sense to print a warning in such situation,
> or just handle this silently?
Unless part of the data not being available is something that should be
expected to happen even if there is nothing wrong (which doesn't seem to
be the case), there should IMHO be a warning that the information shown
is incomplete.
I would also consider a short comment in the code explaining that
returning 0 after request failure is intentional.
Michal
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: "netlink error: Invalid argument" with ethtool-5.13+ on recent kernels due to "ethtool: Add netlink handler for getmodule (-m)" - 25b64c66f58d3df0ad7272dda91c3ab06fe7a303, also no SFP-DOM support via netlink?
2024-05-23 5:29 ` Krzysztof Olędzki
2024-05-23 10:37 ` Michal Kubecek
@ 2024-05-23 10:48 ` Ido Schimmel
2024-05-23 15:35 ` Andrew Lunn
2 siblings, 0 replies; 21+ messages in thread
From: Ido Schimmel @ 2024-05-23 10:48 UTC (permalink / raw)
To: Krzysztof Olędzki
Cc: Andrew Lunn, Michal Kubecek, Moshe Shemesh,
netdev@vger.kernel.org, tariqt
On Wed, May 22, 2024 at 10:29:43PM -0700, Krzysztof Olędzki wrote:
> On 22.05.2024 at 05:44, Andrew Lunn wrote:
> >>> So right, the function returns 512 for SFP and 256 for everything else, which explains why SFP does work but QSFP - not.
> >>
> >> Since you already did all the work and you are able to test patches, do
> >> you want to fix it yourself and submit or report to the mlx4 maintainer
> >> (copied)? Fix should be similar to mlx5 commit a708fb7b1f8d ("net/mlx5e:
> >> ethtool, Add support for EEPROM high pages query").
>
> Oh, thank you so much for the pointer! Turns out, it was way easier than I thought:
>
> # ethtool -m eth2
[..]
> Looks like all we need to do is:
>
> --- a/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c 2024-04-17 02:19:38.000000000 -0700
> +++ b/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c 2024-05-22 12:46:57.290947887 -0700
> @@ -2055,15 +2055,15 @@
> switch (data[0] /* identifier */) {
> case MLX4_MODULE_ID_QSFP:
> modinfo->type = ETH_MODULE_SFF_8436;
> - modinfo->eeprom_len = ETH_MODULE_SFF_8436_LEN;
> + modinfo->eeprom_len = ETH_MODULE_SFF_8436_MAX_LEN;
> break;
> case MLX4_MODULE_ID_QSFP_PLUS:
> if (data[1] >= 0x3) { /* revision id */
> modinfo->type = ETH_MODULE_SFF_8636;
> - modinfo->eeprom_len = ETH_MODULE_SFF_8636_LEN;
> + modinfo->eeprom_len = ETH_MODULE_SFF_8636_MAX_LEN;
> } else {
> modinfo->type = ETH_MODULE_SFF_8436;
> - modinfo->eeprom_len = ETH_MODULE_SFF_8436_LEN;
> + modinfo->eeprom_len = ETH_MODULE_SFF_8436_MAX_LEN;
> }
> break;
> case MLX4_MODULE_ID_QSFP28:
Need to update QSFP28 to use ETH_MODULE_SFF_8636_MAX_LEN as well. Looks
OK otherwise.
>
> If I'm not mistaken, the rest of the logic is already there, such as:
>
> static void mlx4_qsfp_eeprom_params_set(u8 *i2c_addr, u8 *page_num, u16 *offset)
> {
> /* Offsets 0-255 belong to page 0.
> * Offsets 256-639 belong to pages 01, 02, 03.
> * For example, offset 400 is page 02: 1 + (400 - 256) / 128 = 2
> */
> if (*offset < I2C_PAGE_SIZE)
> *page_num = 0;
> else
> *page_num = 1 + (*offset - I2C_PAGE_SIZE) / I2C_HIGH_PAGE_SIZE;
> *i2c_addr = I2C_ADDR_LOW;
> *offset -= *page_num * I2C_HIGH_PAGE_SIZE;
> }
>
> So, we don't need to make as many changes as in https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a708fb7b1f8dcc7a8ed949839958cd5d812dd939
OK.
> > Before you do that, please could you work on ethtool. I would say it
> > has a bug. It has been provided with 256 bytes of SPF data. It should
> > be able to decode that and print it in human readable format. So the
> > EINVAL should not be considered fatal to decoding.
>
> Yes, I was also thinking this way. Luckily, similar to the situation with the mlx4 driver, all the logic is there - sff8636_dom_parse() checks if map->page_03h is set and if not, just returns gracefully.
>
> So, all we need to do is modify sff8636_memory_map_init_pages():
> @@ -1038,7 +1039,7 @@
> sff8636_request_init(&request, 0x3, SFF8636_PAGE_SIZE);
> ret = nl_get_eeprom_page(ctx, &request);
> if (ret < 0)
> - return ret;
> + return 0;
> map->page_03h = request.data - SFF8636_PAGE_SIZE;
>
> return 0;
>
> As you described, we get all the data except the DOM:
[...]
> Do you think it would make sense to print a warning in such situation, or just handle this silently?
Yes, I suggest adding some kind of warning. For example:
"Page 03h could not be retrieved"
Otherwise we would be papering over bugs like the above.
We don't need the same treatment in the CMIS parser because drivers
supporting CMIS modules support the get_module_eeprom_by_page() ethtool
operation and don't go via the fallback path.
> Finally, as I was looking at the code in fallback_set_params() I started thinking if the length check is actually correct?
>
> I think instead of:
> if (offset >= modinfo->eeprom_len)
> we may want:
> if (offset + length > modinfo->eeprom_len)
>
> I don't know if it is safe to assume we always read a single page and cross page reads are not allowed and even if so, that we should rely on this instead of checking the len explicitly? What do you think?
Yea, it seems the current check only checks that we do not start to read
after the EEPROM boundary, but not that we finish reading before it as
well.
> Once I hear from y'all I will prepare the patches.
Thanks!
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: "netlink error: Invalid argument" with ethtool-5.13+ on recent kernels due to "ethtool: Add netlink handler for getmodule (-m)" - 25b64c66f58d3df0ad7272dda91c3ab06fe7a303, also no SFP-DOM support via netlink?
2024-05-23 5:29 ` Krzysztof Olędzki
2024-05-23 10:37 ` Michal Kubecek
2024-05-23 10:48 ` Ido Schimmel
@ 2024-05-23 15:35 ` Andrew Lunn
2024-07-08 3:41 ` [PATCH] net/mlx4: Add support for EEPROM high pages query for QSFP/QSFP+/QSFP28 Krzysztof Olędzki
2024-07-08 3:41 ` [PATCH] qsfp: Better handling of Page 03h netlink read failure Krzysztof Olędzki
2 siblings, 2 replies; 21+ messages in thread
From: Andrew Lunn @ 2024-05-23 15:35 UTC (permalink / raw)
To: Krzysztof Olędzki
Cc: Ido Schimmel, Michal Kubecek, Moshe Shemesh,
netdev@vger.kernel.org, tariqt
> > Before you do that, please could you work on ethtool. I would say it
> > has a bug. It has been provided with 256 bytes of SPF data. It should
> > be able to decode that and print it in human readable format. So the
> > EINVAL should not be considered fatal to decoding.
>
> Yes, I was also thinking this way. Luckily, similar to the situation with the mlx4 driver, all the logic is there - sff8636_dom_parse() checks if map->page_03h is set and if not, just returns gracefully.
>
> So, all we need to do is modify sff8636_memory_map_init_pages():
> @@ -1038,7 +1039,7 @@
> sff8636_request_init(&request, 0x3, SFF8636_PAGE_SIZE);
> ret = nl_get_eeprom_page(ctx, &request);
> if (ret < 0)
> - return ret;
> + return 0;
> map->page_03h = request.data - SFF8636_PAGE_SIZE;
Nice. Please submit a patch.
I see there is a discussion about if there should be a warning here or
not. The problem we have is that some NICs offload getting data from
the SFP to firmware, and the firmware does not support more than
getting the first page. The IOCTL API returned whatever the was
available, and ethtool would decode what it was given. With the change
to the newer API, ethtool can ask for any page. Those devices using
limited firmware cannot fulfil the request, and are going to return an
error, either -EOPNOTSUPP, or maybe -EINVAL. We don't want ethtool to
regress, so being silent is probably the correct thing to do.
It does however hide bugs in drivers, but we can maybe find them via
testing. It would be nice to have CI/CD test which runs ethtool using
both the IOCTL interface and the netlink and compares the output. The
IOCTL output should be a subset of the netlink output. In this case,
it clearly is not a subset.
> Finally, as I was looking at the code in fallback_set_params() I started thinking if the length check is actually correct?
>
> I think instead of:
> if (offset >= modinfo->eeprom_len)
> we may want:
> if (offset + length > modinfo->eeprom_len)
>
> I don't know if it is safe to assume we always read a single page
> and cross page reads are not allowed and even if so, that we should
> rely on this instead of checking the len explicitly? What do you
> think?
Cross page reads are not allowed by the netlink API. SFPs do funny
things when you do a cross page read. I forget the details, but it is
something like it wraps around within the same 1/2 page. However, SFPs
vendors like to ignore the standard, don't feel they need to conform
to it, and i would not be surprised if some do continue the read in
the next page because that is simpler, or they could not be bothered
to read the standard in detail. By defining the API to not allow cross
page reads, or even cross 1/2 page reads, we avoid this mess. This
validation should happen at a higher level, where the netlink
parameters are validated for the call.
You can probably test this:
ethtool -m|--dump-module-eeprom|--module-info devname [raw on|off]
[hex on|off] [offset N] [length N]
You probably need hex on, and then can then set offset and length and
see if the validation works.
Andrew
^ permalink raw reply [flat|nested] 21+ messages in thread
* [PATCH] net/mlx4: Add support for EEPROM high pages query for QSFP/QSFP+/QSFP28
2024-05-23 15:35 ` Andrew Lunn
@ 2024-07-08 3:41 ` Krzysztof Olędzki
2024-07-08 16:28 ` Ido Schimmel
2024-07-09 11:17 ` Dan Merillat
2024-07-08 3:41 ` [PATCH] qsfp: Better handling of Page 03h netlink read failure Krzysztof Olędzki
1 sibling, 2 replies; 21+ messages in thread
From: Krzysztof Olędzki @ 2024-07-08 3:41 UTC (permalink / raw)
To: Ido Schimmel, Andrew Lunn, Michal Kubecek
Cc: Moshe Shemesh, netdev@vger.kernel.org, tariqt, Dan Merillat
Enable reading additional EEPROM information from high pages such as
thresholds and alarms on QSFP/QSFP+/QSFP28 modules.
The fix is similar to a708fb7b1f8dcc7a8ed949839958cd5d812dd939 but given
all the required logic already exists in mlx4_qsfp_eeprom_params_set()
only s/_LEN/MAX_LEN/ is needed.
---
drivers/net/ethernet/mellanox/mlx4/en_ethtool.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c b/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c
index 619e1c3ef7f9..aca968b4dc15 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c
@@ -2055,20 +2055,20 @@ static int mlx4_en_get_module_info(struct net_device *dev,
switch (data[0] /* identifier */) {
case MLX4_MODULE_ID_QSFP:
modinfo->type = ETH_MODULE_SFF_8436;
- modinfo->eeprom_len = ETH_MODULE_SFF_8436_LEN;
+ modinfo->eeprom_len = ETH_MODULE_SFF_8436_MAX_LEN;
break;
case MLX4_MODULE_ID_QSFP_PLUS:
if (data[1] >= 0x3) { /* revision id */
modinfo->type = ETH_MODULE_SFF_8636;
- modinfo->eeprom_len = ETH_MODULE_SFF_8636_LEN;
+ modinfo->eeprom_len = ETH_MODULE_SFF_8636_MAX_LEN;
} else {
modinfo->type = ETH_MODULE_SFF_8436;
- modinfo->eeprom_len = ETH_MODULE_SFF_8436_LEN;
+ modinfo->eeprom_len = ETH_MODULE_SFF_8436_MAX_LEN;
}
break;
case MLX4_MODULE_ID_QSFP28:
modinfo->type = ETH_MODULE_SFF_8636;
- modinfo->eeprom_len = ETH_MODULE_SFF_8636_LEN;
+ modinfo->eeprom_len = ETH_MODULE_SFF_8636_MAX_LEN;
break;
case MLX4_MODULE_ID_SFP:
modinfo->type = ETH_MODULE_SFF_8472;
--
2.45.2
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [PATCH] qsfp: Better handling of Page 03h netlink read failure
2024-05-23 15:35 ` Andrew Lunn
2024-07-08 3:41 ` [PATCH] net/mlx4: Add support for EEPROM high pages query for QSFP/QSFP+/QSFP28 Krzysztof Olędzki
@ 2024-07-08 3:41 ` Krzysztof Olędzki
2024-07-08 16:12 ` Ido Schimmel
1 sibling, 1 reply; 21+ messages in thread
From: Krzysztof Olędzki @ 2024-07-08 3:41 UTC (permalink / raw)
To: Ido Schimmel, Andrew Lunn, Michal Kubecek
Cc: Moshe Shemesh, netdev@vger.kernel.org, tariqt, Dan Merillat
Page 03h may not be available due to a bug in a driver.
This is a non-fatal error and sff8636_dom_parse() handles this
correctly, allowing it to provide the remaining information.
Also, add an error message to clarify otherwise cryptic
"netlink error: Invalid argument" message.
---
qsfp.c | 12 ++++++++++--
1 file changed, 10 insertions(+), 2 deletions(-)
diff --git a/qsfp.c b/qsfp.c
index a2921fb..208eddc 100644
--- a/qsfp.c
+++ b/qsfp.c
@@ -1038,8 +1038,16 @@ sff8636_memory_map_init_pages(struct cmd_context *ctx,
sff8636_request_init(&request, 0x3, SFF8636_PAGE_SIZE);
ret = nl_get_eeprom_page(ctx, &request);
- if (ret < 0)
- return ret;
+ if (ret < 0) {
+ /* Page 03h is not available due to a bug in the driver.
+ * This is a non-fatal error and sff8636_dom_parse()
+ * handles this correctly.
+ */
+
+ fprintf(stderr, "Failed to read Upper Page 03h, driver error?\n");
+ return 0;
+ }
+
map->page_03h = request.data - SFF8636_PAGE_SIZE;
return 0;
--
2.45.2
^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: [PATCH] qsfp: Better handling of Page 03h netlink read failure
2024-07-08 3:41 ` [PATCH] qsfp: Better handling of Page 03h netlink read failure Krzysztof Olędzki
@ 2024-07-08 16:12 ` Ido Schimmel
2024-07-31 0:55 ` Krzysztof Olędzki
0 siblings, 1 reply; 21+ messages in thread
From: Ido Schimmel @ 2024-07-08 16:12 UTC (permalink / raw)
To: Krzysztof Olędzki
Cc: Andrew Lunn, Michal Kubecek, Moshe Shemesh,
netdev@vger.kernel.org, tariqt, Dan Merillat
Hi, thanks for the patch. Some comments below (mainly about the
process).
Subject prefix should be "[PATCH ethtool]" instead of "[PATCH]"
Best to post as a separate thread instead of as a reply to a thread.
Looks weird in lore (https://lore.kernel.org/netdev/) otherwise.
On Sun, Jul 07, 2024 at 08:41:38PM -0700, Krzysztof Olędzki wrote:
> Page 03h may not be available due to a bug in a driver.
> This is a non-fatal error and sff8636_dom_parse() handles this
> correctly, allowing it to provide the remaining information.
In addition to the problem, need to describe what the patch is doing in
imperative mood. Something like:
"
When dumping the EEPROM contents of a QSFP transceiver module, ethtool
will only ask the kernel to retrieve Upper Page 03h if the module
advertised it as supported.
However, some kernel drivers like mlx4 are currently unable to provide
the page, resulting in the kernel returning an error. Since Upper Page
03h is optional, do not treat the error as fatal. Instead, print an
error message and allow ethtool to continue and parse / print the
contents of the other pages.
Before:
# ethtool -m eth3
<OUTPUT BEFORE>
After:
# ethtool -m eth3
<OUTPUT AFTER>
Fixes: 25b64c66f58d ("ethtool: Add netlink handler for getmodule (-m)")
Signed-off-by: Krzysztof Olędzki <ole@ans.pl>
"
>
> Also, add an error message to clarify otherwise cryptic
> "netlink error: Invalid argument" message.
> ---
> qsfp.c | 12 ++++++++++--
> 1 file changed, 10 insertions(+), 2 deletions(-)
>
> diff --git a/qsfp.c b/qsfp.c
> index a2921fb..208eddc 100644
> --- a/qsfp.c
> +++ b/qsfp.c
> @@ -1038,8 +1038,16 @@ sff8636_memory_map_init_pages(struct cmd_context *ctx,
>
> sff8636_request_init(&request, 0x3, SFF8636_PAGE_SIZE);
> ret = nl_get_eeprom_page(ctx, &request);
> - if (ret < 0)
> - return ret;
> + if (ret < 0) {
> + /* Page 03h is not available due to a bug in the driver.
> + * This is a non-fatal error and sff8636_dom_parse()
> + * handles this correctly.
> + */
> +
Unnecessary blank line
> + fprintf(stderr, "Failed to read Upper Page 03h, driver error?\n");
> + return 0;
> + }
> +
> map->page_03h = request.data - SFF8636_PAGE_SIZE;
>
> return 0;
> --
> 2.45.2
>
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] net/mlx4: Add support for EEPROM high pages query for QSFP/QSFP+/QSFP28
2024-07-08 3:41 ` [PATCH] net/mlx4: Add support for EEPROM high pages query for QSFP/QSFP+/QSFP28 Krzysztof Olędzki
@ 2024-07-08 16:28 ` Ido Schimmel
2024-07-09 11:17 ` Dan Merillat
1 sibling, 0 replies; 21+ messages in thread
From: Ido Schimmel @ 2024-07-08 16:28 UTC (permalink / raw)
To: Krzysztof Olędzki
Cc: Andrew Lunn, Michal Kubecek, Moshe Shemesh,
netdev@vger.kernel.org, tariqt, Dan Merillat
Hi, similar comments as on the other patch. Subject prefix should be
"[PATCH net-next]" instead of "[PATCH]". Need to copy maintainers
according to the output of "scripts/get_maintainer.pl". Missing SoB
(scripts/checkpatch.pl should flag it). I suggest reading the following
before submitting v2:
https://www.kernel.org/doc/html/latest/process/submitting-patches.html
https://docs.kernel.org/process/maintainer-netdev.html
On Sun, Jul 07, 2024 at 08:41:31PM -0700, Krzysztof Olędzki wrote:
> Enable reading additional EEPROM information from high pages such as
> thresholds and alarms on QSFP/QSFP+/QSFP28 modules.
>
> The fix is similar to a708fb7b1f8dcc7a8ed949839958cd5d812dd939 but given
Need to include the subject of the commit:
"This is similar to commit a708fb7b1f8d ("net/mlx5e: ethtool, Add
support for EEPROM high pages query") [...]"
> all the required logic already exists in mlx4_qsfp_eeprom_params_set()
> only s/_LEN/MAX_LEN/ is needed.
> ---
> drivers/net/ethernet/mellanox/mlx4/en_ethtool.c | 8 ++++----
> 1 file changed, 4 insertions(+), 4 deletions(-)
Code LGTM
Thanks
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] net/mlx4: Add support for EEPROM high pages query for QSFP/QSFP+/QSFP28
2024-07-08 3:41 ` [PATCH] net/mlx4: Add support for EEPROM high pages query for QSFP/QSFP+/QSFP28 Krzysztof Olędzki
2024-07-08 16:28 ` Ido Schimmel
@ 2024-07-09 11:17 ` Dan Merillat
1 sibling, 0 replies; 21+ messages in thread
From: Dan Merillat @ 2024-07-09 11:17 UTC (permalink / raw)
To: Krzysztof Olędzki, Ido Schimmel, Andrew Lunn, Michal Kubecek
Cc: Moshe Shemesh, netdev@vger.kernel.org, tariqt, Dan Merillat
On 7/7/24 11:41 PM, Krzysztof Olędzki wrote:
> Enable reading additional EEPROM information from high pages such as
> thresholds and alarms on QSFP/QSFP+/QSFP28 modules.
>
> The fix is similar to a708fb7b1f8dcc7a8ed949839958cd5d812dd939 but given
> all the required logic already exists in mlx4_qsfp_eeprom_params_set()
> only s/_LEN/MAX_LEN/ is needed.
> ---
> drivers/net/ethernet/mellanox/mlx4/en_ethtool.c | 8 ++++----
> 1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c b/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c
> index 619e1c3ef7f9..aca968b4dc15 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c
> @@ -2055,20 +2055,20 @@ static int mlx4_en_get_module_info(struct net_device *dev,
> switch (data[0] /* identifier */) {
> case MLX4_MODULE_ID_QSFP:
> modinfo->type = ETH_MODULE_SFF_8436;
> - modinfo->eeprom_len = ETH_MODULE_SFF_8436_LEN;
> + modinfo->eeprom_len = ETH_MODULE_SFF_8436_MAX_LEN;
> break;
> case MLX4_MODULE_ID_QSFP_PLUS:
> if (data[1] >= 0x3) { /* revision id */
> modinfo->type = ETH_MODULE_SFF_8636;
> - modinfo->eeprom_len = ETH_MODULE_SFF_8636_LEN;
> + modinfo->eeprom_len = ETH_MODULE_SFF_8636_MAX_LEN;
> } else {
> modinfo->type = ETH_MODULE_SFF_8436;
> - modinfo->eeprom_len = ETH_MODULE_SFF_8436_LEN;
> + modinfo->eeprom_len = ETH_MODULE_SFF_8436_MAX_LEN;
> }
> break;
> case MLX4_MODULE_ID_QSFP28:
> modinfo->type = ETH_MODULE_SFF_8636;
> - modinfo->eeprom_len = ETH_MODULE_SFF_8636_LEN;
> + modinfo->eeprom_len = ETH_MODULE_SFF_8636_MAX_LEN;
> break;
> case MLX4_MODULE_ID_SFP:
> modinfo->type = ETH_MODULE_SFF_8472;
Patched this into kernel 6.9.8 and I now get the alarm/warning data from this module as expected when reverting to stock ethtool 6.9. I tested on a ConnectX3-Pro.
Thanks for the fast turnaround!
Tested-By: Dan Merillat <git@dan.merillat.org>
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] qsfp: Better handling of Page 03h netlink read failure
2024-07-08 16:12 ` Ido Schimmel
@ 2024-07-31 0:55 ` Krzysztof Olędzki
2024-07-31 8:48 ` Ido Schimmel
0 siblings, 1 reply; 21+ messages in thread
From: Krzysztof Olędzki @ 2024-07-31 0:55 UTC (permalink / raw)
To: Ido Schimmel
Cc: Andrew Lunn, Michal Kubecek, Moshe Shemesh,
netdev@vger.kernel.org, tariqt, Dan Merillat
On 08.07.2024 at 09:12, Ido Schimmel wrote:
> Hi, thanks for the patch. Some comments below (mainly about the
> process).
Thank you so much Ido for your careful review, detailed feedback and
patience. I just re-sent both patches.
BTW: get_maintainer.pl for drivers/net/ethernet/mellanox/mlx4/en_ethtool.c
also returned "linux-kernel@vger.kernel.org", should I be also including it
in the future?
Thanks,
Krzysztof
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] qsfp: Better handling of Page 03h netlink read failure
2024-07-31 0:55 ` Krzysztof Olędzki
@ 2024-07-31 8:48 ` Ido Schimmel
0 siblings, 0 replies; 21+ messages in thread
From: Ido Schimmel @ 2024-07-31 8:48 UTC (permalink / raw)
To: Krzysztof Olędzki
Cc: Andrew Lunn, Michal Kubecek, Moshe Shemesh,
netdev@vger.kernel.org, tariqt, Dan Merillat
On Tue, Jul 30, 2024 at 05:55:00PM -0700, Krzysztof Olędzki wrote:
> BTW: get_maintainer.pl for drivers/net/ethernet/mellanox/mlx4/en_ethtool.c
> also returned "linux-kernel@vger.kernel.org", should I be also including it
> in the future?
I never copied this list and nobody ever complained, so omitting it
should be fine
^ permalink raw reply [flat|nested] 21+ messages in thread
end of thread, other threads:[~2024-07-31 8:48 UTC | newest]
Thread overview: 21+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-05-21 6:26 "netlink error: Invalid argument" with ethtool-5.13+ on recent kernels due to "ethtool: Add netlink handler for getmodule (-m)" - 25b64c66f58d3df0ad7272dda91c3ab06fe7a303, also no SFP-DOM support via netlink? Krzysztof Olędzki
2024-05-21 6:55 ` Michal Kubecek
2024-05-21 7:02 ` Krzysztof Olędzki
2024-05-21 7:16 ` Krzysztof Olędzki
2024-05-21 7:34 ` Michal Kubecek
2024-05-21 7:38 ` Krzysztof Olędzki
2024-05-21 20:21 ` Andrew Lunn
2024-05-22 4:54 ` Krzysztof Olędzki
2024-05-22 8:40 ` Ido Schimmel
2024-05-22 12:44 ` Andrew Lunn
2024-05-23 5:29 ` Krzysztof Olędzki
2024-05-23 10:37 ` Michal Kubecek
2024-05-23 10:48 ` Ido Schimmel
2024-05-23 15:35 ` Andrew Lunn
2024-07-08 3:41 ` [PATCH] net/mlx4: Add support for EEPROM high pages query for QSFP/QSFP+/QSFP28 Krzysztof Olędzki
2024-07-08 16:28 ` Ido Schimmel
2024-07-09 11:17 ` Dan Merillat
2024-07-08 3:41 ` [PATCH] qsfp: Better handling of Page 03h netlink read failure Krzysztof Olędzki
2024-07-08 16:12 ` Ido Schimmel
2024-07-31 0:55 ` Krzysztof Olędzki
2024-07-31 8:48 ` Ido Schimmel
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).