netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* ethtool fails to read some QSFP+ modules.
@ 2024-06-30 17:27 Dan Merillat
  2024-07-01  7:28 ` Ido Schimmel
  0 siblings, 1 reply; 8+ messages in thread
From: Dan Merillat @ 2024-06-30 17:27 UTC (permalink / raw)
  To: Michal Kubecek, netdev

[-- Attachment #1: Type: text/plain, Size: 268 bytes --]


I was testing an older Kaiam XQX2502 40G-LR4 and ethtool -m failed with netlink error.  It's treating a failure to read
the optional page3 data as a hard failure.

This patch allows ethtool to read qsfp modules that don't implement the voltage/temperature alarm data.

[-- Attachment #2: 0001-Some-qsfp-modules-do-not-support-page-3.patch --]
[-- Type: text/x-patch, Size: 1007 bytes --]

From 3144fbfc08fbfb90ecda4848fc9356bde8933d4a Mon Sep 17 00:00:00 2001
From: Dan Merillat <git@dan.eginity.com>
Date: Sun, 30 Jun 2024 13:11:51 -0400
Subject: [PATCH] Some qsfp modules do not support page 3

Tested on an older Kaiam XQX2502 40G-LR4 module.
ethtool -m aborts with netlink error due to page 3
not existing on the module. Ignore the error and
leave map->page_03h NULL.
---
 qsfp.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/qsfp.c b/qsfp.c
index a2921fb..0a16b42 100644
--- a/qsfp.c
+++ b/qsfp.c
@@ -1037,9 +1037,15 @@ sff8636_memory_map_init_pages(struct cmd_context *ctx,
 		return 0;
 
 	sff8636_request_init(&request, 0x3, SFF8636_PAGE_SIZE);
+
+	/* Some modules are paged but do not have page 03h.  This
+	 * is a non-fatal error, and sff8636_dom_parse() handles this
+	 * correctly.
+	 */
 	ret = nl_get_eeprom_page(ctx, &request);
 	if (ret < 0)
-		return ret;
+		return 0;
+
 	map->page_03h = request.data - SFF8636_PAGE_SIZE;
 
 	return 0;
-- 
2.45.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: ethtool fails to read some QSFP+ modules.
  2024-06-30 17:27 ethtool fails to read some QSFP+ modules Dan Merillat
@ 2024-07-01  7:28 ` Ido Schimmel
  2024-07-01  7:41   ` Ido Schimmel
  0 siblings, 1 reply; 8+ messages in thread
From: Ido Schimmel @ 2024-07-01  7:28 UTC (permalink / raw)
  To: Dan Merillat; +Cc: Michal Kubecek, netdev

On Sun, Jun 30, 2024 at 01:27:07PM -0400, Dan Merillat wrote:
> 
> I was testing an older Kaiam XQX2502 40G-LR4 and ethtool -m failed with netlink error.  It's treating a failure to read
> the optional page3 data as a hard failure.
> 
> This patch allows ethtool to read qsfp modules that don't implement the voltage/temperature alarm data.

Thanks for the report and the patch. Krzysztof Olędzki reported the same
issue earlier this year:
https://lore.kernel.org/netdev/9e757616-0396-4573-9ea9-3cb5ef5c901a@ans.pl/

Krzysztof, are you going to submit the ethtool and mlx4 patches?

> From 3144fbfc08fbfb90ecda4848fc9356bde8933d4a Mon Sep 17 00:00:00 2001
> From: Dan Merillat <git@dan.eginity.com>
> Date: Sun, 30 Jun 2024 13:11:51 -0400
> Subject: [PATCH] Some qsfp modules do not support page 3
> 
> Tested on an older Kaiam XQX2502 40G-LR4 module.
> ethtool -m aborts with netlink error due to page 3
> not existing on the module. Ignore the error and
> leave map->page_03h NULL.

User space only tries to read this page because the module advertised it
as supported. It is more likely that the NIC driver does not return all
the pages. Which driver is it?

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: ethtool fails to read some QSFP+ modules.
  2024-07-01  7:28 ` Ido Schimmel
@ 2024-07-01  7:41   ` Ido Schimmel
  2024-07-03 14:36     ` Krzysztof Olędzki
  0 siblings, 1 reply; 8+ messages in thread
From: Ido Schimmel @ 2024-07-01  7:41 UTC (permalink / raw)
  To: Dan Merillat, ole; +Cc: Michal Kubecek, netdev

Forgot to add Krzysztof :p

On Mon, Jul 01, 2024 at 10:28:39AM +0300, Ido Schimmel wrote:
> On Sun, Jun 30, 2024 at 01:27:07PM -0400, Dan Merillat wrote:
> > 
> > I was testing an older Kaiam XQX2502 40G-LR4 and ethtool -m failed with netlink error.  It's treating a failure to read
> > the optional page3 data as a hard failure.
> > 
> > This patch allows ethtool to read qsfp modules that don't implement the voltage/temperature alarm data.
> 
> Thanks for the report and the patch. Krzysztof Olędzki reported the same
> issue earlier this year:
> https://lore.kernel.org/netdev/9e757616-0396-4573-9ea9-3cb5ef5c901a@ans.pl/
> 
> Krzysztof, are you going to submit the ethtool and mlx4 patches?
> 
> > From 3144fbfc08fbfb90ecda4848fc9356bde8933d4a Mon Sep 17 00:00:00 2001
> > From: Dan Merillat <git@dan.eginity.com>
> > Date: Sun, 30 Jun 2024 13:11:51 -0400
> > Subject: [PATCH] Some qsfp modules do not support page 3
> > 
> > Tested on an older Kaiam XQX2502 40G-LR4 module.
> > ethtool -m aborts with netlink error due to page 3
> > not existing on the module. Ignore the error and
> > leave map->page_03h NULL.
> 
> User space only tries to read this page because the module advertised it
> as supported. It is more likely that the NIC driver does not return all
> the pages. Which driver is it?

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: ethtool fails to read some QSFP+ modules.
@ 2024-07-02 19:30 Dan Merillat
  0 siblings, 0 replies; 8+ messages in thread
From: Dan Merillat @ 2024-07-02 19:30 UTC (permalink / raw)
  To: netdev; +Cc: Dan Merillat, Ido Schimmel

Sorry, I'm not subscribed to netdev and didn't get a CC so I'm trying to follow via the web archives
and I can't see who was CC'd.

On Mon, Jul 01, 2024 at 10:28:39AM +0300, Ido Schimmel wrote:
> On Sun, Jun 30, 2024 at 01:27:07PM -0400, Dan Merillat wrote:
> > 
> > I was testing an older Kaiam XQX2502 40G-LR4 and ethtool -m failed with netlink error.  It's treating a failure to read
> > the optional page3 data as a hard failure.
> > 
> > This patch allows ethtool to read qsfp modules that don't implement the voltage/temperature alarm data.
> 
> Thanks for the report and the patch. Krzysztof Olędzki reported the same
> issue earlier this year:
> https://lore.kernel.org/netdev/9e757616-0396-4573-9ea9-3cb5ef5c901a@ans.pl/
> 
> Krzysztof, are you going to submit the ethtool and mlx4 patches?
> 
> > From 3144fbfc08fbfb90ecda4848fc9356bde8933d4a Mon Sep 17 00:00:00 2001
> > From: Dan Merillat <git@dan.eginity.com>
> > Date: Sun, 30 Jun 2024 13:11:51 -0400
> > Subject: [PATCH] Some qsfp modules do not support page 3
> > 
> > Tested on an older Kaiam XQX2502 40G-LR4 module.
> > ethtool -m aborts with netlink error due to page 3
> > not existing on the module. Ignore the error and
> > leave map->page_03h NULL.
> 
> User space only tries to read this page because the module advertised it
> as supported. It is more likely that the NIC driver does not return all
> the pages. Which driver is it?

Same as Kyrzysztof, a connectx3 board.  I'm using mlx4_core/en in the stock 6.9.3 kernel.

Raw dump:
# ethtool -m enp65s0d1 raw on | hexdump -C 
00000000  0d 05 00 0f 00 00 00 00  00 44 44 00 00 44 44 00  |.........DD..DD.|
00000010  00 00 00 00 00 00 30 a4  00 00 82 3b 00 00 00 00  |......0....;....|
00000020  00 00 00 10 00 11 00 31  00 00 36 ff 36 ff 36 ff  |.......1..6.6.6.|
00000030  36 83 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |6...............|
00000040  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000050  00 00 00 00 00 00 00 00  00 00 00 00 00 03 00 00  |................|
00000060  00 00 00 00 00 00 00 00  00 00 1f 00 00 00 00 00  |................|
00000070  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000080  0d 80 07 02 00 00 00 00  00 00 00 00 67 00 02 00  |............g...|
00000090  00 00 00 40 4b 41 49 41  4d 20 43 4f 52 50 20 20  |...@KAIAM CORP  |
000000a0  20 20 20 20 00 14 ed e4  58 51 58 32 35 30 32 20  |    ....XQX2502 |
000000b0  20 20 20 20 20 20 20 20  31 41 66 58 05 14 46 14  |        1AfX..F.|
000000c0  00 00 00 92 4b 44 37 30  32 30 31 31 35 39 20 20  |....KD70201159  |
000000d0  20 20 20 20 31 37 30 32  30 31 30 30 08 00 00 0d  |    17020100....|
000000e0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000100


The ioctl path handles it properly when ethtool is built with --disable-netlink, since it checks the
total buffer size returned by the kernel and does not set map->page_03h if the size does not match.

Status byte 2, bit 2 is 'Upper memory flat or paged.   0x1b: Flat memory.  0x0b: Paging (at least upper
page 0x03h supported)'

SFF8636_STATUS_FLAT_MEMORY would be clearer, as page_00h[STATUS_2] & PAGE_3_PRESENT is really a
negative test.

I confirmed that status 2 is 0x00, which would indicate the module supports paged memory.

So either this cheap module is non-compliant or there's a problem in the driver/firmware for
connectx3/3pro cards.  Either way, it would be better to warn about the non-compliance and still
display  the optical information gathered rather than a fatal error that shows nothing.  If it can
also be fixed in the mlx4_core driver to properly read page 3, that would be better, but that
will require someone with more knowledge of mellanox/nVidia internals than I have.

If anyone wants to test with a different adapter, these xqx2502 modules are dirt cheap on ebay, only $4 when 
I bought mine a few weeks ago.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: ethtool fails to read some QSFP+ modules.
  2024-07-01  7:41   ` Ido Schimmel
@ 2024-07-03 14:36     ` Krzysztof Olędzki
  2024-07-04 12:27       ` Krzysztof Olędzki
  0 siblings, 1 reply; 8+ messages in thread
From: Krzysztof Olędzki @ 2024-07-03 14:36 UTC (permalink / raw)
  To: Ido Schimmel, Dan Merillat; +Cc: Michal Kubecek, netdev

Good morning,

On 01.07.2024 at 00:41, Ido Schimmel wrote:
> Forgot to add Krzysztof :p
> 
> On Mon, Jul 01, 2024 at 10:28:39AM +0300, Ido Schimmel wrote:
>> On Sun, Jun 30, 2024 at 01:27:07PM -0400, Dan Merillat wrote:
>>>
>>> I was testing an older Kaiam XQX2502 40G-LR4 and ethtool -m failed with netlink error.  It's treating a failure to read
>>> the optional page3 data as a hard failure.
>>>
>>> This patch allows ethtool to read qsfp modules that don't implement the voltage/temperature alarm data.
>>
>> Thanks for the report and the patch. Krzysztof Olędzki reported the same
>> issue earlier this year:
>> https://lore.kernel.org/netdev/9e757616-0396-4573-9ea9-3cb5ef5c901a@ans.pl/
>>
>> Krzysztof, are you going to submit the ethtool and mlx4 patches?

Yes, and I apologize for the delay - I have been traveling with my family and was unable to get into it.

I should be able to work on the patches later this week, so please expect something from me around the weekend.

>>> From 3144fbfc08fbfb90ecda4848fc9356bde8933d4a Mon Sep 17 00:00:00 2001
>>> From: Dan Merillat <git@dan.eginity.com>
>>> Date: Sun, 30 Jun 2024 13:11:51 -0400
>>> Subject: [PATCH] Some qsfp modules do not support page 3
>>>
>>> Tested on an older Kaiam XQX2502 40G-LR4 module.
>>> ethtool -m aborts with netlink error due to page 3
>>> not existing on the module. Ignore the error and
>>> leave map->page_03h NULL.
>>
>> User space only tries to read this page because the module advertised it
>> as supported. It is more likely that the NIC driver does not return all
>> the pages. Which driver is it?
> 

Thanks,
 Krzysztof

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: ethtool fails to read some QSFP+ modules.
  2024-07-03 14:36     ` Krzysztof Olędzki
@ 2024-07-04 12:27       ` Krzysztof Olędzki
  2024-07-07  8:35         ` Ido Schimmel
  0 siblings, 1 reply; 8+ messages in thread
From: Krzysztof Olędzki @ 2024-07-04 12:27 UTC (permalink / raw)
  To: Ido Schimmel, Dan Merillat; +Cc: Michal Kubecek, netdev

On 03.07.2024 at 07:36, Krzysztof Olędzki wrote:
> Good morning,
> 
> On 01.07.2024 at 00:41, Ido Schimmel wrote:
>> Forgot to add Krzysztof :p
>>
>> On Mon, Jul 01, 2024 at 10:28:39AM +0300, Ido Schimmel wrote:
>>> On Sun, Jun 30, 2024 at 01:27:07PM -0400, Dan Merillat wrote:
>>>>
>>>> I was testing an older Kaiam XQX2502 40G-LR4 and ethtool -m failed with netlink error.  It's treating a failure to read
>>>> the optional page3 data as a hard failure.
>>>>
>>>> This patch allows ethtool to read qsfp modules that don't implement the voltage/temperature alarm data.
>>>
>>> Thanks for the report and the patch. Krzysztof Olędzki reported the same
>>> issue earlier this year:
>>> https://lore.kernel.org/netdev/9e757616-0396-4573-9ea9-3cb5ef5c901a@ans.pl/
>>>
>>> Krzysztof, are you going to submit the ethtool and mlx4 patches?
> 
> Yes, and I apologize for the delay - I have been traveling with my family and was unable to get into it.
> 
> I should be able to work on the patches later this week, so please expect something from me around the weekend.
> 
>>>> From 3144fbfc08fbfb90ecda4848fc9356bde8933d4a Mon Sep 17 00:00:00 2001
>>>> From: Dan Merillat <git@dan.eginity.com>
>>>> Date: Sun, 30 Jun 2024 13:11:51 -0400
>>>> Subject: [PATCH] Some qsfp modules do not support page 3
>>>>
>>>> Tested on an older Kaiam XQX2502 40G-LR4 module.
>>>> ethtool -m aborts with netlink error due to page 3
>>>> not existing on the module. Ignore the error and
>>>> leave map->page_03h NULL.

BTW - the code change does what we have discussed, but I think the comment
may be incorrect? Before we call nl_get_eeprom_page, there is a check to
verify that Page 03h is present:

        /* Page 03h is only present when the module memory model is paged and
         * not flat.
         */
        if (map->lower_memory[SFF8636_STATUS_2_OFFSET] &
            SFF8636_STATUS_PAGE_3_PRESENT)
                return 0;

In this case, it seems to me that the failure can only be caused by either
HW issues (NIC or SFP) *or* a bug in the driver. Assuming we want to provide
some details in the code, maybe something like this may be better?

+	/* Page 03h is not available due to either HW issue or a bug
+	 * in the driver. This is a non-fatal error and sff8636_dom_parse()
+	 * handles this correctly.
+	 */

We were also discussing if printing a warning in such situation may make sense.
As I was thinking about this more, I wonder if we can just use the same check
in sff8636_show_dom() and if map->page_03h is NULL print for
 "Alarm/warning flags implemented"
something like:
 "Failed (Page 03h access error, HW issue or kernel driver bug?)"

We would get it in addition to "netlink error: Invalid argument" that comes from:
./netlink/nlsock.c:             perror("netlink error");


>>> User space only tries to read this page because the module advertised it
>>> as supported. It is more likely that the NIC driver does not return all
>>> the pages. Which driver is it?
>>

Krzysztof


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: ethtool fails to read some QSFP+ modules.
  2024-07-04 12:27       ` Krzysztof Olędzki
@ 2024-07-07  8:35         ` Ido Schimmel
  2024-07-08  3:42           ` Krzysztof Olędzki
  0 siblings, 1 reply; 8+ messages in thread
From: Ido Schimmel @ 2024-07-07  8:35 UTC (permalink / raw)
  To: Krzysztof Olędzki; +Cc: Dan Merillat, Michal Kubecek, netdev

On Thu, Jul 04, 2024 at 05:27:49AM -0700, Krzysztof Olędzki wrote:
> On 03.07.2024 at 07:36, Krzysztof Olędzki wrote:
> > Good morning,
> > 
> > On 01.07.2024 at 00:41, Ido Schimmel wrote:
> >> Forgot to add Krzysztof :p
> >>
> >> On Mon, Jul 01, 2024 at 10:28:39AM +0300, Ido Schimmel wrote:
> >>> On Sun, Jun 30, 2024 at 01:27:07PM -0400, Dan Merillat wrote:
> >>>>
> >>>> I was testing an older Kaiam XQX2502 40G-LR4 and ethtool -m failed with netlink error.  It's treating a failure to read
> >>>> the optional page3 data as a hard failure.
> >>>>
> >>>> This patch allows ethtool to read qsfp modules that don't implement the voltage/temperature alarm data.
> >>>
> >>> Thanks for the report and the patch. Krzysztof Olędzki reported the same
> >>> issue earlier this year:
> >>> https://lore.kernel.org/netdev/9e757616-0396-4573-9ea9-3cb5ef5c901a@ans.pl/
> >>>
> >>> Krzysztof, are you going to submit the ethtool and mlx4 patches?
> > 
> > Yes, and I apologize for the delay - I have been traveling with my family and was unable to get into it.
> > 
> > I should be able to work on the patches later this week, so please expect something from me around the weekend.
> > 
> >>>> From 3144fbfc08fbfb90ecda4848fc9356bde8933d4a Mon Sep 17 00:00:00 2001
> >>>> From: Dan Merillat <git@dan.eginity.com>
> >>>> Date: Sun, 30 Jun 2024 13:11:51 -0400
> >>>> Subject: [PATCH] Some qsfp modules do not support page 3
> >>>>
> >>>> Tested on an older Kaiam XQX2502 40G-LR4 module.
> >>>> ethtool -m aborts with netlink error due to page 3
> >>>> not existing on the module. Ignore the error and
> >>>> leave map->page_03h NULL.
> 
> BTW - the code change does what we have discussed, but I think the comment
> may be incorrect? Before we call nl_get_eeprom_page, there is a check to
> verify that Page 03h is present:
> 
>         /* Page 03h is only present when the module memory model is paged and
>          * not flat.
>          */
>         if (map->lower_memory[SFF8636_STATUS_2_OFFSET] &
>             SFF8636_STATUS_PAGE_3_PRESENT)
>                 return 0;
> 
> In this case, it seems to me that the failure can only be caused by either
> HW issues (NIC or SFP) *or* a bug in the driver. Assuming we want to provide
> some details in the code, maybe something like this may be better?
> 
> +	/* Page 03h is not available due to either HW issue or a bug
> +	 * in the driver. This is a non-fatal error and sff8636_dom_parse()
> +	 * handles this correctly.
> +	 */

Looks fine to me although I believe an error in this case will always be
returned because of a driver bug. Reading a page that does not exist
should not result in an error, but in the module returning Upper Page
00h. Yet to encounter a module that works in a different way. From
SFF-8636 Section 6.1:

"Writing the value of a non-supported page shall not be accepted by the
slave. The Page Select byte shall revert to 0h and read/write operations
shall be to Upper Page 00h. Because Upper Page 00h is read-only, this
scheme prevents the inadvertent corruption of module memory by a host
attempting to write to a non-supported location."

> 
> We were also discussing if printing a warning in such situation may make sense.
> As I was thinking about this more, I wonder if we can just use the same check
> in sff8636_show_dom() and if map->page_03h is NULL print for
>  "Alarm/warning flags implemented"
> something like:
>  "Failed (Page 03h access error, HW issue or kernel driver bug?)"
> 
> We would get it in addition to "netlink error: Invalid argument" that comes from:
> ./netlink/nlsock.c:             perror("netlink error");

I think it's better to print it in sff8636_memory_map_init_pages() as
that way the user can more easily understand the reason for "netlink
error: Invalid argument":

# ./ethtool -m swp13                                                                                                                                                                                                                                                                 
netlink error: Invalid argument
Failed to read Upper Page 03h
        Identifier                                : 0x11 (QSFP28)
        Extended identifier                       : 0xcf
[...]

BTW, just to be sure, you are going to post patches for both ethtool and
mlx4, right?

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: ethtool fails to read some QSFP+ modules.
  2024-07-07  8:35         ` Ido Schimmel
@ 2024-07-08  3:42           ` Krzysztof Olędzki
  0 siblings, 0 replies; 8+ messages in thread
From: Krzysztof Olędzki @ 2024-07-08  3:42 UTC (permalink / raw)
  To: Ido Schimmel; +Cc: Dan Merillat, Michal Kubecek, netdev

On 07.07.2024 at 01:35, Ido Schimmel wrote:
> On Thu, Jul 04, 2024 at 05:27:49AM -0700, Krzysztof Olędzki wrote:
>> On 03.07.2024 at 07:36, Krzysztof Olędzki wrote:
>>> Good morning,
>>>
>>> On 01.07.2024 at 00:41, Ido Schimmel wrote:
>>>> Forgot to add Krzysztof :p
>>>>
>>>> On Mon, Jul 01, 2024 at 10:28:39AM +0300, Ido Schimmel wrote:
>>>>> On Sun, Jun 30, 2024 at 01:27:07PM -0400, Dan Merillat wrote:
>>>>>>
>>>>>> I was testing an older Kaiam XQX2502 40G-LR4 and ethtool -m failed with netlink error.  It's treating a failure to read
>>>>>> the optional page3 data as a hard failure.
>>>>>>
>>>>>> This patch allows ethtool to read qsfp modules that don't implement the voltage/temperature alarm data.
>>>>>
>>>>> Thanks for the report and the patch. Krzysztof Olędzki reported the same
>>>>> issue earlier this year:
>>>>> https://lore.kernel.org/netdev/9e757616-0396-4573-9ea9-3cb5ef5c901a@ans.pl/
>>>>>
>>>>> Krzysztof, are you going to submit the ethtool and mlx4 patches?
>>>
>>> Yes, and I apologize for the delay - I have been traveling with my family and was unable to get into it.
>>>
>>> I should be able to work on the patches later this week, so please expect something from me around the weekend.
>>>
>>>>>> From 3144fbfc08fbfb90ecda4848fc9356bde8933d4a Mon Sep 17 00:00:00 2001
>>>>>> From: Dan Merillat <git@dan.eginity.com>
>>>>>> Date: Sun, 30 Jun 2024 13:11:51 -0400
>>>>>> Subject: [PATCH] Some qsfp modules do not support page 3
>>>>>>
>>>>>> Tested on an older Kaiam XQX2502 40G-LR4 module.
>>>>>> ethtool -m aborts with netlink error due to page 3
>>>>>> not existing on the module. Ignore the error and
>>>>>> leave map->page_03h NULL.
>>
>> BTW - the code change does what we have discussed, but I think the comment
>> may be incorrect? Before we call nl_get_eeprom_page, there is a check to
>> verify that Page 03h is present:
>>
>>         /* Page 03h is only present when the module memory model is paged and
>>          * not flat.
>>          */
>>         if (map->lower_memory[SFF8636_STATUS_2_OFFSET] &
>>             SFF8636_STATUS_PAGE_3_PRESENT)
>>                 return 0;
>>
>> In this case, it seems to me that the failure can only be caused by either
>> HW issues (NIC or SFP) *or* a bug in the driver. Assuming we want to provide
>> some details in the code, maybe something like this may be better?
>>
>> +	/* Page 03h is not available due to either HW issue or a bug
>> +	 * in the driver. This is a non-fatal error and sff8636_dom_parse()
>> +	 * handles this correctly.
>> +	 */
> 
> Looks fine to me although I believe an error in this case will always be
> returned because of a driver bug. Reading a page that does not exist
> should not result in an error, but in the module returning Upper Page
> 00h. Yet to encounter a module that works in a different way. From
> SFF-8636 Section 6.1:
> 
> "Writing the value of a non-supported page shall not be accepted by the
> slave. The Page Select byte shall revert to 0h and read/write operations
> shall be to Upper Page 00h. Because Upper Page 00h is read-only, this
> scheme prevents the inadvertent corruption of module memory by a host
> attempting to write to a non-supported location."

Fair, I updated the comment based on your feedback.

>>
>> We were also discussing if printing a warning in such situation may make sense.
>> As I was thinking about this more, I wonder if we can just use the same check
>> in sff8636_show_dom() and if map->page_03h is NULL print for
>>  "Alarm/warning flags implemented"
>> something like:
>>  "Failed (Page 03h access error, HW issue or kernel driver bug?)"
>>
>> We would get it in addition to "netlink error: Invalid argument" that comes from:
>> ./netlink/nlsock.c:             perror("netlink error");
> 
> I think it's better to print it in sff8636_memory_map_init_pages() as
> that way the user can more easily understand the reason for "netlink
> error: Invalid argument":
> 
> # ./ethtool -m swp13                                                                                                                                                                                                                                                                 
> netlink error: Invalid argument
> Failed to read Upper Page 03h
>         Identifier                                : 0x11 (QSFP28)
>         Extended identifier                       : 0xcf
> [...]

Great idea, done.


> BTW, just to be sure, you are going to post patches for both ethtool and
> mlx4, right?

Yes, I just sent both.
 
Krzysztof

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2024-07-08  3:43 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-06-30 17:27 ethtool fails to read some QSFP+ modules Dan Merillat
2024-07-01  7:28 ` Ido Schimmel
2024-07-01  7:41   ` Ido Schimmel
2024-07-03 14:36     ` Krzysztof Olędzki
2024-07-04 12:27       ` Krzysztof Olędzki
2024-07-07  8:35         ` Ido Schimmel
2024-07-08  3:42           ` Krzysztof Olędzki
  -- strict thread matches above, loose matches on Subject: below --
2024-07-02 19:30 Dan Merillat

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).