public inbox for netdev@vger.kernel.org
 help / color / mirror / Atom feed
* ethtool 5.16 release / ethtool -m bug fix
@ 2022-01-18 22:51 Jakub Kicinski
  2022-01-18 23:14 ` Michal Kubecek
  2022-01-19  9:45 ` Ido Schimmel
  0 siblings, 2 replies; 7+ messages in thread
From: Jakub Kicinski @ 2022-01-18 22:51 UTC (permalink / raw)
  To: Michal Kubecek; +Cc: Ido Schimmel, netdev@vger.kernel.org, michel, dcavalca

Hi Michal!

Sorry to hasten but I'm wondering if there is a plan to cut the 5.16
ethtool release? Looks like there is a problem in SFP EEPROM parsing
code, at least with QSFP28s, user space always requests page 3 now.
This ends in an -EINVAL (at least for drivers not supporting the paged
mode).

By the looks of it - Ido fixed this in 6e2b32a0d0ea ("sff-8636: Request
specific pages for parsing in netlink path") but it may be too much code 
to backport so I'm thinking it's easiest for distros to move to v5.16.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: ethtool 5.16 release / ethtool -m bug fix
  2022-01-18 22:51 ethtool 5.16 release / ethtool -m bug fix Jakub Kicinski
@ 2022-01-18 23:14 ` Michal Kubecek
  2022-01-18 23:54   ` Jakub Kicinski
  2022-01-19  9:45 ` Ido Schimmel
  1 sibling, 1 reply; 7+ messages in thread
From: Michal Kubecek @ 2022-01-18 23:14 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: Ido Schimmel, netdev@vger.kernel.org, michel, dcavalca

On Tue, Jan 18, 2022 at 02:51:59PM -0800, Jakub Kicinski wrote:
> Hi Michal!
> 
> Sorry to hasten but I'm wondering if there is a plan to cut the 5.16
> ethtool release? Looks like there is a problem in SFP EEPROM parsing
> code, at least with QSFP28s, user space always requests page 3 now.
> This ends in an -EINVAL (at least for drivers not supporting the paged
> mode).
> 
> By the looks of it - Ido fixed this in 6e2b32a0d0ea ("sff-8636: Request
> specific pages for parsing in netlink path") but it may be too much code 
> to backport so I'm thinking it's easiest for distros to move to v5.16.

Accidentally, I'm working on it right now. I need to run few more tests,
should be done in 20-30 minutes (if there are no complications).

Michal

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: ethtool 5.16 release / ethtool -m bug fix
  2022-01-18 23:14 ` Michal Kubecek
@ 2022-01-18 23:54   ` Jakub Kicinski
  0 siblings, 0 replies; 7+ messages in thread
From: Jakub Kicinski @ 2022-01-18 23:54 UTC (permalink / raw)
  To: Michal Kubecek; +Cc: Ido Schimmel, netdev@vger.kernel.org, michel, dcavalca

On Wed, 19 Jan 2022 00:14:23 +0100 Michal Kubecek wrote:
> On Tue, Jan 18, 2022 at 02:51:59PM -0800, Jakub Kicinski wrote:
> > Hi Michal!
> > 
> > Sorry to hasten but I'm wondering if there is a plan to cut the 5.16
> > ethtool release? Looks like there is a problem in SFP EEPROM parsing
> > code, at least with QSFP28s, user space always requests page 3 now.
> > This ends in an -EINVAL (at least for drivers not supporting the paged
> > mode).
> > 
> > By the looks of it - Ido fixed this in 6e2b32a0d0ea ("sff-8636: Request
> > specific pages for parsing in netlink path") but it may be too much code 
> > to backport so I'm thinking it's easiest for distros to move to v5.16.  
> 
> Accidentally, I'm working on it right now. I need to run few more tests,
> should be done in 20-30 minutes (if there are no complications).

Perfect! That's lucky or unlucky depending on how you look at it :)

I see the tag already, thanks!

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: ethtool 5.16 release / ethtool -m bug fix
  2022-01-18 22:51 ethtool 5.16 release / ethtool -m bug fix Jakub Kicinski
  2022-01-18 23:14 ` Michal Kubecek
@ 2022-01-19  9:45 ` Ido Schimmel
  2022-01-19 15:39   ` Jakub Kicinski
  1 sibling, 1 reply; 7+ messages in thread
From: Ido Schimmel @ 2022-01-19  9:45 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: Michal Kubecek, netdev@vger.kernel.org, michel, dcavalca

On Tue, Jan 18, 2022 at 02:51:59PM -0800, Jakub Kicinski wrote:
> Hi Michal!
> 
> Sorry to hasten but I'm wondering if there is a plan to cut the 5.16
> ethtool release? Looks like there is a problem in SFP EEPROM parsing
> code, at least with QSFP28s, user space always requests page 3 now.
> This ends in an -EINVAL (at least for drivers not supporting the paged
> mode).

Jakub, are you sure you are dealing with QSFP and not SFP? I'm asking
because I assume the driver in question is mlx5 that has this code in
its implementation of get_module_eeprom_by_page():

```
switch (module_id) {
case MLX5_MODULE_ID_SFP:
	if (params->page > 0)
		return -EINVAL;
	break;
```

And indeed, ethtool(8) commit fc47fdb7c364 ("ethtool: Refactor
human-readable module EEPROM output for new API") always asks for Upper
Page 03h, regardless of the module type.

It is not optimal for ethtool(8) to ask for unsupported pages and I made
sure it's not doing it anymore, but I believe it's wrong for the kernel
to return an error. All the specifications that I'm aware of mandate
that when an unsupported page is requested, the Page Select byte will
revert to 0. That is why Upper Page 00h is always read-only.

For reference, see section 10.3 in SFF-8472, section 6.2.11 in SFF-8636
and section 8.2.13 in CMIS.

Also, the entire point of the netlink interface is that the kernel can
remain ignorant of the EEPROM layout and keep all the logic in user
space.

> 
> By the looks of it - Ido fixed this in 6e2b32a0d0ea ("sff-8636: Request
> specific pages for parsing in netlink path") but it may be too much code 
> to backport so I'm thinking it's easiest for distros to move to v5.16.

I did target fixes at 'ethtool' and features at 'ethtool-next', but I
wasn't aware of this bug.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: ethtool 5.16 release / ethtool -m bug fix
  2022-01-19  9:45 ` Ido Schimmel
@ 2022-01-19 15:39   ` Jakub Kicinski
  2022-01-20  7:54     ` Ido Schimmel
  0 siblings, 1 reply; 7+ messages in thread
From: Jakub Kicinski @ 2022-01-19 15:39 UTC (permalink / raw)
  To: Ido Schimmel
  Cc: Michal Kubecek, netdev@vger.kernel.org, michel, dcavalca,
	Andrew Lunn

On Wed, 19 Jan 2022 11:45:41 +0200 Ido Schimmel wrote:
> On Tue, Jan 18, 2022 at 02:51:59PM -0800, Jakub Kicinski wrote:
> > Hi Michal!
> > 
> > Sorry to hasten but I'm wondering if there is a plan to cut the 5.16
> > ethtool release? Looks like there is a problem in SFP EEPROM parsing
> > code, at least with QSFP28s, user space always requests page 3 now.
> > This ends in an -EINVAL (at least for drivers not supporting the paged
> > mode).  
> 
> Jakub, are you sure you are dealing with QSFP and not SFP? I'm asking
> because I assume the driver in question is mlx5 that has this code in
> its implementation of get_module_eeprom_by_page():
> 
> ```
> switch (module_id) {
> case MLX5_MODULE_ID_SFP:
> 	if (params->page > 0)
> 		return -EINVAL;
> 	break;
> ```

Yup, it's a QSFP28 / SFF-8636, the report was with a different NIC.

> And indeed, ethtool(8) commit fc47fdb7c364 ("ethtool: Refactor
> human-readable module EEPROM output for new API") always asks for Upper
> Page 03h, regardless of the module type.
> 
> It is not optimal for ethtool(8) to ask for unsupported pages and I made
> sure it's not doing it anymore, but I believe it's wrong for the kernel
> to return an error. All the specifications that I'm aware of mandate
> that when an unsupported page is requested, the Page Select byte will
> revert to 0. That is why Upper Page 00h is always read-only.
> 
> For reference, see section 10.3 in SFF-8472, section 6.2.11 in SFF-8636
> and section 8.2.13 in CMIS.
> 
> Also, the entire point of the netlink interface is that the kernel can
> remain ignorant of the EEPROM layout and keep all the logic in user
> space.

The EINVAL came from fallback_set_params().

As far as I can see user space will call sff8636_show_dom() regardless
of what we return from the kernel, so returning first page again should
not confuse anything.. as long as the fields read from page 3 happen to
be 0 in page 0?

What about drivers which do implement get_module_eeprom_by_page? Can we
somehow ensure they DTRT and are consistent with the legacy / flat API?

> > By the looks of it - Ido fixed this in 6e2b32a0d0ea ("sff-8636: Request
> > specific pages for parsing in netlink path") but it may be too much code 
> > to backport so I'm thinking it's easiest for distros to move to v5.16.  
> 
> I did target fixes at 'ethtool' and features at 'ethtool-next', but I
> wasn't aware of this bug.


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: ethtool 5.16 release / ethtool -m bug fix
  2022-01-19 15:39   ` Jakub Kicinski
@ 2022-01-20  7:54     ` Ido Schimmel
  2022-01-20 17:30       ` Jakub Kicinski
  0 siblings, 1 reply; 7+ messages in thread
From: Ido Schimmel @ 2022-01-20  7:54 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Michal Kubecek, netdev@vger.kernel.org, michel, dcavalca,
	Andrew Lunn

On Wed, Jan 19, 2022 at 07:39:02AM -0800, Jakub Kicinski wrote:
> On Wed, 19 Jan 2022 11:45:41 +0200 Ido Schimmel wrote:
> > On Tue, Jan 18, 2022 at 02:51:59PM -0800, Jakub Kicinski wrote:
> > > Hi Michal!
> > > 
> > > Sorry to hasten but I'm wondering if there is a plan to cut the 5.16
> > > ethtool release? Looks like there is a problem in SFP EEPROM parsing
> > > code, at least with QSFP28s, user space always requests page 3 now.
> > > This ends in an -EINVAL (at least for drivers not supporting the paged
> > > mode).  
> > 
> > Jakub, are you sure you are dealing with QSFP and not SFP? I'm asking
> > because I assume the driver in question is mlx5 that has this code in
> > its implementation of get_module_eeprom_by_page():
> > 
> > ```
> > switch (module_id) {
> > case MLX5_MODULE_ID_SFP:
> > 	if (params->page > 0)
> > 		return -EINVAL;
> > 	break;
> > ```
> 
> Yup, it's a QSFP28 / SFF-8636, the report was with a different NIC.
> 
> > And indeed, ethtool(8) commit fc47fdb7c364 ("ethtool: Refactor
> > human-readable module EEPROM output for new API") always asks for Upper
> > Page 03h, regardless of the module type.
> > 
> > It is not optimal for ethtool(8) to ask for unsupported pages and I made
> > sure it's not doing it anymore, but I believe it's wrong for the kernel
> > to return an error. All the specifications that I'm aware of mandate
> > that when an unsupported page is requested, the Page Select byte will
> > revert to 0. That is why Upper Page 00h is always read-only.
> > 
> > For reference, see section 10.3 in SFF-8472, section 6.2.11 in SFF-8636
> > and section 8.2.13 in CMIS.
> > 
> > Also, the entire point of the netlink interface is that the kernel can
> > remain ignorant of the EEPROM layout and keep all the logic in user
> > space.
> 
> The EINVAL came from fallback_set_params().

I see. I was fixated on get_module_eeprom_by_page(), but you are using
the fallback.

> 
> As far as I can see user space will call sff8636_show_dom() regardless
> of what we return from the kernel, so returning first page again should
> not confuse anything.. as long as the fields read from page 3 happen to
> be 0 in page 0?

sff8636_show_dom() parses and prints module-level and channel-level
thresholds from page 3. It will not try to parse or print this
information if the page isn't available. This is determined based on the
'Flat_mem' bit in the lower page.

> 
> What about drivers which do implement get_module_eeprom_by_page? Can we
> somehow ensure they DTRT and are consistent with the legacy / flat API?

Not sure what you mean by that (I believe they are already doing the
right thing). Life is much easier for drivers that implement
get_module_eeprom_by_page() because they only need to fetch the
information user space is asking for. They need not perform any parsing
of the data, unlike in the legacy callbacks.

> 
> > > By the looks of it - Ido fixed this in 6e2b32a0d0ea ("sff-8636: Request
> > > specific pages for parsing in netlink path") but it may be too much code 
> > > to backport so I'm thinking it's easiest for distros to move to v5.16.  
> > 
> > I did target fixes at 'ethtool' and features at 'ethtool-next', but I
> > wasn't aware of this bug.
> 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: ethtool 5.16 release / ethtool -m bug fix
  2022-01-20  7:54     ` Ido Schimmel
@ 2022-01-20 17:30       ` Jakub Kicinski
  0 siblings, 0 replies; 7+ messages in thread
From: Jakub Kicinski @ 2022-01-20 17:30 UTC (permalink / raw)
  To: Ido Schimmel
  Cc: Michal Kubecek, netdev@vger.kernel.org, michel, dcavalca,
	Andrew Lunn

On Thu, 20 Jan 2022 09:54:21 +0200 Ido Schimmel wrote:
> > What about drivers which do implement get_module_eeprom_by_page? Can we
> > somehow ensure they DTRT and are consistent with the legacy / flat API?  
> 
> Not sure what you mean by that (I believe they are already doing the
> right thing). Life is much easier for drivers that implement
> get_module_eeprom_by_page() because they only need to fetch the
> information user space is asking for. They need not perform any parsing
> of the data, unlike in the legacy callbacks.

I see, my concern was that overzealous driver or FW may try to validate
the page number rather than just performing the read.

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2022-01-20 17:30 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2022-01-18 22:51 ethtool 5.16 release / ethtool -m bug fix Jakub Kicinski
2022-01-18 23:14 ` Michal Kubecek
2022-01-18 23:54   ` Jakub Kicinski
2022-01-19  9:45 ` Ido Schimmel
2022-01-19 15:39   ` Jakub Kicinski
2022-01-20  7:54     ` Ido Schimmel
2022-01-20 17:30       ` Jakub Kicinski

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox