* ethtool 5.16 release / ethtool -m bug fix
@ 2022-01-18 22:51 Jakub Kicinski
2022-01-18 23:14 ` Michal Kubecek
2022-01-19 9:45 ` Ido Schimmel
0 siblings, 2 replies; 7+ messages in thread
From: Jakub Kicinski @ 2022-01-18 22:51 UTC (permalink / raw)
To: Michal Kubecek; +Cc: Ido Schimmel, netdev@vger.kernel.org, michel, dcavalca
Hi Michal!
Sorry to hasten but I'm wondering if there is a plan to cut the 5.16
ethtool release? Looks like there is a problem in SFP EEPROM parsing
code, at least with QSFP28s, user space always requests page 3 now.
This ends in an -EINVAL (at least for drivers not supporting the paged
mode).
By the looks of it - Ido fixed this in 6e2b32a0d0ea ("sff-8636: Request
specific pages for parsing in netlink path") but it may be too much code
to backport so I'm thinking it's easiest for distros to move to v5.16.
^ permalink raw reply [flat|nested] 7+ messages in thread* Re: ethtool 5.16 release / ethtool -m bug fix 2022-01-18 22:51 ethtool 5.16 release / ethtool -m bug fix Jakub Kicinski @ 2022-01-18 23:14 ` Michal Kubecek 2022-01-18 23:54 ` Jakub Kicinski 2022-01-19 9:45 ` Ido Schimmel 1 sibling, 1 reply; 7+ messages in thread From: Michal Kubecek @ 2022-01-18 23:14 UTC (permalink / raw) To: Jakub Kicinski; +Cc: Ido Schimmel, netdev@vger.kernel.org, michel, dcavalca On Tue, Jan 18, 2022 at 02:51:59PM -0800, Jakub Kicinski wrote: > Hi Michal! > > Sorry to hasten but I'm wondering if there is a plan to cut the 5.16 > ethtool release? Looks like there is a problem in SFP EEPROM parsing > code, at least with QSFP28s, user space always requests page 3 now. > This ends in an -EINVAL (at least for drivers not supporting the paged > mode). > > By the looks of it - Ido fixed this in 6e2b32a0d0ea ("sff-8636: Request > specific pages for parsing in netlink path") but it may be too much code > to backport so I'm thinking it's easiest for distros to move to v5.16. Accidentally, I'm working on it right now. I need to run few more tests, should be done in 20-30 minutes (if there are no complications). Michal ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: ethtool 5.16 release / ethtool -m bug fix 2022-01-18 23:14 ` Michal Kubecek @ 2022-01-18 23:54 ` Jakub Kicinski 0 siblings, 0 replies; 7+ messages in thread From: Jakub Kicinski @ 2022-01-18 23:54 UTC (permalink / raw) To: Michal Kubecek; +Cc: Ido Schimmel, netdev@vger.kernel.org, michel, dcavalca On Wed, 19 Jan 2022 00:14:23 +0100 Michal Kubecek wrote: > On Tue, Jan 18, 2022 at 02:51:59PM -0800, Jakub Kicinski wrote: > > Hi Michal! > > > > Sorry to hasten but I'm wondering if there is a plan to cut the 5.16 > > ethtool release? Looks like there is a problem in SFP EEPROM parsing > > code, at least with QSFP28s, user space always requests page 3 now. > > This ends in an -EINVAL (at least for drivers not supporting the paged > > mode). > > > > By the looks of it - Ido fixed this in 6e2b32a0d0ea ("sff-8636: Request > > specific pages for parsing in netlink path") but it may be too much code > > to backport so I'm thinking it's easiest for distros to move to v5.16. > > Accidentally, I'm working on it right now. I need to run few more tests, > should be done in 20-30 minutes (if there are no complications). Perfect! That's lucky or unlucky depending on how you look at it :) I see the tag already, thanks! ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: ethtool 5.16 release / ethtool -m bug fix 2022-01-18 22:51 ethtool 5.16 release / ethtool -m bug fix Jakub Kicinski 2022-01-18 23:14 ` Michal Kubecek @ 2022-01-19 9:45 ` Ido Schimmel 2022-01-19 15:39 ` Jakub Kicinski 1 sibling, 1 reply; 7+ messages in thread From: Ido Schimmel @ 2022-01-19 9:45 UTC (permalink / raw) To: Jakub Kicinski; +Cc: Michal Kubecek, netdev@vger.kernel.org, michel, dcavalca On Tue, Jan 18, 2022 at 02:51:59PM -0800, Jakub Kicinski wrote: > Hi Michal! > > Sorry to hasten but I'm wondering if there is a plan to cut the 5.16 > ethtool release? Looks like there is a problem in SFP EEPROM parsing > code, at least with QSFP28s, user space always requests page 3 now. > This ends in an -EINVAL (at least for drivers not supporting the paged > mode). Jakub, are you sure you are dealing with QSFP and not SFP? I'm asking because I assume the driver in question is mlx5 that has this code in its implementation of get_module_eeprom_by_page(): ``` switch (module_id) { case MLX5_MODULE_ID_SFP: if (params->page > 0) return -EINVAL; break; ``` And indeed, ethtool(8) commit fc47fdb7c364 ("ethtool: Refactor human-readable module EEPROM output for new API") always asks for Upper Page 03h, regardless of the module type. It is not optimal for ethtool(8) to ask for unsupported pages and I made sure it's not doing it anymore, but I believe it's wrong for the kernel to return an error. All the specifications that I'm aware of mandate that when an unsupported page is requested, the Page Select byte will revert to 0. That is why Upper Page 00h is always read-only. For reference, see section 10.3 in SFF-8472, section 6.2.11 in SFF-8636 and section 8.2.13 in CMIS. Also, the entire point of the netlink interface is that the kernel can remain ignorant of the EEPROM layout and keep all the logic in user space. > > By the looks of it - Ido fixed this in 6e2b32a0d0ea ("sff-8636: Request > specific pages for parsing in netlink path") but it may be too much code > to backport so I'm thinking it's easiest for distros to move to v5.16. I did target fixes at 'ethtool' and features at 'ethtool-next', but I wasn't aware of this bug. ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: ethtool 5.16 release / ethtool -m bug fix 2022-01-19 9:45 ` Ido Schimmel @ 2022-01-19 15:39 ` Jakub Kicinski 2022-01-20 7:54 ` Ido Schimmel 0 siblings, 1 reply; 7+ messages in thread From: Jakub Kicinski @ 2022-01-19 15:39 UTC (permalink / raw) To: Ido Schimmel Cc: Michal Kubecek, netdev@vger.kernel.org, michel, dcavalca, Andrew Lunn On Wed, 19 Jan 2022 11:45:41 +0200 Ido Schimmel wrote: > On Tue, Jan 18, 2022 at 02:51:59PM -0800, Jakub Kicinski wrote: > > Hi Michal! > > > > Sorry to hasten but I'm wondering if there is a plan to cut the 5.16 > > ethtool release? Looks like there is a problem in SFP EEPROM parsing > > code, at least with QSFP28s, user space always requests page 3 now. > > This ends in an -EINVAL (at least for drivers not supporting the paged > > mode). > > Jakub, are you sure you are dealing with QSFP and not SFP? I'm asking > because I assume the driver in question is mlx5 that has this code in > its implementation of get_module_eeprom_by_page(): > > ``` > switch (module_id) { > case MLX5_MODULE_ID_SFP: > if (params->page > 0) > return -EINVAL; > break; > ``` Yup, it's a QSFP28 / SFF-8636, the report was with a different NIC. > And indeed, ethtool(8) commit fc47fdb7c364 ("ethtool: Refactor > human-readable module EEPROM output for new API") always asks for Upper > Page 03h, regardless of the module type. > > It is not optimal for ethtool(8) to ask for unsupported pages and I made > sure it's not doing it anymore, but I believe it's wrong for the kernel > to return an error. All the specifications that I'm aware of mandate > that when an unsupported page is requested, the Page Select byte will > revert to 0. That is why Upper Page 00h is always read-only. > > For reference, see section 10.3 in SFF-8472, section 6.2.11 in SFF-8636 > and section 8.2.13 in CMIS. > > Also, the entire point of the netlink interface is that the kernel can > remain ignorant of the EEPROM layout and keep all the logic in user > space. The EINVAL came from fallback_set_params(). As far as I can see user space will call sff8636_show_dom() regardless of what we return from the kernel, so returning first page again should not confuse anything.. as long as the fields read from page 3 happen to be 0 in page 0? What about drivers which do implement get_module_eeprom_by_page? Can we somehow ensure they DTRT and are consistent with the legacy / flat API? > > By the looks of it - Ido fixed this in 6e2b32a0d0ea ("sff-8636: Request > > specific pages for parsing in netlink path") but it may be too much code > > to backport so I'm thinking it's easiest for distros to move to v5.16. > > I did target fixes at 'ethtool' and features at 'ethtool-next', but I > wasn't aware of this bug. ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: ethtool 5.16 release / ethtool -m bug fix 2022-01-19 15:39 ` Jakub Kicinski @ 2022-01-20 7:54 ` Ido Schimmel 2022-01-20 17:30 ` Jakub Kicinski 0 siblings, 1 reply; 7+ messages in thread From: Ido Schimmel @ 2022-01-20 7:54 UTC (permalink / raw) To: Jakub Kicinski Cc: Michal Kubecek, netdev@vger.kernel.org, michel, dcavalca, Andrew Lunn On Wed, Jan 19, 2022 at 07:39:02AM -0800, Jakub Kicinski wrote: > On Wed, 19 Jan 2022 11:45:41 +0200 Ido Schimmel wrote: > > On Tue, Jan 18, 2022 at 02:51:59PM -0800, Jakub Kicinski wrote: > > > Hi Michal! > > > > > > Sorry to hasten but I'm wondering if there is a plan to cut the 5.16 > > > ethtool release? Looks like there is a problem in SFP EEPROM parsing > > > code, at least with QSFP28s, user space always requests page 3 now. > > > This ends in an -EINVAL (at least for drivers not supporting the paged > > > mode). > > > > Jakub, are you sure you are dealing with QSFP and not SFP? I'm asking > > because I assume the driver in question is mlx5 that has this code in > > its implementation of get_module_eeprom_by_page(): > > > > ``` > > switch (module_id) { > > case MLX5_MODULE_ID_SFP: > > if (params->page > 0) > > return -EINVAL; > > break; > > ``` > > Yup, it's a QSFP28 / SFF-8636, the report was with a different NIC. > > > And indeed, ethtool(8) commit fc47fdb7c364 ("ethtool: Refactor > > human-readable module EEPROM output for new API") always asks for Upper > > Page 03h, regardless of the module type. > > > > It is not optimal for ethtool(8) to ask for unsupported pages and I made > > sure it's not doing it anymore, but I believe it's wrong for the kernel > > to return an error. All the specifications that I'm aware of mandate > > that when an unsupported page is requested, the Page Select byte will > > revert to 0. That is why Upper Page 00h is always read-only. > > > > For reference, see section 10.3 in SFF-8472, section 6.2.11 in SFF-8636 > > and section 8.2.13 in CMIS. > > > > Also, the entire point of the netlink interface is that the kernel can > > remain ignorant of the EEPROM layout and keep all the logic in user > > space. > > The EINVAL came from fallback_set_params(). I see. I was fixated on get_module_eeprom_by_page(), but you are using the fallback. > > As far as I can see user space will call sff8636_show_dom() regardless > of what we return from the kernel, so returning first page again should > not confuse anything.. as long as the fields read from page 3 happen to > be 0 in page 0? sff8636_show_dom() parses and prints module-level and channel-level thresholds from page 3. It will not try to parse or print this information if the page isn't available. This is determined based on the 'Flat_mem' bit in the lower page. > > What about drivers which do implement get_module_eeprom_by_page? Can we > somehow ensure they DTRT and are consistent with the legacy / flat API? Not sure what you mean by that (I believe they are already doing the right thing). Life is much easier for drivers that implement get_module_eeprom_by_page() because they only need to fetch the information user space is asking for. They need not perform any parsing of the data, unlike in the legacy callbacks. > > > > By the looks of it - Ido fixed this in 6e2b32a0d0ea ("sff-8636: Request > > > specific pages for parsing in netlink path") but it may be too much code > > > to backport so I'm thinking it's easiest for distros to move to v5.16. > > > > I did target fixes at 'ethtool' and features at 'ethtool-next', but I > > wasn't aware of this bug. > ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: ethtool 5.16 release / ethtool -m bug fix 2022-01-20 7:54 ` Ido Schimmel @ 2022-01-20 17:30 ` Jakub Kicinski 0 siblings, 0 replies; 7+ messages in thread From: Jakub Kicinski @ 2022-01-20 17:30 UTC (permalink / raw) To: Ido Schimmel Cc: Michal Kubecek, netdev@vger.kernel.org, michel, dcavalca, Andrew Lunn On Thu, 20 Jan 2022 09:54:21 +0200 Ido Schimmel wrote: > > What about drivers which do implement get_module_eeprom_by_page? Can we > > somehow ensure they DTRT and are consistent with the legacy / flat API? > > Not sure what you mean by that (I believe they are already doing the > right thing). Life is much easier for drivers that implement > get_module_eeprom_by_page() because they only need to fetch the > information user space is asking for. They need not perform any parsing > of the data, unlike in the legacy callbacks. I see, my concern was that overzealous driver or FW may try to validate the page number rather than just performing the read. ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2022-01-20 17:30 UTC | newest] Thread overview: 7+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2022-01-18 22:51 ethtool 5.16 release / ethtool -m bug fix Jakub Kicinski 2022-01-18 23:14 ` Michal Kubecek 2022-01-18 23:54 ` Jakub Kicinski 2022-01-19 9:45 ` Ido Schimmel 2022-01-19 15:39 ` Jakub Kicinski 2022-01-20 7:54 ` Ido Schimmel 2022-01-20 17:30 ` Jakub Kicinski
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox