public inbox for netdev@vger.kernel.org
 help / color / mirror / Atom feed
From: Ido Schimmel <idosch@nvidia.com>
To: "Krzysztof Olędzki" <ole@ans.pl>
Cc: Andrew Lunn <andrew@lunn.ch>, Michal Kubecek <mkubecek@suse.cz>,
	Moshe Shemesh <moshe@nvidia.com>,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
	tariqt@nvidia.com
Subject: Re: "netlink error: Invalid argument" with ethtool-5.13+ on recent kernels due to "ethtool: Add netlink handler for getmodule (-m)" - 25b64c66f58d3df0ad7272dda91c3ab06fe7a303, also no SFP-DOM support via netlink?
Date: Thu, 23 May 2024 13:48:43 +0300	[thread overview]
Message-ID: <Zk8fC-wZlLT6hSKl@shredder> (raw)
In-Reply-To: <1bee73de-d4c3-456d-8cee-f76eee7194b0@ans.pl>

On Wed, May 22, 2024 at 10:29:43PM -0700, Krzysztof Olędzki wrote:
> On 22.05.2024 at 05:44, Andrew Lunn wrote:
> >>> So right, the function returns 512 for SFP and 256 for everything else, which explains why SFP does work but QSFP - not.
> >>
> >> Since you already did all the work and you are able to test patches, do
> >> you want to fix it yourself and submit or report to the mlx4 maintainer
> >> (copied)? Fix should be similar to mlx5 commit a708fb7b1f8d ("net/mlx5e:
> >> ethtool, Add support for EEPROM high pages query").
> 
> Oh, thank you so much for the pointer! Turns out, it was way easier than I thought:
> 
> # ethtool -m eth2

[..]

> Looks like all we need to do is:
> 
> --- a/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c        2024-04-17 02:19:38.000000000 -0700
> +++ b/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c        2024-05-22 12:46:57.290947887 -0700
> @@ -2055,15 +2055,15 @@
>         switch (data[0] /* identifier */) {
>         case MLX4_MODULE_ID_QSFP:
>                 modinfo->type = ETH_MODULE_SFF_8436;
> -               modinfo->eeprom_len = ETH_MODULE_SFF_8436_LEN;
> +               modinfo->eeprom_len = ETH_MODULE_SFF_8436_MAX_LEN;
>                 break;
>         case MLX4_MODULE_ID_QSFP_PLUS:
>                 if (data[1] >= 0x3) { /* revision id */
>                         modinfo->type = ETH_MODULE_SFF_8636;
> -                       modinfo->eeprom_len = ETH_MODULE_SFF_8636_LEN;
> +                       modinfo->eeprom_len = ETH_MODULE_SFF_8636_MAX_LEN;
>                 } else {
>                         modinfo->type = ETH_MODULE_SFF_8436;
> -                       modinfo->eeprom_len = ETH_MODULE_SFF_8436_LEN;
> +                       modinfo->eeprom_len = ETH_MODULE_SFF_8436_MAX_LEN;
>                 }
>                 break;
>         case MLX4_MODULE_ID_QSFP28:

Need to update QSFP28 to use ETH_MODULE_SFF_8636_MAX_LEN as well. Looks
OK otherwise.

> 
> If I'm not mistaken, the rest of the logic is already there, such as:
> 
> static void mlx4_qsfp_eeprom_params_set(u8 *i2c_addr, u8 *page_num, u16 *offset)
> {
>         /* Offsets 0-255 belong to page 0.
>          * Offsets 256-639 belong to pages 01, 02, 03.
>          * For example, offset 400 is page 02: 1 + (400 - 256) / 128 = 2
>          */
>         if (*offset < I2C_PAGE_SIZE)
>                 *page_num = 0;
>         else
>                 *page_num = 1 + (*offset - I2C_PAGE_SIZE) / I2C_HIGH_PAGE_SIZE;
>         *i2c_addr = I2C_ADDR_LOW;
>         *offset -= *page_num * I2C_HIGH_PAGE_SIZE;
> }
> 
> So, we don't need to make as many changes as in https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a708fb7b1f8dcc7a8ed949839958cd5d812dd939

OK.

> > Before you do that, please could you work on ethtool. I would say it
> > has a bug. It has been provided with 256 bytes of SPF data. It should
> > be able to decode that and print it in human readable format. So the
> > EINVAL should not be considered fatal to decoding.
> 
> Yes, I was also thinking this way. Luckily, similar to the situation with the mlx4 driver, all the logic is there - sff8636_dom_parse() checks if map->page_03h is set and if not, just returns gracefully.
> 
> So, all we need to do is modify sff8636_memory_map_init_pages():
> @@ -1038,7 +1039,7 @@
>         sff8636_request_init(&request, 0x3, SFF8636_PAGE_SIZE);
>         ret = nl_get_eeprom_page(ctx, &request);
>         if (ret < 0)
> -               return ret;
> +               return 0;
>         map->page_03h = request.data - SFF8636_PAGE_SIZE;
> 
>         return 0;
> 
> As you described, we get all the data except the DOM:

[...]

> Do you think it would make sense to print a warning in such situation, or just handle this silently?

Yes, I suggest adding some kind of warning. For example:

"Page 03h could not be retrieved"

Otherwise we would be papering over bugs like the above.

We don't need the same treatment in the CMIS parser because drivers
supporting CMIS modules support the get_module_eeprom_by_page() ethtool
operation and don't go via the fallback path.

> Finally, as I was looking at the code in fallback_set_params() I started thinking if the length check is actually correct?
> 
> I think instead of:
>  if (offset >= modinfo->eeprom_len)
> we may want:
>  if (offset + length > modinfo->eeprom_len)
> 
> I don't know if it is safe to assume we always read a single page and cross page reads are not allowed and even if so, that we should rely on this instead of checking the len explicitly? What do you think?

Yea, it seems the current check only checks that we do not start to read
after the EEPROM boundary, but not that we finish reading before it as
well.

> Once I hear from y'all I will prepare the patches.

Thanks!

  parent reply	other threads:[~2024-05-23 10:48 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-05-21  6:26 "netlink error: Invalid argument" with ethtool-5.13+ on recent kernels due to "ethtool: Add netlink handler for getmodule (-m)" - 25b64c66f58d3df0ad7272dda91c3ab06fe7a303, also no SFP-DOM support via netlink? Krzysztof Olędzki
2024-05-21  6:55 ` Michal Kubecek
2024-05-21  7:02   ` Krzysztof Olędzki
2024-05-21  7:16     ` Krzysztof Olędzki
2024-05-21  7:34     ` Michal Kubecek
2024-05-21  7:38       ` Krzysztof Olędzki
2024-05-21 20:21         ` Andrew Lunn
2024-05-22  4:54           ` Krzysztof Olędzki
2024-05-22  8:40             ` Ido Schimmel
2024-05-22 12:44               ` Andrew Lunn
2024-05-23  5:29                 ` Krzysztof Olędzki
2024-05-23 10:37                   ` Michal Kubecek
2024-05-23 10:48                   ` Ido Schimmel [this message]
2024-05-23 15:35                   ` Andrew Lunn
2024-07-08  3:41                     ` [PATCH] net/mlx4: Add support for EEPROM high pages query for QSFP/QSFP+/QSFP28 Krzysztof Olędzki
2024-07-08 16:28                       ` Ido Schimmel
2024-07-09 11:17                       ` Dan Merillat
2024-07-08  3:41                     ` [PATCH] qsfp: Better handling of Page 03h netlink read failure Krzysztof Olędzki
2024-07-08 16:12                       ` Ido Schimmel
2024-07-31  0:55                         ` Krzysztof Olędzki
2024-07-31  8:48                           ` Ido Schimmel

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Zk8fC-wZlLT6hSKl@shredder \
    --to=idosch@nvidia.com \
    --cc=andrew@lunn.ch \
    --cc=mkubecek@suse.cz \
    --cc=moshe@nvidia.com \
    --cc=netdev@vger.kernel.org \
    --cc=ole@ans.pl \
    --cc=tariqt@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox