From: Jakub Kicinski <kuba@kernel.org>
To: saeedm@nvidia.com, tariqt@nvidia.com
Cc: Til Kaiser <mail@tk154.de>,
	leonro@nvidia.com, netdev@vger.kernel.org,
	linux-rdma@vger.kernel.org
Subject: Re: [BUG] net/mlx5: missing sysfs hwmon entry for ConnectX-4 cards
Date: Tue, 15 Oct 2024 08:06:17 -0700
Message-ID: <20241015080617.79e90a06@kernel.org>
In-Reply-To: <bc8ba1b7-e4ad-40b5-b69d-9ea1e7a18a40@tk154.de>

On Thu, 10 Oct 2024 17:11:21 +0200 Til Kaiser wrote:
> I noticed on our dual-port 100G ConnectX-4 cards (MT27700 Family) 
> running Linux Kernel version 6.6.56 and the latest ConnectX-4 firmware 
> version 12.28.2302 that we do not have a sysfs hwmon entry for reading 
> temperature values.
> When running the older Kernel version 6.6.32, the hwmon entry is present, and 
> I can read the temperature values of those cards.
> Strangely, this problem doesn't occur on our ConnectX-4 Lx cards 
> (MT27710 Family), regardless of which Kernel version I use.
> 
> I looked into the mlx5 core driver and noticed that it is checking the 
> MCAM register here: 
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/drivers/net/ethernet/mellanox/mlx5/core/hwmon.c?h=v6.6.56#n380.
> When I removed that check, the hwmon entry reappeared.
> 
> Looking into recent mlx5 commits regarding this MCAM register, I found 
> this commit: 
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=v6.6.56&id=fb035aa9a3f8fd327ab83b15a94929d2b9045995.
> When I reverted this commit, the hwmon entry also reappeared.
> 
> I also found a firmware bug fix regarding that register inside the 
> ConnectX-4 Lx bug fix history here (Ref. 2339971): 
> https://docs.nvidia.com/networking/display/connectx4lxfirmwarev14321900/bug+fixes+history.
> I couldn't find such a firmware fix for the non-Lx ConnectX-4 cards. So, 
> I'm unsure whether this is an mlx5 driver issue or a firmware issue.

Hi, any word on this? Sounds like a fairly straightforward problem.
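
For anyone wanting to check whether the hwmon entry is present on a given
kernel, a minimal sketch along these lines may help. It relies only on the
standard hwmon sysfs layout (a "name" file per device, tempN_input files in
millidegrees Celsius). The device name "mlx5" is an assumption about what
the driver registers and should be verified on a working system; the
function name read_hwmon_temps and its parameters are made up for
illustration.

```python
from pathlib import Path


def read_hwmon_temps(root="/sys/class/hwmon", driver_name="mlx5"):
    """Return {hwmon_dir: {temp_file: degrees_C}} for devices whose
    'name' file matches driver_name.

    driver_name="mlx5" is an assumption; check the 'name' file on a
    system where the entry exists.
    """
    results = {}
    root_path = Path(root)
    if not root_path.is_dir():
        return results
    for entry in sorted(root_path.iterdir()):
        name_file = entry / "name"
        if not name_file.is_file():
            continue
        if name_file.read_text().strip() != driver_name:
            continue
        temps = {}
        for temp_file in sorted(entry.glob("temp*_input")):
            try:
                # hwmon reports temperatures in millidegrees Celsius
                millideg = int(temp_file.read_text().strip())
            except (OSError, ValueError):
                # sensor unreadable or returned non-numeric data; skip it
                continue
            temps[temp_file.name] = millideg / 1000.0
        results[entry.name] = temps
    return results


if __name__ == "__main__":
    found = read_hwmon_temps()
    if not found:
        print("no matching hwmon entry found")
    for dev, temps in found.items():
        for sensor, value in temps.items():
            print(f"{dev} {sensor}: {value:.1f} C")
```

An empty result on 6.6.56 alongside a non-empty one on 6.6.32 for the same
card would match the behaviour described in the report.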

Thread overview: 3+ messages
2024-10-10 15:11 [BUG] net/mlx5: missing sysfs hwmon entry for ConnectX-4 cards Til Kaiser
2024-10-15 15:06 ` Jakub Kicinski [this message]
2024-10-16  7:05   ` Tariq Toukan
